ERIC Educational Resources Information Center
Felce, David; Perry, Jonathan
2004-01-01
Background: The aims were to: (i) explore the association between age and size of setting and staffing per resident; and (ii) report resident and setting characteristics, and indicators of service process and resident activity for a national random sample of staffed housing provision. Methods: Sixty settings were selected randomly from those…
Sampling Large Graphs for Anticipatory Analytics
2015-05-15
low. C. Random Area Sampling: Random area sampling [8] is a "snowball" sampling method in which a set of random seed vertices are selected and areas... Sampling Large Graphs for Anticipatory Analytics. Lauren Edwards, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller, Lincoln... systems, greater human-in-the-loop involvement, or through complex algorithms. We are investigating the use of sampling to mitigate these challenges
Systematic versus random sampling in stereological studies.
West, Mark J
2012-12-01
The sampling that takes place at all levels of an experimental design must be random if the estimate is to be unbiased in a statistical sense. There are two fundamental ways of making a random sample of the sections and the positions to be probed on the sections. Using a card-sampling analogy, one can pick any card at all out of a deck of cards. This is referred to as independent random sampling because the sampling of any one card is made without reference to the position of the other cards. The other approach is to pick a card at random from within a set number of cards and then the others at equal intervals within the deck; this is systematic random sampling. Systematic sampling along one axis of many biological structures is more efficient than independent random sampling, because most biological structures are not randomly organized. This article discusses the merits of systematic versus random sampling in stereological studies.
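The card-deck contrast above can be sketched in a few lines of Python (a toy illustration, not part of the cited article; function names are hypothetical):

```python
import random

def independent_random_sample(deck, k):
    """Independent random sampling: each pick is made without
    reference to the positions of the other picks."""
    return random.sample(deck, k)

def systematic_sample(deck, k):
    """Systematic sampling: one random start within the first
    interval, then items at equal intervals through the deck."""
    interval = len(deck) // k
    start = random.randrange(interval)
    return [deck[start + i * interval] for i in range(k)]
```

With a 52-card deck and k = 4, the systematic sample always consists of cards exactly 13 positions apart, while the independent sample has no positional constraint.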
Random sampling of elementary flux modes in large-scale metabolic networks.
Machado, Daniel; Soons, Zita; Patil, Kiran Raosaheb; Ferreira, Eugénio C; Rocha, Isabel
2012-09-15
The description of a metabolic network in terms of elementary (flux) modes (EMs) provides an important framework for metabolic pathway analysis. However, their application to large networks has been hampered by the combinatorial explosion in the number of modes. In this work, we develop a method for generating random samples of EMs without computing the whole set. Our algorithm is an adaptation of the canonical basis approach, where we add an additional filtering step which, at each iteration, selects a random subset of the new combinations of modes. In order to obtain an unbiased sample, all candidates are assigned the same probability of getting selected. This approach avoids the exponential growth of the number of modes during computation, thus generating a random sample of the complete set of EMs within reasonable time. We generated samples of different sizes for a metabolic network of Escherichia coli, and observed that they preserve several properties of the full EM set. It is also shown that EM sampling can be used for rational strain design. A well distributed sample, that is representative of the complete set of EMs, should be suitable to most EM-based methods for analysis and optimization of metabolic networks. Source code for a cross-platform implementation in Python is freely available at http://code.google.com/p/emsampler. dmachado@deb.uminho.pt Supplementary data are available at Bioinformatics online.
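The filtering step described in the abstract, selecting a uniform random subset of the new candidate modes at each iteration, might be sketched as follows (a minimal illustration; names are hypothetical and not taken from the emsampler source):

```python
import random

def filter_candidates(candidates, sample_size, rng=random):
    """Filtering step sketch: keep a uniform random subset of the
    newly generated candidate modes, so every candidate has the same
    probability of being selected and the working set cannot grow
    exponentially across iterations."""
    if len(candidates) <= sample_size:
        return list(candidates)
    return rng.sample(candidates, sample_size)
```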
Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space
Bustos-Korts, Daniela; Malosetti, Marcos; Chapman, Scott; Biddulph, Ben; van Eeuwijk, Fred
2016-01-01
Genome-enabled prediction provides breeders with the means to increase the number of genotypes that can be evaluated for selection. One of the major challenges in genome-enabled prediction is how to construct a training set of genotypes from a calibration set that represents the target population of genotypes, where the calibration set is composed of a training and a validation set. A random sampling protocol of genotypes from the calibration set will lead to low-quality coverage of the total genetic space by the training set when the calibration set contains population structure. As a consequence, predictive ability will be affected negatively, because some parts of the genotypic diversity in the target population will be under-represented in the training set, whereas other parts will be over-represented. Therefore, we propose a training set construction method that uniformly samples the genetic space spanned by the target population of genotypes, thereby increasing predictive ability. To evaluate our method, we constructed training sets and identified corresponding genomic prediction models for four genotype panels that differed in the amount of population structure they contained (maize Flint, maize Dent, wheat, and rice). Training sets were constructed using uniform sampling, stratified-uniform sampling, stratified sampling, and random sampling. We compared these methods with a method that maximizes the generalized coefficient of determination (CD). Several training set sizes were considered. We investigated four genomic prediction models: multi-locus QTL models, GBLUP models, combinations of QTL and GBLUP, and Reproducing Kernel Hilbert Space (RKHS) models. For the maize and wheat panels, construction of the training set under uniform sampling led to a larger predictive ability than under stratified and random sampling. The results of our methods were similar to those of the CD method.
For the rice panel, all training set construction methods led to similar predictive ability, a reflection of the very strong population structure in this panel. PMID:27672112
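A minimal sketch of the stratified idea that the abstract contrasts with plain random sampling (the names and the equal-per-group allocation are illustrative assumptions, not the authors' algorithm):

```python
import random
from collections import defaultdict

def stratified_training_set(genotypes, subpopulation, n, rng=random):
    """Stratified draw: take (roughly) equal numbers from each
    subpopulation so no part of the genetic space dominates the
    training set. `subpopulation` maps genotype -> group label."""
    groups = defaultdict(list)
    for g in genotypes:
        groups[subpopulation[g]].append(g)
    per_group = max(1, n // len(groups))
    chosen = []
    for members in groups.values():
        chosen.extend(rng.sample(members, min(per_group, len(members))))
    return chosen
```

Under population structure, a plain `random.sample` over all genotypes would over-represent the larger subpopulation; the stratified draw above does not.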
Toward a Principled Sampling Theory for Quasi-Orders
Ünlü, Ali; Schrepp, Martin
2016-01-01
Quasi-orders, that is, reflexive and transitive binary relations, have numerous applications. In educational theories, the dependencies of mastery among the problems of a test can be modeled by quasi-orders. Methods such as item tree or Boolean analysis that mine for quasi-orders in empirical data are sensitive to the underlying quasi-order structure. These data mining techniques have to be compared based on extensive simulation studies, with unbiased samples of randomly generated quasi-orders at their basis. In this paper, we develop techniques that can provide the required quasi-order samples. We introduce a discrete doubly inductive procedure for incrementally constructing the set of all quasi-orders on a finite item set. A randomization of this deterministic procedure allows us to generate representative samples of random quasi-orders. With an outer level inductive algorithm, we consider the uniform random extensions of the trace quasi-orders to higher dimension. This is combined with an inner level inductive algorithm to correct the extensions that violate the transitivity property. The inner level correction step entails sampling biases. We propose three algorithms for bias correction and investigate them in simulation. It is evident that, on even up to 50 items, the new algorithms create close to representative quasi-order samples within acceptable computing time. Hence, the principled approach is a significant improvement to existing methods that are used to draw quasi-orders uniformly at random but cannot cope with reasonably large item sets. PMID:27965601
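For contrast, the naive approach the paper improves on can be sketched as follows: generate a random reflexive relation and repair it into a quasi-order by transitive closure. The repair step is exactly where sampling bias creeps in, which is what the authors' correction algorithms address (this sketch is not their procedure):

```python
import random

def transitive_closure(rel, items):
    """Warshall's algorithm: smallest transitive relation containing rel."""
    closed = set(rel)
    for k in items:
        for i in items:
            for j in items:
                if (i, k) in closed and (k, j) in closed:
                    closed.add((i, j))
    return closed

def naive_random_quasi_order(items, p, rng=random):
    """Naive sampler: start from reflexivity, add each pair with
    probability p, then force transitivity by closure. The result is a
    valid quasi-order but NOT a uniform draw from all quasi-orders."""
    rel = {(i, i) for i in items}
    rel |= {(i, j) for i in items for j in items if rng.random() < p}
    return transitive_closure(rel, items)
```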
Improved Compressive Sensing of Natural Scenes Using Localized Random Sampling
Barranca, Victor J.; Kovačič, Gregor; Zhou, Douglas; Cai, David
2016-01-01
Compressive sensing (CS) theory demonstrates that by using uniformly-random sampling, rather than uniformly-spaced sampling, higher quality image reconstructions are often achievable. Considering that the structure of sampling protocols has such a profound impact on the quality of image reconstructions, we formulate a new sampling scheme motivated by physiological receptive field structure, localized random sampling, which yields significantly improved CS image reconstructions. For each set of localized image measurements, our sampling method first randomly selects an image pixel and then measures its nearby pixels with probability depending on their distance from the initially selected pixel. We compare the uniformly-random and localized random sampling methods over a large space of sampling parameters, and show that, for the optimal parameter choices, higher quality image reconstructions can be consistently obtained by using localized random sampling. In addition, we argue that the localized random CS optimal parameter choice is stable with respect to diverse natural images, and scales with the number of samples used for reconstruction. We expect that the localized random sampling protocol helps to explain the evolutionarily advantageous nature of receptive field structure in visual systems and suggests several future research areas in CS theory and its application to brain imaging. PMID:27555464
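The localized random sampling scheme can be sketched as follows (the linear decay of the measurement probability with distance is an assumption for illustration; the paper's exact probability profile may differ):

```python
import math
import random

def localized_random_sample(width, height, n_centers, radius, p_max,
                            rng=random):
    """Localized random sampling sketch: repeatedly pick a random
    pixel, then measure its neighbours with a probability that decays
    with distance from the initially selected pixel."""
    measured = set()
    for _ in range(n_centers):
        cx = rng.randrange(width)
        cy = rng.randrange(height)
        measured.add((cx, cy))
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
                d = math.hypot(x - cx, y - cy)
                # linear decay: certain at d -> 0, zero at d = radius
                if 0 < d <= radius and rng.random() < p_max * (1 - d / radius):
                    measured.add((x, y))
    return measured
```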
Improved high-dimensional prediction with Random Forests by the use of co-data.
Te Beest, Dennis E; Mes, Steven W; Wilting, Saskia M; Brakenhoff, Ruud H; van de Wiel, Mark A
2017-12-28
Prediction in high-dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities used to draw candidate variables with co-data moderated sampling probabilities. Co-data are defined here as any type of information that is available on the variables of the primary data but does not use its response labels. These moderated sampling probabilities are learned from the data at hand, an approach inspired by empirical Bayes. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example, we aim to predict the presence of a lymph node metastasis from gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example, we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.
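The core mechanism, replacing the uniform candidate-variable draw with a weighted one, might be sketched like this (a hedged illustration; CoRF itself learns the weights from co-data, which is omitted here):

```python
import random

def draw_candidates(weights, mtry, rng=random):
    """Draw mtry candidate variables without replacement, with
    probability proportional to the given (co-data moderated) weights.
    Uniform weights recover the standard Random Forest draw."""
    pool = dict(weights)
    chosen = []
    for _ in range(min(mtry, len(pool))):
        total = sum(pool.values())
        r = rng.uniform(0.0, total)
        acc = 0.0
        var = None
        for v, w in pool.items():
            acc += w
            var = v            # fall back to last variable on float slip
            if r <= acc:
                break
        chosen.append(var)
        del pool[var]
    return chosen
```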
Correcting Evaluation Bias of Relational Classifiers with Network Cross Validation
2010-01-01
classification algorithms: simple random resampling (RRS), equal-instance random resampling (ERS), and network cross-validation (NCV). The first two... NCV procedure that eliminates overlap between test sets altogether. The procedure samples for k disjoint test sets that will be used for evaluation... [pseudocode fragment: sample (propLabeled * S) nodes from trainPool; inferenceSet = network - trainSet; F = F ∪ <trainSet, testSet, inferenceSet>; end for; output: F] NCV addresses
True Randomness from Big Data.
Papakonstantinou, Periklis A; Woodruff, David P; Yang, Guang
2016-09-26
Generating random bits is a difficult task that is important for physical-systems simulation, cryptography, and many other applications that rely on high-quality random bits. Our contribution is to show how to generate provably random bits from uncertain events whose outcomes are routinely recorded in the form of massive data sets. These include scientific data sets, such as in astronomy and genomics, as well as data produced by individuals, such as internet search logs, sensor networks, and social network feeds. We view the generation of such data as a sampling process from a big source, which is a random variable of size at least a few gigabytes. Our view initiates the study of big sources in the randomness extraction literature. Previous approaches for big sources rely on statistical assumptions about the samples. We introduce a general method that provably extracts almost-uniform random bits from big sources and extensively validate it empirically on real data sets. The experimental findings indicate that our method is efficient enough to handle very large sources, while previous extractor constructions are not efficient enough to be practical. Quality-wise, our method at least matches quantum randomness expanders and classical-world empirical extractors as measured by standardized tests.
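For intuition about randomness extraction, a much simpler classical device, the von Neumann extractor, shows how unbiased bits can be distilled from a biased source (this is not the paper's big-source extractor, just the textbook starting point):

```python
def von_neumann_extract(bits):
    """Classic von Neumann extractor: scan non-overlapping pairs; a
    (0,1) pair emits 0, a (1,0) pair emits 1, and equal pairs are
    discarded. For i.i.d. bits with any fixed bias p, the two
    surviving pair types have equal probability p*(1-p), so the
    output bits are exactly unbiased."""
    return [a for a, b in zip(bits[0::2], bits[1::2]) if a != b]
```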
Estimation After a Group Sequential Trial.
Milanzi, Elasma; Molenberghs, Geert; Alonso, Ariel; Kenward, Michael G; Tsiatis, Anastasios A; Davidian, Marie; Verbeke, Geert
2015-10-01
Group sequential trials are one important instance of studies for which the sample size is not fixed a priori but rather takes one of a finite set of pre-specified values, dependent on the observed data. Much work has been devoted to the inferential consequences of this design feature. Molenberghs et al. (2012) and Milanzi et al. (2012) reviewed and extended the existing literature, focusing on a collection of seemingly disparate, but related, settings, namely completely random sample sizes, group sequential studies with deterministic and random stopping rules, incomplete data, and random cluster sizes. They showed that the ordinary sample average is a viable option for estimation following a group sequential trial, for a wide class of stopping rules and for random outcomes with a distribution in the exponential family. Their results are somewhat surprising in the sense that the sample average is not optimal, and further, there does not exist an optimal, or even unbiased, linear estimator. However, the sample average is asymptotically unbiased, both conditionally upon the observed sample size and marginalized over it. By exploiting ignorability, they showed that the sample average is the conventional maximum likelihood estimator. They also showed that a conditional maximum likelihood estimator is finite-sample unbiased but is less efficient than the sample average and has the larger mean squared error. Asymptotically, the sample average and the conditional maximum likelihood estimator are equivalent. This previous work is restricted, however, to the situation in which the random sample size can take only two values, N = n or N = 2n. In this paper, we consider the more practically useful setting of sample sizes in the finite set {n_1, n_2, ..., n_L}. It is shown that the sample average is then a justifiable estimator, in the sense that it follows from joint likelihood estimation, and it is consistent and asymptotically unbiased.
We also show why simulations can give the false impression of bias in the sample average when considered conditional upon the sample size. The consequence is that no corrections need to be made to estimators following sequential trials. When small-sample bias is of concern, the conditional likelihood estimator provides a relatively straightforward modification to the sample average. Finally, it is shown that classical likelihood-based standard errors and confidence intervals can be applied, obviating the need for technical corrections.
Duchêne, Sebastián; Duchêne, David; Holmes, Edward C; Ho, Simon Y W
2015-07-01
Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results.
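The test's mechanics, shuffling sampling dates across tips and then comparing credible intervals, can be sketched as follows (the Bayesian rate estimation itself is abstracted away; the interval-comparison rule follows the conservative criterion in the abstract):

```python
import random

def randomized_replicates(dates, n_reps, rng=random):
    """Build replicate date assignments with sampling times shuffled
    across tips; rate estimates from these replicates form the null
    distribution of the date-randomization test."""
    reps = []
    for _ in range(n_reps):
        shuffled = list(dates)
        rng.shuffle(shuffled)
        reps.append(shuffled)
    return reps

def passes_conservative_test(real_ci, randomized_cis):
    """Conservative criterion: the 95% credible interval obtained with
    the correct dates must not overlap ANY interval obtained with
    randomized dates."""
    lo, hi = real_ci
    return all(hi < r_lo or lo > r_hi for r_lo, r_hi in randomized_cis)
```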
Health plan auditing: 100-percent-of-claims vs. random-sample audits.
Sillup, George P; Klimberg, Ronald K
2011-01-01
The objective of this study was to examine the relative efficacy of two different methodologies for auditing self-funded medical claim expenses: 100-percent-of-claims auditing versus random-sampling auditing. Multiple data sets of claim errors or 'exceptions' from two Fortune-100 corporations were analysed and compared to 100 simulated audits of 300- and 400-claim random samples. Random-sample simulations failed to identify a significant number and amount of the errors that ranged from $200,000 to $750,000. These results suggest that health plan expenses of corporations could be significantly reduced if they audited 100% of claims and embraced a zero-defect approach.
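A toy simulation in the spirit of the study's comparison might look like this (the claim counts are hypothetical; the real study used actual claim-error data sets from two corporations):

```python
import random

def simulate_audits(n_claims, error_ids, sample_size, n_audits, seed=0):
    """Average fraction of known error claims caught across repeated
    random-sample audits. A 100%-of-claims audit catches every error
    by construction, so this fraction is its benchmark."""
    rng = random.Random(seed)
    caught = 0
    for _ in range(n_audits):
        sample = set(rng.sample(range(n_claims), sample_size))
        caught += len(error_ids & sample)
    return caught / (n_audits * len(error_ids))
```

With a 300-claim sample out of a much larger claim population, the expected catch rate is roughly `sample_size / n_claims`, which is the intuition behind the abstract's finding that sample audits miss most error dollars.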
DOT National Transportation Integrated Search
2016-09-01
We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample...
Humphreys, Keith; Blodgett, Janet C; Wagner, Todd H
2014-11-01
Observational studies of Alcoholics Anonymous' (AA) effectiveness are vulnerable to self-selection bias because individuals choose whether or not to attend AA. The present study, therefore, employed an innovative statistical technique to derive a selection bias-free estimate of AA's impact. Six data sets from 5 National Institutes of Health-funded randomized trials (1 with 2 independent parallel arms) of AA facilitation interventions were analyzed using instrumental variables models. Alcohol-dependent individuals in one of the data sets (n = 774) were analyzed separately from the rest of sample (n = 1,582 individuals pooled from 5 data sets) because of heterogeneity in sample parameters. Randomization itself was used as the instrumental variable. Randomization was a good instrument in both samples, effectively predicting increased AA attendance that could not be attributed to self-selection. In 5 of the 6 data sets, which were pooled for analysis, increased AA attendance that was attributable to randomization (i.e., free of self-selection bias) was effective at increasing days of abstinence at 3-month (B = 0.38, p = 0.001) and 15-month (B = 0.42, p = 0.04) follow-up. However, in the remaining data set, in which preexisting AA attendance was much higher, further increases in AA involvement caused by the randomly assigned facilitation intervention did not affect drinking outcome. For most individuals seeking help for alcohol problems, increasing AA attendance leads to short- and long-term decreases in alcohol consumption that cannot be attributed to self-selection. However, for populations with high preexisting AA involvement, further increases in AA attendance may have little impact.
Chu, Hui-May; Ette, Ene I
2005-09-02
This study was performed to develop a new nonparametric approach for the estimation of a robust tissue-to-plasma ratio from extremely sparsely sampled paired data (i.e., one sample each from plasma and tissue per subject). The tissue-to-plasma ratio was estimated from paired/unpaired experimental data using the independent time points approach, area under the curve (AUC) values calculated with the naive data averaging approach, and AUC values calculated using sampling-based approaches (e.g., the pseudoprofile-based bootstrap [PpbB] approach and the random sampling approach [our proposed approach]). The random sampling approach involves the use of a 2-phase algorithm. The convergence of the sampling/resampling approaches was investigated, as well as the robustness of the estimates produced by the different approaches. To evaluate the latter, new data sets were generated by introducing outlier(s) into the real data set. One to two concentration values were inflated by 10% to 40% from their original values to produce the outliers. Tissue-to-plasma ratios computed using the independent time points approach varied between 0 and 50 across time points. The ratio obtained from AUC values acquired using the naive data averaging approach was not associated with any measure of uncertainty or variability. Calculating the ratio without regard to pairing yielded poorer estimates. The random sampling and pseudoprofile-based bootstrap approaches yielded tissue-to-plasma ratios with uncertainty and variability. However, the random sampling approach, because of the 2-phase nature of its algorithm, yielded more robust estimates and required fewer replications. Therefore, a 2-phase random sampling approach is proposed for the robust estimation of tissue-to-plasma ratio from extremely sparsely sampled data.
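A generic sampling-based sketch of attaching uncertainty to the AUC ratio (a plain bootstrap for illustration, not the authors' 2-phase algorithm; data layout and names are hypothetical):

```python
import random
from statistics import mean

def trapezoid_auc(times, conc):
    """Linear trapezoidal AUC over the sampled time points."""
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))

def bootstrap_auc_ratio(times, tissue_by_t, plasma_by_t, n_boot, seed=0):
    """Resample the subjects observed at each time point with
    replacement, recompute mean concentration profiles and their AUC
    ratio, and so attach uncertainty to the tissue-to-plasma ratio
    (which naive data averaging alone cannot provide)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        t_prof = [mean(rng.choices(v, k=len(v))) for v in tissue_by_t]
        p_prof = [mean(rng.choices(v, k=len(v))) for v in plasma_by_t]
        ratios.append(trapezoid_auc(times, t_prof) /
                      trapezoid_auc(times, p_prof))
    return ratios
```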
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All of the quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely a landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of the triggers, and the LIF, the accuracy of QLSM methods differs. Moreover, how to balance the number of 0s (non-occurrence) and 1s (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1s and 0s to include in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods has been investigated extensively in the literature, the challenge of training set construction has not been adequately investigated for QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies, along with the original data set, are used to test the performance of three regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR), and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1s and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses, similar to the NNS parameters. It is found that the LR-PRS, FLR-PRS and BLR-whole-data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
NASA Technical Reports Server (NTRS)
Rao, R. G. S.; Ulaby, F. T.
1977-01-01
The paper examines optimal sampling techniques for obtaining accurate spatial averages of soil moisture, at various depths and for cell sizes in the range 2.5-40 acres, with a minimum number of samples. Both simple random sampling and stratified sampling procedures are used to reach a set of recommended sample sizes for each depth and for each cell size. Major conclusions from statistical sampling test results are that (1) the number of samples required decreases with increasing depth; (2) when the total number of samples cannot be prespecified or the moisture in only one single layer is of interest, then a simple random sample procedure should be used which is based on the observed mean and SD for data from a single field; (3) when the total number of samples can be prespecified and the objective is to measure the soil moisture profile with depth, then stratified random sampling based on optimal allocation should be used; and (4) decreasing the sensor resolution cell size leads to fairly large decreases in samples sizes with stratified sampling procedures, whereas only a moderate decrease is obtained in simple random sampling procedures.
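The "optimal allocation" in conclusion (3) is commonly the Neyman allocation, which splits a fixed sampling budget across strata in proportion to stratum size times stratum standard deviation (a standard formula offered for illustration; the report's exact allocation rule may differ):

```python
def neyman_allocation(stratum_sizes, stratum_sds, total_n):
    """Optimal (Neyman) allocation for stratified random sampling:
    the sample taken in stratum h is proportional to N_h * S_h, so
    larger and more variable strata get more of the budget."""
    weights = [n * s for n, s in zip(stratum_sizes, stratum_sds)]
    total_w = sum(weights)
    return [round(total_n * w / total_w) for w in weights]
```

For two equally sized depth strata where one is three times as variable, a budget of 40 samples splits 10/30 rather than 20/20.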
Ranked set sampling: cost and optimal set size.
Nahhas, Ramzi W; Wolfe, Douglas A; Chen, Haiying
2002-12-01
McIntyre (1952, Australian Journal of Agricultural Research 3, 385-390) introduced ranked set sampling (RSS) as a method for improving estimation of a population mean in settings where sampling and ranking of units from the population are inexpensive when compared with actual measurement of the units. Two of the major factors in the usefulness of RSS are the set size and the relative costs of the various operations of sampling, ranking, and measurement. In this article, we consider ranking error models and cost models that enable us to assess the effect of different cost structures on the optimal set size for RSS. For reasonable cost structures, we find that the optimal RSS set sizes are generally larger than had been anticipated previously. These results will provide a useful tool for determining whether RSS is likely to lead to an improvement over simple random sampling in a given setting and, if so, what RSS set size is best to use in this case.
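A basic RSS draw can be sketched as follows (ranking here is done on the values themselves, standing in for the cheap judgment ranking McIntyre assumed; measurement of the selected unit is the expensive step):

```python
import random

def ranked_set_sample(population, set_size, cycles, rng=random):
    """One RSS design: in each cycle, draw set_size sets of set_size
    units, rank each set, and measure only the i-th ranked unit of
    the i-th set, yielding set_size measurements per cycle."""
    measured = []
    for _ in range(cycles):
        for i in range(set_size):
            ranked = sorted(rng.sample(population, set_size))
            measured.append(ranked[i])
    return measured
```

The set size and the relative costs of sampling, ranking, and measurement then determine whether this beats a simple random sample of the same measured size, which is the trade-off the article's cost models formalize.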
Dolch, Michael E; Janitza, Silke; Boulesteix, Anne-Laure; Graßmann-Lichtenauer, Carola; Praun, Siegfried; Denzer, Wolfgang; Schelling, Gustav; Schubert, Sören
2016-12-01
Identification of microorganisms in positive blood cultures still relies on standard techniques such as Gram staining followed by culturing with definite microorganism identification. Alternatively, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry or the analysis of headspace volatile compound (VC) composition produced by cultures can help to differentiate between microorganisms under experimental conditions. This study assessed the efficacy of volatile compound based microorganism differentiation into Gram-negatives and -positives in unselected positive blood culture samples from patients. Headspace gas samples of positive blood culture samples were transferred to sterilized, sealed, and evacuated 20 ml glass vials and stored at -30 °C until batch analysis. Headspace gas VC content analysis was carried out via an auto sampler connected to an ion-molecule reaction mass spectrometer (IMR-MS). Measurements covered a mass range from 16 to 135 u including CO2, H2, N2, and O2. Prediction rules for microorganism identification based on VC composition were derived using a training data set and evaluated using a validation data set within a random split validation procedure. One-hundred-fifty-two aerobic samples growing 27 Gram-negatives, 106 Gram-positives, and 19 fungi and 130 anaerobic samples growing 37 Gram-negatives, 91 Gram-positives, and two fungi were analysed. In anaerobic samples, ten discriminators were identified by the random forest method allowing for bacteria differentiation into Gram-negative and -positive (error rate: 16.7 % in validation data set). For aerobic samples the error rate was not better than random. In anaerobic blood culture samples of patients IMR-MS based headspace VC composition analysis facilitates bacteria differentiation into Gram-negative and -positive.
Adapted random sampling patterns for accelerated MRI.
Knoll, Florian; Clason, Christian; Diwoky, Clemens; Stollberger, Rudolf
2011-02-01
Variable density random sampling patterns have recently become increasingly popular for accelerated imaging strategies, as they lead to incoherent aliasing artifacts. However, the design of these sampling patterns is still an open problem. Current strategies use model assumptions like polynomials of different order to generate a probability density function that is then used to generate the sampling pattern. This approach relies on the optimization of design parameters which is very time consuming and therefore impractical for daily clinical use. This work presents a new approach that generates sampling patterns by making use of power spectra of existing reference data sets and hence requires neither parameter tuning nor an a priori mathematical model of the density of sampling points. The approach is validated with downsampling experiments, as well as with accelerated in vivo measurements. The proposed approach is compared with established sampling patterns, and the generalization potential is tested by using a range of reference images. Quantitative evaluation is performed for the downsampling experiments using RMS differences to the original, fully sampled data set. Our results demonstrate that the image quality of the method presented in this paper is comparable to that of an established model-based strategy when optimization of the model parameter is carried out and yields superior results to non-optimized model parameters. However, no random sampling pattern showed superior performance when compared to conventional Cartesian subsampling for the considered reconstruction strategy.
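The core idea above, turning the power spectrum of a reference data set into a sampling density, can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the function name, the toy reference image, and sampling without replacement are assumptions:

```python
import numpy as np

def pattern_from_reference(ref_image, n_samples, rng=None):
    """Draw k-space sample locations with probability proportional to the
    power spectrum of a reference image (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(ref_image))) ** 2
    pdf = spectrum / spectrum.sum()        # power spectrum as sampling density
    picks = rng.choice(pdf.size, size=n_samples, replace=False, p=pdf.ravel())
    mask = np.zeros(pdf.shape, dtype=bool)
    mask.ravel()[picks] = True             # True = acquired k-space location
    return mask

# toy reference image whose spectrum concentrates energy at the k-space centre
ref = np.outer(np.hanning(64), np.hanning(64))
mask = pattern_from_reference(ref, n_samples=1024, rng=0)
print(int(mask.sum()))  # 1024
```

Because the density comes directly from existing reference data, no model parameters need tuning, which is the advantage the abstract emphasizes.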
Contrasting lexical similarity and formal definitions in SNOMED CT: consistency and implications.
Agrawal, Ankur; Elhanan, Gai
2014-02-01
To quantify the presence of and evaluate an approach for detection of inconsistencies in the formal definitions of SNOMED CT (SCT) concepts utilizing a lexical method. Utilizing SCT's Procedure hierarchy, we algorithmically formulated similarity sets: groups of concepts with similar lexical structure of their fully specified names. We formulated five random samples, each with 50 similarity sets, based on the same parameters: number of parents, attributes, groups, all the former, as well as a randomly selected control sample. All samples' sets were reviewed for types of formal definition inconsistencies: hierarchical, attribute assignment, attribute target values, groups, and definitional. For the Procedure hierarchy, 2111 similarity sets were formulated, covering 18.1% of eligible concepts. The evaluation revealed that 38% (Control) to 70% (Different relationships) of similarity sets within the samples exhibited significant inconsistencies. The rate of inconsistencies for the sample with different relationships was highly significant compared to Control, as was the number of attribute assignment and hierarchical inconsistencies within their respective samples. While the formal definitions of SCT are only a minor consideration at this stage of the HITECH initiative, they are essential to sophisticated, meaningful use of captured clinical data. However, a significant portion of the concepts in the most semantically complex hierarchy of SCT, the Procedure hierarchy, is modeled inconsistently in a manner that affects their computability. Lexical methods can efficiently identify such inconsistencies and possibly allow for their algorithmic resolution. Copyright © 2013 Elsevier Inc. All rights reserved.
Using Non-experimental Data to Estimate Treatment Effects
Stuart, Elizabeth A.; Marcus, Sue M.; Horvitz-Lennon, Marcela V.; Gibbons, Robert D.; Normand, Sharon-Lise T.
2009-01-01
While much psychiatric research is based on randomized controlled trials (RCTs), where patients are randomly assigned to treatments, sometimes RCTs are not feasible. This paper describes propensity score approaches, which are increasingly used for estimating treatment effects in non-experimental settings. The primary goal of propensity score methods is to create sets of treated and comparison subjects who look as similar as possible, in essence replicating a randomized experiment, at least with respect to observed patient characteristics. A study estimating the metabolic effects of antipsychotic medication in a sample of Florida Medicaid beneficiaries with schizophrenia illustrates these methods. PMID:20563313
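In outline, propensity score matching works as follows: estimate each subject's probability of treatment given observed characteristics, then pair treated and comparison subjects with similar scores. This is an illustrative sketch with synthetic data, not the study's implementation:

```python
import numpy as np

def propensity_scores(X, treated, lr=0.1, steps=2000):
    """Fit P(treated | X) with a simple logistic model by gradient
    ascent (sketch; in practice a statistics package would be used)."""
    X1 = np.c_[np.ones(len(X)), X]                 # add intercept column
    w = np.zeros(X1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X1 @ w))
        w += lr * X1.T @ (treated - p) / len(X)    # log-likelihood gradient
    return 1.0 / (1.0 + np.exp(-X1 @ w))

def nearest_neighbor_match(scores, treated):
    """Pair each treated unit with the control whose score is closest."""
    controls = np.flatnonzero(~treated)
    return {int(i): int(controls[np.argmin(np.abs(scores[controls] - scores[i]))])
            for i in np.flatnonzero(treated)}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                           # observed characteristics
treated = rng.random(200) < 1 / (1 + np.exp(-X[:, 0]))  # confounded treatment
pairs = nearest_neighbor_match(propensity_scores(X, treated), treated)
```

The matched pairs mimic a randomized experiment only with respect to the observed covariates, which is exactly the caveat the abstract notes.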
Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.
2014-01-01
Purpose: Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. Methods: Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15,000 articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results: Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836, respectively, for the naïve Bayes and SVM classifiers. We referenced a database of disease outbreak reports to confirm that the evaluation data set resulting from the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions: The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and articles in non-English languages. PMID:21134784
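The idea of training on labeled relevant articles versus randomly sampled unlabeled articles (standing in for the irrelevant class) can be sketched with a tiny hand-rolled naïve Bayes scorer; all documents below are invented for illustration:

```python
from collections import Counter
import math

def train_nb(relevant_docs, random_docs):
    """Word counts per class: labeled relevant articles vs. randomly
    sampled unlabeled articles used as the contrast class."""
    counts = {"rel": Counter(), "rand": Counter()}
    for d in relevant_docs:
        counts["rel"].update(d.split())
    for d in random_docs:
        counts["rand"].update(d.split())
    vocab = set(counts["rel"]) | set(counts["rand"])
    return counts, vocab

def score(doc, counts, vocab):
    """Log-odds of 'relevant' vs 'random' with add-one smoothing."""
    s = 0.0
    for w in doc.split():
        if w not in vocab:
            continue
        p_rel = (counts["rel"][w] + 1) / (sum(counts["rel"].values()) + len(vocab))
        p_rand = (counts["rand"][w] + 1) / (sum(counts["rand"].values()) + len(vocab))
        s += math.log(p_rel / p_rand)
    return s

counts, vocab = train_nb(
    ["outbreak of avian influenza reported", "cholera outbreak kills dozens"],
    ["stock markets rallied today", "local team wins championship game"])
print(score("new influenza outbreak", counts, vocab) > 0)  # True
```

Because random unlabeled articles are cheap to collect, only the relevant class needs manual labeling, which is the cost saving the study exploits.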
Robust reliable sampled-data control for switched systems with application to flight control
NASA Astrophysics Data System (ADS)
Sakthivel, R.; Joby, Maya; Shi, P.; Mathiyalagan, K.
2016-11-01
This paper addresses the robust reliable stabilisation problem for a class of uncertain switched systems with random delays and norm-bounded uncertainties. The main aim is to obtain a reliable robust sampled-data control design, involving random time delays and an appropriate control gain matrix, that achieves robust exponential stabilisation of the uncertain switched system against actuator failures. In particular, the involved delays are assumed to be randomly time-varying, obeying certain mutually uncorrelated Bernoulli-distributed white noise sequences. By constructing an appropriate Lyapunov-Krasovskii functional (LKF) and employing an average dwell-time approach, a new set of criteria is derived for ensuring the robust exponential stability of the closed-loop switched system. More precisely, the Schur complement and Jensen's integral inequality are used in the derivation of the stabilisation criteria. By considering the relationship among the random time-varying delay and its lower and upper bounds, a new set of sufficient conditions is established for the existence of a reliable robust sampled-data control in terms of solutions to linear matrix inequalities (LMIs). Finally, an illustrative example based on the F-18 aircraft model is provided to show the effectiveness of the proposed design procedures.
A Bayesian sequential design with adaptive randomization for 2-sided hypothesis test.
Yu, Qingzhao; Zhu, Lin; Zhu, Han
2017-11-01
Bayesian sequential and adaptive randomization designs are gaining popularity in clinical trials thanks to their potential to reduce the number of required participants and save resources. We propose a Bayesian sequential design with adaptive randomization rates so as to more efficiently attribute newly recruited patients to different treatment arms. In this paper, we consider 2-arm clinical trials. Patients are allocated to the 2 arms with a randomization rate chosen to achieve minimum variance for the test statistic. Algorithms are presented to calculate the optimal randomization rate, critical values, and power for the proposed design. Sensitivity analysis is implemented to check the influence of changing the prior distributions on the design. Simulation studies are applied to compare the proposed method and traditional methods in terms of power and actual sample sizes. Simulations show that, when the total sample size is fixed, the proposed design can obtain greater power and/or a smaller actual sample size than the traditional Bayesian sequential design. Finally, we apply the proposed method to a real data set and compare the results with the Bayesian sequential design without adaptive randomization in terms of sample sizes. The proposed method can further reduce the required sample size. Copyright © 2017 John Wiley & Sons, Ltd.
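The abstract does not spell out the optimal rate; for a difference-in-means test statistic, a standard benchmark is Neyman allocation, sketched below. The σ values and the two-arm variance formula are textbook assumptions for illustration, not the paper's algorithm:

```python
def neyman_allocation(sigma1, sigma2):
    """Randomization rate to arm 1 minimizing the variance of the
    difference-in-means statistic for a fixed total sample size."""
    return sigma1 / (sigma1 + sigma2)

def diff_variance(n, r, sigma1, sigma2):
    """Variance of (mean of arm 1 - mean of arm 2) with n*r patients in arm 1."""
    return sigma1 ** 2 / (n * r) + sigma2 ** 2 / (n * (1 - r))

# with unequal variances, the optimal rate beats 1:1 randomization
r = neyman_allocation(3.0, 1.0)   # 0.75: three quarters go to the noisier arm
print(diff_variance(100, r, 3.0, 1.0) < diff_variance(100, 0.5, 3.0, 1.0))  # True
```

Minimizing the test statistic's variance at each interim look is what lets such designs reach a decision with fewer patients.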
Sparse sampling and reconstruction for electron and scanning probe microscope imaging
Anderson, Hyrum; Helms, Jovana; Wheeler, Jason W.; Larson, Kurt W.; Rohrer, Brandon R.
2015-07-28
Systems and methods for conducting electron or scanning probe microscopy are provided herein. In a general embodiment, the systems and methods for conducting electron or scanning probe microscopy with an undersampled data set include: driving an electron beam or probe to scan across a sample and visit a subset of pixel locations of the sample that are randomly or pseudo-randomly designated; determining actual pixel locations on the sample that are visited by the electron beam or probe; and processing data collected by detectors from the visits of the electron beam or probe at the actual pixel locations and recovering a reconstructed image of the sample.
Weighting by Inverse Variance or by Sample Size in Random-Effects Meta-Analysis
ERIC Educational Resources Information Center
Marin-Martinez, Fulgencio; Sanchez-Meca, Julio
2010-01-01
Most of the statistical procedures in meta-analysis are based on the estimation of average effect sizes from a set of primary studies. The optimal weight for averaging a set of independent effect sizes is the inverse variance of each effect size, but in practice these weights have to be estimated, being affected by sampling error. When assuming a…
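The two weighting schemes contrasted in the title can be stated in a few lines (the effect sizes, variances, and sample sizes below are invented for illustration):

```python
def inverse_variance_mean(effects, variances):
    """Inverse-variance weighted average effect size: w_i = 1 / v_i."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def sample_size_mean(effects, sizes):
    """The alternative the article examines: weight each study by its
    sample size, w_i = n_i."""
    return sum(n * y for n, y in zip(sizes, effects)) / sum(sizes)

# three studies: the most precise study dominates the first estimator
effects, variances, sizes = [0.2, 0.5, 0.8], [0.01, 0.04, 0.16], [100, 50, 25]
print(round(inverse_variance_mean(effects, variances), 4))  # 0.2857
print(round(sample_size_mean(effects, sizes), 4))           # 0.3714
```

In practice the variances are themselves estimated with sampling error, which is precisely the complication the abstract raises for the inverse-variance weights.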
Wickenberg-Bolin, Ulrika; Göransson, Hanna; Fryknäs, Mårten; Gustafsson, Mats G; Isaksson, Anders
2006-03-13
Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust with good generalization performance on new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval of the error rate using repeated design and test sets selected from available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore, different methods for small-sample performance estimation, such as the recently proposed procedure called Repeated Random Sampling (RSS), are also expected to result in heavily biased estimates, which in turn translates into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). Our simulations reveal that repeated designs and tests based on resampling in a fixed bag of samples yield a biased variance estimate. We also demonstrate that it is possible to obtain an improved variance estimate by means of a procedure that explicitly models how this bias depends on the number of samples used for testing. For the special case of repeated designs and tests using new samples for each design and test, we present an exact analytical expression for how the expected value of the bias decreases with the size of the test set. We show that via modeling and subsequent reduction of the small-sample bias, it is possible to obtain an improved estimate of the variance of classifier performance between design sets. However, the uncertainty of the variance estimate is large in the simulations performed, indicating that the method in its present form cannot be directly applied to small data sets.
Effectiveness of Modular CBT for Child Anxiety in Elementary Schools
ERIC Educational Resources Information Center
Chiu, Angela W.; Langer, David A.; McLeod, Bryce D.; Har, Kim; Drahota, Amy; Galla, Brian M.; Jacobs, Jeffrey; Ifekwunigwe, Muriel; Wood, Jeffrey J.
2013-01-01
Most randomized controlled trials of cognitive-behavioral therapy (CBT) for children with anxiety disorders have evaluated treatment efficacy using recruited samples treated in research settings. Clinical trials in school settings are needed to determine if CBT can be effective when delivered in real world settings. This study evaluated a modular…
NASA Astrophysics Data System (ADS)
von Pezold, Johann; Dick, Alexey; Friák, Martin; Neugebauer, Jörg
2010-03-01
The performance of special-quasirandom structures (SQSs) for the description of elastic properties of random alloys was evaluated. A set of system-independent 32-atom-fcc SQS spanning the entire concentration range was generated and used to determine C11 , C12 , and C44 of binary random substitutional AlTi alloys. The elastic properties of these alloys could be described using the set of SQS with an accuracy comparable to the accuracy achievable by statistical sampling of the configurational space of 3×3×3 (108 atom, C44 ) and 4×4×4 (256 atom, C11 and C12 ) fcc supercells, irrespective of the impurity concentration. The smaller system size makes the proposed SQS ideal candidates for the ab initio determination of the elastic constants of random substitutional alloys. The set of optimized SQS is provided.
Susukida, Ryoko; Crum, Rosa M; Stuart, Elizabeth A; Ebnesajjad, Cyrus; Mojtabai, Ramin
2016-07-01
To compare the characteristics of individuals participating in randomized controlled trials (RCTs) of treatments of substance use disorder (SUD) with individuals receiving treatment in usual care settings, and to provide a summary quantitative measure of differences between characteristics of these two groups of individuals using propensity score methods. Design: Analyses using data from RCT samples from the National Institute of Drug Abuse Clinical Trials Network (CTN) and target populations of patients drawn from the Treatment Episodes Data Set-Admissions (TEDS-A). Settings: Multiple clinical trial sites and nation-wide usual SUD treatment settings in the United States. Participants: A total of 3592 individuals from 10 CTN samples and 1 602 226 individuals selected from TEDS-A between 2001 and 2009. Measurements: The propensity scores for enrolling in the RCTs were computed based on the following nine observable characteristics: sex, race/ethnicity, age, education, employment status, marital status, admission to treatment through criminal justice, intravenous drug use and the number of prior treatments. Findings: The proportion of those with ≥ 12 years of education and the proportion of those who had full-time jobs were significantly higher among RCT samples than among target populations (in seven and nine trials, respectively, at P < 0.001). The pooled difference in the mean propensity scores between the RCTs and the target population was 1.54 standard deviations and was statistically significant at P < 0.001. In the United States, individuals recruited into randomized controlled trials of substance use disorder treatments appear to be very different from individuals receiving treatment in usual care settings. Notably, RCT participants tend to have more years of education and a greater likelihood of full-time work compared with people receiving care in usual care settings. © 2016 Society for the Study of Addiction.
Effective Recruitment of Schools for Randomized Clinical Trials: Role of School Nurses.
Petosa, R L; Smith, L
2017-01-01
In school settings, nurses lead efforts to improve student health and well-being in support of academic success. Nurses are guided by evidence-based practice and data to inform care decisions. The randomized controlled trial (RCT) is considered the gold standard of scientific rigor for clinical trials. RCTs are critical to the development of evidence-based health promotion programs in schools. The purpose of this article is to present practical solutions for implementing principles of randomization in RCTs conducted in school settings. Randomization is a powerful sampling method used to build internal and external validity. The school's daily organization and educational mission present several barriers to randomization. Based on their experience in conducting school-based RCTs, the authors offer a host of practical solutions for working with schools to successfully implement randomization procedures. Nurses play a critical role in implementing RCTs in schools to promote rigorous science in support of evidence-based practice.
Rational Variability in Children's Causal Inferences: The Sampling Hypothesis
ERIC Educational Resources Information Center
Denison, Stephanie; Bonawitz, Elizabeth; Gopnik, Alison; Griffiths, Thomas L.
2013-01-01
We present a proposal--"The Sampling Hypothesis"--suggesting that the variability in young children's responses may be part of a rational strategy for inductive inference. In particular, we argue that young learners may be randomly sampling from the set of possible hypotheses that explain the observed data, producing different hypotheses with…
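A minimal simulation of the Sampling Hypothesis: each simulated child draws a single hypothesis from the posterior rather than reporting the single best one, so responses vary across children in proportion to posterior probability. The prior, likelihoods, and number of simulated children below are invented for the example:

```python
import numpy as np

def sample_hypotheses(prior, likelihood, n_children, rng=None):
    """Each simulated child draws ONE hypothesis from the posterior
    (posterior ∝ prior × likelihood), producing variable responses."""
    rng = np.random.default_rng(rng)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    return rng.choice(len(posterior), size=n_children, p=posterior), posterior

# two causal hypotheses explain the data, one twice as well as the other
draws, post = sample_hypotheses(np.array([0.5, 0.5]),   # flat prior
                                np.array([0.8, 0.4]),   # likelihoods
                                1000, rng=0)
print(post)  # posterior: [2/3, 1/3]
```

Aggregated over many children, the response frequencies track the posterior, which is the sense in which the variability is rational rather than noisy.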
The Recruitment and Retention of People with Disabilities. Report 301.
ERIC Educational Resources Information Center
Dench, S.; And Others
A British survey of employers examined the recruitment and retention of people with disabilities (PWDs). Telephone interviews were conducted with two samples of employers: a random sample of 1,250 and a sample of 250 registered users of the Employment Service's "Disability Symbol," which sets a good practice standard for the employment…
Richter, Randy R; Sebelski, Chris A; Austin, Tricia M
2016-09-01
The quality of abstract reporting in the physical therapy literature is unknown. The purpose of this study was to provide baseline data for judging the future impact of the 2010 Consolidated Standards of Reporting Trials statement, specifically referencing the 2008 Consolidated Standards of Reporting Trials statement for reporting of abstracts of randomized controlled trials, across and between a broad sample and a core sample of the physical therapy literature. A cross-sectional, bibliographic analysis was conducted. Abstracts of randomized controlled trials from 2009 were retrieved from PubMed, PEDro, and CENTRAL. Eligibility was determined using PEDro criteria. For outcome measures, items from the Consolidated Standards of Reporting Trials statement for abstract reporting were used for assessment. Raters were not blinded to citation details. Using a computer-generated set of random numbers, 150 abstracts from 112 journals comprised the broad sample. A total of 53 abstracts comprised the core sample. Fourteen of 20 Consolidated Standards of Reporting Trials items for both samples were reported in less than 50% of the abstracts. Significantly more abstracts in the core sample reported (% difference core - broad; 95% confidence interval) title (28.4%; 12.9%-41.2%), blinding (15.2%; 1.6%-29.8%), setting (47.6%; 32.4%-59.4%), and confidence intervals (13.1%; 5.0%-25.1%). These findings provide baseline data for determining whether continuing efforts to improve abstract reporting are heeded.
A study of active learning methods for named entity recognition in clinical text.
Chen, Yukun; Lasko, Thomas A; Mei, Qiaozhu; Denny, Joshua C; Xu, Hua
2015-12-01
Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build due to the requirement of domain experts in annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task to identify concepts of medical problems, treatments, and lab tests from the clinical notes. Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge that contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three different categories including uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with the passive learning that uses random sampling. Learning curves that plot performance of the NER model against the estimated annotation cost (based on number of sentences or words in the training set) were generated to evaluate different active learning and the passive learning methods and the area under the learning curve (ALC) score was computed. Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best method based on uncertainty sampling could save 66% annotations in sentences, as compared to random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. 
To achieve 0.80 in F-measure, in comparison to random sampling, the best uncertainty-based method saved 42% of annotations in words, whereas the best diversity-based method reduced annotation effort by only 7%. In the simulated setting, AL methods, particularly uncertainty-sampling based approaches, seemed to save annotation cost significantly for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting. Copyright © 2015 Elsevier Inc. All rights reserved.
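One common uncertainty criterion, least confidence, can be sketched as follows. This illustrates the general strategy rather than the study's exact algorithms; the class probabilities below are invented:

```python
import numpy as np

def least_confidence_batch(probs, k):
    """Pick the k unlabeled samples the model is least confident about:
    lowest maximum class probability (one common uncertainty criterion)."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence)[:k]   # ascending: least confident first

# model posteriors for 5 unlabeled sentences over 3 entity classes
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.60, 0.30, 0.10],
                  [0.34, 0.33, 0.33],
                  [0.80, 0.10, 0.10]])
print(least_confidence_batch(probs, 2))  # [3 1]
```

Sentences the model already classifies with high confidence are skipped, so each round of annotation is spent where it changes the model most, which is the source of the savings reported above.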
Adiposity and Quality of Life: A Case Study from an Urban Center in Nigeria
ERIC Educational Resources Information Center
Akinpelu, Aderonke O.; Akinola, Odunayo T.; Gbiri, Caleb A.
2009-01-01
Objective: To determine relationship between adiposity indices and quality of life (QOL) of residents of a housing estate in Lagos, Nigeria. Design: Cross-sectional survey employing multistep random sampling method. Setting: Urban residential estate. Participants: This study involved 900 randomly selected residents of Abesan Housing Estate, Lagos,…
A pilot cluster randomized controlled trial of structured goal-setting following stroke.
Taylor, William J; Brown, Melanie; Levack, William; McPherson, Kathryn M; Reed, Kirk; Dean, Sarah G; Weatherall, Mark
2012-04-01
To determine the feasibility, the cluster design effect, and the variance and minimal clinically important difference of the primary outcome in a pilot study of a structured approach to goal-setting. A cluster randomized controlled trial. Inpatient rehabilitation facilities. People who were admitted to inpatient rehabilitation following stroke who had sufficient cognition to engage in structured goal-setting and complete the primary outcome measure. Structured goal elicitation using the Canadian Occupational Performance Measure. Quality of life at 12 weeks using the Schedule for Individualised Quality of Life (SEIQOL-DW), Functional Independence Measure, Short Form 36, and Patient Perception of Rehabilitation (measuring satisfaction with rehabilitation). Assessors were blinded to the intervention. Four rehabilitation services and 41 patients were randomized. We found high values of the intraclass correlation for the outcome measures (ranging from 0.03 to 0.40) and high variance of the SEIQOL-DW (SD 19.6) in relation to the minimal clinically important difference of 2.1, leading to impractically large sample size requirements for a cluster randomized design. A cluster randomized design is not a practical means of avoiding contamination effects in studies of inpatient rehabilitation goal-setting. Other techniques for coping with contamination effects are necessary.
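The cluster design effect that drives those impractical sample sizes follows a standard formula. In the sketch below, the cluster size of 10 and the individually randomized requirement of 60 patients are hypothetical numbers for illustration; only the ICC of 0.40 comes from the abstract:

```python
def design_effect(cluster_size, icc):
    """Variance inflation from cluster randomization: DE = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(n_individual, cluster_size, icc):
    """Sample size needed once the design effect is applied."""
    return n_individual * design_effect(cluster_size, icc)

# with ~10 patients per cluster and the largest observed ICC of 0.40,
# a hypothetical individually randomized requirement of 60 patients becomes:
print(round(clustered_sample_size(60, 10, 0.40)))  # 276
```

With ICCs as high as 0.40, the design effect multiplies the required sample severalfold, which is why the authors conclude the cluster design is impractical here.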
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets (a subset of the TechTC-300 data sets) to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
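Leverage-score sampling can be illustrated as follows. This is a generic sketch under assumed definitions (rank-k leverage scores of the feature matrix), not the paper's exact construction or its risk bounds:

```python
import numpy as np

def rank_k_leverage(M, k):
    """Rank-k leverage scores of the rows of M: squared row norms of
    the top-k left singular vectors."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] ** 2).sum(axis=1)

def sample_features(A, k, rng=None):
    """Unsupervised feature selection: sample k columns of the n-by-d
    data matrix A with probability proportional to leverage score."""
    rng = np.random.default_rng(rng)
    p = rank_k_leverage(A.T, k)       # one score per feature (column)
    p = p / p.sum()
    return rng.choice(A.shape[1], size=k, replace=False, p=p)

A = np.random.default_rng(0).normal(size=(50, 20))
cols = sample_features(A, 5, rng=1)
print(len(set(cols)))  # 5 distinct features selected
```

Because the scores depend only on the singular vectors of the data matrix, no labels are needed, matching the "unsupervised" framing in the abstract.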
The Accuracy of Estimated Total Test Statistics. Final Report.
ERIC Educational Resources Information Center
Kleinke, David J.
In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…
NASA Astrophysics Data System (ADS)
Chin, Fun-Tat; Lin, Yu-Hsien; Yang, Wen-Luh; Liao, Chin-Hsuan; Lin, Li-Min; Hsiao, Yu-Ping; Chao, Tien-Sheng
2015-01-01
A limited copper (Cu)-source Cu:SiO2 switching layer composed of various Cu concentrations was fabricated using a chemical soaking (CS) technique. The switching layer was then studied for developing applications in resistive random access memory (ReRAM) devices. The resistive switching behavior observed in all the samples suggested that Cu conductive filaments form and rupture during the set/reset process. The experimental results indicated that the endurance failure that occurred was related to the Joule heating effect. Moreover, the endurance switching cycle count increased as the Cu concentration decreased. In high-temperature tests, the samples demonstrated that the operating (set/reset) voltages decreased as the temperature increased, and an Arrhenius plot was used to calculate the activation energy of the set/reset process. In addition, the samples demonstrated stable data retention properties when baked at 85 °C, but the samples with low Cu concentrations exhibited short retention times in the low-resistance state (LRS) during 125 °C tests. Therefore, the Cu concentration is a crucial factor in the trade-off between the endurance and retention properties; furthermore, the Cu concentration can be easily modulated using this CS technique.
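The Arrhenius analysis mentioned above extracts an activation energy from the temperature dependence of a thermally activated rate. A minimal sketch with synthetic data and an assumed prefactor (not the paper's measurements):

```python
import numpy as np

k_B = 8.617e-5  # Boltzmann constant, eV/K

def activation_energy(temps_K, rates):
    """Arrhenius analysis: the slope of ln(rate) vs 1/T equals -Ea/k_B,
    so the fitted slope yields the activation energy in eV."""
    slope, _ = np.polyfit(1.0 / np.asarray(temps_K), np.log(rates), 1)
    return -slope * k_B

# synthetic data generated with Ea = 0.30 eV is recovered by the fit
T = np.array([300.0, 325.0, 350.0, 375.0])
rates = 1e3 * np.exp(-0.30 / (k_B * T))
print(round(activation_energy(T, rates), 3))  # 0.3
```

The same fit applies whether the plotted quantity is a switching rate or an inverse set/reset voltage trend, as long as it is thermally activated.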
Alcohol risk management in college settings: the safer California universities randomized trial.
Saltz, Robert F; Paschall, Mallie J; McGaffigan, Richard P; Nygaard, Peter M O
2010-12-01
Potentially effective environmental strategies have been recommended to reduce heavy alcohol use among college students. However, studies to date on environmental prevention strategies are few in number and have been limited by their nonexperimental designs, inadequate sample sizes, and lack of attention to settings where the majority of heavy drinking events occur. To determine whether environmental prevention strategies targeting off-campus settings would reduce the likelihood and incidence of student intoxication at those settings. The Safer California Universities study involved 14 large public universities, half of which were assigned randomly to the Safer intervention condition after baseline data collection in 2003. Environmental interventions took place in 2005 and 2006 after 1 year of planning with seven Safer intervention universities. Random cross-sectional samples of undergraduates completed online surveys in four consecutive fall semesters (2003-2006). Campuses and communities surrounding eight campuses of the University of California and six in the California State University system were utilized. The study used random samples of undergraduates (∼500-1000 per campus per year) attending the 14 public California universities. Safer environmental interventions included nuisance party enforcement operations, minor decoy operations, driving-under-the-influence checkpoints, social host ordinances, and use of campus and local media to increase the visibility of environmental strategies. Proportion of drinking occasions in which students drank to intoxication at six different settings during the fall semester (residence hall party, campus event, fraternity or sorority party, party at off-campus apartment or house, bar/restaurant, outdoor setting), any intoxication at each setting during the semester, and whether students drank to intoxication the last time they went to each setting. 
Significant reductions in the incidence and likelihood of intoxication at off-campus parties and bars/restaurants were observed for Safer intervention universities compared to controls. A lower likelihood of intoxication was observed also for Safer intervention universities the last time students drank at an off-campus party (OR=0.81, 95% CI=0.68, 0.97); a bar or restaurant (OR=0.76, 95% CI=0.62, 0.94); or any setting (OR=0.80, 95% CI=0.65, 0.97). No increase in intoxication (e.g., displacement) appeared in other settings. Further, stronger intervention effects were achieved at Safer universities with the highest level of implementation. Environmental prevention strategies targeting settings where the majority of heavy drinking events occur appear to be effective in reducing the incidence and likelihood of intoxication among college students. Copyright © 2010 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
González-Recio, O; Jiménez-Montero, J A; Alenda, R
2013-01-01
In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. 
The results showed that the modification of the original boosting algorithm could be run in 1% of the time used by the original algorithm, with negligible differences in accuracy and bias. This modification may be used to speed up the computation of genome-assisted evaluations in large data sets such as those obtained from consortiums. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
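The sequential residual-fitting idea behind random boosting can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy dimensions, shrinkage factor, and marker-subset size are invented, and each weak learner is simply the best single-marker regressor within a random subset of columns.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_boosting(X, y, n_iter=200, shrinkage=0.1, n_sub=10):
    """L2-boosting where each weak learner is the best single
    predictor drawn from a random subset of marker columns."""
    n, p = X.shape
    pred = np.zeros(n)
    model = []                          # (column, scaled coefficient) pairs
    for _ in range(n_iter):
        r = y - pred                    # residuals of the current committee
        cols = rng.choice(p, size=min(n_sub, p), replace=False)
        best, best_sse, best_b = cols[0], np.inf, 0.0
        for j in cols:
            x = X[:, j]
            b = x @ r / (x @ x)         # least-squares fit to the residuals
            sse = np.sum((r - b * x) ** 2)
            if sse < best_sse:
                best, best_sse, best_b = j, sse, b
        pred += shrinkage * best_b * X[:, best]
        model.append((best, shrinkage * best_b))
    return model, pred

# toy data: 200 individuals, 100 markers, 3 of them causal
X = rng.standard_normal((200, 100))
y = 2 * X[:, 3] - 1.5 * X[:, 7] + 0.5 * X[:, 42] + 0.1 * rng.standard_normal(200)
model, pred = random_boosting(X, y)
print(np.corrcoef(y, pred)[0, 1])       # high for this toy signal
```

Scanning only n_sub of the p markers per iteration is what makes the modification cheap relative to the original algorithm, at the cost of occasionally missing the best marker in a given round.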
TemperSAT: A new efficient fair-sampling random k-SAT solver
NASA Astrophysics Data System (ADS)
Fang, Chao; Zhu, Zheng; Katzgraber, Helmut G.
The set membership problem is of great importance to many applications and, in particular, database searches for target groups. Recently, an approach to speed up set membership searches based on the NP-hard constraint-satisfaction problem (random k-SAT) has been developed. However, the bottleneck of the approach lies in finding the solution to a large SAT formula efficiently and, in particular, a large number of independent solutions is needed to reduce the probability of false positives. Unfortunately, traditional random k-SAT solvers such as WalkSAT are biased when seeking solutions to the Boolean formulas. By porting parallel tempering Monte Carlo to the sampling of binary optimization problems, we introduce a new algorithm (TemperSAT) whose performance is comparable to current state-of-the-art SAT solvers for large k with the added benefit that theoretically it can find many independent solutions quickly. We illustrate our results by comparing to the currently fastest implementation of WalkSAT, WalkSATlm.
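The replica-exchange idea behind TemperSAT can be illustrated on a toy planted 3-SAT instance. This sketch is not the TemperSAT implementation: the temperature ladder, instance size, and single-flip Metropolis moves are invented for illustration only.

```python
import math
import random

random.seed(42)
n_vars, n_clauses = 12, 30

# plant a satisfying assignment, then keep only random clauses it satisfies
planted = [random.random() < 0.5 for _ in range(n_vars)]
clauses = []
while len(clauses) < n_clauses:
    cl = [(v, random.random() < 0.5) for v in random.sample(range(n_vars), 3)]
    if any(planted[v] == want for v, want in cl):
        clauses.append(cl)

def n_unsat(a):
    """Energy = number of clauses the assignment leaves unsatisfied."""
    return sum(1 for cl in clauses if not any(a[v] == w for v, w in cl))

def sample_solutions(n_sweeps=3000, betas=(0.2, 0.5, 1.0, 2.0, 4.0)):
    """Parallel tempering over the SAT energy: one single-flip
    Metropolis chain per inverse temperature, with replica exchange
    between neighbours; distinct zero-energy states are collected."""
    chains = [[random.random() < 0.5 for _ in range(n_vars)] for _ in betas]
    energy = [n_unsat(c) for c in chains]
    found = set()
    for _ in range(n_sweeps):
        for k, beta in enumerate(betas):            # Metropolis updates
            i = random.randrange(n_vars)
            chains[k][i] = not chains[k][i]
            e = n_unsat(chains[k])
            if e <= energy[k] or random.random() < math.exp(-beta * (e - energy[k])):
                energy[k] = e
            else:
                chains[k][i] = not chains[k][i]     # reject the flip
        for k in range(len(betas) - 1):             # replica exchange
            arg = (betas[k] - betas[k + 1]) * (energy[k] - energy[k + 1])
            if arg >= 0 or random.random() < math.exp(arg):
                chains[k], chains[k + 1] = chains[k + 1], chains[k]
                energy[k], energy[k + 1] = energy[k + 1], energy[k]
        for k in range(len(betas)):
            if energy[k] == 0:
                found.add(tuple(chains[k]))
    return found

solutions = sample_solutions()
print(len(solutions))    # many distinct satisfying assignments
```

Collecting every distinct zero-energy state visited is what addresses the independent-solutions requirement; hot chains keep exploring while cold chains sit near the solution set.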
Stable and efficient retrospective 4D-MRI using non-uniformly distributed quasi-random numbers
NASA Astrophysics Data System (ADS)
Breuer, Kathrin; Meyer, Cord B.; Breuer, Felix A.; Richter, Anne; Exner, Florian; Weng, Andreas M.; Ströhle, Serge; Polat, Bülent; Jakob, Peter M.; Sauer, Otto A.; Flentje, Michael; Weick, Stefan
2018-04-01
The purpose of this work is the development of a robust and reliable three-dimensional (3D) Cartesian imaging technique for fast and flexible retrospective 4D abdominal MRI during free breathing. To this end, a non-uniform quasi-random (NU-QR) reordering of the phase encoding (ky–kz) lines was incorporated into 3D Cartesian acquisition. The proposed sampling scheme allocates more phase encoding points near the k-space origin while reducing the sampling density in the outer part of the k-space. Respiratory self-gating in combination with SPIRiT reconstruction is used for the reconstruction of abdominal data sets in different respiratory phases (4D-MRI). Six volunteers and three patients were examined at 1.5 T during free breathing. Additionally, data sets with conventional two-dimensional (2D) linear and 2D quasi-random phase encoding order were acquired for the volunteers for comparison. A quantitative evaluation of image quality versus scan times (from 70 s to 626 s) for the given sampling schemes was obtained by calculating the normalized mutual information (NMI) for all volunteers. Motion estimation was accomplished by calculating the maximum derivative of a signal intensity profile of a transition (e.g. tumor or diaphragm). The 2D non-uniform quasi-random distribution of phase encoding lines in Cartesian 3D MRI yields more efficient undersampling patterns for parallel imaging compared to conventional uniform quasi-random and linear sampling. Median NMI values of NU-QR sampling are the highest for all scan times. Therefore, within the same scan time 4D imaging could be performed with improved image quality. The proposed method allows for the reconstruction of motion-artifact-reduced 4D data sets with an isotropic spatial resolution of 2.1 × 2.1 × 2.1 mm³ in a short scan time, e.g. 10 respiratory phases in only 3 min. Cranio-caudal tumor displacements between 23 and 46 mm could be observed.
NU-QR sampling enables stable 4D-MRI with high temporal and spatial resolution within a short scan time for visualization of organ or tumor motion during free breathing. Further studies, e.g. on the application of the method to radiotherapy planning, are needed to investigate the clinical applicability and diagnostic value of the approach.
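The idea of a non-uniform quasi-random phase-encoding order can be sketched with a low-discrepancy Halton sequence warped toward the k-space centre. The Halton bases, the warp exponent gamma, and the grid size are illustrative assumptions, not the parameters of the published sequence.

```python
import numpy as np

def halton(i, base):
    """i-th element (1-indexed) of the van der Corput sequence in `base`."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def nu_qr_points(n, grid=64, gamma=2.0):
    """Quasi-random (ky, kz) phase-encoding points whose sampling
    density is highest at the k-space centre; gamma > 1 oversamples
    the centre and thins the periphery."""
    pts = []
    for i in range(1, n + 1):
        v1 = 2 * halton(i, 2) - 1                        # map to [-1, 1)
        v2 = 2 * halton(i, 3) - 1
        ky = np.sign(v1) * abs(v1) ** gamma * grid / 2   # warp toward centre
        kz = np.sign(v2) * abs(v2) ** gamma * grid / 2
        pts.append((int(round(ky)), int(round(kz))))
    return pts

pts = nu_qr_points(2000)
r = [max(abs(ky), abs(kz)) for ky, kz in pts]
# the central half of each axis covers a quarter of the area, but holds
# roughly half of the samples under this warp
print(sum(1 for x in r if x <= 16) / len(pts))
```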
Iterative random vs. Kennard-Stone sampling for IR spectrum-based classification task using PLS2-DA
NASA Astrophysics Data System (ADS)
Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz
2018-04-01
External testing (ET) is preferred over auto-prediction (AP) or k-fold cross-validation in estimating the more realistic predictive ability of a statistical model. With IR spectra, the Kennard-Stone (KS) sampling algorithm is often used to split the data into training and test sets, i.e. respectively for model construction and for model testing. On the other hand, iterative random sampling (IRS) has not been the favored choice, though it is theoretically more likely to produce reliable estimation. The aim of this preliminary work is to compare the performances of KS and IRS in sampling a representative training set from an attenuated total reflectance - Fourier transform infrared spectral dataset (of four varieties of blue gel pen inks) for PLS2-DA modeling. The 'best' performance achievable from the dataset is estimated with AP on the full dataset (APF, error). Both IRS (n = 200) and KS were used to split the dataset in the ratio of 7:3. The classic decision rule (i.e. maximum value-based) is employed for new sample prediction via partial least squares - discriminant analysis (PLS2-DA). The error rate of each model was estimated repeatedly via: (a) AP on the full data (APF, error); (b) AP on the training set (APS, error); and (c) ET on the respective test set (ETS, error). A good PLS2-DA model is expected to produce APS, error and ETS, error values that are similar to the APF, error. Bearing that in mind, the similarities between (a) APS, error vs. APF, error; (b) ETS, error vs. APF, error; and (c) APS, error vs. ETS, error were evaluated using correlation tests (i.e. Pearson and Spearman's rank tests), using series of PLS2-DA models computed from the KS-set and IRS-set, respectively. Overall, models constructed from the IRS-set exhibit more similarity between the internal and external error rates than the respective KS-set models, i.e. less risk of overfitting. In conclusion, IRS is more reliable than KS in sampling a representative training set.
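The Kennard-Stone split compared above follows a standard maximin rule: start from the two most distant samples, then repeatedly add the sample farthest from the already-selected set. A minimal sketch; the 40-sample, 6-score data set is invented.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Kennard-Stone selection: seed with the two most distant samples,
    then repeatedly add the sample whose minimum distance to the
    already-selected set is largest (maximin)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    picked = [i, j]
    while len(picked) < n_train:
        rest = [k for k in range(len(X)) if k not in picked]
        # each remaining sample's distance to its nearest picked sample
        dmin = d[np.ix_(rest, picked)].min(axis=1)
        picked.append(rest[int(np.argmax(dmin))])
    return picked

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 6))          # e.g. 40 spectra as 6 PCA scores
train = kennard_stone(X, 28)              # ~7:3 split as in the study
test = [k for k in range(40) if k not in train]
print(len(train), len(test))
```

Because the rule is deterministic and spreads training points over the data cloud, KS yields a single, extreme-covering split, whereas iterative random sampling averages over many random splits.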
A systematic examination of a random sampling strategy for source apportionment calculations.
Andersson, August
2011-12-15
Estimating the relative contributions from multiple potential sources of a specific component in a mixed environmental matrix is a general challenge in diverse fields such as atmospheric, environmental and earth sciences. Perhaps the most common strategy for tackling such problems is to set up a system of linear equations for the fractional influence of different sources. Even though an algebraic solution of this approach is possible for the common situation with N+1 sources and N source markers, such methodology introduces a bias, since it is implicitly assumed that the calculated fractions and the corresponding uncertainties are independent of the variability of the source distributions. Here, a random sampling (RS) strategy for accounting for such statistical bias is examined by investigating rationally designed synthetic data sets. This random sampling methodology is found to be robust and accurate with respect to reproducibility and predictability. The method is also compared to a numerical integration solution for a two-source situation where source variability is also included. A general observation from this examination is that the variability of the source profiles affects not only the calculated precision but also the mean/median source contributions. Copyright © 2011 Elsevier B.V. All rights reserved.
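For the simplest two-source, one-marker case, the RS strategy amounts to drawing source end-members from their distributions and solving the mixing equation for each draw. A hedged sketch: the Gaussian end-members and the numerical values are illustrative, not the paper's synthetic data sets.

```python
import numpy as np

rng = np.random.default_rng(2)

def rs_source_fractions(mix, mu1, sd1, mu2, sd2, n=100_000):
    """Random-sampling estimate of the fraction from source 1 for a
    two-source, one-marker mixing model: draw marker values for both
    sources, solve f*s1 + (1-f)*s2 = mix per draw, and discard draws
    with f outside [0, 1]."""
    s1 = rng.normal(mu1, sd1, n)
    s2 = rng.normal(mu2, sd2, n)
    f = (mix - s2) / (s1 - s2)
    f = f[(f >= 0) & (f <= 1)]
    return np.mean(f), np.percentile(f, [2.5, 97.5])

# e.g. a d13C-like marker: source 1 at -27, source 2 at -20, sample at -24
mean_f, ci = rs_source_fractions(-24.0, -27.0, 1.0, -20.0, 1.0)
print(round(mean_f, 2), ci)
```

Comparing mean_f with the naive algebraic value (mix-mu2)/(mu1-mu2) shows the statistical bias the paper examines: once source variability enters, the mean estimate shifts and the spread widens.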
Crampin, A C; Mwinuka, V; Malema, S S; Glynn, J R; Fine, P E
2001-01-01
Selection bias, particularly of controls, is common in case-control studies and may materially affect the results. Methods of control selection should be tailored both for the risk factors and disease under investigation and for the population being studied. We present here a control selection method devised for a case-control study of tuberculosis in rural Africa (Karonga, northern Malawi) that selects an age/sex frequency-matched random sample of the population, with a geographical distribution in proportion to the population density. We also present an audit of the selection process, and discuss the potential of this method in other settings.
Liu, Gui-Song; Guo, Hao-Song; Pan, Tao; Wang, Ji-Hua; Cao, Gan
2014-10-01
Based on Savitzky-Golay (SG) smoothing screening, principal component analysis (PCA) combined with separately supervised linear discriminant analysis (LDA) and unsupervised hierarchical clustering analysis (HCA) were used for non-destructive visible and near-infrared (Vis-NIR) detection for breed screening of transgenic sugarcane. A random and stability-dependent framework of calibration, prediction, and validation was proposed. A total of 456 samples of sugarcane leaves at the elongating stage were collected from the field, composed of 306 transgenic (positive) samples containing the Bt and Bar genes and 150 non-transgenic (negative) samples. A total of 156 samples (negative 50 and positive 106) were randomly selected as the validation set; the remaining samples (negative 100 and positive 200, a total of 300 samples) were used as the modeling set, and the modeling set was then subdivided into calibration (negative 50 and positive 100, a total of 150 samples) and prediction sets (negative 50 and positive 100, a total of 150 samples) 50 times. The number of SG smoothing points was expanded, while some higher-derivative modes were removed because of their small absolute values, and a total of 264 smoothing modes were used for screening. The pairwise combinations of the first three principal components were used, and the optimal combination of principal components was selected according to the model effect. Based on all divisions of the calibration and prediction sets and all SG smoothing modes, the SG-PCA-LDA and SG-PCA-HCA models were established, and the model parameters were optimized based on the average prediction effect over all divisions to ensure modeling stability. Finally, model validation was performed on the validation set. With SG smoothing, the modeling accuracy and stability of PCA-LDA and PCA-HCA were significantly improved.
For the optimal SG-PCA-LDA model, the recognition rates of positive and negative validation samples were 94.3% and 96.0%, respectively; for the optimal SG-PCA-HCA model, they were 92.5% and 98.0%. Vis-NIR spectroscopic pattern recognition combined with SG smoothing could be used for accurate recognition of transgenic sugarcane leaves, and provides a convenient screening method for transgenic sugarcane breeding.
CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haraldsdóttir, Hulda S.; Cousins, Ben; Thiele, Ines
2017-01-31
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. We apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks.
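The core coordinate hit-and-run step can be sketched on a toy polytope as follows; note this sketch omits the rounding preprocessing that gives CHRR its practical efficiency on anisotropic flux sets.

```python
import numpy as np

rng = np.random.default_rng(3)

def coordinate_hit_and_run(A, b, x0, n_steps=20_000):
    """Coordinate hit-and-run: from the current point, pick a random
    coordinate direction, compute the chord of the polytope
    {x : A x <= b} along it, and jump to a uniform point on the chord."""
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        i = rng.integers(len(x))
        a_i = A[:, i]
        slack = b - A @ x            # >= 0 while x is feasible
        lo, hi = -np.inf, np.inf
        for aj, s in zip(a_i, slack):
            if aj > 0:
                hi = min(hi, s / aj)
            elif aj < 0:
                lo = max(lo, s / aj)
        x[i] += rng.uniform(lo, hi)
        samples.append(x.copy())
    return np.array(samples)

# the unit square [0,1]^2 expressed as A x <= b
A = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], float)
b = np.array([1, 0, 1, 0], float)
S = coordinate_hit_and_run(A, b, [0.5, 0.5])
print(S.mean(axis=0))   # near [0.5, 0.5] for a uniform sample
```

On a strongly anisotropic (elongated) set the same walk mixes slowly, which is exactly what CHRR's rounding step is designed to fix.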
A posteriori noise estimation in variable data sets. With applications to spectra and light curves
NASA Astrophysics Data System (ADS)
Czesla, S.; Molle, T.; Schmitt, J. H. M. M.
2018-01-01
Most physical data sets contain a stochastic contribution produced by measurement noise or other random sources along with the signal. Usually, neither the signal nor the noise is accurately known prior to the measurement, so both have to be estimated a posteriori. We have studied a procedure to estimate the standard deviation of the stochastic contribution assuming normality and independence, requiring a sufficiently well-sampled data set to yield reliable results. This procedure is based on estimating the standard deviation in a sample of weighted sums of arbitrarily sampled data points and is identical to the so-called DER_SNR algorithm for specific parameter settings. To demonstrate the applicability of our procedure, we present applications to synthetic data, high-resolution spectra, and a large sample of space-based light curves and, finally, give guidelines for applying the procedure in situations not explicitly considered here to promote its adoption in data analysis.
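For the parameter settings that reduce the procedure to DER_SNR, the estimator fits in a few lines: the scaling constant 1.482602/sqrt(6) makes the median absolute weighted difference an unbiased estimator of sigma for Gaussian noise. The sinusoidal test signal and noise level below are illustrative.

```python
import numpy as np

def der_snr_noise(flux):
    """DER_SNR-style noise estimate: median absolute value of the
    weighted sum 2*f[i] - f[i-2] - f[i+2], scaled to estimate sigma
    for Gaussian noise (the sum has variance 6*sigma^2, and the
    median of |N(0, s)| is s/1.482602)."""
    f = np.asarray(flux, float)
    diff = np.abs(2.0 * f[2:-2] - f[:-4] - f[4:])
    return 1.482602 / np.sqrt(6.0) * np.median(diff)

rng = np.random.default_rng(4)
x = np.linspace(0, 4 * np.pi, 5000)
signal = np.sin(x)                       # smooth, well-sampled signal
noisy = signal + rng.normal(0.0, 0.1, x.size)
print(der_snr_noise(noisy))              # close to 0.1 despite the signal
```

The second-difference weighting cancels smooth signal components, which is why the estimate recovers the noise level even though the signal dominates the data.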
RandomSpot: A web-based tool for systematic random sampling of virtual slides.
Wright, Alexander I; Grabsch, Heike I; Treanor, Darren E
2015-01-01
This paper describes work presented at the Nordic Symposium on Digital Pathology 2014, Linköping, Sweden. Systematic random sampling (SRS) is a stereological tool, which provides a framework to quickly build an accurate estimation of the distribution of objects or classes within an image, whilst minimizing the number of observations required. RandomSpot is a web-based tool for SRS in stereology, which systematically places equidistant points within a given region of interest on a virtual slide. Each point can then be visually inspected by a pathologist in order to generate an unbiased sample of the distribution of classes within the tissue. Further measurements can then be derived from the distribution, such as the ratio of tumor to stroma. RandomSpot replicates the fundamental principle of traditional light microscope grid-shaped graticules, with the added benefits associated with virtual slides, such as facilitated collaboration and automated navigation between points. Once the sample points have been added to the region(s) of interest, users can download the annotations and view them locally using their virtual slide viewing software. Since its introduction, RandomSpot has been used extensively for international collaborative projects, clinical trials and independent research projects. So far, the system has been used to generate over 21,000 sample sets, and has been used to generate data for use in multiple publications, identifying significant new prognostic markers in colorectal, upper gastro-intestinal and breast cancer. Data generated using RandomSpot also has significant value for training image analysis algorithms using sample point coordinates and pathologist classifications.
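The point-placement rule behind systematic random sampling is an equidistant grid with a single uniformly random offset. A minimal sketch; the region dimensions and spacing are invented, not RandomSpot's defaults.

```python
import numpy as np

rng = np.random.default_rng(5)

def srs_points(width, height, step):
    """Systematic random sampling: an equidistant grid of points with
    one uniformly random offset for the whole grid, as placed over a
    region of interest on a virtual slide."""
    ox = rng.uniform(0, step)      # single random offset keeps the
    oy = rng.uniform(0, step)      # sample unbiased yet systematic
    xs = np.arange(ox, width, step)
    ys = np.arange(oy, height, step)
    return [(x, y) for y in ys for x in xs]

pts = srs_points(1000, 800, 100)
print(len(pts))                    # 10 x 8 points for these dimensions
```

Each grid point would then be classified by a pathologist; class proportions over the points estimate, e.g., the tumor-to-stroma ratio.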
Training Objectives, Transfer, Validation and Evaluation: A Sri Lankan Study
ERIC Educational Resources Information Center
Wickramasinghe, Vathsala M.
2006-01-01
Using a stratified random sample, this paper examines the training practices of setting objectives, transfer, validation and evaluation in Sri Lanka. The paper further sets out to compare those practices across local, foreign and joint-venture companies based on the assumption that there may be significant differences across companies of different…
Congruence of Standard Setting Methods for a Nursing Certification Examination.
ERIC Educational Resources Information Center
Fabrey, Lawrence J.; Raymond, Mark R.
The American Nurses' Association certification provides professional recognition beyond licensure to nurses who pass an examination. To determine the passing score as it would be set by a representative peer group, a survey was mailed to a random sample of 200 recently certified nurses. Three questions were asked: (1) what percentage of examinees…
Kahan, Brennan C
2016-12-13
Patient recruitment in clinical trials is often challenging, and as a result, many trials are stopped early due to insufficient recruitment. The re-randomization design allows patients to be re-enrolled and re-randomized for each new treatment episode that they experience. Because it allows multiple enrollments for each patient, this design has been proposed as a way to increase the recruitment rate in clinical trials. However, it is unknown to what extent recruitment could be increased in practice. We modelled the expected recruitment rate for parallel-group and re-randomization trials in different settings based on estimates from real trials and datasets. We considered three clinical areas: in vitro fertilization, severe asthma exacerbations, and acute sickle cell pain crises. We compared the two designs in terms of the expected time to complete recruitment, and the sample size recruited over a fixed recruitment period. Across the different scenarios we considered, we estimated that re-randomization could reduce the expected time to complete recruitment by between 4 and 22 months (relative reductions of 19% and 45%), or increase the sample size recruited over a fixed recruitment period by between 29% and 171%. Re-randomization can increase recruitment most for trials with a short follow-up period, a long trial recruitment duration, and patients with high rates of treatment episodes. Re-randomization has the potential to increase the recruitment rate in certain settings, and could lead to quicker and more efficient trials in these scenarios.
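A deterministic back-of-envelope model illustrates why re-randomization shortens recruitment; the target size, accrual rate, and episode rate below are invented, not the estimates from the cited trials and datasets.

```python
def months_to_recruit(target, new_patients_per_month, episode_rate=0.0):
    """Months needed to reach `target` enrolments when each previously
    enrolled patient can be re-randomized at `episode_rate` new
    treatment episodes per month (0 = standard parallel-group design).
    A deterministic sketch of the expected-recruitment comparison."""
    enrolled, pool, month = 0.0, 0.0, 0
    while enrolled < target:
        month += 1
        new = new_patients_per_month + pool * episode_rate
        enrolled += new
        pool += new_patients_per_month   # distinct patients eligible for re-enrolment
    return month

print(months_to_recruit(600, 20))            # parallel-group design
print(months_to_recruit(600, 20, 0.05))      # re-randomization design
```

The gap widens with longer recruitment periods and higher episode rates, matching the settings the abstract identifies as benefiting most.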
Sparsely sampling the sky: Regular vs. random sampling
NASA Astrophysics Data System (ADS)
Paykari, P.; Pires, S.; Starck, J.-L.; Jaffe, A. H.
2015-09-01
Aims: The next generation of galaxy surveys, aiming to observe millions of galaxies, are expensive both in time and money. This raises questions regarding the optimal investment of this time and money for future surveys. In a previous work, we have shown that a sparse sampling strategy could be a powerful substitute for the - usually favoured - contiguous observation of the sky. In our previous paper, regular sparse sampling was investigated, where the sparse observed patches were regularly distributed on the sky. The regularity of the mask introduces a periodic pattern in the window function, which induces periodic correlations at specific scales. Methods: In this paper, we use a Bayesian experimental design to investigate a "random" sparse sampling approach, where the observed patches are randomly distributed over the total sparsely sampled area. Results: We find that in this setting, the induced correlation is evenly distributed amongst all scales as there is no preferred scale in the window function. Conclusions: This is desirable when we are interested in any specific scale in the galaxy power spectrum, such as the matter-radiation equality scale. As the figure of merit shows, however, there is no preference between regular and random sampling to constrain the overall galaxy power spectrum and the cosmological parameters.
Heo, Moonseong; Litwin, Alain H; Blackstock, Oni; Kim, Namhee; Arnsten, Julia H
2017-02-01
We derived sample size formulae for detecting main effects in group-based randomized clinical trials with different levels of data hierarchy between experimental and control arms. Such designs are necessary when experimental interventions need to be administered to groups of subjects whereas control conditions need to be administered to individual subjects. This type of trial, often referred to as a partially nested or partially clustered design, has been implemented for management of chronic diseases such as diabetes and is beginning to emerge more commonly in wider clinical settings. Depending on the research setting, the level of hierarchy of data structure for the experimental arm can be three or two, whereas that for the control arm is two or one. Such different levels of data hierarchy assume correlation structures of outcomes that are different between arms, regardless of whether research settings require two or three level data structure for the experimental arm. Therefore, the different correlations should be taken into account for statistical modeling and for sample size determinations. To this end, we considered mixed-effects linear models with different correlation structures between experimental and control arms to theoretically derive and empirically validate the sample size formulae with simulation studies.
Generation of kth-order random toposequences
NASA Astrophysics Data System (ADS)
Odgers, Nathan P.; McBratney, Alex B.; Minasny, Budiman
2008-05-01
The model presented in this paper derives toposequences from a digital elevation model (DEM). It is written in ArcInfo Macro Language (AML). The toposequences are called kth-order random toposequences, because they take a random path uphill to the top of a hill and downhill to a stream or valley bottom from a randomly selected seed point, and they are located in a streamshed of order k according to a particular stream-ordering system. We define a kth-order streamshed as the area of land that drains directly to a stream segment of stream order k. The model attempts to optimise the spatial configuration of a set of derived toposequences iteratively by using simulated annealing to maximise the total sum of distances between each toposequence hilltop in the set. The user is able to select the order, k, of the derived toposequences. Toposequences are useful for determining soil sampling locations for use in collecting soil data for digital soil mapping applications. Sampling locations can be allocated according to equal elevation or equal-distance intervals along the length of the toposequence, for example. We demonstrate the use of this model for a study area in the Hunter Valley of New South Wales, Australia. Of the 64 toposequences derived, 32 were first-order random toposequences according to Strahler's stream-ordering system, and 32 were second-order random toposequences. The model that we present in this paper is an efficient method for sampling soil along soil toposequences. The soils along a toposequence are related to each other by the topography they are found in, so soil data collected by this method is useful for establishing soil-landscape rules for the preparation of digital soil maps.
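The uphill/downhill construction of a random toposequence can be sketched on a toy DEM. The streamshed-ordering constraint and the simulated-annealing optimisation of the full model are omitted here, and the single-hill surface is invented.

```python
import numpy as np

rng = np.random.default_rng(6)

def random_toposequence(dem, seed_cell):
    """From a seed cell, walk uphill choosing uniformly among strictly
    higher 8-neighbours until a local top, and likewise downhill until
    a local bottom; the concatenated path is one random toposequence."""
    def walk(start, sign):
        path = [start]
        while True:
            r, c = path[-1]
            nbrs = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)
                    and 0 <= r + dr < dem.shape[0] and 0 <= c + dc < dem.shape[1]]
            better = [n for n in nbrs if sign * (dem[n] - dem[r, c]) > 0]
            if not better:
                return path
            path.append(better[rng.integers(len(better))])
    up = walk(seed_cell, +1)       # seed -> hilltop
    down = walk(seed_cell, -1)     # seed -> bottom
    return list(reversed(up))[:-1] + down   # hilltop ... seed ... bottom

# a toy DEM: one smooth hill centred at (20, 20)
y, x = np.mgrid[0:30, 0:30]
dem = 100 - ((x - 20.0) ** 2 + (y - 20.0) ** 2) / 10
seq = random_toposequence(dem, (10, 10))
heights = [dem[c] for c in seq]
print(len(seq), heights[0] > heights[-1])
```

Sampling locations could then be allocated at equal-distance or equal-elevation intervals along `seq`, as the abstract describes.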
SNP selection and classification of genome-wide SNP data using stratified sampling random forests.
Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K
2012-09-01
For high-dimensional genome-wide association (GWA) case-control data of complex disease, there is usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant to the disease. A simple random sampling method in random forest, using the default mtry parameter to choose the feature subspace, will select too many subspaces without informative SNPs. An exhaustive search for an optimal mtry is often required in order to include useful and relevant SNPs and discard the vast number of non-informative SNPs. However, this is too time-consuming and not practical in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure is that it ensures each subspace contains enough useful SNPs, while avoiding the very high computational cost of an exhaustive search for an optimal mtry and maintaining the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprised of 408 803 SNPs and Alzheimer case-control data comprised of 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and it can generate a better random forest with higher accuracy and lower error bound than those by Breiman's random forest generation method. For Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders and merit further biological investigation.
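The stratified subspace-selection step can be sketched as follows. The informativeness scores, group count, and per-group draw are illustrative stand-ins for the paper's equal-width discretization.

```python
import numpy as np

rng = np.random.default_rng(7)

def stratified_subspace(informativeness, n_groups=5, per_group=2):
    """Stratified feature-subspace selection: bin SNPs into equal-width
    groups by an informativeness score, then draw the same number of
    SNPs at random from every group, so each subspace is guaranteed
    to contain some informative features."""
    lo, hi = informativeness.min(), informativeness.max()
    edges = np.linspace(lo, hi, n_groups + 1)
    groups = np.digitize(informativeness, edges[1:-1])   # group 0..n_groups-1
    subspace = []
    for g in range(n_groups):
        members = np.flatnonzero(groups == g)
        subspace.extend(rng.choice(members, size=min(per_group, len(members)),
                                   replace=False))
    return np.array(subspace)

info = rng.random(1000)               # e.g. a test statistic per SNP
sub = stratified_subspace(info)
print(len(sub), len(np.unique(sub)))
```

One such subspace would seed one decision tree; repeating the draw per tree preserves the randomness of the forest while bounding the subspace quality.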
Characterization of Friction Joints Subjected to High Levels of Random Vibration
NASA Technical Reports Server (NTRS)
deSantos, Omar; MacNeal, Paul
2012-01-01
This paper describes the test program in detail including test sample description, test procedures, and vibration test results of multiple test samples. The material pairs used in the experiment were Aluminum-Aluminum, Aluminum- Dicronite coated Aluminum, and Aluminum-Plasmadize coated Aluminum. Levels of vibration for each set of twelve samples of each material pairing were gradually increased until all samples experienced substantial displacement. Data was collected on 1) acceleration in all three axes, 2) relative static displacement between vibration runs utilizing photogrammetry techniques, and 3) surface galling and contaminant generation. This data was used to estimate the values of static friction during random vibratory motion when "stick-slip" occurs and compare these to static friction coefficients measured before and after vibration testing.
Yura, Harold T; Hanson, Steen G
2012-04-01
Methods for simulation of two-dimensional signals with arbitrary power spectral densities and signal amplitude probability density functions are disclosed. The method relies on initially transforming a white noise sample set of random Gaussian distributed numbers into a corresponding set with the desired spectral distribution, after which this colored Gaussian probability distribution is transformed via an inverse transform into the desired probability distribution. In most cases the method provides satisfactory results and can thus be considered an engineering approach. Several illustrative examples with relevance for optics are given.
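The two-step transform method described above can be sketched as: colour white Gaussian noise in the Fourier domain, then map the colored Gaussian values through their Gaussian CDF and the target inverse CDF. The low-pass filter and the exponential target distribution below are illustrative choices, not the paper's examples.

```python
from math import erf

import numpy as np

rng = np.random.default_rng(8)

def colored_nongaussian(n, psd_filter, target_icdf):
    """(1) shape white Gaussian noise in the Fourier domain to the
    desired spectral distribution; (2) push each colored Gaussian
    value through its Gaussian CDF and then the target inverse CDF."""
    white = rng.standard_normal((n, n))
    colored = np.real(np.fft.ifft2(np.fft.fft2(white) * psd_filter))
    colored = (colored - colored.mean()) / colored.std()
    u = 0.5 * (1.0 + np.vectorize(erf)(colored / np.sqrt(2.0)))  # Gaussian CDF
    return target_icdf(u)

n = 64
fx = np.fft.fftfreq(n)[:, None]
fy = np.fft.fftfreq(n)[None, :]
lowpass = 1.0 / (1.0 + (fx ** 2 + fy ** 2) / 0.01)        # smooth low-pass PSD shape
exp_icdf = lambda u: -np.log(1.0 - np.clip(u, 0.0, 1.0 - 1e-12))  # exponential amplitudes
field = colored_nongaussian(n, lowpass, exp_icdf)
print(field.shape, float(field.min()) >= 0.0)
```

The final nonlinear map distorts the spectrum somewhat, which is consistent with the abstract's framing of the method as an engineering approach that is satisfactory in most cases.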
Whitebird, Robin R; Bliss, Donna Zimmaro; Savik, Kay; Lowry, Ann; Jung, Hans-Joachim G
2010-12-01
Recruitment of participants to clinical trials remains a significant challenge, especially for research addressing topics of a sensitive nature such as fecal incontinence (FI). In the Fiber Study, a randomized controlled trial on symptom management for FI, we successfully enrolled 189 community-living adults through collaborations with specialty-based and community-based settings, each employing methods tailored to the organizational characteristics of their site. Results show that using the two settings increased racial and ethnic diversity of the sample and inclusion of informal caregivers. There were no differential effects on enrollment, final eligibility, or completion of protocol by site. Strategic collaborations with complementary sites can achieve sample recruitment goals for clinical trials on topics that are sensitive or known to be underreported. Copyright © 2010 Wiley Periodicals, Inc.
ERIC Educational Resources Information Center
Hanson, Alan L.
1989-01-01
A random sample of 661 U.S. and Canadian pharmacists (38 percent response) identified characteristics of pharmacy continuing education (CE) program clientele that might assist in marketing these programs. Attitude toward CE was related to sex, age, practice setting, and source of CE. Practice setting was of most value in targeting a potential…
Using Empirical Data to Set Cutoff Scores.
ERIC Educational Resources Information Center
Hills, John R.
Six experimental approaches to the problems of setting cutoff scores and choosing proper test length are briefly mentioned. Most of these methods share the premise that a test is a random sample of items, from a domain associated with a carefully specified objective. Each item is independent and is scored zero or one, with no provision for…
NASA Astrophysics Data System (ADS)
Lazarina, Maria; Kallimanis, Athanasios S.; Pantis, John D.; Sgardelis, Stefanos P.
2014-11-01
The species-area relationship (SAR) is one of the few generalizations in ecology. However, many different relationships are denoted as SARs. Here, we empirically evaluated the differences between SARs derived from nested-contiguous and non-contiguous sampling designs, using plant, bird and butterfly datasets from Great Britain, Greece, Massachusetts, New York and San Diego. The shape of the SAR depends on the sampling scheme, but there is little empirical documentation on the magnitude of the deviation between different types of SARs and the factors affecting it. We implemented a strictly nested sampling design to construct the nested-contiguous SAR (SACR), and systematic nested but non-contiguous, and random designs to construct non-contiguous species richness curves (SASRs for systematic and SACs for random designs) per dataset. The SACR lay below any SASR and most of the SACs. The deviation between them was related to the exponent f of the power law relationship between sampled area and extent. The lower the exponent f, the higher was the deviation between the curves. We linked SACR to SASR and SAC through the concept of "effective" area (Ae), i.e. the nested-contiguous area containing an equal number of species to the accumulated sampled area (AS) of a non-contiguous sampling. The relationship between effective and sampled area was modeled as log(Ae) = k·log(AS). A Generalized Linear Model was used to estimate the values of k from sampling design and dataset properties. The parameter k increased with the average distance between samples and with beta diversity, while k decreased with f. For both systematic and random sampling, the model performed well in predicting effective area in both the training set and a test set that was totally independent of the training one. Through effective area, we can link different types of species richness curves based on sampling design properties, sampling effort, spatial scale and beta diversity patterns.
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because growing a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
Species undersampling in tropical bat surveys: effects on emerging biodiversity patterns.
Meyer, Christoph F J; Aguiar, Ludmilla M S; Aguirre, Luis F; Baumgarten, Julio; Clarke, Frank M; Cosson, Jean-François; Estrada Villegas, Sergio; Fahr, Jakob; Faria, Deborah; Furey, Neil; Henry, Mickaël; Jenkins, Richard K B; Kunz, Thomas H; Cristina MacSwiney González, M; Moya, Isabel; Pons, Jean-Marc; Racey, Paul A; Rex, Katja; Sampaio, Erica M; Stoner, Kathryn E; Voigt, Christian C; von Staden, Dietrich; Weise, Christa D; Kalko, Elisabeth K V
2015-01-01
Undersampling is commonplace in biodiversity surveys of species-rich tropical assemblages in which rare taxa abound, with possible repercussions for our ability to implement surveys and monitoring programmes in a cost-effective way. We investigated the consequences of information loss due to species undersampling (missing subsets of species from the full species pool) in tropical bat surveys for the emerging patterns of species richness (SR) and compositional variation across sites. For 27 bat assemblage data sets from across the tropics, we used correlations between original data sets and subsets with different numbers of species deleted either at random, or according to their rarity in the assemblage, to assess to what extent patterns in SR and composition in data subsets are congruent with those in the initial data set. We then examined to what degree high sample representativeness (r ≥ 0·8) was influenced by biogeographic region, sampling method, sampling effort or structural assemblage characteristics. For SR, correlations between random subsets and original data sets were strong (r ≥ 0·8) with moderate (ca. 20%) species loss. Bias associated with information loss was greater for species composition; on average ca. 90% of species in random subsets had to be retained to adequately capture among-site variation. For nonrandom subsets, removing only the rarest species (on average c. 10% of the full data set) yielded strong correlations (r > 0·95) for both SR and composition. Eliminating greater proportions of rare species resulted in weaker correlations and large variation in the magnitude of observed correlations among data sets. Species subsets that comprised ca. 85% of the original set can be considered reliable surrogates, capable of adequately revealing patterns of SR and temporal or spatial turnover in many tropical bat assemblages. 
Our analyses thus demonstrate the potential as well as limitations for reducing survey effort and streamlining sampling protocols, and consequently for increasing the cost-effectiveness in tropical bat surveys or monitoring programmes. The dependence of the performance of species subsets on structural assemblage characteristics (total assemblage abundance, proportion of rare species), however, underscores the importance of adaptive monitoring schemes and of establishing surrogate performance on a site by site basis based on pilot surveys. © 2014 The Authors. Journal of Animal Ecology © 2014 British Ecological Society.
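The subsetting exercise above can be mimicked on synthetic data: build a site-by-species abundance matrix, delete the rarest species, and correlate subset species richness with the original richness. All numbers here are invented; real bat assemblages would differ:

```python
import numpy as np

# Hypothetical re-creation of the rarity-based deletion test: compare
# site-level species richness before and after removing the rarest species.
rng = np.random.default_rng(9)

n_sites, n_species = 25, 60
# Log-normal mean abundances: many rare species, a few common ones.
mean_abund = rng.lognormal(0.0, 1.5, n_species)
counts = rng.poisson(mean_abund, size=(n_sites, n_species))

sr_full = (counts > 0).sum(axis=1)                 # species richness per site

# Delete the rarest 10% of species (lowest total abundance).
order = counts.sum(axis=0).argsort()
keep = order[int(0.1 * n_species):]
sr_sub = (counts[:, keep] > 0).sum(axis=1)

r = np.corrcoef(sr_full, sr_sub)[0, 1]
print(round(r, 2))
```

Because the rarest species contribute few occurrences, richness in the subset tracks the full data closely, consistent with the strong correlations the study reports for rarity-based deletion.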
Estimation of reference intervals from small samples: an example using canine plasma creatinine.
Geffré, A; Braun, J P; Trumel, C; Concordet, D
2009-12-01
According to international recommendations, reference intervals should be determined from at least 120 reference individuals, which often are impossible to achieve in veterinary clinical pathology, especially for wild animals. When only a small number of reference subjects is available, the possible bias cannot be known and the normality of the distribution cannot be evaluated. A comparison of reference intervals estimated by different methods could be helpful. The purpose of this study was to compare reference limits determined from a large set of canine plasma creatinine reference values, and large subsets of this data, with estimates obtained from small samples selected randomly. Twenty sets each of 120 and 27 samples were randomly selected from a set of 1439 plasma creatinine results obtained from healthy dogs in another study. Reference intervals for the whole sample and for the large samples were determined by a nonparametric method. The estimated reference limits for the small samples were minimum and maximum, mean +/- 2 SD of native and Box-Cox-transformed values, 2.5th and 97.5th percentiles by a robust method on native and Box-Cox-transformed values, and estimates from diagrams of cumulative distribution functions. The whole sample had a heavily skewed distribution, which approached Gaussian after Box-Cox transformation. The reference limits estimated from small samples were highly variable. The closest estimates to the 1439-result reference interval for 27-result subsamples were obtained by both parametric and robust methods after Box-Cox transformation but were grossly erroneous in some cases. For small samples, it is recommended that all values be reported graphically in a dot plot or histogram and that estimates of the reference limits be compared using different methods.
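The contrast the study draws between estimation methods on small, skewed samples can be sketched with synthetic data (a log-normal stand-in, not the canine creatinine values, and a log transform as a simple stand-in for Box-Cox):

```python
import numpy as np

# Reference-interval estimates from a small sample (n = 27) of a skewed variable.
rng = np.random.default_rng(2)
sample = rng.lognormal(mean=4.4, sigma=0.25, size=27)   # hypothetical analyte

# Parametric on native values: mean +/- 2 SD (poor for skewed data).
lo_native = sample.mean() - 2 * sample.std(ddof=1)
hi_native = sample.mean() + 2 * sample.std(ddof=1)

# Parametric after a log transform, back-transformed to the native scale.
logs = np.log(sample)
lo_log = np.exp(logs.mean() - 2 * logs.std(ddof=1))
hi_log = np.exp(logs.mean() + 2 * logs.std(ddof=1))

print((round(lo_native, 1), round(hi_native, 1)),
      (round(lo_log, 1), round(hi_log, 1)))
```

Re-running with different seeds shows the high sampling variability of small-sample limits that the study emphasizes, which is why it recommends plotting the raw values and comparing several estimators.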
Ogawa, Tatsuya; Omon, Kyohei; Yuda, Tomohisa; Ishigaki, Tomoya; Imai, Ryota; Ohmatsu, Satoko; Morioka, Shu
2016-01-01
Objective: To investigate the short-term effects of the life goal concept on subjective well-being and treatment engagement, and to determine the sample size required for a larger trial. Design: A quasi-randomized controlled trial that was not blinded. Setting: A subacute rehabilitation ward. Subjects: A total of 66 patients were randomized to a goal-setting intervention group with the life goal concept (Life Goal), a standard rehabilitation group with no goal-setting intervention (Control 1), or a goal-setting intervention group without the life goal concept (Control 2). Interventions: The goal-setting intervention in the Life Goal and Control 2 was Goal Attainment Scaling. The Life Goal patients were assessed in terms of their life goals, and the hierarchy of goals was explained. The intervention duration was four weeks. Main measures: Patients were assessed pre- and post-intervention. The outcome measures were the Hospital Anxiety and Depression Scale, 12-item General Health Questionnaire, Pittsburgh Rehabilitation Participation Scale, and Functional Independence Measure. Results: Of the 296 potential participants, 66 were enrolled; Life Goal (n = 22), Control 1 (n = 22) and Control 2 (n = 22). Anxiety was significantly lower in the Life Goal (4.1 ±3.0) than in Control 1 (6.7 ±3.4), but treatment engagement was significantly higher in the Life Goal (5.3 ±0.4) compared with both the Control 1 (4.8 ±0.6) and Control 2 (4.9 ±0.5). Conclusions: The life goal concept had a short-term effect on treatment engagement. A sample of 31 patients per group would be required for a fully powered clinical trial. PMID:27496700
Model's sparse representation based on reduced mixed GMsFE basis methods
NASA Astrophysics Data System (ADS)
Jiang, Lijian; Li, Qiuqi
2017-06-01
In this paper, we propose a model's sparse representation based on reduced mixed generalized multiscale finite element (GMsFE) basis methods for elliptic PDEs with random inputs. A typical application for the elliptic PDEs is the flow in heterogeneous random porous media. The mixed generalized multiscale finite element method (GMsFEM) is one of the accurate and efficient approaches to solving the flow problem on a coarse grid while obtaining a velocity with local mass conservation. When the inputs of the PDEs are parameterized by random variables, the GMsFE basis functions usually depend on the random parameters. This leads to a large number of degrees of freedom for the mixed GMsFEM and substantially impacts the computational efficiency. To overcome this difficulty, we develop reduced mixed GMsFE basis methods such that the multiscale basis functions are independent of the random parameters and span a low-dimensional space. To this end, a greedy algorithm is used to find a set of optimal samples from a training set scattered in the parameter space. Reduced mixed GMsFE basis functions are constructed based on the optimal samples using two optimal sampling strategies: basis-oriented cross-validation and proper orthogonal decomposition. Although the dimension of the space spanned by the reduced mixed GMsFE basis functions is much smaller than the dimension of the original full order model, the online computation still depends on the number of coarse degrees of freedom. To significantly improve the online computation, we integrate the reduced mixed GMsFE basis methods with sparse tensor approximation and obtain a sparse representation for the model's outputs. The sparse representation is very efficient for evaluating the model's outputs for many instances of the parameters. To illustrate the efficacy of the proposed methods, we present a few numerical examples for elliptic PDEs with multiscale and random inputs. 
In particular, a two-phase flow model in random porous media is simulated by the proposed sparse representation method.
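One of the two sampling strategies named above, proper orthogonal decomposition (POD), reduces to a truncated SVD of a snapshot matrix. A minimal sketch on synthetic "solutions" (not the GMsFE basis construction itself, which involves the coarse-grid multiscale spaces):

```python
import numpy as np

# POD sketch: the leading left singular vectors of a snapshot matrix give
# a low-dimensional basis that reproduces the snapshot set.
rng = np.random.default_rng(3)

# 200-dof "solutions" that actually live near a 3-dimensional subspace.
modes = rng.normal(size=(200, 3))
coeffs = rng.normal(size=(3, 40))
snapshots = modes @ coeffs + 1e-6 * rng.normal(size=(200, 40))

U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
basis = U[:, :3]                          # reduced basis of dimension 3

# Projection onto the basis reproduces the snapshots almost exactly.
recon = basis @ (basis.T @ snapshots)
rel_err = np.linalg.norm(snapshots - recon) / np.linalg.norm(snapshots)
print(rel_err)
```

In the reduced-basis setting, the number of retained modes is chosen from the singular value decay, and the basis is then reused for all parameter instances.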
Jackknifing Techniques for Evaluation of Equating Accuracy. Research Report. ETS RR-09-39
ERIC Educational Resources Information Center
Haberman, Shelby J.; Lee, Yi-Hsuan; Qian, Jiahe
2009-01-01
Grouped jackknifing may be used to evaluate the stability of equating procedures with respect to sampling error and with respect to changes in anchor selection. Properties of grouped jackknifing are reviewed for simple-random and stratified sampling, and its use is described for comparisons of anchor sets. Application is made to examples of item…
The Relationship between Emotional Intelligence and Problem Solving Skills in Prospective Teachers
ERIC Educational Resources Information Center
Deniz, Sabahattin
2013-01-01
This study aims to investigate the relationship between emotional intelligence and problem solving. The sample set of the research was taken from the Faculty of Education of Mugla University by the random sampling method. The participants were 386 students--prospective teachers--(224 females; 182 males) who took part in the study voluntarily.…
Use of Nutritional Information in Canada: National Trends between 2004 and 2008
ERIC Educational Resources Information Center
Goodman, Samantha; Hammond, David; Pillo-Blocka, Francy; Glanville, Theresa; Jenkins, Richard
2011-01-01
Objective: To examine longitudinal trends in use of nutrition information among Canadians. Design: Population-based telephone and Internet surveys. Setting and Participants: Representative samples of Canadian adults recruited with random-digit dialing sampling in 2004 (n = 2,405) and 2006 (n = 2,014) and an online commercial panel in 2008 (n =…
ERIC Educational Resources Information Center
Oketch, Moses O.
2009-01-01
This article examines how recent changes, leading to a diversified supply in Kenya's university education system, are reflected in prospective students' aspirations, perceptions and preferences to undertake university education. The results, based on a combination of a convenience and snowball sampling of settings, within which random samples of…
Zhao, Wenle; Weng, Yanqiu; Wu, Qi; Palesch, Yuko
2012-01-01
To evaluate the performance of randomization designs under various parameter settings and trial sample sizes, and to identify optimal designs with respect to both treatment imbalance and allocation randomness, we evaluated 260 design scenarios from 14 randomization designs under 15 sample sizes ranging from 10 to 300, using three measures of imbalance and three measures of randomness. The maximum absolute imbalance and the correct guess (CG) probability were selected to assess the trade-off performance of each randomization design. As measured by the maximum absolute imbalance and the CG probability, we found that the performances of the 14 randomization designs are located in a closed region with the upper boundary (worst case) given by Efron's biased coin design (BCD) and the lower boundary (best case) by Soares and Wu's big stick design (BSD). Designs close to the lower boundary provide smaller imbalance and higher randomness than designs close to the upper boundary. Our research suggests that optimization of randomization design is possible based on quantified evaluation of imbalance and randomness. Based on the maximum imbalance and CG probability, the BSD, Chen's biased coin design with imbalance tolerance, and Chen's Ehrenfest urn design perform better than the popularly used permuted block design, Efron's BCD, and Wei's urn design. Copyright © 2011 John Wiley & Sons, Ltd.
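The two boundary designs can be compared by Monte Carlo simulation. A sketch of Efron's BCD (bias p = 2/3) and the big stick design (tolerance 3, an assumed value for illustration), scored by the two trade-off measures named above:

```python
import random

# Monte Carlo comparison of Efron's biased coin design (BCD) and the big
# stick design (BSD), scored by maximum absolute imbalance and the
# correct-guess (CG) probability of a guesser who always picks the
# under-represented arm.
random.seed(4)

def simulate(design, n=100, reps=2000):
    max_imb, correct, guesses = 0, 0, 0
    for _ in range(reps):
        d = 0                                   # (arm A count) - (arm B count)
        for _ in range(n):
            if design == "BCD":
                p = 0.5 if d == 0 else (2/3 if d < 0 else 1/3)
            else:                               # BSD with imbalance tolerance 3
                p = 0.5 if abs(d) < 3 else (1.0 if d < 0 else 0.0)
            a = random.random() < p             # True -> assign to arm A
            if d != 0:                          # guess only when arms differ
                guesses += 1
                correct += (a and d < 0) or ((not a) and d > 0)
            d += 1 if a else -1
            max_imb = max(max_imb, abs(d))
    return max_imb, correct / max(guesses, 1)

bcd = simulate("BCD")
bsd = simulate("BSD")
print(bcd, bsd)
```

The simulation reproduces the trade-off described above: the BSD caps the imbalance at its tolerance while keeping the CG probability lower than the BCD's, which sits near 2/3 whenever the arms are unequal.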
ERIC Educational Resources Information Center
Tanglang, Nebath; Ibrahim, Aminu Kazeem
2015-01-01
The study adopted an ex-post facto research design. A random sampling technique was used to select 346 undergraduate distance learners, and the learners were grouped into four: high and low goal-setter learners, and high and low decision-making-skill learners. The instruments for data collection were the Undergraduate Academic Goal Setting Scale…
Housworth, E A; Martins, E P
2001-01-01
Statistical randomization tests in evolutionary biology often require a set of random, computer-generated trees. For example, earlier studies have shown how large numbers of computer-generated trees can be used to conduct phylogenetic comparative analyses even when the phylogeny is uncertain or unknown. These methods were limited, however, in that (in the absence of molecular sequence or other data) they allowed users to assume that no phylogenetic information was available or that all possible trees were known. Intermediate situations where only a taxonomy or other limited phylogenetic information (e.g., polytomies) are available are technically more difficult. The current study describes a procedure for generating random samples of phylogenies while incorporating limited phylogenetic information (e.g., four taxa belong together in a subclade). The procedure can be used to conduct comparative analyses when the phylogeny is only partially resolved or can be used in other randomization tests in which large numbers of possible phylogenies are needed.
Noise-enhanced convolutional neural networks.
Audhkhasi, Kartik; Osoba, Osonde; Kosko, Bart
2016-06-01
Injecting carefully chosen noise can speed convergence in the backpropagation training of a convolutional neural network (CNN). The Noisy CNN algorithm speeds training on average because the backpropagation algorithm is a special case of the generalized expectation-maximization (EM) algorithm and because such carefully chosen noise always speeds up the EM algorithm on average. The CNN framework gives a practical way to learn and recognize images because backpropagation scales with training data. It has only linear time complexity in the number of training samples. The Noisy CNN algorithm finds a special separating hyperplane in the network's noise space. The hyperplane arises from the likelihood-based positivity condition that noise-boosts the EM algorithm. The hyperplane cuts through a uniform-noise hypercube or Gaussian ball in the noise space depending on the type of noise used. Noise chosen from above the hyperplane speeds training on average. Noise chosen from below slows it on average. The algorithm can inject noise anywhere in the multilayered network. Adding noise to the output neurons reduced the average per-iteration training-set cross entropy by 39% on a standard MNIST image test set of handwritten digits. It also reduced the average per-iteration training-set classification error by 47%. Adding noise to the hidden layers can also reduce these performance measures. The noise benefit is most pronounced for smaller data sets because the largest EM hill-climbing gains tend to occur in the first few iterations. This noise effect can assist random sampling from large data sets because it allows a smaller random sample to give the same or better performance than a noiseless sample gives. Copyright © 2015 Elsevier Ltd. All rights reserved.
Multipartite nonlocality and random measurements
NASA Astrophysics Data System (ADS)
de Rosier, Anna; Gruca, Jacek; Parisio, Fernando; Vértesi, Tamás; Laskowski, Wiesław
2017-07-01
We present an exhaustive numerical analysis of violations of local realism by families of multipartite quantum states. As an indicator of nonclassicality we employ the probability of violation for randomly sampled observables. Surprisingly, this probability rapidly increases with the number of parties or settings, and even for relatively small values of these numbers local realism is violated for almost all observables. We have observed this effect to be typical in the sense that it emerged for all investigated states, including some with randomly drawn coefficients. We also present the probability of violation as a witness of genuine multipartite entanglement.
An active learning representative subset selection method using net analyte signal.
He, Zhonghai; Ma, Zhenhe; Luan, Jingmin; Cai, Xi
2018-05-05
To guarantee accurate predictions, representative samples are needed when building a calibration model for spectroscopic measurements. However, in general, it is not known whether a sample is representative prior to measuring its concentration, which is both time-consuming and expensive. In this paper, a method to determine whether a sample should be selected into a calibration set is presented. The selection is based on the difference in the Euclidean norm of the net analyte signal (NAS) vector between the candidate and existing samples. First, the concentrations and spectra of a group of samples are used to compute the projection matrix, NAS vectors, and scalar values. Next, the NAS vectors of candidate samples are computed by multiplying the projection matrix with the spectra of the samples. The scalar value of the NAS is obtained by norm computation. The distance between the candidate set and the selected set is computed, and samples with the largest distance are added to the selected set sequentially. Last, the concentration of the analyte is measured so that the sample can be used as a calibration sample. Using a validation test, it is shown that the presented method is more efficient than random selection. As a result, the amount of time and money spent on reference measurements is greatly reduced. Copyright © 2018 Elsevier B.V. All rights reserved.
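The sequential selection step described above can be sketched as a farthest-point (maximin) search over scalar NAS norms. The norms below are made-up scalars standing in for real spectra, and the selection rule is a simplified reading of the paper's distance criterion:

```python
import numpy as np

# Candidates are ranked by the distance of their NAS norm to the norms of
# already-selected samples; the farthest candidate is added each round.
nas_norms = np.array([0.10, 0.95, 0.50, 0.12, 0.80, 0.49])

selected = [int(np.argmax(nas_norms))]          # start from the largest norm
while len(selected) < 4:
    dist = np.array([min(abs(v - nas_norms[s]) for s in selected)
                     for v in nas_norms])
    dist[selected] = -1.0                       # never re-pick a sample
    selected.append(int(dist.argmax()))

print(selected)                                 # → [1, 0, 2, 4]
```

Samples 3 and 5 are never picked because they sit close to already-selected samples, which is how the method avoids spending reference measurements on redundant candidates.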
Lofwall, Michelle R; Nuzzo, Paul A; Campbell, Charles; Walsh, Sharon L
2014-06-01
Aripiprazole is a partial agonist at dopamine (D2) and serotonin (5-HT1a) receptors and 5-HT2 antagonist. Because cocaine affects dopamine and serotonin, this study assessed whether aripiprazole could diminish the reinforcing efficacy of cocaine. Secondary aims evaluated aripiprazole on ad lib cigarette smoking and with a novel 40-hr smoking abstinence procedure. Adults with regular cocaine and cigarette use completed this inpatient double blind, randomized, placebo-controlled mixed-design study. A placebo lead-in was followed by randomization to aripiprazole (0, 2 or 10 mg/day/p.o.; n = 7 completed/group). Three sets of test sessions, each consisting of 3 cocaine sample-choice (i.e., self-administration) sessions and 1 dose-response session, were conducted (once during the lead-in and twice after randomization). Sample sessions tested each cocaine dose (0, 20 and 40 mg/70 kg, i.v.) in random order; subjective, observer-rated and physiologic outcomes were collected. Later that day, participants chose between the morning's sample dose or descending amounts of money over 7 trials. In dose response sessions, all doses were given 1 hr apart in ascending order for pharmacodynamic and pharmacokinetic assessment. Two sets of smoking topography sessions were conducted during the lead-in and after randomization; 1 with and 1 without 40 hr of smoking abstinence. Number of ad lib cigarettes smoked during non-session days was collected. Cocaine produced prototypic effects, but aripiprazole did not significantly alter these effects or smoking outcomes. The smoking abstinence procedure reliably produced nicotine withdrawal and craving and increased smoking modestly. These data do not support further investigation of aripiprazole for cocaine or tobacco use disorder treatment. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Naugle, Alecia Larew; Barlow, Kristina E; Eblen, Denise R; Teter, Vanessa; Umholtz, Robert
2006-11-01
The U.S. Food Safety and Inspection Service (FSIS) tests sets of samples of selected raw meat and poultry products for Salmonella to ensure that federally inspected establishments meet performance standards defined in the pathogen reduction-hazard analysis and critical control point system (PR-HACCP) final rule. In the present report, sample set results are described and associations between set failure and set and establishment characteristics are identified for 4,607 sample sets collected from 1998 through 2003. Sample sets were obtained from seven product classes: broiler chicken carcasses (n = 1,010), cow and bull carcasses (n = 240), market hog carcasses (n = 560), steer and heifer carcasses (n = 123), ground beef (n = 2,527), ground chicken (n = 31), and ground turkey (n = 116). Of these 4,607 sample sets, 92% (4,255) were collected as part of random testing efforts (A sets), and 93% (4,166) passed. However, the percentage of positive samples relative to the maximum number of positive results allowable in a set increased over time for broilers but decreased or stayed the same for the other product classes. Three factors associated with set failure were identified: establishment size, product class, and year. Set failures were more likely early in the testing program (relative to 2003). Small and very small establishments were more likely to fail than large ones. Set failure was less likely in ground beef than in other product classes. Despite an overall decline in set failures through 2003, these results highlight the need for continued vigilance to reduce Salmonella contamination in broiler chicken and continued implementation of programs designed to assist small and very small establishments with PR-HACCP compliance issues.
Knacker, T; Schallnaß, H J; Klaschka, U; Ahlers, J
1995-11-01
The criteria for classification and labelling of substances as "dangerous for the environment" agreed upon within the European Union (EU) were applied to two sets of existing chemicals. One set (sample A) consisted of 41 randomly selected compounds listed in the European Inventory of Existing Chemical Substances (EINECS). The other set (sample B) comprised 115 substances listed in Annex I of Directive 67/548/EEC which were classified by the EU Working Group on Classification and Labelling of Existing Chemicals. The aquatic toxicity (fish mortality, Daphnia immobilisation, algal growth inhibition), ready biodegradability and n-octanol/water partition coefficient were measured for sample A by a single laboratory. For sample B, the available ecotoxicological data originated from many different sources and were therefore rather heterogeneous. In both samples, algal toxicity was the most sensitive effect parameter for most substances. Furthermore, it was found that classification based on a single aquatic test result differs in many cases from classification based on a complete data set, although a correlation exists between the biological end-points of the aquatic toxicity test systems.
Exploring Sampling in the Detection of Multicategory EEG Signals
Siuly, Siuly; Kabir, Enamul; Wang, Hua; Zhang, Yanchun
2015-01-01
The paper presents a framework based on sampling and machine learning techniques for the detection of multicategory EEG signals, in which random sampling (RS) and optimal allocation sampling (OS) are explored. In the proposed framework, before using the RS and OS schemes, the entire EEG signal of each class is partitioned into several groups based on a particular time period. The RS and OS schemes are used in order to obtain representative observations from each group of each category of EEG data. All of the samples selected by RS from the groups of each category are then combined into one set, named the RS set. Similarly, for the OS scheme, an OS set is obtained. Then eleven statistical features are extracted from the RS and OS sets separately. Finally, this study employs three well-known classifiers: k-nearest neighbor (k-NN), multinomial logistic regression with a ridge estimator (MLR), and support vector machine (SVM) to evaluate the performance of the RS and OS feature sets. The experimental outcomes demonstrate that the RS scheme represents the EEG signals well and that k-NN with RS is the optimum choice for detection of multicategory EEG signals. PMID:25977705
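The difference between the two sampling schemes compared above can be sketched on one "class" of grouped data. The groups, sizes and variabilities here are invented, and OS is taken to mean Neyman-style allocation (n_h proportional to N_h * S_h), a standard reading of optimal allocation:

```python
import numpy as np

# RS draws the same fraction from every group; OS draws more from groups
# with higher variability (n_h proportional to N_h * S_h).
rng = np.random.default_rng(5)

groups = [rng.normal(0, sd, 500) for sd in (1.0, 2.0, 5.0)]
total_n = 90

# RS: allocation proportional to group size only (equal groups -> 30 each).
rs_sizes = [total_n * len(g) // sum(len(g) for g in groups) for g in groups]

# OS: allocation proportional to N_h * S_h (group size times group SD).
weights = np.array([len(g) * g.std(ddof=1) for g in groups])
os_sizes = np.rint(total_n * weights / weights.sum()).astype(int)

print(rs_sizes, os_sizes.tolist())
```

OS concentrates the sampling budget on the most variable group, which is the intended advantage when segments of a signal differ in volatility.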
Susukida, Ryoko; Crum, Rosa M; Ebnesajjad, Cyrus; Stuart, Elizabeth A; Mojtabai, Ramin
2017-07-01
To compare randomized controlled trial (RCT) sample treatment effects with the population effects of substance use disorder (SUD) treatment. Statistical weighting was used to re-compute the effects from 10 RCTs such that the participants in the trials had characteristics that resembled those of patients in the target populations. Multi-site RCTs and usual SUD treatment settings in the United States. A total of 3592 patients in 10 RCTs and 1 602 226 patients from usual SUD treatment settings between 2001 and 2009. Three outcomes of SUD treatment were examined: retention, urine toxicology and abstinence. We weighted the RCT sample treatment effects using propensity scores representing the conditional probability of participating in RCTs. Weighting the samples changed the significance of estimated sample treatment effects. Most commonly, positive effects of trials became statistically non-significant after weighting (three trials for retention and urine toxicology and one trial for abstinence); also, non-significant effects became significantly positive (one trial for abstinence) and significantly negative effects became non-significant (two trials for abstinence). There was suggestive evidence of treatment effect heterogeneity in subgroups that are under- or over-represented in the trials, some of which were consistent with the differences in average treatment effects between weighted and unweighted results. The findings of randomized controlled trials (RCTs) for substance use disorder treatment do not appear to be directly generalizable to target populations when the RCT samples do not reflect adequately the target populations and there is treatment effect heterogeneity across patient subgroups. © 2017 Society for the Study of Addiction.
Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E
2014-06-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered, unequal-probability-of-selection sample designs.
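The core idea of "inverting" the design can be sketched in a toy stratified setting: a sample drawn with unequal selection probabilities is expanded into a synthetic population by resampling units proportionally to their design weights, after which unweighted analysis is approximately valid. This is a simple weighted bootstrap, not the full finite population Bayesian bootstrap of the paper:

```python
import numpy as np

# Toy design inversion: oversampled stratum 1 is down-weighted when the
# synthetic population is generated, so the naive mean of the synthetic
# population approximates the true population mean.
rng = np.random.default_rng(6)

# Population: two strata with different means; stratum 1 is oversampled.
pop = np.concatenate([rng.normal(0, 1, 9000), rng.normal(5, 1, 1000)])
strata = np.concatenate([np.zeros(9000), np.ones(1000)])

idx0 = rng.choice(np.where(strata == 0)[0], 90, replace=False)
idx1 = rng.choice(np.where(strata == 1)[0], 90, replace=False)  # oversampled
sample = pop[np.concatenate([idx0, idx1])]
weights = np.concatenate([np.full(90, 9000 / 90), np.full(90, 1000 / 90)])

# Weighted resampling generates a synthetic population that can then be
# analyzed as if it were a simple random sample.
synth = rng.choice(sample, size=10000, replace=True, p=weights / weights.sum())

print(round(sample.mean(), 2), round(synth.mean(), 2), round(pop.mean(), 2))
```

The unweighted sample mean is badly biased toward the oversampled stratum, while the synthetic-population mean tracks the true population mean.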
NASA Astrophysics Data System (ADS)
Deng, Chengbin; Wu, Changshan
2013-12-01
Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
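The LSS endmember-derivation step can be illustrated as an ordinary least squares problem: given training pixels with known abundances, solve for the endmember spectra, then invert the mixture for a new pixel (unconstrained SMA). The sketch below uses synthetic spectra and hypothetical dimensions, not the paper's imagery.

```python
import numpy as np

# Hypothetical setup: 3 endmembers, 6 spectral bands.
rng = np.random.default_rng(0)
E_true = rng.uniform(0.0, 1.0, size=(3, 6))      # endmember spectra (3 x bands)

# Training pixels with KNOWN abundances (rows sum to 1), as in the LSS step.
A_train = rng.dirichlet(np.ones(3), size=200)    # 200 x 3 abundance fractions
X_train = A_train @ E_true                       # 200 x 6 mixed reflectances

# Least squares solution for the endmember signatures: E = argmin ||A E - X||
E_hat, *_ = np.linalg.lstsq(A_train, X_train, rcond=None)

# Unconstrained SMA for a new pixel: solve x = E_hat^T a for abundances a.
x_new = np.array([0.2, 0.5, 0.3]) @ E_true
a_hat, *_ = np.linalg.lstsq(E_hat.T, x_new, rcond=None)
```

The fully constrained variant adds sum-to-one and non-negativity constraints on the abundances, which requires a constrained solver rather than plain `lstsq`.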
ERIC Educational Resources Information Center
Green, Samuel B.; Thompson, Marilyn S.; Levy, Roy; Lo, Wen-Juo
2015-01-01
Traditional parallel analysis (T-PA) estimates the number of factors by sequentially comparing sample eigenvalues with eigenvalues for randomly generated data. Revised parallel analysis (R-PA) sequentially compares the "k"th eigenvalue for sample data to the "k"th eigenvalue for generated data sets, conditioned on "k"-…
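The traditional parallel analysis procedure can be sketched as follows: compare the sorted eigenvalues of the sample correlation matrix against a quantile of eigenvalues from randomly generated data of the same shape, and retain the leading factors that exceed the threshold. A minimal sketch with illustrative one-factor data:

```python
import numpy as np

def parallel_analysis(data, n_sims=100, quantile=95, seed=1):
    """Traditional parallel analysis (T-PA): retain factors whose sample
    correlation-matrix eigenvalues exceed the chosen quantile of
    eigenvalues from randomly generated normal data of the same shape."""
    n, p = data.shape
    sample_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(data.T)))[::-1]
    rng = np.random.default_rng(seed)
    rand_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        r = rng.standard_normal((n, p))
        rand_eigs[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(r.T)))[::-1]
    thresholds = np.percentile(rand_eigs, quantile, axis=0)
    # count leading eigenvalues above their position-wise threshold
    k = 0
    while k < p and sample_eigs[k] > thresholds[k]:
        k += 1
    return k

# illustrative data: one strong common factor in 6 variables
rng = np.random.default_rng(0)
f = rng.standard_normal(300)
data = 0.9 * f[:, None] + 0.45 * rng.standard_normal((300, 6))
n_factors = parallel_analysis(data)
```

The revised procedure (R-PA) differs in that each comparison at position k is conditioned on the first k−1 factors already extracted.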
NASA Technical Reports Server (NTRS)
Chapman, G. M. (Principal Investigator); Carnes, J. G.
1981-01-01
Several techniques which use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) Proportional Allocation/Relative Count Estimator (PA/RCE) uses proportional allocation of dots to clusters on the basis of cluster size and a relative-count cluster-level estimate; (2) Proportional Allocation/Bayes Estimator (PA/BE) uses proportional allocation of dots to clusters and a Bayesian cluster-level estimate; and (3) Bayes Sequential Allocation/Bayesian Estimator (BSA/BE) uses sequential allocation of dots to clusters and a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by the proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly the PA/BE, because it provides the greatest precision.
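The proportional-allocation step shared by PA/RCE and PA/BE can be illustrated with a small sketch; the largest-remainder rounding rule is our assumption, chosen only so the allocations sum exactly to the dot budget:

```python
def proportional_allocation(cluster_sizes, n_dots):
    """Allocate n_dots to clusters proportionally to cluster size,
    using the largest-remainder rule so the total is exactly n_dots."""
    total = sum(cluster_sizes)
    quotas = [n_dots * s / total for s in cluster_sizes]
    alloc = [int(q) for q in quotas]                 # integer parts
    # give remaining dots to the clusters with the largest remainders
    order = sorted(range(len(quotas)),
                   key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[: n_dots - sum(alloc)]:
        alloc[i] += 1
    return alloc

# e.g. 50 dots over clusters covering 500, 300 and 200 pixels
allocation = proportional_allocation([500, 300, 200], 50)
```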
Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch
2017-06-06
An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development and predictions for test set samples using applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
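The core RANSAC loop (random minimal samples, consensus scoring, refit on the inliers) can be sketched on a toy line-fitting problem; this is the generic image-processing form of the algorithm, not the QSAR workflow described above, and all data below are synthetic.

```python
import numpy as np

def ransac_line(x, y, n_iter=200, inlier_tol=0.5, seed=0):
    """Minimal RANSAC sketch for fitting y = a*x + b: repeatedly fit a
    candidate line to a random 2-point sample, keep the candidate with
    the largest inlier consensus, then refit on its inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                               # degenerate sample
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares refit on the consensus set (outliers removed)
    return np.polyfit(x[best_inliers], y[best_inliers], 1)

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)
y[:20] += rng.uniform(5, 10, 20)                   # gross outliers
a_hat, b_hat = ransac_line(x, y)
```

In the QSAR setting described above, the "model" inside the loop would be a multivariate regression over descriptors rather than a 2-point line.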
Harrison, Rosamund; Veronneau, Jacques; Leroux, Brian
2010-05-13
The goal of this cluster randomized trial is to test the effectiveness of a counseling approach, Motivational Interviewing, to control dental caries in young Aboriginal children. Motivational Interviewing, a client-centred, directive counseling style, has not yet been evaluated as an approach for promotion of behaviour change in indigenous communities in remote settings. Aboriginal women were hired from the 9 communities to recruit expectant and new mothers to the trial, administer questionnaires and deliver the counseling to mothers in the test communities. The goal is for mothers to receive the intervention during pregnancy and at their child's immunization visits. Data on children's dental health status and family dental health practices will be collected when children are 30 months of age. The communities were randomly allocated to test or control group by a random "draw" over community radio. Sample size and power were determined based on an anticipated 20% reduction in caries prevalence. Randomization checks were conducted between groups. In the 5 test and 4 control communities, 272 of the original target sample size of 309 mothers have been recruited over a two-and-a-half year period. A power calculation using the actual attained sample size showed power to be 79% to detect a treatment effect. If an attrition fraction of 4% per year is maintained, power will remain at 80%. Power will still be > 90% to detect a 25% reduction in caries prevalence. The distribution of most baseline variables was similar for the two randomized groups of mothers. However, despite the random assignment of communities to treatment conditions, group differences exist for stage of pregnancy and prior tooth extractions in the family. Because of the group imbalances on certain variables, control of baseline variables will be done in the analyses of treatment effects. 
This paper explains the challenges of conducting randomized trials in remote settings, the importance of thorough community collaboration, and also illustrates the likelihood that some baseline variables that may be clinically important will be unevenly split in group-randomized trials when the number of groups is small. This trial is registered as ISRCTN41467632.
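The power calculations described above can be sketched with a standard normal-approximation formula for two proportions, inflated by a design effect for cluster randomization. The prevalences, arm size, and design effect below are hypothetical illustrations, not the trial's actual parameters.

```python
from statistics import NormalDist
from math import sqrt

def power_two_proportions(p1, p2, n_per_arm, deff=1.0, alpha=0.05):
    """Normal-approximation power for detecting p1 vs p2 with n_per_arm
    subjects per arm; deff = 1 + (m - 1)*icc inflates the variance to
    account for cluster randomization."""
    z = NormalDist()
    n_eff = n_per_arm / deff                      # effective sample size
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_eff)
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(p1 - p2) / se - z_alpha)

# e.g. a 20% relative reduction in caries prevalence, 0.80 -> 0.64
power = power_two_proportions(0.80, 0.64, n_per_arm=150)
```

Ignoring the design effect (deff = 1) overstates power whenever outcomes within a community are correlated, which is why cluster trials need the inflation factor.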
Bayesian estimation of Karhunen–Loève expansions: a random subspace approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chowdhary, Kenny; Najm, Habib N.
One of the most widely-used statistical procedures for dimensionality reduction of high-dimensional random fields is Principal Component Analysis (PCA), which is based on the Karhunen-Loève expansion (KLE) of a stochastic process with finite variance. The KLE is analogous to a Fourier series expansion for a random process, where the goal is to find an orthogonal transformation for the data such that the projection of the data onto this orthogonal subspace is optimal in the L2 sense, i.e., it minimizes the mean square error. In practice, this orthogonal transformation is determined by performing an SVD (Singular Value Decomposition) on the sample covariance matrix or on the data matrix itself. Sampling error is typically ignored when quantifying the principal components or, equivalently, the basis functions of the KLE, and it is exacerbated when the sample size is much smaller than the dimension of the random field. In this paper, we introduce a Bayesian KLE procedure, allowing one to obtain a probabilistic model on the principal components, which can account for inaccuracies due to limited sample size. The probabilistic model is built via Bayesian inference, from which the posterior becomes the matrix Bingham density over the space of orthonormal matrices. We use a modified Gibbs sampling procedure to sample on this space and then build probabilistic Karhunen-Loève expansions over random subspaces to obtain a set of low-dimensional surrogates of the stochastic process. We illustrate this probabilistic procedure with a finite-dimensional stochastic process inspired by Brownian motion.
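The classical (non-Bayesian) KLE/PCA step that the paper builds on can be sketched via an SVD of centered realizations; the Brownian-motion-like ensemble below is illustrative, and the truncation level is arbitrary.

```python
import numpy as np

# Hypothetical ensemble: 50 realizations of a 200-point random process.
rng = np.random.default_rng(0)
samples = np.array([np.cumsum(rng.standard_normal(200)) / np.sqrt(200)
                    for _ in range(50)])          # Brownian-motion-like paths

# Classical KLE/PCA: center, then SVD of the data matrix.
mean = samples.mean(axis=0)
U, s, Vt = np.linalg.svd(samples - mean, full_matrices=False)

# Truncated expansion: keep k modes; rows of Vt are the basis functions.
k = 5
scores = (samples - mean) @ Vt[:k].T              # KLE coefficients
reconstruction = mean + scores @ Vt[:k]

# Fraction of the variance (energy) captured by the first k modes
energy = (s[:k] ** 2).sum() / (s ** 2).sum()
```

The Bayesian procedure in the paper replaces the single SVD basis with a posterior distribution over orthonormal bases, so the sampling error visible with only 50 realizations is propagated rather than ignored.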
Instrumentation for investigation of corona discharges from insulated wires
NASA Technical Reports Server (NTRS)
Doreswamy, C. V.; Crowell, C. S.
1975-01-01
A coaxial cylinder configuration is used to investigate the effect of corona impulses on the deterioration of electrical insulation. The corona currents flowing through the resistance develop a voltage which is fed to the measuring set-up. The value of this resistance is made equal to the surge impedance of the coaxial cylinder set up to prevent reflections. This instrumentation includes a phase shifter and Schmitt trigger and is designed to sample, measure, and display corona impulses occurring during any predetermined sampling period of a randomly selectable half cycle of the 60 Hz high voltage wave.
Tangen, C M; Koch, G G
1999-03-01
In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is an (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.
Polynomial chaos representation of databases on manifolds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu
2017-04-15
Characterizing the polynomial chaos expansion (PCE) of a vector-valued random variable with probability distribution concentrated on a manifold is a relevant problem in data-driven settings. The probability distribution of such random vectors is multimodal in general, leading to potentially very slow convergence of the PCE. In this paper, we build on a recent development for estimating and sampling from probabilities concentrated on a diffusion manifold. The proposed methodology constructs a PCE of the random vector together with an associated generator that samples from the target probability distribution which is estimated from data concentrated in the neighborhood of the manifold. The method is robust and remains efficient for high dimension and large datasets. The resulting polynomial chaos construction on manifolds permits the adaptation of many uncertainty quantification and statistical tools to emerging questions motivated by data-driven queries.
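As a minimal PCE illustration (a single unimodal random variable, not the manifold-concentrated setting of the paper), consider the Hermite expansion of Y = exp(X) with X ~ N(0,1), whose coefficients have the known closed form c_k = e^(1/2)/k! for probabilists' Hermite polynomials:

```python
import numpy as np

# Hermite PCE of Y = exp(X), X ~ N(0,1): Y = sum_k c_k He_k(X),
# with c_k = exp(1/2)/k! (probabilists' Hermite polynomials He_k).
rng = np.random.default_rng(0)
x = rng.standard_normal(400_000)
y = np.exp(x)

He = [np.ones_like(x), x, x**2 - 1, x**3 - 3*x]   # He_0 .. He_3
fact = [1, 1, 2, 6]                                # k! for k = 0..3

# Non-intrusive projection: c_k = E[Y He_k(X)] / k!, estimated by Monte Carlo
c = np.array([(y * He[k]).mean() / fact[k] for k in range(4)])

# 3rd-order PCE surrogate of Y
y_pce = sum(c[k] * He[k] for k in range(4))
```

For multimodal, manifold-concentrated distributions this simple projection converges very slowly, which is the problem the paper's diffusion-manifold construction addresses.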
A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.
Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh
2018-04-26
Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption does not hold when samples are obtained from different experimental conditions and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (and in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of a model's generalizability compared to CCV. Next, we defined the 'distinctness' of the test set from the training set and showed that this measure is predictive of the performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that the performance of different gene expression prediction methods can be better evaluated using this method.
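The clustering-based CV idea can be sketched as leave-one-cluster-out splitting, where each fold holds out an entire experimental condition so the test set never shares a condition with the training set; the data below are illustrative.

```python
import numpy as np

def clustered_cv_splits(cluster_ids):
    """Clustering-based CV sketch: each fold holds out one whole cluster
    (e.g. one experimental condition). Random CV, by contrast, would
    scatter every condition across both training and test sets."""
    for c in np.unique(cluster_ids):
        test = np.where(cluster_ids == c)[0]
        train = np.where(cluster_ids != c)[0]
        yield train, test

# 30 samples drawn from 3 experimental conditions
conditions = np.repeat([0, 1, 2], 10)
splits = list(clustered_cv_splits(conditions))
```

Comparing a model's error under these splits with its error under ordinary random K-fold splits exposes the over-optimism the abstract describes.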
2012-03-01
with each SVM discriminating between a pair of the N total speakers in the data set. The N(N-1)/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote, and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random
CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models.
Haraldsdóttir, Hulda S; Cousins, Ben; Thiele, Ines; Fleming, Ronan M T; Vempala, Santosh
2017-06-01
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. However, reliable uniform sampling of genome-scale biochemical networks is challenging due to their high dimensionality and inherent anisotropy. Here, we present an implementation of a new sampling algorithm, coordinate hit-and-run with rounding (CHRR). This algorithm is based on the provably efficient hit-and-run random walk and crucially uses a preprocessing step to round the anisotropic flux set. CHRR provably converges to a uniform stationary sampling distribution. We apply it to metabolic networks of increasing dimensionality. We show that it converges several times faster than a popular artificial centering hit-and-run algorithm, enabling reliable and tractable sampling of genome-scale biochemical networks. https://github.com/opencobra/cobratoolbox. ronan.mt.fleming@gmail.com or vempala@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
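The coordinate hit-and-run walk at the core of CHRR can be sketched on a toy polytope (a simplex standing in for a flux set); the rounding preprocessing step that gives CHRR its efficiency on anisotropic sets is omitted here, and the dimensions are illustrative.

```python
import random

def coordinate_hit_and_run(d, n_steps, seed=0):
    """Coordinate hit-and-run sketch on the polytope
    {x : x_i >= 0, sum(x) <= 1}. Each step picks one coordinate,
    computes its feasible interval with the other coordinates fixed,
    and resamples it uniformly on that interval."""
    rng = random.Random(seed)
    x = [1.0 / (2 * d)] * d            # interior starting point
    chain = []
    for _ in range(n_steps):
        i = rng.randrange(d)
        # feasible interval for x_i given the other coordinates
        upper = 1.0 - (sum(x) - x[i])
        x[i] = rng.uniform(0.0, upper)
        chain.append(list(x))
    return chain

chain = coordinate_hit_and_run(d=3, n_steps=20000)
```

For genome-scale models the feasible interval along each coordinate comes from the stoichiometric constraints, and the rounding transform is what keeps these intervals from collapsing in thin directions.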
Erlandsson, Lena; Rosenstierne, Maiken W.; McLoughlin, Kevin; Jaing, Crystal; Fomsgaard, Anders
2011-01-01
A common technique used for sensitive and specific diagnostic virus detection in clinical samples is PCR that can identify one or several viruses in one assay. However, a diagnostic microarray containing probes for all human pathogens could replace hundreds of individual PCR-reactions and remove the need for a clear clinical hypothesis regarding a suspected pathogen. We have established such a diagnostic platform for random amplification and subsequent microarray identification of viral pathogens in clinical samples. We show that Phi29 polymerase-amplification of a diverse set of clinical samples generates enough viral material for successful identification by the Microbial Detection Array, demonstrating the potential of the microarray technique for broad-spectrum pathogen detection. We conclude that this method detects both DNA and RNA virus, present in the same sample, as well as differentiates between different virus subtypes. We propose this assay for diagnostic analysis of viruses in clinical samples. PMID:21853040
Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies
Theis, Fabian J.
2017-01-01
Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reasons for the different behaviors of the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
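The inverse-probability resampling idea behind these corrections can be sketched as weighted resampling with weights 1/π_i, so cases enriched by the design are drawn back toward their population frequency; the two-phase selection probabilities below are hypothetical.

```python
import random

def ip_resample(samples, sel_probs, n_out, seed=0):
    """Inverse-probability resampling sketch: draw from the biased
    sample with probability proportional to 1/pi_i, so units that were
    over-selected (high pi_i) are down-weighted toward the population
    mix. (The paper's methods add stochastic noise / parametric bagging
    on top of this basic step.)"""
    rng = random.Random(seed)
    weights = [1.0 / p for p in sel_probs]
    return rng.choices(samples, weights=weights, k=n_out)

# hypothetical two-phase study: all 100 cases kept (pi = 1.0),
# but only 1 in 10 controls sampled (pi = 0.1)
data = [("case", 1.0)] * 100 + [("control", 0.1)] * 100
resampled = ip_resample([d[0] for d in data], [d[1] for d in data], 5000)
case_frac = resampled.count("case") / len(resampled)
```

Here the implied population has 100 cases per 1000 controls, so the resampled case fraction should approach 1/11 rather than the 1/2 seen in the biased sample.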
Sample Size Calculations for Micro-randomized Trials in mHealth
Liao, Peng; Klasnja, Predrag; Tewari, Ambuj; Murphy, Susan A.
2015-01-01
The use and development of mobile interventions are experiencing rapid growth. In “just-in-time” mobile interventions, treatments are provided via a mobile device and are intended to help an individual make healthy decisions “in the moment,” and thus have a proximal, near-future impact. Currently, the development of mobile interventions is proceeding at a much faster pace than that of associated data science methods. A first step toward developing data-based methods is to provide an experimental design for testing the proximal effects of these just-in-time treatments. In this paper, we propose a “micro-randomized” trial design for this purpose. In a micro-randomized trial, treatments are sequentially randomized throughout the conduct of the study, with the result that each participant may be randomized on the hundreds or thousands of occasions at which a treatment might be provided. Further, we develop a test statistic for assessing the proximal effect of a treatment as well as an associated sample size calculator. We conduct simulation evaluations of the sample size calculator in various settings. Rules of thumb that might be used in designing a micro-randomized trial are discussed. This work is motivated by our collaboration on the HeartSteps mobile application designed to increase physical activity. PMID:26707831
Urdea, Mickey; Kolberg, Janice; Wilber, Judith; Gerwien, Robert; Moler, Edward; Rowe, Michael; Jorgensen, Paul; Hansen, Torben; Pedersen, Oluf; Jørgensen, Torben; Borch-Johnsen, Knut
2009-01-01
Background Improved identification of subjects at high risk for development of type 2 diabetes would allow preventive interventions to be targeted toward individuals most likely to benefit. In previous research, predictive biomarkers were identified and used to develop multivariate models to assess an individual's risk of developing diabetes. Here we describe the training and validation of the PreDx™ Diabetes Risk Score (DRS) model in a clinical laboratory setting using baseline serum samples from subjects in the Inter99 cohort, a population-based primary prevention study of cardiovascular disease. Methods Among 6784 subjects free of diabetes at baseline, 215 subjects progressed to diabetes (converters) during five years of follow-up. A nested case-control study was performed using serum samples from 202 converters and 597 randomly selected nonconverters. Samples were randomly assigned to equally sized training and validation sets. Seven biomarkers were measured using assays developed for use in a clinical reference laboratory. Results The PreDx DRS model performed better on the training set (area under the curve [AUC] = 0.837) than fasting plasma glucose alone (AUC = 0.779). When applied to the sequestered validation set, the PreDx DRS showed the same performance (AUC = 0.838), thus validating the model. This model had a better AUC than any other single measure from a fasting sample. Moreover, the model provided further risk stratification among high-risk subpopulations with impaired fasting glucose or metabolic syndrome. Conclusions The PreDx DRS provides the absolute risk of diabetes conversion in five years for subjects identified to be “at risk” using the clinical factors. PMID:20144324
NASA Astrophysics Data System (ADS)
Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.
2015-03-01
During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
NASA Astrophysics Data System (ADS)
Schünemann, Adriano Luis; Inácio Fernandes Filho, Elpídio; Rocha Francelino, Marcio; Rodrigues Santos, Gérson; Thomazini, Andre; Batista Pereira, Antônio; Gonçalves Reynaud Schaefer, Carlos Ernesto
2017-04-01
Values of environmental variables at non-sampled sites can be estimated from a minimum data set through interpolation techniques; kriging and the Random Forest classifier are examples of predictors used for this purpose. The objective of this work was to compare methods of soil attribute spatialization in a recently deglaciated environment with complex landforms. Predictions of the selected soil attributes (potassium, calcium and magnesium) in ice-free areas were tested using morphometric covariables and using geostatistical models without these covariables. For this, 106 soil samples were collected at 0-10 cm depth in Keller Peninsula, King George Island, Maritime Antarctica. Soil chemical analysis was performed by the gravimetric method, determining values of potassium, calcium and magnesium for each sampled point. Digital terrain models (DTMs) were obtained using a Terrestrial Laser Scanner. DTMs were generated from a point cloud at spatial resolutions of 1, 5, 10, 20 and 30 m, from which 40 morphometric covariates were derived. Simple kriging was performed using the R software. The same data set, coupled with the morphometric covariates, was used to predict values of the studied attributes at non-sampled sites with the Random Forest interpolator. Little difference was observed between the predictions generated by the simple kriging and Random Forest interpolators, and DTMs with better spatial resolution did not improve the quality of soil attribute prediction. Results revealed that simple kriging can be used as an interpolator when morphometric covariates are not available, with little impact on quality. Further work on techniques for predicting soil chemical attributes is needed, especially in periglacial areas with complex landforms.
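Simple kriging (known mean, stationary covariance) can be sketched in a few lines; the coordinates, exponential covariance parameters, and potassium values below are illustrative, not the study's data.

```python
import numpy as np

def simple_kriging(coords, values, mean, target, sill=1.0, corr_range=50.0):
    """Simple kriging sketch with an exponential covariance model
    C(h) = sill * exp(-h / corr_range); 'mean' is assumed known
    (simple, not ordinary, kriging)."""
    def cov(a, b):
        h = np.linalg.norm(np.asarray(a, float) - np.asarray(b, float),
                           axis=-1)
        return sill * np.exp(-h / corr_range)
    C = cov(coords[:, None, :], coords[None, :, :])   # n x n data covariances
    c0 = cov(coords, target)                          # covariances to target
    w = np.linalg.solve(C, c0)                        # kriging weights
    return mean + w @ (values - mean)

# illustrative sample points (m) and soil K values
coords = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 40.0], [25.0, 25.0]])
k_vals = np.array([0.30, 0.42, 0.25, 0.38])
pred = simple_kriging(coords, k_vals, mean=0.33, target=np.array([10.0, 10.0]))
```

With no nugget effect the predictor honors the data exactly: predicting at a sampled location returns the observed value, which is the interpolation property the study relies on.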
Hengl, Tomislav; Heuvelink, Gerard B. M.; Kempen, Bas; Leenaars, Johan G. B.; Walsh, Markus G.; Shepherd, Keith D.; Sila, Andrew; MacMillan, Robert A.; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E.
2015-01-01
80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008–2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management—organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15–75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. 
This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data. PMID:26110833
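The k-fold validation protocol in the abstract above can be sketched in a few lines. The sketch below is illustrative only: it compares plain ordinary least squares against a mean-only baseline on synthetic data (a full random forests implementation needs an ML library), but the 5-fold held-out RMSE comparison has the same shape as the one the authors report.

```python
import math
import random

def kfold_indices(n, k):
    """Partition indices 0..n-1 into k folds (shuffled once)."""
    idx = list(range(n))
    random.Random(42).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def rmse(pred, ys):
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys))

def cv_rmse(xs, ys, k=5):
    """Average held-out RMSE over k folds for two competing models."""
    folds = kfold_indices(len(xs), k)
    lin_scores, base_scores = [], []
    for fold in folds:
        held = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held]
        tr_y = [y for i, y in enumerate(ys) if i not in held]
        te_x = [x for i, x in enumerate(xs) if i in held]
        te_y = [y for i, y in enumerate(ys) if i in held]
        a, b = fit_linear(tr_x, tr_y)
        mean_y = sum(tr_y) / len(tr_y)
        lin_scores.append(rmse([a + b * x for x in te_x], te_y))
        base_scores.append(rmse([mean_y] * len(te_x), te_y))
    return sum(lin_scores) / k, sum(base_scores) / k

# Synthetic covariate/property data with a linear trend plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1.5 * x + rng.gauss(0, 1) for x in xs]
lin, base = cv_rmse(xs, ys)
```

Each observation is held out exactly once, so the two averaged RMSEs are directly comparable, which is what lets the study rank random forests against linear regression per soil property and depth.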
K-Fold Crossvalidation in Canonical Analysis.
ERIC Educational Resources Information Center
Liang, Kun-Hsia; And Others
1995-01-01
A computer-assisted, K-fold cross-validation technique is discussed in the framework of canonical correlation analysis of randomly generated data sets. Analysis results suggest that this technique can effectively reduce the contamination of canonical variates and canonical correlations by sample-specific variance components. (Author/SLD)
TOWARDS USING STABLE SPERMATOZOAL RNAS FOR PROGNOSTIC ASSESSMENT OF MALE FACTOR FERTILITY
Objective: To establish the stability of spermatozoal RNAs as a means to validate their use as a male fertility marker. Design: Semen samples were randomly selected for 1 of 3 cryopreservation treatments. Setting: An academic research environment. Patient(s): Men aged...
A Population of Assessment Tasks
ERIC Educational Resources Information Center
Daro, Phil; Burkhardt, Hugh
2012-01-01
We propose the development of a "population" of high-quality assessment tasks that cover the performance goals set out in the "Common Core State Standards for Mathematics." The population will be published. Tests are drawn from this population as a structured random sample guided by a "balancing algorithm."
MicroRNA array normalization: an evaluation using a randomized dataset as the benchmark.
Qin, Li-Xuan; Zhou, Qin
2014-01-01
MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays. PMID:24905456
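Quantile normalization is one widely used array-normalization method; whether it is among the specific methods the authors evaluated is not stated here, so the sketch below is a generic illustration: every array's values are replaced by the rank-wise mean across arrays, forcing all arrays onto a common distribution.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length expression vectors:
    replace each value by the mean, across arrays, of the values
    sharing its within-array rank."""
    n = len(arrays[0])
    # Reference distribution: mean of the k-th smallest values.
    ref = [sum(sorted(a)[k] for a in arrays) / len(arrays)
           for k in range(n)]
    out = []
    for a in arrays:
        # Rank of each entry within its own array (ties by position).
        order = sorted(range(n), key=lambda i: a[i])
        b = [0.0] * n
        for rank, i in enumerate(order):
            b[i] = ref[rank]
        out.append(b)
    return out

arrays = [[2.0, 4.0, 6.0], [1.0, 3.0, 11.0], [3.0, 5.0, 7.0]]
normed = quantile_normalize(arrays)
```

After normalization every array has the identical value distribution [2.0, 4.0, 8.0]; only the assignment of values to probes differs, which is exactly the property that can mask genuine array effects the randomized benchmark is designed to expose.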
Classification of urine sediment based on convolution neural network
NASA Astrophysics Data System (ADS)
Pan, Jingjing; Jiang, Cunbo; Zhu, Tiantian
2018-04-01
By designing a new convolution neural network framework, this paper removes the constraints of the original framework, which required large training samples of identical size. The input images are shifted and cropped to generate sub-images of a common size. Dropout is then applied to the generated sub-images, increasing sample diversity and preventing overfitting. Proper subsets of equal size, no two identical, are randomly selected from the sub-image set and used as input layers for the convolution neural network. Through the convolution layers, pooling, the fully connected layer and the output layer, the classification loss on the test and training sets is obtained. In an experiment classifying red blood cells, white blood cells and calcium oxalate crystals, the classification accuracy reached 97% or more.
Simple-random-sampling-based multiclass text classification algorithm.
Liu, Wuying; Wang, Lin; Yi, Mianzhu
2014-01-01
Multiclass text classification (MTC) is a challenging issue and the corresponding MTC algorithms can be used in many applications. The space-time overhead of such algorithms is a pressing concern in the era of big data. Through an investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory to store labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time requirements.
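The abstract does not spell out the algorithm, but the idea of a retrieval-style classifier run over a simple random sample of labeled documents can be sketched roughly as follows. The token-overlap scoring and the sampling rate below are assumptions for illustration, not the paper's actual method.

```python
import random
from collections import Counter

def srs_classify(labeled_docs, query_tokens, sample_rate=0.5, seed=1):
    """Label a query by token overlap with a simple random sample of
    labeled (tokens, label) documents; retrieval-style, illustrative."""
    rng = random.Random(seed)
    k = max(1, int(len(labeled_docs) * sample_rate))
    sample = rng.sample(labeled_docs, k)   # simple random sampling step
    scores = Counter()
    for tokens, label in sample:
        scores[label] += len(set(tokens) & set(query_tokens))
    return scores.most_common(1)[0][0]

docs = [(["stock", "market", "trade"], "finance"),
        (["match", "goal", "team"], "sports"),
        (["bank", "loan", "market"], "finance"),
        (["coach", "team", "win"], "sports")]
label = srs_classify(docs, ["market", "trade"], sample_rate=1.0)
```

The space-time saving comes from scoring only the sampled subset rather than the full collection; the power-law token distribution is what makes a small sample informative.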
Quantum-inspired algorithm for estimating the permanent of positive semidefinite matrices
NASA Astrophysics Data System (ADS)
Chakhmakhchyan, L.; Cerf, N. J.; Garcia-Patron, R.
2017-08-01
We construct a quantum-inspired classical algorithm for computing the permanent of Hermitian positive semidefinite matrices by exploiting a connection between these mathematical structures and the boson sampling model. Specifically, the permanent of a Hermitian positive semidefinite matrix can be expressed in terms of the expected value of a random variable, which stands for a specific photon-counting probability when measuring a linear-optically evolved random multimode coherent state. Our algorithm then approximates the matrix permanent from the corresponding sample mean and is shown to run in polynomial time for various sets of Hermitian positive semidefinite matrices, achieving a precision that improves over known techniques. This work illustrates how quantum optics may benefit algorithm development.
AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au
In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.
Extending cluster Lot Quality Assurance Sampling designs for surveillance programs
Hund, Lauren; Pagano, Marcello
2014-01-01
Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance based on the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible non-parametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. PMID:24633656
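Inflating a sample size for clustering is conventionally done with the design effect for two-stage cluster samples, DEFF = 1 + (m - 1) * ICC, where m is the average cluster take and ICC the intracluster correlation. The sketch below uses that standard formula as a stand-in; the paper's own procedure is non-parametric and more refined than this.

```python
import math

def design_effect(cluster_size, icc):
    """Design effect for two-stage cluster sampling:
    DEFF = 1 + (m - 1) * ICC, with m the average cluster take."""
    return 1.0 + (cluster_size - 1) * icc

def inflated_sample_size(n_srs, cluster_size, icc):
    """Inflate a simple-random-sampling size for clustering and round
    up to a whole number of clusters."""
    n = n_srs * design_effect(cluster_size, icc)
    clusters = math.ceil(n / cluster_size)
    return clusters * cluster_size, clusters

# A hypothetical survey: 120 children needed under SRS, 10 per village.
n, c = inflated_sample_size(n_srs=120, cluster_size=10, icc=0.05)
```

Here DEFF = 1.45, so the 120-child SRS requirement becomes 18 villages of 10 children (180 in total); larger takes or stronger within-village correlation inflate it further.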
Dufresne, Jaimie; Florentinus-Mefailoski, Angelique; Ajambo, Juliet; Ferwa, Ammara; Bowden, Peter; Marshall, John
2017-01-01
Normal human EDTA plasma samples were collected on ice, processed ice cold, and stored in a freezer at -80 °C prior to experiments. Plasma test samples from the -80 °C freezer were thawed on ice or intentionally warmed to room temperature. Protein content was measured by CBBR binding and the release of alcohol-soluble amines by the Cd ninhydrin assay. Plasma peptides released over time were collected over C18 for random and independent sampling by liquid chromatography micro electrospray ionization and tandem mass spectrometry (LC-ESI-MS/MS) and correlated with X!TANDEM. Fully tryptic peptide correlations by X!TANDEM returned a similar set of proteins to "no enzyme" correlations, but were more computationally efficient. Plasma samples maintained on ice, or on ice with a cocktail of protease inhibitors, showed lower background amounts of plasma peptides than samples incubated at room temperature. Regression analysis indicated that warming plasma to room temperature, versus keeping it ice cold, resulted in a roughly twofold increase in the frequency of peptide identification over hours to days of incubation. The type I error rate of protein identification from the X!TANDEM algorithm was estimated to be low compared to a null model of computer-generated random MS/MS spectra. The peptides of human plasma were thus identified and quantified with low error rates by random and independent sampling, revealing thousands of endogenous tryptic peptides from hundreds of human plasma proteins.
Westfall, Jacob; Kenny, David A; Judd, Charles M
2014-10-01
Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
Using regression methods to estimate stream phosphorus loads at the Illinois River, Arkansas
Haggard, B.E.; Soerens, T.S.; Green, W.R.; Richards, R.P.
2003-01-01
The development of total maximum daily loads (TMDLs) requires evaluating existing constituent loads in streams. Accurate estimates of constituent loads are needed to calibrate watershed and reservoir models for TMDL development. The best approach to estimate constituent loads is high frequency sampling, particularly during storm events, and mass integration of constituents passing a point in a stream. Most often, resources are limited and discrete water quality samples are collected on fixed intervals and sometimes supplemented with directed sampling during storm events. When resources are limited, mass integration is not an accurate means to determine constituent loads and other load estimation techniques such as regression models are used. The objective of this work was to determine a minimum number of water-quality samples needed to provide constituent concentration data adequate to estimate constituent loads at a large stream. Twenty sets of water quality samples with and without supplemental storm samples were randomly selected at various fixed intervals from a database at the Illinois River, northwest Arkansas. The random sets were used to estimate total phosphorus (TP) loads using regression models. The regression-based annual TP loads were compared to the integrated annual TP load estimated using all the data. At a minimum, monthly sampling plus supplemental storm samples (six samples per year) was needed to produce a root mean square error of less than 15%. Water quality samples should be collected at least semi-monthly (every 15 days) in studies less than two years if seasonal time factors are to be used in the regression models. Annual TP loads estimated from independently collected discrete water quality samples further demonstrated the utility of using regression models to estimate annual TP loads in this stream system.
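A common regression load model of the kind described is a log-log rating curve, L = a * Q^b, fit to the discrete samples; the sketch below assumes that form on noiseless synthetic data (the study's actual models also include seasonal time factors, which are not shown).

```python
import math
import random

def fit_rating_curve(flows, loads):
    """Fit log(L) = log(a) + b*log(Q) by ordinary least squares and
    return (a, b) for the power-law load model L = a * Q**b."""
    lx = [math.log(q) for q in flows]
    ly = [math.log(l) for l in loads]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / \
        sum((x - mx) ** 2 for x in lx)
    return math.exp(my - b * mx), b

# Synthetic samples drawn from a known power law L = 0.5 * Q**1.8.
rng = random.Random(3)
flows = [rng.uniform(1, 100) for _ in range(24)]
loads = [0.5 * q ** 1.8 for q in flows]
a, b = fit_rating_curve(flows, loads)
```

Annual load then follows by evaluating the fitted curve over the continuous daily discharge record, which is why a handful of well-placed samples (including storm flows) can substitute for mass integration.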
Lofwall, M.R.; Nuzzo, P.A.; Campbell, C.; Walsh, S.L.
2014-01-01
Aripiprazole is a partial agonist at dopamine D2 and serotonin 5-HT1a receptors and an antagonist at 5-HT2 receptors. Because both dopamine and serotonin systems are involved in the action of cocaine, this study aimed to determine whether aripiprazole could diminish the reinforcing efficacy of cocaine. Secondary aims evaluated aripiprazole effects on ad lib cigarette smoking and a novel 40-hour cigarette smoking abstinence procedure. Healthy adults with regular cocaine and cigarette use completed this ~30-day inpatient, double-blind, randomized, placebo-controlled, mixed-design study. An oral placebo lead-in period was followed by randomization to oral aripiprazole (0, 2 or 10 mg daily; n=7 completed/group). Three sets of test sessions, each consisting of three cocaine sample-choice (i.e., self-administration) sessions and one dose-response session, were conducted (during the lead-in period, and after randomization both before and after achieving aripiprazole steady state). Sample-choice sessions tested three cocaine doses (0, 20, and 40 mg/70 kg, i.v.) with one dose (random order) administered in each sample session; subjective, observer-rated and physiologic outcomes were collected repeatedly before and after cocaine administration. Later that day, participants chose between receiving the sample dose from that morning or descending amounts of money over seven trials ($19, 16, 13, 10, 7, 4, 1). Dose-response sessions administered the three cocaine doses in ascending order for pharmacodynamic and potential pharmacokinetic assessment. A set of two cigarette smoking topography sessions was conducted during the placebo lead-in and after randomization, one with and one without 40 hours of cigarette smoking abstinence. The number of ad lib cigarettes smoked on non-session days was also collected. Cocaine produced prototypic pharmacodynamic effects and self-administration; neither was significantly altered by aripiprazole.
The 40-hour smoking abstinence procedure reliably produced nicotine withdrawal and craving and increased smoking modestly. Aripiprazole did not significantly alter smoking outcomes. These data do not support the further investigation of aripiprazole for the treatment of cocaine or tobacco use disorders. PMID:24467369
Sample entropy analysis of cervical neoplasia gene-expression signatures
Botting, Shaleen K; Trzeciakowski, Jerome P; Benoit, Michelle F; Salama, Salama A; Diaz-Arrastia, Concepcion R
2009-01-01
Background We introduce approximate entropy as a mathematical method of analysis for microarray data. Approximate entropy is applied here as a method to classify the complex gene expression patterns resulting from a clinical sample set. Since entropy is a measure of disorder in a system, we believe that by choosing genes which display minimum entropy in normal controls and maximum entropy in the cancerous sample set, we will be able to distinguish those genes which display the greatest variability in the cancerous set. Here we describe a method of utilizing Approximate Sample Entropy (ApSE) analysis to identify genes of interest with the highest probability of producing an accurate, predictive classification model from our data set. Results In the development of a diagnostic gene-expression profile for cervical intraepithelial neoplasia (CIN) and squamous cell carcinoma of the cervix, we identified 208 genes which are unchanging in all normal tissue samples, yet exhibit a random pattern indicative of the genetic instability and heterogeneity of malignant cells. This may be measured in terms of the ApSE when compared to normal tissue. We have validated 10 of these genes on 10 normal and 20 cancer and CIN3 samples. We report that the predictive value of the sample entropy calculation for these 10 genes of interest is promising (75% sensitivity, 80% specificity for prediction of cervical cancer over CIN3). Conclusion The success of the Approximate Sample Entropy approach in discerning alterations in complexity from a biological system with such a relatively small sample set, and in extracting biologically relevant genes of interest, holds great promise. PMID:19232110
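The standard sample-entropy statistic underlying measures like ApSE is SampEn(m, r) = -ln(A/B), where B counts template matches of length m and A of length m + 1. A minimal sketch follows; the paper's exact ApSE variant may differ in its matching convention.

```python
import math

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy SampEn(m, r): -ln(A/B), where B counts pairs of
    m-length templates within tolerance r (Chebyshev distance) and A
    the same for length m + 1; self-matches are excluded."""
    n = len(x)
    def matches(length):
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k])
                       for k in range(length)) <= r:
                    count += 1
        return count
    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")   # undefined: no matching templates
    return -math.log(a / b)

regular = sample_entropy([1.0] * 30)   # perfectly regular series
```

A constant (or perfectly periodic) expression profile scores 0, while erratic profiles score higher, which is the property used to separate stable normal-tissue genes from unstable malignant ones.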
Geostatistical Sampling Methods for Efficient Uncertainty Analysis in Flow and Transport Problems
NASA Astrophysics Data System (ADS)
Liodakis, Stylianos; Kyriakidis, Phaedon; Gaganis, Petros
2015-04-01
In hydrogeological applications involving flow and transport in heterogeneous porous media, the spatial distribution of hydraulic conductivity is often parameterized in terms of a lognormal random field based on a histogram and variogram model inferred from data and/or synthesized from relevant knowledge. Realizations of simulated conductivity fields are then generated using geostatistical simulation involving simple random (SR) sampling and are subsequently used as inputs to physically-based simulators of flow and transport in a Monte Carlo framework for evaluating the uncertainty in the spatial distribution of solute concentration due to the uncertainty in the spatial distribution of hydraulic conductivity [1]. Realistic uncertainty analysis, however, calls for a large number of simulated concentration fields; hence, it can become expensive in terms of both time and computer resources. A more efficient alternative to SR sampling is Latin hypercube (LH) sampling, a special case of stratified random sampling, which yields a more representative distribution of simulated attribute values with fewer realizations [2]. Here, the term representative implies realizations spanning efficiently the range of possible conductivity values corresponding to the lognormal random field. In this work we investigate the efficiency of alternative methods to classical LH sampling within the context of simulation of flow and transport in a heterogeneous porous medium. More precisely, we consider the stratified likelihood (SL) sampling method of [3], in which attribute realizations are generated using the polar simulation method by exploring the geometrical properties of the multivariate Gaussian distribution function.
In addition, we propose a more efficient version of the above method, here termed minimum energy (ME) sampling, whereby a set of N representative conductivity realizations at M locations is constructed by: (i) generating a representative set of N points distributed on the surface of a M-dimensional, unit radius hyper-sphere, (ii) relocating the N points on a representative set of N hyper-spheres of different radii, and (iii) transforming the coordinates of those points to lie on N different hyper-ellipsoids spanning the multivariate Gaussian distribution. The above method is applied in a dimensionality reduction context by defining flow-controlling points over which representative sampling of hydraulic conductivity is performed, thus also accounting for the sensitivity of the flow and transport model to the input hydraulic conductivity field. The performance of the various stratified sampling methods, LH, SL, and ME, is compared to that of SR sampling in terms of reproduction of ensemble statistics of hydraulic conductivity and solute concentration for different sample sizes N (numbers of realizations). The results indicate that ME sampling constitutes an equally if not more efficient simulation method than LH and SL sampling, as it can reproduce to a similar extent statistics of the conductivity and concentration fields, yet with smaller sampling variability than SR sampling. References [1] Gutjahr A.L. and Bras R.L. Spatial variability in subsurface flow and transport: A review. Reliability Engineering & System Safety, 42, 293-316, (1993). [2] Helton J.C. and Davis F.J. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering & System Safety, 81, 23-69, (2003). [3] Switzer P. Multiple simulation of spatial fields. 
In: Heuvelink G, Lemmens M (eds) Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Coronet Books Inc., pp 629-635 (2000).
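The basic Latin hypercube construction cited as [2] can be sketched in a few lines: along every axis, exactly one point falls in each of n equal-width strata. Mapping the unit-cube sample to a lognormal conductivity field would then go through the inverse CDF, which is not shown here.

```python
import random

def latin_hypercube(n, dims, seed=7):
    """n points in [0,1)^dims with exactly one point per equal-width
    stratum along every axis (a Latin hypercube sample)."""
    rng = random.Random(seed)
    cols = []
    for _ in range(dims):
        perm = list(range(n))
        rng.shuffle(perm)                   # random stratum order per axis
        cols.append([(s + rng.random()) / n for s in perm])
    return list(zip(*cols))                 # one tuple per sample point

pts = latin_hypercube(10, 2)
```

Because every marginal stratum is hit exactly once, the sample covers the range of each input with far fewer realizations than simple random sampling, which is the efficiency gain the abstract compares against.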
High throughput image cytometry for detection of suspicious lesions in the oral cavity
NASA Astrophysics Data System (ADS)
MacAulay, Calum; Poh, Catherine F.; Guillaud, Martial; Michele Williams, Pamela; Laronde, Denise M.; Zhang, Lewei; Rosin, Miriam P.
2012-08-01
The successful management of oral cancer depends upon early detection, which relies heavily on the clinician's ability to discriminate sometimes subtle alterations of the infrequent premalignant lesions from the more common reactive and inflammatory conditions in the oral mucosa. Even among experienced oral specialists this can be challenging, particularly when using new wide field-of-view direct fluorescence visualization devices clinically introduced for the recognition of at-risk tissue. The objective of this study is to examine whether quantitative cytometric analysis of oral brushing samples could facilitate the assessment of the risk of visually ambiguous lesions. A total of 369 cytological samples were collected and analyzed: (1) 148 samples from pathology-proven sites of SCC, carcinoma in situ or severe dysplasia; (2) 77 samples from sites with inflammation, infection, or trauma, and (3) 144 samples from normal sites. These were randomly separated into training and test sets. The best algorithm correctly recognized 92.5% of the normal samples, 89.4% of the abnormal samples, 86.2% of the confounders in the training set as well as 100% of the normal samples, and 94.4% of the abnormal samples in the test set. These data suggest that quantitative cytology could reduce by more than 85% the number of visually suspect lesions requiring further assessment by biopsy.
Kent, Peter; Boyle, Eleanor; Keating, Jennifer L; Albert, Hanne B; Hartvigsen, Jan
2017-02-01
To quantify variability in the results of statistical analyses based on contingency tables and discuss the implications for the choice of sample size for studies that derive clinical prediction rules. An analysis of three pre-existing sets of large cohort data (n = 4,062-8,674) was performed. In each data set, repeated random sampling of various sample sizes, from n = 100 up to n = 2,000, was performed 100 times at each sample size and the variability in estimates of sensitivity, specificity, positive and negative likelihood ratios, posttest probabilities, odds ratios, and risk/prevalence ratios for each sample size was calculated. There were very wide, and statistically significant, differences in estimates derived from contingency tables from the same data set when calculated in sample sizes below 400 people, and typically, this variability stabilized in samples of 400-600 people. Although estimates of prevalence also varied significantly in samples below 600 people, that relationship only explains a small component of the variability in these statistical parameters. To reduce sample-specific variability, contingency tables should consist of 400 participants or more when used to derive clinical prediction rules or test their performance. Copyright © 2016 Elsevier Inc. All rights reserved.
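The contingency-table statistics whose sampling variability the study quantifies are computed from a 2x2 table as follows; the counts below are hypothetical and the definitions standard.

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from a 2x2
    contingency table of test result vs true condition."""
    sens = tp / (tp + fn)          # P(test+ | condition+)
    spec = tn / (tn + fp)          # P(test- | condition-)
    lr_pos = sens / (1 - spec)     # positive likelihood ratio
    lr_neg = (1 - sens) / spec     # negative likelihood ratio
    return sens, spec, lr_pos, lr_neg

# Hypothetical table: 100 with and 100 without the condition.
sens, spec, lr_pos, lr_neg = diagnostic_stats(tp=80, fp=10, fn=20, tn=90)
```

Because every statistic is a ratio of cell counts, small samples make each cell, and hence each ratio, noisy; that is the sample-specific variability the authors found stabilizing only above roughly 400 participants.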
Random On-Board Pixel Sampling (ROPS) X-Ray Camera
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Zhehui; Iaroshenko, O.; Li, S.
Recent advances in compressed sensing theory and algorithms offer new possibilities for high-speed X-ray camera design. In many CMOS cameras, each pixel has an independent on-board circuit that includes an amplifier, noise rejection, signal shaper, an analog-to-digital converter (ADC), and optional in-pixel storage. When X-ray images are sparse, i.e., when one of the following cases is true: (a) the number of pixels with true X-ray hits is much smaller than the total number of pixels; (b) the X-ray information is redundant; or (c) some prior knowledge about the X-ray images exists, sparse sampling may be allowed. Here we first illustrate the feasibility of random on-board pixel sampling (ROPS) using an existing set of X-ray images, followed by a discussion about signal to noise as a function of pixel size. Next, we describe a possible circuit architecture to achieve random pixel access and in-pixel storage. The combination of a multilayer architecture, sparse on-chip sampling, and computational image techniques is expected to facilitate the development and applications of high-speed X-ray camera technology.
A new mosaic method for three-dimensional surface
NASA Astrophysics Data System (ADS)
Yuan, Yun; Zhu, Zhaokun; Ding, Yongjun
2011-08-01
Three-dimensional (3-D) data mosaicking is an indispensable step in surface measurement and digital terrain map generation. To address the problem of mosaicking local unorganized point clouds given only coarse registration and many mismatched points, a new mosaic method for 3-D surfaces based on RANSAC is proposed. Each iteration of the method proceeds sequentially through random sampling with an additional shape constraint, data normalization of the point clouds, absolute orientation, data denormalization, inlier counting, etc. After N random sample trials, the largest consensus set is selected, and the model is finally re-estimated using all the points in the selected subset. The minimal subset is composed of three non-collinear points forming a triangle; the shape of the triangle is considered during random sample selection to keep the selection well conditioned. A new coordinate system transformation algorithm presented in this paper is used to avoid singularity. The whole rotation transformation between the two coordinate systems can be solved by two successive rotations expressed by Euler angle vectors, each with an explicit physical meaning. Both simulation and real data are used to prove the correctness and validity of this mosaic method. The method has better noise immunity due to its robust estimation property, and high accuracy because the shape constraint is added to the random sampling and data normalization is added to the absolute orientation. It is applicable to high-precision measurement of three-dimensional surfaces and to 3-D terrain mosaicking.
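The RANSAC loop described (random minimal sample, consensus-set counting, final re-estimation on the largest consensus set) can be illustrated with a 2-D line-fitting analogue; the 3-D method instead samples non-collinear point triples and solves absolute orientation, but the control flow is identical.

```python
import random

def ransac_line(points, iters=200, tol=0.05, seed=0):
    """RANSAC for y = a + b*x: fit a model to a random minimal sample
    (2 points), keep the largest consensus set, then re-estimate on
    all of its inliers by least squares."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue                      # degenerate minimal sample
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (a + b * x)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers        # new largest consensus set
    # Final least-squares refit on the consensus set.
    n = len(best_inliers)
    mx = sum(x for x, _ in best_inliers) / n
    my = sum(y for _, y in best_inliers) / n
    b = sum((x - mx) * (y - my) for x, y in best_inliers) / \
        sum((x - mx) ** 2 for x, _ in best_inliers)
    return my - b * mx, b, len(best_inliers)

# 20 exact inliers on y = 1 + 2x plus 5 gross outliers.
pts = [(x * 0.5, 1 + 2 * (x * 0.5)) for x in range(20)]
pts += [(1.0, 50.0), (2.0, -30.0), (5.0, 80.0), (7.0, -10.0), (9.0, 60.0)]
a, b, n_in = ransac_line(pts)
```

The gross outliers never enter the final fit because any model through them gathers a small consensus set, which is the robustness property the abstract relies on for mismatched points.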
Anti-Bullying Practices in American Schools: Perspectives of School Psychologists
ERIC Educational Resources Information Center
Sherer, Yiping C.; Nickerson, Amanda B.
2010-01-01
A random sample of 213 school psychologists working in a school setting completed a survey on their schools' current anti-bullying practices. Talking with bullies following bullying incidents, disciplinary consequences for bullies, and increasing adult supervision were the three most frequently used strategies. Peer juries/court, an anti-bullying…
Perceptions of the Performance of Community College Faculty: Dissertation Research Findings.
ERIC Educational Resources Information Center
Vickers, Mozelle Carver
A sample of 30 instructors nominated as effective teachers to the Piper Foundation and 31 randomly selected control instructors from the same 14 Texas colleges were evaluated by students, former students, administrators, peers, and the instructors themselves. The research instrument incorporated ten sets of characteristics expressed as polar…
We assessed the extent and characteristics of geographically isolated wetlands (i.e., wetlands completely surrounded by upland) in a series of drainage basins in the urban northeast U.S. We employed a random sampling design that stratifies study sites according to their degree o...
Adolescent School Experiences and Dropout, Adolescent Pregnancy, and Young Adult Deviant Behavior.
ERIC Educational Resources Information Center
Kasen, Stephanie; Cohen, Patricia; Brook, Judith S.
1998-01-01
This study examined predictability of inappropriate behavior in a random sample of 452 adolescents. Behaviors examined included dropping out, teen pregnancy, criminal activities and conviction, antisocial personality disorder, and alcohol abuse. Found that academic achievement and aspirations, and learning-focused school settings related to…
The Relationship Between Self Concept and Marital Adjustment.
ERIC Educational Resources Information Center
Hall, William M., Jr.; Valine, Warren J.
The purpose of this study was to investigate the relationship between self concept and marital adjustment for married students and their spouses in a commuter college setting. The sample consisted of a random selection of 50 "both spouses commuting" couples, 50 "husband only commuting" couples, and 50 "wife only…
Scott, J.C.
1990-01-01
Computer software was written to randomly select sites for a ground-water-quality sampling network. The software uses digital cartographic techniques and subroutines from a proprietary geographic information system. The report presents the approaches, computer software, and sample applications. It is often desirable to collect ground-water-quality samples from various areas in a study region that have different values of a spatial characteristic, such as land-use or hydrogeologic setting. A stratified network can be used for testing hypotheses about relations between spatial characteristics and water quality, or for calculating statistical descriptions of water-quality data that account for variations that correspond to the spatial characteristic. In the software described, a study region is subdivided into areal subsets that have a common spatial characteristic to stratify the population into several categories from which sampling sites are selected. Different numbers of sites may be selected from each category of areal subsets. A population of potential sampling sites may be defined by either specifying a fixed population of existing sites, or by preparing an equally spaced population of potential sites. In either case, each site is identified with a single category, depending on the value of the spatial characteristic of the areal subset in which the site is located. Sites are selected from one category at a time. One of two approaches may be used to select sites. Sites may be selected randomly, or the areal subsets in the category can be grouped into cells and sites selected randomly from each cell.
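The category-by-category selection the report describes can be sketched as below. The names are hypothetical; the actual software worked with digital cartographic data inside a GIS.

```python
import random
from collections import defaultdict

def stratified_site_selection(sites, n_per_category, rng=random):
    """sites: iterable of (site_id, category); pick n sites at random per category."""
    by_category = defaultdict(list)
    for site_id, category in sites:
        by_category[category].append(site_id)
    selection = {}
    for category, members in by_category.items():
        n = min(n_per_category.get(category, 0), len(members))
        selection[category] = rng.sample(members, n)
    return selection

# Example: wells stratified by land-use setting
wells = [(f"well-{i}", "urban" if i % 3 == 0 else "agricultural") for i in range(30)]
picked = stratified_site_selection(wells, {"urban": 2, "agricultural": 4})
```

Different numbers of sites per category, as in the report, simply correspond to different values in `n_per_category`.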
Effect of Setting Time on the Shear Bond Strength Between Biodentine and Composite
2015-06-01
Methods: Sample cylinders (n=134) and Biodentine capsules were randomly assigned to groups based on the setting time allowed for Biodentine (Group 1 = 15 minutes, Group 2 = 1 hour, Group 3 = 24 hours, Group 4 = 2 weeks). Biodentine was prepared and placed in the wells of the acrylic cylinders and...widely used as a temporary intracanal medicament during root canal therapy, as a liner, and for direct and indirect pulp capping procedures. Although
Empowering nurses for work engagement and health in hospital settings.
Laschinger, Heather K Spence; Finegan, Joan
2005-10-01
Employee empowerment has become an increasingly important factor in determining employee health and wellbeing in restructured healthcare settings. The authors tested a theoretical model which specified the relationships among structural empowerment, 6 areas of worklife that promote employee engagement, and staff nurses' physical and mental health. A predictive, non-experimental design was used to test the model in a random sample of staff nurses. The authors discuss their findings and the implication for nurse administrators.
Sample size determination for GEE analyses of stepped wedge cluster randomized trials.
Li, Fan; Turner, Elizabeth L; Preisser, John S
2018-06-19
In stepped wedge cluster randomized trials, intact clusters of individuals switch from control to intervention from a randomly assigned period onwards. Such trials are becoming increasingly popular in health services research. When a closed cohort is recruited from each cluster for longitudinal follow-up, proper sample size calculation should account for three distinct types of intraclass correlations: the within-period, the inter-period, and the within-individual correlations. Setting the latter two correlation parameters to be equal accommodates cross-sectional designs. We propose sample size procedures for continuous and binary responses within the framework of generalized estimating equations that employ a block exchangeable within-cluster correlation structure defined from the distinct correlation types. For continuous responses, we show that the intraclass correlations affect power only through two eigenvalues of the correlation matrix. We demonstrate that analytical power agrees well with simulated power for as few as eight clusters, when data are analyzed using bias-corrected estimating equations for the correlation parameters concurrently with a bias-corrected sandwich variance estimator. © 2018, The International Biometric Society.
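The block-exchangeable within-cluster correlation structure can be written down directly. In the sketch below, `a0`, `a1`, and `a2` denote the within-period, inter-period, and within-individual correlations; the function name and plain-list representation are assumptions for illustration.

```python
def block_exchangeable_corr(T, m, a0, a1, a2):
    """Correlation matrix for a closed cohort of m subjects over T periods.

    Entry for (period t, subject i) vs (period s, subject j):
      1  if same subject and same period,
      a0 if same period, different subjects (within-period),
      a2 if same subject, different periods (within-individual),
      a1 otherwise (inter-period).
    Row/column index is t*m + i.
    """
    n = T * m
    R = [[0.0] * n for _ in range(n)]
    for t in range(T):
        for i in range(m):
            for s in range(T):
                for j in range(m):
                    if t == s and i == j:
                        v = 1.0
                    elif t == s:
                        v = a0
                    elif i == j:
                        v = a2
                    else:
                        v = a1
                    R[t * m + i][s * m + j] = v
    return R
```

Setting `a1 == a2` recovers the cross-sectional special case mentioned in the abstract; the power formulas in the paper depend on this matrix only through two of its eigenvalues.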
Training set optimization under population structure in genomic selection.
Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E
2015-01-01
Population structure must be evaluated before optimization of the training set population, and maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding because it is critical to the accuracy of prediction models. In this study, five TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of predictor error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy under different levels of population structure. In the presence of population structure, a sampling method that captures the most phenotypic variation in the TRS is desirable. The wheat dataset showed mild population structure, and the CDmean and stratified CDmean methods gave the highest accuracies for all traits except test weight and heading date. The rice dataset had strong population structure, and the stratified sampling approach gave the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes within the TRS while maximizing the relationship between the TRS and the test set, making it suitable as an optimization criterion for long-term selection. Our results indicate that the best criterion for optimizing the TRS appears to depend on the interaction of trait architecture and population structure.
Gill, C O; McGinnis, J C; Bryant, J
1998-07-21
The microbiological effects on the product of the series of operations for skinning the hindquarters of beef carcasses at three packing plants were assessed. Samples were obtained at each plant from randomly selected carcasses, by swabbing specified sites related to opening cuts, rump skinning or flank skinning operations, randomly selected sites along the lines of the opening cuts, or randomly selected sites on the skinned hindquarters of carcasses. A set of 25 samples of each type was collected at each plant, with the collection of a single sample from each selected carcass. Aerobic counts, coliforms and Escherichia coli were enumerated in each sample, and a log mean value was estimated for each set of 25 counts on the assumption of a log normal distribution of the counts. The data indicated that the hindquarters skinning operations at plant A were hygienically inferior to those at the other two plants, with mean numbers of coliforms and E. coli being about two orders of magnitude greater, and aerobic counts being an order of magnitude greater on the skinned hindquarters of carcasses from plant A than on those from plants B or C. The data further indicated that the operation for cutting open the skin at plant C was hygienically superior to the equivalent operation at plant B, but that the operations for skinning the rump and flank at plant B were hygienically superior to the equivalent operations at plant C. The findings suggest that objective assessment of the microbiological effects on carcasses of beef carcass dressing processes will be required to ensure that Hazard Analysis: Critical Control Point and Quality Management Systems are operated to control the microbiological condition of carcasses.
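The log-mean estimate used for such sets of counts can be sketched as below. This is a standard log-normal construction, not necessarily the authors' exact formula.

```python
import math
import statistics

def log_mean(counts):
    """log10 of the arithmetic mean of counts, assuming log10(counts) ~ Normal.

    If log10 X ~ N(mu, s^2), then E[X] = 10**(mu + s^2 * ln(10) / 2),
    so log10 E[X] = mu + s^2 * ln(10) / 2.
    """
    logs = [math.log10(c) for c in counts]
    mu = statistics.mean(logs)
    s2 = statistics.variance(logs)  # sample variance of the log10 counts
    return mu + s2 * math.log(10) / 2.0
```

Because bacterial counts are right-skewed, this estimate sits above the mean of the log counts whenever the counts vary, which is why the log-normal assumption matters for comparing plants.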
2012-01-01
Background: With the current focus on personalized medicine, patient/subject-level inference is often of key interest in translational research. As a result, random effects models (REM) are becoming popular for patient-level inference. However, for very large data sets characterized by large sample size, it can be difficult to fit REM using commonly available statistical software such as SAS, which requires inordinate amounts of computer time and memory, preventing model convergence. For example, in a retrospective cohort study of over 800,000 Veterans with type 2 diabetes with longitudinal data over 5 years, fitting REM via generalized linear mixed modeling using currently available standard procedures in SAS (e.g. PROC GLIMMIX) was very difficult, and the same problems exist in Stata's gllamm and R's lme packages. This study therefore proposes a meta-regression approach, assesses its performance, and compares it with methods based on sampling of the full data. Data: We used both simulated and real data from a national cohort of Veterans with type 2 diabetes (n=890,394), created by linking multiple patient and administrative files into a cohort with longitudinal data collected over 5 years. Methods and results: The outcome of interest was mean annual HbA1c measured over a 5-year period. Using this outcome, we compared parameter estimates from the proposed random effects meta-regression (REMR) with estimates based on simple random sampling and VISN (Veterans Integrated Service Networks) based stratified sampling of the full data. Our results indicate that REMR provides parameter estimates that are less likely to be biased, with tighter confidence intervals, when the VISN-level estimates are homogeneous. Conclusion: When the interest is in fitting REM to repeated-measures data with very large sample size, REMR is a good alternative; it leads to reasonable inference for both Gaussian and non-Gaussian responses when parameter estimates are homogeneous across VISNs. PMID:23095325
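The core of a meta-regression style combination, fitting the model within strata (e.g., VISNs) and then pooling the stratum-level estimates, can be sketched with the classical DerSimonian-Laird random-effects estimator. This is a generic sketch of the pooling step, not the paper's exact REMR procedure.

```python
def dersimonian_laird(estimates, variances):
    """Pool stratum-level estimates under a random-effects model."""
    k = len(estimates)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-stratum variance tau^2
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight with tau^2 added to each stratum variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, estimates)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2
```

When the stratum estimates are homogeneous, tau^2 collapses to zero and the pooled estimate reduces to the fixed-effect (inverse-variance) combination, which mirrors the abstract's caveat about homogeneity across VISNs.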
Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu
2016-01-14
Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. The framework was used to recognize Chinese sign language (CSL) subwords from recordings made by a pair of portable devices worn on both arms, each consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of CSL subwords: 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of the single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing robust, wearable EMG-ACC-based SLR systems.
Rekully, Cameron M; Faulkner, Stefan T; Lachenmyer, Eric M; Cunningham, Brady R; Shaw, Timothy J; Richardson, Tammi L; Myrick, Michael L
2018-03-01
An all-pairs method is used to analyze phytoplankton fluorescence excitation spectra. An initial set of nine phytoplankton species is analyzed in pairwise fashion to select two optical filter sets, and the two filter sets are then used to explore variations among a total of 31 species in a single-cell fluorescence imaging photometer. Results are presented in terms of pair analyses; we report that 411 of the 465 possible pairings of the larger group of 31 species can be distinguished using the initial nine-species-based selection of optical filters. A bootstrap analysis based on the larger data set shows that the distribution of possible pair-separation results based on a randomly selected nine-species initial calibration set is strongly peaked in the 410-415 pair-separation range, consistent with our experimental result. Further, filter selection using all 31 species also yields 411 pair separations. The set of phytoplankton fluorescence excitation spectra is intuitively high in rank, owing to the number and variety of pigments that contribute to the spectrum; however, the results in this report are consistent with an effective rank, as determined by a variety of heuristic and statistical methods, in the range of 2-3. These results are reviewed in light of how consistent the filter selections are from model to model for the data presented here. We discuss the common observation that rank is generally found to be relatively low even in many seemingly complex circumstances, so that it may be productive to assume low rank from the beginning. If a low-rank hypothesis is valid, then relatively few samples are needed to explore an experimental space; under very restricted circumstances, for uniformly distributed samples, the minimum number for an initial analysis might be as low as 8-11 random samples for 1-3 factors.
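The all-pairs bookkeeping behind the 411-of-465 figure is straightforward to sketch. Here `can_separate` stands in for whatever pairwise discrimination test the photometer data support; it is a placeholder, not the paper's statistic.

```python
from itertools import combinations

def separable_pairs(species, can_separate):
    """Return the species pairs a given filter set can distinguish."""
    return [(a, b) for a, b in combinations(species, 2) if can_separate(a, b)]

species = [f"sp{i:02d}" for i in range(31)]
all_pairs = list(combinations(species, 2))  # 31 choose 2 = 465 pairs
```

Reporting results as a fraction of `all_pairs` (e.g., 411/465) is what makes calibration sets of different sizes directly comparable.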
Shannon, Casey P; Chen, Virginia; Takhar, Mandeep; Hollander, Zsuzsanna; Balshaw, Robert; McManus, Bruce M; Tebbutt, Scott J; Sin, Don D; Ng, Raymond T
2016-11-14
Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules, from whole-transcriptome gene expression data. The identification of such modules has become a popular approach to systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance behavior is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how sensitive the outputs of these computational methods are to the input sample set, i.e., their stability. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically relevant cohort, using two different gene network module discovery algorithms. The stability of modules increased with sample size, and stable modules were more likely to be replicated in larger sets of samples. Random modules derived from permuted gene expression data were consistently unstable, as assessed by SABRE, and provide a useful baseline value for our proposed stability criterion. Gene module sets identified by different algorithms varied with respect to their stability, as assessed by SABRE. Finally, stable modules were more readily annotated in various curated gene set databases. The SABRE procedure and the proposed stability criterion may provide guidance when designing systems biology studies in complex human diseases and tissues.
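The idea of scoring module stability across bootstrap resamples can be sketched with a best-match Jaccard index. This is a hypothetical simplification, not the SABRE procedure itself; the gene names below are illustrative.

```python
import random

def jaccard(a, b):
    """Jaccard similarity of two gene sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def module_stability(module, resampled_module_sets):
    """Mean best-match Jaccard of `module` against each resample's modules."""
    scores = []
    for modules in resampled_module_sets:
        scores.append(max((jaccard(module, m) for m in modules), default=0.0))
    return sum(scores) / len(scores)

def bootstrap_indices(n, rng=random):
    """One bootstrap resample: n draws with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]
```

A stability criterion can then be a threshold on this score, with modules from permuted data providing the empirical baseline.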
Disk Density Tuning of a Maximal Random Packing
Ebeida, Mohamed S.; Rushdi, Ahmad A.; Awad, Muhammad A.; Mahmoud, Ahmed H.; Yan, Dong-Ming; English, Shawn A.; Owens, John D.; Bajaj, Chandrajit L.; Mitchell, Scott A.
2016-01-01
We introduce an algorithmic framework for tuning the spatial density of disks in a maximal random packing, without changing the sizing function or radii of disks. Starting from any maximal random packing such as a Maximal Poisson-disk Sampling (MPS), we iteratively relocate, inject (add), or eject (remove) disks, using a set of three successively more-aggressive local operations. We may achieve a user-defined density, either more dense or more sparse, almost up to the theoretical structured limits. The tuned samples are conflict-free, retain coverage maximality, and, except in the extremes, retain the blue noise randomness properties of the input. We change the density of the packing one disk at a time, maintaining the minimum disk separation distance and the maximum domain coverage distance required of any maximal packing. These properties are local, and we can handle spatially-varying sizing functions. Using fewer points to satisfy a sizing function improves the efficiency of some applications. We apply the framework to improve the quality of meshes, removing non-obtuse angles; and to more accurately model fiber reinforced polymers for elastic and failure simulations. PMID:27563162
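A minimal dart-throwing approximation of Maximal Poisson-disk Sampling (constant radius, unit square) gives the kind of input packing the framework starts from. This is a sketch only: the paper's contribution is the subsequent relocate/inject/eject tuning, not this baseline, and the miss-count termination is a heuristic rather than a maximality guarantee.

```python
import random

def dart_throwing_mps(r, max_consecutive_misses=2000, rng=random):
    """Greedy Poisson-disk sampling: accept darts at least r from all disks."""
    points = []
    misses = 0
    while misses < max_consecutive_misses:
        p = (rng.random(), rng.random())
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= r * r for q in points):
            points.append(p)
            misses = 0
        else:
            misses += 1
    return points
```

True maximality requires tracking the uncovered area explicitly; density tuning then adds or removes disks one at a time while preserving the minimum-separation and coverage properties.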
Unbiased feature selection in learning random forests for high-dimensional data.
Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi
2015-01-01
Random forests (RFs) have been widely used as a powerful classification method. However, with randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting, which gives RFs poor accuracy when working with high-dimensional data. RFs also exhibit bias in the feature selection process, where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features for learning RFs from high-dimensional data. We first remove uninformative features using p-value assessment, and a subset of unbiased features is then selected based on statistical measures. This feature subset is partitioned into two subsets, and a feature-weighting sampling technique is used to sample features from the two subsets for building trees. This approach generates more accurate trees while reducing dimensionality and the amount of data needed for learning RFs. An extensive set of experiments was conducted on 47 high-dimensional real-world datasets, including image datasets. The experimental results show that RFs with the proposed approach outperformed existing random forests in both accuracy and AUC.
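The feature-weighting sampling step can be sketched as below. The parameterization is hypothetical (the probability `p_strong` and function name are assumptions); xRF's actual statistical measures for splitting features into the two subsets are described in the paper.

```python
import random

def weighted_feature_sample(strong, weak, n, p_strong=0.8, rng=random):
    """Sample n features, preferring the informative subset with prob. p_strong."""
    chosen = []
    for _ in range(n):
        # Fall back to whichever subset is non-empty
        pool = strong if (strong and (not weak or rng.random() < p_strong)) else weak
        chosen.append(rng.choice(pool))
    return chosen
```

Biasing the per-node candidate features toward the informative subset is what keeps uninformative features from dominating splits in high dimensions.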
Macroscopic damping model for structural dynamics with random polycrystalline configurations
NASA Astrophysics Data System (ADS)
Yang, Yantao; Cui, Junzhi; Yu, Yifan; Xiang, Meizhen
2018-06-01
In this paper, a macroscopic damping model for the dynamical behavior of structures with random polycrystalline configurations at micro-nano scales is established. First, the global motion equation of a crystal is decomposed into a set of independent single-degree-of-freedom (SDOF) motion equations along normal discrete modes, and damping behavior is then introduced into each SDOF motion. Through interpolation of the discrete modes, a continuous representation of the damping effects for the crystal is obtained. Second, the expression for the damping coefficient is derived from the energy conservation law, and an approximate formula for the damping coefficient is given. Next, the continuous damping coefficient for a polycrystalline cluster is expressed, the continuous dynamical equation with a damping term is obtained, and concrete damping coefficients for a polycrystalline Cu sample are shown. Finally, using a statistical two-scale homogenization method, the macroscopic homogenized dynamical equation containing the damping term is set up for structures with random polycrystalline configurations at micro-nano scales.
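Each independent SDOF modal equation with damping introduced takes the familiar form below (standard structural-dynamics notation, assumed here; the paper derives its damping coefficient from the energy-conservation argument rather than prescribing the damping ratios directly):

```latex
\ddot{q}_j(t) + 2\zeta_j\,\omega_j\,\dot{q}_j(t) + \omega_j^2\,q_j(t) = 0,
\qquad j = 1,\dots,N,
```

where $q_j$ is the $j$-th modal coordinate, $\omega_j$ the modal frequency, and $\zeta_j$ the modal damping ratio. Interpolating these discrete modal solutions then yields the continuous damped response of the crystal.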
Parental bonding in men with alcohol disorders: a relationship with conduct disorder.
Joyce, P R; Sellman, D; Wells, E; Frampton, C M; Bushnell, J A; Oakley-Browne, M; Hornblow, A R
1994-09-01
Men from a clinical treatment setting suffering from alcohol dependence, and randomly selected men from the community diagnosed as having alcohol abuse and/or dependence, completed the Parental Bonding Instrument. The men from the alcohol treatment setting perceived both parents as having been uncaring and overprotective. In the general population sample, an uncaring and overprotective parental style was strongly associated with childhood conduct disorder, but not with alcohol disorder symptoms. This discrepancy in perceived parenting highlights the difficulties in extrapolating findings about aetiological factors for alcohol disorders from clinical samples. It also suggests that childhood conduct disorder and adult antisocial behaviour could influence which men with alcohol disorders receive inpatient treatment.
Blessing of dimensionality: mathematical foundations of the statistical physics of data.
Gorban, A N; Tyukin, I Y
2018-04-28
The concentrations of measure phenomena were discovered as the mathematical background to statistical mechanics at the end of the nineteenth/beginning of the twentieth century and have been explored in mathematics ever since. At the beginning of the twenty-first century, it became clear that the proper utilization of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarizes recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median-level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us with such classifiers and determine a non-iterative (one-shot) procedure for their construction. This article is part of the theme issue 'Hilbert's sixth problem'. © 2018 The Author(s).
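The separability claim is easy to demonstrate numerically. The toy sketch below is an illustration of the phenomenon, not the paper's Fisher-discriminant construction: for i.i.d. points uniform in [-1, 1]^d with d large, the hyperplane at half of dot(x, x) typically isolates a chosen point x from all the others.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def separates(x, others, margin=0.5):
    """Does the hyperplane dot(x, y) = margin * dot(x, x) isolate x?"""
    threshold = margin * dot(x, x)
    return all(dot(x, y) < threshold for y in others)

random.seed(0)
d, n = 200, 100
cloud = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
isolated = sum(separates(cloud[i], cloud[:i] + cloud[i + 1:]) for i in range(n))
```

With d = 200, essentially every one of the 100 points is linearly separable from the rest, because dot(x, x) concentrates near d/3 while dot(x, y) for independent y concentrates near 0.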
Mahar, Benazeer; Kumar, Ramesh; Rizvi, Narjis; Bahalkani, Habib Akhtar; Haq, Mahboobul; Soomro, Jamila
2012-01-01
Information, education and communication (IEC) provided by the health care provider to a pregnant woman during the antenatal visit is crucial for a healthier pregnancy outcome. This study analysed the quality and quantity of antenatal visits at a private and a public hospital in Bahawalpur, Pakistan. Exit interviews were conducted with 216 pregnant women using a validated, reliable and pre-tested adapted questionnaire. The first participant was selected by simple random sampling; the rest of the sample was selected by systematic random sampling, taking every 7th woman for interview. Ethical considerations were observed. Average communication time between a pregnant woman and her healthcare provider was 3 minutes in the public hospital and 8 minutes in the private hospital. IEC focused mainly on diet and nutrition (86% in the private and 53% in the public hospital), while advice on family planning after delivery was discussed with 13% and 7% of women in the public and private settings, respectively. None of the respondents at either facility received advice or counselling on breastfeeding or neonatal care. Birth preparedness components were discussed with women in the public and private hospitals. In both settings, antenatal clients did not receive information, education and communication according to World Health Organization guidelines. The quality and quantity of IEC during antenatal care were found to be very poor in both public and private sector hospitals of urban Pakistan.
Investigation of spectral analysis techniques for randomly sampled velocimetry data
NASA Technical Reports Server (NTRS)
Sree, Dave
1993-01-01
It is well known that laser velocimetry (LV) generates individual-realization velocity data that are randomly or unevenly sampled in time. Spectral analysis of such data to obtain the turbulence spectra, and hence information on turbulence scales, requires special techniques. The 'slotting' technique of Mayo et al., also described by Roberts and Ajmani, and the 'direct transform' method of Gaster and Roberts are well known in the LV community. The slotting technique is computationally faster than the direct transform method. There are practical limitations, however, on how high-frequency and accurate an estimate can be made for a given mean sampling rate. These high-frequency estimates are important for obtaining microscale information about turbulence structure. Previous studies found that reliable spectral estimates can be made up to about the mean sampling frequency (mean data rate) or less. If the data were evenly sampled, the usable frequency range would be half the sampling frequency (i.e., up to the Nyquist frequency); otherwise, aliasing problems would occur. The mean data rate and the sample size (total number of points) basically limit the frequency range. Also, large variabilities or errors are associated with the high-frequency estimates from randomly sampled signals. Roberts and Ajmani proposed certain pre-filtering techniques to reduce these variabilities, but at the cost of the low-frequency estimates; the pre-filtering acts as a high-pass filter. Further, Shapiro and Silverman showed theoretically that, for Poisson-sampled signals, it is possible to obtain alias-free spectral estimates far beyond the mean sampling frequency. But the question is, how far?
During his tenure under the 1993 NASA-ASEE Summer Faculty Fellowship Program, the author's investigation of spectral analysis techniques for randomly sampled signals showed that the spectral estimates can be enhanced or improved up to about 4-5 times the mean sampling frequency by using a suitable pre-filtering technique, though this increased bandwidth comes at the cost of the lower-frequency estimates. The studies further showed that large data sets of the order of 100,000 points or more, high data rates, and Poisson sampling are crucial for obtaining reliable spectral estimates from randomly sampled data such as LV data. Some of the results of the current study are presented.
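The slotting technique itself is compact: products of sample pairs are accumulated into lag bins ('slots') of width dt. The sketch below shows the basic estimator only, without the local-normalization and pre-filtering refinements discussed in the literature.

```python
def slotted_autocorrelation(times, values, dt, max_lag):
    """Autocorrelation estimate for unevenly sampled data via lag slotting."""
    nbins = int(max_lag / dt) + 1
    sums = [0.0] * nbins
    counts = [0] * nbins
    n = len(times)
    for i in range(n):
        for j in range(i, n):
            k = int(abs(times[j] - times[i]) / dt + 0.5)  # nearest slot
            if k < nbins:
                sums[k] += values[i] * values[j]
                counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Fourier transforming the slotted autocorrelation yields the power spectrum; the growing variance of sparsely populated high-lag slots is exactly what the pre-filtering techniques above try to tame.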
Exclusion in Schools in Northern Ireland: The Pupils' Voice
ERIC Educational Resources Information Center
Knipe, Damian; Reynolds, Margaret; Milner, Sharon
2007-01-01
The Department of Education in Northern Ireland has been reviewing the procedures for suspending and expelling pupils from school. This article reports the views of a random sample of 114 children (11-16 years) towards the proposed changes. Pupils' thoughts on: dealing with misbehaviour; setting rules; the decision-making process; appropriate…
Assimilating and Following through with Nutritional Recommendations by Adolescents
ERIC Educational Resources Information Center
Pich, Jordi; Ballester, Lluis; Thomas, Monica; Canals, Ramon; Tur, Josep A.
2011-01-01
Objective: To investigate the relationship between knowledge about a healthy diet and the actual food consumption habits of adolescents. Design: A survey of several food-related aspects applied to a representative sample of adolescents. Setting: One thousand, six hundred and sixty three individuals aged 11 to 18 from 40 schools randomly selected…
The Impact of Marketing Actions on Relationship Quality in the Higher Education Sector in Jordan
ERIC Educational Resources Information Center
Al-Alak, Basheer A. M.
2006-01-01
This field/analytical study examined the marketing actions (antecedents) and performance (consequences) of relationship quality in a higher education setting. To analyze data collected from a random sample of 271 undergraduate students at AL-Zaytoonah Private University of Jordan, the linear structural relationship (LISREL) model was used to…
Drinking and Driving PSAs: A Content Analysis of Behavioral Influence Strategies.
ERIC Educational Resources Information Center
Slater, Michael D.
1999-01-01
Study randomly sampled 66 drinking and driving television public service announcements, which were then coded using a categorical and dimensional scheme. Data set reveals that informational/testimonial messages made up almost half of the total; positive appeals were the next most common, followed by empathy, fear, and modeling appeals. (Contains 34…
Social Responsibility and Corporate Web Pages: Self-Presentation or Agenda-Setting?
ERIC Educational Resources Information Center
Esrock, Stuart L.; Leichty, Greg B.
1998-01-01
Examines how corporate entities use the Web to present themselves as socially responsible citizens and to advance policy positions. Samples randomly "Fortune 500" companies, revealing that, although 90% had Web pages and 82% of the sites addressed a corporate social responsibility issue, few corporations used their pages to monitor…
Fruit and Vegetable Intake among Urban Community Gardeners
ERIC Educational Resources Information Center
Alaimo, Katherine; Packnett, Elizabeth; Miles, Richard A.; Kruger, Daniel J.
2008-01-01
Objective: To determine the association between household participation in a community garden and fruit and vegetable consumption among urban adults. Design: Data were analyzed from a cross-sectional random phone survey conducted in 2003. A quota sampling strategy was used to ensure that all census tracts within the city were represented. Setting:…
Adolescent Pregnancy in an Urban Environment: Issues, Programs, and Evaluation.
ERIC Educational Resources Information Center
Hardy, Janet B.; Zabin, Laurie Schwab
An in-depth discussion of national and local statistics regarding teenage and adolescent pregnancy and the developmental issues involved opens this analysis. Problems and adverse consequences of adolescent pregnancy in an urban setting are explored using a city-wide random sample of adolescent births. A model pregnancy and parenting program and…
PERSONAL AND CIRCUMSTANTIAL FACTORS INFLUENCING THE ACT OF DISCOVERY.
ERIC Educational Resources Information Center
OSTRANDER, EDWARD R.
HOW STUDENTS SAY THEY LEARN WAS INVESTIGATED. INTERVIEWS WITH A RANDOM SAMPLE OF 74 WOMEN STUDENTS POSED QUESTIONS ABOUT THE NATURE, FREQUENCY, PATTERNS, AND CIRCUMSTANCES UNDER WHICH ACTS OF DISCOVERY TAKE PLACE IN THE ACADEMIC SETTING. STUDENTS WERE ASSIGNED DISCOVERY RATINGS BASED ON READINGS OF TYPESCRIPTS. EACH STUDENT WAS CLASSIFIED AND…
Height as a Measure of Success in Academe.
ERIC Educational Resources Information Center
Hensley, Wayne E.
This paper presents the results of two studies at a large mid-Atlantic university that examined the height/success paradigm within the context of the university setting. Specifically, are the trends observed among taller persons in police and sales work equally valid for university professors? A random sample of faculty (N=90) revealed that…
Biomarker Evaluation Does Not Confirm Efficacy of Computer-Tailored Nutrition Education
ERIC Educational Resources Information Center
Kroeze, Willemieke; Dagnelie, Pieter C.; Heymans, Martijn W.; Oenema, Anke; Brug, Johannes
2011-01-01
Objective: To evaluate the efficacy of computer-tailored nutrition education with objective outcome measures. Design: A 3-group randomized, controlled trial with posttests at 1 and 6 months post-intervention. Setting: Worksites and 2 neighborhoods in the urban area of Rotterdam. Participants: A convenience sample of healthy Dutch adults (n = 442).…
Conflict in the Workplace: Social Workers as Victims and Perpetrators
ERIC Educational Resources Information Center
Ringstad, Robin
2005-01-01
Conflict and violence in the workplace have emerged as a real but inadequately explored concern in the social work profession. The present study surveyed a national random sample of 1,029 NASW members about their experiences with client violence and with physical and psychological assault in relationship to practice setting, age, gender, and…
An Investigation of the Difficulties Faced by EFL Undergraduates in Speaking Skills
ERIC Educational Resources Information Center
Al-Jamal, Dina A.; Al-Jamal, Ghadeer A.
2014-01-01
Since speaking well in English is crucial for English language literature undergraduates, the present study aimed at describing difficulties that may be encountered in an EFL setting. A stratified random sample was drawn from six Jordanian public universities. Survey questionnaires as well as semi-structured interviews were constructed. 64…
Exploring Bullying: An Early Childhood Perspective from Mainland China
ERIC Educational Resources Information Center
Arndt, Janet S.; Luo, Nili
2008-01-01
This article explores bullying in mainland China. The authors conducted a study to determine the existence of a problem with bullying in younger Chinese children. Samples included 40 randomly selected early childhood educators serving children ages 2 through 6, located in 10 different urban school settings along the Yangzi River. The authors…
Predictors of Career Adaptability Skill among Higher Education Students in Nigeria
ERIC Educational Resources Information Center
Ebenehi, Amos Shaibu; Rashid, Abdullah Mat; Bakar, Ab Rahim
2016-01-01
This paper examined predictors of career adaptability skill among higher education students in Nigeria. A sample of 603 higher education students randomly selected from six colleges of education in Nigeria participated in this study. A self-reported questionnaire was used for data collection, and multiple linear regression analysis was used…
On the Analysis of Case-Control Studies in Cluster-correlated Data Settings.
Haneuse, Sebastien; Rivera-Rodriguez, Claudia
2018-01-01
In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference requires acknowledging any clustering, although, to our knowledge, no statistical methods have been published for the analysis of case-control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, rather than performing case-control sampling across all clinics, case-control sampling within clinics has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator with correlation parameters estimated to ensure appropriate accounting of the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs associated with standard case-control sampling or when case-control sampling is performed within clusters.
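The inverse probability weighting idea behind the proposed analysis can be sketched in a toy setting. The clinic sizes, case model, and 10% control-sampling fraction below are illustrative assumptions, and the sketch uses a simple weighted mean rather than the full weighted generalized estimating equations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy clustered population: 20 clinics of 500 registrants each
clinic = np.repeat(np.arange(20), 500)
x = rng.normal(size=clinic.size) + 0.1 * clinic        # covariate with a clinic-level shift
p_case = 1 / (1 + np.exp(-(x - 3)))                    # cases concentrate at high x
y = (rng.random(x.size) < p_case).astype(int)

# Within-clinic case-control sampling: keep all cases plus 10% of controls
keep = (y == 1) | (rng.random(y.size) < 0.10)

# Inverse-probability weights undo the outcome-dependent sampling
w = np.where(y[keep] == 1, 1.0, 1 / 0.10)

naive = x[keep].mean()                  # biased: cases are over-represented
est = np.average(x[keep], weights=w)    # approximately unbiased for the population mean
truth = x.mean()
```

The unweighted mean drifts toward the cases' covariate values, while the weighted estimate recovers the population quantity; the same weights enter the estimating equations in the full analysis.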
Rigorously testing multialternative decision field theory against random utility models.
Berkowitsch, Nicolas A J; Scheibehenne, Benjamin; Rieskamp, Jörg
2014-06-01
Cognitive models of decision making aim to explain the process underlying observed choices. Here, we test a sequential sampling model of decision making, multialternative decision field theory (MDFT; Roe, Busemeyer, & Townsend, 2001), on empirical grounds and compare it against 2 established random utility models of choice: the probit and the logit model. Using a within-subject experimental design, participants in 2 studies repeatedly choose among sets of options (consumer products) described on several attributes. The results of Study 1 showed that all models predicted participants' choices equally well. In Study 2, in which the choice sets were explicitly designed to distinguish the models, MDFT had an advantage in predicting the observed choices. Study 2 further revealed the occurrence of multiple context effects within single participants, indicating an interdependent evaluation of choice options and correlations between different context effects. In sum, the results indicate that sequential sampling models can provide relevant insights into the cognitive process underlying preferential choices and thus can lead to better choice predictions. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Accounting for selection bias in association studies with complex survey data.
Wirth, Kathleen E; Tchetgen Tchetgen, Eric J
2014-05-01
Obtaining representative information from hidden and hard-to-reach populations is fundamental to describe the epidemiology of many sexually transmitted diseases, including HIV. Unfortunately, simple random sampling is impractical in these settings, as no registry of names exists from which to sample the population at random. However, complex sampling designs can be used, as members of these populations tend to congregate at known locations, which can be enumerated and sampled at random. For example, female sex workers may be found at brothels and street corners, whereas injection drug users often come together at shooting galleries. Despite the logistical appeal, complex sampling schemes lead to unequal probabilities of selection, and failure to account for this differential selection can result in biased estimates of population averages and relative risks. However, standard techniques to account for selection can lead to substantial losses in efficiency. Consequently, researchers implement a variety of strategies in an effort to balance validity and efficiency. Some researchers fully or partially account for the survey design, whereas others do nothing and treat the sample as a realization of the population of interest. We use directed acyclic graphs to show how certain survey sampling designs, combined with subject-matter considerations unique to individual exposure-outcome associations, can induce selection bias. Finally, we present a novel yet simple maximum likelihood approach for analyzing complex survey data; this approach optimizes statistical efficiency at no cost to validity. We use simulated data to illustrate this method and compare it with other analytic techniques.
A Bayesian Approach to the Paleomagnetic Conglomerate Test
NASA Astrophysics Data System (ADS)
Heslop, David; Roberts, Andrew P.
2018-02-01
The conglomerate test has served the paleomagnetic community for over 60 years as a means to detect remagnetizations. The test states that if a suite of clasts within a bed have uniformly random paleomagnetic directions, then the conglomerate cannot have experienced a pervasive event that remagnetized the clasts in the same direction. The current form of the conglomerate test is based on null hypothesis testing, which results in a binary "pass" (uniformly random directions) or "fail" (nonrandom directions) outcome. We have recast the conglomerate test in a Bayesian framework with the aim of providing more information concerning the level of support a given data set provides for a hypothesis of uniformly random paleomagnetic directions. Using this approach, we place the conglomerate test in a fully probabilistic framework that allows for inconclusive results when insufficient information is available to draw firm conclusions concerning the randomness or nonrandomness of directions. With our method, sample sets larger than those typically employed in paleomagnetism may be required to achieve strong support for a hypothesis of random directions. Given the potentially detrimental effect of unrecognized remagnetizations on paleomagnetic reconstructions, it is important to provide a means to draw statistically robust data-driven inferences. Our Bayesian analysis provides a means to do this for the conglomerate test.
Ortiz, Glorimar; Schacht, Lucille
2012-01-01
Measurement of consumers' satisfaction in psychiatric settings is important because it has been correlated with improved clinical outcomes and administrative measures of high-quality care. These consumer satisfaction measurements are actively used as performance measures required by the accreditation process and for quality improvement activities. Our objectives were (i) to re-evaluate, through exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), the structure of an instrument intended to measure consumers' satisfaction with care in psychiatric settings and (ii) to examine and publish the psychometric characteristics, validity and reliability, of the Inpatient Consumer Survey (ICS). To psychometrically test the structure of the ICS, 34 878 survey results, submitted by 90 psychiatric hospitals in 2008, were extracted from the Behavioral Healthcare Performance Measurement System (BHPMS). Basic descriptive item-response and correlation analyses were performed for total surveys. Two datasets were randomly created for analysis. A random sample of 8229 survey results was used for EFA. Another random sample of 8261 consumer survey results was used for CFA. This same sample was used to perform validity and reliability analyses. The item-response analysis showed that the mean range for a disagree/agree five-point scale was 3.10-3.94. Correlation analysis showed a strong relationship between items. Six domains (dignity, rights, environment, empowerment, participation, and outcome) with internal reliabilities between good to moderate (0.87-0.73) were shown to be related to overall care satisfaction. Overall reliability for the instrument was excellent (0.94). Results from CFA provided support for the domains structure of the ICS proposed through EFA. The overall findings from this study provide evidence that the ICS is a reliable measure of consumer satisfaction in psychiatric inpatient settings. 
The analysis has shown the ICS to provide valid and reliable results and to focus on the specific concerns of consumers of psychiatric inpatient care. Scores by item indicate that opportunity for improvement exists across healthcare organizations.
Factors influencing research productivity among health sciences librarians.
Fenske, R E; Dalrymple, P W
1992-01-01
Secondary analysis was performed of data collected in 1989 from a random sample of members of the Medical Library Association. Results show that about half the sample had at least one publication; academic health sciences librarians were much more likely than hospital librarians to have published. Almost half the sample had taken formal courses in research, but only a small percentage had taken continuing education (CE) courses in research. Institutional support services for research were most available in academic settings. The combination of institutional support, CE training, and research courses explained 31.1% of the variation in research productivity among academic librarians; these factors were less important in hospitals and other institutional settings. The authors suggest that health sciences librarians working outside academia should seek support for research from sources outside the employing institution. PMID:1422506
Dishman, Rod K; Vandenberg, Robert J; Motl, Robert W; Wilson, Mark G; DeJoy, David M
2010-08-01
The effectiveness of an intervention depends on its dose and on moderators of dose, which usually are not studied. The purpose of the study is to determine whether goal setting and theory-based moderators of goal setting had dose relations with increases in goal-related physical activity during a successful workplace intervention. A group-randomized 12-week intervention that included personal goal setting was implemented in fall 2005, with a multiracial/ethnic sample of employees at 16 geographically diverse worksites. Here, we examined dose-related variables in the cohort of participants (N = 664) from the 8 worksites randomized to the intervention. Participants in the intervention exceeded 9000 daily pedometer steps and 300 weekly minutes of moderate-to-vigorous physical activity (MVPA) during the last 6 weeks of the study, which approximated or exceeded current public health guidelines. Linear growth modeling indicated that participants who set higher goals and sustained higher levels of self-efficacy, commitment and intention about attaining their goals had greater increases in pedometer steps and MVPA. The relation between change in participants' satisfaction with current physical activity and increases in physical activity was mediated by increases in self-set goals. The results show a dose relation of increased physical activity with changes in goal setting, satisfaction, self-efficacy, commitment and intention, consistent with goal-setting theory.
Kraschnewski, Jennifer L; Keyserling, Thomas C; Bangdiwala, Shrikant I; Gizlice, Ziya; Garcia, Beverly A; Johnston, Larry F; Gustafson, Alison; Petrovic, Lindsay; Glasgow, Russell E; Samuel-Hodge, Carmen D
2010-01-01
Studies of type 2 translation, the adaptation of evidence-based interventions to real-world settings, should include representative study sites and staff to improve external validity. Sites for such studies are, however, often selected by convenience sampling, which limits generalizability. We used an optimized probability sampling protocol to select an unbiased, representative sample of study sites to prepare for a randomized trial of a weight loss intervention. We invited North Carolina health departments within 200 miles of the research center to participate (N = 81). Of the 43 health departments that were eligible, 30 were interested in participating. To select a representative and feasible sample of 6 health departments that met inclusion criteria, we generated all combinations of 6 from the 30 health departments that were eligible and interested. From the subset of combinations that met inclusion criteria, we selected 1 at random. Of 593,775 possible combinations of 6 counties, 15,177 (3%) met inclusion criteria. Sites in the selected subset were similar to all eligible sites in terms of health department characteristics and county demographics. Optimized probability sampling improved generalizability by ensuring an unbiased and representative sample of study sites.
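The selection protocol described, generating all 6-of-30 combinations, filtering by inclusion criteria, and choosing one eligible subset at random, can be sketched as follows. The site data and inclusion criteria here are hypothetical stand-ins, not those of the North Carolina study.

```python
import itertools
import random

random.seed(42)

# Hypothetical eligible and interested sites: (name, population, rural flag)
sites = [(f"county_{i}", random.randint(20_000, 500_000), i % 3 == 0)
         for i in range(30)]

def meets_criteria(combo):
    """Hypothetical inclusion criteria for a 6-site subset:
    combined population over 1M and at least 2 rural counties."""
    total_pop = sum(pop for _, pop, _ in combo)
    rural = sum(1 for _, _, is_rural in combo if is_rural)
    return total_pop > 1_000_000 and rural >= 2

# Enumerate all C(30, 6) = 593,775 subsets and keep the eligible ones
eligible = [c for c in itertools.combinations(sites, 6)
            if meets_criteria(c)]

# Every eligible subset has the same probability of selection,
# which is what makes the chosen sites an unbiased probability sample
chosen = random.choice(eligible)
```

Sampling uniformly over the eligible subsets, rather than greedily picking convenient sites, is what preserves the representativeness the abstract emphasizes.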
Vehicle classification in WAMI imagery using deep network
NASA Astrophysics Data System (ADS)
Yi, Meng; Yang, Fan; Blasch, Erik; Sheaff, Carolyn; Liu, Kui; Chen, Genshe; Ling, Haibin
2016-05-01
Humans have always had a keen interest in understanding activities and the surrounding environment for mobility, communication, and survival. Thanks to recent progress in photography and breakthroughs in aviation, we are now able to capture tens of megapixels of ground imagery, namely Wide Area Motion Imagery (WAMI), at multiple frames per second from unmanned aerial vehicles (UAVs). WAMI serves as a great source for many applications, including security, urban planning and route planning. These applications require fast and accurate image understanding, which is time-consuming for humans due to the large data volume and city-scale area coverage. Therefore, automatic processing and understanding of WAMI imagery has been gaining attention in both industry and the research community. This paper focuses on an essential step in WAMI imagery analysis, namely vehicle classification: deciding whether a given image patch contains a vehicle or not. We collect a set of positive and negative sample image patches for training and testing the detector. Positive samples are 64 × 64 image patches centered on annotated vehicles. We generate two sets of negative images. The first set is generated from positive images with some location shift. The second set of negative patches is generated from randomly sampled patches; we discard such a patch if a vehicle happens to lie at its center. Both positive and negative samples are randomly divided into 9000 training images and 3000 testing images. We propose to train a deep convolutional network for classifying these patches. The classifier is based on a pre-trained AlexNet model in the Caffe library, with an adapted loss function for vehicle classification. The performance of our classifier is compared to several traditional image classifier methods using Support Vector Machine (SVM) and Histogram of Oriented Gradient (HOG) features.
While the SVM+HOG method achieves an accuracy of 91.2%, the accuracy of our deep network-based classifier reaches 97.9%.
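The patch-sampling scheme described above (positives at annotated vehicle centers, negatives from shifted positives and from random locations, with vehicle-centered patches discarded) might be sketched as follows; the frame, annotations, shift range, and counts are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
PATCH = 64

def crop(img, cy, cx, half=PATCH // 2):
    """Return the 64x64 patch centered at (cy, cx), or None if it falls off the image."""
    if half <= cy < img.shape[0] - half and half <= cx < img.shape[1] - half:
        return img[cy - half:cy + half, cx - half:cx + half]
    return None

def sample_patches(img, vehicle_centers, n_random_neg=100, min_shift=24):
    """Positives at annotated centers; negatives from shifted positives
    and from random locations not centered on a vehicle."""
    pos = [p for cy, cx in vehicle_centers if (p := crop(img, cy, cx)) is not None]
    neg = []
    for cy, cx in vehicle_centers:          # first negative set: shifted positives
        dy, dx = rng.integers(min_shift, 2 * min_shift, size=2)
        if (p := crop(img, cy + dy, cx + dx)) is not None:
            neg.append(p)
    while len(neg) < n_random_neg:          # second negative set: random patches
        cy, cx = rng.integers(PATCH, np.array(img.shape) - PATCH)
        # discard the patch if a vehicle sits (near) its center
        if all(abs(cy - vy) + abs(cx - vx) > min_shift for vy, vx in vehicle_centers):
            neg.append(crop(img, cy, cx))
    return pos, neg

frame = rng.random((512, 512))              # stand-in for a single WAMI frame
pos, neg = sample_patches(frame, [(100, 100), (300, 250)])
```

The shifted negatives are the harder examples, since they partially overlap a vehicle without being centered on one, which is what forces the classifier to localize rather than merely detect texture.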
Balancing a U-Shaped Assembly Line by Applying Nested Partitions Method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhagwat, Nikhil V.
2005-01-01
In this study, we applied the Nested Partitions method to a U-line balancing problem and conducted experiments to evaluate the application. The results make it quite evident that the Nested Partitions method provides near-optimal solutions (optimal in some cases), and its execution time is short compared with the Branch and Bound algorithm. For larger data sets, however, the algorithm took significantly longer to execute. One reason could be the way in which the random samples are generated: in the present study, a random sample is a complete solution, requiring assignment of tasks to stations, and the time taken to assign tasks to stations is directly proportional to the number of tasks. Thus, as the number of tasks increases, so does the time taken to generate random samples for the different regions. The performance index for the Nested Partitions method in the present study was the number of stations in the randomly generated solutions (samples); the total idle time of the samples could be used as another performance index. The ULINO method is known to combine several bounds to obtain good solutions, and a similar combination of performance indices could be used to evaluate the random samples and obtain even better solutions. Here, we used deterministic task times; in industries where the majority of tasks are performed manually, the stochastic version of the problem could be of vital importance. Experimenting with different objective functions (the number of stations was used in this study) could be of significance to industries where the cost associated with creating a new station is not uniform; for such industries, the results obtained with the present approach will be of limited value, and labor costs, task-incompletion costs, or a combination of the two could be used as alternative objective functions.
Effect of Sling Exercise Training on Balance in Patients with Stroke: A Meta-Analysis
Peng, Qiyuan; Chen, Jingjie; Zou, Yucong; Liu, Gang
2016-01-01
Objective This study aims to evaluate the effect of sling exercise training (SET) on balance in patients with stroke. Methods PubMed, Cochrane Library, Ovid LWW, CBM, CNKI, WanFang, and VIP databases were searched for randomized controlled trials of the effect of SET on balance in patients with stroke. The study design and participants were subjected to methodological analysis. The Berg Balance Scale (BBS), Barthel index (BI), and Fugl-Meyer Assessment (FMA) were used as independent parameters for evaluating balance function, activities of daily living (ADL), and motor function after stroke, respectively, and were subjected to meta-analysis with RevMan 5.3 software. Results Nine studies with 460 participants were analyzed. The meta-analysis showed that SET combined with conventional rehabilitation was superior to conventional rehabilitation alone, with increased BBS (WMD = 3.81, 95% CI [0.15, 7.48], P = 0.04), BI (WMD = 12.98, 95% CI [8.39, 17.56], P < 0.00001), and FMA (SMD = 0.76, 95% CI [0.41, 1.11], P < 0.0001) scores. Conclusion Based on limited evidence from 9 trials, SET combined with conventional rehabilitation improved BBS, BI, and FMA scores, so SET can improve balance function after stroke; however, our findings must be interpreted with caution owing to limitations of the included trials, such as small sample sizes and risk of bias. More multi-center, large-sample randomized controlled trials are therefore needed to confirm its clinical application. PMID:27727288
Forecasting Space Weather-Induced GPS Performance Degradation Using Random Forest
NASA Astrophysics Data System (ADS)
Filjar, R.; Filic, M.; Milinkovic, F.
2017-12-01
Space weather and ionospheric dynamics have a profound effect on the positioning performance of the Global Navigation Satellite System (GNSS). However, the quantification of that effect is still the subject of scientific activity around the world. In the latest contribution to the understanding of space weather and ionospheric effects on satellite-based positioning performance, we conducted a study of several candidate methods for forecasting space weather-induced GPS positioning performance deterioration. First, a 5-day set of experimentally collected data was established, encompassing space weather and ionospheric activity indices (including readings of the Sudden Ionospheric Disturbance (SID) monitors, components of geomagnetic field strength, the global Kp index, the Dst index, GPS-derived Total Electron Content (TEC) samples, the standard deviation of TEC samples, and sunspot number) and observations of GPS positioning error components (northing, easting, and height positioning error) derived from the Adriatic Sea IGS reference stations' RINEX raw pseudorange files in quiet space weather periods. This data set was split into training and test sub-sets. Then, a selected set of supervised machine learning methods based on Random Forest was applied to the experimentally collected data set in order to establish appropriate regional (Adriatic Sea) forecasting models for space weather-induced GPS positioning performance deterioration. The forecasting models were developed in the R/rattle statistical programming environment. The forecasting quality of the regional models was assessed, and conclusions were drawn about the advantages and shortcomings of regional forecasting models for space weather-caused GNSS positioning performance deterioration.
Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets.
Shuryak, Igor
2017-01-01
The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected "signal"; (5) using several machine learning methods to test the "signal's" sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA).
We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation.
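Technique (1), benchmarking predictors against synthetic noise variables, can be sketched in a few lines. This is a hedged illustration: it uses a simple absolute-correlation score rather than a random forest importance measure, and the simulated radiation/pH data are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60                                     # small data set, as in radioecology

# One real driver (radiation) plus a correlated nuisance variable (pH)
radiation = rng.lognormal(size=n)
ph = 0.5 * radiation + rng.normal(size=n)
abundance = -0.8 * radiation + rng.normal(size=n)

# Technique (1): append synthetic noise columns as importance benchmarks
noise = rng.normal(size=(n, 3))
X = np.column_stack([radiation, ph, noise])
names = ["radiation", "pH", "noise1", "noise2", "noise3"]

# Simple importance score: |correlation| with the outcome
scores = [abs(np.corrcoef(X[:, j], abundance)[0, 1]) for j in range(X.shape[1])]

# A predictor is treated as credible only if it beats the best noise benchmark
benchmark = max(scores[2:])
credible = [nm for nm, s in zip(names, scores) if s > benchmark]
```

The noise columns calibrate how large a score can arise by chance in a data set this small; the same comparison carries over directly to random forest importance scores.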
Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets
Shuryak, Igor
2017-01-01
The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected “signal”; (5) using several machine learning methods to test the “signal’s” sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plan accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). 
We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation. PMID:28068401
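The noise-benchmark idea in technique (1) can be illustrated with a small sketch. All data below are simulated and the variable names are hypothetical: a pure-noise column is appended to the predictors, and a predictor is treated as credible only if its random-forest importance clearly exceeds that of the noise column.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 200
radiation = rng.uniform(0, 10, n)                     # hypothetical contamination level
soil_ph = rng.uniform(4, 9, n)                        # hypothetical covariate with no real effect
abundance = 50 - 3 * radiation + rng.normal(0, 5, n)  # outcome driven by radiation only

# append a synthetic pure-noise predictor as an importance benchmark
X = np.column_stack([radiation, soil_ph, rng.normal(size=n)])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, abundance)

imp = dict(zip(["radiation", "soil_ph", "noise"], model.feature_importances_))
# a predictor is informative only if it clearly beats the noise benchmark
print(imp)
```

In this simulation the importance of `radiation` dominates, while `soil_ph` lands near the noise benchmark, which is exactly the distinction the synthetic-noise technique is designed to expose.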
Estimating the Size of a Large Network and its Communities from a Random Sample
Chen, Lin; Karbasi, Amin; Crawford, Forrest W.
2017-01-01
Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios. PMID:28867924
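The abstract does not specify the internals of PULSE, but the sampling design it builds on can be sketched with a simpler moment-based estimator (not PULSE itself): if W is a uniform sample of m vertices, each neighbor of a sampled vertex lands in W with probability (m - 1)/(N - 1), so comparing observed total degrees to induced-subgraph degrees yields an estimate of the hidden population size N.

```python
import random

random.seed(1)
N, p = 2000, 0.01                      # hidden population size and edge probability
# build a G(N, p) random graph as adjacency sets (pure-Python, small scale)
adj = [set() for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        if random.random() < p:
            adj[i].add(j)
            adj[j].add(i)

m = 300
W = set(random.sample(range(N), m))            # uniformly sampled vertex subset
total_deg = sum(len(adj[v]) for v in W)        # observed total degrees of sampled vertices
induced_deg = sum(len(adj[v] & W) for v in W)  # degrees within the induced subgraph G(W)

# E[induced_deg] = total_deg * (m - 1) / (N - 1), so invert for N:
N_hat = 1 + (m - 1) * total_deg / induced_deg
print(round(N_hat))
```

On this simulated graph the estimate comes out close to the true N = 2000; PULSE additionally exploits block memberships to estimate each community's size.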
Signal processor for processing ultrasonic receiver signals
Fasching, George E.
1980-01-01
A signal processor is provided which uses an analog integrating circuit in conjunction with a set of digital counters controlled by a precision clock for sampling timing to provide an improved presentation of an ultrasonic transmitter/receiver signal. The signal is sampled relative to the transmitter trigger signal timing at precise times, the selected number of samples are integrated and the integrated samples are transferred and held for recording on a strip chart recorder or converted to digital form for storage. By integrating multiple samples taken at precisely the same time with respect to the trigger for the ultrasonic transmitter, random noise, which is contained in the ultrasonic receiver signal, is reduced relative to the desired useful signal.
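The noise-reduction principle behind the integrator, averaging M repeated samples taken at the same delay after the trigger, can be sketched in a few lines (simulated waveform; the sqrt(M) gain assumes independent, zero-mean noise):

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 500)
echo = np.exp(-((t - 0.3) ** 2) / 0.001)   # idealized receiver echo, fixed relative to the trigger

# M repeated traces: the same echo, with independent random noise on each
M = 64
traces = echo + rng.normal(0.0, 1.0, size=(M, t.size))
averaged = traces.mean(axis=0)             # the integrate-and-average step

noise_single = np.std(traces[0] - echo)
noise_avg = np.std(averaged - echo)
print(noise_single / noise_avg)            # roughly sqrt(M) = 8
```

Because the echo repeats identically relative to the trigger while the noise does not, averaging M traces improves the signal-to-noise ratio by about a factor of sqrt(M).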
Final Report of Outcome of Southeastern New Mexico Bilingual Program.
ERIC Educational Resources Information Center
McCracken, Wanda
The Southeastern New Mexico Bilingual Program's final report analyzed performance objectives to determine the outcome of the goals set for academic growth in the standard curriculum, as well as in the English and Spanish language arts, and growth in social development of students. The random sample consisted of 20 third and fourth graders from the…
An overview of the Columbia Habitat Monitoring Program's (CHaMP) spatial-temporal design framework
We briefly review the concept of a master sample applied to stream networks in which a randomized set of stream sites is selected across a broad region to serve as a list of sites from which a subset of sites is selected to achieve multiple objectives of specific designs. The Col...
ERIC Educational Resources Information Center
Thompson, Bruce
The relationship between analysis of variance (ANOVA) methods and their analogs (analysis of covariance and multiple analyses of variance and covariance--collectively referred to as OVA methods) and the more general analytic case is explored. A small heuristic data set is used, with a hypothetical sample of 20 subjects, randomly assigned to five…
Prevalence, Causes and Effects of Bullying in Tertiary Institutions in Cross River State, Nigeria
ERIC Educational Resources Information Center
Ada, Mary Juliana; Okoli, Georgina; Obeten, Okoi Okorn; Akeke, M. N. G.
2016-01-01
This research evaluates the impact of the causes, consequences and effects of bullying in academic settings on student academic performance in tertiary institutions in Cross River State, Nigeria. The research used purposive and random sampling techniques to select 302 students. A questionnaire served as the data collection instrument.…
ERIC Educational Resources Information Center
Backman, Desiree; Gonzaga, Gian; Sugerman, Sharon; Francis, Dona; Cook, Sara
2011-01-01
Objective: To examine the impact of fresh fruit availability at worksites on the fruit and vegetable consumption and related psychosocial determinants of low-wage employees. Design: A prospective, randomized block experimental design. Setting: Seven apparel manufacturing and 2 food processing worksites. Participants: A convenience sample of 391…
The Construct of Creativity: Structural Model for Self-Reported Creativity Ratings
ERIC Educational Resources Information Center
Kaufman, James C.; Cole, Jason C.; Baer, John
2009-01-01
Several thousand subjects completed self-report questionnaires about their own creativity in 56 discrete domains. This sample was then randomly divided into three subsamples that were subject to factor analyses that compared an oblique model (with a set of correlated factors) and a hierarchical model (with a single second-order, or hierarchical,…
ERIC Educational Resources Information Center
Reifman, Alan; Watson, Wendy K.
2003-01-01
Students' first semester on campus may set the stage for their alcohol use/misuse throughout college. The authors surveyed 274 randomly sampled first-semester freshmen at a large southwestern university on their past 2 weeks' binge drinking, their high school binge drinking, and psychosocial factors possibly associated with drinking. They…
Identification of Violence in Turkish Health Care Settings
ERIC Educational Resources Information Center
Ayranci, Unal; Yenilmez, Cinar; Balci, Yasemin; Kaptanoglu, Cem
2006-01-01
This study sought to investigate the contributing factors to and frequency of violence against health care workers (HCWs) working in western Turkey. The population is composed of a random sample of 1,209 HCWs from 34 health care workplaces. Written questionnaires were given to HCWs at all sites, where staff were instructed to register all types of…
Arts and the Quality of Life: An Exploratory Study
ERIC Educational Resources Information Center
Michalos, Alex C.
2005-01-01
The aim of this investigation was to measure the impact of the arts broadly construed on the quality of life. A randomly drawn household sample of 315 adult residents of Prince George, British Columbia served as the working data-set. Examining zero-order correlations, among other things, it was found that playing a musical instrument a number of…
ERIC Educational Resources Information Center
Stack, Sue; Watson, Jane
2013-01-01
There is considerable research on the difficulties students have in conceptualising individual concepts of probability and statistics (see for example, Bryant & Nunes, 2012; Jones, 2005). The unit of work developed for the action research project described in this article is specifically designed to address some of these in order to help…
ERIC Educational Resources Information Center
Park, Amanda; Nitzke, Susan; Kritsch, Karen; Kattelmann, Kendra; White, Adrienne; Boeckner, Linda; Lohse, Barbara; Hoerr, Sharon; Greene, Geoffrey; Zhang, Zhumin
2008-01-01
Objective: Evaluate a theory-based, Internet-delivered nutrition education module. Design: Randomized, treatment-control design with pre-post intervention assessments. Setting and Participants: Convenience sample of 160 young adults (aged 18-24) recruited by community educators in 4 states. Study completers (n = 96) included a mix of…
ERIC Educational Resources Information Center
Gavaravarapu, Subba Rao M.; Vemula, Sudershan R.; Rao, Pratima; Mendu, Vishnu Vardhana Rao; Polasa, Kalpagam
2009-01-01
Objective: To understand food safety knowledge, perceptions, and practices of adolescent girls. Design: Focus group discussions (FGDs) with 32 groups selected using stratified random sampling. Setting: Four South Indian states. Participants: Adolescent girls (10-19 years). Phenomena of Interest: Food safety knowledge, perceptions, and practices.…
A Qualitative Study of Irish Teachers' Perspective of Student Substance Use
ERIC Educational Resources Information Center
Van Hout, Marie Claire; Connor, Sean
2008-01-01
Research Aim: This research aimed to provide an anecdotal perception of student substance use according to the teachers' personal experience in the Irish secondary level educational setting. Methodology: Sampling Interviews were conducted with teachers (n=95) at 10 randomly selected schools in County Carlow in the South East of Ireland, as part of…
Decay of random correlation functions for unimodal maps
NASA Astrophysics Data System (ADS)
Baladi, Viviane; Benedicks, Michael; Maume-Deschamps, Véronique
2000-10-01
Since the pioneering results of Jakobson and subsequent work by Benedicks-Carleson and others, it is known that quadratic maps fa(χ) = a - χ² admit a unique absolutely continuous invariant measure for a positive measure set of parameters a. For topologically mixing fa, Young and Keller-Nowicki independently proved exponential decay of correlation functions for this a.c.i.m. and smooth observables. We consider random compositions of small perturbations f + ωt, with f = fa or another unimodal map satisfying certain nonuniform hyperbolicity axioms, and ωt chosen independently and identically in [-ɛ, ɛ]. Baladi-Viana showed exponential mixing of the associated Markov chain, i.e., averaging over all random itineraries. We obtain stretched exponential bounds for the random correlation functions of Lipschitz observables for the sample measure μω of almost every itinerary.
Rochefort, Christian M; Buckeridge, David L; Tanguay, Andréanne; Biron, Alain; D'Aragon, Frédérick; Wang, Shengrui; Gallix, Benoit; Valiquette, Louis; Audet, Li-Anne; Lee, Todd C; Jayaraman, Dev; Petrucci, Bruno; Lefebvre, Patricia
2017-02-16
Adverse events (AEs) in acute care hospitals are frequent and associated with significant morbidity, mortality, and costs. Measuring AEs is necessary for quality improvement and benchmarking purposes, but current detection methods lack accuracy, efficiency, and generalizability. The growing availability of electronic health records (EHR) and the development of natural language processing techniques for encoding narrative data offer an opportunity to develop potentially better methods. The purpose of this study is to determine the accuracy and generalizability of using automated methods for detecting three high-incidence and high-impact AEs from EHR data: a) hospital-acquired pneumonia, b) ventilator-associated event, and c) central line-associated bloodstream infection. This validation study will be conducted among medical, surgical and ICU patients admitted between 2013 and 2016 to the Centre hospitalier universitaire de Sherbrooke (CHUS) and the McGill University Health Centre (MUHC), which has both French and English sites. A random 60% sample of CHUS patients will be used for model development purposes (cohort 1, development set). Using a random sample of these patients, a reference standard assessment of their medical chart will be performed. Multivariate logistic regression and the area under the curve (AUC) will be employed to iteratively develop and optimize three automated AE detection models (i.e., one per AE of interest) using EHR data from the CHUS. These models will then be validated on a random sample of the remaining 40% of CHUS patients (cohort 1, internal validation set) using chart review to assess accuracy. The most accurate models developed and validated at the CHUS will then be applied to EHR data from a random sample of patients admitted to the MUHC French site (cohort 2) and English site (cohort 3), a critical requirement given the use of narrative data, and accuracy will be assessed using chart review.
Generalizability will be determined by comparing AUCs from cohorts 2 and 3 to those from cohort 1. This study will likely produce more accurate and efficient measures of AEs. These measures could be used to assess the incidence rates of AEs, evaluate the success of preventive interventions, or benchmark performance across hospitals.
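A minimal sketch of the cohort-1 workflow described above, on simulated data with hypothetical feature names: fit a logistic regression on a 60% development split and report the AUC on the held-out 40% internal validation split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 3000
X = rng.normal(size=(n, 5))                     # hypothetical EHR-derived features
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 2.0     # invented true model for the AE label
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# 60% development set / 40% internal validation set, mirroring the cohort-1 split
idx = rng.permutation(n)
dev, val = idx[: int(0.6 * n)], idx[int(0.6 * n):]
model = LogisticRegression().fit(X[dev], y[dev])
auc = roc_auc_score(y[val], model.predict_proba(X[val])[:, 1])
print(round(auc, 2))
```

In the protocol, the same fitted models would then be scored on the external cohorts (2 and 3) and the AUCs compared to assess generalizability.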
Johnson, Julene K; Nápoles, Anna M; Stewart, Anita L; Max, Wendy B; Santoyo-Olsson, Jasmine; Freyre, Rachel; Allison, Theresa A; Gregorich, Steven E
2015-10-13
Older adults are the fastest growing segment of the United States population. There is an immediate need to identify novel, cost-effective community-based approaches that promote health and well-being for older adults, particularly those from diverse racial/ethnic and socioeconomic backgrounds. Because choral singing is multi-modal (requires cognitive, physical, and psychosocial engagement), it has the potential to improve health outcomes across several dimensions to help older adults remain active and independent. The purpose of this study is to examine the effect of a community choir program (Community of Voices) on health and well-being and to examine its costs and cost-effectiveness in a large sample of diverse, community-dwelling older adults. In this cluster randomized controlled trial, diverse adults age 60 and older were enrolled at Administration on Aging-supported senior centers and completed baseline assessments. The senior centers were randomly assigned to either start the choir immediately (intervention group) or wait 6 months to start (control). Community of Voices is a culturally tailored choir program delivered at the senior centers by professional music conductors that reflects three components of engagement (cognitive, physical, and psychosocial). We describe the nature of the study including the cluster randomized trial study design, sampling frame, sample size calculation, methods of recruitment and assessment, and primary and secondary outcomes. The study involves conducting a randomized trial of an intervention as delivered in "real-world" settings. The choir program was designed using a novel translational approach that integrated evidence-based research on the benefits of singing for older adults, community best practices related to community choirs for older adults, and the perspective of the participating communities. 
The practicality and relatively low cost of the choir intervention means it can be incorporated into a variety of community settings and adapted to diverse cultures and languages. If successful, this program will be a practical and acceptable community-based approach for promoting health and well-being of older adults. ClinicalTrials.gov NCT01869179 registered 9 January 2013.
Decision Tree Repository and Rule Set Based Mingjiang River Estuarine Wetlands Classification
NASA Astrophysics Data System (ADS)
Zhang, W.; Li, X.; Xiao, W.
2018-05-01
Increasing urbanization and industrialization have led to wetland losses in the estuarine area of the Mingjiang River over the past three decades, and increasing attention has been given to producing wetland inventories using remote sensing and GIS technology. Because of inconsistent training sites and training samples, traditional pixel-based image classification methods cannot achieve comparable results across different organizations. Object-oriented image classification techniques show great potential to solve this problem, and Landsat moderate-resolution remote sensing images are widely used to fulfill this requirement. First, standardized atmospheric correction and spectrally high-fidelity texture feature enhancement were conducted before implementing the object-oriented wetland classification method in eCognition. Second, we performed the multi-scale segmentation procedure, taking the scale, hue, shape, compactness and smoothness of the image into account to obtain appropriate parameters; using a top-down region-merge algorithm starting from the single-pixel level, the optimal texture segmentation scale for different feature types was confirmed. The segmented objects were then used as classification units to calculate spectral information such as the mean, maximum, minimum, brightness and normalized values. Spatial features such as the area, length, tightness and shape rule of each image object, and texture features such as the mean, variance and entropy of image objects, were used as classification features of the training samples. Based on reference images and on-the-spot sampling points, typical training samples were selected uniformly and randomly for each type of ground object. The value ranges of the spectral, texture and spatial characteristics of each feature type in each feature layer were used to create the decision tree repository.
Finally, with the help of high-resolution reference images, a random sampling method was used to conduct the field investigation, achieving an overall accuracy of 90.31 % and a Kappa coefficient of 0.88. The classification method, based on decision tree threshold values and the rule set developed from the repository, outperforms the results obtained with the traditional methodology. Our decision-tree-repository and rule-set based object-oriented classification technique was an effective method for producing comparable and consistent wetland data sets.
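A toy version of such a rule repository, with entirely hypothetical class names, features and thresholds, can make the threshold-based classification step concrete: each class is a set of feature-value intervals, and a segmented object is assigned to the first class whose intervals all contain the object's feature values.

```python
# hypothetical rule repository: per-class threshold intervals on object features
rules = {
    "open_water": {"mean_nir": (0.00, 0.10), "ndvi": (-1.0, 0.10)},
    "marsh":      {"mean_nir": (0.10, 0.35), "ndvi": (0.30, 0.80)},
    "mudflat":    {"mean_nir": (0.10, 0.35), "ndvi": (-0.2, 0.30)},
}

def classify(obj):
    """Return the first class whose every rule interval contains the feature value."""
    for label, conds in rules.items():
        if all(lo <= obj[f] <= hi for f, (lo, hi) in conds.items()):
            return label
    return "unclassified"

segments = [
    {"mean_nir": 0.05, "ndvi": 0.02},   # water-like segmented object
    {"mean_nir": 0.20, "ndvi": 0.55},   # vegetated wetland object
]
print([classify(s) for s in segments])  # ['open_water', 'marsh']
```

In the study, the intervals come from the spectral, spatial and texture value ranges of the training samples rather than being hand-picked as here.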
School-located Influenza Vaccinations for Adolescents: A Randomized Controlled Trial.
Szilagyi, Peter G; Schaffer, Stanley; Rand, Cynthia M; Goldstein, Nicolas P N; Vincelli, Phyllis; Hightower, A Dirk; Younge, Mary; Eagan, Ashley; Blumkin, Aaron; Albertin, Christina S; DiBitetto, Kristine; Yoo, Byung-Kwang; Humiston, Sharon G
2018-02-01
We aimed to evaluate the effect of school-located influenza vaccination (SLIV) on adolescents' influenza vaccination rates. In 2015-2016, we performed a cluster-randomized trial of adolescent SLIV in middle/high schools. We selected 10 pairs of schools (identical grades within pairs) and randomly allocated schools within pairs to SLIV or usual care control. At eight suburban SLIV schools, we sent parents e-mail notifications about upcoming SLIV clinics and promoted online immunization consent. At two urban SLIV schools, we sent parents (via student backpack fliers) paper immunization consent forms and information about SLIV. E-mails were unavailable at these schools. Local health department nurses administered nasal or injectable influenza vaccine at dedicated SLIV clinics and billed insurers. We compared influenza vaccination rates at SLIV versus control schools using school directories to identify the student sample in each school. We used the state immunization registry to determine receipt of influenza vaccination. The final sample comprised 17,650 students enrolled in the 20 schools. Adolescents at suburban SLIV schools had higher overall influenza vaccination rates than did adolescents at control schools (51% vs. 46%, p < .001; adjusted odds ratio = 1.27, 95% confidence interval 1.18-1.38, controlling for vaccination during the prior two seasons). No effect of SLIV was noted among urban schools in multivariate analysis. SLIV did not substitute for vaccinations in primary care or other settings; in suburban settings, SLIV was associated with increased vaccinations in primary care or other settings (adjusted odds ratio = 1.10, 95% confidence interval 1.02-1.19). SLIV in this community increased influenza vaccination rates among adolescents attending suburban schools. Copyright © 2018. Published by Elsevier Inc.
Knott, V; Rees, D J; Cheng, Z; Brownlee, G G
1988-01-01
Sets of overlapping cosmid clones generated by random sampling and fingerprinting methods complement data at pyrB (96.5') and oriC (84') in the published physical map of E. coli. A new cloning strategy using sheared DNA, and a low copy, inducible cosmid vector were used in order to reduce bias in libraries, in conjunction with micro-methods for preparing cosmid DNA from a large number of clones. Our results are relevant to the design of the best approach to the physical mapping of large genomes. PMID:2834694
A tale of two "forests": random forest machine learning aids tropical forest carbon mapping.
Mascaro, Joseph; Asner, Gregory P; Knapp, David E; Kennedy-Bowdoin, Ty; Martin, Roberta E; Anderson, Christopher; Higgins, Mark; Chadwick, K Dana
2014-01-01
Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including, in the latter case, x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called "out-of-bag"), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best performing run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha(-1) when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.
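The effect of spatial context can be reproduced qualitatively on synthetic data (an illustration of the idea, not the study's model or data): when the response has a smooth spatial component, adding x and y as predictors lets the random forest capture variation that the non-spatial covariate misses.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
x, y = rng.uniform(0, 100, n), rng.uniform(0, 100, n)
canopy = rng.uniform(0, 30, n)   # hypothetical LiDAR-derived canopy metric
# carbon has both a canopy-driven component and a smooth spatial component
carbon = 3 * canopy + 20 * np.sin(x / 15) + 20 * np.cos(y / 15) + rng.normal(0, 5, n)

X_plain = canopy.reshape(-1, 1)              # run without spatial context
X_spatial = np.column_stack([canopy, x, y])  # run with x, y included

r2 = {}
for name, X in [("no_context", X_plain), ("with_context", X_spatial)]:
    Xtr, Xte, ytr, yte = train_test_split(X, carbon, test_size=0.5, random_state=0)
    r2[name] = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ytr).score(Xte, yte)
print(r2)
```

As in the study, the held-out half of the data serves as a validation set that is stricter than the forest's internal out-of-bag estimate.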
Unique effects of setting goals on behavior change: Systematic review and meta-analysis.
Epton, Tracy; Currie, Sinead; Armitage, Christopher J
2017-12-01
Goal setting is a common feature of behavior change interventions, but it is unclear when goal setting is optimally effective. The aims of this systematic review and meta-analysis were to evaluate: (a) the unique effects of goal setting on behavior change, and (b) under what circumstances and for whom goal setting works best. Four databases were searched for articles that assessed the unique effects of goal setting on behavior change using randomized controlled trials. One hundred and forty-one papers were identified, from which 384 effect sizes (N = 16,523) were extracted and analyzed. A moderator analysis of sample characteristics, intervention characteristics, inclusion of other behavior change techniques, study design and delivery, quality of study, outcome measures, and behavior targeted was conducted. A random effects model indicated a small positive unique effect of goal setting across a range of behaviors, d = .34 (CI [.28, .41]). Moderator analyses indicated that goal setting was particularly effective if the goal was: (a) difficult, (b) set publicly, and (c) a group goal. There was weaker evidence that goal setting was more effective when paired with external monitoring of the behavior/outcome by others without feedback and delivered face-to-face. Goal setting is an effective behavior change technique that has the potential to be considered a fundamental component of successful interventions. The present review adds novel insights into the means by which goal setting might be augmented to maximize behavior change and sets the agenda for future programs of research.
Thompson, William L.; Miller, Amy E.; Mortenson, Dorothy C.; Woodward, Andrea
2011-01-01
Monitoring natural resources in Alaskan national parks is challenging because of their remoteness, limited accessibility, and high sampling costs. We describe an iterative, three-phased process for developing sampling designs based on our efforts to establish a vegetation monitoring program in southwest Alaska. In the first phase, we defined a sampling frame based on land ownership and specific vegetated habitats within the park boundaries and used Path Distance analysis tools to create a GIS layer that delineated portions of each park that could be feasibly accessed for ground sampling. In the second phase, we used simulations based on landcover maps to identify size and configuration of the ground sampling units (single plots or grids of plots) and to refine areas to be potentially sampled. In the third phase, we used a second set of simulations to estimate sample size and sampling frequency required to have a reasonable chance of detecting a minimum trend in vegetation cover for a specified time period and level of statistical confidence. Results of the first set of simulations indicated that a spatially balanced random sample of single plots from the most common landcover types yielded the most efficient sampling scheme. Results of the second set of simulations were compared with field data and indicated that we should be able to detect at least a 25% change in vegetation attributes over 31 years by sampling 8 or more plots per year every five years in focal landcover types. This approach would be especially useful in situations where ground sampling is restricted by access.
Signs of universality in the structure of culture
NASA Astrophysics Data System (ADS)
Băbeanu, Alexandru-Ionuţ; Talman, Leandros; Garlaschelli, Diego
2017-11-01
Understanding the dynamics of opinions, preferences and of culture as a whole requires more use of empirical data than has been done so far. It is clear that an important role in driving these dynamics is played by social influence, which is the essential ingredient of many quantitative models. Such models require that all traits are fixed when specifying the "initial cultural state". Typically, this initial state is randomly generated, from a uniform distribution over the set of possible combinations of traits. However, recent work has shown that the outcome of social influence dynamics strongly depends on the nature of the initial state. If the latter is sampled from empirical data instead of being generated in a uniformly random way, a higher level of cultural diversity is found after long-term dynamics, for the same level of propensity towards collective behavior in the short-term. Moreover, if the initial state is randomized by shuffling the empirical traits among people, the level of long-term cultural diversity is in-between those obtained for the empirical and uniformly random counterparts. The current study repeats the analysis for multiple empirical data sets, showing that the results are remarkably similar, although the matrix of correlations between cultural variables clearly differs across data sets. This points towards robust structural properties inherent in empirical cultural states, possibly due to universal laws governing the dynamics of culture in the real world. The results also suggest that these dynamics might be characterized by criticality and involve mechanisms beyond social influence.
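The shuffling procedure mentioned above, permuting each empirical trait independently across people so that marginal trait frequencies survive but inter-variable correlations do not, can be sketched on toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical empirical cultural state: rows = people, columns = cultural variables
empirical = rng.integers(0, 3, size=(6, 4))

# shuffled counterpart: permute each variable independently across people
shuffled = empirical.copy()
for j in range(shuffled.shape[1]):
    shuffled[:, j] = rng.permutation(shuffled[:, j])

# uniformly random counterpart: ignores even the empirical trait frequencies
uniform = rng.integers(0, 3, size=empirical.shape)

# marginal trait frequencies survive the shuffle (each column is a permutation)
print(np.array_equal(np.sort(empirical, axis=0), np.sort(shuffled, axis=0)))  # True
```

These three matrices correspond to the empirical, shuffled, and uniformly random initial states whose long-term diversity the study compares.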
Humphreys, Keith; Blodgett, Janet C.; Wagner, Todd H.
2014-01-01
Background Observational studies of Alcoholics Anonymous’ (AA) effectiveness are vulnerable to self-selection bias because individuals choose whether or not to attend AA. The present study therefore employed an innovative statistical technique to derive a selection bias-free estimate of AA’s impact. Methods Six datasets from 5 National Institutes of Health-funded randomized trials (one with two independent parallel arms) of AA facilitation interventions were analyzed using instrumental variables models. Alcohol dependent individuals in one of the datasets (n = 774) were analyzed separately from the rest of sample (n = 1582 individuals pooled from 5 datasets) because of heterogeneity in sample parameters. Randomization itself was used as the instrumental variable. Results Randomization was a good instrument in both samples, effectively predicting increased AA attendance that could not be attributed to self-selection. In five of the six data sets, which were pooled for analysis, increased AA attendance that was attributable to randomization (i.e., free of self-selection bias) was effective at increasing days of abstinence at 3-month (B = .38, p = .001) and 15-month (B = 0.42, p = .04) follow-up. However, in the remaining dataset, in which pre-existing AA attendance was much higher, further increases in AA involvement caused by the randomly assigned facilitation intervention did not affect drinking outcome. Conclusions For most individuals seeking help for alcohol problems, increasing AA attendance leads to short and long term decreases in alcohol consumption that cannot be attributed to self-selection. However, for populations with high pre-existing AA involvement, further increases in AA attendance may have little impact. PMID:25421504
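Using randomization itself as an instrument can be illustrated with a simulated two-stage least squares sketch (hypothetical numbers, not the study's data): an unobserved confounder biases the naive regression of outcome on attendance, while the IV estimate recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
Z = rng.integers(0, 2, n)                       # randomized facilitation (the instrument)
U = rng.normal(0, 1, n)                         # unobserved motivation (confounder)
X = 5 + 3 * Z + 2 * U + rng.normal(0, 1, n)     # AA attendance: boosted by Z and by U
Y = 10 + 0.5 * X + 4 * U + rng.normal(0, 1, n)  # abstinence: true effect of X is 0.5

# naive OLS is biased upward because U drives both attendance and outcome
ols = np.polyfit(X, Y, 1)[0]

# two-stage least squares: regress X on Z, then Y on the fitted attendance
X_hat = np.poly1d(np.polyfit(Z, X, 1))(Z)
iv = np.polyfit(X_hat, Y, 1)[0]
print(round(ols, 2), round(iv, 2))
```

With a binary instrument this reduces to the Wald estimator (difference in mean outcomes over difference in mean attendance between arms), which is free of the self-selection bias that contaminates the naive slope.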
Measuring Data Quality Through a Source Data Verification Audit in a Clinical Research Setting.
Houston, Lauren; Probst, Yasmine; Humphries, Allison
2015-01-01
Health data have long been scrutinised in relation to data quality and integrity problems. Currently, no internationally accepted or "gold standard" method exists for measuring data quality and error rates within datasets. We conducted a source data verification (SDV) audit on a prospective clinical trial dataset. An audit plan was applied to conduct 100% manual verification checks on a 10% random sample of participant files. A quality assurance rule was developed, whereby if >5% of data variables were incorrect a second 10% random sample would be extracted from the trial data set. Each error was coded as: correct, incorrect (valid or invalid), not recorded or not entered. Audit-1 had a total error rate of 33% and audit-2 of 36%. The physiological section was the only audit section to have <5% error. Data not recorded to case report forms had the greatest impact on error calculations. A significant association (p = 0.00) was found between audit-1 and audit-2 and whether or not data were deemed correct or incorrect. Our study developed a straightforward method to perform an SDV audit. An audit rule was identified and error coding was implemented. The findings demonstrate that monitoring data quality through an SDV audit can identify data quality and integrity issues within clinical research settings, allowing quality improvements to be made. The authors suggest this approach be implemented for future research.
Assessment of the hygienic performances of hamburger patty production processes.
Gill, C O; Rahn, K; Sloan, K; McMullen, L M
1997-05-20
The hygienic conditions of the hamburger patties collected from three patty manufacturing plants and six retail outlets were examined. At each manufacturing plant a sample from newly formed, chilled patties and one from frozen patties were collected from each of 25 batches of patties selected at random. At three, two or one retail outlet, respectively, 25 samples from frozen, chilled or both frozen and chilled patties were collected at random. Each sample consisted of 30 g of meat obtained from five or six patties. Total aerobic, coliform and Escherichia coli counts per gram were enumerated for each sample. The mean log (x) and standard deviation (s) were calculated for the log10 values for each set of 25 counts, on the assumption that the distribution of counts approximated the log normal. A value for the log10 of the arithmetic mean (log A) was calculated for each set from the values of x and s. A chi2 statistic was calculated for each set as a test of the assumption of the log normal distribution. The chi2 statistic was calculable for 32 of the 39 sets. Four of the sets gave chi2 values indicative of gross deviation from log normality. On inspection of those sets, distributions obviously differing from the log normal were apparent in two. Log A values for total, coliform and E. coli counts for chilled patties from manufacturing plants ranged from 4.4 to 5.1, 1.7 to 2.3 and 0.9 to 1.5, respectively. Log A values for frozen patties from manufacturing plants were between < 0.1 and 0.5 log10 units less than the equivalent values for chilled patties. Log A values for total, coliform and E. coli counts for frozen patties on retail sale ranged from 3.8 to 8.5, < 0.5 to 3.6 and < 0 to 1.9, respectively. The equivalent ranges for chilled patties on retail sale were 4.8 to 8.5, 1.8 to 3.7 and 1.4 to 2.7, respectively. 
The findings indicate that the general hygienic condition of hamburger patties could be improved by manufacturing them only from manufacturing beef of superior hygienic quality, and by better management of chilled patties at retail outlets.
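The log A statistic used above follows from a standard property of the log-normal distribution: if log10 counts are normal with mean x and standard deviation s, the log10 of the arithmetic mean is log A = x + (ln 10 / 2) s². A minimal sketch on simulated log10 counts (illustrative values, not the paper's data), with a normality test standing in for the paper's chi² check of the log-normal assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated log10 counts for one set of 25 samples (illustrative
# values, not the paper's data): normal with mean 4.7, SD 0.5.
log_counts = rng.normal(loc=4.7, scale=0.5, size=25)

x = log_counts.mean()            # mean log (x)
s = log_counts.std(ddof=1)       # standard deviation (s)

# log10 of the arithmetic mean of a log-normal distribution:
# log A = x + (ln 10 / 2) * s^2
log_A = x + (np.log(10) / 2) * s**2

# Goodness-of-fit check on the log-normal assumption, here via a
# normality test on the log10 values (analogous in spirit, not in
# form, to the paper's chi2 statistic).
chi2_stat, p_value = stats.normaltest(log_counts)
print(f"x = {x:.2f}, s = {s:.2f}, log A = {log_A:.2f}, normality p = {p_value:.2f}")
```

Note that log A always exceeds x, which is why summarizing skewed count data by the mean of the logs alone understates the arithmetic mean concentration.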
Kubota, Chika; Okada, Takashi; Aleksic, Branko; Nakamura, Yukako; Kunimoto, Shohko; Morikawa, Mako; Shiino, Tomoko; Tamaji, Ai; Ohoka, Harue; Banno, Naomi; Morita, Tokiko; Murase, Satomi; Goto, Setsuko; Kanai, Atsuko; Masuda, Tomoko; Ando, Masahiko; Ozaki, Norio
2014-01-01
The Edinburgh Postnatal Depression Scale (EPDS) is a widely used screening tool for postpartum depression (PPD). Although the reliability and validity of the EPDS in Japanese have been confirmed, and the prevalence of PPD has been found to be about the same as in Western countries, the factor structure of the Japanese version of the EPDS has not yet been elucidated. 690 Japanese mothers completed all items of the EPDS at 1 month postpartum. We divided them randomly into two sample sets. The first sample set (n = 345) was used for exploratory factor analysis, and the second sample set (n = 345) was used for confirmatory factor analysis. The exploratory factor analysis indicated a three-factor model consisting of anxiety, depression and anhedonia. The confirmatory factor analysis suggested that the anxiety and anhedonia factors existed for the EPDS in a sample of Japanese women at 1 month postpartum, while the depression factor varied across the models of acceptable fit. Thus "anxiety" and "anhedonia" factors exist for the EPDS among postpartum women in Japan, as already reported in Western countries. Cross-cultural research is needed in future work.
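The split-half design above (exploratory analysis on one random half, confirmation on the other) can be sketched as follows. The item data, loadings and factor count are synthetic assumptions for illustration, not the published EPDS results, and the "confirmation" step is a crude model comparison rather than a full CFA:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)

# Simulate 690 respondents x 10 items driven by three latent factors
# (synthetic loadings, not the published EPDS structure).
n, n_items, n_factors = 690, 10, 3
loadings = rng.normal(scale=0.8, size=(n_items, n_factors))
scores = rng.normal(size=(n, n_factors))
items = scores @ loadings.T + rng.normal(scale=0.5, size=(n, n_items))

# Random split into two equal sample sets, as in the study design.
idx = rng.permutation(n)
explore, confirm = items[idx[:345]], items[idx[345:]]

# Exploratory factor analysis on the first half.
fa3 = FactorAnalysis(n_components=n_factors).fit(explore)

# Crude check on the held-out half: the 3-factor model should fit the
# second sample better than a 1-factor model (a full CFA would fix the
# loading pattern; that is beyond this sketch).
fa1 = FactorAnalysis(n_components=1).fit(explore)
print(f"3-factor held-out score: {fa3.score(confirm):.2f}, "
      f"1-factor held-out score: {fa1.score(confirm):.2f}")
```

Fitting on one half and scoring on the other guards against the factor structure being an artifact of one particular sample.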
Rare event simulation in radiation transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kollman, Craig
1993-10-01
This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable which is, with overwhelming probability, equal to zero. These problems often have high-dimensional state spaces and irregular geometries, so analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded, the probability of any one particular particle getting through is very small. Because of the large number of particles involved, even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well-known technique for improving the efficiency of rare event calculations. Here, a new set of probabilities is used in the simulation runs, and the results are multiplied by the likelihood ratio between the true and simulated probabilities so as to keep the estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities is chosen. It is shown that a zero-variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive "learning" algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give, with probability one, a sequence of estimates converging exponentially fast to the true solution.
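The importance sampling idea described above (simulate under changed probabilities, reweight by the likelihood ratio) can be illustrated on a toy rare event: estimating P(Z > 4) for a standard normal. This mean-shift scheme is a textbook example, not the dissertation's neutron transport model; the threshold and sample size are arbitrary choices.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n = 100_000
threshold = 4.0

# Naive Monte Carlo: almost no samples land in the rare region.
z = rng.normal(size=n)
naive = np.mean(z > threshold)

# Importance sampling: draw from N(threshold, 1) so the rare region
# is hit often, then multiply each hit by the likelihood ratio
# phi(x) / phi(x - threshold) to keep the estimator unbiased.
x = rng.normal(loc=threshold, size=n)
ratio = np.exp(-threshold * x + threshold**2 / 2)   # N(0,1) pdf / N(threshold,1) pdf
is_est = np.mean((x > threshold) * ratio)

true_p = 0.5 * (1 - erf(threshold / sqrt(2)))       # P(Z > 4), about 3.2e-5
print(f"naive: {naive:.2e}, importance sampling: {is_est:.2e}, true: {true_p:.2e}")
```

With the same budget the naive estimator sees only a handful of hits (often zero), while the reweighted estimator attains a relative error well under one percent, which is the efficiency gain the dissertation exploits for shielded geometries.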
McGarvey, Richard; Burch, Paul; Matthews, Janet M
2016-01-01
Natural populations of plants and animals spatially cluster because (1) suitable habitat is patchy, and (2) within suitable habitat, individuals aggregate further into clusters of higher density. We compare the precision of random and systematic field sampling survey designs under these two processes of species clustering. Second, we evaluate the performance of 13 estimators for the variance of the sample mean from a systematic survey. Replicated simulated surveys, as counts from 100 transects, allocated either randomly or systematically within the study region, were used to estimate population density in six spatial point populations including habitat patches and Matérn circular clustered aggregations of organisms, together and in combination. The standard one-start aligned systematic survey design, a uniform 10 x 10 grid of transects, was much more precise. Variances of the 10 000 replicated systematic survey mean densities were one-third to one-fifth of those from randomly allocated transects, implying transect sample sizes giving equivalent precision by random survey would need to be three to five times larger. Organisms being restricted to patches of habitat was alone sufficient to yield this precision advantage for the systematic design. But this improved precision for systematic sampling in clustered populations is underestimated by standard variance estimators used to compute confidence intervals. True variance for the survey sample mean was computed from the variance of 10 000 simulated survey mean estimates. Testing 10 published and three newly proposed variance estimators, the two variance estimators (v) that corrected for inter-transect correlation (ν₈ and ν(W)) were the most accurate and also the most precise in clustered populations. These greatly outperformed the two "post-stratification" variance estimators (ν₂ and ν₃) that are now more commonly applied in systematic surveys. 
Similar variance estimator performance rankings were found with a second, differently generated set of spatial point populations, ν₈ and ν(W) again being the best performers in the longer-range autocorrelated populations. However, no systematic variance estimator tested was free from bias. On balance, systematic designs give narrower confidence intervals in clustered populations, while random designs permit unbiased estimates of (often wider) confidence intervals. The search continues for better estimators of sampling variance for the systematic survey mean.
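The precision advantage of systematic over random transect allocation reported above can be reproduced in miniature. The patch layout, grid size and replicate counts below are arbitrary illustrative choices, not the paper's simulation design:

```python
import numpy as np

rng = np.random.default_rng(4)

# Build a 100 x 100 grid of organism counts clustered into habitat
# patches (a smooth density surface thinned into Poisson counts).
side = 100
yy, xx = np.mgrid[0:side, 0:side]
density = np.zeros((side, side))
for cx, cy in [(20, 30), (55, 70), (80, 15)]:          # patch centres
    density += 8 * np.exp(-((xx - cx)**2 + (yy - cy)**2) / (2 * 8**2))
pop = rng.poisson(density)
true_mean = pop.mean()

# Random design: 100 transects (cells) drawn uniformly (with
# replacement, for simplicity), replicated many times.
n_rep, n_tr = 2000, 100
flat = pop.ravel()
rand_means = np.array([flat[rng.integers(0, flat.size, n_tr)].mean()
                       for _ in range(n_rep)])

# Systematic design: one-start aligned 10 x 10 grid; the only
# randomness is the starting offset, so enumerate all 100 starts.
sys_means = np.array([pop[oy::10, ox::10].mean()
                      for oy in range(10) for ox in range(10)])

print(f"true mean {true_mean:.3f}")
print(f"var(random) {rand_means.var():.4f}  var(systematic) {sys_means.var():.4f}")
```

Because every systematic grid spreads its transects evenly across the patches, the between-start variance is several times smaller than the variance of random allocation, mirroring the three- to five-fold sample-size equivalence the study reports.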
NASA Astrophysics Data System (ADS)
Li, Xiayue; Curtis, Farren S.; Rose, Timothy; Schober, Christoph; Vazquez-Mayagoitia, Alvaro; Reuter, Karsten; Oberhofer, Harald; Marom, Noa
2018-06-01
We present Genarris, a Python package that performs configuration space screening for molecular crystals of rigid molecules by random sampling with physical constraints. For fast energy evaluations, Genarris employs a Harris approximation, whereby the total density of a molecular crystal is constructed via superposition of single molecule densities. Dispersion-inclusive density functional theory is then used for the Harris density without performing a self-consistency cycle. Genarris uses machine learning for clustering, based on a relative coordinate descriptor developed specifically for molecular crystals, which is shown to be robust in identifying packing motif similarity. In addition to random structure generation, Genarris offers three workflows based on different sequences of successive clustering and selection steps: the "Rigorous" workflow is an exhaustive exploration of the potential energy landscape, the "Energy" workflow produces a set of low energy structures, and the "Diverse" workflow produces a maximally diverse set of structures. The latter is recommended for generating initial populations for genetic algorithms. Here, the implementation of Genarris is reported and its application is demonstrated for three test cases.
Comparative study of feature selection with ensemble learning using SOM variants
NASA Astrophysics Data System (ADS)
Filali, Ameni; Jlassi, Chiraz; Arous, Najet
2017-03-01
Ensemble learning has improved the stability and accuracy of clustering, but its runtime prevents scaling up to real-world applications. This study addresses the problem of selecting a subset of the most pertinent features for every cluster in a dataset. The proposed method is an extension of the Random Forests approach, using self-organizing map (SOM) variants on unlabeled data, that estimates out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable importance in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims at dimensionality reduction, visualization and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvements in clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.
Does rational selection of training and test sets improve the outcome of QSAR modeling?
Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander
2012-10-22
Prior to using a quantitative structure-activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models than random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
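Of the rational division methods named above, Kennard-Stone is the most commonly sketched: start from the two most distant samples, then repeatedly add the sample farthest from the already-selected set. A minimal implementation on random descriptor vectors (the descriptor matrix here is a synthetic stand-in, not a QSAR dataset):

```python
import numpy as np

def kennard_stone(X, n_train):
    """Select n_train rows of X for the training set by the
    Kennard-Stone (CADEX) algorithm: seed with the two mutually most
    distant samples, then repeatedly add the remaining sample whose
    nearest selected neighbour is farthest away."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(dist.argmax(), dist.shape)
    selected = [i, j]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # distance of each remaining sample to its nearest selected one
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        chosen = remaining[int(d_min.argmax())]
        selected.append(chosen)
        remaining.remove(chosen)
    return selected, remaining

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4))            # 50 hypothetical descriptor vectors
train_idx, test_idx = kennard_stone(X, n_train=40)
print(len(train_idx), len(test_idx))    # 40 10
```

The pairwise distance matrix makes this O(n²) in memory, which is fine for typical modeling-set sizes; the design goal, as in the study, is a training set that spans descriptor space rather than one drawn at random.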
Melfsen, Andreas; Hartung, Eberhard; Haeussermann, Angelika
2013-02-01
The robustness of in-line raw milk analysis with near-infrared spectroscopy (NIRS) was tested with respect to the prediction of the raw milk contents fat, protein and lactose. Near-infrared (NIR) spectra of raw milk (n = 3119) were acquired on three different farms during the milking process of 354 milkings over a period of six months. Calibration models were calculated for: a random data set of each farm (fully random internal calibration); the first two thirds of the visits per farm (internal calibration); the whole datasets of two of the three farms (external calibration); and combinations of external and internal datasets. Validation was done either on the remaining data set per farm (internal validation) or on data of the remaining farms (external validation). Excellent calibration results were obtained when fully randomised internal calibration sets were used for milk analysis. In this case, RPD values of around ten, five and three were achieved for the prediction of fat, protein and lactose content, respectively. Farm-internal calibrations achieved much poorer prediction results, especially for the prediction of protein and lactose, with RPD values of around two and one, respectively. The prediction accuracy improved when validation was done on spectra of an external farm, mainly due to the higher sample variation in external calibration sets in terms of feeding diets and individual cow effects. The results showed that further improvements were achieved when additional farm information was added to the calibration set. One of the main requirements for a robust calibration model is the ability to predict milk constituents in unknown future milk samples. The robustness and quality of prediction increase with increasing variation of, e.g., feeding and cow-individual milk composition in the calibration model.
Qin, Li-Xuan; Levine, Douglas A
2016-06-10
Accurate discovery of molecular biomarkers that are prognostic of a clinical outcome is an important yet challenging task, partly due to the combination of the typically weak genomic signal for a clinical outcome and the frequently strong noise due to microarray handling effects. Effective strategies to resolve this challenge are in dire need. We set out to assess the use of careful study design and data normalization for the discovery of prognostic molecular biomarkers. Taking progression free survival in advanced serous ovarian cancer as an example, we conducted empirical analysis on two sets of microRNA arrays for the same set of tumor samples: arrays in one set were collected using careful study design (that is, uniform handling and randomized array-to-sample assignment) and arrays in the other set were not. We found that (1) handling effects can confound the clinical outcome under study as a result of chance even with randomization, (2) the level of confounding handling effects can be reduced by data normalization, and (3) good study design cannot be replaced by post-hoc normalization. In addition, we provided a practical approach to define positive and negative control markers for detecting handling effects and assessing the performance of a normalization method. Our work showcased the difficulty of finding prognostic biomarkers for a clinical outcome of weak genomic signals, illustrated the benefits of careful study design and data normalization, and provided a practical approach to identify handling effects and select a beneficial normalization method. Our work calls for careful study design and data analysis for the discovery of robust and translatable molecular biomarkers.
Harding, Elizabeth; Beckworth, Colin; Fesselet, Jean-Francois; Lenglet, Annick; Lako, Richard; Valadez, Joseph J
2017-08-08
Humanitarian agencies working in refugee camp settings require rapid assessment methods to measure the needs of the populations they serve. Due to the high level of dependency of refugees, agencies need to carry out these assessments. Lot Quality Assurance Sampling (LQAS) is a method commonly used in development settings to assess populations living in a project catchment area to identify their greatest needs. LQAS could be well suited to serve the needs of refugee populations, but it has rarely been used in humanitarian settings. We adapted and implemented an LQAS survey design in Batil refugee camp, South Sudan in May 2013 to measure the added value of using it for sub-camp level assessment. Using pre-existing divisions within the camp, we divided the Batil catchment area into six contiguous segments, called 'supervision areas' (SA). Six teams of two data collectors randomly selected 19 respondents in each SA, who they interviewed to collect information on water, sanitation, hygiene, and diarrhoea prevalence. These findings were aggregated into a stratified random sample of 114 respondents, and the results were analysed to produce a coverage estimate with 95% confidence interval for the camp and to prioritize SAs within the camp. The survey provided coverage estimates on WASH indicators as well as evidence that areas of the camp closer to the main road, to clinics and to the market were better served than areas at the periphery of the camp. This assumption did not hold for all services, however, as sanitation services were uniformly high regardless of location. While it was necessary to adapt the standard LQAS protocol used in low-resource communities, the LQAS model proved to be feasible in a refugee camp setting, and program managers found the results useful at both the catchment area and SA level. 
This study, one of the few adaptations of LQAS for a camp setting, shows that it is a feasible method for regular monitoring, with the added value of enabling camp managers to identify and advocate for the least served areas within the camp. Feedback on the results from stakeholders was overwhelmingly positive.
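The LQAS arithmetic described above is simple enough to sketch end to end: 19 respondents per supervision area, six SAs aggregated into a stratified sample of 114 for a camp-wide estimate, plus a per-SA pass/fail classification. The coverage levels and the 80% target with decision rule d = 13 (a commonly cited LQAS table value for n = 19) are illustrative assumptions, not the Batil survey data.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated yes/no indicator (e.g., household treats drinking water)
# for 19 respondents in each of 6 supervision areas -- illustrative
# coverage levels, not the Batil survey data.
true_coverage = [0.9, 0.8, 0.85, 0.6, 0.5, 0.9]
samples = [rng.binomial(1, p, 19) for p in true_coverage]

# Aggregate the six SA samples into one stratified sample of 114 and
# compute the camp-wide estimate with a 95% CI (equal-size strata, so
# the stratified mean is the simple mean of the SA means).
sa_means = np.array([s.mean() for s in samples])
p_hat = sa_means.mean()
se = np.sqrt(sum(s.var(ddof=1) / 19 for s in samples) / 36)
print(f"camp coverage: {p_hat:.2f} "
      f"(95% CI {p_hat - 1.96 * se:.2f} to {p_hat + 1.96 * se:.2f})")

# LQAS classification per SA: with n = 19, an SA 'passes' an 80%
# target if at least d = 13 respondents report the behaviour.
passes = [s.sum() >= 13 for s in samples]
print("SA pass/fail vs 80% target:", passes)
```

The two outputs correspond to the two uses the study highlights: a catchment-wide coverage estimate for reporting, and SA-level classifications for prioritizing the least served areas.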
Thomas P. Holmes; Kevin J. Boyle
2005-01-01
A hybrid stated-preference model is presented that combines the referendum contingent valuation response format with an experimentally designed set of attributes. A sequence of valuation questions is asked to a random sample in a mailout mail-back format. Econometric analysis shows greater discrimination between alternatives in the final choice in the sequence, and the...
Baseline Survey of Sun Protection Policies and Practices in Primary School Settings in New Zealand
ERIC Educational Resources Information Center
Reeder, A. I.; Jopson, J. A.; Gray, A.
2009-01-01
The SunSmart Schools Accreditation Programme (SSAP) was launched as a national programme in October 2005 to help reduce the risk of excessive child exposure to ultraviolet radiation. As part of the need for evaluation, this paper reports the findings of a national survey of a randomly selected sample of approximately 12% of New Zealand primary…
Development and Evaluation of the Arabic Filial Piety Scale
ERIC Educational Resources Information Center
Khalaila, Rabia
2010-01-01
Objective: To examine the validity and reliability of a new Arabic Filial Piety Scale (AFPS) for use with informal Arab caregivers. Background: Filial piety is a term used to describe a set of family values in relation to parental care. This is the first measure of this construct for use with Arab populations in Israel. Method: A random sample of…
Detecting and preventing error propagation via competitive learning.
Silva, Thiago Christiano; Zhao, Liang
2013-05-01
Semisupervised learning is a machine learning approach which is able to employ both labeled and unlabeled samples in the training process. It is an important mechanism for autonomous systems due to its ability to exploit already acquired information while exploring new knowledge in the learning space at the same time. In these cases, the reliability of the labels is a crucial factor, because mislabeled samples may propagate wrong labels to a portion of, or even the entire, data set. This paper addresses the error propagation problem caused by such mislabeled samples by presenting a mechanism embedded in a network-based (graph-based) semisupervised learning method. The procedure is based on a combined random-preferential walk of particles in a network constructed from the input data set. Particles of the same class cooperate with each other, while particles of different classes compete with each other to propagate class labels to the whole network. Computer simulations conducted on synthetic and real-world data sets reveal the effectiveness of the model.
[Tobacco quality analysis of producing areas of Yunnan tobacco using near-infrared (NIR) spectrum].
Wang, Yi; Ma, Xiang; Wen, Ya-Dong; Yu, Chun-Xia; Wang, Luo-Ping; Zhao, Long-Lian; Li, Jun-Hui
2013-01-01
In the present study, tobacco quality analysis of different producing areas was carried out by applying spectrum projection and correlation methods. The industrial classification data were near-infrared (NIR) spectra, acquired in 2010, of the middle parts of tobacco plants from Hongta Tobacco (Group) Co., Ltd. A total of 1276 superior tobacco leaf samples were collected from four producing areas: three areas (Yuxi, Chuxiong and Zhaotong) in Yunnan province, all growing the tobacco variety K326, and one area (Dali) growing the variety Hongda. When the samples were randomly divided into analysis and verification sets in a 2 : 1 ratio, the verification set corresponded with the analysis set under spectrum projection, their correlation coefficients for the first and second dimensional projections all being above 0.99. The study also discussed a method for obtaining quantitative similarity values between samples from different producing areas. These similarity values are instructive for tobacco plant planning, quality management, acquisition of raw tobacco materials and tobacco leaf blending.
Paoletti, Claudia; Esbensen, Kim H
2015-01-01
Material heterogeneity influences the effectiveness of sampling procedures. Most sampling guidelines used for assessment of food and/or feed commodities are based on classical statistical distribution requirements (the normal, binomial, and Poisson distributions) and almost universally rely on the assumption of randomness. However, this is unrealistic. The scientific food and feed community recognizes a strong preponderance of non-random distribution within commodity lots, which should be a more realistic prerequisite for the definition of effective sampling protocols. Nevertheless, these heterogeneity issues are overlooked, as the prime focus is often placed only on financial, time, equipment, and personnel constraints instead of mandating acquisition of documented representative samples under realistic heterogeneity conditions. This study shows how the principles promulgated in the Theory of Sampling (TOS), practically tested over 60 years, provide an effective framework for dealing with the complete set of adverse aspects of both compositional and distributional heterogeneity (material sampling errors), as well as with the errors incurred by the sampling process itself. The results of an empirical European Union study on genetically modified soybean heterogeneity, the Kernel Lot Distribution Assessment, are summarized, as they have a strong bearing on the issue of proper sampling protocol development. TOS principles apply universally in the food and feed realm and must therefore be considered the only basis for development of valid sampling protocols free from distributional constraints.
Methods and analysis of realizing randomized grouping.
Hu, Liang-Ping; Bao, Xiao-Lei; Wang, Qi
2011-07-01
Randomization is one of the four basic principles of research design. The meaning of randomization includes two aspects: one is to randomly select samples from the population, which is known as random sampling; the other is to randomly group all the samples, which is called randomized grouping. Randomized grouping can be subdivided into three categories: completely, stratified and dynamically randomized grouping. This article mainly introduces the steps of complete randomization, the definition of dynamic randomization and the realization of random sampling and grouping by SAS software.
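The article implements complete randomization in SAS; the same allocation step can be sketched in Python. The group names and subject count are illustrative assumptions:

```python
import random

def complete_randomization(subject_ids, groups=("treatment", "control"), seed=42):
    """Completely randomized grouping: shuffle all subjects, then deal
    them out to the groups in equal-sized blocks (a Python sketch of
    the SAS procedure the article describes)."""
    rng = random.Random(seed)        # fixed seed makes the allocation reproducible
    ids = list(subject_ids)
    rng.shuffle(ids)
    size = len(ids) // len(groups)
    return {g: ids[i * size:(i + 1) * size] for i, g in enumerate(groups)}

allocation = complete_randomization(range(1, 21))
print(allocation)                    # 10 subjects per group, fixed by the seed
```

Stratified and dynamic randomization extend this by shuffling within strata or adjusting assignment probabilities as enrolment proceeds; the complete version above is the base case the article lists first.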
Rincent, R; Laloë, D; Nicolas, S; Altmann, T; Brunel, D; Revilla, P; Rodríguez, V M; Moreno-Gonzalez, J; Melchinger, A; Bauer, E; Schoen, C-C; Meyer, N; Giauffret, C; Bauland, C; Jamin, P; Laborde, J; Monod, H; Flament, P; Charcosset, A; Moreau, L
2012-10-01
Genomic selection refers to the use of genotypic information for predicting breeding values of selection candidates. A prediction formula is calibrated with the genotypes and phenotypes of reference individuals constituting the calibration set. The size and the composition of this set are essential parameters affecting the prediction reliabilities. The objective of this study was to maximize reliabilities by optimizing the calibration set. Different criteria based on the diversity or on the prediction error variance (PEV) derived from the realized additive relationship matrix-best linear unbiased predictions model (RA-BLUP) were used to select the reference individuals. For the latter, we considered the mean of the PEV of the contrasts between each selection candidate and the mean of the population (PEVmean) and the mean of the expected reliabilities of the same contrasts (CDmean). These criteria were tested with phenotypic data collected on two diversity panels of maize (Zea mays L.) genotyped with a 50k SNPs array. In the two panels, samples chosen based on CDmean gave higher reliabilities than random samples for various calibration set sizes. CDmean also appeared superior to PEVmean, which can be explained by the fact that it takes into account the reduction of variance due to the relatedness between individuals. Selected samples were close to optimality for a wide range of trait heritabilities, which suggests that the strategy presented here can efficiently sample subsets in panels of inbred lines. A script to optimize reference samples based on CDmean is available on request.
Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella
2016-12-09
Measuring toxicity is one of the main steps in drug development; hence, there is high demand for computational models to predict the toxicity effects of potential drugs. In this study, we used a dataset covering four toxicity effects: mutagenic, tumorigenic, irritant and reproductive. The proposed model consists of three phases. In the first phase, rough-set-based methods are used to select the most discriminative features, reducing the classification time and improving the classification performance. Due to the imbalanced class distribution, in the second phase different sampling methods such as Random Under-Sampling, Random Over-Sampling and the Synthetic Minority Oversampling Technique are used to address the problem of imbalanced datasets. An ITerative Sampling (ITS) method is proposed to avoid the limitations of those methods. The ITS method has two steps. The first step (sampling step) iteratively modifies the prior distribution of the minority and majority classes. In the second step, a data cleaning method is used to remove the overlapping produced by the first step. In the third phase, a Bagging classifier is used to classify an unknown drug as toxic or non-toxic. The experimental results showed that the proposed model performed well in classifying unknown samples according to all toxic effects in the imbalanced datasets.
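Of the baseline sampling methods named above, Random Under-Sampling and Random Over-Sampling are simple enough to sketch directly (the proposed ITS method is not reproduced here). The toy class sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def random_undersample(X, y):
    """Random Under-Sampling: drop majority-class rows at random until
    every class matches the size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([rng.choice(np.where(y == c)[0], n_min, replace=False)
                           for c in classes])
    return X[keep], y[keep]

def random_oversample(X, y):
    """Random Over-Sampling: duplicate minority-class rows at random
    until every class matches the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = np.concatenate([rng.choice(np.where(y == c)[0], n_max, replace=True)
                           for c in classes])
    return X[keep], y[keep]

# Imbalanced toy set: 90 non-toxic vs 10 toxic samples.
X = rng.normal(size=(100, 5))
y = np.array([0] * 90 + [1] * 10)
Xu, yu = random_undersample(X, y)
Xo, yo = random_oversample(X, y)
print(len(yu), len(yo))   # 20 180
```

Under-sampling discards potentially useful majority examples while over-sampling risks overfitting to duplicated minority rows; these are exactly the limitations that motivate the paper's iterative alternative.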
Experimental design and efficient parameter estimation in preclinical pharmacokinetic studies.
Ette, E I; Howie, C A; Kelman, A W; Whiting, B
1995-05-01
A Monte Carlo simulation technique used to evaluate the effect of the arrangement of concentrations on the efficiency of estimation of population pharmacokinetic parameters in the preclinical setting is described. Although the simulations were restricted to the one-compartment model with intravenous bolus input, they provide the basis for discussing some structural aspects involved in designing a destructive ("quantic") preclinical population pharmacokinetic study with a fixed sample size, as is usually the case in such studies. The efficiency of parameter estimation obtained with sampling strategies based on three- and four-time-point designs was evaluated in terms of the percent prediction error, design number, individual and joint confidence interval coverage for parameter estimates, and correlation analysis. The data sets contained random terms for both inter-animal and residual intra-animal variability. The results showed that the typical population parameter estimates for clearance and volume were efficiently (accurately and precisely) estimated for both designs, while inter-animal variability (the only random effect parameter that could be estimated) was inefficiently (inaccurately and imprecisely) estimated with most sampling schedules of the two designs. The exact location of the third and fourth time points, for the three- and four-time-point designs respectively, was not critical to the efficiency of overall estimation of all population parameters of the model. However, some individual population pharmacokinetic parameters were sensitive to the location of these times.
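The simulation setup described above (one-compartment IV bolus with inter-animal and residual variability, sampled at a few fixed time points) can be sketched as follows. All parameter values, variabilities and time points are illustrative assumptions, not the study's design:

```python
import numpy as np

rng = np.random.default_rng(8)

# One-compartment IV bolus model: C(t) = (Dose / V) * exp(-(CL / V) * t).
# Population values and variabilities below are illustrative assumptions.
dose = 100.0                      # mg
cl_pop, v_pop = 2.0, 10.0         # population clearance (L/h) and volume (L)
omega_cl, omega_v = 0.3, 0.2      # inter-animal variability (log-normal scale)
sigma = 0.1                       # residual proportional error

def simulate_animal(times):
    """One animal's observed concentrations at the design time points."""
    cl = cl_pop * np.exp(rng.normal(0, omega_cl))     # inter-animal term
    v = v_pop * np.exp(rng.normal(0, omega_v))
    conc = (dose / v) * np.exp(-(cl / v) * times)
    return conc * (1 + rng.normal(0, sigma, times.shape))  # residual error

# A three-time-point destructive design: each animal contributes
# observations at the assigned sampling times only.
times = np.array([0.5, 2.0, 8.0])
data = np.array([simulate_animal(times) for _ in range(30)])
print("mean concentrations:", data.mean(axis=0).round(2))
```

Repeating such simulations while varying the time-point arrangement, then fitting the population model to each simulated data set, is the mechanism by which the study compares the efficiency of candidate designs.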
Dolan, Paul; Rudisill, Caroline
2014-01-01
Financial incentives have been used in a variety of settings to motivate behaviors that might not otherwise be undertaken. They have been highlighted as particularly useful in settings that require a single behavior, such as appointment attendance or vaccination. They also have differential effects based on socioeconomic status in some applications (e.g. smoking). To further investigate these claims, we tested the effect of providing different types of non-cash financial incentives on the return rates of chlamydia specimen samples amongst 16–24 year-olds in England. In 2011 and 2012, we ran a two-stage randomized experiment involving 2988 young people (1489 in Round 1 and 1499 in Round 2) who requested a chlamydia screening kit from Freetest.me, an online and text screening service run by Preventx Limited. Participants were randomized to control, or to one of five types of financial incentive in Round 1 or one of four in Round 2. We tested the effect of five types of incentives on specimen sample return: reward vouchers of differing values, charity donation, participation in a lottery, choices between a lottery and a voucher, and including vouchers of differing values in the test kit prior to specimen return. Financial incentives of any type did not make a significant difference in the likelihood of specimen return. The more deprived individuals were, as calculated using the Index of Multiple Deprivation (IMD), the less likely they were to return a sample. The extent to which incentive structures influenced sample return was not moderated by IMD score. Non-cash financial incentives for chlamydia testing do not seem to affect the specimen return rate in a chlamydia screening program where test kits are requested online, mailed to requestors and returned by mail. They also do not appear more or less effective in influencing test return depending on deprivation level. PMID:24373390
Royle, J. Andrew; Chandler, Richard B.; Yackulic, Charles; Nichols, James D.
2012-01-01
1. Understanding the factors affecting species occurrence is a pre-eminent focus of applied ecological research. However, direct information about species occurrence is lacking for many species. Instead, researchers sometimes have to rely on so-called presence-only data (i.e. when no direct information about absences is available), which often results from opportunistic, unstructured sampling. MAXENT is a widely used software program designed to model and map species distribution using presence-only data. 2. We provide a critical review of MAXENT as applied to species distribution modelling and discuss how it can lead to inferential errors. A chief concern is that MAXENT produces a number of poorly defined indices that are not directly related to the actual parameter of interest – the probability of occurrence (ψ). This focus on an index was motivated by the belief that it is not possible to estimate ψ from presence-only data; however, we demonstrate that ψ is identifiable using conventional likelihood methods under the assumptions of random sampling and constant probability of species detection. 3. The model is implemented in a convenient R package which we use to apply the model to simulated data and data from the North American Breeding Bird Survey. We demonstrate that MAXENT produces extreme under-predictions when compared to estimates produced by logistic regression which uses the full (presence/absence) data set. We note that MAXENT predictions are extremely sensitive to specification of the background prevalence, which is not objectively estimated using the MAXENT method. 4. As with MAXENT, formal model-based inference requires a random sample of presence locations. Many presence-only data sets, such as those based on museum records and herbarium collections, may not satisfy this assumption.
However, when sampling is random, we believe that inference should be based on formal methods that facilitate inference about interpretable ecological quantities instead of vaguely defined indices.
Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.
Nath, Abhigyan; Subbiah, Karthikeyan
2015-12-01
Lipocalins are short in sequence length and perform several important biological functions. These proteins share less than 20% sequence similarity among paralogs. Identifying them experimentally is an expensive and time-consuming process. Computational methods that allocate putative members to this family on the basis of sequence similarity are also largely ineffective because of the low sequence similarity among family members. Consequently, machine learning methods become a viable alternative for their prediction, using underlying sequence- or structurally derived features as the input. Ideally, any machine learning based prediction method must be trained with all possible variations in the input feature vector (all the sub-class input patterns) to achieve perfect learning. Near-perfect learning can be achieved by training the model with diverse types of input instances belonging to different regions of the entire input space. Furthermore, the prediction performance can be improved by balancing the training set, because imbalanced data sets tend to bias predictions towards the majority class and its sub-classes. This paper aims to achieve (i) high generalization ability, without any classification bias, through diversified and balanced training sets, and (ii) enhanced prediction accuracy by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we first used the unsupervised K-means clustering algorithm to create diversified clusters of input patterns, and built a diversified and balanced training set by selecting an equal number of patterns from each of these clusters.
Finally, a probability-based classifier fusion scheme was applied to a boosted random forest algorithm (which produced greater sensitivity) and a K-nearest-neighbour algorithm (which produced greater specificity) to achieve predictive performance better than that of the individual base classifiers. The performance of models trained on the K-means-preprocessed training set is far better than that of models trained on randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set, and a sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second blind test set. These results establish that diversifying the training set improves the performance of predictive models through superior generalization ability, and that balancing the training set improves prediction accuracy. For smaller data sets, unsupervised K-means-based sampling can be a more effective technique for increasing generalization than the usual random splitting method. Copyright © 2015 Elsevier Ltd. All rights reserved.
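The cluster-based balanced selection step might be sketched as follows, assuming cluster labels have already been produced by K-means; the items and labels here are illustrative:

```python
import random
from collections import defaultdict

def balanced_training_set(items, cluster_labels, per_cluster, seed=0):
    """Select an equal number of training patterns from each cluster so the
    training set is both balanced and spread over diverse regions of the
    input space."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for item, lab in zip(items, cluster_labels):
        by_cluster[lab].append(item)
    selected = []
    for lab in sorted(by_cluster):
        pool = by_cluster[lab]
        k = min(per_cluster, len(pool))   # clusters smaller than the quota give all they have
        selected.extend(rng.sample(pool, k))
    return selected

# twelve hypothetical patterns assigned to three clusters
items = list(range(12))
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
train = balanced_training_set(items, labels, per_cluster=2)
print(len(train))  # 6: two patterns from each of the three clusters
```

The remaining patterns would then form the test pool, in contrast to a plain random split that may over-represent dense regions.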
Studies on spectral analysis of randomly sampled signals: Application to laser velocimetry data
NASA Technical Reports Server (NTRS)
Sree, David
1992-01-01
Spectral analysis is very useful in determining the frequency characteristics of many turbulent flows, for example, vortex flows, tail buffeting, and other pulsating flows. It is also used for obtaining turbulence spectra, from which the time and length scales associated with the turbulence structure can be estimated. These estimates, in turn, can be helpful for validation of theoretical/numerical flow turbulence models. Laser velocimetry (LV) is being extensively used in the experimental investigation of different types of flows because of its inherent advantages: nonintrusive probing, high frequency response, no calibration requirements, etc. Typically, the output of an individual-realization laser velocimeter is a set of randomly sampled velocity data. Spectral analysis of such data requires special techniques to obtain reliable estimates of the correlation and power spectral density functions that describe the flow characteristics. FORTRAN codes for obtaining the autocorrelation and power spectral density estimates using the correlation-based slotting technique were developed. Extensive studies have been conducted on simulated first-order-spectrum and sine signals to improve the spectral estimates. A first-order spectrum was chosen because it represents the characteristics of a typical one-dimensional turbulence spectrum. Digital prefiltering techniques to improve the spectral estimates from randomly sampled data were applied. Studies show that the frequency range of the spectral estimates can be extended up to about five times the mean sampling rate.
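The correlation-based slotting technique can be sketched as follows; this is a simplified illustration of slotted autocorrelation for randomly sampled data, not the FORTRAN codes developed in the study:

```python
import math
import random

def slotted_autocorrelation(t, u, max_lag, slot_width):
    """Slot correlation for randomly sampled data: each product f_i * f_j is
    accumulated in the slot whose centre is nearest the lag t_j - t_i, then
    each slot is normalised by its pair count. Assumes t is sorted."""
    n_slots = int(max_lag / slot_width) + 1
    sums, counts = [0.0] * n_slots, [0] * n_slots
    mean = sum(u) / len(u)
    f = [x - mean for x in u]          # fluctuations about the mean
    for i in range(len(t)):
        for j in range(i, len(t)):
            lag = t[j] - t[i]
            if lag > max_lag:
                break                   # t is sorted, so later lags only grow
            k = int(lag / slot_width + 0.5)
            sums[k] += f[i] * f[j]
            counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# randomly sampled 0.5 Hz cosine: positive peak at zero lag and
# negative correlation at half a period (lag = 1.0)
rng = random.Random(2)
t = sorted(rng.uniform(0.0, 20.0) for _ in range(400))
u = [math.cos(2.0 * math.pi * 0.5 * x) for x in t]
acf = slotted_autocorrelation(t, u, max_lag=2.0, slot_width=0.1)
```

The power spectral density estimate would then follow from a discrete transform of this slotted autocorrelation, which is where the prefiltering refinements mentioned above come in.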
Kangovi, Shreya; Mitra, Nandita; Turr, Lindsey; Huo, Hairong; Grande, David; Long, Judith A.
2017-01-01
Upstream interventions (e.g. housing programs and community health worker interventions) address socioeconomic and behavioral factors that influence health outcomes across diseases. Studying these types of interventions in clinical trials raises a methodological challenge: how should researchers measure the effect of an upstream intervention in a sample of patients with different diseases? This paper addresses this question using an illustrative protocol of a randomized controlled trial of collaborative goal-setting versus goal-setting plus community health worker support among patients with multiple chronic diseases: diabetes, obesity, hypertension and tobacco dependence. At study enrollment, patients met with their primary care providers to select one of their chronic diseases to focus on during the study, and to collaboratively set a goal for that disease. Patients randomly assigned to a community health worker also received six months of support to address socioeconomic and behavioral barriers to chronic disease control. The primary hypothesis was that there would be differences in patients' selected chronic disease control, as measured by HbA1c, body mass index, systolic blood pressure and cigarettes per day, between the goal-setting-alone and community health worker support arms. To test this hypothesis, we will conduct a stratum-specific multivariate analysis of variance, which allows all patients (regardless of their selected chronic disease) to be included in a single model for the primary outcome. Population health researchers can use this approach to measure clinical outcomes across diseases. PMID:27965180
A social preference valuations set for EQ-5D health states in Flanders, Belgium.
Cleemput, Irina
2010-04-01
This study aimed at deriving a preference valuation set for EQ-5D health states from the general Flemish public in Belgium. A EuroQol valuation instrument with 16 health states to be valued on a visual analogue scale was sent to a random sample of 2,754 adults. The initial response rate was 35%. Eventually, 548 (20%) respondents provided useable valuations for modeling. Valuations for 245 health states were modeled using a random effects model. The selection of the model was based on two criteria: health state valuations must be consistent, and the difference with the directly observed valuations must be small. A model including a value decrement if any health dimension of the EQ-5D is on the worst level was selected to construct the social health state valuation set. A comparison with health state valuations from other countries showed similarities, especially with those from New Zealand. The use of a single preference valuation set across different health economic evaluations within a country is highly preferable to increase their usability for policy makers. This study contributes to the standardization of outcome measurement in economic evaluations in Belgium.
Lehmann, Deborah; Kirarock, Wendy; van den Biggelaar, Anita H J; Passey, Megan; Jacoby, Peter; Saleu, Gerard; Masiria, Geraldine; Nivio, Birunu; Greenhill, Andrew; Orami, Tilda; Francis, Jacinta; Ford, Rebecca; Kirkham, Lea-Ann; Solomon, Vela; Richmond, Peter C; Pomat, William S
2017-01-01
Children in third-world settings including Papua New Guinea (PNG) experience early onset of carriage with a broad range of pneumococcal serotypes, resulting in a high incidence of severe pneumococcal disease and deaths in the first 2 years of life. Vaccination trials in high endemicity settings are needed to provide evidence and guidance on optimal strategies to protect children in these settings against pneumococcal infections. This report describes the rationale, objectives, methods, study population, follow-up and specimen collection for a vaccination trial conducted in an endemic and logistically challenging setting in PNG. The trial aimed to determine whether currently available pneumococcal conjugate vaccines (PCV) are suitable for use under PNG's accelerated immunization schedule, and that a schedule including pneumococcal polysaccharide vaccine (PPV) in later infancy is safe and immunogenic in this high-risk population. This open randomized-controlled trial was conducted between November 2011 and March 2016, enrolling 262 children aged 1 month between November 2011 and April 2014. The participants were randomly allocated (1:1) to receive 10-valent PCV (10vPCV) or 13-valent PCV (13vPCV) in a 1-2-3-month schedule, with further randomization to receive PPV or no PPV at age 9 months, followed by a 1/5th PPV challenge at age 23 months. A total of 1229 blood samples were collected to measure humoral and cellular immune responses and 1238 nasopharyngeal swabs to assess upper respiratory tract colonization and carriage load. Serious adverse events were monitored throughout the study. Of the 262 children enrolled, 87% received 3 doses of PCV, 79% were randomized to receive PPV or no PPV at age 9 months, and 67% completed the study at 24 months of age with appropriate immunization and challenge.
Laboratory testing of the many samples collected during this trial will determine the impact of the different vaccine schedules and formulations on nasopharyngeal carriage, antibody production and function, and immune memory. The final data will inform policy on pneumococcal vaccine schedules in countries with children at high risk of pneumococcal disease by providing direct comparison of an accelerated schedule of 10vPCV and 13vPCV and the potential advantages of PPV following PCV immunization. ClinicalTrials.gov CTN NCT01619462, retrospectively registered on May 28, 2012.
The topology of large-scale structure. III - Analysis of observations
NASA Astrophysics Data System (ADS)
Gott, J. Richard, III; Miller, John; Thuan, Trinh X.; Schneider, Stephen E.; Weinberg, David H.; Gammie, Charles; Polk, Kevin; Vogeley, Michael; Jeffrey, Scott; Bhavsar, Suketu P.; Melott, Adrian L.; Giovanelli, Riccardo; Hayes, Martha P.; Tully, R. Brent; Hamilton, Andrew J. S.
1989-05-01
A recently developed algorithm for quantitatively measuring the topology of large-scale structures in the universe was applied to a number of important observational data sets. The data sets included an Abell (1958) cluster sample out to Vmax = 22,600 km/sec, the Giovanelli and Haynes (1985) sample out to Vmax = 11,800 km/sec, the CfA sample out to Vmax = 5000 km/sec, the Thuan and Schneider (1988) dwarf sample out to Vmax = 3000 km/sec, and the Tully (1987) sample out to Vmax = 3000 km/sec. It was found that, when the topology is studied on smoothing scales significantly larger than the correlation length (i.e., smoothing length, lambda, not below 1200 km/sec), the topology is spongelike and is consistent with the standard model in which the structure seen today has grown from small fluctuations caused by random noise in the early universe. When the topology is studied on the scale of lambda of about 600 km/sec, a small shift is observed in the genus curve in the direction of a 'meatball' topology.
Jiang, Xunpeng; Yang, Zengling; Han, Lujia
2014-07-01
Contaminated meat and bone meal (MBM) in animal feedstuff has been the source of bovine spongiform encephalopathy (BSE) in cattle, leading to a ban on its use, so methods for its detection are essential. In this study, five pure feed and five pure MBM samples were used to prepare two sets of sample arrangements: set A for investigating the discrimination of individual feed/MBM particles and set B for larger numbers of overlapping particles. The two sets were used to test a Markov random field (MRF)-based approach. A Fourier transform infrared (FT-IR) imaging system was used for data acquisition. The spatial resolution of the near-infrared (NIR) spectroscopic image was 25 μm × 25 μm. Each spectrum was the average of 16 scans across the wavenumber range 7000-4000 cm⁻¹, at intervals of 8 cm⁻¹. This study introduces an innovative approach to analyzing NIR spectroscopic images: an MRF-based approach was developed using the iterated conditional modes (ICM) algorithm, integrating initial labeling derived from support vector machine discriminant analysis (SVMDA) and observation data derived from the results of principal component analysis (PCA). The results showed that MBM covered by feed could be successfully recognized, with an overall accuracy of 86.59% and a Kappa coefficient of 0.68. Compared with conventional methods, the MRF-based approach is capable of extracting spectral information combined with spatial information from NIR spectroscopic images. This new approach enhances the identification of MBM using NIR spectroscopic imaging.
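The ICM relaxation at the heart of the MRF approach can be illustrated with a minimal sketch on a 2-D label grid with a Potts smoothness prior; the unary costs here stand in for the SVMDA/PCA-derived terms used in the study and are purely illustrative:

```python
def icm(labels, unary, beta=1.0, n_iter=5):
    """Iterated conditional modes on a 2-D grid: each pixel repeatedly takes
    the label minimising its unary cost plus beta times the number of
    4-neighbours that disagree (a Potts smoothness prior)."""
    h, w = len(labels), len(labels[0])
    n_classes = len(unary[0][0])
    for _ in range(n_iter):
        for i in range(h):
            for j in range(w):
                best, best_cost = labels[i][j], float("inf")
                for c in range(n_classes):
                    cost = unary[i][j][c]
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] != c:
                            cost += beta  # penalise disagreeing neighbours
                    if cost < best_cost:
                        best, best_cost = c, cost
                labels[i][j] = best
    return labels

# a noisy centre pixel whose unary cost slightly favours class 1,
# surrounded by pixels that strongly favour class 0
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.4]
labels = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
smoothed = icm(labels, unary, beta=0.5)
```

With beta = 0.5, the neighbourhood term outweighs the centre pixel's weak unary preference, so the isolated label is smoothed away; this is how spatial context corrects spectrally ambiguous pixels.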
Xue, Gang; Song, Wen-qi; Li, Shu-chao
2015-01-01
In order to achieve rapid identification of fire-resistive coatings for steel structures of different brands in circulation, a new method for the fast discrimination of varieties of fire-resistive coating for steel structures by near-infrared spectroscopy was proposed. A raster-scanning near-infrared spectroscopy instrument and near-infrared diffuse reflectance spectroscopy were used to collect the spectral curves of different brands of fire-resistive coating, and the spectral data were preprocessed with standard normal variate (SNV) transformation and the Norris second derivative. Principal component analysis (PCA) was applied to the near-infrared spectra for cluster analysis. The analysis showed that the cumulative reliability of PC1 to PC5 was 99.791%. A 3-dimensional plot was drawn with the scores of PC1, PC2 and PC3 (×10), which appeared to provide the best clustering of the varieties of fire-resistive coating. A total of 150 fire-resistive coating samples were divided randomly into a calibration set and a validation set; the calibration set had 125 samples, with 25 samples of each variety, and the validation set had 25 samples, with 5 samples of each variety. According to the principal component scores of unknown samples, Mahalanobis distances between each variety and the unknown samples were calculated to discriminate between the different varieties. External verification of the qualitative analysis model on unknown samples gave a 100% recognition ratio. The results demonstrated that this identification method can be used as a rapid, accurate method to identify the classification of fire-resistive coatings for steel structures and provide a technical reference for market regulation.
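The Mahalanobis-distance discrimination step can be sketched as follows, assuming PCA scores have already been computed; the class names and score values are hypothetical:

```python
import numpy as np

def mahalanobis_classify(scores_by_class, unknown):
    """Assign an unknown sample to the class whose calibration samples have
    the smallest Mahalanobis distance to it in PC-score space."""
    best_label, best_d2 = None, float("inf")
    for label, scores in scores_by_class.items():
        mu = scores.mean(axis=0)                 # class centroid
        cov = np.cov(scores, rowvar=False)       # class covariance of PC scores
        diff = unknown - mu
        d2 = float(diff @ np.linalg.inv(cov) @ diff)  # squared Mahalanobis distance
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label

# hypothetical 3-component PC scores for two coating brands, 25 samples each
rng = np.random.default_rng(0)
scores_by_class = {
    "brand_A": rng.normal([0.0, 0.0, 0.0], 0.5, size=(25, 3)),
    "brand_B": rng.normal([5.0, 5.0, 5.0], 0.5, size=(25, 3)),
}
result = mahalanobis_classify(scores_by_class, np.array([4.8, 5.1, 5.2]))
print(result)  # brand_B
```

Unlike plain Euclidean distance, the Mahalanobis metric accounts for the different variances of (and correlations between) the principal-component scores within each class.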
A stratified two-stage sampling design for digital soil mapping in a Mediterranean basin
NASA Astrophysics Data System (ADS)
Blaschek, Michael; Duttmann, Rainer
2015-04-01
The quality of environmental modelling results often depends on reliable soil information. In order to obtain soil data in an efficient manner, several sampling strategies are at hand, depending on the level of prior knowledge and the overall objective of the planned survey. This study focuses on the collection of soil samples considering available continuous secondary information in an undulating, 16 km² river catchment near Ussana in southern Sardinia (Italy). A design-based, stratified, two-stage sampling design was applied, aiming at the spatial prediction of soil property values at individual locations. The stratification was based on quantiles from the density functions of two land-surface parameters, topographic wetness index and potential incoming solar radiation, derived from a digital elevation model. Combined with four main geological units, the applied procedure led to 30 different classes in the given test site. Up to six polygons of each available class were selected randomly, excluding areas smaller than 1 ha to avoid incorrect location of the points in the field. Further exclusion rules were applied before polygon selection, masking out roads and buildings using a 20 m buffer. The selection procedure was repeated ten times, and the set of polygons with the best geographical spread was chosen. Finally, exact point locations were selected randomly from inside the chosen polygon features. A second selection based on the same stratification and following the same methodology (selecting one polygon instead of six) was made in order to create an appropriate validation set. Supplementary samples were obtained during a second survey focusing on polygons that had either not been considered at all during the first phase or were not adequately represented with respect to feature size. In total, both field campaigns produced an interpolation set of 156 samples and a validation set of 41 points.
The selection of sample point locations was done using ESRI software (ArcGIS) extended by Hawth's Tools and, later, its replacement, the Geospatial Modelling Environment (GME). 88% of all desired points could actually be reached in the field and were successfully sampled. Our results indicate that the sampled calibration and validation sets are representative of each other and could be successfully used as interpolation data for spatial prediction purposes. With respect to soil textural fractions, for instance, equal multivariate means and variance homogeneity were found for the two datasets, as evidenced by non-significant (P > 0.05) Hotelling T²-test (2.3 with df1 = 3, df2 = 193) and Bartlett's test statistics (6.4 with df = 6). The multivariate prediction of clay, silt and sand content using a neural network residual cokriging approach reached explained variance levels of 56%, 47% and 63%, respectively. Thus, the presented case study is a successful example of using readily available continuous information on soil-forming factors such as geology and relief as stratifying variables for designing sampling schemes in digital soil mapping projects.
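The first-stage polygon selection and second-stage point placement might be sketched as follows; this is a simplified illustration (bounding-box point draws instead of true point-in-polygon sampling), with hypothetical strata and areas:

```python
import random
from collections import defaultdict

def stratified_two_stage_sample(polygons, n_polys_per_stratum,
                                min_area_ha=1.0, seed=0):
    """First stage: randomly pick polygons within each stratum, excluding
    polygons below the minimum area. Second stage: pick one random point
    inside each selected polygon (simplified here to a bounding-box draw)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for p in polygons:
        if p["area_ha"] >= min_area_ha:     # exclusion rule: too-small polygons
            by_stratum[p["stratum"]].append(p)
    points = []
    for stratum in sorted(by_stratum):
        pool = by_stratum[stratum]
        chosen = rng.sample(pool, min(n_polys_per_stratum, len(pool)))
        for p in chosen:
            xmin, ymin, xmax, ymax = p["bbox"]
            points.append((rng.uniform(xmin, xmax),
                           rng.uniform(ymin, ymax), stratum))
    return points

# hypothetical strata (wetness/radiation classes) with mixed polygon sizes
polygons = [
    {"stratum": s, "area_ha": a, "bbox": (0.0, 0.0, 100.0, 100.0)}
    for s in ("wet_low_rad", "dry_high_rad") for a in (0.5, 2.0, 3.0, 4.0)
]
pts = stratified_two_stage_sample(polygons, n_polys_per_stratum=2)
```

A real implementation would also apply the road/building buffer mask and repeat the polygon draw to optimise geographical spread, as described above.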
Żebrowska, Magdalena; Posch, Martin; Magirr, Dominic
2016-05-30
Consider a parallel-group trial for the comparison of an experimental treatment to a control, where the second-stage sample size may depend on the blinded primary endpoint data as well as on additional blinded data from a secondary endpoint. For the setting of normally distributed endpoints, we demonstrate that this may lead to an inflation of the type I error rate if the null hypothesis holds for the primary but not the secondary endpoint. We derive upper bounds for the inflation of the type I error rate, both for trials that employ random allocation and for those that use block randomization. We illustrate the worst-case sample size reassessment rule in a case study. For both randomization strategies, the maximum type I error rate increases with the effect size in the secondary endpoint and the correlation between endpoints. The maximum inflation increases with smaller block sizes if information on the block size is used in the reassessment rule. Based on our findings, we do not question the well-established use of blinded sample size reassessment methods with nuisance parameter estimates computed from the blinded interim data of the primary endpoint. However, we demonstrate that the type I error rate control of these methods relies on the application of specific, binding, pre-planned and fully algorithmic sample size reassessment rules and does not extend to general or unplanned sample size adjustments based on blinded data. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Wilmoth, Siri K.; Irvine, Kathryn M.; Larson, Chad
2015-01-01
Various GIS-generated land-use predictor variables, physical habitat metrics, and water chemistry variables from 75 reference streams and 351 randomly sampled sites throughout Washington State were evaluated for effectiveness at discriminating reference from random sites within level III ecoregions. A combination of multivariate clustering and ordination techniques was used. We describe average observed conditions for a subset of predictor variables and propose statistical criteria for establishing reference conditions for stream habitat in Washington. Using these criteria, we determined whether any of the random sites met expectations for reference condition and whether any of the established reference sites failed to meet those expectations. Establishing these criteria will set a benchmark against which future data can be compared.
Simultaneous Identification of Multiple Driver Pathways in Cancer
Leiserson, Mark D. M.; Blokh, Dima
2013-01-01
Distinguishing the somatic mutations responsible for cancer (driver mutations) from random, passenger mutations is a key challenge in cancer genomics. Driver mutations generally target cellular signaling and regulatory pathways consisting of multiple genes. This heterogeneity complicates the identification of driver mutations by their recurrence across samples, as different combinations of mutations in driver pathways are observed in different samples. We introduce the Multi-Dendrix algorithm for the simultaneous identification of multiple driver pathways de novo in somatic mutation data from a cohort of cancer samples. The algorithm relies on two combinatorial properties of mutations in a driver pathway: high coverage and mutual exclusivity. We derive an integer linear program that finds sets of mutations exhibiting these properties. We apply Multi-Dendrix to somatic mutations from glioblastoma, breast cancer, and lung cancer samples. Multi-Dendrix identifies sets of mutations in genes that overlap with known pathways – including Rb, p53, PI(3)K, and cell cycle pathways – and also novel sets of mutually exclusive mutations, including mutations in several transcription factors or other genes involved in transcriptional regulation. These sets are discovered directly from mutation data with no prior knowledge of pathways or gene interactions. We show that Multi-Dendrix outperforms other algorithms for identifying combinations of mutations and is also orders of magnitude faster on genome-scale data. Software available at: http://compbio.cs.brown.edu/software. PMID:23717195
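The coverage/exclusivity trade-off can be illustrated with the Dendrix-style weight function that such integer linear programs maximize; the gene names and mutation data below are toy values, not from the study:

```python
def dendrix_weight(mutations, gene_set):
    """Weight of a candidate driver gene set M:
    W(M) = 2 * |coverage(M)| - sum over g in M of |coverage(g)|,
    which rewards covering many samples while penalising samples
    mutated in more than one gene of the set (low exclusivity)."""
    covered = set()
    total = 0
    for g in gene_set:
        samples = mutations.get(g, set())
        covered |= samples          # union: samples covered by the set
        total += len(samples)       # sum of per-gene coverages
    return 2 * len(covered) - total

# toy mutation data: gene -> set of mutated samples (hypothetical)
mutations = {
    "TP53": {"s1", "s2", "s3"},
    "MDM2": {"s4", "s5"},          # mutually exclusive with TP53
    "EGFR": {"s1", "s2", "s4"},    # overlaps both
}
print(dendrix_weight(mutations, ["TP53", "MDM2"]))  # 5: full coverage, no overlap
print(dendrix_weight(mutations, ["TP53", "EGFR"]))  # 2: overlap is penalised
```

Multi-Dendrix extends this by maximizing the summed weight of several disjoint gene sets at once, which is what the integer linear program encodes.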
[Kriging estimation and its simulated sampling of Chilo suppressalis population density].
Yuan, Zheming; Bai, Lianyang; Wang, Kuiwu; Hu, Xiangyue
2004-07-01
In order to draw up a rational sampling plan for the larval population of Chilo suppressalis, an original population and its two derivative populations, a random population and a sequence population, were sampled and compared using random sampling, gap-range-random sampling, and a new systematic sampling method integrating Kriging interpolation with a random origin. For the original population, whose distribution was aggregated and whose dependence range in the line direction was 115 cm (6.9 units), gap-range-random sampling in the line direction was more precise than random sampling. Distinguishing the population pattern correctly is the key to obtaining better precision. Gap-range-random sampling and random sampling are suited to aggregated and random populations, respectively, but both are difficult to apply in practice. Therefore, a new systematic sampling method, the Kriging sample (n = 441), was developed to estimate the density of a partial sample (partial estimation, n = 441) and of the population (overall estimation, N = 1500). For the original population, the estimation precision of the Kriging sample for both the partial sample and the population was better than that of the investigation sample. With increasing aggregation intensity of the population, the Kriging sample was more effective than the investigation sample in both partial and overall estimation at the appropriate sampling gap according to the dependence range.
Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration
Liu, Bo; Chen, Sanfeng; Li, Shuai; Liang, Yongsheng
2012-01-01
In this paper a new framework, called Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making under uncertainty is proposed by incorporating non-adaptive, data-independent Random Projections and nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction framework in which high-dimensional data are projected onto a random lower-dimensional subspace via spherically random rotation and coordinate sampling. KLSPI introduces the kernel trick into the LSPI framework for Reinforcement Learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In this approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random bases. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI, at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of the Singular Value Decomposition (SVD). Finally, simulation results are exhibited on benchmark MDP domains, which confirm gains both in computation time and in performance in large feature spaces. PMID:22736969
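The non-adaptive random projection step described above can be sketched as follows (toy dimensions, not the CKRL implementation; the point is that the projection is drawn independently of the data yet approximately preserves distances):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 1000, 50        # samples, original dim, projected dim (toy sizes)
X = rng.standard_normal((n, d))

# Non-adaptive, data-independent projection matrix with i.i.d. N(0, 1/k) entries.
R = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ R                      # low-dimensional features for the policy space

# Johnson-Lindenstrauss: pairwise distances are approximately preserved.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig            # close to 1 for moderate k
```

Because R never looks at X, the same projection can be fixed once and reused for every policy iteration, which is what makes the dimensionality reduction cheap.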
Sample size calculations for stepped wedge and cluster randomised trials: a unified approach
Hemming, Karla; Taljaard, Monica
2016-01-01
Objectives To clarify and illustrate sample size calculations for the cross-sectional stepped wedge cluster randomized trial (SW-CRT) and to present a simple approach for comparing the efficiencies of competing designs within a unified framework. Study Design and Setting We summarize design effects for the SW-CRT, the parallel cluster randomized trial (CRT), and the parallel cluster randomized trial with before and after observations (CRT-BA), assuming cross-sectional samples are selected over time. We present new formulas that enable trialists to determine the required cluster size for a given number of clusters. We illustrate by example how to implement the presented design effects and give practical guidance on the design of stepped wedge studies. Results For a fixed total cluster size, the choice of study design that provides the greatest power depends on the intracluster correlation coefficient (ICC) and the cluster size. When the ICC is small, the CRT tends to be more efficient; when the ICC is large, the SW-CRT tends to be more efficient and can serve as an alternative design when the CRT is an infeasible design. Conclusion Our unified approach allows trialists to easily compare the efficiencies of three competing designs to inform the decision about the most efficient design in a given scenario. PMID:26344808
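For illustration, the design-effect reasoning above can be sketched with the classic parallel-CRT formula 1 + (m - 1)·ICC (this is only the standard CRT design effect; the stepped wedge design effects presented in the paper are more involved, and the numbers below are hypothetical):

```python
import math

def design_effect_crt(m, icc):
    """Design effect (variance inflation vs individual randomization)
    for a parallel cluster randomized trial with cluster size m."""
    return 1 + (m - 1) * icc

def clusters_per_arm(n_individual, m, icc):
    """Clusters needed per arm, given the sample size per arm that an
    individually randomized trial would require."""
    return math.ceil(n_individual * design_effect_crt(m, icc) / m)
```

For example, with 130 participants per arm required under individual randomization, clusters of size 20 and an ICC of 0.05 give a design effect of 1.95 and 13 clusters per arm; larger ICCs inflate this further, which is when the SW-CRT can become the more efficient choice.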
Sampling guidelines for oral fluid-based surveys of group-housed animals.
Rotolo, Marisa L; Sun, Yaxuan; Wang, Chong; Giménez-Lirola, Luis; Baum, David H; Gauger, Phillip C; Harmon, Karen M; Hoogland, Marlin; Main, Rodger; Zimmerman, Jeffrey J
2017-09-01
Formulas and software for calculating sample size for surveys based on individual animal samples are readily available. However, sample size formulas are not available for oral fluids and other aggregate samples that are increasingly used in production settings. Therefore, the objective of this study was to develop sampling guidelines for oral fluid-based porcine reproductive and respiratory syndrome virus (PRRSV) surveys in commercial swine farms. Oral fluid samples were collected in 9 weekly samplings from all pens in 3 barns on one production site beginning shortly after placement of weaned pigs. Samples (n=972) were tested by real-time reverse-transcription PCR (RT-rtPCR) and the binary results analyzed using a piecewise exponential survival model for interval-censored, time-to-event data with misclassification. Thereafter, simulation studies were used to study the barn-level probability of PRRSV detection as a function of sample size, sample allocation (simple random sampling vs fixed spatial sampling), assay diagnostic sensitivity and specificity, and pen-level prevalence. These studies provided estimates of the probability of detection by sample size and within-barn prevalence. Detection using fixed spatial sampling was as good as, or better than, simple random sampling. Sampling multiple barns on a site increased the probability of detection with the number of barns sampled. These results are relevant to PRRSV control or elimination projects at the herd, regional, or national levels, but the results are also broadly applicable to contagious pathogens of swine for which oral fluid tests of equivalent performance are available. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Study of Dynamic Characteristics of Aeroelastic Systems Utilizing Randomdec Signatures
NASA Technical Reports Server (NTRS)
Chang, C. S.
1975-01-01
The feasibility of utilizing the random decrement method in conjunction with a signature analysis procedure to determine the dynamic characteristics of an aeroelastic system for the purpose of on-line prediction of potential onset of flutter was examined. Digital computer programs were developed to simulate sampled response signals of a two-mode aeroelastic system. Simulated response data were used to test the random decrement method. A special curve-fit approach was developed for analyzing the resulting signatures. A number of numerical 'experiments' were conducted on the combined processes. The method is capable of determining frequency and damping values accurately from randomdec signatures of carefully selected lengths.
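The random decrement idea itself can be sketched as averaging all signal segments that begin at a fixed trigger level: forced and random components average out, leaving a signature resembling the free decay. This is a toy single-mode example with hypothetical parameters (the study simulated a two-mode system and fitted curves to the signatures):

```python
import math
import random

def randomdec_signature(signal, level, length):
    """Random decrement signature: the average of all segments that start
    where the response crosses the trigger `level` upward."""
    segments = [signal[i:i + length]
                for i in range(1, len(signal) - length)
                if signal[i - 1] < level <= signal[i]]
    if not segments:
        raise ValueError("no trigger crossings found")
    return [sum(s[j] for s in segments) / len(segments) for j in range(length)]

# Toy response: a single damped mode plus noise.
random.seed(1)
x = [math.exp(-0.002 * t) * math.sin(0.3 * t) + 0.1 * (random.random() - 0.5)
     for t in range(5000)]
sig = randomdec_signature(x, level=0.5, length=100)
```

Frequency and damping would then be estimated by fitting a decaying sinusoid to `sig`, which is the role of the curve-fit procedure mentioned in the abstract.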
Wu, Wei; Chen, Albert Y C; Zhao, Liang; Corso, Jason J
2014-03-01
Detection and segmentation of a brain tumor such as glioblastoma multiforme (GBM) in magnetic resonance (MR) images are often challenging due to its intrinsically heterogeneous signal characteristics. A robust segmentation method for brain tumor MRI scans was developed and tested. Simple thresholds and statistical methods are unable to adequately segment the various elements of the GBM, such as local contrast enhancement, necrosis, and edema. Most voxel-based methods cannot achieve satisfactory results in larger data sets, and the methods based on generative or discriminative models have intrinsic limitations during application, such as small sample set learning and transfer. A new method was developed to overcome these challenges. Multimodal MR images are segmented into superpixels using algorithms to alleviate the sampling issue and to improve the sample representativeness. Next, features were extracted from the superpixels using multi-level Gabor wavelet filters. Based on the features, a support vector machine (SVM) model and an affinity metric model for tumors were trained to overcome the limitations of previous generative models. Based on the output of the SVM and spatial affinity models, conditional random fields theory was applied to segment the tumor in a maximum a posteriori fashion given the smoothness prior defined by our affinity model. Finally, labeling noise was removed using "structural knowledge" such as the symmetrical and continuous characteristics of the tumor in spatial domain. The system was evaluated with 20 GBM cases and the BraTS challenge data set. Dice coefficients were computed, and the results were highly consistent with those reported by Zikic et al. (MICCAI 2012, Lecture notes in computer science. vol 7512, pp 369-376, 2012). A brain tumor segmentation method using model-aware affinity demonstrates comparable performance with other state-of-the-art algorithms.
Muller, Julius; Parizotto, Eneida; Antrobus, Richard; Francis, James; Bunce, Campbell; Stranks, Amanda; Nichols, Marshall; McClain, Micah; Hill, Adrian V S; Ramasamy, Adaikalavan; Gilbert, Sarah C
2017-06-08
Influenza challenge trials are important for vaccine efficacy testing. Currently, disease severity is determined by self-reported scores to a list of symptoms which can be highly subjective. A more objective measure would allow for improved data analysis. Twenty-one volunteers participated in an influenza challenge trial. We calculated the daily sum of scores (DSS) for a list of 16 influenza symptoms. Whole blood collected at baseline and 24, 48, 72 and 96 h post challenge was profiled on Illumina HT12v4 microarrays. Changes in gene expression most strongly correlated with DSS were selected to train a Random Forest model and tested on two independent test sets consisting of 41 individuals profiled on a different microarray platform and 33 volunteers assayed by qRT-PCR. 1456 probes are significantly associated with DSS at 1% false discovery rate. We selected 19 genes with the largest fold change to train a random forest model. We observed good concordance between predicted and actual scores in the first test set (r = 0.57; RMSE = -16.1%) with the greatest agreement achieved on samples collected approximately 72 h post challenge. Therefore, we assayed samples collected at baseline and 72 h post challenge in the second test set by qRT-PCR and observed good concordance (r = 0.81; RMSE = -36.1%). We developed a 19-gene qRT-PCR panel to predict DSS, validated on two independent datasets. A transcriptomics based panel could provide a more objective measure of symptom scoring in future influenza challenge studies. Trial registration Samples were obtained from a clinical trial with the ClinicalTrials.gov Identifier: NCT02014870, first registered on December 5, 2013.
The quality of care in occupational therapy: an assessment of selected Michigan hospitals.
Kirchman, M M
1979-07-01
In this study, a methodology was developed and tested for assessing the quality of care in occupational therapy between educational and noneducational clinical settings, as measured by process and outcome. An instrument was constructed for an external audit of the hospital record. Standards drafted by the investigator were established as normative by a panel of experts for use in judging the programs. Hospital records of 84 patients with residual hemiparesis or hemiplegia in three noneducational settings and of 100 patients with similar diagnoses in two educational clinical settings from selected Michigan facilities were chosen by proportionate stratified random sampling. The process study showed that occupational therapy was of significantly higher quality in the educational settings. The outcome study did not show significant differences between types of settings. Implications for education and practice are discussed.
Milliren, Carly E; Evans, Clare R; Richmond, Tracy K; Dunn, Erin C
2018-06-06
Recent advances in multilevel modeling allow for modeling non-hierarchical levels (e.g., youth in non-nested schools and neighborhoods) using cross-classified multilevel models (CCMM). Current practice is to cluster samples from one context (e.g., schools) and utilize the observations however they are distributed from the second context (e.g., neighborhoods). However, it is unknown whether an uneven distribution of sample size across these contexts leads to incorrect estimates of random effects in CCMMs. Using the school and neighborhood data structure in Add Health, we examined the effect of neighborhood sample size imbalance on the estimation of variance parameters in models predicting BMI. We differentially assigned students from a given school to neighborhoods within that school's catchment area using three scenarios of (im)balance. 1000 random datasets were simulated for each of five combinations of school- and neighborhood-level variance crossed with the three imbalance scenarios, for a total of 15,000 simulated data sets. For each simulation, we calculated 95% CIs for the variance parameters to determine whether the true simulated variance fell within the interval. Across all simulations, the "true" school and neighborhood variance parameters were estimated 93-96% of the time. Only 5% of models failed to capture neighborhood variance; 6% failed to capture school variance. These results suggest that there is no systematic bias in the ability of CCMM to capture the true variance parameters regardless of the distribution of students across neighborhoods. Ongoing efforts to use CCMM are warranted and can proceed without concern for the sample imbalance across contexts. Copyright © 2018 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jenkins-Smith, H.C.; Espey, J.L.; Rouse, A.A.
1991-06-01
This report describes the results of a set of five surveys designed to assess the perceived risks of nuclear waste management policy in Colorado and New Mexico. Within these states, mail surveys of randomly selected samples were taken of members of the American Association for the Advancement of Science, members of the Sierra Club, members of business associations, and state legislators. In addition, a telephone sample of randomly selected households was conducted in Colorado and New Mexico. Using these data, the perceptions of the risk of nuclear waste management -- from production of nuclear energy through permanent storage of nuclear wastes -- are compared for each of the five samples. The degree of trust in, and the perceived political influence of, the more prominent policy actors are assessed. Certain cognitive attributes, including degree of subjective certainty regarding beliefs about risks of nuclear wastes, and likelihood of altering perceived risks when confronted with new information, are compared across samples. In addition, the sample scores from rudimentary knowledge tests about the characteristics of radiation are compared. The relationships among the knowledge scores, cognitive attributes and risk perceptions are evaluated. Perceptions of the balance of media coverage are measured, as are the possible direct and indirect roles of media exposure in risk perception. Aggregate models, testing an array of hypotheses about the bases of nuclear waste risk perceptions, are conducted. These tests indicate that risk perceptions are related to a complex set of factors, and that these factors may differ significantly across the different sub-populations. Finally, the relationships between risk perception and political participation -- including registering to vote, political party affiliation, and level of political activism -- are analyzed. 5 figs., 33 tabs.
ERIC Educational Resources Information Center
Peretti-Watel, Patrick
2005-01-01
Objective: To study drug-related beliefs among adolescents, and specifically their propensity to distinguish "soft drugs" from "hard drugs"; to investigate factors associated with such a propensity as well as its relationship with cannabis use. Design & setting: A cross-sectional self-administered survey conducted among a random sample of 5,812…
ERIC Educational Resources Information Center
Barik, Henri; And Others
The results of the evaluation of the French immersion program at a school in a unilingual English environment are described. A battery of tests was administered to a random sample of children from the kindergarten and grade one experimental French immersion classes and to a comparison group composed of children following the regular English…
ERIC Educational Resources Information Center
Bilgin, Ibrahim; Karakuyu, Yunus; Ay, Yusuf
2015-01-01
The purpose of this study is to investigate the effects of the Project-Based Learning (PBL) method on undergraduate students' achievement and its association with these students' self-efficacy beliefs about science teaching and opinions about PBL. The sample of the study consisted of two randomly chosen classes from a set of seven classes enrolled…
ERIC Educational Resources Information Center
Mulvaney, C. A.; Watson, M. C.; Smith, S.; Coupland, C.; Kendrick, D.
2014-01-01
Objective: To determine the prevalence of home safety practices and use of safety equipment by disadvantaged families participating in a national home safety equipment scheme in England. Design: Cross-sectional postal survey sent to a random sample of 1,000 families. Setting: England, United Kingdom. Results: Half the families (51%) returned a…
ERIC Educational Resources Information Center
Taylor, Matthew J.; Merritt, Stephanie M.; Austin, Chammie C.
2013-01-01
A model of negative affect and alcohol use was replicated on a sample of African-American high school students. Participants (N = 5,086) were randomly selected from a previously collected data set and consisted of 2,253 males and 2,833 females residing in both rural and urban locations. Multivariate analysis of covariance and structural equation…
ERIC Educational Resources Information Center
Rivera, Hector H.; Tharp, Roland G.
2006-01-01
This study provides an empirical description of the dimensions of community values, beliefs, and opinions through a survey conducted in the Pueblo Indian community of Zuni in New Mexico. The sample was composed of 200 randomly chosen community members ranging from 21 to 103 years old. A principal component factor analysis was conducted, as well as…
Injury severity data for front and second row passengers in frontal crashes.
Atkinson, Theresa; Leszek Gawarecki; Tavakoli, Massoud
2016-06-01
The data contained here were obtained from the National Highway Transportation Safety Administration's National Automotive Sampling System - Crashworthiness Data System (NASS-CDS) for the years 2008-2014. This publicly available data set monitors motor vehicle crashes in the United States, using a stratified random sample frame, resulting in information on approximately 5000 crashes each year that can be utilized to create national estimates for crashes. The NASS-CDS data sets document vehicle, crash, and occupant factors. These data can be utilized to examine public health, law enforcement, roadway planning, and vehicle design issues. The data provided in this brief are a subset of crash events and occupants. The crashes provided are exclusively frontal crashes. Within these crashes, only restrained occupants who were seated in the right front seat position or the second row outboard seat positions were included. The front row and second row data sets were utilized to construct occupant pairs for crashes where both a right front seat occupant and a second row occupant were available. Both unpaired and paired data sets are provided in this brief. PMID:27077084
Zhan, Xue-yan; Zhao, Na; Lin, Zhao-zhou; Wu, Zhi-sheng; Yuan, Rui-juan; Qiao, Yan-jiang
2014-12-01
The appropriate algorithm for calibration set selection is one of the key technologies for a good NIR quantitative model. Different algorithms exist for calibration set selection, such as the Random Sampling (RS), Conventional Selection (CS), Kennard-Stone (KS), and Sample set Partitioning based on joint x-y distance (SPXY) algorithms. However, systematic comparisons among these algorithms are lacking. In the present paper, NIR quantitative models to determine the asiaticoside content in Centella total glucosides were established, of which 7 indexes were classified and selected, and the effects of the CS, KS, and SPXY algorithms for calibration set selection on the accuracy and robustness of the models were investigated. The accuracy indexes of models with calibration sets selected by the SPXY algorithm differed significantly from those with calibration sets selected by the CS or KS algorithm, while the robustness indexes, such as RMSECV and |RMSEP-RMSEC|, did not differ significantly. Therefore, the SPXY algorithm for calibration set selection can improve the predictive accuracy of NIR quantitative models for determining asiaticoside content in Centella total glucosides without significantly affecting model robustness, which provides a reference for choosing the calibration set selection algorithm when NIR quantitative models are established for solid systems of traditional Chinese medicine.
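The Kennard-Stone algorithm mentioned above can be sketched as a maximin selection over pairwise distances (a generic implementation, not the authors' code):

```python
import numpy as np

def kennard_stone(X, k):
    """Kennard-Stone calibration-set selection: start with the two most
    distant samples, then repeatedly add the candidate whose minimum
    distance to the already-selected set is largest (maximin coverage
    of the feature space)."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    while len(selected) < k:
        remaining = [r for r in range(len(X)) if r not in selected]
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(d_min))])
    return selected
```

For four 1-D samples at 0, 1, 2 and 10, selecting k = 3 first picks the two extremes (indices 0 and 3) and then the point farthest from both (index 2), which illustrates why KS spreads the calibration set across the feature space rather than sampling it at random.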
Requirements for a loophole-free photonic Bell test using imperfect setting generators
NASA Astrophysics Data System (ADS)
Kofler, Johannes; Giustina, Marissa; Larsson, Jan-Åke; Mitchell, Morgan W.
2016-03-01
Experimental violations of Bell inequalities are in general vulnerable to so-called loopholes. In this work, we analyze the characteristics of a loophole-free Bell test with photons, closing simultaneously the locality, freedom-of-choice, fair-sampling (i.e., detection), coincidence-time, and memory loopholes. We pay special attention to the effect of excess predictability in the setting choices due to nonideal random-number generators. We discuss necessary adaptations of the Clauser-Horne and Eberhard inequality when using such imperfect devices and—using Hoeffding's inequality and Doob's optional stopping theorem—the statistical analysis in such Bell tests.
A test for patterns of modularity in sequences of developmental events.
Poe, Steven
2004-08-01
This study presents a statistical test for modularity in the context of relative timing of developmental events. The test assesses whether sets of developmental events show special phylogenetic conservation of rank order. The test statistic is the correlation coefficient of developmental ranks of the N events of the hypothesized module across taxa. The null distribution is obtained by taking correlation coefficients for randomly sampled sets of N events. This test was applied to two datasets, including one where phylogenetic information was taken into account. The events of limb development in two frog species were found to behave as a module.
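The permutation test described above can be sketched for two taxa as follows (the ranks are hypothetical, and the study additionally used a phylogenetically corrected variant):

```python
import random

# Hypothetical developmental ranks of 8 events in two taxa.
ranks_a = [1, 2, 3, 4, 5, 6, 7, 8]
ranks_b = [2, 1, 3, 4, 8, 5, 7, 6]
module = [0, 1, 2, 3]  # hypothesized module: indices of its events

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def module_stat(events):
    """Test statistic: correlation of the module events' ranks across taxa."""
    return pearson([ranks_a[e] for e in events], [ranks_b[e] for e in events])

observed = module_stat(module)

# Null distribution: the same statistic for randomly sampled event sets
# of the same size N, as in the test described above.
random.seed(0)
null = [module_stat(random.sample(range(8), len(module))) for _ in range(2000)]
p_value = sum(s >= observed for s in null) / len(null)
```

A small p-value would indicate that the hypothesized module's rank order is more conserved across taxa than randomly chosen sets of the same number of events.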
Sampling procedures for throughfall monitoring: A simulation study
NASA Astrophysics Data System (ADS)
Zimmermann, Beate; Zimmermann, Alexander; Lark, Richard Murray; Elsenbeer, Helmut
2010-01-01
What is the most appropriate sampling scheme to estimate event-based average throughfall? A satisfactory answer to this seemingly simple question has yet to be found, a failure which we attribute to previous efforts' dependence on empirical studies. Here we try to answer this question by simulating stochastic throughfall fields based on parameters for statistical models of large monitoring data sets. We subsequently sampled these fields with different sampling designs and variable sample supports. We evaluated the performance of a particular sampling scheme with respect to the uncertainty of possible estimated means of throughfall volumes. Even for a relative error limit of 20%, an impractically large number of small, funnel-type collectors would be required to estimate mean throughfall, particularly for small events. While stratification of the target area is not superior to simple random sampling, cluster random sampling involves the risk of being less efficient. A larger sample support, e.g., the use of trough-type collectors, considerably reduces the necessary sample sizes and eliminates the sensitivity of the mean to outliers. Since the gain in time associated with the manual handling of troughs versus funnels depends on the local precipitation regime, the employment of automatically recording clusters of long troughs emerges as the most promising sampling scheme. Even so, a relative error of less than 5% appears out of reach for throughfall under heterogeneous canopies. We therefore suspect a considerable uncertainty of input parameters for interception models derived from measured throughfall, in particular, for those requiring data of small throughfall events.
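The dependence of collector numbers on the relative error limit can be illustrated with the classic sample-size formula for estimating a mean (a simplification assuming independent collectors, unlike the geostatistical simulation used in the study; the CV value below is hypothetical):

```python
import math

def collectors_required(cv, rel_error, z=1.96):
    """Number of independent collectors needed so that the 95% confidence
    half-width stays within `rel_error` of the mean throughfall, given the
    between-collector coefficient of variation `cv`:
    n = (z * cv / rel_error)**2, rounded up."""
    return math.ceil((z * cv / rel_error) ** 2)
```

With a between-collector CV of 0.5, a 20% relative error limit already requires 25 funnels, while a 5% limit requires 385; this quadratic growth is why a 5% error appears out of reach under heterogeneous canopies and why larger supports such as troughs are attractive.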
Cario, Gunnar; Stanulla, Martin; Fine, Bernard M; Teuffel, Oliver; Neuhoff, Nils V; Schrauder, André; Flohr, Thomas; Schäfer, Beat W; Bartram, Claus R; Welte, Karl; Schlegelberger, Brigitte; Schrappe, Martin
2005-01-15
Treatment resistance, as indicated by the presence of high levels of minimal residual disease (MRD) after induction therapy and induction consolidation, is associated with a poor prognosis in childhood acute lymphoblastic leukemia (ALL). We hypothesized that treatment resistance is an intrinsic feature of ALL cells reflected in the gene expression pattern and that resistance to chemotherapy can be predicted before treatment. To test these hypotheses, gene expression signatures of ALL samples with high MRD load were compared with those of samples without measurable MRD during treatment. We identified 54 genes that clearly distinguished resistant from sensitive ALL samples. Genes with low expression in resistant samples were predominantly associated with cell-cycle progression and apoptosis, suggesting that impaired cell proliferation and apoptosis are involved in treatment resistance. Prediction analysis using randomly selected samples as a training set and the remaining samples as a test set revealed an accuracy of 84%. We conclude that resistance to chemotherapy seems at least in part to be an intrinsic feature of ALL cells. Because treatment response could be predicted with high accuracy, gene expression profiling could become a clinically relevant tool for treatment stratification in the early course of childhood ALL.
Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation
Delorenzi, Mauro
2014-01-01
Background: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences (“batch effects”) as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
Focus: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects.
Data: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., ‘control’) or group 2 (e.g., ‘treated’). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects.
Methods: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data. PMID:24967636
A Random Finite Set Approach to Space Junk Tracking and Identification
2014-09-03
Final report, covering 31 Jan 2013 – 29 Apr 2014, under contract FA2386-13… Authors: Ba-Ngu Vo, Ba-Tuong Vo.
Goal Setting to Promote a Healthy Lifestyle.
Paxton, Raheem J; Taylor, Wendell C; Hudnall, Gina Evans; Christie, Juliette
2012-01-01
The purpose of this parallel-group study was to determine whether a feasibility study based on newsletters and telephone counseling would improve goal-setting constructs; physical activity (PA); and fruit and vegetable (F & V) intake in a sample of older adults. Forty-three older adults (M age = 70 years, >70% Asian, 54% female) living in Honolulu, Hawaii were recruited and randomly assigned to either a PA or F & V intake condition. All participants completed measures of PA, F & V intake, and goal-setting mechanisms (i.e., specificity, difficulty, effort, commitment, and persistence) at baseline and 8 weeks. Paired t-tests were used to evaluate changes across time. We found that F & V participants significantly increased F & V intake and mean scores of goal specificity, effort, commitment, and persistence (all p < .05). No statistically significant changes in PA or goal-setting mechanisms were observed for participants in the PA condition. Overall, our results show that a short-term intervention using newsletters and motivational calls based on goal-setting theory was effective in improving F & V intake; however, more research is needed to determine whether these strategies are effective for improving PA among a multiethnic sample of older adults.
Metabolomics biomarkers to predict acamprosate treatment response in alcohol-dependent subjects.
Hinton, David J; Vázquez, Marely Santiago; Geske, Jennifer R; Hitschfeld, Mario J; Ho, Ada M C; Karpyak, Victor M; Biernacka, Joanna M; Choi, Doo-Sup
2017-05-31
Precision medicine for alcohol use disorder (AUD) allows optimal treatment of the right patient with the right drug at the right time. Here, we generated multivariable models incorporating clinical information and serum metabolite levels to predict acamprosate treatment response. The sample of 120 patients was randomly split into a training set (n = 80) and test set (n = 40) five independent times. Treatment response was defined as complete abstinence (no alcohol consumption during 3 months of acamprosate treatment) while nonresponse was defined as any alcohol consumption during this period. In each of the five training sets, we built a predictive model using a least absolute shrinkage and selection operator (LASSO) penalized selection method and then evaluated the predictive performance of each model in the corresponding test set. The models predicted acamprosate treatment response with a mean sensitivity and specificity in the test sets of 0.83 and 0.31, respectively, suggesting our model performed well at predicting responders, but not non-responders (i.e. many non-responders were predicted to respond). Studies with larger sample sizes and additional biomarkers will expand the clinical utility of predictive algorithms for pharmaceutical response in AUD.
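The reported sensitivity and specificity can be computed from binary predictions as follows (a generic sketch with hypothetical labels; response is coded 1 = complete abstinence):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN) over true responders; specificity =
    TN/(TN+FP) over true non-responders."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical test-set labels mirroring the pattern above: most responders
# are caught, but many non-responders are also predicted to respond.
se, sp = sensitivity_specificity([1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
                                 [1, 1, 1, 1, 1, 0, 1, 1, 1, 0])
```

With these toy labels sensitivity is high (5/6) while specificity is low (1/4), the same high-sensitivity, low-specificity pattern the abstract describes.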
NASA Astrophysics Data System (ADS)
Sung, S.; Kim, H. G.; Lee, D. K.; Park, J. H.; Mo, Y.; Kil, S.; Park, C.
2016-12-01
The impact of climate change has been observed throughout the globe. Ecosystems are experiencing rapid changes such as vegetation shifts and species extinctions. In this context, the Species Distribution Model (SDM) is one of the most popular methods for projecting the impact of climate change on ecosystems. An SDM is based on the niche of a given species, which means that presence point data are essential for running it. Running an SDM for plants requires certain considerations about the characteristics of vegetation. Normally, remote sensing techniques are used to map vegetation over large areas. Consequently, the exact locations of presence data carry high uncertainty, because presence points are selected from polygon and raster datasets. Thus, the sampling method used to generate vegetation presence data should be chosen carefully. In this study, we used three different methods for sampling vegetation presence data: random sampling, stratified sampling and site-index-based sampling. We used the R package BIOMOD2 to assess uncertainty from modeling, and included BioCLIM variables and other environmental variables as input data. Despite differences among the 10 SDMs, the sampling methods showed clear differences in ROC values: random sampling showed the lowest ROC value, while site-index-based sampling showed the highest. As a result, the uncertainties arising from presence-data sampling methods and from the SDM itself can be quantified.
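The contrast between drawing presence points at random and weighting them by a site index can be illustrated on a toy grid. The species layer, site-index layer, and weighting rule below are all invented for illustration; they are not the study's data or its exact sampling rules.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical rasterized layers: cell value = site index (0-100),
# plus a boolean presence layer for the species.
site_index = rng.integers(0, 101, size=(100, 100))
presence = rng.random((100, 100)) < 0.2  # 20% of cells hold the species
rows, cols = np.nonzero(presence)
n_points = 200

# (1) Simple random sampling of presence cells.
idx = rng.choice(len(rows), size=n_points, replace=False)
random_pts = list(zip(rows[idx], cols[idx]))

# (2) Site-index-based sampling: weight presence cells by site index,
# so points from high-quality sites are drawn more often.
w = site_index[rows, cols].astype(float)
w /= w.sum()
idx_w = rng.choice(len(rows), size=n_points, replace=False, p=w)
weighted_pts = list(zip(rows[idx_w], cols[idx_w]))

print(len(random_pts), len(weighted_pts))
```

Either point set could then be fed to an SDM; comparing the resulting ROC values is what quantifies the sampling-method uncertainty.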
NASA Astrophysics Data System (ADS)
Bushel, Pierre R.; Bennett, Lee; Hamadeh, Hisham; Green, James; Ableson, Alan; Misener, Steve; Paules, Richard; Afshari, Cynthia
2002-06-01
We present an analysis of pattern recognition procedures used to predict the classes of samples exposed to pharmacologic agents by comparing gene expression patterns from samples treated with two classes of compounds. Rat liver mRNA samples following exposure for 24 hours with phenobarbital or peroxisome proliferators were analyzed using a 1700 rat cDNA microarray platform. Sets of genes that were consistently differentially expressed in the rat liver samples following treatment were stored in the MicroArray Project System (MAPS) database. MAPS identified 238 genes in common that possessed a low probability (P < 0.01) of being randomly detected as differentially expressed at the 95% confidence level. Hierarchical cluster analysis on the 238 genes clustered specific gene expression profiles that separated samples based on exposure to a particular class of compound.
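The hierarchical clustering step can be sketched with SciPy. The expression matrix below is synthetic (two compound classes with shifted means), not the rat liver data, and the linkage choice is an assumption for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Stand-in expression matrix: 12 samples x 238 genes, two treatment
# classes with shifted means so the classes are clearly separable.
class_a = rng.normal(0.0, 1.0, size=(6, 238))
class_b = rng.normal(2.0, 1.0, size=(6, 238))
expr = np.vstack([class_a, class_b])

# Ward-linkage hierarchical clustering on Euclidean distance,
# then cut the tree into two clusters.
Z = linkage(expr, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # samples from the same class fall in the same cluster
```

With well-separated profiles, the cut recovers the two exposure classes, mirroring how the 238-gene profiles separated the compound classes.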
Hospital survey on patient safety culture: psychometric analysis on a Scottish sample.
Sarac, Cakil; Flin, Rhona; Mearns, Kathryn; Jackson, Jeanette
2011-10-01
To investigate the psychometric properties of the Hospital Survey on Patient Safety Culture on a Scottish NHS data set. The data were collected from 1969 clinical staff (estimated 22% response rate) from one acute hospital from each of seven Scottish Health boards. Using a split-half validation technique, the data were randomly split; an exploratory factor analysis was conducted on the calibration data set, and confirmatory factor analyses were conducted on the validation data set to investigate and check the original US model fit in a Scottish sample. Following the split-half validation technique, exploratory factor analysis results showed a 10-factor optimal measurement model. The confirmatory factor analyses were then performed to compare the model fit of two competing models (10-factor alternative model vs 12-factor original model). A Satorra-Bentler scaled χ² difference test demonstrated that the original 12-factor model performed significantly better in a Scottish sample. Furthermore, reliability analyses of each component yielded satisfactory results. The mean scores on the climate dimensions in the Scottish sample were comparable with those found in other European countries. This study provided evidence that the original 12-factor structure of the Hospital Survey on Patient Safety Culture scale has been replicated in this Scottish sample. Therefore, no modifications are required to the original 12-factor model, which is suggested for use, since it would allow researchers the possibility of cross-national comparisons.
Steel, Mike
2012-10-01
Neutral macroevolutionary models, such as the Yule model, give rise to a probability distribution on the set of discrete rooted binary trees over a given leaf set. Such models can provide a signal as to the approximate location of the root when only the unrooted phylogenetic tree is known, and this signal becomes relatively more significant as the number of leaves grows. In this short note, we show that among models that treat all taxa equally and are sampling consistent (i.e. the distribution on trees is not affected by taxa yet to be included), all except one (the so-called PDA model) convey some information as to the location of the ancestral root in an unrooted tree. Copyright © 2012 Elsevier Inc. All rights reserved.
Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.
Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun
2016-01-01
We sought to determine the influence of missing data on the statistical results, and to determine which statistical method, among repeated measures ANOVA, generalized estimating equations (GEE) and mixed-effects model repeated measures (MMRM), is most appropriate for the analysis of longitudinal outcome data of total knee arthroplasty (TKA) with missing values. Data sets with missing values were generated with different proportions of missing data, sample sizes and missing-data generation mechanisms. Each data set was analyzed with the three statistical methods. The influence of missing data was greater with a higher proportion of missing data and a smaller sample size. MMRM tended to show the smallest changes in the statistics. When missing values were generated by a 'missing not at random' mechanism, no statistical method could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.
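The three missing-data generation mechanisms can be sketched in a few lines of NumPy. The outcome variable, dropout rule, and rates below are invented stand-ins, not the study's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
baseline = rng.normal(50, 10, size=n)           # e.g. a baseline knee score
followup = baseline + rng.normal(5, 5, size=n)  # follow-up score
rate = 0.3                                      # target proportion missing

# MCAR: missingness independent of everything.
mcar = followup.copy()
mcar[rng.random(n) < rate] = np.nan

# MAR: missingness depends on the *observed* baseline
# (worse baseline -> more dropout).
p_mar = rate * 2 * (1 - (baseline - baseline.min()) / np.ptp(baseline))
mar = followup.copy()
mar[rng.random(n) < p_mar] = np.nan

# MNAR: missingness depends on the *unobserved* follow-up value itself.
p_mnar = rate * 2 * (1 - (followup - followup.min()) / np.ptp(followup))
mnar = followup.copy()
mnar[rng.random(n) < p_mnar] = np.nan

print([np.isnan(x).mean() for x in (mcar, mar, mnar)])
```

Analyzing each version with the candidate methods and comparing estimates against the complete data is the essence of the study's design.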
The multimedia computer for office-based patient education: a systematic review.
Wofford, James L; Smith, Edward D; Miller, David P
2005-11-01
Use of the multimedia computer for education is widespread in schools and businesses, and yet computer-assisted patient education is rare. In order to explore the potential use of computer-assisted patient education in the office setting, we performed a systematic review of randomized controlled trials (search date April 2004 using MEDLINE and Cochrane databases). Of the 26 trials identified, outcome measures included clinical indicators (12/26, 46.1%), knowledge retention (12/26, 46.1%), health attitudes (15/26, 57.7%), level of shared decision-making (5/26, 19.2%), health services utilization (4/26, 17.6%), and costs (5/26, 19.2%), respectively. Four trials targeted patients with breast cancer, but the clinical issues were otherwise diverse. Reporting of the testing of randomization (76.9%) and appropriate analysis of main effect variables (70.6%) were more common than reporting of a reliable randomization process (35.3%), blinding of outcomes assessment (17.6%), or sample size definition (29.4%). We concluded that the potential for improving the efficiency of the office through computer-assisted patient education has been demonstrated, but better proof of the impact on clinical outcomes is warranted before this strategy is accepted in the office setting.
Biomass energy inventory and mapping system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kasile, J.D.
1993-12-31
A four-stage biomass energy inventory and mapping system was conducted for the entire State of Ohio. The product is a set of maps and an inventory of the State's energy biomass resource, referenced to a one kilometer grid square basis on the Universal Transverse Mercator (UTM) system. Each square kilometer is identified and mapped showing total British Thermal Unit (BTU) energy availability. Land cover percentages and BTU values are provided for each of nine biomass strata types for each one kilometer grid square. LANDSAT satellite data was used as the primary stratifier. The second stage sampling was the photointerpretation of randomly selected one kilometer grid squares that exactly corresponded to the LANDSAT one kilometer grid square classification orientation. Field sampling comprised the third stage of the energy biomass inventory system and was combined with the fourth stage sample of laboratory biomass energy analysis using a bomb calorimeter; the results were then used to assign BTU values to the photointerpretation and to adjust the LANDSAT classification. The sampling error for the whole system was 3.91%.
Pu239 Cross-Section Variations Based on Experimental Uncertainties and Covariances
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sigeti, David Edward; Williams, Brian J.; Parsons, D. Kent
2016-10-18
Algorithms and software have been developed for producing variations in plutonium-239 neutron cross sections based on experimental uncertainties and covariances. The varied cross-section sets may be produced as random samples from the multivariate normal distribution defined by an experimental mean vector and covariance matrix, or they may be produced as Latin-Hypercube/Orthogonal-Array samples (based on the same means and covariances) for use in parametrized studies. The variations obey two classes of constraints that are obligatory for cross-section sets and which put related constraints on the mean vector and covariance matrix that determine the sampling. Because the experimental means and covariances do not obey some of these constraints to sufficient precision, imposing the constraints requires modifying the experimental mean vector and covariance matrix. Modification is done with an algorithm based on linear algebra that minimizes changes to the means and covariances while ensuring that the operations that impose the different constraints do not conflict with each other.
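Both sampling modes mentioned above can be sketched with NumPy and SciPy. The mean vector and covariance matrix below are arbitrary stand-ins for evaluated experimental data, and no physical constraints are imposed; the point is only the mechanics of multivariate-normal versus Latin-hypercube sampling.

```python
import numpy as np
from scipy.stats import qmc, norm

rng = np.random.default_rng(7)
# Hypothetical 4-group cross-section mean vector and covariance matrix.
mean = np.array([1.8, 1.6, 1.5, 1.9])
A = rng.normal(size=(4, 4))
cov = 0.01 * (A @ A.T + 4 * np.eye(4))  # symmetric positive definite

# (1) Plain random samples from the multivariate normal.
mvn_samples = rng.multivariate_normal(mean, cov, size=1000)

# (2) Latin-hypercube samples mapped through the same distribution:
# draw LHS points on the unit cube, convert to standard normals,
# then color them with the Cholesky factor of the covariance.
L = np.linalg.cholesky(cov)
u = qmc.LatinHypercube(d=4, seed=7).random(n=1000)
z = norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))
lhs_samples = mean + z @ L.T

print(mvn_samples.mean(axis=0), lhs_samples.mean(axis=0))
```

The LHS variant stratifies each marginal, which is why it suits parametrized studies where even coverage of the input space matters.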
Generic pure quantum states as steady states of quasi-local dissipative dynamics
NASA Astrophysics Data System (ADS)
Karuvade, Salini; Johnson, Peter D.; Ticozzi, Francesco; Viola, Lorenza
2018-04-01
We investigate whether a generic pure state on a multipartite quantum system can be the unique asymptotic steady state of locality-constrained purely dissipative Markovian dynamics. In the tripartite setting, we show that the problem is equivalent to characterizing the solution space of a set of linear equations and establish that the set of pure states obeying the above property has either measure zero or measure one, solely depending on the subsystems’ dimension. A complete analytical characterization is given when the central subsystem is a qubit. In the N-partite case, we provide conditions on the subsystems’ size and the nature of the locality constraint, under which random pure states cannot be quasi-locally stabilized generically. Also, allowing for the possibility to approximately stabilize entangled pure states that cannot be exact steady states in settings where stabilizability is generic, our results offer insights into the extent to which random pure states may arise as unique ground states of frustration-free parent Hamiltonians. We further argue that, to a high probability, pure quantum states sampled from a t-design enjoy the same stabilizability properties of Haar-random ones as long as suitable dimension constraints are obeyed and t is sufficiently large. Lastly, we demonstrate a connection between the tasks of quasi-local state stabilization and unique state reconstruction from local tomographic information, and provide a constructive procedure for determining a generic N-partite pure state based only on knowledge of the support of any two of the reduced density matrices of about half the parties, improving over existing results.
NASA Astrophysics Data System (ADS)
Liu, Fei; He, Yong; Wang, Li
2007-11-01
In order to implement fast discrimination of milk tea powders with different internal qualities, visible and near infrared (Vis/NIR) spectroscopy combined with effective wavelengths (EWs) and a BP neural network (BPNN) was investigated as a new approach. Five brands of milk tea were obtained; 225 samples were selected randomly for the calibration set, while 75 samples were reserved for the validation set. The EWs were selected according to x-loading weights and regression coefficients from PLS analysis after preprocessing. A total of 18 EWs (400, 401, 452, 453, 502, 503, 534, 535, 594, 595, 635, 636, 688, 689, 987, 988, 995 and 996 nm) were selected as the inputs of the BPNN model. The performance was validated on the calibration and validation sets. The threshold error of prediction was set as ±0.1, and excellent precision was achieved, with a recognition ratio of 100% for the calibration set and 98.7% for the validation set. The prediction results indicated that the EWs reflected the main characteristics of milk teas of different brands based on Vis/NIR spectroscopy and the BPNN model, and that the EWs would be useful for developing a portable instrument to discriminate the variety and detect the adulteration of instant milk tea powders.
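The EW-plus-network pipeline can be sketched with scikit-learn. The spectra below are synthetic (each brand gets its own smooth baseline plus noise), the wavelength grid is an assumed 1 nm grid from 400 nm, and an MLP classifier stands in for the BP neural network; only the 18 reported EWs are kept as inputs.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n_per_brand, n_brands = 60, 5
wavelengths = np.arange(400, 1000)               # hypothetical 1-nm grid
ew = [0, 1, 52, 53, 102, 103, 134, 135, 194, 195,
      235, 236, 288, 289, 587, 588, 595, 596]    # indices of the 18 EWs

# Synthetic spectra: each brand has its own smooth baseline plus noise.
X, y = [], []
for b in range(n_brands):
    base = np.sin(wavelengths / (80.0 + 10 * b)) + 0.1 * b
    X.append(base + rng.normal(0, 0.05, size=(n_per_brand, len(wavelengths))))
    y.append(np.full(n_per_brand, b))
X, y = np.vstack(X)[:, ew], np.concatenate(y)    # keep only the 18 EWs

# 225 calibration / 75 validation samples, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=225, random_state=0, stratify=y)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
print(net.score(X_te, y_te))
```

Restricting the network to a handful of EWs is what would make a portable filter-based instrument feasible.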
The CO₂ GAP Project--CO₂ GAP as a prognostic tool in emergency departments.
Shetty, Amith L; Lai, Kevin H; Byth, Karen
2010-12-01
To determine whether the CO₂ GAP [(a-ET)PCO₂] value differs consistently in patients presenting with shortness of breath to the ED who require ventilatory support; to determine a cut-off value of the CO₂ GAP that is consistently associated with the measured outcome; and to compare its performance against other derived variables. This prospective observational study was conducted in an ED on a convenience sample of 412 of 759 patients who underwent concurrent arterial blood gas and ETCO₂ (end-tidal CO₂) measurement. Patients were randomly divided into a training set of 312 and a validation set of 100. The primary outcome of interest was the need for ventilatory support; secondary outcomes were admission to a high dependency unit or death during the ED stay. The randomly selected training set was used to select cut-points for the possible predictors, that is, CO₂ GAP, CO₂ gradient, physiologic dead space and A-a gradient. The sensitivity, specificity and predictive values of these predictors were then validated in the 100-patient validation set. Analysis of the receiver operating characteristic curves revealed that the CO₂ GAP performed significantly better than the arterial-alveolar gradient in patients requiring ventilatory support (area under the curve 0.950 vs 0.726). A CO₂ GAP ≥10 was associated with assisted ventilation outcomes when applied to the validation set (100% sensitivity, 70% specificity). The CO₂ GAP [(a-ET)PCO₂] differs significantly in patients requiring assisted ventilation when presenting with shortness of breath to EDs, and further research addressing the prognostic value of the CO₂ GAP in this specific aspect is required. © 2010 The Authors. EMA © 2010 Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine.
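Selecting a cut-point from a training-set ROC curve can be sketched with scikit-learn. The CO₂ GAP values below are synthetic stand-ins (two normal distributions), and Youden's J is assumed as the cut-point criterion; the paper does not state which criterion it used.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(5)
# Synthetic stand-in: CO2 GAP values for 312 "training" patients,
# higher in those who went on to need ventilatory support.
needed_vent = rng.random(312) < 0.25
gap = np.where(needed_vent, rng.normal(14, 4, 312), rng.normal(6, 3, 312))

auc = roc_auc_score(needed_vent, gap)
fpr, tpr, thresholds = roc_curve(needed_vent, gap)
youden = tpr - fpr                      # Youden's J picks the cut-point
cut = thresholds[np.argmax(youden)]
print(round(auc, 3), round(cut, 1))
```

The chosen cut would then be frozen and its sensitivity/specificity reported on the held-out validation patients, as in the study.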
Random sphere packing model of heterogeneous propellants
NASA Astrophysics Data System (ADS)
Kochevets, Sergei Victorovich
It is well recognized that combustion of heterogeneous propellants is strongly dependent on the propellant morphology. Recent developments in computing systems make it possible to start three-dimensional modeling of heterogeneous propellant combustion. A key component of such large-scale computations is a realistic model of industrial propellants which retains the true morphology---a goal never achieved before. The research presented develops the Random Sphere Packing Model of heterogeneous propellants and generates numerical samples of actual industrial propellants. This is done by developing a sphere packing algorithm which randomly packs a large number of spheres with a polydisperse size distribution within a rectangular domain. First, the packing code is developed, optimized for performance, and parallelized using the OpenMP shared memory architecture. Second, the morphology and packing fraction of two simple cases of unimodal and bimodal packs are investigated computationally and analytically. It is shown that both the Loose Random Packing and Dense Random Packing limits are not well defined and the growth rate of the spheres is identified as the key parameter controlling the efficiency of the packing. For a properly chosen growth rate, computational results are found to be in excellent agreement with experimental data. Third, two strategies are developed to define numerical samples of polydisperse heterogeneous propellants: the Deterministic Strategy and the Random Selection Strategy. Using these strategies, numerical samples of industrial propellants are generated. The packing fraction is investigated and it is shown that the experimental values of the packing fraction can be achieved computationally. It is strongly believed that this Random Sphere Packing Model of propellants is a major step forward in the realistic computational modeling of heterogeneous propellant combustion.
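The basic idea of randomly packing a polydisperse set of spheres can be sketched with a much simpler scheme than the thesis's growth-rate algorithm: random sequential addition, which places spheres largest-first at random positions and rejects overlapping placements. The size distribution below is an arbitrary bimodal stand-in.

```python
import numpy as np

rng = np.random.default_rng(11)

def pack_spheres(radii, box=1.0, max_tries=5000):
    """Random sequential addition: place spheres largest-first at random
    positions, rejecting any placement that overlaps an accepted sphere."""
    centers, placed = [], []
    for r in sorted(radii, reverse=True):
        for _ in range(max_tries):
            c = rng.uniform(r, box - r, size=3)  # keep sphere inside the box
            if all(np.linalg.norm(c - c2) >= r + r2
                   for c2, r2 in zip(centers, placed)):
                centers.append(c)
                placed.append(r)
                break
    return np.array(centers), np.array(placed)

# Bimodal size distribution: a few coarse particles plus many fine ones.
radii = [0.12] * 5 + [0.04] * 120
centers, placed = pack_spheres(radii)
fraction = np.sum(4 / 3 * np.pi * placed**3)  # packing fraction of unit box
print(len(placed), round(fraction, 3))
```

Random sequential addition saturates well below the dense packing fractions of real propellants, which is precisely why a growth-rate-controlled algorithm like the one in this work is needed.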
In addition, a method of analysis of the morphology of heterogeneous propellants is developed which uses the concept of multi-point correlation functions. A set of intrinsic length scales of local density fluctuations in random heterogeneous propellants is identified by performing a Monte-Carlo study of the correlation functions. This method of analysis shows great promise for understanding the origins of the combustion instability of heterogeneous propellants, and is believed to become a valuable tool for the development of safe and reliable rocket engines.
Wang, Yi; Xiang, Ma; Wen, Ya-Dong; Yu, Chun-Xia; Wang, Luo-Ping; Zhao, Long-Lian; Li, Jun-Hui
2012-11-01
In this study, tobacco quality analysis of main Industrial classification of different years was carried out applying spectrum projection and correlation methods. The group of data was near-infrared (NIR) spectrum from Hongta Tobacco (Group) Co., Ltd. 5730 tobacco leaf Industrial classification samples from Yuxi in Yunnan province from 2007 to 2010 year were collected using near infrared spectroscopy, which from different parts and colors and all belong to tobacco varieties of HONGDA. The conclusion showed that, when the samples were divided to two part by the ratio of 2:1 randomly as analysis and verification sets in the same year, the verification set corresponded with the analysis set applying spectrum projection because their correlation coefficients were above 0.98. The correlation coefficients between two different years applying spectrum projection were above 0.97. The highest correlation coefficient was the one between 2008 and 2009 year and the lowest correlation coefficient was the one between 2007 and 2010 year. At the same time, The study discussed a method to get the quantitative similarity values of different industrial classification samples. The similarity and consistency values were instructive in combination and replacement of tobacco leaf blending.
Wiktelius, Daniel; Ahlinder, Linnea; Larsson, Andreas; Höjer Holmgren, Karin; Norlin, Rikard; Andersson, Per Ola
2018-08-15
Collecting data under field conditions for forensic investigations of chemical warfare agents calls for the use of portable instruments. In this study, a set of aged, crude preparations of sulfur mustard were characterized spectroscopically without any sample preparation using handheld Raman and portable IR instruments. The spectral data was used to construct Random Forest multivariate models for the attribution of test set samples to the synthetic method used for their production. Colored and fluorescent samples were included in the study, which made Raman spectroscopy challenging although fluorescence was diminished by using an excitation wavelength of 1064 nm. The predictive power of models constructed with IR or Raman data alone, as well as with combined data was investigated. Both techniques gave useful data for attribution. Model performance was enhanced when Raman and IR spectra were combined, allowing correct classification of 19/23 (83%) of test set spectra. The results demonstrate that data obtained with spectroscopy instruments amenable for field deployment can be useful in forensic studies of chemical warfare agents. Copyright © 2018 Elsevier B.V. All rights reserved.
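Fusing two spectral modalities before feeding a Random Forest can be sketched as follows. The "Raman" and "IR" spectra below are synthetic, with invented route-specific bands; the concatenation (low-level data fusion) is an assumed fusion strategy, since the abstract does not state how the data were combined.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_per_route, n_routes = 20, 3
raman_dim, ir_dim = 150, 120

# Synthetic spectra: each synthetic route shifts a few characteristic bands.
X, y = [], []
for route in range(n_routes):
    raman = rng.normal(0, 1, (n_per_route, raman_dim))
    ir = rng.normal(0, 1, (n_per_route, ir_dim))
    raman[:, route * 10:route * 10 + 5] += 2.0  # route-specific Raman bands
    ir[:, route * 8:route * 8 + 4] += 2.0       # route-specific IR bands
    X.append(np.hstack([raman, ir]))            # concatenate the two modalities
    y.append(np.full(n_per_route, route))
X, y = np.vstack(X), np.concatenate(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(rf.score(X_te, y_te))  # attribution accuracy on held-out "samples"
```

When each modality carries complementary route markers, the fused model can outperform either modality alone, consistent with the study's finding.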
Orbit Determination Using Vinti’s Solution
2016-09-15
...simplicity, stability, and speed. On the other hand, Kalman filters would be best suited for sequential estimation of stochastic or random components of a... be likened to how an Unscented Kalman Filter samples a system's nonlinearities directly, avoiding linearizing the dynamics in the partials matrices
J. Breidenbach; E. Kublin; R. McGaughey; H.-E. Andersen; S. Reutebuch
2008-01-01
For this study, hierarchical data sets--in that several sample plots are located within a stand--were analyzed for study sites in the USA and Germany. The German data had an additional hierarchy as the stands are located within four distinct public forests. Fixed-effects models and mixed-effects models with a random intercept on the stand level were fit to each data...
ERIC Educational Resources Information Center
Iyeke, Patrick; Dafe, Onoharigho Festus
2016-01-01
This study set out to ascertain the knowledge of hazards of self-medication among secondary school students. The descriptive survey design was adopted for the work. The population of the study is 9,500 students in the public secondary schools in Ethiope East Local Government Area of Delta State. The sample is 300 students randomly selected…
Hsu, Jia-Lien; Hung, Ping-Cheng; Lin, Hung-Yen; Hsieh, Chung-Ho
2015-04-01
Breast cancer is one of the most common causes of cancer mortality. Early detection through mammography screening could significantly reduce mortality from breast cancer. However, most screening methods consume a large amount of resources. We propose a computational model, based solely on personal health information, for breast cancer risk assessment. Our model can serve as a pre-screening program in a low-cost setting. In our study, the data set, consisting of 3976 records, was collected from Taipei City Hospital between 2008.1.1 and 2008.12.31. Based on the dataset, we first applied sampling techniques and a dimension reduction method to preprocess the data. Then, we constructed various kinds of classifiers (including basic classifiers, ensemble methods, and cost-sensitive methods) to predict the risk. The cost-sensitive method with a random forest classifier was able to achieve a recall (or sensitivity) of 100%. At a recall of 100%, the precision (positive predictive value, PPV) and specificity of the cost-sensitive method with random forest classifier were 2.9% and 14.87%, respectively. In our study, we built a breast cancer risk assessment model using data mining techniques. Our model has the potential to serve as an assisting tool in breast cancer screening.
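A cost-sensitive random forest that trades precision for recall, as in the pre-screening setting above, can be sketched with scikit-learn. The data, class weights, and decision threshold below are all invented for illustration, not the study's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(9)
n = 2000
X = rng.normal(size=(n, 8))                       # stand-in health-record features
risk = X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 1, n)
y = (risk > np.quantile(risk, 0.95)).astype(int)  # rare positives, as in screening

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Cost-sensitive learning: penalize missed positives far more than false alarms.
rf = RandomForestClassifier(n_estimators=200, class_weight={0: 1, 1: 50},
                            random_state=0).fit(X_tr, y_tr)

# Lower the decision threshold to push recall (sensitivity) up further.
proba = rf.predict_proba(X_te)[:, 1]
pred = (proba >= 0.05).astype(int)
print(recall_score(y_te, pred))
```

Very high recall at the cost of low precision is exactly the trade-off the paper reports (100% recall, 2.9% PPV): acceptable for a cheap pre-screen that only decides who gets mammography.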
Community Game Day: Using an End-of-Life Conversation Game to Encourage Advance Care Planning.
Van Scoy, Lauren J; Reading, Jean M; Hopkins, Margaret; Smith, Brandi; Dillon, Judy; Green, Michael J; Levi, Benjamin H
2017-11-01
Advance care planning (ACP) is an important process that involves discussing and documenting one's values and preferences for medical care, particularly end-of-life treatments. This convergent, mixed-methods study assessed whether an end-of-life conversation card game is an acceptable and effective means for performing ACP for patients with chronic illness and/or their caregivers when deployed in a community setting. Twenty-two games (n = 93 participants) were held in community settings surrounding Hershey, PA in 2016. Participants were recruited using random sampling from patient databases and also convenience sampling (i.e., flyers). Quantitative questionnaires and qualitative focus group interviews were administered to assess the game experience and subsequent performance of ACP behaviors. Both quantitative and qualitative data found that Community Game Day was a well-received, positive experience for participants and 75% of participants performed ACP within three months post-intervention. These findings suggest that using a conversation game during community outreach is a useful approach for engaging patients and caregivers in ACP. The convergence of quantitative and qualitative data strongly supports the continued investigation of the game in randomized controlled trials. Copyright © 2017 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Robust portfolio selection based on asymmetric measures of variability of stock returns
NASA Astrophysics Data System (ADS)
Chen, Wei; Tan, Shaohua
2009-10-01
This paper addresses a new uncertainty set, the interval random uncertainty set, for robust optimization. The form of the interval random uncertainty set makes it suitable for capturing the downside and upside deviations of real-world data. These deviation measures capture distributional asymmetry and lead to better optimization results. We also apply our interval random chance-constrained programming approach to robust mean-variance portfolio selection under interval random uncertainty sets in the elements of the mean vector and covariance matrix. Numerical experiments with real market data indicate that our approach results in better portfolio performance.
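The downside/upside deviation measures that motivate the asymmetric uncertainty set can be computed in a few lines. The return series below is synthetic, with one asset given occasional crashes so its distribution is visibly asymmetric; this is an illustration of the deviation measures only, not of the paper's chance-constrained program.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic daily returns for 4 assets; asset 0 gets occasional crashes.
returns = rng.normal(0.0005, 0.01, size=(500, 4))
returns[:, 0] -= 0.05 * (rng.random(500) < 0.05)

mu = returns.mean(axis=0)
below = np.minimum(returns - mu, 0.0)        # downside deviations only
above = np.maximum(returns - mu, 0.0)        # upside deviations only
downside = np.sqrt((below**2).mean(axis=0))  # downside semideviation
upside = np.sqrt((above**2).mean(axis=0))    # upside semideviation

print(downside.round(4), upside.round(4))
```

A symmetric variance would treat the crash-prone asset the same as a benign one with equal total variance; separating the two semideviations is what lets the robust model penalize downside risk specifically.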
Methicillin-resistant staphylococcus aureus isolates in a hospital of shanghai.
Wang, Xiaoguang; Ouyang, Lin; Luo, Lingfei; Liu, Jiqian; Song, Chiping; Li, Cuizhen; Yan, Hongjing; Wang, Ping
2017-01-24
Methicillin-resistant Staphylococcus aureus (MRSA) strains are now common both in the health care setting and in the community. Active surveillance is critical for MRSA control and prevention. Specimens from patients (200 patients, 1119 specimens) as well as from medical staff and the hospital setting (1000 specimens) were randomly sampled in a level 2 hospital in Shanghai from September 2011 to August 2012. Isolation, cultivation and identification of S. aureus were performed. In total, 67 S. aureus strains were isolated. Thirty-two S. aureus strains were isolated from patient samples; 13 (13/32, 40.6%) of these were MRSA; sputum samples and patients in the department of general internal medicine were the most frequent specimen type and patient group for S. aureus isolation. The remaining 35 S. aureus strains were isolated from the medical staff and hospital setting; 20 (20/35, 57.1%) of these were MRSA; specimens sampled from doctors' and nurses' hands and noses and from hospital facilities were the most frequent sources of S. aureus. Detection of resistance and virulence genes showed that all 33 MRSA strains were mecA positive, accounting for 49.3% of the 67 S. aureus strains; 38 isolates were Panton-Valentine leukocidin (PVL) gene positive, accounting for 56.7% of the 67 strains; and 17 (17/67, 25.4%) isolates were dual positive for mecA and PVL. Multidrug-resistant MRSA and PVL-positive S. aureus are common among patients, medical staff and the hospital setting; this potential health threat is worthy of attention.
A compressed sensing X-ray camera with a multilayer architecture
NASA Astrophysics Data System (ADS)
Wang, Zhehui; Iaroshenko, O.; Li, S.; Liu, T.; Parab, N.; Chen, W. W.; Chu, P.; Kenyon, G. T.; Lipton, R.; Sun, K.-X.
2018-01-01
Recent advances in compressed sensing theory and algorithms offer new possibilities for high-speed X-ray camera design. In many CMOS cameras, each pixel has an independent on-board circuit that includes an amplifier, noise rejection, signal shaper, an analog-to-digital converter (ADC), and optional in-pixel storage. When X-ray images are sparse, i.e., when one of the following cases is true: (a.) The number of pixels with true X-ray hits is much smaller than the total number of pixels; (b.) The X-ray information is redundant; or (c.) Some prior knowledge about the X-ray images exists, sparse sampling may be allowed. Here we first illustrate the feasibility of random on-board pixel sampling (ROPS) using an existing set of X-ray images, followed by a discussion about signal to noise as a function of pixel size. Next, we describe a possible circuit architecture to achieve random pixel access and in-pixel storage. The combination of a multilayer architecture, sparse on-chip sampling, and computational image techniques, is expected to facilitate the development and applications of high-speed X-ray camera technology.
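The random on-board pixel sampling (ROPS) idea can be illustrated with a toy frame: when true X-ray hits are sparse, reading out a random subset of pixels still captures a proportional share of the hits. The frame, hit count, and 25% readout fraction below are invented for illustration, and no compressed-sensing reconstruction is attempted here.

```python
import numpy as np

rng = np.random.default_rng(6)
H, W = 64, 64
# Sparse synthetic "X-ray frame": a handful of hit pixels on a dark background.
frame = np.zeros((H, W))
hits = rng.choice(H * W, size=30, replace=False)
frame.flat[hits] = rng.uniform(50, 200, size=30)

# ROPS: read out only 25% of the pixels, chosen at random,
# instead of the full frame.
n_read = (H * W) // 4
read_idx = rng.choice(H * W, size=n_read, replace=False)
readout = frame.flat[read_idx]

captured_hits = np.count_nonzero(readout)
print(n_read, captured_hits)  # on average about 1/4 of the hits are captured
```

In the actual camera concept, the sparse readout would be fed to a compressed-sensing solver to recover the full frame, which is what makes the reduced per-frame data volume (and hence higher frame rates) attractive.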
Govindarajan, R; Llueguera, E; Melero, A; Molero, J; Soler, N; Rueda, C; Paradinas, C
2010-01-01
Statistical Process Control (SPC) was applied to monitor patient set-up in radiotherapy and, when the measured set-up error values indicated a loss of process stability, its root cause was identified and eliminated to prevent set-up errors. Set-up errors were measured for the medial-lateral (ml), cranial-caudal (cc) and anterior-posterior (ap) dimensions, and the upper control limits were calculated. Once the control limits were known and the range variability was acceptable, treatment set-up errors were monitored using sub-groups of 3 patients, three times each shift. These values were plotted on a control chart in real time. Control limit values showed that the existing variation was acceptable. Set-up errors, measured and plotted on an X chart, helped monitor the set-up process stability; if and when stability was lost, treatment was interrupted, the particular cause responsible for the non-random pattern was identified, and corrective action was taken before proceeding with the treatment. The SPC protocol focuses on controlling the variability due to assignable causes instead of focusing on patient-to-patient variability, which normally does not exist. Compared with weekly sampling of set-up error in each and every patient, which may only ensure that the sampled sessions were set up correctly, the SPC method enables set-up error prevention in all treatment sessions for all patients and, at the same time, reduces control costs. Copyright © 2009 SECA. Published by Elsevier Espana. All rights reserved.
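Control limits for subgroups of 3, as used above, follow the standard X̄/R chart formulas. The set-up error data below are synthetic; the chart constants for subgroup size n = 3 are standard SPC values.

```python
import numpy as np

gen = np.random.default_rng(8)
# Hypothetical medial-lateral set-up errors (mm), in subgroups of 3 patients.
subgroups = gen.normal(0.0, 1.5, size=(30, 3))

xbar = subgroups.mean(axis=1)
ranges = subgroups.max(axis=1) - subgroups.min(axis=1)  # subgroup ranges
xbarbar, rbar = xbar.mean(), ranges.mean()

A2, D3, D4 = 1.023, 0.0, 2.574       # standard constants for subgroup size n = 3
ucl_x, lcl_x = xbarbar + A2 * rbar, xbarbar - A2 * rbar
ucl_r, lcl_r = D4 * rbar, D3 * rbar

out_of_control = np.any((xbar > ucl_x) | (xbar < lcl_x))
print(round(ucl_x, 2), round(lcl_x, 2), out_of_control)
```

A point outside the limits, or a non-random pattern within them, is the signal to interrupt treatment and hunt for the assignable cause.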
Ghiglietti, Andrea; Scarale, Maria Giovanna; Miceli, Rosalba; Ieva, Francesca; Mariani, Luigi; Gavazzi, Cecilia; Paganoni, Anna Maria; Edefonti, Valeria
2018-03-22
Recently, response-adaptive designs have been proposed in randomized clinical trials to achieve ethical and/or cost advantages by using sequential accrual information collected during the trial to dynamically update the probabilities of treatment assignments. In this context, urn models, in which the probability of assigning patients to treatments is interpreted as the proportion of balls of different colors available in a virtual urn, have been used as response-adaptive randomization rules. We propose the use of Randomly Reinforced Urn (RRU) models in a simulation study based on a published randomized clinical trial on the efficacy of home enteral nutrition in cancer patients after major gastrointestinal surgery. We compare results under the RRU design with those previously published under the non-adaptive approach. We also provide code written in the R software to implement the RRU design in practice. In detail, we simulate 10,000 trials based on the RRU model in three set-ups with different total sample sizes. We report information on the number of patients allocated to the inferior treatment and on the empirical power of the t-test for the treatment coefficient in the ANOVA model. We carry out a sensitivity analysis to assess the effect of different urn compositions. For each sample size, in approximately 75% of the simulation runs, the number of patients allocated to the inferior treatment by the RRU design is lower than under the non-adaptive design. The empirical power of the t-test for the treatment effect is similar in the two designs.
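The Randomly Reinforced Urn mechanism can be sketched as follows (the authors provide R code; this is a hedged Python analogue with invented response probabilities and urn composition, and a binary-response simplification).

```python
import numpy as np

rng = np.random.default_rng(10)

def rru_trial(n_patients, p_success=(0.7, 0.4), urn=(1.0, 1.0), reinforce=1.0):
    """Randomly Reinforced Urn: assign each patient by drawing a color in
    proportion to the urn composition; on a treatment success, add
    `reinforce` balls of that color back, tilting future assignments."""
    urn = np.array(urn, dtype=float)
    assigned = np.zeros(2, dtype=int)
    for _ in range(n_patients):
        arm = rng.choice(2, p=urn / urn.sum())
        assigned[arm] += 1
        if rng.random() < p_success[arm]:  # response on the assigned arm
            urn[arm] += reinforce
    return assigned

counts = rru_trial(200)
print(counts)  # the superior arm (index 0) tends to accrue more patients
```

Repeating such trials many times and tallying how often the inferior arm receives fewer patients is the core of the paper's 10,000-run simulation study.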
A two-stage Monte Carlo approach to the expression of uncertainty with finite sample sizes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crowder, Stephen Vernon; Moyer, Robert D.
2005-05-01
Proposed supplement I to the GUM outlines a 'propagation of distributions' approach to deriving the distribution of a measurand for any non-linear function and for any set of random inputs. The supplement's proposed Monte Carlo approach assumes that the distributions of the random inputs are known exactly. This implies that the sample sizes are effectively infinite. In this case, the mean of the measurand can be determined precisely using a large number of Monte Carlo simulations. In practice, however, the distributions of the inputs will rarely be known exactly, but must be estimated using possibly small samples. If these approximated distributions are treated as exact, the uncertainty in estimating the mean is not properly taken into account. In this paper, we propose a two-stage Monte Carlo procedure that explicitly takes into account the finite sample sizes used to estimate parameters of the input distributions. We will illustrate the approach with a case study involving the efficiency of a thermistor mount power sensor. The performance of the proposed approach will be compared to the standard GUM approach for finite samples using simple non-linear measurement equations. We will investigate performance in terms of coverage probabilities of derived confidence intervals.
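A minimal sketch of such a two-stage procedure, under assumed normal inputs and a hypothetical measurand y = x1·x2²: the outer stage perturbs the estimated input means by their standard errors to reflect the finite sample sizes, and the inner stage performs the ordinary propagation of distributions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small observed samples for two inputs of a non-linear measurand y = x1 * x2**2
x1_obs = rng.normal(10.0, 0.5, size=8)
x2_obs = rng.normal(2.0, 0.1, size=8)

def two_stage(x1_obs, x2_obs, n_outer=200, n_inner=500):
    ys = []
    n1, n2 = len(x1_obs), len(x2_obs)
    for _ in range(n_outer):
        # Stage 1: perturb the estimated input means to reflect the finite
        # sample sizes (the mean has standard error s / sqrt(n)).
        m1 = rng.normal(x1_obs.mean(), x1_obs.std(ddof=1) / np.sqrt(n1))
        m2 = rng.normal(x2_obs.mean(), x2_obs.std(ddof=1) / np.sqrt(n2))
        # Stage 2: ordinary propagation of distributions with those parameters.
        x1 = rng.normal(m1, x1_obs.std(ddof=1), n_inner)
        x2 = rng.normal(m2, x2_obs.std(ddof=1), n_inner)
        ys.append(x1 * x2**2)
    return np.concatenate(ys)

y = two_stage(x1_obs, x2_obs)
lo, hi = np.percentile(y, [2.5, 97.5])  # 95 % coverage interval for y
```

Treating the estimated distributions as exact would skip Stage 1 entirely; the resulting interval would then be too narrow, which is the under-coverage problem the paper addresses.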
NASA Astrophysics Data System (ADS)
Liu, Zhangjun; Liu, Zenghui
2018-06-01
This paper develops a hybrid approach of spectral representation and random function for simulating stationary stochastic vector processes. In the proposed approach, the high-dimensional random variables included in the original spectral representation (OSR) formula can be effectively reduced to only two elementary random variables by introducing random functions that serve as random constraints. On this basis, a satisfactory simulation accuracy can be guaranteed by selecting a small representative point set of the elementary random variables. The probability information of the stochastic excitations can be fully captured by just several hundred sample functions generated with the proposed approach. Therefore, combined with the probability density evolution method (PDEM), the approach enables dynamic response analysis and reliability assessment of engineering structures. For illustrative purposes, a stochastic turbulence wind velocity field acting on a frame-shear-wall structure is simulated by constructing three types of random functions to demonstrate the accuracy and efficiency of the proposed approach. Careful and in-depth studies concerning the probability density evolution analysis of the wind-induced structural response have been conducted so as to better illustrate the application prospects of the proposed approach. Numerical examples also show that the proposed approach possesses good robustness.
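For orientation, the classical original spectral representation (OSR) that the paper starts from can be sketched as follows. The target power spectral density here is hypothetical, and the full set of independent random phases is exactly the high-dimensional randomness that the paper's random-function constraint reduces to two elementary variables (that reduction is not reproduced here).

```python
import numpy as np

def spectral_rep_sample(t, n_freq=256, w_max=10.0, seed=None):
    """One sample function of a stationary Gaussian process via the classical
    spectral representation method, for an assumed one-sided PSD S(w)."""
    rng = np.random.default_rng(seed)
    dw = w_max / n_freq
    w = (np.arange(n_freq) + 0.5) * dw          # frequency grid midpoints
    S = 1.0 / (1.0 + w**2)                      # hypothetical target PSD
    phi = rng.uniform(0.0, 2.0 * np.pi, n_freq) # independent random phases
    amp = np.sqrt(2.0 * S * dw)                 # component amplitudes
    # X(t) = sum_k amp_k * cos(w_k t + phi_k)
    return (amp[None, :] * np.cos(np.outer(t, w) + phi[None, :])).sum(axis=1)

t = np.linspace(0.0, 20.0, 500)
x = spectral_rep_sample(t, seed=3)
```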
Random vs. systematic sampling from administrative databases involving human subjects.
Hagino, C; Lo, R J
1998-09-01
Two sampling techniques, simple random sampling (SRS) and systematic sampling (SS), were compared to determine whether they yield similar and accurate distributions for the following four factors: age, gender, geographic location and years in practice. Any point estimate within 7 yr or 7 percentage points of its reference standard (SRS or the entire data set, i.e., the target population) was considered "acceptably similar" to the reference standard. The sampling frame was the entire membership database of the Canadian Chiropractic Association. The two sampling methods were tested using eight different sample sizes (n = 50, 100, 150, 200, 250, 300, 500, 800). From the profile/characteristics summaries of four known factors [gender, average age, number (%) of chiropractors in each province and years in practice], between- and within-method chi-square tests and unpaired t tests were performed to determine whether any of the differences (descriptively greater than 7 percentage points or 7 yr) were also statistically significant. The strengths of the agreements between the provincial distributions were quantified by calculating the percent agreement for each province (pairwise comparisons between methods). Any percent agreement less than 70% was judged unacceptable. Our assessments of the two sampling methods (SRS and SS) for the different sample sizes tested suggest that SRS and SS yielded acceptably similar results. Both methods started to yield "correct" sample profiles at approximately the same sample size (n > 200). SS is not only convenient, it can be recommended for sampling from large databases in which the data are listed without any inherent order biases other than alphabetical listing by surname.
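The two techniques compared above can be sketched in a few lines; the frame size and sample sizes are illustrative, with list indices standing in for membership records.

```python
import random

def simple_random_sample(frame, n, seed=0):
    """SRS: every record drawn without reference to the others."""
    return random.Random(seed).sample(frame, n)

def systematic_sample(frame, n, seed=0):
    """SS: a random start within the first interval, then every k-th record."""
    k = len(frame) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical membership database, listed alphabetically (indices here).
frame = list(range(3000))
srs = simple_random_sample(frame, 200)
ss = systematic_sample(frame, 200)
```

With an alphabetically ordered frame and no periodic structure, both draws behave like random samples, which is consistent with the study's finding that SS is acceptable for such databases.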
Schechtman, K B; Gordon, M E
1993-01-30
In randomized clinical trials, poor compliance and treatment intolerance lead to reduced between-group differences, increased sample size requirements, and increased cost. A run-in strategy is intended to reduce these problems. In this paper, we develop a comprehensive set of measures specifically sensitive to the effect of a run-in on cost and sample size requirements, both before and after randomization. Using these measures, we describe a step-by-step algorithm through which one can estimate the cost-effectiveness of a potential run-in. Because the cost-effectiveness of a run-in is partly mediated by its effect on sample size, we begin by discussing the likely impact of a planned run-in on the required number of randomized, eligible, and screened subjects. Run-in strategies are most likely to be cost-effective when: (1) per patient costs during the post-randomization as compared to the screening period are high; (2) poor compliance is associated with a substantial reduction in response to treatment; (3) the number of screened patients needed to identify a single eligible patient is small; (4) the run-in is inexpensive; (5) for most patients, the run-in compliance status is maintained following randomization and, most importantly, (6) many subjects excluded by the run-in are treatment intolerant or non-compliant to the extent that we expect little or no treatment response. Our analysis suggests that conditions for the cost-effectiveness of run-in strategies are stringent. In particular, if the only purpose of a run-in is to exclude ordinary partial compliers, the run-in will frequently add to the cost of the trial. Often, the cost-effectiveness of a run-in requires that one can identify and exclude a substantial number of treatment intolerant or otherwise unresponsive subjects.
Asymptotic laws for random knot diagrams
NASA Astrophysics Data System (ADS)
Chapman, Harrison
2017-06-01
We study random knotting by considering knot and link diagrams as decorated (rooted) topological maps on spheres and pulling them uniformly from among sets of a given number of vertices n, as first established in recent work with Cantarella and Mastin. The knot diagram model is an exciting new model which captures both the random geometry of space curve models of knotting as well as the ease of computing invariants from diagrams. We prove that unknot diagrams are asymptotically exponentially rare, an analogue of Sumners and Whittington's landmark result for self-avoiding polygons. Our proof uses the same key idea: we first show that knot diagrams obey a pattern theorem, which describes their fractal structure. We examine how quickly this behavior occurs in practice. As a consequence, almost all diagrams are asymmetric, simplifying sampling from this model. We conclude with experimental data on knotting in this model. This model of random knotting is similar to those studied by Diao et al, and Dunfield et al.
Evaluation of a Serum Lung Cancer Biomarker Panel.
Mazzone, Peter J; Wang, Xiao-Feng; Han, Xiaozhen; Choi, Humberto; Seeley, Meredith; Scherer, Richard; Doseeva, Victoria
2018-01-01
A panel of 3 serum proteins and 1 autoantibody has been developed to assist with the detection of lung cancer. We aimed to validate the accuracy of the biomarker panel in an independent test set and explore the impact of adding a fourth serum protein to the panel, as well as the impact of combining molecular and clinical variables. The training set of serum samples was purchased from commercially available biorepositories. The testing set was from a biorepository at the Cleveland Clinic. All lung cancer and control subjects were >50 years old and had smoked a minimum of 20 pack-years. A panel of biomarkers including CEA (carcinoembryonic antigen), CYFRA21-1 (cytokeratin-19 fragment 21-1), CA125 (carbohydrate antigen 125), HGF (hepatocyte growth factor), and NY-ESO-1 (New York esophageal cancer-1 antibody) was measured using immunoassay techniques. The multiple of the median method, multivariate logistic regression, and random forest modeling were used to analyze the results. The training set consisted of 604 patient samples (268 with lung cancer and 336 controls) and the testing set of 400 patient samples (155 with lung cancer and 245 controls). With a threshold established from the training set, the sensitivity and specificity of both the 4- and 5-biomarker panels on the testing set were 49% and 96%, respectively. Models built on the testing set using only clinical variables had an area under the receiver operating characteristic curve of 0.68, using the biomarker panel 0.81, and by combining clinical and biomarker variables 0.86. This study validates the accuracy of a panel of proteins and an autoantibody in a population relevant to lung cancer detection and suggests a benefit to combining clinical features with the biomarker results.
Gerbasi, David; Shapiro, Moshe; Brumer, Paul
2006-02-21
Enantiomeric control of 1,3 dimethylallene in a collisional environment is examined. Specifically, our previous "laser distillation" scenario wherein three perpendicular linearly polarized light fields are applied to excite a set of vib-rotational eigenstates of a randomly oriented sample is considered. The addition of internal conversion, dissociation, decoherence, and collisional relaxation mimics experimental conditions and molecular decay processes. Of greatest relevance is internal conversion which, in the case of dimethylallene, is followed by molecular dissociation. For various rates of internal conversion, enantiomeric control is maintained in this scenario by a delicate balance between collisional relaxation of excited dimethylallene that enhances control and collisional dephasing, which diminishes control.
NASA Technical Reports Server (NTRS)
Peters, C. (Principal Investigator)
1980-01-01
A general theorem is given which establishes the existence and uniqueness of a consistent solution of the likelihood equations given a sequence of independent random vectors whose distributions are not identical but have the same parameter set. In addition, it is shown that the consistent solution is a MLE and that it is asymptotically normal and efficient. Two applications are discussed: one in which independent observations of a normal random vector have missing components, and the other in which the parameters in a mixture from an exponential family are estimated using independent homogeneous sample blocks of different sizes.
NASA Astrophysics Data System (ADS)
Frantz, J. A.; Selby, J.; Busse, L. E.; Shaw, L. B.; Aggarwal, I. D.; Sanghera, J. S.
2018-02-01
Both ordered and random anti-reflective surface structures (ARSS) have been shown to increase the transmission of an optical surface to >99.9%. These structures are of great interest as an alternative to traditional thin film anti-reflection (AR) coatings for a variety of reasons. Unlike traditional AR coatings, they are patterned directly into the surface of an optic rather than deposited on its surface and are thus not prone to the delamination under thermal cycling that can occur with thin film coatings. Their laser-induced damage thresholds can also be considerably higher. In addition, they provide AR performance over a larger spectral and angular range. It has been previously demonstrated that random ARSSs in silica are remarkably insensitive to incident polarization, with nearly zero variation in transmittance with respect to polarization of the incident beam at fixed wavelength for angles of incidence up to at least 30°. In this work, we evaluate polarization sensitivity of ARSS as a function of wavelength for both random and ordered ARSS. We demonstrate that ordered ARSS is significantly more sensitive to polarization than random ARSS and explain the reason for this difference. In the case of ordered ARSS, we observe significant differences as a function of wavelength, with the transmittance of s- and p-polarized light diverging near the diffraction edge. We present results for both silica and spinel samples and discuss differences observed for these two sets of samples.
Bias, Confounding, and Interaction: Lions and Tigers, and Bears, Oh My!
Vetter, Thomas R; Mascha, Edward J
2017-09-01
Epidemiologists seek to make a valid inference about the causal effect between an exposure and a disease in a specific population, using representative sample data from a specific population. Clinical researchers likewise seek to make a valid inference about the association between an intervention and outcome(s) in a specific population, based upon their randomly collected, representative sample data. Both do so by using the available data about the sample variable to make a valid estimate about its corresponding or underlying, but unknown, population parameter. Random error in an experiment can be due to the natural, periodic fluctuation or variation in the accuracy or precision of virtually any data sampling technique or health measurement tool or scale. In a clinical research study, random error can be due to not only innate human variability but also purely chance. Systematic error in an experiment arises from an innate flaw in the data sampling technique or measurement instrument. In the clinical research setting, systematic error is more commonly referred to as systematic bias. The most commonly encountered types of bias in anesthesia, perioperative, critical care, and pain medicine research include recall bias, observational bias (Hawthorne effect), attrition bias, misclassification or informational bias, and selection bias. A confounding variable (confounding factor or confounder) is a factor associated with both the exposure of interest and the outcome of interest, i.e., one that correlates (positively or negatively) with both the exposure and the outcome. Confounding is typically not an issue in a randomized trial because the randomized groups are sufficiently balanced on all potential confounding variables, both observed and nonobserved. However, confounding can be a major problem with any observational (nonrandomized) study.
Ignoring confounding in an observational study will often result in a "distorted" or incorrect estimate of the association or treatment effect. Interaction among variables, also known as effect modification, exists when the effect of 1 explanatory variable on the outcome depends on the particular level or value of another explanatory variable. Bias and confounding are common potential explanations for statistically significant associations between exposure and outcome when the true relationship is noncausal. Understanding interactions is vital to proper interpretation of treatment effects. These complex concepts should be consistently and appropriately considered whenever one is not only designing but also analyzing and interpreting data from a randomized trial or observational study.
Using near infrared spectroscopy to classify soybean oil according to expiration date.
da Costa, Gean Bezerra; Fernandes, David Douglas Sousa; Gomes, Adriano A; de Almeida, Valber Elias; Veras, Germano
2016-04-01
A rapid and non-destructive methodology is proposed for screening edible vegetable oils according to conservation state (expiration date) employing near infrared (NIR) spectroscopy and chemometric tools. A total of fifty samples of soybean vegetable oil, of different brands and lots, were used in this study; these included thirty expired and twenty non-expired samples. The oil oxidation was measured by peroxide index. NIR spectra were employed in raw form and preprocessed by offset baseline correction and a Savitzky-Golay derivative procedure, followed by PCA exploratory analysis, which showed that NIR spectra would be suitable for the classification task of soybean oil samples. The classification models were based on SPA-LDA (Linear Discriminant Analysis coupled with the Successive Projection Algorithm) and PLS-DA (Discriminant Analysis by Partial Least Squares). The set of samples (50) was partitioned into two groups of training (35 samples: 15 non-expired and 20 expired) and test samples (15 samples: 5 non-expired and 10 expired) using three sample-selection approaches: (i) Kennard-Stone, (ii) Duplex, and (iii) Random, in order to evaluate the robustness of the models. The results obtained for the independent test set (in terms of correct classification rate) were 96% and 98% for SPA-LDA and PLS-DA, respectively, indicating that the NIR spectra can be used as an alternative to evaluate the degree of oxidation of soybean oil samples. Copyright © 2015 Elsevier Ltd. All rights reserved.
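The Kennard-Stone selection used above can be sketched as follows; the "spectra" here are random stand-ins for the NIR measurements, and the split sizes mirror the study's 35/15 partition.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Kennard-Stone sample selection: pick the two most distant samples
    first, then repeatedly add the sample farthest from the already-selected
    set (max-min distance criterion)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

rng = np.random.default_rng(0)
spectra = rng.normal(size=(50, 40))     # stand-in for 50 NIR spectra
train_idx = kennard_stone(spectra, 35)  # 35-sample training set, as above
```

Unlike a purely random split, this deterministic max-min rule spreads the training samples over the whole spectral space, which is why it is a common benchmark against Duplex and Random selection.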
Gray, B.R.; Haro, R.J.; Rogala, J.T.; Sauer, J.S.
2005-01-01
1. Macroinvertebrate count data often exhibit nested or hierarchical structure. Examples include multiple measurements along each of a set of streams, and multiple synoptic measurements from each of a set of ponds. With data exhibiting hierarchical structure, outcomes at both sampling (e.g. within-stream) and aggregated (e.g. stream) scales are often of interest. Unfortunately, methods for modelling hierarchical count data have received little attention in the ecological literature. 2. We demonstrate the use of hierarchical count models using fingernail clam (Family: Sphaeriidae) count data and habitat predictors derived from sampling and aggregated spatial scales. The sampling scale corresponded to that of a standard Ponar grab (0.052 m(2)) and the aggregated scale to impounded and backwater regions within 38-197 km reaches of the Upper Mississippi River. Impounded and backwater regions were resampled annually for 10 years. Consequently, measurements on clams were nested within years. Counts were treated as negative binomial random variates, and means from each resampling event as random departures from the impounded and backwater region grand means. 3. Clam models were improved by the addition of covariates that varied at both the sampling and regional scales. Substrate composition varied at the sampling scale and was associated with model improvements, and reductions (for a given mean) in variance at the sampling scale. Inorganic suspended solids (ISS) levels, measured in the summer preceding sampling, also yielded model improvements and were associated with reductions in variances at the regional rather than sampling scales. ISS levels were negatively associated with mean clam counts. 4. Hierarchical models allow hierarchically structured data to be modelled without ignoring information specific to levels of the hierarchy. In addition, information at each hierarchical level may be modelled as functions of covariates that themselves vary by and within levels.
As a result, hierarchical models provide researchers and resource managers with a method for modelling hierarchical data that explicitly recognises both the sampling design and the information contained in the corresponding data.
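The count model described in point 2 can be sketched as a simulation: annual means are random departures (on the log scale) from a regional grand mean, and within-year counts are negative binomial, generated here as a gamma-Poisson mixture. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical set-up: 10 years of resampling within one backwater region.
grand_log_mean = np.log(5.0)  # regional grand mean of ~5 clams per grab
year_sd = 0.4                 # between-year (resampling event) variation
size = 2.0                    # negative binomial dispersion parameter

counts = []
for _ in range(10):  # 10 annual resampling events nested in the region
    year_log_mean = rng.normal(grand_log_mean, year_sd)  # random departure
    mu = np.exp(year_log_mean)
    # NB counts as a gamma-Poisson mixture: lambda ~ Gamma(size, mu/size)
    lam = rng.gamma(shape=size, scale=mu / size, size=30)  # 30 grabs/year
    counts.append(rng.poisson(lam))
counts = np.array(counts)  # shape (years, grabs): two variance scales
```

Fitting such a model recovers variance components at both the grab (sampling) and year/region (aggregated) scales, which is exactly the information a non-hierarchical model would pool away.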
Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael; ...
2016-12-01
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.
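The greedy selection over planar graphs requires a planarity test at each step. As a simpler, well-known analogue over the tractable family of trees (Chow-Liu style), one can greedily add the strongest pairwise correlation that does not close a cycle; this sketch shows only that tree case, and the paper's method would additionally accept an edge only if the graph stays planar.

```python
import numpy as np

def max_weight_spanning_tree(corr):
    """Kruskal-style greedy selection: repeatedly add the edge with the
    strongest absolute pairwise correlation that does not create a cycle
    (the tree analogue of greedy planar model selection)."""
    n = corr.shape[0]
    parent = list(range(n))
    def find(a):  # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    edges = sorted(((abs(corr[i, j]), i, j)
                    for i in range(n) for j in range(i)), reverse=True)
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:       # accepting this edge keeps the graph acyclic
            parent[ri] = rj
            tree.append((i, j))
    return tree

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 6))          # hypothetical sample data, 6 variables
corr = np.corrcoef(X, rowvar=False)    # all pairwise correlations
tree = max_weight_spanning_tree(corr)  # n - 1 = 5 selected edges
```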
Asteroid orbital inversion using uniform phase-space sampling
NASA Astrophysics Data System (ADS)
Muinonen, K.; Pentikäinen, H.; Granvik, M.; Oszkiewicz, D.; Virtanen, J.
2014-07-01
We review statistical inverse methods for asteroid orbit computation from a small number of astrometric observations and short time intervals of observations. With the help of Markov-chain Monte Carlo methods (MCMC), we present a novel inverse method that utilizes uniform sampling of the phase space for the orbital elements. The statistical orbital ranging method (Virtanen et al. 2001, Muinonen et al. 2001) set out to resolve the long-lasting challenges in the initial computation of orbits for asteroids. The ranging method starts from the selection of a pair of astrometric observations. Thereafter, the topocentric ranges and angular deviations in R.A. and Decl. are randomly sampled. The two Cartesian positions allow for the computation of orbital elements and, subsequently, the computation of ephemerides for the observation dates. Candidate orbital elements are included in the sample of accepted elements if the χ^2-value between the observed and computed observations is within a pre-defined threshold. The sample orbital elements obtain weights based on a certain debiasing procedure. When the weights are available, the full sample of orbital elements allows probabilistic assessments of, e.g., object classification and ephemeris computation as well as the computation of collision probabilities. The MCMC ranging method (Oszkiewicz et al. 2009; see also Granvik et al. 2009) replaces the original sampling algorithm described above with a proposal probability density function (p.d.f.), and a chain of sample orbital elements results in the phase space. MCMC ranging is based on a bivariate Gaussian p.d.f. for the topocentric ranges, and allows for the sampling to focus on the phase-space domain with most of the probability mass. In the virtual-observation MCMC method (Muinonen et al. 2012), the proposal p.d.f. for the orbital elements is chosen to mimic the a posteriori p.d.f.
for the elements: first, random errors are simulated for each observation, resulting in a set of virtual observations; second, corresponding virtual least-squares orbital elements are derived using the Nelder-Mead downhill simplex method; third, repeating the procedure twice allows computation of the difference between two sets of virtual orbital elements; and, fourth, this orbital-element difference constitutes a symmetric proposal in a random-walk Metropolis-Hastings algorithm, avoiding the explicit computation of the proposal p.d.f. In a discrete approximation, the allowed proposals coincide with the differences that are based on a large number of pre-computed sets of virtual least-squares orbital elements. The virtual-observation MCMC method is thus based on the characterization of the relevant volume in the orbital-element phase space. Here we utilize MCMC to map the phase-space domain of acceptable solutions. We can make use of the proposal p.d.f.s from the MCMC ranging and virtual-observation methods. The present phase-space mapping produces, upon convergence, a uniform sampling of the solution space within a pre-defined χ^2-value. The weights of the sampled orbital elements are then computed on the basis of the corresponding χ^2-values. The present method resembles the original ranging method. On one hand, MCMC mapping is insensitive to local extrema in the phase space and efficiently maps the solution space. This is somewhat contrary to the MCMC methods described above. On the other hand, MCMC mapping can suffer from producing a small number of sample elements with small χ^2-values, in resemblance to the original ranging method. We apply the methods to example near-Earth, main-belt, and transneptunian objects, and highlight the utilization of the methods in the data processing and analysis pipeline of the ESA Gaia space mission.
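The uniform phase-space mapping can be sketched with a toy two-parameter misfit function: a random-walk proposal is accepted whenever the χ² value stays below the pre-defined threshold, so upon convergence the chain samples a uniform density over the acceptable region. The misfit surface, threshold, and step size below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)

def chi2(theta):
    """Hypothetical stand-in misfit surface over two 'orbital elements'."""
    x, y = theta
    return (x - 1.0) ** 2 / 0.1 + (y + 0.5) ** 2 / 0.2

def mcmc_map(n_steps=5000, threshold=11.8, step=0.1):
    """Random-walk Metropolis over the acceptable region: a proposal is
    accepted iff chi^2 stays below the threshold, i.e. the target density
    is uniform on {theta : chi2(theta) < threshold}."""
    theta = np.array([1.0, -0.5])  # start at the best-fit point
    samples = []
    for _ in range(n_steps):
        prop = theta + rng.normal(0.0, step, size=2)
        if chi2(prop) < threshold:  # symmetric proposal: accept/reject only
            theta = prop
        samples.append(theta.copy())
    return np.array(samples)

samples = mcmc_map()
```

Because acceptance depends only on the threshold and not on the relative χ² values, the chain spreads over the whole solution volume instead of concentrating near the minimum, which is the "mapping" behaviour described above.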
NASA Astrophysics Data System (ADS)
Forkert, Nils Daniel; Fiehler, Jens
2015-03-01
The tissue outcome prediction in acute ischemic stroke patients is highly relevant for clinical and research purposes. It has been shown that the combined analysis of diffusion and perfusion MRI datasets using high-level machine learning techniques leads to an improved prediction of final infarction compared to single perfusion parameter thresholding. However, most high-level classifiers require previous training and, until now, it has remained unclear how many subjects are required for this; that question is the focus of this work. 23 MRI datasets of acute stroke patients with known tissue outcome were used in this work. Relative values of diffusion and perfusion parameters as well as the binary tissue outcome were extracted on a voxel-by-voxel level for all patients and used for training of a random forest classifier. The number of patients used for training set definition was iteratively and randomly reduced from using all 22 other patients to only one other patient. Thus, 22 tissue outcome predictions were generated for each patient using the trained random forest classifiers and compared to the known tissue outcome using the Dice coefficient. Overall, a logarithmic relation between the number of patients used for training set definition and tissue outcome prediction accuracy was found. Quantitatively, a mean Dice coefficient of 0.45 was found for the prediction using the training set consisting of the voxel information from only one other patient, which increases to 0.53 if using all other patients (n=22). Based on extrapolation, 50-100 patients appear to be a reasonable tradeoff between tissue outcome prediction accuracy and effort required for data acquisition and preparation.
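The training-size experiment can be imitated with synthetic data. Since reproducing a random forest is beside the point here, a nearest-centroid classifier stands in for it (a deliberate simplification, not the authors' method); the voxel features, outcome rule, and patient count are all made up.

```python
import numpy as np

rng = np.random.default_rng(2)

def dice(pred, truth):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

def make_patient(n_vox=300):
    """Hypothetical per-patient voxel features (2 parameters) and outcome."""
    X = rng.normal(size=(n_vox, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.8, n_vox)) > 0
    return X, y

patients = [make_patient() for _ in range(23)]
X_test, y_test = patients[0]  # leave one patient out for evaluation

# Pool voxels from a growing number of training patients, as in the study.
for n_train in (1, 5, 22):
    X = np.vstack([p[0] for p in patients[1:1 + n_train]])
    y = np.concatenate([p[1] for p in patients[1:1 + n_train]])
    c1, c0 = X[y].mean(axis=0), X[~y].mean(axis=0)  # class centroids
    pred = (np.linalg.norm(X_test - c1, axis=1)
            < np.linalg.norm(X_test - c0, axis=1))
    score = dice(pred, y_test)  # tends to rise with n_train, then saturate
```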
Zhang, Hong-Guang; Yang, Qin-Min; Lu, Jian-Gang
2014-04-01
In this paper, a novel discriminant methodology based on near infrared spectroscopic analysis technique and least square support vector machine was proposed for rapid and nondestructive discrimination of different types of Polyacrylamide. The diffuse reflectance spectra of samples of Non-ionic Polyacrylamide, Anionic Polyacrylamide and Cationic Polyacrylamide were measured. Then principal component analysis method was applied to reduce the dimension of the spectral data and extract of the principal compnents. The first three principal components were used for cluster analysis of the three different types of Polyacrylamide. Then those principal components were also used as inputs of least square support vector machine model. The optimization of the parameters and the number of principal components used as inputs of least square support vector machine model was performed through cross validation based on grid search. 60 samples of each type of Polyacrylamide were collected. Thus a total of 180 samples were obtained. 135 samples, 45 samples for each type of Polyacrylamide, were randomly split into a training set to build calibration model and the rest 45 samples were used as test set to evaluate the performance of the developed model. In addition, 5 Cationic Polyacrylamide samples and 5 Anionic Polyacrylamide samples adulterated with different proportion of Non-ionic Polyacrylamide were also prepared to show the feasibilty of the proposed method to discriminate the adulterated Polyacrylamide samples. The prediction error threshold for each type of Polyacrylamide was determined by F statistical significance test method based on the prediction error of the training set of corresponding type of Polyacrylamide in cross validation. The discrimination accuracy of the built model was 100% for prediction of the test set. The prediction of the model for the 10 mixing samples was also presented, and all mixing samples were accurately discriminated as adulterated samples. 
The overall results demonstrate that the proposed discrimination method can rapidly and nondestructively discriminate both the different types of Polyacrylamide and adulterated Polyacrylamide samples, and offers a new approach to discriminating the types of Polyacrylamide.
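As an illustrative sketch of this workflow (PCA for dimension reduction followed by a grid-searched kernel classifier chosen by cross validation), the following uses synthetic spectra and scikit-learn's SVC as a stand-in for LS-SVM, which scikit-learn does not provide; all data and parameter values are invented for the example:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the NIR spectra: 3 classes x 60 samples, 100 "wavelengths".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(60, 100)) for c in (0.0, 1.5, 3.0)])
y = np.repeat([0, 1, 2], 60)

# 135-sample training set (45 per class), 45-sample test set, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=135, stratify=y, random_state=0)

# PCA for dimension reduction, then an SVM; the number of components and the
# SVM parameters are chosen jointly by grid-search cross validation.
pipe = Pipeline([("pca", PCA()), ("svm", SVC(kernel="rbf"))])
grid = {"pca__n_components": [3, 5, 10], "svm__C": [1, 10, 100]}
model = GridSearchCV(pipe, grid, cv=5).fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

On synthetic data this well-separated the classes are classified nearly perfectly; a real LS-SVM would replace the SVC step.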
Signaling protein signature predicts clinical outcome of non-small-cell lung cancer.
Jin, Bao-Feng; Yang, Fan; Ying, Xiao-Min; Gong, Lin; Hu, Shuo-Feng; Zhao, Qing; Liao, Yi-Da; Chen, Ke-Zhong; Li, Teng; Tai, Yan-Hong; Cao, Yuan; Li, Xiao; Huang, Yan; Zhan, Xiao-Yan; Qin, Xuan-He; Wu, Jin; Chen, Shuai; Guo, Sai-Sai; Zhang, Yu-Cheng; Chen, Jing; Shen, Dan-Hua; Sun, Kun-Kun; Chen, Lu; Li, Wei-Hua; Li, Ai-Ling; Wang, Na; Xia, Qing; Wang, Jun; Zhou, Tao
2018-03-06
Non-small-cell lung cancer (NSCLC) is characterized by abnormalities of numerous signaling proteins that play pivotal roles in cancer development and progression. Many of these proteins have been reported to be correlated with clinical outcomes of NSCLC. However, none of them could provide adequate accuracy of prognosis prediction in clinical application. A total of 384 resected NSCLC specimens from two hospitals in Beijing (BJ) and Chongqing (CQ) were collected. Using immunohistochemistry (IHC) staining on stored formalin-fixed paraffin-embedded (FFPE) surgical samples, we examined the expression levels of 75 critical proteins on BJ samples. Random forest algorithm (RFA) and support vector machines (SVM) computation were applied to identify protein signatures on 2/3 randomly assigned BJ samples. The identified signatures were tested on the remaining BJ samples, and were further validated with CQ independent cohort. A 6-protein signature for adenocarcinoma (ADC) and a 5-protein signature for squamous cell carcinoma (SCC) were identified from training sets and tested in testing sets. In independent validation with CQ cohort, patients can also be divided into high- and low-risk groups with significantly different median overall survivals by Kaplan-Meier analysis, both in ADC (31 months vs. 87 months, HR 2.81; P < 0.001) and SCC patients (27 months vs. not reached, HR 9.97; P < 0.001). Cox regression analysis showed that both signatures are independent prognostic indicators and outperformed TNM staging (ADC: adjusted HR 3.07 vs. 2.43, SCC: adjusted HR 7.84 vs. 2.24). Particularly, we found that only the ADC patients in high-risk group significantly benefited from adjuvant chemotherapy (P = 0.018). Both ADC and SCC protein signatures could effectively stratify the prognosis of NSCLC patients, and may support patient selection for adjuvant chemotherapy.
Enhanced cognitive behavioral therapy for eating disorders adapted for a group setting.
Wade, Stephanie; Byrne, Sue; Allen, Karina
2017-08-01
This randomized controlled trial evaluated the effectiveness of enhanced cognitive behavioral treatment (CBT-E) for eating disorders adapted for a group setting. The study aimed to examine the effects of group CBT-E on eating disorder psychopathology and additional maintaining pathology. A transdiagnostic sample of individuals with eating disorders with a BMI ≥ 18 kg/m² (N = 40) were randomized to an immediate-start or delayed-start condition so as to compare therapeutic effects of group CBT-E with a waitlist control. Global Eating Disorder Examination Questionnaire (EDE-Q) scores, BMI, and measures of Clinical Perfectionism, Self-Esteem, Interpersonal Difficulties, and Mood Intolerance were measured across the 8-week control period, throughout the group treatment and at 3-months post-treatment. Over 70% of those who entered the trial completed treatment. The first eight weeks of group CBT-E were more effective at reducing Global EDE-Q scores than no treatment (waitlist control). By post-treatment, good outcome (a Global EDE-Q within 1 SD of Australian community norms plus BMI ≥ 18.5) was achieved by 67.9% of treatment completers and 66.7% of the total sample. Symptom abstinence within the previous month was reported by 14.3% of treatment completers and 10.3% of the total sample. Significant reductions in Clinical Perfectionism, Self-Esteem, Interpersonal Difficulties, and Mood Intolerance were also observed. This study demonstrated that a group version of CBT-E can be effective at reducing eating disorder psychopathology in a transdiagnostic sample of individuals with eating disorders. Group CBT-E could provide a means of increasing availability of evidence-based treatment for eating disorders. © 2017 Wiley Periodicals, Inc.
Convenience samples and caregiving research: how generalizable are the findings?
Pruchno, Rachel A; Brill, Jonathan E; Shands, Yvonne; Gordon, Judith R; Genderson, Maureen Wilson; Rose, Miriam; Cartwright, Francine
2008-12-01
We contrast characteristics of respondents recruited using convenience strategies with those of respondents recruited by random digit dial (RDD) methods. We compare sample variances, means, and interrelationships among variables generated from the convenience and RDD samples. Women aged 50 to 64 who work full time and provide care to a community-dwelling older person were recruited using either RDD (N = 55) or convenience methods (N = 87). Telephone interviews were conducted using reliable, valid measures of demographics, characteristics of the care recipient, help provided to the care recipient, evaluations of caregiver-care recipient relationship, and outcomes common to caregiving research. Convenience and RDD samples had similar variances on 68.4% of the examined variables. We found significant mean differences for 63% of the variables examined. Bivariate correlations suggest that one would reach different conclusions using the convenience and RDD sample data sets. Researchers should use convenience samples cautiously, as they may have limited generalizability.
An embedded system for face classification in infrared video using sparse representation
NASA Astrophysics Data System (ADS)
Saavedra M., Antonio; Pezoa, Jorge E.; Zarkesh-Ha, Payman; Figueroa, Miguel
2017-09-01
We propose a platform for robust face recognition in Infrared (IR) images using Compressive Sensing (CS). In line with CS theory, the classification problem is solved using a sparse representation framework, where test images are modeled by means of a linear combination of the training set. Because the training set constitutes an over-complete dictionary, we identify new images by finding their sparsest representation based on the training set, using standard l1-minimization algorithms. Unlike conventional face-recognition algorithms, feature extraction is performed using random projections with a precomputed binary matrix, as proposed in the CS literature. This random sampling reduces the effects of noise and occlusions such as facial hair, eyeglasses, and disguises, which are notoriously challenging in IR images. Thus, the performance of our framework is robust to these noise and occlusion factors, achieving an average accuracy of approximately 90% when the UCHThermalFace database is used for training and testing purposes. We implemented our framework on a high-performance embedded digital system, where the computation of the sparse representation of IR images was performed by dedicated hardware using a deeply pipelined architecture on a Field-Programmable Gate Array (FPGA).
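The sparse-representation classification step can be sketched as follows, on toy data rather than the UCHThermalFace database; the random binary projection and the l1 fit (here via scikit-learn's Lasso as a stand-in for a dedicated l1-minimization solver) follow the scheme described above, and all sizes are invented for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Toy "gallery": 3 subjects, 8 training images each, 256-pixel images.
n_sub, n_per, d = 3, 8, 256
centers = rng.normal(size=(n_sub, d))
train = np.vstack([c + 0.1 * rng.normal(size=(n_per, d)) for c in centers])
labels = np.repeat(np.arange(n_sub), n_per)

# CS-style feature extraction: a fixed random binary projection to m << d features.
m = 32
Phi = rng.integers(0, 2, size=(m, d)).astype(float)
A = Phi @ train.T                        # m x N over-complete dictionary
A /= np.linalg.norm(A, axis=0)           # column-normalize

def classify(img):
    """Sparse representation classification: l1-regularized fit, then
    assign the class whose training columns give the smallest residual."""
    yv = Phi @ img
    yv = yv / np.linalg.norm(yv)
    coef = Lasso(alpha=1e-3, max_iter=10000).fit(A, yv).coef_
    resid = [np.linalg.norm(yv - A[:, labels == k] @ coef[labels == k])
             for k in range(n_sub)]
    return int(np.argmin(resid))

test_img = centers[2] + 0.1 * rng.normal(size=d)
pred = classify(test_img)
```

The FPGA implementation in the paper replaces the l1 solver with a pipelined hardware equivalent; the classification rule is the same.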
Diffusion Processes Satisfying a Conservation Law Constraint
Bakosi, J.; Ristorcelli, J. R.
2014-03-04
We investigate coupled stochastic differential equations governing N non-negative continuous random variables that satisfy a conservation principle. In various fields a conservation law requires that a set of fluctuating variables be non-negative and (if appropriately normalized) sum to one. As a result, any stochastic differential equation model to be realizable must not produce events outside of the allowed sample space. We develop a set of constraints on the drift and diffusion terms of such stochastic models to ensure that both the non-negativity and the unit-sum conservation law constraint are satisfied as the variables evolve in time. We investigate the consequences of the developed constraints on the Fokker-Planck equation, the associated system of stochastic differential equations, and the evolution equations of the first four moments of the probability density function. We show that random variables, satisfying a conservation law constraint, represented by stochastic diffusion processes, must have diffusion terms that are coupled and nonlinear. The set of constraints developed enables the development of statistical representations of fluctuating variables satisfying a conservation law. We exemplify the results with the bivariate beta process and the multivariate Wright-Fisher, Dirichlet, and Lochner’s generalized Dirichlet processes.
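A minimal numerical illustration of the constraint, using the one-dimensional neutral Wright-Fisher diffusion named above: its diffusion coefficient sqrt(x(1-x)) vanishes at both boundaries, so the pair (x, 1-x) stays non-negative and sums to one. Step sizes and counts are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-D neutral Wright-Fisher diffusion: dx = sqrt(x(1-x)) dW.
# The diffusion term vanishing at x=0 and x=1 is the kind of boundary
# behavior the paper's realizability constraints require.
def simulate(x0=0.5, dt=1e-4, steps=5000):
    x = x0
    for _ in range(steps):
        x += np.sqrt(max(x * (1.0 - x), 0.0)) * np.sqrt(dt) * rng.normal()
        x = min(max(x, 0.0), 1.0)  # Euler-Maruyama can overshoot; clip the discretization artifact
    return x

xs = np.array([simulate() for _ in range(20)])
pair_sums = xs + (1.0 - xs)   # unit-sum constraint holds by construction
```

The multivariate Wright-Fisher and Dirichlet cases generalize this with coupled, nonlinear diffusion matrices, exactly as the abstract states.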
Decorrelation of the true and estimated classifier errors in high-dimensional settings.
Hanczar, Blaise; Hua, Jianping; Dougherty, Edward R
2007-01-01
The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity, which refers to the precision of error estimation, is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three commonly used error estimators (leave-one-out cross-validation, k-fold cross-validation, and .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes.
We will observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features, with the better correlation between the latter two showing no general trend, but differing for different models.
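The correlation between true and estimated errors can be probed with a small simulation in the spirit of the study: repeatedly draw a small Gaussian training sample, record the cross-validation error estimate, and approximate the true error on a large independent test set. All settings here (LDA, 5-fold CV, sample sizes) are illustrative choices, not the paper's:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

def one_experiment(n=40, d=5, delta=1.0):
    # Two Gaussian classes; small training sample, as in the paper's setting.
    X0 = rng.normal(0.0, 1.0, size=(n // 2, d))
    X1 = rng.normal(delta / np.sqrt(d), 1.0, size=(n // 2, d))
    X = np.vstack([X0, X1]); y = np.repeat([0, 1], n // 2)
    clf = LinearDiscriminantAnalysis().fit(X, y)
    est_err = 1.0 - cross_val_score(clf, X, y, cv=5).mean()   # CV estimate
    # "True" error approximated on a large independent test set.
    T0 = rng.normal(0.0, 1.0, size=(5000, d))
    T1 = rng.normal(delta / np.sqrt(d), 1.0, size=(5000, d))
    T = np.vstack([T0, T1]); t = np.repeat([0, 1], 5000)
    true_err = 1.0 - clf.score(T, t)
    return est_err, true_err

pairs = np.array([one_experiment() for _ in range(100)])
corr = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
```

The resulting correlation is typically modest; the article's point is that adding feature selection in high dimensions drives it down further.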
Combinatorial Statistics on Trees and Networks
2010-09-29
interaction graph is drawn from the Erdos-Renyi model, G(n,p), where each edge is present independently with probability p. For this model we establish a double...special interest is the behavior of Gibbs sampling on the Erdos-Renyi random graph G(n, d/n), where each edge is chosen independently with...which have no counterparts in the coloring setting. Our proof presented here exploits in novel ways the local treelike structure of Erdos-Renyi
California’s Shrinking Defense Contractors: Effects on Small Suppliers,
1996-01-01
They did not physically segregate any parts of their operations or set up a separate data management system to do business with prime contractors...of them out of business. * Small defense aerospace suppliers are not making cutting-edge products for commercial customers. Small defense aerospace...are considered critical to their primes’ supplier base are concerned. Most Firms Are Still in Business When we checked with a random sample of small
ERIC Educational Resources Information Center
Wertz, Richard D.; And Others
In an effort to elicit student attitudes concerning residence hall living on campus a questionnaire was designed and administered to a random sample of 1,100 resident students at the University of South Carolina. The survey instrument consisted of a set of sixteen statements that required an "is" and a "should be" response. The…
Have We Really Been Analyzing Terminating Simulations Incorrectly All These Years?
2013-12-01
TERMINATING SIMULATIONS INCORRECTLY ALL THESE YEARS? Paul J. Sánchez, Operations Research, Naval Postgraduate School, 1411 Cunningham Road, Monterey, CA...measure. If that observation directly represents an end state such as the number of failed components after a week’s operation, or the number of patients...processed in 24 hours of emergency room operations, there’s no problem—the set of values obtained by replication represent a random sample from the
Judd, Charles M; Westfall, Jacob; Kenny, David A
2012-07-01
Throughout social and cognitive psychology, participants are routinely asked to respond in some way to experimental stimuli that are thought to represent categories of theoretical interest. For instance, in measures of implicit attitudes, participants are primed with pictures of specific African American and White stimulus persons sampled in some way from possible stimuli that might have been used. Yet seldom is the sampling of stimuli taken into account in the analysis of the resulting data, in spite of numerous warnings about the perils of ignoring stimulus variation (Clark, 1973; Kenny, 1985; Wells & Windschitl, 1999). Part of this failure to attend to stimulus variation is due to the demands imposed by traditional analysis of variance procedures for the analysis of data when both participants and stimuli are treated as random factors. In this article, we present a comprehensive solution using mixed models for the analysis of data with crossed random factors (e.g., participants and stimuli). We show the substantial biases inherent in analyses that ignore one or the other of the random factors, and we illustrate the substantial advantages of the mixed models approach with both hypothetical and actual, well-known data sets in social psychology (Bem, 2011; Blair, Chapleau, & Judd, 2005; Correll, Park, Judd, & Wittenbrink, 2002). PsycINFO Database Record (c) 2012 APA, all rights reserved
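A toy simulation of a crossed participants-by-stimuli design shows the data structure at issue. The naive by-participant analysis below treats stimuli as fixed, which is the practice the article warns against; a mixed model would add a stimulus random effect. All variances are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Crossed design: every participant responds to every stimulus.
n_part, n_stim = 30, 10
part_eff = rng.normal(0.0, 1.0, n_part)      # participant random effect
stim_eff = rng.normal(0.0, 1.0, n_stim)      # stimulus random effect (often ignored)
cond = np.array([0, 1] * (n_stim // 2))      # stimuli assigned to 2 conditions
true_cond_effect = 0.0                       # null: no real condition difference
y = (part_eff[:, None] + stim_eff[None, :]
     + true_cond_effect * cond[None, :]
     + rng.normal(0.0, 0.5, (n_part, n_stim)))

# Naive "by-participant" analysis: average over stimuli within condition, then
# a paired t-test. Stimulus variance leaks into the apparent condition effect,
# inflating Type I error across stimulus samples.
m0 = y[:, cond == 0].mean(axis=1)
m1 = y[:, cond == 1].mean(axis=1)
t_naive, p_naive = stats.ttest_rel(m1, m0)

# The by-stimulus means show the variability the mixed model accounts for.
stim_means = y.mean(axis=0)
```

Repeating this with fresh stimulus samples shows p_naive rejecting far more often than its nominal rate, which is the bias the mixed-models approach removes.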
Nonuniform sampling theorems for random signals in the linear canonical transform domain
NASA Astrophysics Data System (ADS)
Shuiqing, Xu; Congmei, Jiang; Yi, Chai; Youqiang, Hu; Lei, Huang
2018-06-01
Nonuniform sampling can be encountered in various practical processes because of random events or poor timebase. The analysis and applications of nonuniform sampling for deterministic signals related to the linear canonical transform (LCT) have been well considered and researched, but up to now no papers have been published regarding the various nonuniform sampling theorems for random signals related to the LCT. The aim of this article is to explore the nonuniform sampling and reconstruction of random signals associated with the LCT. First, some special nonuniform sampling models are briefly introduced. Second, based on these models, some reconstruction theorems for random signals from various nonuniform samples associated with the LCT are derived. Finally, simulation results are presented to verify the accuracy of the sampling theorems. Potential practical applications of nonuniform sampling for random signals are also discussed.
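As a classical Fourier-domain special case of such results, a bandlimited periodic signal can be recovered from jittered (nonuniform) samples by least squares on a truncated harmonic basis. This sketch does not implement the LCT-domain theorems themselves; all signal parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)

T, K = 1.0, 5                                   # period and bandwidth (harmonics)
t_uni = np.linspace(0.0, T, 64, endpoint=False)
t_smp = t_uni + rng.uniform(-0.004, 0.004, t_uni.size)   # jittered sample times

def basis(t):
    # Truncated Fourier basis: constant plus K cosine/sine pairs.
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols += [np.cos(2 * np.pi * k * t / T), np.sin(2 * np.pi * k * t / T)]
    return np.column_stack(cols)

coef_true = rng.normal(size=2 * K + 1)
x_smp = basis(t_smp) @ coef_true                # observed nonuniform samples

# Least-squares recovery of the Fourier coefficients from nonuniform samples.
coef_hat = np.linalg.lstsq(basis(t_smp), x_smp, rcond=None)[0]
err = np.max(np.abs(coef_hat - coef_true))
```

With 64 samples and 11 unknowns the noise-free recovery is exact to machine precision; the LCT generalization replaces the Fourier basis with LCT kernels.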
Pierce, Brandon L; Ahsan, Habibul; Vanderweele, Tyler J
2011-06-01
Mendelian Randomization (MR) studies assess the causality of an exposure-disease association using genetic determinants [i.e. instrumental variables (IVs)] of the exposure. Power and IV strength requirements for MR studies using multiple genetic variants have not been explored. We simulated cohort data sets consisting of a normally distributed disease trait, a normally distributed exposure, which affects this trait and a biallelic genetic variant that affects the exposure. We estimated power to detect an effect of exposure on disease for varying allele frequencies, effect sizes and samples sizes (using two-stage least squares regression on 10,000 data sets-Stage 1 is a regression of exposure on the variant. Stage 2 is a regression of disease on the fitted exposure). Similar analyses were conducted using multiple genetic variants (5, 10, 20) as independent or combined IVs. We assessed IV strength using the first-stage F statistic. Simulations of realistic scenarios indicate that MR studies will require large (n > 1000), often very large (n > 10,000), sample sizes. In many cases, so-called 'weak IV' problems arise when using multiple variants as independent IVs (even with as few as five), resulting in biased effect estimates. Combining genetic factors into fewer IVs results in modest power decreases, but alleviates weak IV problems. Ideal methods for combining genetic factors depend upon knowledge of the genetic architecture underlying the exposure. The feasibility of well-powered, unbiased MR studies will depend upon the amount of variance in the exposure that can be explained by known genetic factors and the 'strength' of the IV set derived from these genetic factors.
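The two-stage least squares procedure described (Stage 1: exposure on variant; Stage 2: disease on fitted exposure) can be sketched with a single simulated cohort, including the first-stage F statistic used to gauge instrument strength. Effect sizes and the allele frequency are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate one MR cohort: biallelic variant G -> exposure X -> trait Y,
# with an unobserved confounder U affecting both X and Y.
n, maf, beta_gx, beta_xy = 5000, 0.3, 0.3, 0.5
G = rng.binomial(2, maf, n).astype(float)
U = rng.normal(size=n)
X = beta_gx * G + U + rng.normal(size=n)
Y = beta_xy * X + U + rng.normal(size=n)

def ols(design, resp):
    return np.linalg.lstsq(design, resp, rcond=None)[0]

# Stage 1: regress exposure on the variant; Stage 2: regress trait on fitted exposure.
D1 = np.column_stack([np.ones(n), G])
x_hat = D1 @ ols(D1, X)
D2 = np.column_stack([np.ones(n), x_hat])
beta_iv = ols(D2, Y)[1]          # IV estimate of the causal effect (true value 0.5)

# First-stage F statistic: the conventional instrument-strength diagnostic.
resid = X - x_hat
ss_res = resid @ resid
ss_tot = ((X - X.mean()) ** 2).sum()
F = ((ss_tot - ss_res) / 1) / (ss_res / (n - 2))
```

Despite the confounding, beta_iv is close to the true 0.5, while a naive regression of Y on X would be biased upward; with weak instruments (small F) the IV estimate itself becomes biased, which is the paper's central concern.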
Cost-effectiveness of Collaborative Care for Depression in Human Immunodeficiency Virus Clinics
Fortney, John C; Gifford, Allen L; Rimland, David; Monson, Thomas; Rodriguez-Barradas, Maria C.; Pyne, Jeffrey M
2015-01-01
Objective To examine the cost-effectiveness of the HITIDES intervention. Design Randomized controlled effectiveness and implementation trial comparing depression collaborative care with enhanced usual care. Setting Three Veterans Health Administration (VHA) HIV clinics in the Southern US. Subjects 249 HIV-infected patients completed the baseline interview; 123 were randomized to the intervention and 126 to usual care. Intervention HITIDES consisted of an off-site HIV depression care team that delivered up to 12 months of collaborative care. The intervention used a stepped-care model for depression treatment and specific recommendations were based on the Texas Medication Algorithm Project and the VA/Department of Defense Depression Treatment Guidelines. Main outcome measure(s) Quality-adjusted life years (QALYs) were calculated using the 12-Item Short Form Health Survey, the Quality of Well Being Scale, and by converting depression-free days to QALYs. The base case analysis used outpatient, pharmacy, patient, and intervention costs. Cost-effectiveness was calculated using incremental cost effectiveness ratios (ICERs) and net health benefit (NHB). ICER distributions were generated using nonparametric bootstrap with replacement sampling. Results The HITIDES intervention was more effective and cost-saving compared to usual care in 78% of bootstrapped samples. The intervention NHB was positive and therefore deemed cost-effective using an ICER threshold of $50,000/QALY. Conclusions In HIV clinic settings this intervention was more effective and cost-saving compared to usual care. Implementation of off-site depression collaborative care programs in specialty care settings may be a strategy that not only improves outcomes for patients, but also maximizes the efficient use of limited healthcare resources. PMID:26102447
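The nonparametric bootstrap step used to generate the ICER distribution can be sketched as follows; the cost and QALY distributions are entirely hypothetical, not the trial's data:

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical per-patient costs and QALYs: intervention vs usual care.
n = 120
cost_i = rng.normal(9000, 1500, n);  qaly_i = rng.normal(0.70, 0.10, n)
cost_u = rng.normal(9500, 1500, n);  qaly_u = rng.normal(0.66, 0.10, n)

# Nonparametric bootstrap with replacement, resampling each arm.
B = 2000
d_cost = np.empty(B); d_qaly = np.empty(B)
for b in range(B):
    ii = rng.integers(0, n, n); iu = rng.integers(0, n, n)
    d_cost[b] = cost_i[ii].mean() - cost_u[iu].mean()
    d_qaly[b] = qaly_i[ii].mean() - qaly_u[iu].mean()

# Fraction of bootstrap samples where the intervention is both cheaper and
# more effective, analogous to the 78% "cost-saving and effective" figure.
frac_dominant = np.mean((d_cost < 0) & (d_qaly > 0))
```

Plotting d_cost against d_qaly gives the usual cost-effectiveness plane; net health benefit is computed from the same bootstrap draws at a chosen ICER threshold.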
Gilbert, Peter B; Yu, Xuesong; Rotnitzky, Andrea
2014-03-15
To address the objective in a clinical trial to estimate the mean or mean difference of an expensive endpoint Y, one approach employs a two-phase sampling design, wherein inexpensive auxiliary variables W predictive of Y are measured in everyone, Y is measured in a random sample, and the semiparametric efficient estimator is applied. This approach is made efficient by specifying the phase two selection probabilities as optimal functions of the auxiliary variables and measurement costs. While this approach is familiar to survey samplers, it apparently has seldom been used in clinical trials, and several novel results practicable for clinical trials are developed. We perform simulations to identify settings where the optimal approach significantly improves efficiency compared to approaches in current practice. We provide proofs and R code. The optimality results are developed to design an HIV vaccine trial, with objective to compare the mean 'importance-weighted' breadth (Y) of the T-cell response between randomized vaccine groups. The trial collects an auxiliary response (W) highly predictive of Y and measures Y in the optimal subset. We show that the optimal design-estimation approach can confer anywhere between absent and large efficiency gain (up to 24% in the examples) compared to the approach with the same efficient estimator but simple random sampling, where greater variability in the cost-standardized conditional variance of Y given W yields greater efficiency gains. Accurate estimation of E[Y | W] is important for realizing the efficiency gain, which is aided by an ample phase two sample and by using a robust fitting method. Copyright © 2013 John Wiley & Sons, Ltd.
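A minimal sketch of the two-phase idea: measure cheap W on everyone, expensive Y on a random subsample, and estimate the mean of Y with a regression estimator (a simple stand-in for the semiparametric efficient estimator in the paper). When W is highly predictive of Y, the regression estimator is far more precise than the subsample mean; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

# Phase one: cheap auxiliary W on everyone; phase two: expensive Y on a subsample.
N, n2 = 2000, 200
W = rng.normal(size=N)
Y = 2.0 + 1.5 * W + rng.normal(scale=0.5, size=N)    # W is highly predictive of Y

idx = rng.choice(N, size=n2, replace=False)          # simple random phase-two sample

# Naive estimator: mean of Y over the phase-two sample only.
naive = Y[idx].mean()

# Regression estimator: fit E[Y|W] on the subsample, predict for everyone,
# and correct with the subsample residual mean (zero here by OLS).
b = np.polyfit(W[idx], Y[idx], 1)
pred = np.polyval(b, W)
reg_est = pred.mean() + (Y[idx] - pred[idx]).mean()
```

The paper goes further by also optimizing the phase-two selection probabilities as functions of W and measurement cost, rather than using simple random sampling as here.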
Hassanpour, Gholamreza; Mohebali, Mehdi; Raeisi, Ahmad; Abolghasemi, Hassan; Zeraati, Hojjat; Alipour, Mohsen; Azizi, Ebrahim; Keshavarz, Hossein
2011-06-01
The transmission of malaria by blood transfusion was one of the first transfusion-transmitted infections recorded in the world. Transfusion-transmitted malaria may lead to serious problems because infection with Plasmodium falciparum may be rapidly fatal. This study aimed to compare real-time polymerase chain reaction (real-time PCR) with rapid diagnostic test (RDT) and light microscopy for the detection of Plasmodium spp. in donated blood, in both endemic and non-endemic areas of malaria in Iran. Two sets of 50 blood samples were randomly collected. One set was taken from blood samples donated in the blood bank of Bandar Abbas, a city located in a malaria-endemic area, and the other set from Tehran, a non-endemic one. Light microscopic examination on both thin and thick smears, RDTs, and real-time PCR were performed on the blood samples and the results were compared. Thin and thick light microscopic examinations of all samples as well as RDT results were negative for Plasmodium spp. Two blood samples from the endemic area were positive only with real-time PCR. It seems that real-time PCR, as a highly sensitive method, can be helpful for the confirmation of malaria infection in different units of the blood transfusion organization, especially in malaria-endemic areas where donors may be potentially infected with malaria parasites.
Shallow ground-water quality beneath a major urban center: Denver, Colorado, USA
Bruce, B.W.; McMahon, P.B.
1996-01-01
A survey of the chemical quality of ground water in the unconsolidated alluvial aquifer beneath a major urban center (Denver, Colorado, USA) was performed in 1993 with the objective of characterizing the quality of shallow ground-water in the urban area and relating water quality to land use. Thirty randomly selected alluvial wells were each sampled once for a broad range of dissolved constituents. The urban land use at each well site was sub-classified into one of three land-use settings: residential, commercial, and industrial. Shallow ground-water quality was highly variable in the urban area and the variability could be related to these land-use setting classifications. Sulfate (SO4) was the predominant anion in most samples from the residential and commercial land-use settings, whereas bicarbonate (HCO3) was the predominant anion in samples from the industrial land-use setting, indicating a possible shift in redox conditions associated with land use. Only three of 30 samples had nitrate concentrations that exceeded the US national drinking-water standard of 10 mg l-1 as nitrogen, indicating that nitrate contamination of shallow ground water may not be a serious problem in this urban area. However, the highest median nitrate concentration (4.2 mg l-1) was in samples from the residential setting, where fertilizer application is assumed to be most intense. Twenty-seven of 30 samples had detectable pesticides and nine of 82 analyzed pesticide compounds were detected at low concentrations, indicating that pesticides are widely distributed in shallow ground water in this urban area. Although the highest median total pesticide concentration (0.17 µg l-1) was in the commercial setting, the herbicides prometon and atrazine were found in each land-use setting. Similarly, 25 of 29 samples analyzed had detectable volatile organic compounds (VOCs) indicating these compounds are also widely distributed in this urban area.
The total VOC concentrations in sampled wells ranged from nondetectable to 23,442 µg l-1. Widespread detections and occasionally high concentrations point to VOCs as the major anthropogenic ground-water impact in this urban environment. Generally, the highest VOC concentrations occurred in samples from the industrial setting. The most frequently detected VOC was the gasoline additive methyl tert-butyl ether (MTBE, in 23 of 29 wells). Results from this study indicate that the quality of shallow ground water in major urban areas can be related to land-use settings. Moreover, some VOCs and pesticides may be widely distributed at low concentrations in shallow ground water throughout major urban areas. As a result, the differentiation between point and non-point sources for these compounds in urban areas may be difficult.
An AUC-based permutation variable importance measure for random forests
Janitza, Silke; Strobl, Carolin; Boulesteix, Anne-Laure
2013-01-01
Background The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However the classification performance of RF is known to be suboptimal in case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. Results We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings while both permutation VIMs have equal performance for balanced data settings. Conclusions The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html. PMID:23560875
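The contrast between accuracy-based and AUC-based permutation importance can be sketched with scikit-learn on synthetic unbalanced data; note the article's reference implementation is in the R package party, so this is only an analogous computation, with all data parameters invented:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Unbalanced two-class problem: roughly 90% / 10% class sizes.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           weights=[0.9, 0.1], random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Standard (accuracy-based) vs AUC-based permutation importance: the only
# change is the scoring metric, mirroring the article's proposal that the
# AUC is less distorted by class imbalance than the error rate.
imp_acc = permutation_importance(rf, X, y, scoring="accuracy",
                                 n_repeats=10, random_state=0)
imp_auc = permutation_importance(rf, X, y, scoring="roc_auc",
                                 n_repeats=10, random_state=0)
```

Ranking predictors by `imp_auc.importances_mean` rather than `imp_acc.importances_mean` keeps informative predictors distinguishable as the imbalance grows, which is the behavior the simulations in the article demonstrate.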
Handa, Sudhanshu; Peterman, Amber; Seidenfeld, David; Tembo, Gelson
2016-02-01
There is promising recent evidence that poverty-targeted social cash transfers have potential to improve maternal health outcomes; however, questions remain surrounding design features responsible for impacts. In addition, virtually no evidence exists from the African region. This study explores the impact of Zambia's Child Grant Program on a range of maternal health utilization outcomes using a randomized design and difference-in-differences multivariate regression from data collected over 24 months from 2010 to 2012. Results indicate that while there are no measurable program impacts among the main sample, there are heterogeneous impacts on skilled attendance at birth among a sample of women residing in households having better access to maternal health services. The latter result is particularly interesting because of the overall low level of health care availability in program areas suggesting that dedicated program design or matching supply-side interventions may be necessary to leverage unconditional cash transfers in similar settings to impact maternal health. Copyright © 2015 John Wiley & Sons, Ltd.
Yoga in the schools: a systematic review of the literature.
Serwacki, Michelle L; Cook-Cottone, Catherine
2012-01-01
The objective of this research was to examine the evidence for delivering yoga-based interventions in schools. An electronic literature search was conducted to identify peer-reviewed, published studies in which yoga and a meditative component (breathing practices or meditation) were taught to youths in a school setting. Pilot studies, single-cohort, quasi-experimental, and randomized clinical trials were considered. Study quality was evaluated and summarized. Twelve published studies were identified. Samples for which yoga was implemented as an intervention included youths with autism, intellectual disability, learning disability, and emotional disturbance, as well as typically developing youths. Although the effects of participating in school-based yoga programs appeared to be beneficial for the most part, methodological limitations, including lack of randomization, small samples, limited detail regarding the intervention, and statistical ambiguities, curtailed the ability to provide definitive conclusions or recommendations. Findings speak to the need for greater methodological rigor and an increased understanding of the mechanisms of success for school-based yoga interventions.
Comparison of Oral Reading Errors between Contextual Sentences and Random Words among Schoolchildren
ERIC Educational Resources Information Center
Khalid, Nursyairah Mohd; Buari, Noor Halilah; Chen, Ai-Hong
2017-01-01
This paper compares the oral reading errors between contextual sentences and random words among schoolchildren. Two sets of reading materials were developed to test oral reading errors in 30 schoolchildren (10.00±1.44 years). Set A comprised contextual sentences, while Set B comprised random words. The schoolchildren were asked to…
Wilson, Nick; Edwards, Richard; Parry, Rhys
2011-03-04
To assess the need for additional smokefree settings, by measuring secondhand smoke (SHS) in a range of public places in an urban setting. Measurements were made in Wellington City during the 6-year period after the implementation of legislation that made indoor areas of restaurants and bars/pubs smokefree in December 2004, and up to 20 years after the 1990 legislation making most indoor workplaces smokefree. Fine particulate levels (PM2.5) were measured with a portable real-time airborne particle monitor. We collated data from our previously published work involving random sampling, purposeful sampling and convenience sampling of a wide range of settings (in 2006) and from additional sampling of selected indoor and outdoor areas (in 2007-2008 and 2010). The "outdoor" smoking areas of hospitality venues had the highest particulate levels, with a mean value of 72 mcg/m3 (range of maximum values 51-284 mcg/m3) (n=20 sampling periods). These levels are likely to create health hazards for some workers and patrons (i.e., when considered in relation to the WHO air quality guidelines). National survey data also indicate that these venues are the ones where SHS exposure is most frequently reported by non-smokers. Areas inside bars that were adjacent to "outdoor" smoking areas also had high levels, with a mean of 54 mcg/m3 (range of maximum values: 18-239 mcg/m3, for n=13 measurements). In all other settings mean levels were lower (means: 2-22 mcg/m3). These other settings included inside traditional style pubs/sports bars (n=10), bars (n=18), restaurants (n=9), cafes (n=5), inside public buildings (n=15), inside transportation settings (n=15), and various outdoor street/park settings (n=22). During the data collection in all settings made smokefree by law, there was only one occasion of a person observed smoking. 
The results suggest that compliance in pubs/bars and restaurants has remained extremely high in this city in the nearly six years since implementation of the upgraded smokefree legislation. The results also highlight additional potential health gain from extending smokefree policies to reduce SHS exposure in the "outdoor" smoking areas of hospitality venues and to reduce SHS drift from these areas to indoor areas.
Minnis, Alexandra M; vanDommelen-Gonzalez, Evan; Luecke, Ellen; Cheng, Helen; Dow, William; Bautista-Arredondo, Sergio; Padian, Nancy S
2015-02-01
Most existing evidence-based sexual health interventions focus on individual-level behavior, even though there is substantial evidence that highlights the influential role of social environments in shaping adolescents' behaviors and reproductive health outcomes. We developed Yo Puedo, a combined conditional cash transfer and life skills intervention for youth to promote educational attainment, job training, and reproductive health wellness that we then evaluated for feasibility among 162 youth aged 16-21 years in a predominantly Latino community in San Francisco, CA. The intervention targeted youth's social networks and involved recruitment and randomization of small social network clusters. In this paper we describe the design of the feasibility study and report participants' baseline characteristics. Furthermore, we examined the sample and design implications of recruiting social network clusters as the unit of randomization. Baseline data provide evidence that we successfully enrolled high risk youth using a social network recruitment approach in community and school-based settings. Nearly all participants (95%) were high risk for adverse educational and reproductive health outcomes based on multiple measures of low socioeconomic status (81%) and/or reported high risk behaviors (e.g., gang affiliation, past pregnancy, recent unprotected sex, frequent substance use; 62%). We achieved variability in the study sample through heterogeneity in recruitment of the index participants, whereas the individuals within the small social networks of close friends demonstrated substantial homogeneity across sociodemographic and risk profile characteristics. Social networks recruitment was feasible and yielded a sample of high risk youth willing to enroll in a randomized study to evaluate a novel sexual health intervention.
Redding, David W; Lucas, Tim C D; Blackburn, Tim M; Jones, Kate E
2017-01-01
Statistical approaches for inferring the spatial distribution of taxa (Species Distribution Models, SDMs) commonly rely on available occurrence data, which are often clumped and geographically restricted. Although available SDM methods address some of these factors, they could be more directly and accurately modelled using a spatially explicit approach. Software to fit models with spatial autocorrelation parameters in SDMs is now widely available, but whether such approaches to inferring SDMs improve predictions compared with other methodologies is unknown. Here, within a simulated environment using 1000 generated species' ranges, we compared the performance of two commonly used non-spatial SDM methods (Maximum Entropy Modelling, MAXENT, and boosted regression trees, BRT) to a spatial Bayesian SDM method (fitted using R-INLA) when the underlying data exhibit varying combinations of clumping and geographic restriction. Finally, we tested how recommended methodological settings designed to account for spatially non-random patterns in the data impact inference. The spatial Bayesian SDM method was the most consistently accurate, ranking among the top two methods in 7 out of 8 data-sampling scenarios. Within high-coverage sample datasets, all methods performed fairly similarly. When sampling points were randomly spread, BRT had a 1-3% greater accuracy over the other methods, and when samples were clumped, the spatial Bayesian SDM method had a 4-8% better AUC score. In contrast, when sampling points were restricted to a small section of the true range, all methods were on average 10-12% less accurate, with greater variation among the methods. Model inference under the recommended settings to account for autocorrelation was not impacted by clumping or restriction of data, except for the complexity of the spatial regression term in the spatial Bayesian model.
Methods, such as those made available by R-INLA, can be successfully used to account for spatial autocorrelation in an SDM context and, by taking account of random effects, produce outputs that can better elucidate the role of covariates in predicting species occurrence. Given that it is often unclear what the drivers are behind data clumping in an empirical occurrence dataset, or indeed how geographically restricted these data are, spatially-explicit Bayesian SDMs may be the better choice when modelling the spatial distribution of target species.
Irisin and exercise training in humans - results from a randomized controlled training trial.
Hecksteden, Anne; Wegmann, Melissa; Steffen, Anke; Kraushaar, Jochen; Morsch, Arne; Ruppenthal, Sandra; Kaestner, Lars; Meyer, Tim
2013-11-05
The recent discovery of a new myokine (irisin) potentially involved in health-related training effects has gained great attention, but evidence for a training-induced increase in irisin remains preliminary. Therefore, the present study aimed to determine whether irisin concentration is increased after regular exercise training in humans. In a randomized controlled design, two guideline-conforming training interventions were studied. Inclusion criteria were age 30 to 60 years, <1 hour/week of regular activity, non-smoker, and absence of major diseases. 102 participants could be included in the analysis. Subjects in the training groups exercised 3 times per week for 26 weeks. The minimum compliance was defined as 70%. Aerobic endurance training (AET) consisted of 45 minutes of walking/running at 60% heart rate reserve. Strength endurance training (SET) consisted of 8 machine-based exercises (2 sets of 15 repetitions with 100% of the 20-repetition maximum). Serum irisin concentrations in frozen serum samples were determined in a single blinded measurement immediately after the end of the training study. Physical performance provided a positive control for the overall efficacy of training. Differences between groups were tested for significance using analysis of variance. For post hoc comparisons with the control group, Dunnett's test was used. Maximum performance increased significantly in the training groups compared with controls (controls: +0.0 ± 0.7 km/h; AET: +1.1 ± 0.6 km/h, P < 0.01; SET: +0.5 ± 0.7 km/h, P = 0.01). Changes in irisin did not differ between groups (controls: 101 ± 81 ng/ml; AET: 44 ± 93 ng/ml; SET: 60 ± 92 ng/ml; in both cases: P = 0.99 (one-tailed testing), 1-β error probability = 0.7). The general upward trend was mainly accounted for by a negative association of irisin concentration with the storage duration of the frozen serum samples (P < 0.01, β = -0.33).
After arithmetically eliminating this confounder, the differences between groups remained non-significant. A training-induced increase in circulating irisin could not be confirmed, calling into question its proposed involvement in health-related training effects. Because frozen samples are prone to irisin degradation over time, positive results from uncontrolled trials might exclusively reflect the longer storage of samples from initial tests.
MAVTgsa: An R Package for Gene Set (Enrichment) Analysis
Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...
2014-01-01
Gene set analysis methods aim to determine whether an a priori defined set of genes shows a statistically significant difference in expression for either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identifying different types of gene set significance modules has not previously been developed. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions for studies with two or more experimental conditions. (3) A random-forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or that are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
The zoonotic potential of Giardia intestinalis assemblage E in rural settings.
Abdel-Moein, Khaled A; Saeed, Hossam
2016-08-01
Giardiasis is a globally re-emerging protozoan disease with veterinary and public health implications. The current study was carried out to investigate the zoonotic potential of the livestock-specific assemblage E in rural settings. For this purpose, a total of 40 microscopically positive Giardia stool samples from children with gastrointestinal complaints, with or without diarrhea, were included in the study, as well as fecal samples from 46 diarrheic cattle (18 dairy cows and 28 calves). Animal samples were examined by a sedimentation method to identify Giardia spp., and then all Giardia-positive samples from humans and animals were processed for molecular detection of the livestock-specific assemblage E through amplification of the assemblage-specific triosephosphate isomerase (tpi) gene using nested polymerase chain reaction (PCR). The results revealed an unexpectedly high occurrence of assemblage E among the human samples (62.5 %), with a distribution among patients with and without diarrhea of 42.1 and 81 %, respectively. On the other hand, the prevalence of Giardia spp. among diarrheic dairy cattle was 8.7 %, with only calves yielding positive results (14.3 %), and all bovine Giardia spp. were genetically classified as Giardia intestinalis assemblage E. Moreover, DNA sequencing of one randomly selected positive human sample and one bovine sample revealed 100 and 99 % identity, respectively, with assemblage E tpi gene sequences available in GenBank after BLAST analysis. In conclusion, the current study highlights the wide dissemination of the livestock-specific assemblage E among humans in rural areas; thus, a zoonotic transmission cycle should not be discounted during the control of giardiasis in such settings.
ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping
2013-01-01
Many neuroimaging applications deal with imbalanced imaging data. For example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for the structural magnetic resonance imaging (MRI) modality and six times the control cases for the proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over- and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers, Random Forest and Support Vector Machines, based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-medoids-based undersampling gives the best overall performance among the data sampling techniques and the no-sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among the various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869
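The balanced-training-set idea can be illustrated with its simplest variant, random undersampling of the majority class. (The study's preferred K-medoids undersampling replaces the random draw with cluster medoids; the helper below is a hypothetical sketch of the balancing step only.)

```python
import random

def undersample(X, y, seed=0):
    # Keep every minority-class sample and a random, equally sized
    # subset of the majority class, producing a balanced training set.
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for i, label in enumerate(y):
        by_class[label].append(i)
    minority, majority = sorted(by_class.values(), key=len)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]

# 40 controls vs 10 cases -> balanced 10/10 training set
X = [[float(i)] for i in range(50)]
y = [0] * 40 + [1] * 10
Xb, yb = undersample(X, y)
print(yb.count(0), yb.count(1))  # 10 10
```

Repeating the draw with different seeds and pooling the resulting classifiers gives the "ensemble of multiple undersampled datasets" the abstract reports as most stable.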
NASA Astrophysics Data System (ADS)
Li, Zhe; Feng, Jinchao; Liu, Pengyu; Sun, Zhonghua; Li, Gang; Jia, Kebin
2018-05-01
Temperature is usually treated as a nuisance fluctuation in near-infrared spectral measurement, and chemometric methods have been extensively studied to correct for the effect of temperature variations. However, temperature can also be treated as a constructive parameter that provides detailed chemical information when systematically varied during the measurement. Our group has researched the relationship between temperature-induced spectral variation (TSVC) and normalized squared temperature. In this study, we focused on the influence of the temperature distribution in the calibration set. A multi-temperature calibration set selection (MTCS) method was proposed to improve prediction accuracy by considering the temperature distribution of the calibration samples. Furthermore, a double-temperature calibration set selection (DTCS) method was proposed based on the MTCS method and the relationship between TSVC and normalized squared temperature. We compared the prediction performance of PLS models based on the random sampling method and the proposed methods. The results of the experimental studies showed that prediction performance was improved by the proposed methods. Therefore, the MTCS and DTCS methods are alternative methods for improving prediction accuracy in near-infrared spectral measurement.
An investigation of error correcting techniques for OMV and AXAF
NASA Technical Reports Server (NTRS)
Ingels, Frank; Fryer, John
1991-01-01
The original objectives of this project were to build a test system for the NASA 255/223 Reed/Solomon encoding/decoding chip set and circuit board. This test system was then to be interfaced with a convolutional system at MSFC to examine the performance of the concatenated codes. After considerable work, it was discovered that the convolutional system could not function as needed. This report documents the design, construction, and testing of the test apparatus for the R/S chip set. The approach taken was to verify the error-correcting behavior of the chip set by injecting known error patterns onto data and observing the results. Error sequences were generated using pseudo-random number generator programs, with a Poisson time distribution between errors and Gaussian burst lengths. Sample means, variances, and numbers of uncorrectable errors were calculated for each data set before testing.
Testing for qualitative heterogeneity: An application to composite endpoints in survival analysis.
Oulhaj, Abderrahim; El Ghouch, Anouar; Holman, Rury R
2017-01-01
Composite endpoints are frequently used in clinical outcome trials to capture more events, thereby increasing statistical power. A key requirement for a composite endpoint to be meaningful is the absence of so-called qualitative heterogeneity, to ensure a valid overall interpretation of any treatment effect identified. Qualitative heterogeneity occurs when individual components of a composite endpoint exhibit differences in the direction of a treatment effect. In this paper, we develop a general statistical method to test for qualitative heterogeneity, that is, to test whether a given set of parameters share the same sign. This method is based on the intersection-union principle and, provided that the sample size is large, is valid whatever the model used for parameter estimation. We propose two versions of our testing procedure, one based on random sampling from a Gaussian distribution and another based on bootstrapping. Our work covers both the case of completely observed data and the case where some observations are censored, which is an important issue in many clinical trials. We evaluated the size and power of our proposed tests by carrying out extensive Monte Carlo simulations with multivariate time-to-event data. The simulations were designed under a variety of conditions on dimensionality, censoring rate, sample size and correlation structure. Our testing procedure showed very good performance in terms of statistical power and type I error. The proposed test was applied to a data set from a single-center, randomized, double-blind controlled trial in the area of Alzheimer's disease.
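The intersection-union principle behind the test can be sketched for the Gaussian large-sample case: the claim "all component effects share the same (say, positive) sign" is accepted only when every one-sided component test rejects its own null, i.e. when the largest component p-value falls below alpha. (A hypothetical helper under normality; the paper also provides a bootstrap variant and handles censored data.)

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def same_sign_iut(estimates, std_errors, alpha=0.05):
    # One-sided p-value per component for H0_j: theta_j <= 0.
    # Intersection-union test: conclude "all effects positive"
    # only if every component rejects, i.e. max p-value < alpha.
    pvals = [norm_cdf(-est / se) for est, se in zip(estimates, std_errors)]
    return max(pvals) < alpha

print(same_sign_iut([0.8, 0.5, 0.6], [0.2, 0.1, 0.2]))   # True: all clearly positive
print(same_sign_iut([0.8, -0.3, 0.6], [0.2, 0.1, 0.2]))  # False: mixed signs
```

The max-p construction is what keeps the overall size at alpha without any multiplicity correction, which is the appeal of the intersection-union approach for composite endpoints.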
Assessing the Generalizability of Randomized Trial Results to Target Populations
Stuart, Elizabeth A.; Bradshaw, Catherine P.; Leaf, Philip J.
2014-01-01
Recent years have seen increasing interest in and attention to evidence-based practices, where the “evidence” generally comes from well-conducted randomized trials. However, while those trials yield accurate estimates of the effect of the intervention for the participants in the trial (known as “internal validity”), they do not always yield relevant information about the effects in a particular target population (known as “external validity”). This may be due to a lack of specification of a target population when designing the trial, difficulties recruiting a sample that is representative of a pre-specified target population, or to interest in considering a target population somewhat different from the population directly targeted by the trial. This paper first provides an overview of existing design and analysis methods for assessing and enhancing the ability of a randomized trial to estimate treatment effects in a target population. It then provides a case study using one particular method, which weights the subjects in a randomized trial to match the population on a set of observed characteristics. The case study uses data from a randomized trial of School-wide Positive Behavioral Interventions and Supports (PBIS); our interest is in generalizing the results to the state of Maryland. In the case of PBIS, after weighting, estimated effects in the target population were similar to those observed in the randomized trial. The paper illustrates that statistical methods can be used to assess and enhance the external validity of randomized trials, making the results more applicable to policy and clinical questions. However, there are also many open research questions; future research should focus on questions of treatment effect heterogeneity and further developing these methods for enhancing external validity. 
Researchers should think carefully about the external validity of randomized trials and be cautious about extrapolating results to specific populations unless they are confident of the similarity between the trial sample and that target population. PMID:25307417
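The weighting step described above can be illustrated with simple cell weights on a single stratifying covariate: each trial subject is up- or down-weighted by the ratio of the stratum's share in the target population to its share in the trial. (A sketch only; the paper's method matches on a whole set of observed characteristics, typically via propensity-type scores, and the strata here are illustrative.)

```python
from collections import Counter

def poststratification_weights(trial_strata, population_strata):
    # Weight w_i = P(stratum of i in population) / P(stratum of i in trial),
    # so the weighted trial sample matches the population's stratum mix.
    t_share = {k: v / len(trial_strata) for k, v in Counter(trial_strata).items()}
    p_share = {k: v / len(population_strata) for k, v in Counter(population_strata).items()}
    return [p_share.get(s, 0.0) / t_share[s] for s in trial_strata]

# The trial over-represents urban schools relative to the target population.
trial = ["urban"] * 8 + ["rural"] * 2
population = ["urban"] * 5 + ["rural"] * 5
w = poststratification_weights(trial, population)

weighted_rural_share = sum(wi for wi, s in zip(w, trial) if s == "rural") / sum(w)
print(weighted_rural_share)  # ~0.5: the weighted trial now matches the population
```

A weighted treatment-effect estimate computed with these weights then targets the population rather than the trial sample, which is exactly the generalizability adjustment the case study performs.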
Wearn, Oliver R.; Rowcliffe, J. Marcus; Carbone, Chris; Bernard, Henry; Ewers, Robert M.
2013-01-01
The proliferation of camera-trapping studies has led to a spate of extensions in the known distributions of many wild cat species, not least in Borneo. However, we still do not have a clear picture of the spatial patterns of felid abundance in Southeast Asia, particularly with respect to the large areas of highly-disturbed habitat. An important obstacle to increasing the usefulness of camera trap data is the widespread practice of setting cameras at non-random locations. Non-random deployment interacts with non-random space-use by animals, causing biases in our inferences about relative abundance from detection frequencies alone. This may be a particular problem if surveys do not adequately sample the full range of habitat features present in a study region. Using camera-trapping records and incidental sightings from the Kalabakan Forest Reserve, Sabah, Malaysian Borneo, we aimed to assess the relative abundance of felid species in highly-disturbed forest, as well as investigate felid space-use and the potential for biases resulting from non-random sampling. Although the area has been intensively logged over three decades, it was found to still retain the full complement of Bornean felids, including the bay cat Pardofelis badia, a poorly known Bornean endemic. Camera-trapping using strictly random locations detected four of the five Bornean felid species and revealed inter- and intra-specific differences in space-use. We compare our results with an extensive dataset of >1,200 felid records from previous camera-trapping studies and show that the relative abundance of the bay cat, in particular, may have previously been underestimated due to the use of non-random survey locations. Further surveys for this species using random locations will be crucial in determining its conservation status. 
We advocate the more widespread use of random survey locations in future camera-trapping surveys in order to increase the robustness and generality of the inferences that can be made. PMID:24223717
Martínez Vega, Mabel V; Sharifzadeh, Sara; Wulfsohn, Dvoralai; Skov, Thomas; Clemmensen, Line Harder; Toldam-Andersen, Torben B
2013-12-01
Visible-near infrared spectroscopy remains a method of increasing interest as a fast alternative for the evaluation of fruit quality. The success of the method is assumed to depend on using large sets of samples to produce robust calibration models. In this study we used representative samples of an early and a late season apple cultivar to evaluate model robustness (in terms of prediction ability and error) for soluble solids content (SSC) and acidity prediction, in the wavelength range 400-1100 nm. A total of 196 middle-early season cv. 'Aroma' and 219 late season cv. 'Holsteiner Cox' apples (Malus domestica Borkh.) were used to construct spectral models for SSC and acidity. Partial least squares (PLS), ridge regression (RR) and elastic net (EN) models were used to build prediction models. Furthermore, we compared three sub-sampling arrangements for forming training and test sets ('smooth fractionator', by date of measurement after harvest, and random). Using the 'smooth fractionator' sampling method, fewer spectral bands (26) and elastic net resulted in improved performance for SSC models of 'Aroma' apples, with a coefficient of variation CV(SSC) = 13%. The models showed consistently low errors and bias (PLS/EN: R(2)cal = 0.60/0.60; SEC = 0.88/0.88 °Brix; Bias(cal) = 0.00/0.00; R(2)val = 0.33/0.44; SEP = 1.14/1.03; Bias(val) = 0.04/0.03). However, predictions of acidity and of SSC (CV = 5%) for the late cultivar 'Holsteiner Cox' were inferior to those for 'Aroma'. It was possible to construct local SSC and acidity calibration models for early season apple cultivars with CVs of SSC and acidity around 10%. The overall model performance of these data sets also depends on the proper selection of training and test sets. The 'smooth fractionator' protocol provided an objective method for obtaining training and test sets that capture the existing variability of the fruit samples for the construction of visible-NIR prediction models.
The implication is that by using such 'efficient' sampling methods for obtaining an initial sample of fruit that represents the variability of the population and for sub-sampling to form training and test sets it should be possible to use relatively small sample sizes to develop spectral predictions of fruit quality. Using feature selection and elastic net appears to improve the SSC model performance in terms of R(2), RMSECV and RMSEP for 'Aroma' apples. © 2013 Society of Chemical Industry.
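The 'smooth fractionator'-style split can be sketched as a systematic draw along a sorted reference property, so that both training and test sets span the full variability of the fruit sample. (An assumed reading of the protocol; the function name and the SSC values are illustrative.)

```python
def systematic_split(values, n_test, offset=0):
    # Sort sample indices by the reference property (e.g. SSC), then take
    # every k-th index into the test set so it covers the whole range.
    order = sorted(range(len(values)), key=lambda i: values[i])
    step = max(1, len(values) // n_test)
    test_idx = set(order[offset::step][:n_test])
    train = [values[i] for i in range(len(values)) if i not in test_idx]
    test = sorted(values[i] for i in test_idx)
    return train, test

# 20 apples with SSC from 10.0 to 19.5 Brix; hold out 5 for testing.
ssc = [10.0 + 0.5 * i for i in range(20)]
train, test = systematic_split(ssc, n_test=5)
print(test)  # [10.0, 12.0, 14.0, 16.0, 18.0] -> evenly spread over the range
```

Contrast this with a purely random split, which can by chance concentrate the test set in one part of the range and so inflate the apparent prediction error.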
Li, Lingling; Kulldorff, Martin; Russek-Cohen, Estelle; Kawai, Alison Tse; Hua, Wei
2015-12-01
The self-controlled risk interval design is commonly used to assess the association between an acute exposure and an adverse event of interest, implicitly adjusting for fixed, non-time-varying covariates. Explicit adjustment needs to be made for time-varying covariates, for example, age in young children. It can be performed via either a fixed or random adjustment. The random-adjustment approach can provide valid point and interval estimates but requires access to individual-level data for an unexposed baseline sample. The fixed-adjustment approach does not have this requirement and will provide a valid point estimate but may underestimate the variance. We conducted a comprehensive simulation study to evaluate their performance. We designed the simulation study using empirical data from the Food and Drug Administration-sponsored Mini-Sentinel Post-licensure Rapid Immunization Safety Monitoring Rotavirus Vaccines and Intussusception study in children 5-36.9 weeks of age. The time-varying confounder is age. We considered a variety of design parameters including sample size, relative risk, time-varying baseline risks, and risk interval length. The random-adjustment approach has very good performance in almost all considered settings. The fixed-adjustment approach can be used as a good alternative when the number of events used to estimate the time-varying baseline risks is at least the number of events used to estimate the relative risk, which is almost always the case. We successfully identified settings in which the fixed-adjustment approach can be used as a good alternative and provided guidelines on the selection and implementation of appropriate analyses for the self-controlled risk interval design. Copyright © 2015 John Wiley & Sons, Ltd.
Zamani, Ahmad Reza; Motamedi, Narges; Farajzadegan, Ziba
2015-01-01
Background: To provide high-quality primary health care services, adequate doctor-patient communication is necessary. Because of time restrictions and the limited budget of the health system, an effective, feasible, and continuous training approach is important. The aim of this study was to assess the appropriateness of a communication skills training program run simultaneously with the routine programs of the health care system. Materials and Methods: This was a randomized field trial in two health network settings during 2013. Twenty-eight family physicians, selected through simple random sampling, and 140 patients, selected through convenience sampling, participated as the intervention and control groups. The physicians in the intervention group (n = 14) attended six educational sessions, held simultaneously with organizational meetings, using case discussion and peer education methods. In both groups, physicians completed communication skills knowledge and attitude questionnaires, and patients completed a patient satisfaction with the medical interview questionnaire, at baseline, immediately after the intervention, and four months postintervention. Physicians and health network administrators (stakeholders) completed a set of program evaluation forms. Descriptive statistics, the Chi-square test, the t-test, and repeated-measures analysis of variance were used to analyze the data. Results: Use of the routine program as a training strategy was rated highly by stakeholders on "feasibility" (80.5%), "acceptability" (93.5%), "educational content and method appropriateness" (80.75%), and "ability to integrate into the health system programs" (approximately 60%). Significant improvements were found in physicians' knowledge (P < 0.001) and attitude (P < 0.001), and in patients' satisfaction (P = 0.002), in the intervention group. Conclusions: The communication skills training program, run simultaneously with organizational meetings, was successfully implemented and well received by stakeholders, without requiring extra time or manpower. 
Therefore, it can be a valuable opportunity for communication skills training. PMID:27462613
Aoun, Samar M; Nekolaichuk, Cheryl
2014-12-01
The adoption of evidence-based hierarchies and research methods from other disciplines may not completely translate to complex palliative care settings. The heterogeneity of the palliative care population, complexity of clinical presentations, and fluctuating health states present significant research challenges. The aim of this narrative review was to explore the debate about the use of current evidence-based approaches for conducting research, such as randomized controlled trials and other study designs, in palliative care, and more specifically to (1) describe key myths about palliative care research; (2) highlight substantive challenges of conducting palliative care research, using case illustrations; and (3) propose specific strategies to address some of these challenges. Myths about research in palliative care revolve around evidence hierarchies, sample heterogeneity, random assignment, participant burden, and measurement issues. Challenges arise because of the complex physical, psychological, existential, and spiritual problems faced by patients, families, and service providers. These challenges can be organized according to six general domains: patient, system/organization, context/setting, study design, research team, and ethics. A number of approaches for dealing with challenges in conducting research fall into five separate domains: study design, sampling, conceptual, statistical, and measures and outcomes. Although randomized controlled trials have their place whenever possible, alternative designs may offer more feasible research protocols that can be successfully implemented in palliative care. Therefore, this article highlights "outside the box" approaches that would benefit both clinicians and researchers in the palliative care field. Ultimately, the selection of research designs is dependent on a clearly articulated research question, which drives the research process. Copyright © 2014 American Academy of Hospice and Palliative Medicine. 
Published by Elsevier Inc. All rights reserved.
Digital simulation of an arbitrary stationary stochastic process by spectral representation.
Yura, Harold T; Hanson, Steen G
2011-04-01
In this paper we present a straightforward, efficient, and computationally fast method for creating a large number of discrete samples with an arbitrary given probability density function and a specified spectral content. The method relies on initially transforming a white noise sample set of random Gaussian distributed numbers into a corresponding set with the desired spectral distribution, after which this colored Gaussian distribution is transformed via an inverse transform into the desired probability distribution. In contrast to previous work, where the analyses were limited to autoregressive and/or iterative techniques to obtain satisfactory results, we find that a single application of the inverse transform method yields satisfactory results for a wide class of arbitrary probability distributions. Although a single application of the inverse transform technique does not conserve the power spectrum exactly, it yields highly accurate numerical results for a wide range of probability distributions and target power spectra that are sufficient for system simulation purposes, and can thus be regarded as an accurate engineering approximation suitable for a wide range of practical applications. A sufficiency condition is presented regarding the range of parameter values where a single application of the inverse transform method yields satisfactory agreement between the simulated and target power spectra, and a series of examples relevant to the optics community are presented and discussed. Outside this parameter range the agreement degrades gracefully but the spectral shape is not distorted. Although we demonstrate the method here focusing on stationary random processes, we see no reason why it could not be extended to simulate non-stationary random processes. © 2011 Optical Society of America
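The two-step recipe this abstract describes (color a white Gaussian sequence to the target spectrum, then apply a memoryless inverse-CDF transform to the target marginal) can be sketched in Python; the Lorentzian spectrum and exponential marginal below are arbitrary example choices, not the paper's cases:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2 ** 16
white = rng.standard_normal(n)

# Step 1: impose a target power spectrum by filtering in the Fourier
# domain (example target: a Lorentzian with knee frequency 0.01).
f = np.fft.rfftfreq(n, d=1.0)
target_psd = 1.0 / (1.0 + (f / 0.01) ** 2)
colored = np.fft.irfft(np.fft.rfft(white) * np.sqrt(target_psd), n)
colored /= colored.std()  # unit-variance colored Gaussian sequence

# Step 2: memoryless inverse-transform to the target marginal
# (example target: unit-mean exponential distribution).
u = stats.norm.cdf(colored)   # Gaussian -> uniform
samples = stats.expon.ppf(u)  # uniform -> exponential marginal
```

The marginal distribution of `samples` is exact by construction; as the abstract notes, the nonlinear Step 2 perturbs the spectrum somewhat, which is where the paper's sufficiency condition applies.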
Reiss, Neele; Warnecke, Irene; Tolgou, Theano; Krampen, Dorothea; Luka-Krausgrill, Ursula; Rohrmann, Sonja
2017-01-15
Test anxiety is a common condition in students, which may lead to impaired academic performance as well as to distress. The primary objective of this study was to evaluate the effectiveness of two cognitive-behavioral interventions designed to reduce test anxiety. Test anxiety in the participants was diagnosed as social or specific phobia according to DSM-IV. Subsequently, subjects were randomized to three groups: a moderated self-help group, which served as a control group, and two treatment groups, where either relaxation techniques or imagery rescripting were applied. Students suffering from test anxiety were recruited at two German universities (n=180). The randomized controlled design comprised three groups that received test anxiety treatment in weekly three-hour sessions over a period of five weeks. Treatment outcome was assessed with a test anxiety questionnaire, which was administered before and after treatment, as well as at a six-month follow-up. A repeated-measures ANOVA for participants with complete data (n=59) revealed a significant reduction of test anxiety from baseline to six-month follow-up in all three treatment groups (p<.001). Participants were included if they had a clinical diagnosis of test anxiety. The sample may therefore represent only more severe forms of test anxiety. Moreover, the sample size in this study was small, the numbers of participants per group differed, and treatment results were based on self-report. Due to the length of the treatment, an implementation of the group treatments used in this study might not be feasible in all settings. Group treatments constitute an effective method of treating test anxiety, e.g., in university settings. Imagery rescripting may particularly contribute to treatment efficacy. Copyright © 2016 Elsevier B.V. All rights reserved.
Roets-Merken, Lieve M; Graff, Maud J L; Zuidema, Sytse U; Hermsen, Pieter G J M; Teerenstra, Steven; Kempen, Gertrudis I J M; Vernooij-Dassen, Myrra J F J
2013-10-07
Five to 25 percent of residents in aged care settings have a combined hearing and visual sensory impairment. Usual care is generally restricted to single sensory impairment, neglecting the consequences of dual sensory impairment on social participation and autonomy. The aim of this study is to evaluate the effectiveness of a self-management program for seniors who acquired dual sensory impairment at old age. In a cluster randomized, single-blind controlled trial, with aged care settings as the unit of randomization, the effectiveness of a self-management program will be compared to usual care. A minimum of 14 and maximum of 20 settings will be randomized to either the intervention cluster or the control cluster, aiming to include a total of 132 seniors with dual sensory impairment. Each senior will be linked to a licensed practical nurse working at the setting. During a five to six month intervention period, nurses at the intervention clusters will be trained in a self-management program to support and empower seniors to use self-management strategies. In two separate diaries, nurses keep track of the interviews with the seniors and their reflections on their own learning process. Nurses of the control clusters offer care as usual. At senior level, the primary outcome is the social participation of the seniors measured using the Hearing Handicap Questionnaire and the Activity Card Sort, and secondary outcomes are mood, autonomy and quality of life. At nurse level, the outcome is job satisfaction. Effectiveness will be evaluated using linear mixed model analysis. The results of this study will provide evidence for the effectiveness of the Self-Management Program for seniors with dual sensory impairment living in aged care settings. The findings are expected to contribute to the knowledge on the program's potential to enhance social participation and autonomy of the seniors, as well as increasing the job satisfaction of the licensed practical nurses. 
Furthermore, an extensive process evaluation will take place which will offer insight in the quality and feasibility of the sampling and intervention process. If it is shown to be effective and feasible, this Self-Management Program could be widely disseminated. ClinicalTrials.gov, NCT01217502.
Marital assortment for genetic similarity.
Eckman, Ronael E; Williams, Robert; Nagoshi, Craig
2002-10-01
The present study involved analyses of a Caucasian American sample (n=949) and a Japanese American sample (n=400) for factors supporting Genetic Similarity Theory (GST). The analyses found no evidence of genetic similarity between spouses in either sample in the blood group analyses of nine loci. All results indicated random mating for blood group genes. The results did not provide consistent, substantial support for the claim that spousal similarity is correlated with the degree of genetic influence on a trait, across a set of seventeen individual differences variables; only the Caucasian sample yielded significant correlations in this analysis. A third analysis, examining the correlation between the presence of spousal genetic similarity and spousal similarity on observable traits, was not performed because spousal genetic similarity was not observed in either sample. The overall implication of the study is that GST is not supported as an explanation for spousal similarity in humans.
What Randomized Benchmarking Actually Measures
Proctor, Timothy; Rudinger, Kenneth; Young, Kevin; ...
2017-09-28
Randomized benchmarking (RB) is widely used to measure an error rate of a set of quantum gates, by performing random circuits that would do nothing if the gates were perfect. In the limit of no finite-sampling error, the exponential decay rate of the observable survival probabilities, versus circuit length, yields a single error metric r. For Clifford gates with arbitrary small errors described by process matrices, r was believed to reliably correspond to the mean, over all Clifford gates, of the average gate infidelity between the imperfect gates and their ideal counterparts. We show that this quantity is not a well-defined property of a physical gate set. It depends on the representations used for the imperfect and ideal gates, and the variant typically computed in the literature can differ from r by orders of magnitude. We present new theories of the RB decay that are accurate for all small errors describable by process matrices, and show that the RB decay curve is a simple exponential for all such errors. These theories allow explicit computation of the error rate that RB measures (r), but as far as we can tell it does not correspond to the infidelity of a physically allowed (completely positive) representation of the imperfect gates.
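A minimal numerical illustration of the standard RB analysis this abstract discusses: fit the exponential decay A·p^m + B to survival probabilities versus sequence length m, then convert the decay constant to the error rate r. The single-qubit numbers and the noiseless A = B = 1/2 model are assumptions made for the sketch:

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic single-qubit RB decay (illustrative numbers only): survival
# probability follows A*p**m + B as a function of sequence length m.
d = 2           # Hilbert-space dimension for one qubit
true_p = 0.99
m = np.arange(1, 200, 5)
survival = 0.5 * true_p ** m + 0.5  # noiseless A = B = 1/2 model

def rb_model(m, A, p, B):
    return A * p ** m + B

(A_fit, p_fit, B_fit), _ = curve_fit(rb_model, m, survival, p0=[0.5, 0.95, 0.5])

# Standard conversion of the fitted decay constant to the RB error rate r
r = (d - 1) * (1 - p_fit) / d
```

The paper's point is precisely about what this fitted r does and does not correspond to for realistic gate errors; the fit itself is the uncontroversial part.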
Ledford, Christy J W; Womack, Jasmyne J; Rider, Heather A; Seehusen, Angela B; Conner, Stephen J; Lauters, Rebecca A; Hodge, Joshua A
2018-06-01
As pregnant mothers increasingly engage in shared decision making regarding prenatal decisions, such as induction of labor, the patient's level of activation may influence pregnancy outcomes. One potential tool to increase patient activation in the clinical setting is mobile applications. However, research is limited in comparing mobile apps with other modalities of patient education and engagement tools. This study was designed to test the effectiveness of a mobile app as a replacement for a spiral notebook guide as a patient education and engagement tool in the prenatal clinical setting. This randomized controlled trial was conducted in the Women's Health Clinic and Family Health Clinic of three hospitals. Repeated-measures analysis of covariance was used to test intervention effects in the study sample of 205 patients. Mothers used a mobile app interface to more frequently record information about their pregnancy; however, across time, mothers using a mobile app reported a significant decrease in patient activation. The unexpected negative effects in the group of patients randomized to the mobile app prompt these authors to recommend that health systems pause before distributing their own version of mobile apps that may decrease patient activation. Mobile apps can be inherently empowering and engaging, but how a system encourages their use may ultimately determine their adoption and success.
Shah, Anoop D.; Bartlett, Jonathan W.; Carpenter, James; Nicholas, Owen; Hemingway, Harry
2014-01-01
Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The “true” imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001–2010) with complete data on all covariates. Variables were artificially made “missing at random,” and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data. PMID:24589914
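A random forest-based chained-equations imputation of the kind compared in this study can be sketched with scikit-learn's IterativeImputer (a MICE-style imputer) using a random forest as the conditional model; the nonlinear toy data below are invented for illustration:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 ** 2 + rng.normal(scale=0.1, size=n)  # nonlinear dependence on x1
X_full = np.column_stack([x1, x2])

# Make ~20% of x2 missing at random
mask = rng.random(n) < 0.2
X_miss = X_full.copy()
X_miss[mask, 1] = np.nan

# Chained-equations imputation with a random forest conditional model,
# which can capture the x2 = x1**2 relationship without specifying it.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)
X_imp = imputer.fit_transform(X_miss)
mae = np.abs(X_imp[mask, 1] - X_full[mask, 1]).mean()
```

A default (linear) IterativeImputer on the same data would miss the quadratic relationship, which is the scenario where the study found random forest MICE less biased.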
Evaluating cost-efficiency and accuracy of hunter harvest survey designs
Lukacs, P.M.; Gude, J.A.; Russell, R.E.; Ackerman, B.B.
2011-01-01
Effective management of harvested wildlife often requires accurate estimates of the number of animals harvested annually by hunters. A variety of techniques exist to obtain harvest data, such as hunter surveys, check stations, mandatory reporting requirements, and voluntary reporting of harvest. Agencies responsible for managing harvested wildlife such as deer (Odocoileus spp.), elk (Cervus elaphus), and pronghorn (Antilocapra americana) are challenged with balancing the cost of data collection versus the value of the information obtained. We compared precision, bias, and relative cost of several common strategies, including hunter self-reporting and random sampling, for estimating hunter harvest using a realistic set of simulations. Self-reporting with a follow-up survey of hunters who did not report produces the best estimate of harvest in terms of precision and bias, but it is also, by far, the most expensive technique. Self-reporting with no follow-up survey risks very large bias in harvest estimates, and the cost increases with increased response rate. Probability-based sampling provides a substantial cost savings, though accuracy can be affected by nonresponse bias. We recommend stratified random sampling with a calibration estimator used to reweight the sample based on the proportions of hunters responding in each covariate category as the best option for balancing cost and accuracy. © 2011 The Wildlife Society.
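The recommended design, stratified random sampling with reweighting for nonresponse, can be sketched as follows. All stratum sizes, sample sizes, and harvest counts are invented, and the single-weight adjustment shown is a simplified stand-in for a full calibration estimator across covariate categories:

```python
# Hypothetical example: estimate total harvest from a stratified random
# sample of hunters, reweighting each stratum for nonresponse.
strata = {
    # stratum: (N_hunters, n_sampled, n_responded, harvest_reported)
    "resident":    (10_000, 500, 400, 320),
    "nonresident": (2_000, 200, 150, 90),
}

total_estimate = 0.0
for N, n, n_resp, harvest in strata.values():
    response_rate = n_resp / n
    # Each respondent represents N / (n * response_rate) hunters in the
    # stratum; this reweighting assumes nonrespondents harvest like
    # respondents within the same stratum.
    weight = N / (n * response_rate)
    total_estimate += weight * harvest

print(round(total_estimate))  # → 9200
```

The within-stratum assumption is exactly where nonresponse bias can enter, which is why the authors calibrate on covariate categories rather than reweighting the sample as a whole.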
Evaluating diagnosis-based case-mix measures: how well do they apply to the VA population?
Rosen, A K; Loveland, S; Anderson, J J; Rothendler, J A; Hankin, C S; Rakovski, C C; Moskowitz, M A; Berlowitz, D R
2001-07-01
Diagnosis-based case-mix measures are increasingly used for provider profiling, resource allocation, and capitation rate setting. Measures developed in one setting may not adequately capture the disease burden in other settings. To examine the feasibility of adapting two such measures, Adjusted Clinical Groups (ACGs) and Diagnostic Cost Groups (DCGs), to the Department of Veterans Affairs (VA) population. A 60% random sample of veterans who used health care services during FY 1997 was obtained from VA inpatient and outpatient administrative databases. A split-sample technique was used to obtain a 40% sample (n = 1,046,803) for development and a 20% sample (n = 524,461) for validation. Concurrent ACG and DCG risk adjustment models, using 1997 diagnoses and demographics to predict FY 1997 utilization (ambulatory provider encounters and service days, the sum of a patient's inpatient and outpatient visit days), were fitted and cross-validated. Patients were classified into groupings that indicated a population with multiple psychiatric and medical diseases. Model R-squares explained between 6% and 32% of the variation in service utilization. Although reparameterized models did better in predicting utilization than models with external weights, none of the models was adequate in characterizing the entire population. For predicting service days, DCGs were superior to ACGs in most categories, whereas ACGs did better at discriminating among veterans who had the lowest utilization. Although "off-the-shelf" case-mix measures perform moderately well when applied to another setting, modifications may be required to accurately characterize a population's disease burden with respect to the resource needs of all patients.
Exactly solvable random graph ensemble with extensively many short cycles
NASA Astrophysics Data System (ADS)
Aguirre López, Fabián; Barucca, Paolo; Fekom, Mathilde; Coolen, Anthony C. C.
2018-02-01
We introduce and analyse ensembles of 2-regular random graphs with a tuneable distribution of short cycles. The phenomenology of these graphs depends critically on the scaling of the ensembles' control parameters relative to the number of nodes. A phase diagram is presented, showing a second order phase transition from a connected to a disconnected phase. We study both the canonical formulation, where the size is large but fixed, and the grand canonical formulation, where the size is sampled from a discrete distribution, and show their equivalence in the thermodynamic limit. We also compute analytically the spectral density, which consists of a discrete set of isolated eigenvalues, representing short cycles, and a continuous part, representing cycles of diverging size.
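Since a 2-regular graph is a disjoint union of cycles, its adjacency spectrum is the union of cycle spectra: an L-cycle contributes eigenvalues 2cos(2πk/L), k = 0..L-1. A small numerical sketch, with an arbitrary uniform cycle-length distribution standing in for the ensemble's tuneable one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample cycle lengths (illustrative choice: uniform on 3..10) until
# roughly N nodes are used, and accumulate the exact adjacency
# eigenvalues 2*cos(2*pi*k/L) of each L-cycle.
N_target = 1000
eigs = []
n_used = 0
while n_used < N_target:
    L = int(rng.integers(3, 11))
    k = np.arange(L)
    eigs.extend(2 * np.cos(2 * np.pi * k / L))
    n_used += L

eigs = np.array(eigs)  # discrete spectral density of the sampled graph
```

Short cycles give the isolated eigenvalues the abstract mentions; as the cycle-length distribution shifts weight to diverging lengths, these points fill in the continuous part of the density on [-2, 2].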
Random forests-based differential analysis of gene sets for gene expression data.
Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An
2013-04-10
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets are differentially expressed (enrichment and/or depletion) across variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis, in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. 
In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for interpretation of data in complex biological systems. The classifications of biologically defined gene sets can reveal the underlying interactions of gene sets associated with the phenotypes, and provide an insightful complement to conventional gene set analyses. Copyright © 2012 Elsevier B.V. All rights reserved.
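The OOB-error-based gene-set test described above can be sketched with scikit-learn: fit a random forest on one gene set, take the out-of-bag error as the test statistic, and obtain an empirical p-value by refitting under permuted phenotype labels. The toy expression data below are synthetic, and the permutation count is kept small for speed:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy data: a 10-gene set measured on 60 samples in two phenotype
# classes; genes 0-2 carry signal, the rest are noise (synthetic).
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 10))
X[y == 1, :3] += 1.0

def oob_error(X, y, seed=0):
    rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=seed)
    rf.fit(X, y)
    return 1.0 - rf.oob_score_

observed = oob_error(X, y)

# Empirical p-value: compare against OOB errors under permuted labels
B = 50
null_errors = [oob_error(X, rng.permutation(y), seed=b) for b in range(B)]
p_value = (1 + sum(e <= observed for e in null_errors)) / (B + 1)
```

A low p-value says the gene set classifies phenotypes better than chance; `rf.feature_importances_` then indicates which genes in the set drive the discrimination, mirroring the variable-importance discussion above.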
Geng, Elvin H; Odeny, Thomas A; Lyamuya, Rita E; Nakiwogga-Muwanga, Alice; Diero, Lameck; Bwana, Mwebesa; Muyindike, Winnie; Braitstein, Paula; Somi, Geoffrey R; Kambugu, Andrew; Bukusi, Elizabeth A; Wenger, Megan; Wools-Kaloustian, Kara K; Glidden, David V; Yiannoutsos, Constantin T; Martin, Jeffrey N
2015-03-01
Mortality in HIV-infected people after initiation of antiretroviral treatment (ART) in resource-limited settings is an important measure of the effectiveness and comparative effectiveness of the global public health response. Substantial loss to follow-up precludes accurate accounting of deaths and limits our understanding of effectiveness. We aimed to provide a better understanding of mortality at scale and, by extension, the effectiveness and comparative effectiveness of public health ART treatment in east Africa. In 14 clinics in five settings in Kenya, Uganda, and Tanzania, we intensively traced a sample, selected using a random number generator, of patients who were infected with HIV and on ART and who were lost to follow-up (>90 days late for their last scheduled visit). We incorporated the vital status outcomes for these patients into analyses of the entire clinic population through probability-weighted survival analyses. We followed 34 277 adults on ART from Mbarara and Kampala in Uganda, Eldoret and Kisumu in Kenya, and Morogoro in Tanzania. The median age was 35 years (IQR 30-42), 11 628 (34%) were men, and median CD4 count before therapy was 154 cells per μL (IQR 70-234). 5780 patients (17%) were lost to follow-up, 991 (17%) were selected for tracing between June 10, 2011, and Aug 27, 2012, and vital status was ascertained for 860 (87%). With incorporation of outcomes from the patients lost to follow-up, estimated 3 year mortality increased from 3·9% (95% CI 3·6-4·2) to 12·5% (11·8-13·3). The sample-corrected, unadjusted 3 year mortality across settings was lowest in Mbarara (7·2%) and highest in Morogoro (23·6%). After adjustment for age, sex, CD4 count before therapy, and WHO stage, the sample-corrected hazard ratio comparing the settings with highest and lowest mortalities was 2·2 (95% CI 1·5-3·4) and the risk difference for death at 3 years was 11% (95% CI 5·0-17·7). 
A sampling-based approach is widely feasible and important to an understanding of mortality after initiation of ART. After adjustment for measured biological drivers, mortality differs substantially across settings despite delivery of a similar clinical package of treatment. Implementation research to understand the systems, community, and patient behaviours driving these differences is urgently needed. Funding: the US National Institutes of Health and the President's Emergency Plan for AIDS Relief. Copyright © 2015 Elsevier Ltd. All rights reserved.
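The probability-weighted correction underlying this study can be illustrated with a minimal inverse-probability-weighting calculation; all counts are invented, and the sketch ignores censoring and time-to-event structure that the study's survival analyses handle properly:

```python
# Illustrative sampling-based mortality correction (numbers invented).
# Outcomes of a traced random subsample of patients lost to follow-up
# are up-weighted by the inverse of the tracing sampling fraction.
n_total = 10_000
n_ltfu = 1_700          # patients lost to follow-up
deaths_in_care = 250    # deaths observed among patients retained in care

n_traced = 300          # random sample of the lost who were traced
deaths_traced = 60      # deaths discovered by tracing

sampling_fraction = n_traced / n_ltfu
weighted_ltfu_deaths = deaths_traced / sampling_fraction

naive_mortality = deaths_in_care / n_total
corrected_mortality = (deaths_in_care + weighted_ltfu_deaths) / n_total
```

Because mortality among the lost is typically much higher than among those retained, the corrected estimate exceeds the naive one, mirroring the jump from 3.9% to 12.5% reported above.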
Kulongoski, Justin T.; Belitz, Kenneth; Dawson, Barbara J.
2006-01-01
Ground-water samples were analyzed for major and minor ions, trace elements, nutrients, volatile organic compounds, pesticides and pesticide degradates, waste-water indicators, dissolved methane, nitrogen, carbon dioxide and noble gases (in collaboration with Lawrence Livermore National Laboratory). Naturally occurring isotopes (tritium, carbon-14, oxygen-18, deuterium and helium-4) also were measured in the samples to help identify the source and age of the ground water. Results show that no anthropogenic constituents were detected at concentrations higher than those levels set for regulatory purposes, and relatively few naturally occurring constituents were detected at concentrations greater than regulatory levels. In this study, 21 of the 88 volatile organic compounds (VOCs) and gasoline additives and (or) oxygenates investigated were detected in ground-water samples; however, detected concentrations were one-half to one-forty-thousandth the maximum contaminant levels (MCL). Thirty-two percent of the randomized wells sampled had at least a single detection of a VOC or gasoline additive and (or) oxygenate. The most frequently detected compounds were chloroform, found in 12 of the 84 randomized wells; carbon disulfide, found in 8 of the 84 randomized wells; and toluene, found in 4 of the 84 randomized wells. Trihalomethanes were the most frequently detected class of VOCs. Nine of the 122 pesticides and (or) pesticide degradates investigated were detected in ground-water samples; however, concentrations were one-seventieth to one-eight-hundredth the MCLs. Seventeen percent of the randomized wells sampled had at least a single detection of a pesticide or pesticide degradate. Herbicides were the most frequently detected class of pesticides. The most frequently detected compound was simazine, found in 8 of the 84 randomized wells. Chlordiamino-s-triazine and deisopropyl atrazine were both found in 2 of the 84 randomized wells sampled. 
Thirteen of the 63 compounds that may be indicative of the presence of waste-water were detected in ground-water samples. Twenty-six percent of the randomized wells sampled for waste-water indicators had at least one detection. Isophorone was the most frequently detected, found in 6 of the 84 randomized wells. Bisphenol-A, caffeine, and indole each were detected in 3 of the 84 randomized wells. Major and minor ions and dissolved solids (DS) samples were collected at 33 public-supply wells; 3 samples had DS concentrations above the secondary maximum contaminant level (SMCL) of 500 mg/L. Ground-water samples from 32 public-supply wells were analyzed for trace elements. Arsenic concentrations above the MCL of 10 μg/L were measured at 4 public-supply wells, and boron concentrations above the detection level for the purpose of reporting (DLR) of 100 μg/L were measured at 19 wells. Iron concentrations above the SMCL of 300 μg/L were measured at 7 wells, a lead concentration above the California notification level (NL) of 15 μg/L at one well, and manganese concentrations above the SMCL of 50 μg/L were measured at 17 wells. Vanadium concentrations above the DLR of 3 μg/L were measured at 9 public-supply wells, and chromium(VI) concentrations above the DLR of 1 μg/L were measured at 48 public-supply wells. Microbial constituents were analyzed in 22 ground-water samples. Total coliform was detected in three wells, with counts ranging from 2 to 20 colonies per 100 mL. MCLs for microbial constituents are based on recurring detection and will be monitored during future sampling.
Sampling of SAR Imagery for Wind Resource Assessment
NASA Astrophysics Data System (ADS)
Badger, Merete; Badger, Jake; Hasager, Charlotte; Nielsen, Morten
2010-04-01
Wind resources over the sea can be assessed from a series of wind fields retrieved from Envisat ASAR imagery, or other SAR data. Previous wind resource maps have been produced through random sampling of 70 or more satellite scenes over a given area of interest, followed by fitting of a Weibull function to the data. Here we introduce a more advanced sampling strategy based on the wind class methodology that is normally applied in Risø DTU’s numerical modeling of wind resources. The aim is to obtain a more representative data set using fewer satellite SAR scenes. The new sampling strategy has been applied within a wind and solar resource assessment study for the United Arab Emirates (UAE) and also for wind resource mapping over a domain in the North Sea, as part of the EU NORSEWInD project (2008-2012).
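The Weibull-fitting step mentioned above can be sketched numerically. This is an illustrative, numpy-only sketch on synthetic wind speeds using a method-of-moments fit; it is not the Risø DTU wind-class method or the actual SAR retrieval pipeline, and the shape/scale values are assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for ~70 SAR-retrieved wind speeds
# (assumed true shape k = 2.0, scale A = 8.0 m/s).
speeds = 8.0 * rng.weibull(2.0, size=70)

def fit_weibull_moments(x):
    """Method-of-moments Weibull fit: match the coefficient of variation."""
    cv_obs = x.std(ddof=1) / x.mean()
    # Theoretical CV of a Weibull(k, A); it is a decreasing function of k.
    cv = lambda k: math.sqrt(math.gamma(1 + 2 / k) / math.gamma(1 + 1 / k) ** 2 - 1)
    lo, hi = 0.2, 20.0
    for _ in range(60):                 # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if cv(mid) > cv_obs:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    A = x.mean() / math.gamma(1 + 1 / k)  # scale recovered from the mean
    return k, A

k, A = fit_weibull_moments(speeds)
```

With only 70 scenes the fitted parameters scatter noticeably around the truth, which is exactly why a more representative sampling strategy matters.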
Díaz-Zabala, Héctor J; Nieves-Colón, María A; Martínez-Cruzado, Juan C
2017-04-01
Maternal lineages of West Eurasian and North African origin account for 11.5% of total mitochondrial ancestry in Puerto Rico. Historical sources suggest that this ancestry arrived mostly from European migrations that took place during the four centuries of the Spanish colonization of Puerto Rico. This study analyzed 101 mitochondrial control region sequences and diagnostic coding region variants from a sample set randomly and systematically selected using a census-based sampling frame to be representative of the Puerto Rican population, with the goal of defining West Eurasian-North African maternal clades and estimating their possible geographical origin. Median-joining haplotype networks were constructed using hypervariable regions 1 and 2 sequences from various reference populations in search of shared haplotypes. A posterior probability analysis was performed to estimate the percentage of possible origins across wide geographic regions for the entire sample set and for the most common haplogroups on the island. Principal component analyses were conducted to place the Puerto Rican mtDNA set within the variation present among all reference populations. Our study shows that up to 38% of West Eurasian and North African mitochondrial ancestry in Puerto Rico most likely migrated from the Canary Islands. However, most of those haplotypes had previously migrated to the Canary Islands from elsewhere, and there are substantial contributions from various populations across the circum-Mediterranean region and from West African populations related to the modern Wolof and Serer peoples from Senegal and the nomadic Fulani, whose range extends to Cameroon. In conclusion, the West Eurasian mitochondrial ancestry in Puerto Ricans is geographically diverse. However, haplotype diversity seems to be low, and frequencies have been shaped by population bottlenecks, migration waves, and random genetic drift.
Consequently, approximately 47% of mtDNAs of West Eurasian and North African ancestry in Puerto Rico probably arrived early in its colonial history.
Goesling, Brian; Colman, Silvie; Trenholm, Christopher; Terzian, Mary; Moore, Kristin
2014-05-01
This systematic review provides a comprehensive, updated assessment of programs with evidence of effectiveness in reducing teen pregnancy, sexually transmitted infections (STIs), or associated sexual risk behaviors. The review was conducted in four steps. First, multiple literature search strategies were used to identify relevant studies released from 1989 through January 2011. Second, identified studies were screened against prespecified eligibility criteria. Third, studies were assessed by teams of two trained reviewers for the quality and execution of their research designs. Fourth, for studies that passed the quality assessment, the review team extracted and analyzed information on the research design, study sample, evaluation setting, and program impacts. A total of 88 studies met the review criteria for study quality and were included in the data extraction and analysis. The studies examined a range of programs delivered in diverse settings. Most studies had mixed-gender and predominantly African-American research samples (70% and 51%, respectively). Randomized controlled trials accounted for the large majority (87%) of included studies. Most studies (76%) included multiple follow-ups, with sample sizes ranging from 62 to 5,244. Analysis of the study impact findings identified 31 programs with evidence of effectiveness. Research conducted since the late 1980s has identified more than two dozen teen pregnancy and STI prevention programs with evidence of effectiveness. Key strengths of this research are the large number of randomized controlled trials, the common use of multiple follow-up periods, and attention to a broad range of programs delivered in diverse settings. Two main gaps are a lack of replication studies and the need for more research on Latino youth and other high-risk populations. In addressing these gaps, researchers must overcome common limitations in study design, analysis, and reporting that have negatively affected prior research.
Copyright © 2014 Society for Adolescent Health and Medicine. All rights reserved.
Honest Importance Sampling with Multiple Markov Chains
Tan, Aixin; Doss, Hani; Hobert, James P.
2017-01-01
Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. 
The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection. PMID:28701855
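The classical iid importance sampling estimator and its routine CLT standard error, as described at the start of this abstract, can be sketched as follows. The target and proposal densities here are assumed toy choices (standard normal target, wider normal proposal); the paper's regenerative multiple-chain construction is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target pi = N(0, 1); proposal pi1 = N(0, 2^2).  The proposal has heavier
# tails than the target, so the importance weights have finite variance.
def log_norm_pdf(x, sd):
    return -0.5 * (x / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))

n = 200_000
x = rng.normal(0.0, 2.0, size=n)                          # iid draws from pi1
w = np.exp(log_norm_pdf(x, 1.0) - log_norm_pdf(x, 2.0))   # weights pi/pi1
g = x**2 * w                                              # f(x) * weight, f(x) = x^2

est = g.mean()                      # IS estimate of E_pi[X^2] (true value: 1)
se = g.std(ddof=1) / np.sqrt(n)     # routine CLT-based standard error
```

In the iid setting both the estimate and its standard error are this easy; the abstract's point is that replacing the iid draws with MCMC output breaks the simple variance estimate, which regeneration then repairs.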
NASA Astrophysics Data System (ADS)
Zhang, Q.; Ball, W. P.
2016-12-01
Regression-based approaches are often employed to estimate riverine constituent concentrations and fluxes based on typically sparse concentration observations. One such approach is the WRTDS ("Weighted Regressions on Time, Discharge, and Season") method, which has been shown to provide more accurate estimates than prior approaches. Centered on WRTDS, this work was aimed at developing improved models for constituent concentration and flux estimation by accounting for antecedent discharge conditions. Twelve modified models were developed and tested, each of which contains one additional variable to represent antecedent conditions. High-resolution (daily) data at nine monitoring sites were used to evaluate the relative merits of the models for estimation of six constituents - chloride (Cl), nitrate-plus-nitrite (NOx), total Kjeldahl nitrogen (TKN), total phosphorus (TP), soluble reactive phosphorus (SRP), and suspended sediment (SS). For each site-constituent combination, 30 concentration subsets were generated from the original data through Monte Carlo sub-sampling and then used to evaluate model performance. For the sub-sampling, three sampling strategies were adopted: (A) 1 random sample each month (12/year), (B) 12 random monthly samples plus an additional 8 random samples per year (20/year), and (C) 12 regular (non-storm) and 8 storm samples per year (20/year). The modified models show general improvement over the original model under all three sampling strategies. Major improvements were achieved for NOx by the long-term flow-anomaly model and for Cl by the ADF (average discounted flow) model and the short-term flow-anomaly model. Moderate improvements were achieved for SS, TP, and TKN by the ADF model. By contrast, no improvement was achieved for SRP by any of the proposed models.
In terms of sampling strategy, performance of all models was generally best using strategy C and worst using strategy A, and especially so for SS, TP, and SRP, confirming the value of routinely collecting storm-flow samples. Overall, this work provides a comprehensive set of statistical evidence for supporting the incorporation of antecedent discharge conditions into WRTDS for constituent concentration and flux estimation, thereby combining the advantages of two recent developments in water quality modeling.
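Sub-sampling strategies A and B described above can be sketched as follows. The simplified 28-day calendar and lognormal "concentrations" are assumptions for illustration only, and strategy C is omitted because it requires storm-flow flags not modeled here.

```python
import numpy as np

rng = np.random.default_rng(1)

# A daily record: 10 years x 12 months x 28 days (simplified calendar),
# keyed by (year, month, day); values stand in for daily concentrations.
years, months, days = 10, 12, 28
record = {(y, m, d): rng.lognormal() for y in range(years)
          for m in range(1, months + 1) for d in range(1, days + 1)}

def strategy_A(rng):
    """(A) 1 random sample each month, i.e. 12 samples/year."""
    return [(y, m, int(rng.integers(1, days + 1))) for y in range(years)
            for m in range(1, months + 1)]

def strategy_B(rng):
    """(B) 12 monthly samples plus 8 extra random days/year (20/year)."""
    picks = strategy_A(rng)
    all_days = list(record)
    for y in range(years):
        year_days = [k for k in all_days if k[0] == y]
        extra = rng.choice(len(year_days), size=8, replace=False)
        picks += [year_days[i] for i in extra]
    return picks

subset_A = strategy_A(rng)          # 120 dates over 10 years
subset_B = strategy_B(rng)          # 200 dates over 10 years
mean_A = np.mean([record[k] for k in subset_A])
```

Repeating such draws (the study used 30 per site-constituent pair) yields the Monte Carlo replicates on which model performance was compared.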
Discrepancy-based error estimates for Quasi-Monte Carlo III. Error distributions and central limits
NASA Astrophysics Data System (ADS)
Hoogland, Jiri; Kleiss, Ronald
1997-04-01
In Quasi-Monte Carlo integration, the integration error is believed to be generally smaller than in classical Monte Carlo with the same number of integration points. Using an appropriate definition of an ensemble of quasi-random point sets, we derive various results on the probability distribution of the integration error, which can be compared to the standard Central Limit Theorem for normal stochastic sampling. In many cases, a Gaussian error distribution is obtained.
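The contrast between classical Monte Carlo and quasi-Monte Carlo integration error can be illustrated with a base-2 van der Corput sequence. This is a standard low-discrepancy construction chosen for illustration, not the specific point-set ensembles analyzed in the paper.

```python
import numpy as np

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput low-discrepancy sequence."""
    seq = np.empty(n)
    for i in range(n):
        q, denom, x = i, 1.0, 0.0
        while q:
            q, r = divmod(q, base)
            denom *= base
            x += r / denom
        seq[i] = x
    return seq

f = lambda x: x * x            # integral of f over [0, 1] is exactly 1/3
n = 4096
rng = np.random.default_rng(7)

mc_err = abs(f(rng.random(n)).mean() - 1 / 3)       # classical Monte Carlo
qmc_err = abs(f(van_der_corput(n)).mean() - 1 / 3)  # quasi-Monte Carlo
```

For smooth integrands the quasi-random error shrinks roughly like log(n)/n rather than the 1/sqrt(n) of normal stochastic sampling, so the quasi-Monte Carlo error here is typically far smaller than the Monte Carlo one at the same n.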
Use of LANDSAT imagery for wildlife habitat mapping in northeast and east central Alaska
NASA Technical Reports Server (NTRS)
Lent, P. C. (Principal Investigator)
1975-01-01
The author has identified the following significant results. Two scenes were analyzed by applying an iterative cluster analysis to a 2% random data sample and then using the resulting clusters as a training set basis for maximum likelihood classification. Twenty-six and twenty-seven categorical classes, respectively, resulted from this process. The majority of classes in each case were quite specific vegetation types; each of these types has specific value as moose habitat.
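The sample-then-classify workflow described above can be sketched as follows. The synthetic 4-band pixels are assumed for illustration, and the nearest-centroid classifier is a simplified stand-in for the study's maximum-likelihood classification (which would also use per-class covariances).

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "scene": 10,000 pixels x 4 spectral bands, three loose clusters.
pixels = rng.normal(size=(10_000, 4)) + rng.integers(0, 3, size=(10_000, 1)) * 2.0

# Step 1: draw a 2% random data sample for training.
idx = rng.choice(len(pixels), size=len(pixels) // 50, replace=False)
train = pixels[idx]

# Step 2: iterative cluster analysis (plain k-means) on the sample.
k = 3
centroids = train[rng.choice(len(train), size=k, replace=False)]
for _ in range(20):
    labels = np.argmin(((train[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([train[labels == j].mean(axis=0)
                          if np.any(labels == j) else centroids[j]
                          for j in range(k)])

# Step 3: classify every pixel against the sample-derived clusters
# (nearest centroid here, standing in for maximum-likelihood classification).
classes = np.argmin(((pixels[:, None, :] - centroids) ** 2).sum(-1), axis=1)
```

The point of the design is that the expensive iterative clustering runs only on the small random sample, while the cheap classification pass covers the full scene.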
Schwartz, Seth J; Benet-Martínez, Verónica; Knight, George P; Unger, Jennifer B; Zamboanga, Byron L; Des Rosiers, Sabrina E; Stephens, Dionne P; Huang, Shi; Szapocznik, José
2014-03-01
The present study used a randomized design, with fully bilingual Hispanic participants from the Miami area, to investigate 2 sets of research questions. First, we sought to ascertain the extent to which measures of acculturation (Hispanic and U.S. practices, values, and identifications) satisfied criteria for linguistic measurement equivalence. Second, we sought to examine whether cultural frame switching would emerge--that is, whether latent acculturation mean scores for U.S. acculturation would be higher among participants randomized to complete measures in English and whether latent acculturation mean scores for Hispanic acculturation would be higher among participants randomized to complete measures in Spanish. A sample of 722 Hispanic students from a Hispanic-serving university participated in the study. Participants were first asked to complete translation tasks to verify that they were fully bilingual. Based on ratings from 2 independent coders, 574 participants (79.5% of the sample) qualified as fully bilingual and were randomized to complete the acculturation measures in either English or Spanish. Theoretically relevant criterion measures--self-esteem, depressive symptoms, and personal identity--were also administered in the randomized language. Measurement equivalence analyses indicated that all of the acculturation measures--Hispanic and U.S. practices, values, and identifications--met criteria for configural, weak/metric, strong/scalar, and convergent validity equivalence. These findings indicate that data generated using acculturation measures can, at least under some conditions, be combined or compared across languages of administration. Few latent mean differences emerged. These results are discussed in terms of the measurement of acculturation in linguistically diverse populations.
The topomer-sampling model of protein folding
Debe, Derek A.; Carlson, Matt J.; Goddard, William A.
1999-01-01
Clearly, a protein cannot sample all of its conformations (e.g., ≈3^100 ≈ 10^48 for a 100-residue protein) on an in vivo folding timescale (<1 s). To investigate how the conformational dynamics of a protein can accommodate subsecond folding time scales, we introduce the concept of the native topomer, which is the set of all structures similar to the native structure (obtainable from the native structure through local backbone coordinate transformations that do not disrupt the covalent bonding of the peptide backbone). We have developed a computational procedure for estimating the number of distinct topomers required to span all conformations (compact and semicompact) for a polypeptide of a given length. For 100 residues, we find ≈3 × 10^7 distinct topomers. Based on the distance calculated between different topomers, we estimate that a 100-residue polypeptide diffusively samples one topomer every ≈3 ns. Hence, a 100-residue protein can find its native topomer by random sampling in just ≈100 ms. These results suggest that subsecond folding of modest-sized, single-domain proteins can be accomplished by a two-stage process of (i) topomer diffusion: random, diffusive sampling of the 3 × 10^7 distinct topomers to find the native topomer (≈0.1 s), followed by (ii) intratopomer ordering: nonrandom, local conformational rearrangements within the native topomer to settle into the precise native state. PMID:10077555
Yu, Wenxi; Liu, Yang; Ma, Zongwei; Bi, Jun
2017-08-01
Using satellite-based aerosol optical depth (AOD) measurements and statistical models to estimate ground-level PM2.5 is a promising way to fill the areas that are not covered by ground PM2.5 monitors. The statistical models used in previous studies are primarily Linear Mixed Effects (LME) and Geographically Weighted Regression (GWR) models. In this study, we developed a new regression model between PM2.5 and AOD using Gaussian processes in a Bayesian hierarchical setting. Gaussian processes model the stochastic nature of the spatial random effects, where the mean surface and the covariance function are specified. The spatial stochastic process is incorporated under the Bayesian hierarchical framework to explain the variation of PM2.5 concentrations together with other factors, such as AOD, spatial and non-spatial random effects. We evaluate the results of our model and compare them with those of other, conventional statistical models (GWR and LME) by within-sample model fitting and out-of-sample validation (cross validation, CV). The results show that our model possesses a CV result (R² = 0.81) that reflects higher accuracy than that of GWR and LME (0.74 and 0.48, respectively). Our results indicate that Gaussian process models have the potential to improve the accuracy of satellite-based PM2.5 estimates.
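The Gaussian-process ingredient of such a model can be sketched in a minimal 1-D form. The toy AOD-to-PM2.5 relationship, the squared-exponential kernel, and all hyperparameters below are assumptions for illustration; the paper's full Bayesian hierarchical setting (spatial random effects, covariates) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-in: "PM2.5" as a noisy 1-D function of "AOD".
x_train = np.sort(rng.uniform(0, 1, 30))
f_true = lambda x: 40 * x + 5 * np.sin(8 * x)
y_train = f_true(x_train) + rng.normal(0, 2, 30)

def rbf(a, b, ell=0.2, sf=10.0):
    """Squared-exponential (RBF) covariance function."""
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

noise = 2.0
K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))

x_test = np.linspace(0, 1, 50)
Ks = rbf(x_test, x_train)

# GP posterior mean (centered on the data mean) and pointwise variance.
y0 = y_train.mean()
alpha = np.linalg.solve(K, y_train - y0)
mean = Ks @ alpha + y0
var = np.diag(rbf(x_test, x_test) - Ks @ np.linalg.solve(K, Ks.T))
```

The pointwise variance is what distinguishes the GP from plain regression fits: it grows away from the training AOD values, giving honest uncertainty where monitors are absent.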
Microfracture spacing distributions and the evolution of fracture patterns in sandstones
NASA Astrophysics Data System (ADS)
Hooker, J. N.; Laubach, S. E.; Marrett, R.
2018-03-01
Natural fracture patterns in sandstone were sampled using scanning electron microscope-based cathodoluminescence (SEM-CL) imaging. All fractures are opening-mode and are fully or partially sealed by quartz cement. Most sampled fractures are too small to be height-restricted by sedimentary layers. At very low strains (<∼0.001), fracture spatial distributions are indistinguishable from random, whereas at higher strains, fractures are generally statistically clustered. All 12 large (N > 100) datasets show spacings that are best fit by log-normal distributions, compared to exponential, power law, or normal distributions. The clustering of fractures suggests that the locations of natural fractures are not determined by a random process. To investigate natural fracture localization, we reconstructed the opening history of a cluster of fractures within the Huizachal Group in northeastern Mexico, using fluid inclusions from synkinematic cements and thermal-history constraints. The largest fracture, which is the only fracture in the cluster visible to the naked eye, among 101 present, opened relatively late in the sequence. This result suggests that the growth of sets of fractures is a self-organized process, in which small, initially isolated fractures grow and progressively interact, with preferential growth of a subset of fractures developing at the expense of growth of the rest. Size-dependent sealing of fractures within sets suggests that synkinematic cementation may contribute to fracture clustering.
Data splitting for artificial neural networks using SOM-based stratified sampling.
May, R J; Maier, H R; Dandy, G C
2010-03-01
Data splitting is an important consideration during artificial neural network (ANN) development where hold-out cross-validation is commonly employed to ensure generalization. Even for a moderate sample size, the sampling methodology used for data splitting can have a significant effect on the quality of the subsets used for training, testing and validating an ANN. Poor data splitting can result in inaccurate and highly variable model performance; however, the choice of sampling methodology is rarely given due consideration by ANN modellers. Increased confidence in the sampling is of paramount importance, since the hold-out sampling is generally performed only once during ANN development. This paper considers the variability in the quality of subsets that are obtained using different data splitting approaches. A novel approach to stratified sampling, based on Neyman sampling of the self-organizing map (SOM), is developed, with several guidelines identified for setting the SOM size and sample allocation in order to minimize the bias and variance in the datasets. Using an example ANN function approximation task, the SOM-based approach is evaluated in comparison to random sampling, DUPLEX, systematic stratified sampling, and trial-and-error sampling to minimize the statistical differences between data sets. Of these approaches, DUPLEX is found to provide benchmark performance with good model performance, with no variability. The results show that the SOM-based approach also reliably generates high-quality samples and can therefore be used with greater confidence than other approaches, especially in the case of non-uniform datasets, with the benefit of scalability to perform data splitting on large datasets. Copyright 2009 Elsevier Ltd. All rights reserved.
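A stratified hold-out split with Neyman allocation, of the general kind discussed above, can be sketched as follows. Quantile bins on the response stand in for the paper's SOM clusters, and the dataset is synthetic; this is an illustrative sketch, not the SOM-based procedure itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dataset: inputs X and a skewed response y to be split for ANN training.
X = rng.normal(size=(1000, 3))
y = np.exp(X[:, 0]) + 0.1 * rng.normal(size=1000)

# Stratify on the response: quantile bins stand in for SOM clusters.
n_strata, n_sample = 5, 200
edges = np.quantile(y, np.linspace(0, 1, n_strata + 1))
strata = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_strata - 1)

# Neyman allocation: stratum sample size proportional to N_h * sd_h, so the
# high-variance tail of the response is sampled more heavily.
sizes = np.array([(strata == h).sum() for h in range(n_strata)])
sds = np.array([y[strata == h].std() for h in range(n_strata)])
alloc = np.maximum(1, np.round(n_sample * sizes * sds / (sizes * sds).sum()).astype(int))

test_idx = np.concatenate([
    rng.choice(np.where(strata == h)[0], size=min(alloc[h], sizes[h]), replace=False)
    for h in range(n_strata)
])
```

Compared with a single uniform random split, this keeps rare high-response cases represented in the hold-out set, which is the bias/variance benefit the paper quantifies.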
A compressed sensing X-ray camera with a multilayer architecture
Wang, Zhehui; Laroshenko, O.; Li, S.; ...
2018-01-25
Recent advances in compressed sensing theory and algorithms offer new possibilities for high-speed X-ray camera design. In many CMOS cameras, each pixel has an independent on-board circuit that includes an amplifier, noise rejection, signal shaper, an analog-to-digital converter (ADC), and optional in-pixel storage. When X-ray images are sparse, i.e., when one of the following cases is true: (a.) The number of pixels with true X-ray hits is much smaller than the total number of pixels; (b.) The X-ray information is redundant; or (c.) Some prior knowledge about the X-ray images exists, sparse sampling may be allowed. In this work, we first illustrate the feasibility of random on-board pixel sampling (ROPS) using an existing set of X-ray images, followed by a discussion about signal to noise as a function of pixel size. Next, we describe a possible circuit architecture to achieve random pixel access and in-pixel storage. The combination of a multilayer architecture, sparse on-chip sampling, and computational image techniques, is expected to facilitate the development and applications of high-speed X-ray camera technology.
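The sparse random-readout idea (ROPS) can be illustrated at the level of pixel addressing. The frame size, hit statistics, and 10% sampling ratio below are assumptions for illustration, and no compressed-sensing reconstruction is attempted; this only shows the readout-reduction arithmetic for a sparse frame.

```python
import numpy as np

rng = np.random.default_rng(11)

# Sparse synthetic "X-ray frame": 256x256 pixels with ~50 photon hits,
# matching case (a.) above, where hits << total pixels.
frame = np.zeros((256, 256))
hits = rng.choice(256 * 256, size=50, replace=False)
frame.flat[hits] = rng.poisson(100, size=50)

# ROPS-style readout: address a random 10% subset of pixels per frame
# instead of converting every pixel through its ADC.
n_read = frame.size // 10
addr = rng.choice(frame.size, size=n_read, replace=False)
measured = frame.flat[addr]               # values actually read out

compression = frame.size / n_read         # ~10x fewer ADC conversions
hits_seen = int((measured > 0).sum())     # hits captured by this random mask
```

Any single random mask catches only a fraction of the hits; the compressed-sensing step (omitted here) is what recovers the sparse image from many such undersampled reads.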
Seven common mistakes in population genetics and how to avoid them.
Meirmans, Patrick G
2015-07-01
As the data resulting from modern genotyping tools are astoundingly complex, genotyping studies require great care in the sampling design, genotyping, data analysis and interpretation. Such care is necessary because, with data sets containing thousands of loci, small biases can easily become strongly significant patterns. Such biases may already be present in routine tasks common to almost every genotyping study. Here, I discuss seven common mistakes that can be frequently encountered in the genotyping literature: (i) giving more attention to genotyping than to sampling, (ii) failing to perform or report experimental randomization in the laboratory, (iii) equating geopolitical borders with biological borders, (iv) testing significance of clustering output, (v) misinterpreting Mantel's r statistic, (vi) only interpreting a single value of k and (vii) forgetting that only a small portion of the genome will be associated with climate. For each of these issues, I give suggestions on how to avoid the mistake. Overall, I argue that genotyping studies would benefit from establishing a more rigorous experimental design, involving proper sampling design, randomization and better distinction of a priori hypotheses and exploratory analyses. © 2015 John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Glazner, Allen F.; Sadler, Peter M.
2016-12-01
The duration of a geologic interval, such as the time over which a given volume of magma accumulated to form a pluton, or the lifespan of a large igneous province, is commonly determined from a relatively small number of geochronologic determinations (e.g., 4-10) within that interval. Such sample sets can underestimate the true length of the interval by a significant amount. For example, the average interval determined from a sample of size n = 5, drawn from a uniform random distribution, will underestimate the true interval by 50%. Even for n = 10, the average sample only captures ~80% of the interval. If the underlying distribution is known then a correction factor can be determined from theory or Monte Carlo analysis; for a uniform random distribution, this factor is
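The sampled-range effect described above is easy to check by simulation. For dates drawn uniformly from an interval, the expected captured fraction of the range statistic is (n - 1)/(n + 1), consistent with the ~80% figure quoted for n = 10; the sketch below is a generic Monte Carlo check, not the paper's own analysis.

```python
import numpy as np

rng = np.random.default_rng(4)

# Fraction of a unit interval captured by the range (max - min) of n
# dates drawn uniformly at random from it.
def mean_captured_fraction(n, trials=100_000):
    x = rng.random((trials, n))
    return (x.max(axis=1) - x.min(axis=1)).mean()

n = 10
sim = mean_captured_fraction(n)     # simulated mean captured fraction
theory = (n - 1) / (n + 1)          # ~0.818 for n = 10
correction = (n + 1) / (n - 1)      # factor to inflate the observed range
```

Multiplying an observed age range by (n + 1)/(n - 1) is the uniform-distribution correction the abstract alludes to.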
Lohiya, Ayush; Kapil, Arti; Gupta, Sanjeev Kumar; Misra, Puneet; Rai, Sanjay K.
2015-01-01
Background Despite world-wide evidence of increased antibiotic resistance, there are scarce data on antibiotic resistance in community settings. One reason is the difficulty of collecting biological specimens (traditionally stool) in the community from apparently healthy individuals. Hence, finding an alternative specimen that is easier to obtain in a community setting or in large-scale surveys is crucial. We conducted this study to explore the feasibility of using urine samples for deriving community-based estimates of antibiotic resistance and to estimate the magnitude of resistance among urinary isolates of Escherichia coli and Klebsiella pneumoniae against multiple antibiotics in apparently healthy individuals residing in a rural community of Haryana, North India. Materials and Methods Eligible individuals were apparently healthy and aged 18 years or older. Using the health management information system (HMIS) of the Ballabgarh Health Demographic Surveillance System (HDSS), a sampling frame was prepared. Potential individuals were identified using simple random sampling. A random urine sample was collected in a sterile container and transported to the laboratory under ambient conditions. Species identification and antibiotic susceptibility testing for Enterobacteriaceae were done using Clinical and Laboratory Standards Institute (CLSI) 2012 guidelines. Multi-drug resistant (MDR) Enterobacteriaceae, extended-spectrum beta-lactamase (ESBL) producing Enterobacteriaceae, and carbapenem-resistant Enterobacteriaceae (CRE) were identified from the urine samples. Results A total of 433 individuals participated in the study (non-response rate – 13.4%), of whom 58 (13.4%) were positive for Enterobacteriaceae, 8.1% for E. coli and 5.3% for K. pneumoniae. Resistance against penicillins (amoxicillin/ampicillin) for E. coli and K. pneumoniae was 62.8% and 100.0%, respectively. Isolates resistant to co-trimoxazole were 5.7% and 0.0%, respectively.
None of the isolates were resistant to imipenem, and meropenem. Conclusion and recommendations It is feasible to use urine sample to study magnitude of antibiotic resistance in population based surveys. At community level, resistance to amoxicillin was considerable, negligible for co-trimoxazole, and to higher antibiotics including carbapenems. PMID:26393150
Network Sampling and Classification: An Investigation of Network Model Representations
Airoldi, Edoardo M.; Bai, Xue; Carley, Kathleen M.
2011-01-01
Methods for generating a random sample of networks with desired properties are important tools for the analysis of social, biological, and information networks. Algorithm-based approaches to sampling networks have received a great deal of attention in recent literature. Most of these algorithms are based on simple intuitions that associate the full features of connectivity patterns with specific values of only one or two network metrics. Substantive conclusions are crucially dependent on this association holding true. However, the extent to which this simple intuition holds true is not yet known. In this paper, we examine the association between the connectivity patterns that a network sampling algorithm aims to generate and the connectivity patterns of the generated networks, measured by an existing set of popular network metrics. We find that different network sampling algorithms can yield networks with similar connectivity patterns. We also find that alternative algorithms for the same connectivity pattern can yield networks with different connectivity patterns. We argue that conclusions based on simulated network studies must focus on the full features of the connectivity patterns of a network instead of on a limited set of network metrics for a specific network type. This fact has important implications for network data analysis: for instance, implications related to the way significance is currently assessed. PMID:21666773
Design of Phase II Non-inferiority Trials.
Jung, Sin-Ho
2017-09-01
With the development of inexpensive treatment regimens and less invasive surgical procedures, we are increasingly confronted with non-inferiority study objectives. A non-inferiority phase III trial requires a roughly four times larger sample size than a comparable superiority trial. Because of the large required sample size, we often face feasibility issues in opening a non-inferiority trial. Furthermore, due to the lack of phase II non-inferiority trial design methods, we have no opportunity to investigate the efficacy of the experimental therapy through a phase II trial. As a result, we often fail to open a non-inferiority phase III trial, and a large number of non-inferiority clinical questions remain unanswered. In this paper, we develop designs for non-inferiority randomized phase II trials with feasible sample sizes. First, we review a design method for non-inferiority phase III trials. Subsequently, we propose three different designs for non-inferiority phase II trials that can be used under different settings. Each method is demonstrated with examples, and each of the proposed designs is shown to require a reasonable sample size for a non-inferiority phase II trial. The three designs are intended for different settings but require similar sample sizes, typical for phase II trials.
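The roughly fourfold sample-size inflation can be seen from the standard normal-approximation formula for comparing two proportions, since the required n scales with the inverse square of the margin. The sketch below is a generic textbook calculation, not the design method proposed in the paper; the response rate and margins are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p, margin, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-arm trial with a
    binary endpoint (normal approximation, equal true rates p):
    n = 2 p (1 - p) (z_alpha + z_beta)^2 / margin^2."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # one-sided alpha
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * p * (1 - p) * (z_a + z_b) ** 2 / margin ** 2)

# a superiority trial powered for a 15-point difference vs. a
# non-inferiority trial with a 7.5-point margin (hypothetical values)
n_sup = n_per_arm(0.5, 0.15)
n_ni = n_per_arm(0.5, 0.075)
# halving the margin roughly quadruples the required sample size
```

Because the margin enters squared in the denominator, a non-inferiority margin half the size of a plausible superiority effect leads to roughly four times the sample size, matching the abstract's observation.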
Zhang, Haixia; Zhao, Junkang; Gu, Caijiao; Cui, Yan; Rong, Huiying; Meng, Fanlong; Wang, Tong
2015-05-01
A study of medical expenditure and its influencing factors among students enrolled in the Urban Resident Basic Medical Insurance (URBMI) scheme in Taiyuan indicated that non-response bias and selection bias coexist in the dependent variable of the survey data. Unlike previous studies that focused on only one missing-data mechanism, this study proposes a two-stage method that handles both mechanisms simultaneously by combining multiple imputation with a sample selection model. A total of 1 190 questionnaires were returned by students (or their parents) selected from child care settings, schools and universities in Taiyuan by stratified cluster random sampling in 2012. Among the returned questionnaires, 2.52% of the dependent-variable values were not missing at random (NMAR) and 7.14% were missing at random (MAR). First, multiple imputation of the MAR values was conducted using the completed data; a sample selection model was then used to correct for NMAR in the imputed data, and a multi-factor analysis model was established. Based on 1 000 resamplings, the predictive mean matching (PMM) method was the best scheme for imputing the randomly missing values at the observed missing proportion. With this optimal scheme, the two-stage analysis was conducted. The influencing factors on annual medical expenditure among students enrolled in URBMI in Taiyuan were found to include population group, annual household gross income, affordability of medical insurance expenditure, chronic disease, seeking medical care in hospital, seeking medical care in a community health center or private clinic, hospitalization, hospitalization canceled for some reason, self-medication, and acceptable proportion of self-paid medical expenditure. The two-stage method combining multiple imputation with a sample selection model can effectively address non-response bias and selection bias in the dependent variable of survey data.
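As an illustration of the imputation stage, here is a minimal single-imputation predictive mean matching (PMM) sketch; the actual study used multiple imputation combined with a sample selection model, and the data and helper names below are hypothetical.

```python
import random

def pmm_impute(x, y, donors=5, seed=0):
    """Single-imputation predictive mean matching (PMM) sketch: fit a
    least-squares line on the complete cases, then replace each missing
    y with the observed y of a donor drawn at random from the `donors`
    complete cases with the closest predicted values."""
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = sum(xi for xi, _ in obs) / len(obs)
    my = sum(yi for _, yi in obs) / len(obs)
    b = (sum((xi - mx) * (yi - my) for xi, yi in obs)
         / sum((xi - mx) ** 2 for xi, _ in obs))
    a = my - b * mx
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            pred = a + b * xi
            pool = sorted(obs, key=lambda o: abs(a + b * o[0] - pred))[:donors]
            yi = rng.choice(pool)[1]     # donate an observed value
        filled.append(yi)
    return filled

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, None, 8.1, 9.8, None, 14.2, 15.9]
completed = pmm_impute(x, y)
```

Because PMM donates observed values rather than model predictions, every imputed value is a real, plausible observation; multiple imputation would repeat this draw several times to propagate imputation uncertainty.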
Methodological reporting of randomized trials in five leading Chinese nursing journals.
Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu
2014-01-01
Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting to CONSORT and to explore associated trial-level variables in the Chinese nursing care field. In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured against the methods section of the CONSORT checklist, and the overall CONSORT methodological items score was calculated and expressed as a percentage. We also hypothesized that some general and methodological characteristics were associated with reporting quality, and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34 ± 0.97 (mean ± SD). No RCT reported descriptions of or changes in "trial design," changes in "outcomes" and "implementation," or descriptions of the similarity of interventions for "blinding." Poor reporting was found in detailing the "settings of participants" (13.1%), "type of randomization sequence generation" (1.8%), calculation methods of "sample size" (0.4%), explanation of any interim analyses and stopping guidelines for "sample size" (0.3%), "allocation concealment mechanism" (0.3%), additional analyses in "statistical methods" (2.1%), and targeted subjects and methods of "blinding" (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of "participants," "interventions," and definitions of the "outcomes" and "statistical methods." The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score.
The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods.
Methodology Series Module 5: Sampling Strategies.
Setia, Maninder Singh
2016-01-01
Once the research question and the research design have been finalised, it is important to select the appropriate sample for the study. The method by which the researcher selects the sample is the 'sampling method'. There are essentially two types of sampling methods: 1) probability sampling, based on chance events (such as random numbers or flipping a coin); and 2) non-probability sampling, based on the researcher's choice or on the population that is accessible and available. Some non-probability sampling methods are purposive sampling, convenience sampling, and quota sampling. Random sampling methods (such as a simple random sample or a stratified random sample) are forms of probability sampling. It is important to understand the different sampling methods used in clinical studies and to state the method clearly in the manuscript. The researcher should not misrepresent the sampling method in the manuscript (for example, by using the term 'random sample' when a convenience sample was actually used). The sampling method will depend on the research question. For instance, the researcher may want to understand an issue in greater detail for one particular population rather than worry about the 'generalizability' of the results. In such a scenario, the researcher may want to use 'purposive sampling' for the study.
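The two probability sampling forms mentioned (simple random and stratified random) can be sketched with Python's standard library; the population and stratum variable here are invented for illustration.

```python
import random

# hypothetical sampling frame of 1 000 subjects with a stratum label
population = [{"id": i, "clinic": "rural" if i % 4 == 0 else "urban"}
              for i in range(1000)]
rng = random.Random(42)

# probability sampling, form 1: simple random sample of 100 subjects,
# so every subject has the same chance of selection
srs = rng.sample(population, 100)

# form 2: stratified random sample, drawing proportionally from each
# stratum so both clinic types are represented in exact proportion
strata = {}
for p in population:
    strata.setdefault(p["clinic"], []).append(p)
stratified = []
for members in strata.values():
    share = round(100 * len(members) / len(population))
    stratified.extend(rng.sample(members, share))
```

The simple random sample leaves the rural/urban split to chance, while the stratified sample fixes it at the population proportion (25 rural, 75 urban here), which is exactly why stratification is preferred when subgroup representation matters.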
Melvin, Neal R; Poda, Daniel; Sutherland, Robert J
2007-10-01
When properly applied, stereology is a very robust and efficient method to quantify a variety of parameters from biological material. A common sampling strategy in stereology is systematic random sampling, which involves choosing a random start point outside the structure of interest and sampling relevant objects at sites placed at pre-determined, equidistant intervals. This has proven to be a very efficient sampling strategy and is used widely in stereological designs. At the microscopic level, this is most often achieved through the use of a motorized stage that facilitates systematic random stepping across the structure of interest. Here, we report a simple, precise and cost-effective software-based alternative for accomplishing systematic random sampling under the microscope. We believe that this approach will facilitate the use of stereological designs that employ systematic random sampling in laboratories that lack the resources to acquire costly, fully automated systems.
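A minimal version of the systematic random sampling rule described above (one random start, then fixed equidistant steps) might look like this; the 120-section example is hypothetical.

```python
import random

def systematic_sample(items, k, seed=None):
    """Systematic random sampling: pick one random start within the
    first interval, then take every (len(items) // k)-th item."""
    step = len(items) // k
    start = random.Random(seed).randrange(step)
    return items[start::step][:k]

sections = list(range(120))       # e.g. 120 serial tissue sections
sampled = systematic_sample(sections, 12, seed=1)
# exactly 12 sections, spaced a constant 10 apart
```

Only the start point is random; the equal spacing is what makes the design efficient for structures that are not randomly organized along the sampling axis.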
Feline mitochondrial DNA sampling for forensic analysis: when enough is enough!
Grahn, Robert A; Alhaddad, Hasan; Alves, Paulo C; Randi, Ettore; Waly, Nashwa E; Lyons, Leslie A
2015-05-01
Pet hair has demonstrated value in resolving legal issues. Cat hair is chronically shed, and it is difficult to leave a home with cats without some level of secondary transfer. The power of cat hair as an evidentiary resource may be underused because representative genetic databases are not available for exclusionary purposes. Mitochondrial control region databases are highly valuable for hair analyses and have been developed for the cat. In a representative worldwide data set, 83% of domestic cat mitotypes belong to one of twelve major types. Of the remaining 17%, 7.5% are unique within the published 1394-sample database. The current research evaluates the sample size necessary to establish a representative population for forensic comparison of the mitochondrial control region for the domestic cat. For most worldwide populations, randomly sampling 50 unrelated local individuals will achieve saturation at 95%. The 99% saturation is achieved by randomly sampling 60-170 cats, depending on the number of mitotypes available in the population at large. Likely due to the recent domestication of the cat and minimal localized population substructure, fewer cats are needed to reach practical saturation of a mitochondrial DNA control region database than for humans or dogs. Coupled with the available worldwide feline control region database of nearly 1400 cats, minimal local sampling will be required to establish an appropriate comparative representative database and achieve significant exclusionary power. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Zaharov, V V; Farahi, R H; Snyder, P J; Davison, B H; Passian, A
2014-11-21
Resolving weak spectral variations in the dynamic response of materials that are either dominated or excited by stochastic processes remains a challenge. Responses that are thermal in origin are particularly relevant examples due to the delocalized nature of heat. Despite its inherent properties in dealing with stochastic processes, the Karhunen-Loève expansion has not been fully exploited in measurement of systems that are driven solely by random forces or can exhibit large thermally driven random fluctuations. Here, we present experimental results and analysis of the archetypes (a) the resonant excitation and transient response of an atomic force microscope probe by the ambient random fluctuations and nanoscale photothermal sample response, and (b) the photothermally scattered photons in pump-probe spectroscopy. In each case, the dynamic process is represented as an infinite series with random coefficients to obtain pertinent frequency shifts and spectral peaks and demonstrate spectral enhancement for a set of compounds including the spectrally complex biomass. The considered cases find important applications in nanoscale material characterization, biosensing, and spectral identification of biological and chemical agents.
Deterministic matrices matching the compressed sensing phase transitions of Gaussian random matrices
Monajemi, Hatef; Jafarpour, Sina; Gavish, Matan; Donoho, David L.; Ambikasaran, Sivaram; Bacallado, Sergio; Bharadia, Dinesh; Chen, Yuxin; Choi, Young; Chowdhury, Mainak; Chowdhury, Soham; Damle, Anil; Fithian, Will; Goetz, Georges; Grosenick, Logan; Gross, Sam; Hills, Gage; Hornstein, Michael; Lakkam, Milinda; Lee, Jason; Li, Jian; Liu, Linxi; Sing-Long, Carlos; Marx, Mike; Mittal, Akshay; Monajemi, Hatef; No, Albert; Omrani, Reza; Pekelis, Leonid; Qin, Junjie; Raines, Kevin; Ryu, Ernest; Saxe, Andrew; Shi, Dai; Siilats, Keith; Strauss, David; Tang, Gary; Wang, Chaojun; Zhou, Zoey; Zhu, Zhen
2013-01-01
In compressed sensing, one takes n < N samples of an N-dimensional vector x0 using an n × N matrix A, obtaining undersampled measurements y = Ax0. For random matrices with independent standard Gaussian entries, it is known that, when x0 is k-sparse, there is a precisely determined phase transition: for a certain region in the (δ, ρ)-phase diagram, with δ = n/N and ρ = k/n, convex optimization typically finds the sparsest solution, whereas outside that region, it typically fails. It has been shown empirically that the same property, with the same phase transition location, holds for a wide range of non-Gaussian random matrix ensembles. We report extensive experiments showing that the Gaussian phase transition also describes numerous deterministic matrices, including Spikes and Sines, Spikes and Noiselets, Paley Frames, Delsarte-Goethals Frames, Chirp Sensing Matrices, and Grassmannian Frames. Namely, for each of these deterministic matrices in turn, for a typical k-sparse object, we observe that convex optimization is successful over a region of the phase diagram that coincides with the region known for Gaussian random matrices. Our experiments considered coefficients constrained to a set X, for four different sets X, and the results establish our finding for each of the four associated phase transitions. PMID:23277588
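As a toy illustration of sparse recovery from Gaussian undersampled measurements, the sketch below uses orthogonal matching pursuit as a simple greedy stand-in for the convex optimization studied in the paper; the dimensions and sensing setup are illustrative assumptions only.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily add the column most
    correlated with the residual, then re-fit on the support by
    least squares."""
    support = []
    residual = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
N, n, k = 64, 32, 2                  # ambient dim, measurements, sparsity
A = rng.standard_normal((n, N)) / np.sqrt(n)   # Gaussian sensing matrix
x0 = np.zeros(N)
x0[rng.choice(N, size=k, replace=False)] = 1.0 + rng.random(k)
y = A @ x0                           # undersampled measurements y = A x0
x_hat = omp(A, y, k)
# in this well-undersampled regime, recovery is typically exact
```

Here δ = n/N = 0.5 and ρ = k/n ≈ 0.06, well inside the success region of the phase diagram, which is why a simple greedy solver usually recovers x0.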
Efficacy of abstinence promotion media messages: findings from an online randomized trial.
Evans, W Douglas; Davis, Kevin C; Ashley, Olivia Silber; Blitstein, Jonathan; Koo, Helen; Zhang, Yun
2009-10-01
We conducted an online randomized experiment to evaluate the efficacy of messages from the Parents Speak Up National Campaign (PSUNC) to promote parent-child communication about sex. We randomly assigned a national sample of 1,969 mothers and fathers to treatment (PSUNC exposure) and control (no exposure) conditions. Mothers were further randomized into treatment and booster (additional messages) conditions to evaluate dose-response effects. Participants were surveyed at baseline, 4 weeks postexposure, and 6 months postexposure. We used multivariable logistic regression procedures in our analysis. Treatment fathers were more likely than control fathers to initiate conversations about sex at 4 weeks, and treatment fathers and mothers were more likely than controls at 6 months to recommend that their children wait to have sex. Treatment fathers and mothers were far more likely than controls to use the campaign Web site. There was a dose-response effect for mothers' Web site use. Using new media methods, this study shows that PSUNC messages are efficacious in promoting parent-child communication about sex and abstinence. Future research should evaluate mechanisms and effectiveness in natural settings.
Sampling maternal care behaviour in domestic dogs: What's the best approach?
Czerwinski, Veronika H; Smith, Bradley P; Hynd, Philip I; Hazel, Susan J
2017-07-01
Our understanding of the frequency and duration of maternal care behaviours in the domestic dog during the first two postnatal weeks is limited, largely due to inconsistencies in the sampling methodologies that have been employed. In order to develop a more concise picture of maternal care behaviour during this period, and to help establish the sampling method that best represents these behaviours, we compared a variety of time sampling methods. Six litters were continuously observed for a total of 96 h over postnatal days 3, 6, 9 and 12 (24 h per day). Frequent (dam presence, nursing duration, contact duration) and infrequent (anogenital licking duration and frequency) maternal behaviours were coded using five different time sampling methods: 12-h night (1800-0600 h), 12-h day (0600-1800 h), a one-hour period during the night (1800-0600 h), a one-hour period during the day (0600-1800 h), and a one-hour period at any time. Each one-hour time sampling method consisted of four randomly chosen 15-min periods. Two random sets of four 15-min periods were also analysed to ensure reliability. We then determined which of the time sampling methods, averaged over the three 24-h periods, best represented the frequency and duration of behaviours. As might be expected, frequently occurring behaviours were adequately represented by short (one-hour) sampling periods; however, this was not the case with the infrequent behaviour. Thus, we argue that the time sampling methodology employed must match the behaviour of interest. This caution applies to maternal behaviour in altricial species, such as canids, as well as to all systematic behavioural observations utilising time sampling methodology. Copyright © 2017. Published by Elsevier B.V.
Efficient sampling of complex network with modified random walk strategies
NASA Astrophysics Data System (ADS)
Xie, Yunya; Chang, Shuhua; Zhang, Zhipeng; Zhang, Mi; Yang, Lei
2018-02-01
We present two novel random walk strategies: choosing seed node (CSN) random walk and no-retracing (NR) random walk. Unlike classical random walk sampling, the CSN and NR strategies focus on the influence of the seed node choice and of path overlap, respectively. The three random walk samplings are applied to the Erdős-Rényi (ER), Barabási-Albert (BA), Watts-Strogatz (WS), and weighted USAir networks, and the major properties of the sampled subnets, such as sampling efficiency, degree distributions, average degree and average clustering coefficient, are studied. Similar conclusions can be reached with all three random walk strategies. First, networks with small scales and simple structures are conducive to sampling. Second, the average degree and the average clustering coefficient of the sampled subnet tend to the corresponding values of the original networks within a limited number of steps. Third, all the degree distributions of the subnets are slightly biased toward the high-degree side. However, the NR strategy performs better for the average clustering coefficient of the subnet. In the real weighted USAir networks, some notable characteristics, such as the larger clustering coefficient and the fluctuation of the degree distribution, are reproduced well by these random walk strategies.
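The no-retracing idea can be sketched in a few lines: the walker simply excludes the node it arrived from. This is an illustrative reimplementation on a toy ring network, not the authors' code.

```python
import random

def random_walk_sample(adj, steps, seed=0, no_retrace=False):
    """Sample a subnet by random walk. With no_retrace=True the
    walker never immediately steps back along the edge it just
    used (the NR idea); otherwise it is a classical random walk."""
    rng = random.Random(seed)
    node, prev = rng.choice(sorted(adj)), None
    visited = {node}
    for _ in range(steps):
        choices = [v for v in adj[node] if not (no_retrace and v == prev)]
        if not choices:                  # dead end: allow backtracking
            choices = list(adj[node])
        prev, node = node, rng.choice(choices)
        visited.add(node)
    return visited

# ring network: each of the 20 nodes links to its two neighbours
n = 20
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
plain = random_walk_sample(adj, 50)
nr = random_walk_sample(adj, 50, no_retrace=True)
# on a ring, the no-retracing walk keeps moving in one direction,
# so 50 steps are more than enough to visit every node
```

The ring makes the path-overlap effect extreme: the classical walk can oscillate over the same few edges, while the NR walk covers the whole network, which mirrors the sampling-efficiency gain the abstract describes.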
Randomized algorithms for high quality treatment planning in volumetric modulated arc therapy
NASA Astrophysics Data System (ADS)
Yang, Yu; Dong, Bin; Wen, Zaiwen
2017-02-01
In recent years, volumetric modulated arc therapy (VMAT) has become an increasingly important radiation technique widely used in clinical cancer treatment. One of the key problems in VMAT is treatment plan optimization, which is complicated by the constraints imposed by the equipment involved. In this paper, we consider a model with four major constraints: the bound on the beam intensity, an upper bound on the rate of change of the beam intensity, the moving speed of the leaves of the multi-leaf collimator (MLC), and its directional convexity. We solve the model by a two-stage algorithm: performing minimization with respect to the shapes of the aperture and the beam intensities alternately. Specifically, the shapes of the aperture are obtained by a greedy algorithm whose performance is enhanced by random sampling in the leaf pairs with a decremental rate. The beam intensity is optimized using a gradient projection method with non-monotonic line search. We further improve the proposed algorithm by an incremental random importance sampling of the voxels to reduce the computational cost of the energy functional. Numerical simulations on two clinical cancer data sets demonstrate that our method is highly competitive with the state-of-the-art algorithms in terms of both computational time and quality of treatment planning.
Effect of cryotherapy on arteriovenous fistula puncture-related pain in hemodialysis patients.
P B, Sabitha; Khakha, D C; Mahajan, S; Gupta, S; Agarwal, M; Yadav, S L
2008-10-01
Pain during arteriovenous fistula (AVF) cannulation remains a common problem in hemodialysis (HD) patients. This study was undertaken to assess the effect of cryotherapy on pain due to arteriovenous fistula puncture in hemodialysis patients. A convenience sample of 60 patients (30 each in experimental and control groups) who were undergoing hemodialysis using an AVF was assessed in a randomized controlled trial. Hemodialysis patients who met the inclusion criteria were randomly assigned to experimental and control groups using a randomization table. Objective and subjective pain scoring was done on two consecutive days of HD treatment (with cryotherapy for the experimental group and without cryotherapy for the control group). The tools used were a questionnaire examining demographic and clinical characteristics, an observation checklist for assessing objective pain behavior, and a numerical rating scale for subjective pain assessment. Descriptive statistics were used as deemed appropriate. Chi square, two-sample and paired t-tests, the Mann Whitney test, Wilcoxon's signed rank test, the Kruskal Wallis test, and Spearman's and Pearson's correlations were used for inferential statistics. The objective and subjective pain scores were significantly (P = 0.001) reduced within the experimental group with the application of cryotherapy. This study highlights the need for adopting alternative therapies such as cryotherapy for effective pain management in hospital settings.
Grelet, C; Bastin, C; Gelé, M; Davière, J-B; Johan, M; Werner, A; Reding, R; Fernandez Pierna, J A; Colinet, F G; Dardenne, P; Gengler, N; Soyeurt, H; Dehareng, F
2016-06-01
To manage negative energy balance and ketosis in dairy farms, rapid and cost-effective detection is needed. Among the milk biomarkers that could be useful for this purpose, acetone and β-hydroxybutyrate (BHB) have been proved as molecules of interest regarding ketosis and citrate was recently identified as an early indicator of negative energy balance. Because Fourier transform mid-infrared spectrometry can provide rapid and cost-effective predictions of milk composition, the objective of this study was to evaluate the ability of this technology to predict these biomarkers in milk. Milk samples were collected in commercial and experimental farms in Luxembourg, France, and Germany. Acetone, BHB, and citrate contents were determined by flow injection analysis. Milk mid-infrared spectra were recorded and standardized for all samples. After edits, a total of 548 samples were used in the calibration and validation data sets for acetone, 558 for BHB, and 506 for citrate. Acetone content ranged from 0.020 to 3.355 mmol/L with an average of 0.103 mmol/L; BHB content ranged from 0.045 to 1.596 mmol/L with an average of 0.215 mmol/L; and citrate content ranged from 3.88 to 16.12 mmol/L with an average of 9.04 mmol/L. Acetone and BHB contents were log-transformed and a part of the samples with low values was randomly excluded to approach a normal distribution. The 3 edited data sets were then randomly divided into a calibration data set (3/4 of the samples) and a validation data set (1/4 of the samples). Prediction equations were developed using partial least square regression. The coefficient of determination (R(2)) of cross-validation was 0.73 for acetone, 0.71 for BHB, and 0.90 for citrate with root mean square errors of 0.248, 0.109, and 0.70 mmol/L, respectively. Finally, the external validation was performed and R(2) obtained were 0.67 for acetone, 0.63 for BHB, and 0.86 for citrate, with respective root mean square errors of validation of 0.196, 0.083, and 0.76 mmol/L.
Although the practical usefulness of the equations developed should be further verified with other field data, results from this study demonstrated the potential of Fourier transform mid-infrared spectrometry to predict citrate content with good accuracy and to supply indicative contents of BHB and acetone in milk, thereby providing rapid and cost-effective tools to manage ketosis and negative energy balance in dairy farms. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
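The calibration/validation protocol described (random 3/4 vs 1/4 split, then external validation via R² and RMSE) can be sketched as follows; for brevity the sketch fits a one-variable least-squares line to synthetic data instead of a partial least squares model on real spectra, and all numbers are invented.

```python
import random
from statistics import mean

rng = random.Random(7)
# synthetic (predictor, citrate) pairs standing in for spectral data
xs = [rng.uniform(3, 16) for _ in range(200)]
data = [(x, 3.0 + 0.8 * x + rng.gauss(0, 0.3)) for x in xs]
rng.shuffle(data)

cut = 3 * len(data) // 4
calib, valid = data[:cut], data[cut:]   # 3/4 calibration, 1/4 validation

# least-squares line fitted on the calibration set only
mx = mean(x for x, _ in calib)
my = mean(y for _, y in calib)
b = (sum((x - mx) * (y - my) for x, y in calib)
     / sum((x - mx) ** 2 for x, _ in calib))
a = my - b * mx

# external validation on the held-out quarter
vy = mean(y for _, y in valid)
sse = sum((y - (a + b * x)) ** 2 for x, y in valid)
sst = sum((y - vy) ** 2 for _, y in valid)
r2 = 1 - sse / sst
rmse = (sse / len(valid)) ** 0.5
```

Evaluating R² and RMSE only on samples never seen during calibration is what makes the validation "external", the same safeguard the study applies to its partial least squares equations.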
An improved initialization center k-means clustering algorithm based on distance and density
NASA Astrophysics Data System (ADS)
Duan, Yanling; Liu, Qun; Xia, Shuyin
2018-04-01
Aiming at the problem that the random initial cluster centers of the k-means algorithm make the clustering results sensitive to outlier samples and unstable across repeated runs, a center initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average distance is used to represent sample density, and the samples with larger distance and higher density are selected as the initial cluster centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility and practicality of the algorithm; experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
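One way to realize a "larger distance, higher density" initialization, sketched here with invented scoring details (the paper's exact weighting is not reproduced): screen out low-density points, seed with the densest one, then add the remaining high-density point farthest from the chosen centers.

```python
import math

def density(points, i):
    """Inverse mean distance from point i to every other point:
    larger for points in crowded regions, small for outliers."""
    others = [p for j, p in enumerate(points) if j != i]
    return len(others) / sum(math.dist(points[i], p) for p in others)

def init_centers(points, k):
    """'Larger distance, higher density' initialization sketch:
    keep only points of above-average density (screening outliers),
    seed with the densest point, then repeatedly add the candidate
    farthest from the centers chosen so far."""
    dens = [density(points, i) for i in range(len(points))]
    avg = sum(dens) / len(dens)
    cand = [i for i in range(len(points)) if dens[i] >= avg]
    centers = [max(cand, key=lambda i: dens[i])]
    while len(centers) < k:
        rest = [i for i in cand if i not in centers]
        centers.append(max(rest, key=lambda i: min(
            math.dist(points[i], points[c]) for c in centers)))
    return [points[i] for i in centers]

# two tight clusters plus a distant outlier: the outlier's low
# density keeps it from ever becoming an initial center
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (40, 40)]
centers = init_centers(pts, 2)
```

Purely distance-based seeding (as in farthest-point heuristics) would pick the outlier first; the density screen is what stabilizes the result, which is the core idea of the abstract.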
Makowski, David; Bancal, Rémi; Bensadoun, Arnaud; Monod, Hervé; Messéan, Antoine
2017-09-01
According to E.U. regulations, the maximum allowable rate of adventitious transgene presence in non-genetically modified (GM) crops is 0.9%. We compared four sampling methods for the detection of transgenic material in agricultural non-GM maize fields: random sampling, stratified sampling, random sampling + ratio reweighting, random sampling + regression reweighting. Random sampling involves simply sampling maize grains from different locations selected at random from the field concerned. The stratified and reweighting sampling methods make use of an auxiliary variable corresponding to the output of a gene-flow model (a zero-inflated Poisson model) simulating cross-pollination as a function of wind speed, wind direction, and distance to the closest GM maize field. With the stratified sampling method, an auxiliary variable is used to define several strata with contrasting transgene presence rates, and grains are then sampled at random from each stratum. With the two methods involving reweighting, grains are first sampled at random from various locations within the field, and the observations are then reweighted according to the auxiliary variable. Data collected from three maize fields were used to compare the four sampling methods, and the results were used to determine the extent to which transgene presence rate estimation was improved by the use of stratified and reweighting sampling methods. We found that transgene rate estimates were more accurate and that substantially smaller samples could be used with sampling strategies based on an auxiliary variable derived from a gene-flow model. © 2017 Society for Risk Analysis.
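The advantage of exploiting an auxiliary variable can be illustrated with a toy field in which the transgene presence rate drops with distance from the GM field; the numbers and strata below are invented, and the stratified estimate simply recombines per-stratum sample means with the stratum weights.

```python
import random

rng = random.Random(3)
# toy field of 10 000 grains: cross-pollination is concentrated in the
# rows closest to the neighbouring GM field (the auxiliary variable)
field = []
for row in range(100):                  # row 0 borders the GM field
    rate = 0.05 if row < 10 else 0.001
    field.extend({"row": row, "gm": rng.random() < rate}
                 for _ in range(100))
true_rate = sum(p["gm"] for p in field) / len(field)

# method 1: simple random sampling of 400 grains
srs = rng.sample(field, 400)
est_srs = sum(p["gm"] for p in srs) / len(srs)

# method 2: stratified sampling on the auxiliary variable, drawing
# proportionally from the near and far strata and recombining the
# per-stratum means with the stratum weights
near = [p for p in field if p["row"] < 10]
far = [p for p in field if p["row"] >= 10]
s_near, s_far = rng.sample(near, 40), rng.sample(far, 360)
est_strat = (len(near) * sum(p["gm"] for p in s_near) / 40
             + len(far) * sum(p["gm"] for p in s_far) / 360) / len(field)
```

Because the strata separate the high-rate border zone from the low-rate interior, the stratified estimator has lower variance for the same sample size, matching the study's conclusion that gene-flow-model strata allow substantially smaller samples.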
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2018-04-01
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
Evaluating data mining algorithms using molecular dynamics trajectories.
Tatsis, Vasileios A; Tjortjis, Christos; Tzirakis, Panagiotis
2013-01-01
Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the μs time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets which resulted from computer simulations of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using classification errors to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.
2009-08-01
Bryant, R., Engel, C. C. (2004). A therapist-assisted internet self-help program for traumatic stress. Professional Psychology: Research and Practice, 35… Early Resilience Intervention for Combat-Related PTSD in Military Primary Healthcare Settings: A Randomized Trial of "DESTRESS-PC". Principal Investigator: Charles Engel.
Electromagnetic Scattering by Fully Ordered and Quasi-Random Rigid Particulate Samples
NASA Technical Reports Server (NTRS)
Mishchenko, Michael I.; Dlugach, Janna M.; Mackowski, Daniel W.
2016-01-01
In this paper we have analyzed circumstances under which a rigid particulate sample can behave optically as a true discrete random medium consisting of particles randomly moving relative to each other during measurement. To this end, we applied the numerically exact superposition T-matrix method to model far-field scattering characteristics of fully ordered and quasi-randomly arranged rigid multiparticle groups in fixed and random orientations. We have shown that, in and of itself, averaging optical observables over movements of a rigid sample as a whole is insufficient unless it is combined with a quasi-random arrangement of the constituent particles in the sample. Otherwise, certain scattering effects typical of discrete random media (including some manifestations of coherent backscattering) may not be accurately replicated.
Unsupervised Metric Fusion Over Multiview Data by Graph Random Walk-Based Cross-View Diffusion.
Wang, Yang; Zhang, Wenjie; Wu, Lin; Lin, Xuemin; Zhao, Xiang
2017-01-01
Learning an ideal metric is crucial to many tasks in computer vision. Diverse feature representations can address this problem from different aspects: visual data objects described by multiple features can be decomposed into multiple views, which often provide complementary information. In this paper, we propose a cross-view fusion algorithm that leads to a similarity metric for multiview data by systematically fusing multiple similarity measures. Unlike existing paradigms, we focus on learning a distance measure by exploiting the graph structure of data samples, where an input similarity matrix can be improved through propagation by graph random walk. In particular, we construct multiple graphs, with each one corresponding to an individual view, and present a cross-view fusion approach based on graph random walk to derive an optimal distance measure by fusing multiple metrics. Our method is scalable to a large amount of data by enforcing sparsity through an anchor graph representation. To adaptively control the effects of different views, we dynamically learn view-specific coefficients, which are leveraged in the graph random walk to balance the views. However, such a strategy may lead to an over-smooth similarity metric in which affinities between dissimilar samples are enlarged by excessive cross-view fusion. Thus, we propose a heuristic approach for controlling the number of iterations in the fusion process in order to avoid over-smoothing. Extensive experiments conducted on real-world data sets validate the effectiveness and efficiency of our approach.
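The core cross-view diffusion idea can be sketched in a toy form: each view's similarity matrix is propagated through the other view's random-walk transition matrix, then the views are averaged. This is a minimal sketch only (two views, dense matrices, a fixed iteration count standing in for the paper's stopping heuristic, no anchor graph or learned view weights), and all names and data are illustrative.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def row_normalize(M):
    """Turn a similarity matrix into a random-walk transition matrix."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row])
    return out

def cross_view_diffusion(S1, S2, iters=3):
    """Fuse two per-view similarity matrices by alternating diffusion."""
    P1, P2 = row_normalize(S1), row_normalize(S2)
    A1, A2 = S1, S2
    for _ in range(iters):
        # Each view's similarities are smoothed through the other view's walk.
        A1, A2 = (matmul(matmul(P1, A2), transpose(P1)),
                  matmul(matmul(P2, A1), transpose(P2)))
    n = len(S1)
    return [[(A1[i][j] + A2[i][j]) / 2 for j in range(n)] for i in range(n)]

# Three samples; samples 0 and 1 are similar in both views.
S1 = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
S2 = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.2], [0.2, 0.2, 1.0]]
fused = cross_view_diffusion(S1, S2)
```

The fused matrix preserves the block structure shared by the two views while remaining symmetric; running too many iterations would over-smooth it, which is exactly the failure mode the abstract's stopping heuristic guards against.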
Araya, Mesfin; Chotai, Jayanti; Komproe, Ivan H; de Jong, Joop T V M
2011-07-01
The resilience of post-war displaced persons is influenced not only by the nature of premigration trauma but also by postmigration psychosocial circumstances and living conditions. A lengthy civil war, leading to Eritrea separating from Ethiopia and becoming an independent state in 1991, resulted in many displaced persons. A random sample of 749 displaced women living in shelters in the Ethiopian capital Addis Ababa was compared with a random sample of 110 displaced women living in the community setting of Debre Zeit, 50 km from Addis Ababa, regarding their quality of life, mental distress, sociodemographics, living conditions, perceived social support, and coping strategies, 6 years after displacement. Subjects from Debre Zeit reported significantly higher quality of life and better living conditions. However, mental distress did not differ significantly between the groups. The Debre Zeit group also contained higher proportions born in Ethiopia and married, reported more traumatic life events, employed more task-oriented coping, and perceived higher social support. Factors that accounted for the difference between the shelter and Debre Zeit groups in three of the four quality of life domains of the WHOQOL-BREF (physical health, psychological, environment) included protection from insects/rodents and other living conditions. To account for the difference in the fourth domain (social relationships), however, psychosocial factors also contributed significantly. Placement and rehabilitation in a community setting seems better than in shelters. If this possibility is not available, measures to improve specific living conditions in the shelters are likely to lead to a considerable increase in quality of life.
A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.
Gao, Xiang; Lin, Huaiying; Dong, Qunfeng
2017-01-01
Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. 
Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
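DMBC itself fits Dirichlet-multinomial distributions by maximum likelihood; as a simplified stand-in, the prior-plus-likelihood classification step can be sketched with a plain multinomial Bayes classifier using pseudocounts. Everything here (function names, taxon counts, class labels) is a hypothetical illustration, not the published method or its data.

```python
import math
from collections import defaultdict

def train(counts, labels, n_taxa, pseudocount=1.0):
    """Per-class taxon frequencies (with pseudocounts) and class priors."""
    taxon_sums = defaultdict(lambda: [pseudocount] * n_taxa)
    class_sizes = defaultdict(int)
    for x, y in zip(counts, labels):
        class_sizes[y] += 1
        for t, c in enumerate(x):
            taxon_sums[y][t] += c
    model = {}
    for y, sums in taxon_sums.items():
        total = sum(sums)
        model[y] = ([s / total for s in sums], class_sizes[y] / len(labels))
    return model

def posterior_class(model, x):
    """Most probable class by log prior + multinomial log-likelihood."""
    scores = {y: math.log(prior) + sum(c * math.log(p) for c, p in zip(x, probs))
              for y, (probs, prior) in model.items()}
    return max(scores, key=scores.get)

# Hypothetical taxon count vectors: disease samples dominated by taxon 1.
train_X = [[50, 5, 5], [40, 8, 2], [5, 45, 10], [6, 50, 4]]
train_y = ["healthy", "healthy", "disease", "disease"]
model = train(train_X, train_y, n_taxa=3)
```

The Dirichlet-multinomial replaces the fixed per-class probability vectors with a distribution over them, which accommodates the extra between-sample variation typical of microbiome counts.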
Utilizing Maximal Independent Sets as Dominating Sets in Scale-Free Networks
NASA Astrophysics Data System (ADS)
Derzsy, N.; Molnar, F., Jr.; Szymanski, B. K.; Korniss, G.
Dominating sets provide key solutions to various critical problems in networked systems, such as detecting, monitoring, or controlling the behavior of nodes. Motivated by the graph theory literature [Erdos, Israel J. Math. 4, 233 (1966)], we studied maximal independent sets (MIS) as dominating sets in scale-free networks. We investigated the scaling behavior of the size of the MIS in artificial scale-free networks with respect to multiple topological properties (size, average degree, power-law exponent, assortativity), evaluated its resilience to network damage resulting from random failure or targeted attack [Molnar et al., Sci. Rep. 5, 8321 (2015)], and compared its efficiency to previously proposed dominating set selection strategies. We showed that, despite its small set size, the MIS provides very high resilience against network damage. Using extensive numerical analysis on both synthetic and real-world (social, biological, technological) network samples, we demonstrate that our method effectively satisfies four essential requirements of dominating sets for practical applicability to large-scale real-world systems: (1) small set size, (2) minimal network information required for the construction scheme, (3) fast and easy computational implementation, and (4) resiliency to network damage. Supported by DARPA, DTRA, and NSF.
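The key structural fact behind this line of work is that any maximal independent set is automatically a dominating set: every vertex is either in the set or adjacent to a member. A minimal greedy sketch (not the study's construction; the graph and tie-breaking order are illustrative):

```python
def maximal_independent_set(adj):
    """Greedy MIS on an adjacency-set dict.

    Any maximal independent set is also a dominating set: a vertex left
    out must have a neighbor in the set, or the set was not maximal.
    """
    mis, blocked = set(), set()
    # Visiting high-degree vertices first tends to keep the set small.
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        if v not in blocked:
            mis.add(v)
            blocked.add(v)
            blocked |= adj[v]
    return mis

# A small hub-and-spoke (scale-free-like) graph as adjacency sets.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0, 5}, 5: {4}}
mis = maximal_independent_set(adj)
```

On this toy graph the greedy pass picks the hub plus one leaf, covering every vertex with only two set members; the resilience and scaling questions studied in the abstract concern how such sets behave on much larger networks.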
Uncertainty Analysis in 3D Equilibrium Reconstruction
Cianciosa, Mark R.; Hanson, James D.; Maurer, David A.
2018-02-21
Reconstruction is an inverse process in which a parameter space is searched to locate a set of parameters with the highest probability of describing experimental observations. Due to systematic errors and uncertainty in experimental measurements, this optimal set of parameters will contain some associated uncertainty, which in turn leads to uncertainty in models derived using those parameters. V3FIT is a three-dimensional (3D) equilibrium reconstruction code that propagates uncertainty from the input signals to the reconstructed parameters and to the final model. In this paper, we describe the methods used to propagate uncertainty in V3FIT. Using the results of whole-shot 3D equilibrium reconstruction of the Compact Toroidal Hybrid, this propagated uncertainty is validated against the random variation in the resulting parameters. Two different model parameterizations demonstrate how the uncertainty propagation can indicate the quality of a reconstruction. As a proxy for random sampling, the whole-shot reconstruction results over a time interval are used to validate the propagated uncertainty from a single time slice.
Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping
2014-02-15
Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. © 2013 Elsevier Inc. All rights reserved.
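The paper's best-performing balancing strategy used K-Medoids-based undersampling; the simplest variant of the same idea, random undersampling of every class down to the minority-class size, can be sketched as follows (toy data and the `undersample` helper are illustrative assumptions):

```python
import random
from collections import Counter

def undersample(X, y, seed=0):
    """Randomly undersample every class down to the minority-class size."""
    rng = random.Random(seed)
    class_counts = Counter(y)
    n_min = min(class_counts.values())
    Xb, yb = [], []
    for label in class_counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.sample(idx, n_min):  # sample without replacement
            Xb.append(X[i])
            yb.append(y[i])
    return Xb, yb

# Imbalanced toy data: 6 MCI cases vs 3 AD cases.
X = [[i] for i in range(9)]
y = ["MCI"] * 6 + ["AD"] * 3
Xb, yb = undersample(X, y)
```

K-Medoids undersampling replaces the random draw with the medoids of the majority class, so the retained subset stays representative of that class's structure rather than being an arbitrary selection.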
2014-02-01
moisture level of 14% dry soil mass was maintained for the duration of the study by weekly additions of ASTM Type I water. Soil samples were collected…maintain the initial soil moisture level. One cluster of Orchard grass straw was harvested from a set of randomly selected replicate containers…decomposition is among the most integrating processes within the soil ecosystem because it involves complex interactions of soil microbial, plant, and…
Visual statistical learning is not reliably modulated by selective attention to isolated events
Musz, Elizabeth; Weber, Matthew J.; Thompson-Schill, Sharon L.
2014-01-01
Recent studies of visual statistical learning (VSL) indicate that the visual system can automatically extract temporal and spatial relationships between objects. We report several attempts to replicate and extend earlier work (Turk-Browne et al., 2005) in which observers performed a cover task on one of two interleaved stimulus sets, resulting in learning of temporal relationships that occur in the attended stream, but not those present in the unattended stream. Across four experiments, we exposed observers to a similar or identical familiarization protocol, directing attention to one of two interleaved stimulus sets; afterward, we assessed VSL efficacy for both sets using either implicit response-time measures or explicit familiarity judgments. In line with prior work, we observe learning for the attended stimulus set. However, unlike previous reports, we also observe learning for the unattended stimulus set. When instructed to selectively attend to only one of the stimulus sets and ignore the other set, observers could extract temporal regularities for both sets. Our efforts to experimentally decrease this effect by changing the cover task (Experiment 1) or the complexity of the statistical regularities (Experiment 3) were unsuccessful. A fourth experiment using a different assessment of learning likewise failed to show an attentional effect. Simulations drawing random samples from our first three experiments (n = 64) confirm that the distribution of attentional effects in our sample closely approximates the null. We offer several potential explanations for our failure to replicate earlier findings, and discuss how our results suggest limiting conditions on the relevance of attention to VSL. PMID:25172196
NASA Technical Reports Server (NTRS)
Olson, William S.; Kummerow, Christian D.; Yang, Song; Petty, Grant W.; Tao, Wei-Kuo; Bell, Thomas L.; Braun, Scott A.; Wang, Yansen; Lang, Stephen E.; Johnson, Daniel E.;
2006-01-01
A revised Bayesian algorithm for estimating surface rain rate, convective rain proportion, and latent heating profiles from satellite-borne passive microwave radiometer observations over ocean backgrounds is described. The algorithm searches a large database of cloud-radiative model simulations to find cloud profiles that are radiatively consistent with a given set of microwave radiance measurements. The properties of these radiatively consistent profiles are then composited to obtain best estimates of the observed properties. The revised algorithm is supported by an expanded and more physically consistent database of cloud-radiative model simulations. The algorithm also features a better quantification of the convective and nonconvective contributions to total rainfall, a new geographic database, and an improved representation of background radiances in rain-free regions. Bias and random error estimates are derived from applications of the algorithm to synthetic radiance data, based upon a subset of cloud-resolving model simulations, and from the Bayesian formulation itself. Synthetic rain-rate and latent heating estimates exhibit a trend of high (low) bias for low (high) retrieved values. The Bayesian estimates of random error are propagated to represent errors at coarser time and space resolutions, based upon applications of the algorithm to TRMM Microwave Imager (TMI) data. Errors in TMI instantaneous rain-rate estimates at 0.5° resolution range from approximately 50% at 1 mm/h to 20% at 14 mm/h. Errors in collocated spaceborne radar rain-rate estimates are roughly 50%-80% of the TMI errors at this resolution. The estimated algorithm random error in TMI rain rates at monthly, 2.5° resolution is relatively small (less than 6% at 5 mm day⁻¹) in comparison with the random error resulting from infrequent satellite temporal sampling (8%-35% at the same rain rate). Percentage errors resulting from sampling decrease with increasing rain rate, and sampling errors in latent heating rates follow the same trend. Averaging over 3 months reduces sampling errors in rain rates to 6%-15% at 5 mm day⁻¹, with proportionate reductions in latent heating sampling errors.
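The reduction of sampling error with longer averaging periods reflects the usual roughly 1/√n shrinkage of the error of a mean as the number of overpasses grows. A synthetic Monte Carlo sketch (the exponential rain-rate population, overpass counts, and helper name are all invented for illustration, not TRMM data):

```python
import random
import statistics

def relative_sampling_error(n_overpasses, n_trials=2000, seed=1):
    """Relative RMS error of a mean rain rate estimated from random overpasses."""
    rng = random.Random(seed)
    # Hypothetical skewed rain-rate population (mean ~5 mm/day).
    rates = [rng.expovariate(1 / 5.0) for _ in range(10000)]
    mu = statistics.fmean(rates)
    sq_errs = []
    for _ in range(n_trials):
        est = statistics.fmean(rng.sample(rates, n_overpasses))
        sq_errs.append((est - mu) ** 2)
    return statistics.fmean(sq_errs) ** 0.5 / mu

monthly = relative_sampling_error(10)   # few overpasses
seasonal = relative_sampling_error(90)  # ~9x the samples
```

With nine times as many overpasses the relative error drops by about a factor of three, mirroring the abstract's observation that 3-month averaging substantially reduces sampling error.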
Miller, Michael A; Colby, Alison C C; Kanehl, Paul D; Blocksom, Karen
2009-03-01
The Wisconsin Department of Natural Resources (WDNR), with support from the U.S. EPA, conducted an assessment of wadeable streams in the Driftless Area ecoregion in western Wisconsin using a probabilistic sampling design. This ecoregion encompasses 20% of Wisconsin's land area and contains 8,800 miles of perennial streams. Randomly-selected stream sites (n = 60) equally distributed among stream orders 1-4 were sampled. Watershed land use, riparian and in-stream habitat, water chemistry, macroinvertebrate, and fish assemblage data were collected at each true random site and an associated "modified-random" site on each stream that was accessed via a road crossing nearest to the true random site. Targeted least-disturbed reference sites (n = 22) were also sampled to develop reference conditions for various physical, chemical, and biological measures. Cumulative distribution function plots of various measures collected at the true random sites evaluated with reference condition thresholds, indicate that high proportions of the random sites (and by inference the entire Driftless Area wadeable stream population) show some level of degradation. Study results show no statistically significant differences between the true random and modified-random sample sites for any of the nine physical habitat, 11 water chemistry, seven macroinvertebrate, or eight fish metrics analyzed. In Wisconsin's Driftless Area, 79% of wadeable stream lengths were accessible via road crossings. While further evaluation of the statistical rigor of using a modified-random sampling design is warranted, sampling randomly-selected stream sites accessed via the nearest road crossing may provide a more economical way to apply probabilistic sampling in stream monitoring programs.
The Impact of Missing Data on Species Tree Estimation.
Xi, Zhenxiang; Liu, Liang; Davis, Charles C
2016-03-01
Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact to species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and gene rate heterogeneity. We demonstrate that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed (by gene and/or by species) and that a sufficiently large number of genes are sampled. When data sets are indecisive sensu Sanderson et al. (2010. Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol Biol. 10:155) and/or ILS is high, however, high amounts of missing data that are randomly distributed require exhaustive levels of gene sampling, likely exceeding most empirical studies to date. Moreover, missing data become especially problematic when they are nonrandomly distributed. We demonstrate that STAR produces inconsistent results when the amount of nonrandom missing data is high, regardless of the degree of ILS and gene rate heterogeneity. Similarly, concatenation methods using maximum likelihood can be misled by nonrandom missing data in the presence of gene rate heterogeneity, which becomes further exacerbated when combined with high ILS. In contrast, ASTRAL, MP-EST, and MRP are more robust under all of these scenarios. These results underscore the importance of understanding the influence of missing data in the phylogenomics era. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. 
Langevin, Scott M; Eliot, Melissa; Butler, Rondi A; Cheong, Agnes; Zhang, Xiang; McClean, Michael D; Koestler, Devin C; Kelsey, Karl T
2015-01-01
There are currently no screening tests in routine use for oral and pharyngeal cancer beyond visual inspection and palpation, which are provided on an opportunistic basis, indicating a need for development of novel methods for early detection, particularly in high-risk populations. We sought to address this need through comprehensive interrogation of CpG island methylation in oral rinse samples. We used the Infinium HumanMethylation450 BeadArray to interrogate DNA methylation in oral rinse samples collected from 154 patients with incident oral or pharyngeal carcinoma prior to treatment and 72 cancer-free control subjects. Subjects were randomly allocated to either a training or a testing set. For each subject, average methylation was calculated for each CpG island represented on the array. We applied a semi-supervised recursively partitioned mixture model to the CpG island methylation data to identify a classifier for prediction of case status in the training set. We then applied the resultant classifier to the testing set for validation and to assess the predictive accuracy. We identified a methylation classifier comprised of 22 CpG islands, which predicted oral and pharyngeal carcinoma with a high degree of accuracy (AUC = 0.92, 95 % CI 0.86, 0.98). This novel methylation panel is a strong predictor of oral and pharyngeal carcinoma case status in oral rinse samples and may have utility in early detection and post-treatment follow-up.
Effect of adhesive materials on shear bond strength of a mineral trioxide aggregate.
Ali, Ahmed; Banerjee, Avijit; Mannocci, Francesco
2016-02-01
To compare the shear bond strength (SBS) and fractography between mineral trioxide aggregate (MTA) and glass-ionomer cement (GIC) or resin composite (RC) after varying MTA setting time intervals. MTA was mixed and packed into standardized cavities (4 mm diameter × 3 mm depth) in acrylic blocks. RC with 37% H₃PO₄ and type 2 (etch and rinse) adhesive, or conventional GIC, was bonded to the exposed MTA sample surfaces after 10-minute, 24-hour, 72-hour and 30-day MTA setting intervals (n = 10/group, eight groups). Samples were stored (37°C, 24 hours, 100% humidity) before SBS testing and statistical analysis (ANOVA, Tukey LSD, P < 0.05). Fractography was undertaken using stereomicroscopy for all samples and three random samples per group using SEM. Significant differences between all groups were found (P = 0.002). SBS of RC:MTA (max 5.09 ± 1.79 MPa) was higher than the SBS of GIC:MTA (max 3.74 ± 0.70 MPa) in the 24-hour, 72-hour and 30-day groups except in the 10-minute MTA setting time groups, where SBS of GIC:MTA was higher. There was a significant effect of time on SBS of RC:MTA (P = 0.008) and no effect on SBS of GIC:MTA (P = 3.00). Fractography revealed mixed (adhesive/cohesive) failures in all groups; in RC:MTA groups there was a decrease in adhesive failure with time, in contrast to the GIC:MTA groups.
Minnis, Alexandra M.; vanDommelen-Gonzalez, Evan; Luecke, Ellen; Cheng, Helen; Dow, William; Bautista-Arredondo, Sergio; Padian, Nancy S.
2016-01-01
Most existing evidence-based sexual health interventions focus on individual-level behavior, even though there is substantial evidence that highlights the influential role of social environments in shaping adolescents’ behaviors and reproductive health outcomes. We developed Yo Puedo, a combined conditional cash transfer (CCT) and life skills intervention for youth to promote educational attainment, job training, and reproductive health wellness that we then evaluated for feasibility among 162 youth aged 16–21 years in a predominantly Latino community in San Francisco, CA. The intervention targeted youth’s social networks and involved recruitment and randomization of small social network clusters. In this paper we describe the design of the feasibility study and report participants’ baseline characteristics. Furthermore, we examined the sample and design implications of recruiting social network clusters as the unit of randomization. Baseline data provide evidence that we successfully enrolled high risk youth using a social network recruitment approach in community and school-based settings. Nearly all participants (95%) were high risk for adverse educational and reproductive health outcomes based on multiple measures of low socioeconomic status (81%) and/or reported high risk behaviors (e.g., gang affiliation, past pregnancy, recent unprotected sex, frequent substance use) (62%). We achieved variability in the study sample through heterogeneity in recruitment of the index participants, whereas the individuals within the small social networks of close friends demonstrated substantial homogeneity across sociodemographic and risk profile characteristics. Social networks recruitment was feasible and yielded a sample of high risk youth willing to enroll in a randomized study to evaluate a novel sexual health intervention. PMID:25358834
Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan; Liu, Jingjing
2018-01-18
Paraffin odor intensity is an important quality indicator in paraffin inspection. Currently, paraffin odor level assessment depends mainly on human sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to the three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to those based on RF, with an accuracy rate of 98.33-100%, and ELM, with an accuracy rate of 98.01-100%. For level assessment, the R² for the training set was above 0.97 and the R² for the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016-0.3494, lower than the error of 0.5-1.0 measured by industry-standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level.
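Of the three learners compared, the Extreme Learning Machine is the least widely known; its essence is a single hidden layer with fixed random weights and a least-squares readout. A minimal NumPy sketch on invented regression data (nothing here is the paper's model or features):

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, seed=0):
    """Extreme Learning Machine: random hidden layer, least-squares readout."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # fixed random input weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # solve H @ beta ~ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: the score is a linear blend of two sensor features.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = 2.0 * X[:, 0] - X[:, 1]
W, b, beta = elm_fit(X, y)
pred = elm_predict(X, W, b, beta)
```

Because only the output weights are trained (a single pseudo-inverse), ELM training is fast, which is part of its appeal for sensor-array scoring tasks like this one.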
Bayes classification of interferometric TOPSAR data
NASA Technical Reports Server (NTRS)
Michel, T. R.; Rodriguez, E.; Houshmand, B.; Carande, R.
1995-01-01
We report the Bayes classification of terrain types at different sites using airborne interferometric synthetic aperture radar (INSAR) data. A Gaussian maximum likelihood classifier was applied to multidimensional observations derived from the SAR intensity, the terrain elevation model, and the magnitude of the interferometric correlation. Training sets for forested, urban, agricultural, or bare areas were obtained either by selecting samples with known ground truth, or by k-means clustering of random sets of samples uniformly distributed across all sites and subsequent assignment of these clusters using ground truth. The accuracy of the classifier was used to optimize the discriminating efficiency of the chosen feature set. The most important features include the SAR intensity, a canopy penetration depth model, and the terrain slope. We demonstrate the classifier's performance across sites using a unique set of training classes for the four main terrain categories. The scenes examined include San Francisco (CA) (predominantly urban and water), Mount Adams (WA) (forested with clear cuts), Pasadena (CA) (urban with mountains), and Antioch Hills (CA) (water, swamps, fields). Issues related to the effects of image calibration and the robustness of the classification to calibration errors are explored. The relative performance of single-polarization interferometric data classification is contrasted against classification schemes based on polarimetric SAR data.
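A Gaussian maximum likelihood classifier of the kind described fits a mean vector and covariance matrix per class and assigns each observation to the class with the highest Gaussian log-likelihood. A minimal sketch with invented two-feature data (the class names and cluster locations are illustrative, not the INSAR feature set):

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Per-class mean vector and covariance for a Gaussian ML classifier."""
    return {c: (X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False))
            for c in np.unique(y)}

def gaussian_ml_classify(classes, x):
    """Assign x to the class with the highest Gaussian log-likelihood."""
    best, best_ll = None, -np.inf
    for c, (mu, cov) in classes.items():
        d = x - mu
        ll = -0.5 * (np.log(np.linalg.det(cov)) + d @ np.linalg.solve(cov, d))
        if ll > best_ll:
            best, best_ll = c, ll
    return best

# Hypothetical 2-feature training samples (e.g. intensity, elevation).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),
               rng.normal([5, 5], 0.5, size=(50, 2))])
y = np.array(["water"] * 50 + ["forest"] * 50)
classes = fit_gaussian_classes(X, y)
```

Equal class priors are implicitly assumed here; adding a log-prior term to the score turns this into the full Bayes rule.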
Takesh, Thair; Sargsyan, Anik; Lee, Matthew; Anbarani, Afarin; Ho, Jessica; Wilder-Smith, Petra
2017-01-01
Aims: The aim of this project was to evaluate the effects of 2 different whitening strips on the color, microstructure, and roughness of tea-stained porcelain and composite surfaces. Methods: 54 porcelain and 72 composite chips served as samples for timed application of over-the-counter (OTC) test or control dental whitening strips. Chips were divided randomly into three groups of 18 porcelain and 24 composite chips each. Of these groups, 1 porcelain and 1 composite set served as controls. The remaining 2 groups were randomized to treatment with either Oral Essentials® Whitening Strips or Crest® 3D White Whitestrips™. Sample surface structure was examined by light microscopy, profilometry, and Scanning Electron Microscopy (SEM). Additionally, a reflectance spectrophotometer was used to assess color changes in the porcelain and composite samples over 24 hours of whitening. Data points were analyzed at each time point using ANOVA. Results: In the light microscopy and SEM images, no discrete physical defects were observed in any of the samples at any time point. However, high-resolution SEM images showed an appearance of increased surface roughness in all composite samples. Using profilometry, significantly increased post-whitening roughness was documented in the composite samples exposed to the control bleaching strips. Composite samples underwent a significant and equivalent shift in color following exposure to Crest® 3D White Whitestrips™ and Oral Essentials® Whitening Strips. Conclusions: A novel commercial tooth whitening strip demonstrated a bleaching effect comparable to that of a widely used OTC whitening strip. Neither whitening strip caused physical defects in the sample surfaces. However, the control strip caused roughening of the composite samples, whereas the test strip did not. PMID:29226023
Occurrence of MTBE and other gasoline oxygenates in CWS source waters
Carter, Janet M.; Grady, Stephen J.; Delzer, Gregory C.; Koch, Bart; Zogorski, John S.
2006-01-01
Results from two national surveys indicate that the gasoline oxygenate methyl tertiary butyl ether (MTBE) is one of the most frequently detected volatile organic compounds in source waters used by community water systems in the United States. Three other ether oxygenates were detected infrequently but almost always co-occurred with MTBE. A random sampling of source waters across the United States found MTBE in almost 9% of samples. In geographic areas with high MTBE use, the compound was detected in 23% of source water samples. Although MTBE concentrations were low (<1 µg/L) in most samples, some concentrations equaled or exceeded the drinking water advisory of 20 µg/L set by the US Environmental Protection Agency. The frequent detection of even low concentrations of MTBE demonstrates the vulnerability of US source waters to anthropogenic compounds, indicating a need to include MTBE in monitoring programs to track the trend of contamination.
Psychometric evaluation of the Revised Professional Practice Environment (RPPE) scale.
Erickson, Jeanette Ives; Duffy, Mary E; Ditomassi, Marianne; Jones, Dorothy
2009-05-01
The purpose was to examine the psychometric properties of the Revised Professional Practice Environment (RPPE) scale. Despite renewed focus on studying health professionals' practice environments, there are still few reliable and valid instruments available to assist nurse administrators in decision making. A psychometric evaluation using a random-sample cross-validation procedure (calibration sample [CS], n = 775; validation sample [VS], n = 775) was undertaken. Cronbach alpha internal consistency reliability of the total score (r = 0.93 [CS] and 0.92 [VS]), resulting subscale scores (r range: 0.80-0.87 [CS], 0.81-0.88 [VS]), and principal components analyses with Varimax rotation and Kaiser normalization (8 components, 59.2% variance [CS], 59.7% [VS]) produced almost identical results in both samples. The multidimensional RPPE is a psychometrically sound measure of 8 components of the professional practice environment in the acute care setting and sufficiently reliable and valid for use as independent subscales in healthcare research.
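The internal consistency coefficient reported above is Cronbach's alpha, which relates the sum of item variances to the variance of the scale totals. A minimal sketch of the standard formula (the data in the test is invented, not the RPPE sample):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a k-item scale. `items` holds one list per
    item, each with that item's scores across all respondents:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    total = [sum(scores) for scores in zip(*items)]
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(total))
```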
Designing a national soil erosion monitoring network for England and Wales
NASA Astrophysics Data System (ADS)
Lark, Murray; Rawlins, Barry; Anderson, Karen; Evans, Martin; Farrow, Luke; Glendell, Miriam; James, Mike; Rickson, Jane; Quine, Timothy; Quinton, John; Brazier, Richard
2014-05-01
Although soil erosion is recognised as a significant threat to sustainable land use and may be a priority for action in any forthcoming EU Soil Framework Directive, those responsible for setting national policy with respect to erosion are constrained by a lack of robust, representative data at large spatial scales. This reflects the process-orientated nature of much soil erosion research. Recognising this limitation, the UK Department for Environment, Food and Rural Affairs (Defra) established a project to pilot a cost-effective framework for monitoring of soil erosion in England and Wales (E&W). The pilot will compare different soil erosion monitoring methods at a site scale and provide statistical information for the final design of the full national monitoring network, which will: (i) provide unbiased estimates of the spatial mean of soil erosion rate across E&W (tonnes ha-1 yr-1) for each of three land-use classes (arable and horticultural; grassland; upland and semi-natural habitats); and (ii) quantify the uncertainty of these estimates with confidence intervals. Probability (design-based) sampling provides the most efficient unbiased estimates of spatial means. In this study, a 16-hectare area (a square of 400 x 400 m) positioned at the centre of a 1-km grid cell, selected at random from mapped land use across E&W, provided the sampling support for measurement of erosion rates, with at least 94% of the support area corresponding to the target land use classes. Very small or zero erosion rates likely to be encountered at many sites reduce the sampling efficiency and make it difficult to compare different methods of soil erosion monitoring. Therefore, to increase the proportion of samples with larger erosion rates without biasing our estimates, we increased the inclusion probability density in areas where the erosion rate is likely to be large by using stratified random sampling. First, each sampling domain (land use class in E&W) was divided into strata; e.g.
two sub-domains within which, respectively, small or no erosion rates, and moderate or larger erosion rates are expected. Each stratum was then sampled independently and at random. The sample density need not be equal in the two strata, but is known and is accounted for in the estimation of the mean and its standard error. To divide the domains into strata we used information on slope angle, previous interpretation of erosion susceptibility of the soil associations that correspond to the soil map of E&W at 1:250 000 (Soil Survey of England and Wales, 1983), and visual interpretation of evidence of erosion from aerial photography. While each domain could be stratified on the basis of the first two criteria, air photo interpretation across the whole country was not feasible. For this reason we used a two-phase random sampling for stratification (TPRS) design (de Gruijter et al., 2006). First, we formed an initial random sample of 1-km grid cells from the target domain. Second, each cell was then allocated to a stratum on the basis of the three criteria. A subset of the selected cells from each stratum was then selected for field survey at random, with a specified sampling density for each stratum so as to increase the proportion of cells where moderate or larger erosion rates were expected. Once measurements of erosion have been made, an estimate of the spatial mean of the erosion rate over the target domain, its standard error and associated uncertainty can be calculated by an expression which accounts for the estimated proportions of the two strata within the initial random sample. de Gruijter, J.J., Brus, D.J., Bierkens, M.F.P. & Knotters, M. 2006. Sampling for Natural Resource Monitoring. Springer, Berlin. Soil Survey of England and Wales. 1983. National Soil Map NATMAP Vector 1:250,000. National Soil Research Institute, Cranfield University.
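The stratified estimator described above (independent random samples per stratum, with the known stratum weights folded into the mean and its standard error) can be sketched as follows. The stratum weights and erosion values below are hypothetical, not the pilot's data:

```python
import random
import statistics

def stratified_mean_se(strata):
    """Stratified estimator of a spatial mean. `strata` is a list of
    (weight, values) pairs: weight is the stratum's known share of the
    domain, values the measurements sampled at random within it."""
    mean = sum(w * statistics.mean(v) for w, v in strata)
    # variance of the estimator: sum over strata of w^2 * s^2 / n
    var = sum(w ** 2 * statistics.variance(v) / len(v) for w, v in strata)
    return mean, var ** 0.5

# Hypothetical erosion rates (t/ha/yr): a low-erosion stratum covering
# 80% of the domain, a high-erosion stratum covering 20%.
rng = random.Random(1)
low = [max(0.0, rng.gauss(0.1, 0.1)) for _ in range(40)]
high = [max(0.0, rng.gauss(2.0, 0.8)) for _ in range(40)]
mean, se = stratified_mean_se([(0.8, low), (0.2, high)])
```

Oversampling the high-erosion stratum (as the pilot does) shrinks the standard error without biasing the mean, because the weights, not the sample sizes, carry the stratum proportions.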
Quantum speedup of Monte Carlo methods.
Montanaro, Ashley
2015-09-08
Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently.
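The classical baseline the quantum algorithm is compared against can be written in a few lines: estimate an expectation by the sample mean of repeated runs, with standard error shrinking like sigma/sqrt(n). Only the classical part is sketched here; the quantum algorithm's near-quadratic improvement in the dependence on n is beyond a code snippet.

```python
import random
import statistics

def monte_carlo_mean(subroutine, n):
    """Classical Monte Carlo estimate of the expected output of a
    randomized subroutine: average n independent runs. The standard
    error of the estimate shrinks like sigma / sqrt(n)."""
    samples = [subroutine() for _ in range(n)]
    return statistics.fmean(samples), statistics.stdev(samples) / n ** 0.5

# Example: estimate E[X^2] for X uniform on [0, 1]; the true value is 1/3.
random.seed(0)
est, se = monte_carlo_mean(lambda: random.random() ** 2, 10_000)
```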
A random spatial sampling method in a rural developing nation
Michelle C. Kondo; Kent D.W. Bream; Frances K. Barg; Charles C. Branas
2014-01-01
Nonrandom sampling of populations in developing nations has limitations and can inaccurately estimate health phenomena, especially among hard-to-reach populations such as rural residents. However, random sampling of rural populations in developing nations can be challenged by incomplete enumeration of the base population. We describe a stratified random sampling method...
NASA Astrophysics Data System (ADS)
Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad; Janssen, Hans
2015-02-01
The majority of literature regarding optimized Latin hypercube sampling (OLHS) is devoted to increasing the efficiency of these sampling strategies through the development of new algorithms based on the combination of innovative space-filling criteria and specialized optimization schemes. However, little attention has been given to the impact of the initial design that is fed into the optimization algorithm on the efficiency of OLHS strategies. Previous studies, as well as codes developed for OLHS, have relied on one of the following two approaches for the selection of the initial design in OLHS: (1) the use of random points in the hypercube intervals (random LHS), and (2) the use of midpoints in the hypercube intervals (midpoint LHS). Both approaches have been extensively used, but no attempt has been previously made to compare the efficiency and robustness of their resulting sample designs. In this study we compare the two approaches and show that the space-filling characteristics of OLHS designs are sensitive to the initial design that is fed into the optimization algorithm. It is also illustrated that the space-filling characteristics of OLHS designs based on midpoint LHS are significantly better than those based on random LHS. The two approaches are compared by incorporating their resulting sample designs in Monte Carlo simulation (MCS) for uncertainty propagation analysis, and then, by employing the sample designs in the selection of the training set for constructing non-intrusive polynomial chaos expansion (NIPCE) meta-models which subsequently replace the original full model in MCSs. The analysis is based on two case studies involving numerical simulation of density dependent flow and solute transport in porous media within the context of seawater intrusion in coastal aquifers. We show that the use of midpoint LHS as the initial design increases the efficiency and robustness of the resulting MCSs and NIPCE meta-models. 
The study also illustrates that this relative improvement decreases with increasing number of sample points and input parameter dimensions. Since the computational time and effort for generating the sample designs in the two approaches are identical, the use of midpoint LHS as the initial design in OLHS is thus recommended.
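The two initial designs compared in the study differ only in where each point sits inside its hypercube interval. A minimal sketch of both, without the subsequent space-filling optimization that is the subject of the paper:

```python
import random

def latin_hypercube(n, d, midpoint=False, rng=None):
    """Latin hypercube sample of n points in d dimensions. Each axis is
    split into n equal intervals and each interval is used exactly once
    per axis. `midpoint=True` places points at interval centres
    (midpoint LHS); otherwise positions within each interval are
    uniformly random (random LHS)."""
    rng = rng or random.Random()
    cols = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)  # one permutation of intervals per axis
        cols.append([(k + (0.5 if midpoint else rng.random())) / n for k in perm])
    return [tuple(c[i] for c in cols) for i in range(n)]

pts = latin_hypercube(10, 2, midpoint=True, rng=random.Random(42))
```

Either variant would then be handed to the optimizer; the paper's finding is that starting from the midpoint variant tends to yield better-optimized designs.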
Mixed emotions: Sensitivity to facial variance in a crowd of faces.
Haberman, Jason; Lee, Pegan; Whitney, David
2015-01-01
The visual system automatically represents summary information from crowds of faces, such as the average expression. This is a useful heuristic insofar as it provides critical information about the state of the world, not simply information about the state of one individual. However, the average alone is not sufficient for making decisions about how to respond to a crowd. The variance or heterogeneity of the crowd--the mixture of emotions--conveys information about the reliability of the average, essential for determining whether the average can be trusted. Despite its importance, the representation of variance within a crowd of faces has yet to be examined. This is addressed here in three experiments. In the first experiment, observers viewed a sample set of faces that varied in emotion, and then adjusted a subsequent set to match the variance of the sample set. To isolate variance as the summary statistic of interest, the average emotion of both sets was random. Results suggested that observers had information regarding crowd variance. The second experiment verified that this was indeed a uniquely high-level phenomenon, as observers were unable to derive the variance of an inverted set of faces as precisely as an upright set of faces. The third experiment replicated and extended the first two experiments using method-of-constant-stimuli. Together, these results show that the visual system is sensitive to emergent information about the emotional heterogeneity, or ambivalence, in crowds of faces.
Predicting active-layer soil thickness using topographic variables at a small watershed scale
Li, Aidi; Tan, Xing; Wu, Wei; Liu, Hongbin; Zhu, Jie
2017-01-01
Knowledge about the spatial distribution of active-layer (AL) soil thickness is indispensable for ecological modeling, precision agriculture, and land resource management. However, it is difficult to obtain the details on AL soil thickness by using conventional soil survey methods. In this research, the objective is to investigate the possibility and accuracy of mapping the spatial distribution of AL soil thickness through a random forest (RF) model by using terrain variables at a small watershed scale. A total of 1113 soil samples collected from the slope fields were randomly divided into calibration (770 soil samples) and validation (343 soil samples) sets. Seven terrain variables including elevation, aspect, relative slope position, valley depth, flow path length, slope height, and topographic wetness index were derived from a digital elevation map (30 m). The RF model was compared with multiple linear regression (MLR), geographically weighted regression (GWR) and support vector machines (SVM) approaches based on the validation set. Model performance was evaluated by precision criteria of mean error (ME), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2). Comparative results showed that RF outperformed MLR, GWR and SVM models. The RF gave better values of ME (0.39 cm), MAE (7.09 cm), and RMSE (10.85 cm) and higher R2 (62%). The sensitivity analysis demonstrated that the DEM had less uncertainty than the AL soil thickness. The outcome of the RF model indicated that elevation, flow path length and valley depth were the most important factors affecting the AL soil thickness variability across the watershed. These results demonstrated that the RF model is a promising method for predicting the spatial distribution of AL soil thickness using terrain parameters. PMID:28877196
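The precision criteria used above to compare the models (ME, MAE, RMSE and R2) are standard and easy to compute; a minimal sketch:

```python
def regression_metrics(obs, pred):
    """Precision criteria from the abstract: mean error (ME, observed
    minus predicted), mean absolute error (MAE), root mean square
    error (RMSE) and the coefficient of determination (R^2)."""
    n = len(obs)
    resid = [o - p for o, p in zip(obs, pred)]
    me = sum(resid) / n
    mae = sum(abs(r) for r in resid) / n
    rmse = (sum(r * r for r in resid) / n) ** 0.5
    obar = sum(obs) / n
    r2 = 1 - sum(r * r for r in resid) / sum((o - obar) ** 2 for o in obs)
    return me, mae, rmse, r2
```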
Speeding up Coarse Point Cloud Registration by Threshold-Independent Baysac Match Selection
NASA Astrophysics Data System (ADS)
Kang, Z.; Lindenbergh, R.; Pu, S.
2016-06-01
This paper presents an algorithm for the automatic registration of terrestrial point clouds by match selection using an efficient conditional sampling method -- threshold-independent BaySAC (BAYes SAmpling Consensus) -- and employs the error metric of average point-to-surface residual to reduce the random measurement error and then approach the real registration error. BaySAC and other basic sampling algorithms usually need to artificially determine a threshold by which inlier points are identified, which leads to a threshold-dependent verification process. Therefore, we applied the LMedS method to construct the cost function that is used to determine the optimum model, to reduce the influence of human factors and improve the robustness of the model estimate. Point-to-point and point-to-surface error metrics are most commonly used. However, point-to-point error in general consists of at least two components, random measurement error and systematic error as a result of a remaining error in the found rigid body transformation. Thus we employ the measure of the average point-to-surface residual to evaluate the registration accuracy. The proposed approaches, together with a traditional RANSAC approach, are tested on four data sets acquired by three different scanners in terms of their computational efficiency and quality of the final registration. The registration results show the standard deviation of the average point-to-surface residuals is reduced from 1.4 cm (plain RANSAC) to 0.5 cm (threshold-independent BaySAC). The results also show that, compared to the performance of RANSAC, our BaySAC strategies lead to fewer iterations and lower computational cost when the hypothesis set is contaminated with more outliers.
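The threshold-independent idea, choosing the hypothesis with the lowest median residual rather than counting inliers against a user-set threshold, can be illustrated on a toy 2-D line-fitting problem. The data and the two-point hypothesis generator below are ours for illustration, not the paper's registration pipeline:

```python
import random

def lmeds_line(points, iters=200, rng=None):
    """Threshold-free robust line fit in the spirit of LMedS: sample
    two points, form a candidate line, and keep the hypothesis with
    the lowest *median* squared residual (no inlier threshold)."""
    rng = rng or random.Random()
    best, best_cost = None, float("inf")
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)   # slope of the hypothesis
        b = y1 - a * x1             # intercept
        res = sorted((y - (a * x + b)) ** 2 for x, y in points)
        cost = res[len(res) // 2]   # median squared residual
        if cost < best_cost:
            best, best_cost = (a, b), cost
    return best

# Hypothetical data: y = 2x + 1 with roughly 30% gross outliers mixed in.
data_rng = random.Random(0)
pts = [(x, 2 * x + 1 + data_rng.gauss(0, 0.05)) for x in range(20)]
pts += [(data_rng.uniform(0, 20), data_rng.uniform(-10, 40)) for _ in range(8)]
a, b = lmeds_line(pts, rng=random.Random(1))
```

Because the median is taken over all residuals, a line supported by the majority of points wins regardless of how far the outliers lie, which is exactly what removes the need for a hand-tuned inlier threshold.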
Widaman, Keith F.; Grimm, Kevin J.; Early, Dawnté R.; Robins, Richard W.; Conger, Rand D.
2013-01-01
Difficulties arise in multiple-group evaluations of factorial invariance if particular manifest variables are missing completely in certain groups. Ad hoc analytic alternatives can be used in such situations (e.g., deleting manifest variables), but some common approaches, such as multiple imputation, are not viable. At least 3 solutions to this problem are viable: analyzing differing sets of variables across groups, using pattern mixture approaches, and a new method using random number generation. The latter solution, proposed in this article, is to generate pseudo-random normal deviates for all observations for manifest variables that are missing completely in a given sample and then to specify multiple-group models in a way that respects the random nature of these values. An empirical example is presented in detail comparing the 3 approaches. The proposed solution can enable quantitative comparisons at the latent variable level between groups using programs that require the same number of manifest variables in each group. PMID:24019738
Tarrab, Leticia; Garcia, Carlos M.; Cantero, Mariano I.; Oberg, Kevin
2012-01-01
This work presents a systematic analysis quantifying the role of the presence of turbulence fluctuations on uncertainties (random errors) of acoustic Doppler current profiler (ADCP) discharge measurements from moving platforms. Data sets of three-dimensional flow velocities with high temporal and spatial resolution were generated from direct numerical simulation (DNS) of turbulent open channel flow. Dimensionless functions relating parameters quantifying the uncertainty in discharge measurements due to flow turbulence (relative variance and relative maximum random error) to sampling configuration were developed from the DNS simulations and then validated with field-scale discharge measurements. The validated functions were used to evaluate the role of the presence of flow turbulence fluctuations on uncertainties in ADCP discharge measurements. The results of this work indicate that random errors due to the flow turbulence are significant when: (a) a low number of transects is used for a discharge measurement, and (b) measurements are made in shallow rivers using high boat velocity (short time for the boat to cross a flow turbulence structure).
There is More than a Power Law in Zipf
Cristelli, Matthieu; Batty, Michael; Pietronero, Luciano
2012-01-01
The largest cities, the most frequently used words, the income of the richest countries, and the most wealthy billionaires, can be all described in terms of Zipf’s Law, a rank-size rule capturing the relation between the frequency of a set of objects or events and their size. It is assumed to be one of many manifestations of an underlying power law like Pareto’s or Benford’s, but contrary to popular belief, from a distribution of, say, city sizes and a simple random sampling, one does not obtain Zipf’s law for the largest cities. This pathology is reflected in the fact that Zipf’s Law has a functional form depending on the number of events N. This requires a fundamental property of the sample distribution which we call ‘coherence’ and it corresponds to a ‘screening’ between various elements of the set. We show how it should be accounted for when fitting Zipf’s Law. PMID:23139862
Reduction of display artifacts by random sampling
NASA Technical Reports Server (NTRS)
Ahumada, A. J., Jr.; Nagel, D. C.; Watson, A. B.; Yellott, J. I., Jr.
1983-01-01
The application of random-sampling techniques to remove visible artifacts (such as flicker, moire patterns, and paradoxical motion) introduced in TV-type displays by discrete sequential scanning is discussed and demonstrated. Sequential-scanning artifacts are described; the window of visibility defined in spatiotemporal frequency space by Watson and Ahumada (1982 and 1983) and Watson et al. (1983) is explained; the basic principles of random sampling are reviewed and illustrated by the case of the human retina; and it is proposed that the sampling artifacts can be replaced by random noise, which can then be shifted to frequency-space regions outside the window of visibility. Vertical sequential, single-random-sequence, and continuously renewed random-sequence plotting displays generating 128 points at update rates up to 130 Hz are applied to images of stationary and moving lines, and best results are obtained with the single random sequence for the stationary lines and with the renewed random sequence for the moving lines.
Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark
2015-01-01
Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data is very high dimensional and a large portion of SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy on some data sets of moderate size, RF still struggles in GWAS with selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs in two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node of a tree. This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. 
Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprising 408,803 SNPs and Alzheimer case-control data comprising 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing state-of-the-art random forest methods. The proposed model identified the top 25 SNPs in the Parkinson data set, including four interesting genes associated with neurological disorders. The presented approach has been shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail to detect. The new RF works well for data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experimental results demonstrated the effectiveness of the proposed RF model, which outperformed state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods.
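A loose sketch of the two-stage subspace sampling idea: p-values split SNPs into informative and irrelevant groups, the informative group is split again, and each tree's candidate subspace is drawn only from the two informative sub-groups. The grouping rule and mixing proportion below are our assumptions for illustration, not the authors' exact procedure:

```python
import random

def ts_subspace(pvalues, cutoff, strong_frac, m, rng=None):
    """Draw a feature subspace of size m for one tree. SNPs with
    p-value below `cutoff` are 'informative'; the better half of that
    group is 'highly informative', the rest 'weakly informative'.
    Roughly `strong_frac` of the subspace comes from the strong group;
    irrelevant SNPs are never sampled."""
    rng = rng or random.Random()
    informative = sorted((i for i, p in enumerate(pvalues) if p < cutoff),
                         key=lambda i: pvalues[i])
    k = max(1, len(informative) // 2)
    strong, weak = informative[:k], informative[k:]
    n_strong = min(len(strong), max(1, round(m * strong_frac)))
    picked = rng.sample(strong, n_strong)
    if weak and m > n_strong:
        picked += rng.sample(weak, min(len(weak), m - n_strong))
    return picked
```

Restricting the subspace this way guarantees every node split sees at least some highly informative SNPs, which is the mechanism the abstract credits for lower prediction error.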
Oster, Natalia V; Carney, Patricia A; Allison, Kimberly H; Weaver, Donald L; Reisch, Lisa M; Longton, Gary; Onega, Tracy; Pepe, Margaret; Geller, Berta M; Nelson, Heidi D; Ross, Tyler R; Tosteson, Anna N A; Elmore, Joann G
2013-02-05
Diagnostic test sets are a valuable research tool that contributes importantly to the validity and reliability of studies that assess agreement in breast pathology. In order to fully understand the strengths and weaknesses of any agreement and reliability study, however, the methods should be fully reported. In this paper we provide a step-by-step description of the methods used to create four complex test sets for a study of diagnostic agreement among pathologists interpreting breast biopsy specimens. We use the newly developed Guidelines for Reporting Reliability and Agreement Studies (GRRAS) as a basis to report these methods. Breast tissue biopsies were selected from the National Cancer Institute-funded Breast Cancer Surveillance Consortium sites. We used random sampling stratified according to women's age (40-49 vs. ≥50), parenchymal breast density (low vs. high) and the interpretation of the original pathologist. A 3-member panel of expert breast pathologists first independently interpreted each case using five primary diagnostic categories (non-proliferative changes, proliferative changes without atypia, atypical ductal hyperplasia, ductal carcinoma in situ, and invasive carcinoma). When the experts did not unanimously agree on a case diagnosis, a modified Delphi method was used to determine the reference standard consensus diagnosis. The final test cases were stratified and randomly assigned into one of four unique test sets. We found GRRAS recommendations to be very useful in reporting diagnostic test set development and recommend inclusion of two additional criteria: 1) characterizing the study population and 2) describing the methods for reference diagnosis, when applicable.
Garvin, Jennifer H; DuVall, Scott L; South, Brett R; Bray, Bruce E; Bolton, Daniel; Heavirland, Julia; Pickard, Steve; Heidenreich, Paul; Shen, Shuying; Weir, Charlene; Samore, Matthew; Goldstein, Mary K
2012-01-01
Left ventricular ejection fraction (EF) is a key component of heart failure quality measures used within the Department of Veteran Affairs (VA). Our goals were to build a natural language processing system to extract the EF from free-text echocardiogram reports to automate measurement reporting and to validate the accuracy of the system using a comparison reference standard developed through human review. This project was a Translational Use Case Project within the VA Consortium for Healthcare Informatics. We created a set of regular expressions and rules to capture the EF using a random sample of 765 echocardiograms from seven VA medical centers. The documents were randomly assigned to two sets: a set of 275 used for training and a second set of 490 used for testing and validation. To establish the reference standard, two independent reviewers annotated all documents in both sets; a third reviewer adjudicated disagreements. System test results for document-level classification of EF of <40% had a sensitivity (recall) of 98.41%, a specificity of 100%, a positive predictive value (precision) of 100%, and an F measure of 99.2%. System test results at the concept level had a sensitivity of 88.9% (95% CI 87.7% to 90.0%), a positive predictive value of 95% (95% CI 94.2% to 95.9%), and an F measure of 91.9% (95% CI 91.2% to 92.7%). An EF value of <40% can be accurately identified in VA echocardiogram reports. An automated information extraction system can be used to accurately extract EF for quality measurement.
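A rule-based extractor of the kind described above boils down to a small set of regular expressions plus a decision rule. The patterns below are illustrative, not the VA system's actual rules:

```python
import re

# Illustrative patterns (our own, not the VA system's): capture a
# numeric ejection fraction, or the low end of a reported range,
# from free-text echocardiogram report sentences.
EF_PATTERN = re.compile(
    r"(?:ejection\s+fraction|LVEF|EF)\s*(?:is|of|:|=)?\s*"
    r"(\d{1,2})\s*(?:-|to)?\s*(\d{1,2})?\s*%",
    re.IGNORECASE)

def extract_ef(text):
    """Return the lowest EF percentage mentioned in the text, or None."""
    m = EF_PATTERN.search(text)
    if not m:
        return None
    return min(int(g) for g in m.groups() if g)

def ef_below_40(text):
    """Document-level classification rule from the abstract: EF < 40%."""
    ef = extract_ef(text)
    return ef is not None and ef < 40
```

A production system needs many more patterns (qualitative phrases such as "severely reduced", negation, section detection), which is why the study validated against a human-annotated reference standard.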
Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G
2008-09-01
A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.
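The linear splines compared against Legendre polynomials above reduce to hat-function bases: at any days-in-milk value, at most two basis functions are nonzero and they interpolate linearly between the bracketing knots. A minimal sketch (the knot placement is illustrative, not the paper's):

```python
def linear_spline_basis(x, knots):
    """Hat-function basis for a linear spline with the given knots:
    returns one weight per knot; at most two are nonzero, and they
    interpolate x linearly between its bracketing knots."""
    b = [0.0] * len(knots)
    if x <= knots[0]:
        b[0] = 1.0
    elif x >= knots[-1]:
        b[-1] = 1.0
    else:
        for i in range(len(knots) - 1):
            lo, hi = knots[i], knots[i + 1]
            if lo <= x <= hi:
                w = (x - lo) / (hi - lo)
                b[i], b[i + 1] = 1.0 - w, w
                break
    return b

# E.g. 4 knots spanning 5-365 days in milk (knot placement is ours):
basis = linear_spline_basis(100, [5, 50, 200, 365])
```

The local support of these bases is what keeps spline variance estimates tame at the extremes of lactation, where global polynomials are notoriously unstable.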
Mukaka, Mavuto; White, Sarah A; Terlouw, Dianne J; Mwapasa, Victor; Kalilani-Phiri, Linda; Faragher, E Brian
2016-07-22
Missing outcomes can seriously impair the ability to make correct inferences from randomized controlled trials (RCTs). Complete case (CC) analysis is commonly used, but it reduces sample size and is perceived to lead to reduced statistical efficiency of estimates while increasing the potential for bias. As multiple imputation (MI) methods preserve sample size, they are generally viewed as the preferred analytical approach. We examined this assumption, comparing the performance of CC and MI methods to determine risk difference (RD) estimates in the presence of missing binary outcomes. We conducted simulation studies of 5000 simulated data sets with 50 imputations of RCTs with one primary follow-up endpoint at different underlying levels of RD (3-25 %) and missing outcomes (5-30 %). For missing at random (MAR) or missing completely at random (MCAR) outcomes, CC method estimates generally remained unbiased and achieved precision similar to or better than MI methods, and high statistical coverage. Missing not at random (MNAR) scenarios yielded invalid inferences with both methods. Effect size estimate bias was reduced in MI methods by always including group membership even if this was unrelated to missingness. Surprisingly, under MAR and MCAR conditions in the assessed scenarios, MI offered no statistical advantage over CC methods. While MI must inherently accompany CC methods for intention-to-treat analyses, these findings endorse CC methods for per protocol risk difference analyses in these conditions. These findings provide an argument for the use of the CC approach to always complement MI analyses, with the usual caveat that the validity of the mechanism for missingness be thoroughly discussed. More importantly, researchers should strive to collect as much data as possible.
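The abstract's central claim for the MCAR case, that the complete-case risk difference stays unbiased, is easy to check by simulation. A small sketch (the trial sizes, event rates and missingness level are invented, not the study's scenarios):

```python
import random

def cc_risk_difference(p_ctrl, p_trt, n, miss, rng):
    """One simulated two-arm trial: Bernoulli outcomes, each outcome
    deleted with probability `miss` (MCAR), and the complete-case
    risk difference computed from the remaining observations."""
    def observed(p):
        outcomes = [rng.random() < p for _ in range(n)]
        return [y for y in outcomes if rng.random() >= miss]  # MCAR drop
    c, t = observed(p_ctrl), observed(p_trt)
    return sum(t) / len(t) - sum(c) / len(c)

# Average the CC estimate over many replicates: with 20% MCAR
# missingness it stays close to the true RD of 0.10.
rng = random.Random(7)
estimates = [cc_risk_difference(0.20, 0.30, 200, 0.20, rng) for _ in range(2000)]
avg = sum(estimates) / len(estimates)
```

The same harness with missingness made dependent on the unobserved outcome (MNAR) would show the bias the abstract warns about for both CC and MI.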
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, but mostly on classification, not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of the RFE algorithm. The rationale is that building ensemble SVM models on randomly drawn bootstrap samples from the training set produces different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over a random-forest-based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters within the selected data. PMID:27304923
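The aggregation step at the heart of ESVM-RFE, turning many bootstrap-model rankings into one consensus ranking before each elimination round, can be sketched as follows. The per-feature score used in the toy example is a simple covariance magnitude standing in for a linear SVM's weight, which is our simplification:

```python
import random
from statistics import fmean

def aggregate_rankings(rankings):
    """Combine per-model feature rankings (each a list of feature
    indices, best first) into one consensus ranking by mean rank."""
    feats = rankings[0]
    mean_rank = {f: fmean(r.index(f) for r in rankings) for f in feats}
    return sorted(feats, key=mean_rank.get)

def bootstrap_rankings(X, y, score, n_models, rng):
    """Rank features on n_models bootstrap resamples of (X, y).
    `score(xs, ys)` is a per-feature importance; here it stands in
    for a linear SVM's weight magnitude (our simplification)."""
    d, n = len(X[0]), len(X)
    rankings = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        s = [score([X[i][f] for i in idx], [y[i] for i in idx]) for f in range(d)]
        rankings.append(sorted(range(d), key=lambda f: -s[f]))
    return rankings

# Toy usage: feature 0 tracks the label, feature 1 is pure noise.
rng = random.Random(3)
y = [rng.choice([0, 1]) for _ in range(100)]
X = [[yi + rng.gauss(0, 0.1), rng.gauss(0, 1)] for yi in y]
cov = lambda xs, ys: abs(fmean(a * b for a, b in zip(xs, ys)) - fmean(xs) * fmean(ys))
ranking = aggregate_rankings(bootstrap_rankings(X, y, cov, n_models=11, rng=rng))
```

In the full algorithm, the worst-ranked features of the consensus ranking are eliminated and the loop repeats, so the elimination decision rests on many models rather than one.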
Clark, Florence; Pyatak, Elizabeth A; Carlson, Mike; Blanche, Erna Imperatore; Vigen, Cheryl; Hay, Joel; Mallinson, Trudy; Blanchard, Jeanine; Unger, Jennifer B; Garber, Susan L; Diaz, Jesus; Florindez, Lucia I; Atkins, Michal; Rubayi, Salah; Azen, Stanley Paul
2014-04-01
Randomized trials of complex, non-pharmacologic interventions implemented in home and community settings, such as the University of Southern California (USC)-Rancho Los Amigos National Rehabilitation Center (RLANRC) Pressure Ulcer Prevention Study (PUPS), present unique challenges with respect to (1) participant recruitment and retention, (2) intervention delivery and fidelity, (3) randomization and assessment, and (4) potential inadvertent treatment effects. We describe the methods employed to address the challenges confronted in implementing PUPS. In this randomized controlled trial, we are assessing the efficacy of a complex, preventive intervention in reducing the incidence of, and costs associated with, the development of medically serious pressure ulcers in people with spinal cord injury. Individuals with spinal cord injury recruited from RLANRC were assigned to either a 12-month preventive intervention group or a standard care control group. The primary outcome is the incidence of serious pressure ulcers with secondary endpoints including ulcer-related surgeries, medical treatment costs, and quality of life. These outcomes are assessed at 12 and 24 months after randomization. Additionally, we are studying the mediating mechanisms that account for intervention outcomes. PUPS has been successfully implemented, including recruitment of the target sample size of 170 participants, assurance of the integrity of intervention protocol delivery with an average 90% treatment adherence rate, and enactment of the assessment plan. However, implementation has been replete with challenges. To meet recruitment goals, we instituted a five-pronged approach customized for an underserved, ethnically diverse population. In intervention delivery, we increased staff time to overcome economic and cultural barriers to retention and adherence. To ensure treatment fidelity and replicability, we monitored intervention protocol delivery in accordance with a rigorous plan. 
Finally, we have overcome unanticipated assessment and design concerns related to (1) determining pressure ulcer incidence/severity, (2) randomization imbalance, and (3) inadvertent potential control group contamination. We have addressed the most daunting challenges encountered in the recruitment, assessment, and intervention phases of PUPS. Some challenges and solutions may not apply to trials conducted in other settings. Overcoming challenges has required a multifaceted approach incorporating individualization, flexibility, and persistence, as well as the ability to implement needed mid-course corrections.
On the Prediction of Ground Motion
NASA Astrophysics Data System (ADS)
Lavallee, D.; Schmedes, J.; Archuleta, R. J.
2012-12-01
Using a slip-weakening dynamic model of rupture, we generated earthquake scenarios that provided the spatio-temporal evolution of the slip on the fault and the radiated field at the free surface. We observed scenarios where the rupture propagates at a supershear speed on some parts of the fault while remaining subshear on other parts. For some scenarios with nearly identical initial conditions, the rupture speed was always subshear. For both types of scenarios (a mixture of supershear and subshear speeds, and subshear only), we compute the peak ground accelerations (PGA) regularly distributed over the Earth's surface. We then calculate the probability density functions (PDF) of the PGA. For both types of scenarios, the PDF curves are asymmetrically shaped and asymptotically attenuated according to a power law. This behavior is similar to that observed for the PDF curves of PGA recorded during earthquakes. The main difference between scenarios with a supershear rupture speed and scenarios with only subshear rupture speed is the range of PGA values. Based on these results, we investigate three issues fundamental to the prediction of ground motion. It is important to recognize that ground motions recorded during an earthquake sample a small fraction of the radiation field. It is not obvious that such sampling will capture the largest ground motion generated during an earthquake, nor that the number of stations is large enough to properly infer the statistical properties associated with the radiation field. To quantify the effect of undersampling the radiation field, we design three experiments. For a scenario where the rupture speed is only subshear, we construct multiple sets of observations. Each set comprises 100 randomly selected PGA values from all of the PGAs calculated at the Earth's surface. In the first experiment, we evaluate how the distributions of PGA in the sets compare with the distribution of all the PGAs.
For this experiment, we used different statistical tests (e.g. chi-square). This experiment quantifies the likelihood that a random set of PGAs can be used to infer the statistical properties of all the PGAs. In the second experiment, we fit the PDF of the PGA of every set with probability laws used in the literature to describe the PDF of recorded PGA: the lognormal law, the generalized extreme value law, and the Levy law. For each set, the probability laws are then used to compute the probability of observing a PGA value that would cause "moderate to heavy" potential damage according to the Instrumental Intensity scale developed by the USGS. For each probability law, we compare predictions based on the set with the prediction estimated from all the PGAs. This experiment quantifies the reliability and uncertainty of predicting an outcome when the radiation field is undersampled. The third experiment uses the same sets and repeats the two investigations, this time for a scenario where the rupture has a supershear speed over part of the fault. The objective here is to assess the additional uncertainty in predicting PGA and damage resulting from ruptures that have supershear speeds.
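The undersampling experiment above is easy to reproduce in miniature. The sketch below is an illustrative analogue, not the authors' code: it draws a heavy-tailed synthetic "PGA field", samples many 100-station observation sets, and measures how often a set actually sees the field's extreme tail.

```python
import random

random.seed(42)
# A synthetic heavy-tailed "PGA field": one value per point on a surface grid.
field = [random.paretovariate(2.0) for _ in range(20000)]

# Each observation set mimics 100 stations sampling the field at random.
sets_of_stations = [random.sample(field, 100) for _ in range(200)]

# How often does a 100-station set see the field's extreme tail (top 0.1%)?
p999 = sorted(field)[int(0.999 * len(field))]
hit_rate = sum(max(s) >= p999 for s in sets_of_stations) / len(sets_of_stations)
# Theory: P(hit) = 1 - 0.999**100 ≈ 0.10, so roughly 90% of station sets
# miss the largest ground motions entirely.
```

This is the sense in which a sparse station network cannot be assumed to capture the largest PGA generated during an earthquake.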
Seismic random noise attenuation method based on empirical mode decomposition of Hausdorff dimension
NASA Astrophysics Data System (ADS)
Yan, Z.; Luan, X.
2017-12-01
Introduction: Empirical mode decomposition (EMD) is a noise-suppression algorithm based on wave-field separation, exploiting the scale differences between the effective signal and noise. However, because the complexity of the real seismic wave field produces serious mode aliasing, denoising with this method alone is neither ideal nor effective. Building on the multi-scale decomposition characteristics of the EMD algorithm, combined with a Hausdorff dimension constraint, we propose a new method for seismic random noise attenuation. First, we apply the EMD algorithm to decompose seismic data adaptively and obtain a series of intrinsic mode functions (IMFs) with different scales. Based on the difference in Hausdorff dimension between effective signals and random noise, we identify the IMF components mixed with random noise. We then use a threshold correlation filtering process to separate the valid signal from the random noise effectively. Compared with the traditional EMD method, the results show that the new method achieves better suppression of seismic random noise. Implementation: The EMD algorithm is used to decompose seismic signals into IMF sets and analyze their spectra. Since most of the random noise is high-frequency noise, the IMF sets can be divided into three categories: the first is the effective wave component at larger scales; the second is the noise part at smaller scales; the third is the IMF components containing random noise. The third kind of IMF component is then processed by the Hausdorff dimension algorithm, selecting an appropriate time window size, initial step and increment to calculate the instantaneous Hausdorff dimension of each component. The dimension of the random noise is between 1.0 and 1.05, while the dimension of the effective wave is between 1.05 and 2.0.
On the basis of the previous steps, using the dimension difference between the random noise and the effective signal, we extract from each IMF component the sample points whose fractal dimension is less than or equal to 1.05, in order to separate the residual noise. Reconstructing the signal from the IMF components after dimension filtering, together with the effective-wave IMF components retained in the first selection, yields the de-noised result.
Rare Event Simulation in Radiation Transport
NASA Astrophysics Data System (ADS)
Kollman, Craig
This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable that is, with overwhelming probability, equal to zero. These problems often have high-dimensional state spaces and irregular geometries, so that analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded, the probability of any one particle getting through is very small. Because of the large number of particles involved, even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well-known technique for improving the efficiency of rare-event calculations. Here, a new set of probabilities is used in the simulation runs. The results are multiplied by the likelihood ratio between the true and simulated probabilities so as to keep the estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities is chosen. It is shown that a zero-variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive "learning" algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give, with probability one, a sequence of estimates converging exponentially fast to the true solution.
In the final chapter, an attempt to generalize this algorithm to a continuous state space is made. This involves partitioning the space into a finite number of cells. There is a tradeoff between additional computation per iteration and variance reduction per iteration that arises in determining the optimal grid size. All versions of this algorithm can be thought of as a compromise between deterministic and Monte Carlo methods, capturing advantages of both techniques.
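The likelihood-ratio reweighting at the heart of importance sampling can be shown on a textbook rare-event problem. The sketch below is a generic illustration, not the dissertation's neutron model: it estimates the Gaussian tail probability P(Z > 4) by sampling from a shifted distribution and reweighting each draw to keep the estimator unbiased.

```python
import math
import random

def rare_tail_prob(threshold, n=100_000, seed=0):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by importance sampling.

    Draws come from N(mu, 1) with mu = threshold (exponential tilting),
    so exceedances are common; each hit is reweighted by the likelihood
    ratio phi(x) / phi(x - mu) = exp(-mu*x + mu^2/2), which keeps the
    estimator unbiased for the original N(0, 1) probability.
    """
    rng = random.Random(seed)
    mu = threshold
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x > threshold:
            total += math.exp(-mu * x + 0.5 * mu * mu)  # likelihood ratio
    return total / n

est = rare_tail_prob(4.0)  # true value is about 3.17e-5
```

Naive Monte Carlo with the same budget would see only a handful of exceedances (expected count ~3), while the tilted sampler hits the event on roughly half of its draws.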
Randomization Methods in Emergency Setting Trials: A Descriptive Review
ERIC Educational Resources Information Center
Corbett, Mark Stephen; Moe-Byrne, Thirimon; Oddie, Sam; McGuire, William
2016-01-01
Background: Quasi-randomization might expedite recruitment into trials in emergency care settings but may also introduce selection bias. Methods: We searched the Cochrane Library and other databases for systematic reviews of interventions in emergency medicine or urgent care settings. We assessed selection bias (baseline imbalances) in prognostic…
Robustly Aligning a Shape Model and Its Application to Car Alignment of Unknown Pose.
Li, Yan; Gu, Leon; Kanade, Takeo
2011-09-01
Precisely localizing in an image a set of feature points that form the shape of an object, such as a car or a face, is called alignment. Previous shape alignment methods attempted to fit a whole shape model to the observed data, based on the assumption of Gaussian observation noise and the associated regularization process. However, such an approach, though able to deal with Gaussian noise in feature detection, turns out not to be robust or precise, because it is vulnerable to gross feature detection errors or outliers resulting from partial occlusions or spurious features from the background or neighboring objects. We address this problem by adopting a randomized hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate a shape-and-pose hypothesis of the object from a partial shape, or a subset of feature points. For alignment, a large number of hypotheses are generated by randomly sampling subsets of feature points and then evaluated to find the one that minimizes the shape prediction error. This method of randomized subset-based matching can effectively handle outliers and recover the correct object shape. We apply this approach to a challenging data set of over 5,000 differently posed car images spanning a wide variety of car types, lighting, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods in both accuracy and robustness.
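The randomized hypothesis-and-test idea is closely related to RANSAC-style fitting. As a self-contained analogue of the subset-sampling step (not the paper's shape model), the sketch below fits a line from random 2-point subsets and keeps the hypothesis supported by the most points, so gross outliers cannot dominate.

```python
import random

def ransac_line(points, n_iter=200, tol=0.1, seed=0):
    """Fit y = a*x + b by randomized hypothesis-and-test (RANSAC-style).

    Each hypothesis is generated from a random 2-point subset and scored
    by how many points it predicts to within `tol`; the best-supported
    hypothesis wins. Gross outliers are harmless because most random
    subsets avoid them entirely.
    """
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical pair cannot define y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(abs(y - (a * x + b)) < tol for x, y in points)
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best

# 20 points on y = 2x + 1 plus two gross outliers.
pts = [(x, 2 * x + 1) for x in range(20)] + [(5, 40), (12, -30)]
a, b = ransac_line(pts)
```

A least-squares fit on the same data would be pulled badly off by the two outliers; the randomized subset search recovers the true line exactly.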
Tao, Da; Or, Calvin Kl
2013-04-01
We conducted a systematic review and meta-analysis of randomized controlled trials (RCTs) that had evaluated self-management health information technology (SMHIT) for glycaemic control in patients with diabetes. A total of 43 RCTs were identified, reporting on 52 control-intervention comparisons. The glycosylated haemoglobin (HbA1c) data were pooled using a random effects meta-analysis method, followed by a meta-regression and subgroup analyses to examine the effects of a set of moderators. The meta-analysis showed that use of SMHITs was associated with a significant reduction in HbA1c compared to usual care, with a pooled standardized mean difference of -0.30% (95% CI -0.39 to -0.21, P < 0.001). Sample size, age, study setting, type of application and method of data entry significantly moderated the effects of SMHIT use. The review supports the use of SMHITs as a self-management approach to improve glycaemic control. The effect of SMHIT use is significantly greater when the technology is a web-based application, when a mechanism for patients' health data entry is provided (manual or automatic) and when the technology is operated in the home or without location restrictions. Integrating these variables into the design of SMHITs may augment the effectiveness of the interventions. © SAGE Publications Ltd, 2013.
A comparison of fitness-case sampling methods for genetic programming
NASA Astrophysics Data System (ADS)
Martínez, Yuliana; Naredo, Enrique; Trujillo, Leonardo; Legrand, Pierrick; López, Uriel
2017-11-01
Genetic programming (GP) is an evolutionary computation paradigm for automatic program induction. GP has produced impressive results but it still needs to overcome some practical limitations, particularly its high computational cost, overfitting and excessive code growth. Recently, many researchers have proposed fitness-case sampling methods to overcome some of these problems, with mixed results in several limited tests. This paper presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world data-sets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. Comparisons are carried out using non-parametric multigroup tests and post hoc pairwise statistical tests. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification.
Rehem, Tania Cristina Morais Santa Barbara; de Oliveira, Maria Regina Fernandes; Ciosak, Suely Itsuko; Egry, Emiko Yoshikawa
2013-01-01
To estimate the sensitivity, specificity and positive and negative predictive values of the Unified Health System's Hospital Information System for the appropriate recording of hospitalizations for ambulatory care-sensitive conditions. The hospital information system records for conditions sensitive to ambulatory care, and for those which are not, were considered for analysis, taking the medical records as the gold standard. Through simple random sampling, a sample of 816 medical records was defined and selected by means of a list of random numbers using the Statistical Package for the Social Sciences. The sensitivity was 81.89%, specificity was 95.19%, the positive predictive value was 77.61% and the negative predictive value was 96.27%. In the study setting, the Hospital Information System (SIH) was more specific than sensitive, with nearly 20% of care-sensitive conditions not detected. There are no validation studies in Brazil of the Hospital Information System records for hospitalizations which are sensitive to primary health care. These results are relevant when one considers that this system is one of the bases for assessing the effectiveness of primary health care.
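The four validation metrics reported above follow directly from a 2x2 table against the gold standard. The sketch below uses hypothetical counts for illustration; these are not the study's 816 records.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 validation table
    (index test vs. a gold standard such as the medical record)."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all healthy
        "ppv": tp / (tp + fp),          # probability a positive test is right
        "npv": tn / (tn + fn),          # probability a negative test is right
    }

# Hypothetical counts for illustration only (not the study's data).
m = screening_metrics(tp=310, fp=90, fn=70, tn=346)
```

Note that PPV and NPV, unlike sensitivity and specificity, shift with the prevalence of the condition in the sampled records.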
Revisiting sample size: are big trials the answer?
Lurati Buse, Giovanna A L; Botto, Fernando; Devereaux, P J
2012-07-18
The superiority of the evidence generated in randomized controlled trials over observational data is not conditional on randomization alone. Randomized controlled trials require proper design and implementation to provide a reliable effect estimate. Adequate random sequence generation, allocation implementation, analyses based on the intention-to-treat principle, and sufficient power are crucial to the quality of a randomized controlled trial. Power, or the probability of the trial detecting a difference when a real difference between treatments exists, strongly depends on sample size. The quality of orthopaedic randomized controlled trials is frequently threatened by limited sample sizes. This paper reviews basic concepts and pitfalls in sample-size estimation and focuses on the importance of large trials in the generation of valid evidence.
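The dependence of power on sample size can be made concrete with the standard two-sample normal approximation. The sketch below is a generic textbook formula, not taken from this paper.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two means:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2, rounded up."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_b = NormalDist().inv_cdf(power)          # desired power
    return math.ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)

# Detecting a half-SD difference at 80% power needs ~63 patients per arm...
n1 = n_per_group(delta=0.5, sd=1.0)
# ...and halving the detectable difference quadruples the requirement.
n2 = n_per_group(delta=0.25, sd=1.0)
```

The inverse-square dependence on `delta` is why trials chasing modest but clinically relevant effects must be large.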
Multidimensional Normalization to Minimize Plate Effects of Suspension Bead Array Data.
Hong, Mun-Gwan; Lee, Woojoo; Nilsson, Peter; Pawitan, Yudi; Schwenk, Jochen M
2016-10-07
Enhanced by the growing number of biobanks, biomarker studies can now be performed with reasonable statistical power by using large sets of samples. Antibody-based proteomics by means of suspension bead arrays offers one attractive approach to analyze serum, plasma, or CSF samples for such studies in microtiter plates. To expand measurements beyond single batches, with either 96 or 384 samples per plate, suitable normalization methods are required to minimize the variation between plates. Here we propose two normalization approaches utilizing MA coordinates. The multidimensional MA (multi-MA) and MA-loess both consider all samples of a microtiter plate per suspension bead array assay and thus do not require any external reference samples. We demonstrate the performance of the two MA normalization methods with data obtained from the analysis of 384 samples including both serum and plasma. Samples were randomized across 96-well sample plates, processed, and analyzed in assay plates, respectively. Using principal component analysis (PCA), we could show that plate-wise clusters found in the first two components were eliminated by multi-MA normalization as compared with other normalization methods. Furthermore, we studied the correlation profiles between random pairs of antibodies and found that both MA normalization methods substantially reduced the inflated correlation introduced by plate effects. Normalization approaches using multi-MA and MA-loess minimized batch effects arising from the analysis of several assay plates with antibody suspension bead arrays. In a simulated biomarker study, multi-MA restored associations lost due to plate effects. Our normalization approaches, which are available as R package MDimNormn, could also be useful in studies using other types of high-throughput assay data.
Uncertainty and Sensitivity Analyses of a Pebble Bed HTGR Loss of Cooling Event
Strydom, Gerhard
2013-01-01
The Very High Temperature Reactor Methods Development group at the Idaho National Laboratory identified the need for a defensible and systematic uncertainty and sensitivity approach in 2009. This paper summarizes the results of an uncertainty and sensitivity quantification investigation performed with the SUSA code, utilizing the International Atomic Energy Agency CRP 5 Pebble Bed Modular Reactor benchmark and the INL code suite PEBBED-THERMIX. Eight model input parameters were selected for inclusion in this study, and after the input parameter variations and probability density functions were specified, a total of 800 steady state and depressurized loss of forced cooling (DLOFC) transient PEBBED-THERMIX calculations were performed. The six data sets were statistically analyzed to determine the 5% and 95% DLOFC peak fuel temperature tolerance intervals with 95% confidence levels. It was found that the uncertainties in the decay heat and graphite thermal conductivities were the most significant contributors to the propagated DLOFC peak fuel temperature uncertainty. No significant differences were observed between the results of Simple Random Sampling (SRS) or Latin Hypercube Sampling (LHS) data sets, and the use of uniform or normal input parameter distributions also did not lead to any significant differences between these data sets.
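The difference between SRS and LHS is easiest to see in the sampling design itself. As a minimal sketch (not the SUSA/PEBBED-THERMIX setup), the function below builds a Latin hypercube design in which every equal-probability stratum of each input is sampled exactly once, whereas simple random sampling can leave strata empty.

```python
import random

def latin_hypercube(n, dims, seed=0):
    """One LHS design on [0, 1)^dims: each of the n equal strata of [0, 1)
    contains exactly one point in every dimension."""
    rng = random.Random(seed)
    cols = []
    for _ in range(dims):
        col = [(k + rng.random()) / n for k in range(n)]  # one draw per stratum
        rng.shuffle(col)                                  # decorrelate dimensions
        cols.append(col)
    return list(zip(*cols))  # n points, each of dimension dims

pts = latin_hypercube(8, 2)
```

To target a non-uniform input distribution, each coordinate would be pushed through that distribution's inverse CDF.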
Photometric redshift analysis in the Dark Energy Survey Science Verification data
NASA Astrophysics Data System (ADS)
Sánchez, C.; Carrasco Kind, M.; Lin, H.; Miquel, R.; Abdalla, F. B.; Amara, A.; Banerji, M.; Bonnett, C.; Brunner, R.; Capozzi, D.; Carnero, A.; Castander, F. J.; da Costa, L. A. N.; Cunha, C.; Fausti, A.; Gerdes, D.; Greisel, N.; Gschwend, J.; Hartley, W.; Jouvel, S.; Lahav, O.; Lima, M.; Maia, M. A. G.; Martí, P.; Ogando, R. L. C.; Ostrovski, F.; Pellegrini, P.; Rau, M. M.; Sadeh, I.; Seitz, S.; Sevilla-Noarbe, I.; Sypniewski, A.; de Vicente, J.; Abbot, T.; Allam, S. S.; Atlee, D.; Bernstein, G.; Bernstein, J. P.; Buckley-Geer, E.; Burke, D.; Childress, M. J.; Davis, T.; DePoy, D. L.; Dey, A.; Desai, S.; Diehl, H. T.; Doel, P.; Estrada, J.; Evrard, A.; Fernández, E.; Finley, D.; Flaugher, B.; Frieman, J.; Gaztanaga, E.; Glazebrook, K.; Honscheid, K.; Kim, A.; Kuehn, K.; Kuropatkin, N.; Lidman, C.; Makler, M.; Marshall, J. L.; Nichol, R. C.; Roodman, A.; Sánchez, E.; Santiago, B. X.; Sako, M.; Scalzo, R.; Smith, R. C.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Tucker, D. L.; Uddin, S. A.; Valdés, F.; Walker, A.; Yuan, F.; Zuntz, J.
2014-12-01
We present results from a study of the photometric redshift performance of the Dark Energy Survey (DES), using the early data from a Science Verification period of observations in late 2012 and early 2013 that provided science-quality images for almost 200 sq. deg. at the nominal depth of the survey. We assess the photometric redshift (photo-z) performance using about 15 000 galaxies with spectroscopic redshifts available from other surveys. These galaxies are used, in different configurations, as a calibration sample, and photo-z's are obtained and studied using most of the existing photo-z codes. A weighting method in a multidimensional colour-magnitude space is applied to the spectroscopic sample in order to evaluate the photo-z performance with sets that mimic the full DES photometric sample, which is on average significantly deeper than the calibration sample due to the limited depth of spectroscopic surveys. Empirical photo-z methods using, for instance, artificial neural networks or random forests, yield the best performance in the tests, achieving core photo-z resolutions σ68 ˜ 0.08. Moreover, the results from most of the codes, including template-fitting methods, comfortably meet the DES requirements on photo-z performance, therefore, providing an excellent precedent for future DES data sets.
Photometric redshift analysis in the Dark Energy Survey Science Verification data
Sanchez, C.; Carrasco Kind, M.; Lin, H.; ...
2014-10-09
In this study, we present results from a study of the photometric redshift performance of the Dark Energy Survey (DES), using the early data from a Science Verification period of observations in late 2012 and early 2013 that provided science-quality images for almost 200 sq. deg. at the nominal depth of the survey. We assess the photometric redshift (photo-z) performance using about 15 000 galaxies with spectroscopic redshifts available from other surveys. These galaxies are used, in different configurations, as a calibration sample, and photo-z's are obtained and studied using most of the existing photo-z codes. A weighting method in a multidimensional colour-magnitude space is applied to the spectroscopic sample in order to evaluate the photo-z performance with sets that mimic the full DES photometric sample, which is on average significantly deeper than the calibration sample due to the limited depth of spectroscopic surveys. In addition, empirical photo-z methods using, for instance, artificial neural networks or random forests, yield the best performance in the tests, achieving core photo-z resolutions σ68 ~ 0.08. Moreover, the results from most of the codes, including template-fitting methods, comfortably meet the DES requirements on photo-z performance, therefore providing an excellent precedent for future DES data sets.
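The core photo-z resolution σ68 quoted above is a robust spread statistic. A minimal sketch of one common definition, the half-width of the central 68% of the residual distribution, is given below; the exact estimator used in the DES analysis may differ in detail.

```python
def sigma68(dz):
    """Half-width of the central 68% interval of photo-z residuals dz.

    Unlike a standard deviation, this quantile-based spread is insensitive
    to catastrophic outliers in the residual tails.
    """
    s = sorted(dz)
    n = len(s)
    lo = s[int(0.16 * (n - 1))]  # 16th percentile (simple index estimator)
    hi = s[int(0.84 * (n - 1))]  # 84th percentile
    return 0.5 * (hi - lo)

# Residuals spread uniformly on [-1, 1]: the central 68% spans [-0.68, 0.68].
dz = [i / 1000 for i in range(-1000, 1001)]
```

For a Gaussian residual distribution, σ68 coincides with the ordinary standard deviation.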
Billong, Serge Clotaire; Fokam, Joseph; Penda, Calixte Ida; Amadou, Salmon; Kob, David Same; Billong, Edson-Joan; Colizzi, Vittorio; Ndjolo, Alexis; Bisseck, Anne-Cecile Zoung-Kani; Elat, Jean-Bosco Nfetam
2016-11-15
Retention on lifelong antiretroviral therapy (ART) is essential in sustaining treatment success while preventing HIV drug resistance (HIVDR), especially in resource-limited settings (RLS). In an era of rising numbers of patients on ART, tracking patients in care is becoming more strategic for programmatic interventions. Due to lapses and uncertainty with the current WHO sampling approach in Cameroon, we aimed to ascertain the national performance of, and determinants in, retention on ART at 12 months. Using systematic random sampling, a survey was conducted in the ten regions (56 sites) of Cameroon, within the reporting period of October 2013-November 2014, enrolling 5005 eligible adults and children. Performance in retention on ART at 12 months was interpreted following the definition of the HIVDR early warning indicator: excellent (>85%), fair (85-75%), poor (<75%); and factors with p-value < 0.01 were considered statistically significant. The majority (74.4%) of patients were in urban settings, and 50.9% were managed in reference treatment centres. Nationwide, retention on ART at 12 months was 60.4% (2023/3349); only six sites and one region achieved acceptable performances. Retention performance varied between reference treatment centres (54.2%) and management units (66.8%), p < 0.0001; men (57.1%) and women (62.0%), p = 0.007; and WHO clinical stage I (63.3%) and other stages (55.6%), p = 0.007; but not by age (adults [60.3%] vs. children [58.8%], p = 0.730) or immune status (CD4 351-500 [65.9%] vs. other CD4 strata [59.86%], p = 0.077). Poor retention in care within 12 months of ART initiation urges an active search for patients lost to follow-up, preferentially targeting male and symptomatic patients, especially within reference ART clinics. Such a sampling strategy could be further strengthened for informed ART monitoring and HIVDR prevention.
Less is more: Sampling chemical space with active learning
NASA Astrophysics Data System (ADS)
Smith, Justin S.; Nebgen, Ben; Lubbers, Nicholas; Isayev, Olexandr; Roitberg, Adrian E.
2018-06-01
The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach, we develop the COmprehensive Machine-learning Potential (COMP6) benchmark (publicly available on GitHub) which contains a diverse set of organic molecules. Active learning-based ANI potentials outperform the original random sampled ANI-1 potential with only 10% of the data, while the final active learning-based model vastly outperforms ANI-1 on the COMP6 benchmark after training to only 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules composed of the elements CHNO.
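The Query by Committee criterion described above selects points where the ensemble disagrees most. The sketch below is a toy illustration, not the ANI training code: disagreement is measured as the variance of the committee members' predictions, and the pool points with the highest disagreement are the ones queried next.

```python
def qbc_select(pool, committee, k=1):
    """Query by Committee: pick the k pool points where an ensemble of
    models disagrees most (highest variance of their predictions).

    High disagreement flags regions where the ensemble is unreliable and
    new training data would be most informative.
    """
    def disagreement(x):
        preds = [model(x) for model in committee]
        mean = sum(preds) / len(preds)
        return sum((p - mean) ** 2 for p in preds) / len(preds)
    return sorted(pool, key=disagreement, reverse=True)[:k]

# Toy committee: three "potentials" that agree near x = 0 and diverge
# for large |x|, mimicking an ensemble leaving its training region.
committee = [
    lambda x: x,
    lambda x: x + 0.5 * x * x,
    lambda x: x - 0.5 * x * x,
]
pool = [-2.0, -0.5, 0.0, 0.5, 2.0]
picked = qbc_select(pool, committee, k=2)
```

In an AL loop, the selected points would be labeled (here, by a reference quantum-chemistry calculation), added to the training set, and the committee retrained.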
Dickson, Victoria Vaughan; Melkus, Gail D'Eramo; Katz, Stuart; Levine-Wong, Alissa; Dillworth, Judy; Cleland, Charles M; Riegel, Barbara
2014-08-01
Most of the day-to-day care for heart failure (HF) is done by the patient at home and requires skill in self-care. In this randomized controlled trial (RCT) we tested the efficacy of a community-based skill-building intervention on HF self-care, knowledge and health-related quality of life (HRQL) at 1- and 3-months. An ethnically diverse sample (n=75) of patients with HF (53% female; 32% Hispanic, 27% Black; mean age 69.9±10 years) was randomized to the intervention group (IG) or a wait-list control group (CG). The protocol intervention focused on tactical and situational HF self-care skill development delivered by lay health educators in community senior centers. Data were analyzed using mixed (between-within subjects) ANOVA. There was a significant improvement in self-care maintenance [F(2,47)=3.42, p=.04, (Cohen's f=.38)], self-care management [F(2,41)=4.10, p=.02, (Cohen's f=.45) and HF knowledge [F(2,53)=8.00, p=.001 (Cohen's f=.54)] in the IG compared to the CG. The skill-building intervention improved self-care and knowledge but not HRQL in this community-dwelling sample. Delivering an intervention in a community setting using lay health educators provides an alternative to clinic- or home-based teaching that may be useful across diverse populations and geographically varied settings. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
The Effectiveness of SMS Reminders on Appointment Attendance: a Meta-Analysis.
Boksmati, Nasim; Butler-Henderson, Kerryn; Anderson, Kevin; Sahama, Tony
2016-04-01
To identify the efficacy of short message service (SMS) reminders in health care appointment attendance. A systematic review was undertaken to identify studies published between 2005 and 2015 that compared the attendance rates of patients receiving SMS reminders with those of patients not receiving a reminder. Each article was examined for information regarding the study design, sample size, population demographics and intervention methods. A meta-analysis was used to calculate a pooled estimate odds ratio. Twenty-eight (28) studies were included in the review, including 13 (46%) randomized controlled trials. The pooled odds ratio of the randomized controlled trials was 1.62 (1.35-1.94). Half of the studies reviewed sent the reminder within 48 h prior to the appointment time, yet no significant subgroup differences with respect to participant age, SMS timing, rate or type, setting or specialty were detectable. All studies, except one with a small sample size, demonstrated a positive OR, indicating SMS reminders were an effective means of improving appointment attendance. There was no significant difference in OR when controlling for when the SMS was sent, the frequency of the reminders or the content of the reminder. SMS appointment reminders are an effective method of improving appointment attendance in a health care setting, and this effectiveness has improved over the past 5 years. Further research is required to identify the optimal SMS reminder timing and frequency, specifically in relation to the length of time since the appointment.
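A pooled odds ratio of the kind reported above is commonly computed by fixed-effect inverse-variance pooling on the log-odds scale. The sketch below assumes that approach; the 2×2 counts are invented for illustration and are not the review's data.

```python
from math import log, exp, sqrt

def pooled_odds_ratio(studies):
    """Fixed-effect inverse-variance pooling of odds ratios.

    studies: list of 2x2 counts (a, b, c, d) =
             (attended|SMS, missed|SMS, attended|control, missed|control).
    Returns (pooled OR, lower 95% CI, upper 95% CI).
    """
    num = den = 0.0
    for a, b, c, d in studies:
        log_or = log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf's variance of the log-OR
        w = 1 / var                            # inverse-variance weight
        num += w * log_or
        den += w
    pooled = num / den
    se = sqrt(1 / den)
    return exp(pooled), exp(pooled - 1.96 * se), exp(pooled + 1.96 * se)

# Two illustrative (invented) trials:
or_hat, lo, hi = pooled_odds_ratio([(90, 10, 80, 20), (85, 15, 75, 25)])
```

With a single study, the pooled OR reduces to that study's own OR, which is a handy sanity check.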
Ma, Li; Fan, Suohai
2017-03-14
The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness against overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that combining Clustering Using Representatives (CURE) with the original synthetic minority oversampling technique (SMOTE) is effective compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out-of-bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms (hybrid genetic-random forests, hybrid particle swarm-random forests and hybrid fish swarm-random forests) can achieve the minimum OOB error and show the best generalization ability. The training set produced by the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced by this feasible and effective algorithm. Moreover, the hybrid algorithms' F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, the hybrid algorithm provides a new way to perform feature selection and parameter optimization.
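SMOTE's core step, interpolating between a minority-class point and one of its nearest minority neighbours, can be sketched as follows. This is a plain illustration of vanilla SMOTE, not the CURE-SMOTE variant evaluated in the paper; the point set and parameters are invented.

```python
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority points by interpolating from a random
    minority point toward one of its k nearest minority neighbours
    (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic

minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_synthetic=3)
```

Because each synthetic point lies on a segment between two existing minority points, the oversampled set stays inside the minority class's convex hull, which is exactly why noisy minority points (the problem CURE clustering addresses) can mislead plain SMOTE.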
ERIC Educational Resources Information Center
Henry, James A.; Thielman, Emily J.; Zaugg, Tara L.; Kaelin, Christine; Schmidt, Caroline J.; Griest, Susan; McMillan, Garnett P.; Myers, Paula; Rivera, Izel; Baldwin, Robert; Carlson, Kathleen
2017-01-01
Purpose: This randomized controlled trial evaluated, within clinical settings, the effectiveness of coping skills education that is provided with progressive tinnitus management (PTM). Method: At 2 Veterans Affairs medical centers, N = 300 veterans were randomized to either PTM intervention or 6-month wait-list control. The PTM intervention…
Mendonca, Filho J.G.; Araujo, C.V.; Borrego, A.G.; Cook, A.; Flores, D.; Hackley, P.; Hower, J.C.; Kern, M.L.; Kommeren, K.; Kus, J.; Mastalerz, Maria; Mendonca, J.O.; Menezes, T.R.; Newman, J.; Ranasinghe, P.; Souza, I.V.A.F.; Suarez-Ruiz, I.; Ujiie, Y.
2010-01-01
The main objective of this work was to study the effect of the kerogen isolation procedures on maturity parameters of organic matter using optical microscopes. This work represents the results of the Organic Matter Concentration Working Group (OMCWG) of the International Committee for Coal and Organic Petrology (ICCP) during the years 2008 and 2009. Four samples have been analysed covering a range of maturity (low and moderate) and terrestrial and marine geological settings. The analyses comprise random vitrinite reflectance measured on both kerogen concentrate and whole rock mounts and fluorescence spectra taken on alginite. Eighteen participants from twelve laboratories from all over the world performed the analyses. Samples from continental settings contained enough vitrinite for participants to record around 50 measurements, whereas fewer readings were taken on the samples from the marine setting. The scatter of results was also larger in the samples of marine origin. Similar vitrinite reflectance values were in general recorded in the whole rock and in the kerogen concentrate. The small deviations from the trend cannot be attributed to the acid treatment involved in kerogen isolation but to reasons related to component identification or to the difficulty of achieving a good polish on samples with high mineral matter content. In samples that were difficult to polish, vitrinite reflectance measured on whole rock tended to be lower. The presence or absence of rock fabric affected the selection of the vitrinite population for measurement, and this also had an influence on the average value reported and on the scatter of the results. Slightly lower standard deviations were reported for the analyses run on kerogen concentrates.
Considering the spectral fluorescence results, it was observed that the λmax presents a shift to higher wavelengths in the kerogen concentrate sample in comparison to the whole-rock sample, thus revealing an influence of preparation methods (acid treatment) on fluorescence properties. © 2010 Elsevier B.V.
An, Zhao; Wen-Xin, Zhang; Zhong, Yao; Yu-Kuan, Ma; Qing, Liu; Hou-Lang, Duan; Yi-di, Shang
2016-06-29
To optimize and simplify the survey method for Oncomelania hupensis snails in marshland endemic regions of schistosomiasis, and to increase the precision, efficiency and economy of the snail survey. A 50 m × 50 m quadrat experimental field was selected in Chayegang marshland near Henghu farm in the Poyang Lake region, and a whole-covered method was adopted to survey the snails. The simple random sampling, systematic sampling and stratified random sampling methods were applied to calculate the minimum sample size, relative sampling error and absolute sampling error. The minimum sample sizes of the simple random sampling, systematic sampling and stratified random sampling methods were 300, 300 and 225, respectively. The relative sampling errors of the three methods were all less than 15%. The absolute sampling errors were 0.2217, 0.3024 and 0.0478, respectively. Spatial stratified sampling with altitude as the stratum variable is an efficient approach, with lower cost and higher precision, for snail surveys.
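For simple random sampling, a minimum sample size of this kind is typically obtained from the standard proportion-estimation formula with a finite-population correction. The sketch below uses that textbook formula with illustrative inputs; the values are not the snail-survey data.

```python
from math import ceil

def min_sample_size(p, d, N=None, z=1.96):
    """Minimum simple-random-sample size for estimating a proportion p
    to within absolute error d at 95% confidence (z = 1.96), with an
    optional finite-population correction for population size N."""
    n0 = z ** 2 * p * (1 - p) / d ** 2          # infinite-population size
    if N is None:
        return ceil(n0)
    return ceil(n0 / (1 + (n0 - 1) / N))        # finite-population correction

n_infinite = min_sample_size(p=0.5, d=0.05)           # most conservative choice p = 0.5
n_finite = min_sample_size(p=0.5, d=0.05, N=2500)     # corrected for N = 2500 quadrats
```

Stratified sampling reduces the required size further when strata (here, altitude bands) are internally homogeneous, which is consistent with the 225 versus 300 result reported above.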
Total-reflection X-ray fluorescence studies of trace elements in biomedical samples
NASA Astrophysics Data System (ADS)
Kubala-Kukuś, A.; Braziewicz, J.; Pajek, M.
2004-08-01
Application of total-reflection X-ray fluorescence (TXRF) analysis in studies of trace element contents in biomedical samples is discussed in the following aspects: (i) the nature of trace element concentration distributions, (ii) a censoring approach to the detection limits, and (iii) a comparison of two sets of censored data. The paper summarizes recent results on these topics, in particular the lognormal, or more generally logstable, nature of the concentration distributions of trace elements; random left-censoring and the Kaplan-Meier approach accounting for detection limits; and, finally, the application of the logrank test to compare the censored concentrations measured for two groups. These new aspects, which are of importance for applications of TXRF in different fields, are discussed here in the context of TXRF studies of trace elements in various samples of medical interest.
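The Kaplan-Meier treatment of detection limits mentioned above is usually applied by flipping left-censored concentrations (values below the detection limit) into right-censored "survival" data and running the ordinary product-limit estimator. A minimal sketch under that transformation follows; the concentrations and the flipping constant are toy values, not TXRF measurements.

```python
def kaplan_meier(samples):
    """Product-limit estimator. samples: list of (time, observed) pairs,
    observed=False meaning right-censored. Returns [(time, S(time))]
    at each distinct event time."""
    samples = sorted(samples)
    n_at_risk = len(samples)
    surv, curve, i = 1.0, [], 0
    while i < len(samples):
        t = samples[i][0]
        deaths = sum(1 for s in samples if s[0] == t and s[1])
        total = sum(1 for s in samples if s[0] == t)
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= total
        i += total
    return curve

# Left-censored concentrations (value, below_detection_limit) are flipped so
# that "<DL" becomes a right-censored observation:
C = 10.0  # any constant larger than every concentration
conc = [(2.0, False), (3.0, True), (5.0, False)]  # True means "<DL", i.e. censored
flipped = [(C - x, not censored) for x, censored in conc]
km = kaplan_meier(flipped)
```

Summary statistics estimated on the flipped scale are transformed back by subtracting from the same constant C.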
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Ivanova, N; Barry, Kerrie
2007-01-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (BLAST hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
NASA Astrophysics Data System (ADS)
Wang, Shao-Jiang; Guo, Qi; Cai, Rong-Gen
2017-12-01
We investigate the impact of different redshift distributions of random samples on the baryon acoustic oscillations (BAO) measurements of D_V(z)r_d^fid/r_d from the two-point correlation functions of galaxies in the Data Release 12 of the Baryon Oscillation Spectroscopic Survey (BOSS). Big surveys, such as BOSS, usually assign redshifts to the random samples by randomly drawing values from the measured redshift distributions of the data, which necessarily introduces fiducial signals of fluctuations into the random samples, weakening the BAO signals, if the cosmic variance cannot be ignored. We propose a smooth function of redshift distribution that fits the data well to populate the random galaxy samples. The resulting cosmological parameters match the input parameters of the mock catalogue very well. The significance of the BAO signals is improved by 0.33σ for a low-redshift sample and by 0.03σ for a constant-stellar-mass sample, though the absolute values do not change significantly. Given the precision of current measurements of cosmological parameters, such improvements will be valuable for future measurements of galaxy clustering.
Hobbs, F D R; Davis, R C; Roalfe, A K; Hare, R; Davies, M K
2004-08-01
To determine the performance of a new NT-proBNP assay in comparison with brain natriuretic peptide (BNP) in identifying left ventricular systolic dysfunction (LVSD) in randomly selected community populations. Blood samples were taken prospectively in the community from 591 randomly sampled individuals over the age of 45 years, stratified for age and socioeconomic status and divided into four cohorts (general population; clinically diagnosed heart failure; patients on diuretics; and patients deemed at high risk of heart failure). Definite heart failure (left ventricular ejection fraction (LVEF) < 40%) was identified in 33 people. Samples were handled as though in routine clinical practice. The laboratories undertaking the assays were blinded. Using NT-proBNP to diagnose LVEF < 40% in the general population, a level of > 40 pmol/l had 80% sensitivity, 73% specificity, 5% positive predictive value (PPV), 100% negative predictive value (NPV), and an area under the receiver-operator characteristic curve (AUC) of 76% (95% confidence interval (CI) 46% to 100%). For BNP to diagnose LVSD, a cut off level of > 33 pmol/l had 80% sensitivity, 88% specificity, 10% PPV, 100% NPV, and AUC of 88% (95% CI 75% to 100%). Similar NPVs were found for patients randomly screened from the three other populations. Both NT-proBNP and BNP have value in diagnosing LVSD in a community setting, with similar sensitivities and specificities. Using a high cut off for positivity will confirm the diagnosis of LVSD but will miss cases. At lower cut off values, positive results will require cardiac imaging to confirm LVSD.
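The sensitivity, specificity, PPV and NPV figures above all come from a standard 2×2 screening table. As a sketch with invented counts (chosen only to resemble a low-prevalence screen, not the study's actual data):

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 screening table.

    tp/fn: diseased patients testing positive/negative;
    fp/tn: disease-free patients testing positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),   # P(test+ | diseased)
        "specificity": tn / (tn + fp),   # P(test- | disease-free)
        "ppv": tp / (tp + fp),           # P(diseased | test+)
        "npv": tn / (tn + fn),           # P(disease-free | test-)
    }

# Illustrative counts: 33 diseased among ~590 screened (hypothetical numbers)
m = screening_metrics(tp=26, fp=150, fn=7, tn=408)
```

The pattern in the abstract (high NPV, low PPV) is what low prevalence produces: even a reasonably specific test yields mostly false positives when the condition is rare, while a negative result is highly reassuring.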
Caries status in 16 year-olds with varying exposure to water fluoridation in Ireland.
Mullen, J; McGaffin, J; Farvardin, N; Brightman, S; Haire, C; Freeman, R
2012-12-01
Most of the Republic of Ireland's public water supplies have been fluoridated since the mid-1960s, while Northern Ireland has never been fluoridated, apart from some small short-lived schemes in east Ulster. This study examines dental caries status in 16-year-olds in a part of Ireland straddling fluoridated and non-fluoridated water supply areas and compares two methods of assessing the effectiveness of water fluoridation. The cross-sectional survey tested differences in caries status by two methods: (1) Estimated Fluoridation Status, as used previously in national and regional studies in the Republic and in the All-Island study of 2002; (2) Percentage Lifetime Exposure, a modification of a system described by Slade in 1995 and used in Australian caries research. Adolescents were selected for the study by a two-part random sampling process. Firstly, schools were selected in each area by creating three tiers based on school size and selecting schools randomly from each tier. Then 16-year-olds were randomly sampled from these schools, based on a pre-set sampling fraction for each tier of schools. With both systems of measurement, significantly lower caries levels were found in those children with the greatest exposure to fluoridated water when compared with those with the least exposure. The survey provides further evidence of the effectiveness of water fluoridation in reducing dental caries experience up to 16 years of age. The extra intricacies involved in using the Percentage Lifetime Exposure method did not provide much more information when compared with the simpler Estimated Fluoridation Status method.
Correlated Observations, the Law of Small Numbers and Bank Runs.
Horváth, Gergely; Kiss, Hubert János
2016-01-01
Empirical descriptions and studies suggest that depositors generally observe a sample of previous decisions before deciding whether to keep their funds deposited or to withdraw them. These observed decisions may exhibit different degrees of correlation across depositors. In our model, depositors decide sequentially and are assumed to follow the law of small numbers, in the sense that they believe a bank run is underway if the number of observed withdrawals in their sample is large. Theoretically, with highly correlated samples and infinitely many depositors, runs occur with certainty, while with random samples this need not be the case, as for many parameter settings the likelihood of bank runs is zero. We investigate the intermediate cases and find that (i) decreasing the correlation and (ii) increasing the sample size reduce the likelihood of bank runs, ceteris paribus. Interestingly, the multiplicity of equilibria, a feature of the canonical Diamond-Dybvig model that we also use, disappears almost completely in our setup. Our results have relevant policy implications.
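The decision rule in this model (withdraw if the observed sample contains enough withdrawals) lends itself to direct Monte-Carlo simulation. The following is a heavily simplified sketch, not the authors' model: the parameters, the "majority withdraws" definition of a run, and the impatient-depositor mechanism are all invented for illustration.

```python
import random

def run_probability(n_depositors, sample_size, threshold_frac,
                    q_impatient=0.1, trials=500, seed=1):
    """Monte-Carlo estimate of the chance that a bank run develops.

    Depositors decide in sequence: each is impatient (withdraws for
    liquidity reasons) with probability q_impatient; otherwise they
    observe a random sample of sample_size previous decisions
    (1 = withdraw) and withdraw iff the observed withdrawal share
    reaches threshold_frac. A "run" here means a majority withdraws.
    """
    rng = random.Random(seed)
    runs = 0
    for _ in range(trials):
        decisions = []
        for _ in range(n_depositors):
            if rng.random() < q_impatient:
                decisions.append(1)                 # impatient: always withdraw
            elif len(decisions) < sample_size:
                decisions.append(0)                 # too few observations: stay
            else:
                sample = rng.sample(decisions, sample_size)
                withdraw = sum(sample) >= threshold_frac * sample_size
                decisions.append(1 if withdraw else 0)
        runs += sum(decisions) > n_depositors // 2
    return runs / trials

p_small = run_probability(100, sample_size=5, threshold_frac=0.6)
p_large = run_probability(100, sample_size=20, threshold_frac=0.6)
```

Larger samples average out the noise from the impatient depositors, so, consistent with the paper's finding, bigger sample sizes make spurious runs (at the same observed withdrawal share) no more likely, and typically less likely.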
NASA Astrophysics Data System (ADS)
Hedberg, Emma; Gidhagen, Lars; Johansson, Christer
Sampling of particles (PM10) was conducted during a one-year period at two rural sites in Central Chile, Quillota and Linares. The samples were analyzed for elemental composition. The data sets underwent source-receptor analyses in order to estimate the sources and their abundances in the PM10 size fraction, using the factor analytical method positive matrix factorization (PMF). The analysis showed that PM10 was dominated by soil resuspension at both sites during the summer months, while during winter traffic dominated the particle mass at Quillota and local wood burning dominated the particle mass at Linares. Two copper smelters impacted the Quillota station, contributing on average 10% and 16% of PM10 during summer and winter, respectively. One smelter impacted Linares, contributing 8% and 19% of PM10 in the summer and winter, respectively. For arsenic, the two smelters accounted for 87% of the monitored arsenic levels at Quillota, and at Linares one smelter contributed 72% of the measured mass. In comparison with PMF, the use of a dispersion model tended to overestimate the smelter contribution to arsenic levels at both sites. The robustness of the PMF model was tested by using randomly reduced data sets, in which 85%, 70%, 50% and 33% of the samples were included. In this way the ability of the model to reconstruct the sources initially found with the original data set could be tested. On average over all sources, the relative standard deviation increased from 7% to 25% for the variables identifying the sources when decreasing the data set from 85% to 33% of the samples, indicating that the solution initially found was very stable to begin with. But it was also noted that sources due to industrial or combustion processes were more sensitive to the size of the data set, compared with natural sources such as local soil and sea spray.
Crime Victimization in Adults With Severe Mental Illness
Teplin, Linda A.; McClelland, Gary M.; Abram, Karen M.; Weiner, Dana A.
2006-01-01
Context Since deinstitutionalization, most persons with severe mental illness (SMI) now live in the community, where they are at great risk for crime victimization. Objectives To determine the prevalence and incidence of crime victimization among persons with SMI by sex, race/ethnicity, and age, and to compare rates with general population data (the National Crime Victimization Survey), controlling for income and demographic differences between the samples. Design Epidemiologic study of persons in treatment. Independent master’s-level clinical research interviewers administered the National Crime Victimization Survey to randomly selected patients sampled from 16 randomly selected mental health agencies. Setting Sixteen agencies providing outpatient, day, and residential treatment to persons with SMI in Chicago, Ill. Participants Randomly selected, stratified sample of 936 patients aged 18 or older (483 men, 453 women) who were African American (n = 329), non-Hispanic white (n = 321), Hispanic (n = 270), or other race/ethnicity (n = 22). The comparison group comprised 32449 participants in the National Crime Victimization Survey. Main Outcome Measure National Crime Victimization Survey, developed by the Bureau of Justice Statistics. Results More than one quarter of persons with SMI had been victims of a violent crime in the past year, a rate more than 11 times higher than the general population rates even after controlling for demographic differences between the 2 samples (P<.001). The annual incidence of violent crime in the SMI sample (168.2 incidents per 1000 persons) is more than 4 times higher than the general population rates (39.9 incidents per 1000 persons) (P<.001). Depending on the type of violent crime (rape/sexual assault, robbery, assault, and their subcategories), prevalence was 6 to 23 times greater among persons with SMI than among the general population. 
Conclusions Crime victimization is a major public health problem among persons with SMI who are treated in the community. We recommend directions for future research, propose modifications in public policy, and suggest how the mental health system can respond to reduce victimization and its consequences. PMID:16061769
Winer, Rachel L; Tiro, Jasmin A; Miglioretti, Diana L; Thayer, Chris; Beatty, Tara; Lin, John; Gao, Hongyuan; Kimbel, Kilian; Buist, Diana S M
2018-01-01
Women who delay or do not attend Papanicolaou (Pap) screening are at increased risk for cervical cancer. Trials in countries with organized screening programs have demonstrated that mailing high-risk (hr) human papillomavirus (HPV) self-sampling kits to under-screened women increases participation, but U.S. data are lacking. HOME is a pragmatic randomized controlled trial set within a U.S. integrated healthcare delivery system to compare two programmatic approaches for increasing cervical cancer screening uptake and effectiveness in under-screened women (≥3.4 years since last Pap) aged 30-64 years: 1) usual care (annual patient reminders and ad hoc outreach by clinics) and 2) usual care plus mailed hrHPV self-screening kits. Over 2.5 years, eligible women were identified through electronic medical record (EMR) data and randomized 1:1 to the intervention or control arm. Women in the intervention arm were mailed kits with pre-paid envelopes to return samples to the central clinical laboratory for hrHPV testing. Results were documented in the EMR to notify women's primary care providers of appropriate follow-up. Primary outcomes are detection and treatment of cervical neoplasia. Secondary outcomes are cervical cancer screening uptake, abnormal screening results, and women's experiences and attitudes towards hrHPV self-sampling and follow-up of hrHPV-positive results (measured through surveys and interviews). The trial was designed to evaluate whether a programmatic strategy incorporating hrHPV self-sampling is effective in promoting adherence to the complete screening process (including follow-up of abnormal screening results and treatment). The objective of this report is to describe the rationale and design of this pragmatic trial. Copyright © 2017 Elsevier Inc. All rights reserved.
Beck, Thilo; Haasen, Christian; Verthein, Uwe; Walcher, Stephan; Schuler, Christoph; Backmund, Markus; Ruckes, Christian; Reimer, Jens
2014-01-01
Aims To compare the efficacy of slow-release oral morphine (SROM) and methadone as maintenance medication for opioid dependence in patients previously treated with methadone. Design Prospective, multiple-dose, open label, randomized, non-inferiority, cross-over study over two 11-week periods. Methadone treatment was switched to SROM with flexible dosing and vice versa according to period and sequence of treatment. Setting Fourteen out-patient addiction treatment centres in Switzerland and Germany. Participants Adults with opioid dependence in methadone maintenance programmes (dose ≥50 mg/day) for ≥26 weeks. Measurements The efficacy end-point was the proportion of heroin-positive urine samples per patient and period of treatment. Each week, two urine samples were collected, randomly selected and analysed for 6-monoacetyl-morphine and 6-acetylcodeine. Non-inferiority was concluded if the two-sided 95% confidence interval (CI) in the difference of proportions of positive urine samples was below the predefined boundary of 10%. Findings One hundred and fifty-seven patients fulfilled criteria to form the per protocol population. The proportion of heroin-positive urine samples under SROM treatment (0.20) was non-inferior to the proportion under methadone treatment (0.15) (least-squares mean difference 0.05; 95% CI = 0.02, 0.08; P > 0.01). The 95% CI fell within the 10% non-inferiority margin, confirming the non-inferiority of SROM to methadone. A dose-dependent effect was shown for SROM (i.e. decreasing proportions of heroin-positive urine samples with increasing SROM doses). Retention in treatment showed no significant differences between treatments (period 1/period 2: SROM: 88.7%/82.1%, methadone: 91.1%/88.0%; period 1: P = 0.50, period 2: P = 0.19). Overall, safety outcomes were similar between the two groups. Conclusions Slow-release oral morphine appears to be at least as effective as methadone in treating people with opioid use disorder. PMID:24304412
Ouyang, Liwen; Apley, Daniel W; Mehrotra, Sanjay
2016-04-01
Electronic medical record (EMR) databases offer significant potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that capture the relationship between a binary response variable and a set of predictor variables that represent clinical, phenotypical, and demographic data for the patient. However, EMR response data may be error prone for a variety of reasons. Performing a manual chart review to validate data accuracy is time consuming, which limits the number of chart reviews in a large database. The authors' objective is to develop a new design-of-experiments-based systematic chart validation and review (DSCVR) approach that is more powerful than the random validation sampling used in existing approaches. The DSCVR approach judiciously and efficiently selects the cases to validate (i.e., validate whether the response values are correct for those cases) for maximum information content, based only on their predictor variable values. The final predictive model will be fit using only the validation sample, ignoring the remainder of the unvalidated and unreliable error-prone data. A Fisher information based D-optimality criterion is used, and an algorithm for optimizing it is developed. The authors' method is tested in a simulation comparison that is based on a sudden cardiac arrest case study with 23 041 patients' records. This DSCVR approach, using the Fisher information based D-optimality criterion, results in a fitted model with much better predictive performance, as measured by the receiver operating characteristic curve and the accuracy in predicting whether a patient will experience the event, than a model fitted using a random validation sample. The simulation comparisons demonstrate that this DSCVR approach can produce predictive models that are significantly better than those produced from random validation sampling, especially when the event rate is low. © The Author 2015. 
Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
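The D-optimality idea behind the DSCVR approach above, picking validation cases whose predictor values maximize the determinant of the information matrix, can be illustrated with a deliberately tiny toy. The paper's actual criterion is the logistic-regression Fisher information over 23 041 records; the sketch below uses an intercept-plus-slope linear model with only two design points, where the optimum is the pair of extreme predictor values.

```python
from itertools import combinations

def best_pair(xs):
    """Choose the 2-point design maximizing det(X^T X) for y = b0 + b1*x.

    For two rows [1, x1] and [1, x2],
    det(X^T X) = 2*(x1**2 + x2**2) - (x1 + x2)**2 = (x1 - x2)**2,
    so the most-spread pair is D-optimal.
    """
    def det(pair):
        x1, x2 = pair
        # X^T X = [[2, x1 + x2], [x1 + x2, x1**2 + x2**2]]
        return 2 * (x1 ** 2 + x2 ** 2) - (x1 + x2) ** 2
    return max(combinations(xs, 2), key=det)

chosen = best_pair([0.0, 1.0, 2.0, 3.0])  # extremes carry the most information
```

The same logic, with a weight matrix reflecting the logistic variance p(1-p) and a sequential greedy search instead of exhaustive pairs, is what makes chart validation by design more informative than a random validation sample.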
Unsupervised Bayesian linear unmixing of gene expression microarrays.
Bazot, Cécile; Dobigeon, Nicolas; Tourneret, Jean-Yves; Zaas, Aimee K; Ginsburg, Geoffrey S; Hero, Alfred O
2013-03-19
This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high dimensional assays like gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples, which are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The particularity of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals have been inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real data sets considered here. The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF).
The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.
Faggion, Clovis Mariano; Giannakopoulos, Nikolaos Nikitas
2012-10-01
Most readers, reviewers, and editors rely on abstracts to decide whether to assess the full text of an article. A research abstract should, therefore, be as informative as possible. The standard of reporting in abstracts of randomized controlled trials (RCTs) in periodontology and implant dentistry has not yet been assessed. The objectives of this review are: 1) to assess the quality of reporting in abstracts of RCTs in periodontology and implant dentistry, and 2) to investigate changes in the quality of reporting by comparing samples from different periods. The authors searched the PubMed electronic database, independently and in duplicate, for abstracts of RCTs published in seven leading journals of periodontology and implant dentistry from 2005 to 2007 and from 2009 to 2011. The quality of reporting in selected abstracts with reference to the CONSORT (Consolidated Standards of Reporting Trials) for Abstracts checklist published in January 2008 was assessed independently and in duplicate. Cohen κ statistic was used to determine the extent of agreement of the reviewers. Pearson χ(2) test and/or Fisher exact test were used to assess differences in reporting in the two samples. Level of significance was set at P <0.05. Three hundred ninety-two abstracts are included in this review. Three items (intervention, objective, and conclusions) were almost fully reported in both samples. In contrast, other items (randomization, trial registration, and funding) were never reported. There were significant changes in reporting for only two items, trial design and title (items better reported in the pre- and post-CONSORT samples, respectively). Most topics, however, were similarly poorly reported in both samples of abstracts. The quality of reporting in abstracts of RCTs in periodontology and implant dentistry can be improved. 
Authors should follow the CONSORT for Abstracts guidelines, and journal editors should promote clear rules to improve authors' adherence to these guidelines.
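The review above used the Cohen κ statistic to quantify inter-reviewer agreement beyond chance. As an illustrative sketch (the function name and example ratings are hypothetical, not taken from the study), κ can be computed directly from two raters' categorical judgements:

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings of the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_observed
    is the fraction of items the raters agree on and p_expected is the
    agreement expected by chance from each rater's marginal frequencies.
    """
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical checklist judgements (1 = item reported, 0 = not reported):
reviewer_1 = [1, 1, 0, 0, 1, 0, 1, 0]
reviewer_2 = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

Values near 1 indicate near-perfect agreement; values near 0 indicate agreement no better than chance. (The sketch assumes the raters never agree perfectly, since perfect agreement with identical marginals makes the denominator zero.)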
Baseline adjustments for binary data in repeated cross-sectional cluster randomized trials.
Nixon, R M; Thompson, S G
2003-09-15
Analysis of covariance models, which adjust for a baseline covariate, are often used to compare treatment groups in a controlled trial in which individuals are randomized. Such analysis adjusts for any baseline imbalance and usually increases the precision of the treatment effect estimate. We assess the value of such adjustments in the context of a cluster randomized trial with repeated cross-sectional design and a binary outcome. In such a design, a new sample of individuals is taken from the clusters at each measurement occasion, so that baseline adjustment has to be at the cluster level. Logistic regression models are used to analyse the data, with cluster level random effects to allow for different outcome probabilities in each cluster. We compare the estimated treatment effect and its precision in models that incorporate a covariate measuring the cluster level probabilities at baseline and those that do not. In two data sets, taken from a cluster randomized trial in the treatment of menorrhagia, the value of baseline adjustment is only evident when the number of subjects per cluster is large. We assess the generalizability of these findings by undertaking a simulation study, and find that increased precision of the treatment effect requires both large cluster sizes and substantial heterogeneity between clusters at baseline, but baseline imbalance arising by chance in a randomized study can always be effectively adjusted for. Copyright 2003 John Wiley & Sons, Ltd.
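Because a repeated cross-sectional design samples new individuals at each occasion, the abstract notes that baseline adjustment has to happen at the cluster level. A minimal sketch of that idea (not the paper's actual logistic random-effects model) is a cluster-level ANCOVA: regress each cluster's follow-up log-odds on a treatment indicator and its baseline log-odds. The data below are a noise-free, hypothetical construction in which the true treatment effect is 0.5 on the log-odds scale:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system.
    n = 3
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def cluster_ancova(baseline_props, treat, followup_props):
    """OLS fit of follow-up log-odds on [intercept, treatment, baseline log-odds].

    Returns [intercept, treatment effect, baseline coefficient], solved via
    the normal equations X'X beta = X'y.
    """
    X = [[1.0, float(t), logit(p0)] for t, p0 in zip(treat, baseline_props)]
    y = [logit(p1) for p1 in followup_props]
    m = len(X)
    XtX = [[sum(X[i][a] * X[i][b] for i in range(m)) for b in range(3)] for a in range(3)]
    Xty = [sum(X[i][a] * y[i] for i in range(m)) for a in range(3)]
    return solve3(XtX, Xty)

# Six hypothetical clusters: baseline event proportions and treatment arms.
p0 = [0.20, 0.30, 0.40, 0.25, 0.35, 0.45]
treat = [0, 0, 0, 1, 1, 1]
# Follow-up proportions constructed with a true log-odds effect of 0.5.
p1 = [expit(logit(p) + 0.5 * t) for p, t in zip(p0, treat)]
beta = cluster_ancova(p0, treat, p1)  # beta[1] recovers the treatment effect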
NASA Astrophysics Data System (ADS)
Gomo, M.; Vermeulen, D.
2015-03-01
An investigation was conducted to statistically compare the influence of non-purging and purging groundwater sampling methods on analysed inorganic chemistry parameters and calculated saturation indices. Groundwater samples were collected from 15 monitoring wells drilled in Karoo aquifers before and after purging for the comparative study. For the non-purging method, samples were collected from groundwater flow zones located in the wells using electrical conductivity (EC) profiling. The two data sets of non-purged and purged groundwater samples were analysed for inorganic chemistry parameters at the Institute of Groundwater Studies (IGS) laboratory of the University of the Free State in South Africa. Saturation indices were calculated for each data set for the mineral phases found in the database of the PHREEQC hydrogeochemical model. Four one-way ANOVA tests were conducted using Microsoft Excel 2007 to investigate whether there is any statistically significant difference between: (1) all inorganic chemistry parameters measured in the non-purged and purged groundwater samples per each specific well, (2) all mineral saturation indices calculated for the non-purged and purged groundwater samples per each specific well, (3) individual inorganic chemistry parameters measured in the non-purged and purged groundwater samples across all wells, and (4) individual mineral saturation indices calculated for non-purged and purged groundwater samples across all wells. For all the ANOVA tests conducted, the calculated p-values are greater than 0.05 (the significance level) and the test statistic (F) is less than the critical value (F < Fcrit). The results imply that there was no statistically significant difference between the two data sets. With 95% confidence, it was therefore concluded that the variance between groups was due to random chance rather than to the influence of the sampling methods (the tested factor).
It may therefore be possible that in some hydrogeologic conditions, non-purged groundwater samples are just as representative as purged ones. The findings of this study can provide an important platform for future evidence-oriented research investigations to establish the necessity of purging prior to groundwater sampling in different aquifer systems.
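The study's decision rule (F < Fcrit, p > 0.05, implying no significant difference) rests on the one-way ANOVA F statistic, which compares between-group to within-group variance. As an illustrative sketch (the function and the example EC values are hypothetical, not the study's data), the F statistic can be computed directly:

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of sample groups.

    F = MS_between / MS_within, where MS_between = SS_between / (k - 1)
    and MS_within = SS_within / (N - k), for k groups and N observations
    in total. Large F (beyond the critical value for the chosen alpha)
    indicates group means differ more than within-group noise explains.
    """
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (N - k))

# Hypothetical EC readings (mS/m) from one well, before and after purging:
non_purged = [82.0, 95.5, 77.3, 88.1]
purged = [80.1, 97.0, 79.4, 86.6]
f_stat = one_way_anova_f([non_purged, purged])
```

In the study's terms, an F below the critical value Fcrit for the two groups' degrees of freedom would be consistent with the purged and non-purged samples coming from the same population.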
ERIC Educational Resources Information Center
Shire, Stephanie Y.; Chang, Ya-Chih; Shih, Wendy; Bracaglia, Suzanne; Kodjoe, Maria; Kasari, Connie
2017-01-01
Background: Interventions found to be effective in research settings are often not as effective when implemented in community settings. Considering children with autism, studies have rarely examined the efficacy of laboratory-tested interventions on child outcomes in community settings using randomized controlled designs. Methods: One hundred and…