Robust portfolio selection based on asymmetric measures of variability of stock returns
NASA Astrophysics Data System (ADS)
Chen, Wei; Tan, Shaohua
2009-10-01
This paper addresses a new uncertainty set, the interval random uncertainty set, for robust optimization. The form of the interval random uncertainty set makes it suitable for capturing the downside and upside deviations of real-world data. These deviation measures capture distributional asymmetry and lead to better optimization results. We also apply our interval random chance-constrained programming to robust mean-variance portfolio selection under interval random uncertainty sets in the elements of the mean vector and covariance matrix. Numerical experiments with real market data indicate that our approach results in better portfolio performance.
Randomization Methods in Emergency Setting Trials: A Descriptive Review
ERIC Educational Resources Information Center
Corbett, Mark Stephen; Moe-Byrne, Thirimon; Oddie, Sam; McGuire, William
2016-01-01
Background: Quasi-randomization might expedite recruitment into trials in emergency care settings but may also introduce selection bias. Methods: We searched the Cochrane Library and other databases for systematic reviews of interventions in emergency medicine or urgent care settings. We assessed selection bias (baseline imbalances) in prognostic…
Good, Andrew C; Hermsmeier, Mark A
2007-01-01
Research into the advancement of computer-aided molecular design (CAMD) has a tendency to focus on the discipline of algorithm development. Such efforts often come at the expense of the data set selection and analysis used to validate said algorithms. Here we highlight the potential problems this can cause in the context of druglikeness classification. More rigorous efforts are applied to the selection of decoy (nondruglike) molecules from the ACD. Comparisons are made between model performance using the standard technique of random test set creation and test sets derived from explicit ontological separation by drug class. The dangers of viewing druglike space as sufficiently coherent to permit simple classification are highlighted. In addition, the issues inherent in applying unfiltered data and random test set selection to (Q)SAR models utilizing large and supposedly heterogeneous databases are discussed.
Applications of random forest feature selection for fine-scale genetic population assignment.
Sylvester, Emma V A; Bentzen, Paul; Bradbury, Ian R; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G
2018-02-01
Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNPs) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90%, using each method to create panels of 50-700 markers. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than FST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using FST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
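A minimal sketch of this kind of workflow, assuming synthetic genotype and population-label arrays in place of the salmon data: SNPs are ranked once by random-forest importance, and panels of increasing size are scored by cross-validated self-assignment accuracy until a 90% threshold is reached.

```python
# Sketch: rank SNPs by random-forest importance and evaluate panels of
# increasing size by cross-validated self-assignment accuracy.
# The genotype matrix `X` and labels `y` are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 1000)).astype(float)  # genotypes coded 0/1/2
y = (X[:, 0] + X[:, 1] > 2).astype(int)                 # toy population labels

# Rank all loci once by importance from a forest grown on the full panel.
rf = RandomForestClassifier(n_estimators=250, random_state=0, n_jobs=-1)
rf.fit(X, y)
ranked = np.argsort(rf.feature_importances_)[::-1]

# Find the smallest panel (here 50-700 markers) reaching ~90% accuracy.
for panel_size in range(50, 701, 50):
    panel = ranked[:panel_size]
    acc = cross_val_score(
        RandomForestClassifier(n_estimators=250, random_state=0, n_jobs=-1),
        X[:, panel], y, cv=5).mean()
    print(panel_size, round(acc, 3))
    if acc >= 0.90:
        break
```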
Humphreys, Keith; Blodgett, Janet C; Wagner, Todd H
2014-11-01
Observational studies of Alcoholics Anonymous' (AA) effectiveness are vulnerable to self-selection bias because individuals choose whether or not to attend AA. The present study, therefore, employed an innovative statistical technique to derive a selection bias-free estimate of AA's impact. Six data sets from 5 National Institutes of Health-funded randomized trials (1 with 2 independent parallel arms) of AA facilitation interventions were analyzed using instrumental variables models. Alcohol-dependent individuals in one of the data sets (n = 774) were analyzed separately from the rest of the sample (n = 1,582 individuals pooled from 5 data sets) because of heterogeneity in sample parameters. Randomization itself was used as the instrumental variable. Randomization was a good instrument in both samples, effectively predicting increased AA attendance that could not be attributed to self-selection. In 5 of the 6 data sets, which were pooled for analysis, increased AA attendance that was attributable to randomization (i.e., free of self-selection bias) was effective at increasing days of abstinence at 3-month (B = 0.38, p = 0.001) and 15-month (B = 0.42, p = 0.04) follow-up. However, in the remaining data set, in which preexisting AA attendance was much higher, further increases in AA involvement caused by the randomly assigned facilitation intervention did not affect drinking outcomes. For most individuals seeking help for alcohol problems, increasing AA attendance leads to short- and long-term decreases in alcohol consumption that cannot be attributed to self-selection. However, for populations with high preexisting AA involvement, further increases in AA attendance may have little impact. Copyright © 2014 by the Research Society on Alcoholism.
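The instrumental-variables logic here can be illustrated with a small two-stage least squares sketch on simulated data (variable names and effect sizes are invented, not the study's): randomization predicts attendance in the first stage, and the outcome is regressed on the instrumented attendance in the second.

```python
# Sketch: two-stage least squares with random assignment as the instrument.
# Data are simulated; the true attendance effect is 0.4, and the unobserved
# self-selection factor `u` would bias a naive regression of outcome on attendance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1500
z = rng.integers(0, 2, n)                       # randomization (instrument)
u = rng.normal(size=n)                          # unobserved self-selection factor
attendance = 5 + 4 * z + 2 * u + rng.normal(size=n)
abstinent_days = 20 + 0.4 * attendance + 3 * u + rng.normal(size=n)

# Stage 1: predict attendance from randomization only.
stage1 = sm.OLS(attendance, sm.add_constant(z)).fit()
attendance_hat = stage1.fittedvalues

# Stage 2: regress the outcome on the instrumented attendance.
# (A dedicated IV routine is needed for valid standard errors; this sketch
# only recovers the point estimate, which should be close to 0.4.)
stage2 = sm.OLS(abstinent_days, sm.add_constant(attendance_hat)).fit()
print(stage2.params)
```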
Training set selection for the prediction of essential genes.
Cheng, Jian; Xu, Zhao; Wu, Wenwu; Zhao, Li; Li, Xiangchen; Liu, Yanlin; Tao, Shiheng
2014-01-01
Various computational models have been developed to transfer annotations of gene essentiality between organisms. However, despite the increasing number of microorganisms with well-characterized sets of essential genes, selection of appropriate training sets for predicting the essential genes of poorly studied or newly sequenced organisms remains challenging. In this study, a machine learning approach was applied reciprocally to predict the essential genes in 21 microorganisms. Results showed that training set selection greatly influenced predictive accuracy. We determined four criteria for training set selection: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as training sets should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. We then analyzed the performance of an incomplete training set and an integrated training set with multiple organisms. We found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Additionally, the integrated training sets exhibited a remarkable increase in stability and accuracy compared with single sets. Finally, we compared the performance of training sets selected with the four criteria against randomly selected training sets. The results revealed that a rational selection of training sets based on our criteria yields better performance than random selection. Thus, our results provide empirical guidance on training set selection for the identification of essential genes on a genome-wide scale.
ERIC Educational Resources Information Center
Felce, David; Perry, Jonathan
2004-01-01
Background: The aims were to: (i) explore the association between age and size of setting and staffing per resident; and (ii) report resident and setting characteristics, and indicators of service process and resident activity for a national random sample of staffed housing provision. Methods: Sixty settings were selected randomly from those…
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely a landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of the triggers and the LIF, the accuracy of QLSM methods differs. Moreover, how to balance the number of 0s (non-occurrence) and 1s (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1s and 0s to include in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods has been widely investigated in the literature, the challenge of training set construction has not been adequately investigated. In order to tackle this challenge, in this study three different training set selection strategies, along with the original data set, are used to test the performance of three different regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1s and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses similar to the NNS parameters. It is found that the LR-PRS, FLR-PRS and BLR-whole-data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
ERIC Educational Resources Information Center
Yu, Bing; Hong, Guanglei
2012-01-01
This study uses simulation examples representing three types of treatment assignment mechanisms in data generation (the random intercept and slopes setting, the random intercept setting, and a third setting with a cluster-level treatment and an individual-level outcome) in order to determine optimal procedures for reducing bias and improving…
Random forest models to predict aqueous solubility.
Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O
2007-01-01
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
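A minimal sketch of such a model comparison, with randomly generated descriptors and pseudo log S values standing in for the 988-molecule set (the data and descriptor count are assumptions, not the paper's):

```python
# Sketch comparing Random Forest and PLS regression for a solubility-style QSPR task.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(988, 50))                              # molecular descriptors
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=988)  # pseudo log S values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in [("RF", RandomForestRegressor(n_estimators=500, random_state=0)),
                    ("PLS", PLSRegression(n_components=10))]:
    model.fit(X_tr, y_tr)
    pred = np.ravel(model.predict(X_te))
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(name, "r2 =", round(r2_score(y_te, pred), 3), "RMSE =", round(rmse, 3))
```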
Feature Selection for Chemical Sensor Arrays Using Mutual Information
Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.
2014-01-01
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
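A minimal filter-style sketch of this idea with scikit-learn's mutual information scorer (sensor features and chemical labels are simulated, and the SVM settings are arbitrary):

```python
# Sketch: keep the k features sharing the most mutual information with the
# class label, then classify with an SVM, in the spirit of the filter approach above.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 60))            # responses of 60 sensor features
y = (X[:, 3] + X[:, 17] > 0).astype(int)  # chemical identity (toy labels)

pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=10),   # mutual-information filter
    SVC(kernel="rbf"))
print(cross_val_score(pipe, X, y, cv=5).mean())
```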
Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A
2017-09-15
Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations (p ≫ n), these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy-preserving classification and that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code available at http://insilico.utulsa.edu/software/privateEC. brett-mckinney@utulsa.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
NASA Astrophysics Data System (ADS)
Goudarzi, Nasser
2016-04-01
In this work, two new and powerful chemometrics methods are applied for the modeling and prediction of the 19F chemical shift values of some fluorinated organic compounds. The radial basis function-partial least squares (RBF-PLS) method and random forest (RF) are employed to construct models to predict the 19F chemical shifts. No separate variable selection method was used, since the RF method can serve as both a variable selection and a modeling technique. The effects of important parameters governing the RF prediction power, such as the number of trees (nt) and the number of randomly selected variables used to split each node (m), were investigated. The root-mean-square errors of prediction (RMSEP) for the training set and the prediction set for the RBF-PLS and RF models were 44.70, 23.86, 29.77, and 23.69, respectively. Also, the correlation coefficients of the prediction set for the RBF-PLS and RF models were 0.8684 and 0.9313, respectively. The results obtained reveal that the RF model can be used as a powerful chemometrics tool for quantitative structure-property relationship (QSPR) studies.
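A small sketch of how the two RF parameters discussed above, nt and m (`n_estimators` and `max_features` in scikit-learn terms), can be tuned for a regression target; the descriptors and shift values below are synthetic placeholders.

```python
# Sketch: grid search over the number of trees (nt) and the number of
# randomly selected split variables (m) for an RF regression model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 30))                            # descriptors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.3, size=150)  # pseudo 19F shifts

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300, 500],          # nt
                "max_features": [5, 10, 20, "sqrt"]},      # m
    scoring="neg_root_mean_squared_error", cv=5)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)
```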
ERIC Educational Resources Information Center
Rafferty, Karen; Watson, Patrice; Lappe, Joan M.
2011-01-01
Objective: To assess the impact of calcium-fortified food and dairy food on selected nutrient intakes in the diets of adolescent girls. Design: Randomized controlled trial, secondary analysis. Setting and Participants: Adolescent girls (n = 149) from a midwestern metropolitan area participated in randomized controlled trials of bone physiology…
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered among the most promising approaches for medical data classification, and the performance of a decision tree classifier can be increased by ensemble methods, which have been shown to outperform single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensemble methods, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, J48, J48graft and Logistic Model Tree (LMT), each selected independently. The empirical study shows that performance varies when different base classifiers are selected, and overfitting was also noted in some cases. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve the performance on the selected medical data sets.
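A sketch of this kind of comparison using scikit-learn analogues (the Weka classifiers J48, J48graft and LMT have no direct scikit-learn equivalents, so a decision tree, a random forest and logistic regression stand in; the medical data are replaced by a synthetic set):

```python
# Sketch: AdaBoost and Bagging ensembles built over different base classifiers,
# compared by cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

bases = {"DecisionTree": DecisionTreeClassifier(max_depth=3),
         "RandomForest": RandomForestClassifier(n_estimators=50),
         "Logistic": LogisticRegression(max_iter=1000)}

for name, base in bases.items():
    for ens_name, ens in [("AdaBoost", AdaBoostClassifier(base, n_estimators=50)),
                          ("Bagging", BaggingClassifier(base, n_estimators=50))]:
        acc = cross_val_score(ens, X, y, cv=5).mean()
        print(ens_name, "+", name, round(acc, 3))
```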
Educational Research with Real-World Data: Reducing Selection Bias with Propensity Scores
ERIC Educational Resources Information Center
Adelson, Jill L.
2013-01-01
Often it is infeasible or unethical to use random assignment in educational settings to study important constructs and questions. Hence, educational research often uses observational data, such as large-scale secondary data sets and state and school district data, and quasi-experimental designs. One method of reducing selection bias in estimations…
Selective Influence through Conditional Independence.
ERIC Educational Resources Information Center
Dzhafarov, Ehtibar N.
2003-01-01
Presents a generalization and improvement for the definition proposed by E. Dzhafarov (2001) for selectiveness in the dependence of several random variables on several (sets of) external factors. This generalization links the notion of selective influence with that of conditional independence. (SLD)
Does rational selection of training and test sets improve the outcome of QSAR modeling?
Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander
2012-10-22
Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
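As an illustration of one of the rational division methods named above, here is a small Kennard-Stone sketch on a synthetic descriptor matrix (the maximin selection rule is standard; the data are placeholders):

```python
# Sketch of the Kennard-Stone algorithm: training samples are added one at a
# time, each being the candidate farthest from the samples already selected.
import numpy as np
from scipy.spatial.distance import cdist

def kennard_stone(X, n_train):
    """Return indices of `n_train` samples chosen by Kennard-Stone."""
    dist = cdist(X, X)
    # Start with the two most distant samples.
    selected = list(np.unravel_index(np.argmax(dist), dist.shape))
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_train:
        # For each candidate, its distance to the closest selected sample;
        # pick the candidate for which this distance is largest.
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(d_min))))
    return np.array(selected)

X = np.random.default_rng(0).normal(size=(100, 8))   # descriptor matrix
train_idx = kennard_stone(X, 80)
test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
```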
Evaluation of variable selection methods for random forests and omics data sets.
Degenhardt, Frauke; Seifert, Stephan; Szymczak, Silke
2017-10-16
Machine learning methods and in particular random forests are promising approaches for prediction based on high dimensional omics data sets. They provide variable importance measures to rank predictors according to their predictive power. If building a prediction model is the main goal of a study, often a minimal set of variables with good prediction performance is selected. However, if the objective is the identification of involved variables to find active networks and pathways, approaches that aim to select all relevant variables should be preferred. We evaluated several variable selection procedures based on simulated data as well as publicly available experimental methylation and gene expression data. Our comparison included the Boruta algorithm, the Vita method, recurrent relative variable importance, a permutation approach and its parametric variant (Altmann) as well as recursive feature elimination (RFE). In our simulation studies, Boruta was the most powerful approach, followed closely by the Vita method. Both approaches demonstrated similar stability in variable selection, while Vita was the most robust approach under a pure null model without any predictor variables related to the outcome. In the analysis of the different experimental data sets, Vita demonstrated slightly better stability in variable selection and was less computationally intensive than Boruta. In conclusion, we recommend the Boruta and Vita approaches for the analysis of high-dimensional data sets. Vita is considerably faster than Boruta and thus more suitable for large data sets, but only Boruta can also be applied in low-dimensional settings. © The Author 2017. Published by Oxford University Press.
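For reference, a minimal all-relevant selection sketch using the third-party BorutaPy implementation wrapped around a random forest; the package name, its availability and version behavior, and the synthetic omics-style data are assumptions here, not part of the study.

```python
# Sketch: all-relevant feature selection with the Boruta algorithm via the
# third-party `boruta` (BorutaPy) package.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy  # assumes the boruta package is installed

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))                 # e.g. methylation/expression features
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
boruta = BorutaPy(rf, n_estimators="auto", random_state=0)
boruta.fit(X, y)                                # expects numpy arrays, not DataFrames

confirmed = np.where(boruta.support_)[0]        # features deemed relevant
print(confirmed)
```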
Crampin, A C; Mwinuka, V; Malema, S S; Glynn, J R; Fine, P E
2001-01-01
Selection bias, particularly of controls, is common in case-control studies and may materially affect the results. Methods of control selection should be tailored both for the risk factors and disease under investigation and for the population being studied. We present here a control selection method devised for a case-control study of tuberculosis in rural Africa (Karonga, northern Malawi) that selects an age/sex frequency-matched random sample of the population, with a geographical distribution in proportion to the population density. We also present an audit of the selection process, and discuss the potential of this method in other settings.
Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H
2017-07-01
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
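A compact sketch of a backward-elimination loop of the kind examined above, on a synthetic stand-in for the stream-condition data; note that, as the paper cautions, OOB accuracy computed inside such a loop tends to be optimistically biased.

```python
# Sketch: backward elimination for a random forest, repeatedly dropping the
# least important 10% of predictors and tracking out-of-bag (OOB) accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1365, n_features=212, n_informative=15,
                           random_state=0)   # stand-in for good/poor condition data
features = np.arange(X.shape[1])

while len(features) > 10:
    rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=0, n_jobs=-1)
    rf.fit(X[:, features], y)
    # OOB estimates inside the elimination loop are upwardly biased; use an
    # external cross-validation for an honest accuracy estimate.
    print(len(features), round(rf.oob_score_, 3))
    order = np.argsort(rf.feature_importances_)
    features = features[order[int(0.1 * len(features)):]]
```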
Extraordinarily Adaptive Properties of the Genetically Encoded Amino Acids
Ilardo, Melissa; Meringer, Markus; Freeland, Stephen; Rasulev, Bakhtiyor; Cleaves II, H. James
2015-01-01
Using novel advances in computational chemistry, we demonstrate that the set of 20 genetically encoded amino acids, used nearly universally to construct all coded terrestrial proteins, has been highly influenced by natural selection. We defined an adaptive set of amino acids as one whose members thoroughly cover relevant physico-chemical properties, or “chemistry space.” Using this metric, we compared the encoded amino acid alphabet to random sets of amino acids. These random sets were drawn from a computationally generated compound library containing 1913 alternative amino acids that lie within the molecular weight range of the encoded amino acids. Sets that cover chemistry space better than the genetically encoded alphabet are extremely rare and energetically costly. Further analysis of more adaptive sets reveals common features and anomalies, and we explore their implications for synthetic biology. We present these computations as evidence that the set of 20 amino acids found within the standard genetic code is the result of considerable natural selection. The amino acids used for constructing coded proteins may represent a largely global optimum, such that any aqueous biochemistry would use a very similar set. PMID:25802223
ERIC Educational Resources Information Center
Norris, Susan L.; Holmer, Haley K.; Fu, Rongwei; Ogden, Lauren A.; Viswanathan, Meera S.; Abou-Setta, Ahmed M.
2014-01-01
Objective: This study aimed to examine selective outcome reporting (SOR) and selective analysis reporting (SAR) in randomized controlled trials (RCTs) and to explore the usefulness of trial registries for identifying SOR and SAR. Study Design and Setting: We selected one "index outcome" for each of three comparative effectiveness reviews…
Shimoni, Yishai
2018-02-01
One of the goals of cancer research is to identify a set of genes that cause or control disease progression. However, although multiple such gene sets were published, these are usually in very poor agreement with each other, and very few of the genes proved to be functional therapeutic targets. Furthermore, recent findings from a breast cancer gene-expression cohort showed that sets of genes selected randomly can be used to predict survival with a much higher probability than expected. These results imply that many of the genes identified in breast cancer gene expression analysis may not be causal of cancer progression, even though they can still be highly predictive of prognosis. We performed a similar analysis on all the cancer types available in the cancer genome atlas (TCGA), namely, estimating the predictive power of random gene sets for survival. Our work shows that most cancer types exhibit the property that random selections of genes are more predictive of survival than expected. In contrast to previous work, this property is not removed by using a proliferation signature, which implies that proliferation may not always be the confounder that drives this property. We suggest one possible solution in the form of data-driven sub-classification to reduce this property significantly. Our results suggest that the predictive power of random gene sets may be used to identify the existence of sub-classes in the data, and thus may allow better understanding of patient stratification. Furthermore, by reducing the observed bias this may allow more direct identification of biologically relevant, and potentially causal, genes.
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets, namely a subset of the TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
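A small sketch of unsupervised leverage-score sampling of features (the rank parameter, sample size and data below are arbitrary choices for illustration, not the paper's settings):

```python
# Sketch: sample features with probability proportional to their leverage
# scores, computed from the top right singular vectors of the data matrix.
import numpy as np

rng = np.random.default_rng(0)
n, d, k, n_keep = 200, 1000, 20, 100
A = rng.normal(size=(n, d))                    # rows = samples, columns = features

# Leverage score of feature j: squared column norm of the top-k right
# singular vectors, normalized to sum to one.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
scores = (Vt[:k] ** 2).sum(axis=0)
probs = scores / scores.sum()

# Sample (with replacement) and rescale the kept columns, the usual
# importance-sampling correction for sketching.
cols = rng.choice(d, size=n_keep, replace=True, p=probs)
A_sampled = A[:, cols] / np.sqrt(n_keep * probs[cols])
```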
Yang, Lanlin; Cai, Sufen; Zhang, Shuoping; Kong, Xiangyi; Gu, Yifan; Lu, Changfu; Dai, Jing; Gong, Fei; Lu, Guangxiu; Lin, Ge
2018-05-01
Does single cleavage-stage (Day 3) embryo transfer using a time-lapse (TL) hierarchical classification model achieve comparable ongoing pregnancy rates (OPR) to single blastocyst (Day 5) transfer by conventional morphological (CM) selection? Day 3 single embryo transfer (SET) with a hierarchical classification model had a significantly lower OPR compared with Day 5 SET with CM selection. Cleavage-stage SET is an alternative to blastocyst SET. Time-lapse imaging assists better embryo selection, based on studies of pregnancy outcomes when adding time-lapse imaging to CM selection at the cleavage or blastocyst stage. This single-centre, randomized, open-label, active-controlled, non-inferiority study included 600 women between October 2015 and April 2017. Eligible patients were Chinese females, aged ≤36 years, who were undergoing their first or second fresh IVF cycle using their own oocytes, and who had FSH levels ≤12 IU/mL on Day 3 of the cycle and 10 or more oocytes retrieved. Patients who had underlying uterine conditions, oocyte donation, recurrent pregnancy loss, abnormal oocytes or <6 normally fertilized embryos (2PN) were excluded from study participation. Patients were randomized 1:1 to either cleavage-stage SET with a time-lapse hierarchical classification model for selection (D3 + TL) or blastocyst SET with CM selection (D5 + CM). All normally fertilized zygotes were cultured in Primo Vision. The study was conducted at a tertiary IVF centre (CITIC-Xiangya) and OPR was the primary outcome. A total of 600 patients were randomized to the two groups, among which 585 (D3 + TL = 290, D5 + CM = 295) were included in the modified intention-to-treat (mITT) population and 517 (D3 + TL = 261, D5 + CM = 256) were included in the per protocol (PP) population. In the PP population, OPR was significantly lower in the D3 group (59.4%, 155/261) than in the D5 group (68.4%, 175/256) (difference: -9.0%, 95% CI: -17.1%, -0.7%, P = 0.03). Analysis in the mITT population showed a marginally significant difference in the OPR between the D3 + TL and D5 + CM groups (56.6 versus 64.1%, difference: -7.5%, 95% CI: -15.4%, 0.4%, P = 0.06). The D3 + TL group resulted in a markedly lower implantation rate than the D5 + CM group (64.4 versus 77.0%; P = 0.002) in the PP analysis; however, the early miscarriage rate did not differ significantly between the two groups. The study lacked a direct comparison between time-lapse and CM selection at cleavage-stage SET and was statistically underpowered to detect non-inferiority. The eligibility criteria, which favoured women with a good prognosis for IVF, weakened the generalizability of the results. The OPR from Day 3 cleavage-stage SET using hierarchical classification time-lapse selection was significantly lower than that from Day 5 blastocyst SET using conventional morphology, yet it appeared to be clinically acceptable in women undergoing IVF. This study is supported by grants from Ferring Pharmaceuticals and the Program for New Century Excellent Talents in University, China. ChiCTR-ICR-15006600. 16 June 2015. 1 October 2015.
What is the Optimal Strategy for Adaptive Servo-Ventilation Therapy?
Imamura, Teruhiko; Kinugawa, Koichiro
2018-05-23
Clinical advantages of adaptive servo-ventilation (ASV) therapy have been reported in selected heart failure patients with or without sleep-disordered breathing, whereas multicenter randomized controlled trials could not demonstrate such advantages. Considering this discrepancy, optimal patient selection and device setting may be a key to successful ASV therapy. Hemodynamic and echocardiographic parameters indicating pulmonary congestion, such as elevated pulmonary capillary wedge pressure, have been reported as predictors of a good response to ASV therapy. Recently, parameters indicating right ventricular dysfunction have also been reported as good predictors. An optimal device setting, with an appropriate pressure setting applied for an appropriate duration, may also be a key. A large-scale prospective trial with optimal patient selection and optimal device setting is warranted.
Zhan, Xue-yan; Zhao, Na; Lin, Zhao-zhou; Wu, Zhi-sheng; Yuan, Rui-juan; Qiao, Yan-jiang
2014-12-01
The appropriate algorithm for calibration set selection is one of the key technologies for a good NIR quantitative model. Different algorithms are available for calibration set selection, such as the Random Sampling (RS) algorithm, the Conventional Selection (CS) algorithm, the Kennard-Stone (KS) algorithm and the Sample set Partitioning based on joint x-y distance (SPXY) algorithm. However, systematic comparisons among these algorithms are lacking. In the present paper, NIR quantitative models to determine the asiaticoside content in Centella total glucosides were established, for which 7 indexes were classified and selected, and the effects of the CS, KS and SPXY algorithms for calibration set selection on the accuracy and robustness of the NIR quantitative models were investigated. The accuracy indexes of the NIR quantitative models with calibration sets selected by the SPXY algorithm were significantly different from those with calibration sets selected by the CS or KS algorithm, while the robustness indexes, such as RMSECV and |RMSEP-RMSEC|, were not significantly different. Therefore, the SPXY algorithm for calibration set selection can improve the predictive accuracy of NIR quantitative models for determining asiaticoside content in Centella total glucosides, and has no significant effect on the robustness of the models, which provides a reference for determining the appropriate calibration set selection algorithm when NIR quantitative models are established for solid systems of traditional Chinese medicine.
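The SPXY rule differs from Kennard-Stone only in the distance it maximizes, which combines normalized x-space and y-space distances; a minimal sketch follows (spectra and content values are synthetic placeholders):

```python
# Sketch of SPXY calibration set selection: Kennard-Stone-style maximin
# selection on a joint distance d = dx/max(dx) + dy/max(dy).
import numpy as np
from scipy.spatial.distance import cdist

def spxy(X, y, n_cal):
    """Return indices of `n_cal` calibration samples selected by SPXY."""
    dx = cdist(X, X)
    dy = cdist(y.reshape(-1, 1), y.reshape(-1, 1))
    d = dx / dx.max() + dy / dy.max()
    selected = list(np.unravel_index(np.argmax(d), d.shape))
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_cal:
        d_min = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(d_min))))
    return np.array(selected)

rng = np.random.default_rng(0)
spectra = rng.normal(size=(120, 200))          # NIR spectra (samples x wavelengths)
content = rng.uniform(1, 5, size=120)          # reference content values (toy)
cal_idx = spxy(spectra, content, 80)
```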
Random sampling of elementary flux modes in large-scale metabolic networks.
Machado, Daniel; Soons, Zita; Patil, Kiran Raosaheb; Ferreira, Eugénio C; Rocha, Isabel
2012-09-15
The description of a metabolic network in terms of elementary (flux) modes (EMs) provides an important framework for metabolic pathway analysis. However, their application to large networks has been hampered by the combinatorial explosion in the number of modes. In this work, we develop a method for generating random samples of EMs without computing the whole set. Our algorithm is an adaptation of the canonical basis approach, where we add an additional filtering step which, at each iteration, selects a random subset of the new combinations of modes. In order to obtain an unbiased sample, all candidates are assigned the same probability of getting selected. This approach avoids the exponential growth of the number of modes during computation, thus generating a random sample of the complete set of EMs within reasonable time. We generated samples of different sizes for a metabolic network of Escherichia coli, and observed that they preserve several properties of the full EM set. It is also shown that EM sampling can be used for rational strain design. A well distributed sample, that is representative of the complete set of EMs, should be suitable to most EM-based methods for analysis and optimization of metabolic networks. Source code for a cross-platform implementation in Python is freely available at http://code.google.com/p/emsampler. dmachado@deb.uminho.pt Supplementary data are available at Bioinformatics online.
Comparative study of feature selection with ensemble learning using SOM variants
NASA Astrophysics Data System (ADS)
Filali, Ameni; Jlassi, Chiraz; Arous, Najet
2017-03-01
Ensemble learning has succeeded in improving stability and clustering accuracy, but its runtime prohibits scaling up to real-world applications. This study addresses the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is another extension of the Random Forests approach, using self-organizing map (SOM) variants on unlabeled data, that estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims at dimensionality reduction, visualization and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvements in clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.
Sampling Large Graphs for Anticipatory Analytics
Edwards, Lauren; Johnson, Luke; Milosavljevic, Maja; Gadepally, Vijay; Miller, Benjamin A. (Lincoln Laboratory)
2015-05-15
Random area sampling [8] is a "snowball" sampling method in which a set of random seed vertices are selected and areas [...] systems, greater human-in-the-loop involvement, or through complex algorithms. We are investigating the use of sampling to mitigate these challenges.
Kangovi, Shreya; Mitra, Nandita; Turr, Lindsey; Huo, Hairong; Grande, David; Long, Judith A.
2017-01-01
Upstream interventions, such as housing programs and community health worker interventions, address socioeconomic and behavioral factors that influence health outcomes across diseases. Studying these types of interventions in clinical trials raises a methodological challenge: how should researchers measure the effect of an upstream intervention in a sample of patients with different diseases? This paper addresses this question using an illustrative protocol of a randomized controlled trial of collaborative goal-setting versus goal-setting plus community health worker support among patients with multiple chronic diseases: diabetes, obesity, hypertension and tobacco dependence. At study enrollment, patients met with their primary care providers to select one of their chronic diseases to focus on during the study and to collaboratively set a goal for that disease. Patients randomly assigned to a community health worker also received six months of support to address socioeconomic and behavioral barriers to chronic disease control. The primary hypothesis was that there would be differences in patients' selected chronic disease control, as measured by HbA1c, body mass index, systolic blood pressure and cigarettes per day, between the goal-setting alone and community health worker support arms. To test this hypothesis, we will conduct a stratum-specific multivariate analysis of variance, which allows all patients (regardless of their selected chronic disease) to be included in a single model for the primary outcome. Population health researchers can use this approach to measure clinical outcomes across diseases. PMID:27965180
Randomness Testing of the Advanced Encryption Standard Finalist Candidates
2000-03-28
[Table residue: per-test column listing for Random Excursions Variant, Rank, Serial, Spectral DFT, Lempel-Ziv Compression, Aperiodic Templates and Linear Complexity] [...] 256 bits) for each of the algorithms, for a total of 80 different data sets. These data sets were selected based on the belief that they would be useful in evaluating the randomness of cryptographic algorithms. Table 2 lists the eight data types. For a description of the data types, see Appendix
Janet, Jon Paul; Kulik, Heather J
2017-11-22
Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.
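A compact sketch of comparing two of the feature-selection routes mentioned above, LASSO and random-forest importance, on synthetic stand-ins for RAC descriptors and spin-splitting energies (all names and sizes are illustrative):

```python
# Sketch: prune a large descriptor set by LASSO and by random-forest
# importance, then compare the selected subsets.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 150))                    # RAC-style descriptors
y = 3 * X[:, 0] - 2 * X[:, 5] + X[:, 9] + rng.normal(scale=0.2, size=300)

lasso = LassoCV(cv=5).fit(X, y)
lasso_subset = np.flatnonzero(lasso.coef_)         # features with nonzero weight

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
rf_subset = np.argsort(rf.feature_importances_)[::-1][:len(lasso_subset)]

print("LASSO keeps", len(lasso_subset), "features; overlap with RF ranking:",
      len(set(lasso_subset) & set(rf_subset)))
```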
Adiposity and Quality of Life: A Case Study from an Urban Center in Nigeria
ERIC Educational Resources Information Center
Akinpelu, Aderonke O.; Akinola, Odunayo T.; Gbiri, Caleb A.
2009-01-01
Objective: To determine relationship between adiposity indices and quality of life (QOL) of residents of a housing estate in Lagos, Nigeria. Design: Cross-sectional survey employing multistep random sampling method. Setting: Urban residential estate. Participants: This study involved 900 randomly selected residents of Abesan Housing Estate, Lagos,…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peressutti, D; Schipaanboord, B; Kadir, T
Purpose: To investigate the effectiveness of atlas selection methods for improving atlas-based auto-contouring in radiotherapy planning. Methods: 275 H&N clinically delineated cases were employed as an atlas database from which atlases would be selected. A further 40 previously contoured cases were used as test patients against which atlas selection could be performed and evaluated. 26 variations of selection methods proposed in the literature and used in commercial systems were investigated. Atlas selection methods comprised either global or local image similarity measures, computed after rigid or deformable registration, combined with direct atlas search or with an intermediate template image. Workflow Box (Mirada Medical, Oxford, UK) was used for all auto-contouring. Results on brain, brainstem, parotids and spinal cord were compared to random selection, a fixed set of 10 “good” atlases, and optimal selection by an “oracle” with knowledge of the ground truth. The Dice score and the average ranking with respect to the “oracle” were employed to assess the performance of the top 10 atlases selected by each method. Results: The fixed set of “good” atlases outperformed all of the atlas-patient image similarity-based selection methods (mean Dice 0.715 c.f. 0.603 to 0.677). In general, methods based on exhaustive comparison of local similarity measures showed better average Dice scores (0.658 to 0.677) compared to the use of either a template image (0.655 to 0.672) or global similarity measures (0.603 to 0.666). The performance of image-based selection methods was found to be only slightly better than random (0.645). Dice scores given relate to the left parotid, but similar result patterns were observed for all organs. Conclusion: Intuitively, atlas selection based on the patient CT is expected to improve auto-contouring performance. However, it was found that published approaches performed only marginally better than random, and use of a fixed set of representative atlases showed favourable performance. This research was funded via InnovateUK Grant 600277 as part of Eurostars Grant E!9297. DP, BS, MG, TK are employees of Mirada Medical Ltd.
Unbiased feature selection in learning random forests for high-dimensional data.
Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi
2015-01-01
Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
SNP selection and classification of genome-wide SNP data using stratified sampling random forests.
Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K
2012-09-01
For high-dimensional genome-wide association (GWA) case-control data of complex disease, there is usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant to the disease. A simple random sampling method in random forest using the default mtry parameter to choose the feature subspace will select too many subspaces without informative SNPs. Exhaustively searching for an optimal mtry is often required in order to include useful and relevant SNPs and to get rid of the vast number of non-informative SNPs. However, this is too time-consuming and not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure is that it ensures each subspace contains enough useful SNPs, while avoiding the very high computational cost of an exhaustive search for an optimal mtry and maintaining the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprising 408 803 SNPs and Alzheimer case-control data comprising 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and that it can generate a better random forest with higher accuracy and a lower error bound than Breiman's random forest generation method. For the Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders and warrant further biological investigation.
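A sketch of the stratified subspace idea on simulated genotypes: SNPs are binned into equal-width informativeness groups (a chi-squared score stands in for the paper's informativeness measure), and each tree draws the same number of SNPs from every group.

```python
# Sketch: stratified feature-subspace sampling so that every tree's subspace
# contains SNPs from each informativeness group.
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(400, 5000))          # genotypes coded 0/1/2
y = rng.integers(0, 2, size=400)                  # case/control status (toy)

# Informativeness score per SNP (chi-squared statistic as a stand-in).
score, _ = chi2(X, y)

# Equal-width discretization of the scores into 5 groups.
bins = np.linspace(score.min(), score.max(), 6)
group = np.clip(np.digitize(score, bins) - 1, 0, 4)

def stratified_subspace(n_per_group=20):
    """Draw the same number of SNPs from each non-empty informativeness group."""
    idx = []
    for g in range(5):
        members = np.flatnonzero(group == g)
        if members.size:
            idx.append(rng.choice(members, size=min(n_per_group, members.size),
                                  replace=False))
    return np.concatenate(idx)

subspace = stratified_subspace()   # features for one decision tree of the forest
```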
Consistency of Toddler Engagement across Two Settings
ERIC Educational Resources Information Center
Aguiar, Cecilia; McWilliam, R. A.
2013-01-01
This study documented the consistency of child engagement across two settings, toddler child care classrooms and mother-child dyadic play. One hundred twelve children, aged 14-36 months (M = 25.17, SD = 6.06), randomly selected from 30 toddler child care classrooms from the district of Porto, Portugal, participated. Levels of engagement were…
2012-01-01
Background: Single embryo transfer (SET) remains underutilized as a strategy to reduce multiple gestation risk in IVF, and its overall lower pregnancy rate underscores the need for improved techniques to select one embryo for fresh transfer. This study explored use of comprehensive chromosomal screening by array CGH (aCGH) to provide this advantage and improve pregnancy rate from SET. Methods: First-time IVF patients with a good prognosis (age <35, no prior miscarriage) and normal karyotype seeking elective SET were prospectively randomized into two groups: In Group A, embryos were selected on the basis of morphology and comprehensive chromosomal screening via aCGH (from d5 trophectoderm biopsy) while Group B embryos were assessed by morphology only. All patients had a single fresh blastocyst transferred on d6. Laboratory parameters and clinical pregnancy rates were compared between the two groups. Results: For patients in Group A (n = 55), 425 blastocysts were biopsied and analyzed via aCGH (7.7 blastocysts/patient). Aneuploidy was detected in 191/425 (44.9%) of blastocysts in this group. For patients in Group B (n = 48), 389 blastocysts were microscopically examined (8.1 blastocysts/patient). Clinical pregnancy rate was significantly higher in the morphology + aCGH group compared to the morphology-only group (70.9 and 45.8%, respectively; p = 0.017); ongoing pregnancy rate for Groups A and B were 69.1 vs. 41.7%, respectively (p = 0.009). There were no twin pregnancies. Conclusion: Although aCGH followed by frozen embryo transfer has been used to screen at risk embryos (e.g., known parental chromosomal translocation or history of recurrent pregnancy loss), this is the first description of aCGH fully integrated with a clinical IVF program to select single blastocysts for fresh SET in good prognosis patients. The observed aneuploidy rate (44.9%) among biopsied blastocysts highlights the inherent imprecision of SET when conventional morphology is used alone. Embryos randomized to the aCGH group implanted with greater efficiency, resulted in clinical pregnancy more often, and yielded a lower miscarriage rate than those selected without aCGH. Additional studies are needed to verify our pilot data and confirm a role for on-site, rapid aCGH for IVF patients contemplating fresh SET. PMID:22551456
Overlapping meta-analyses on the same topic: survey of published studies.
Siontis, Konstantinos C; Hernandez-Boussard, Tina; Ioannidis, John P A
2013-07-19
To assess how common it is to have multiple overlapping meta-analyses of randomized trials published on the same topic. Survey of published meta-analyses. PubMed. Meta-analyses published in 2010 were identified, and 5% of them were randomly selected. We further selected those that included randomized trials and examined effectiveness of any medical intervention. For eligible meta-analyses, we searched for other meta-analyses on the same topic (covering the same comparisons, indications/settings, and outcomes or overlapping subsets of them) published until February 2013. Of 73 eligible meta-analyses published in 2010, 49 (67%) had at least one other overlapping meta-analysis (median two meta-analyses per topic, interquartile range 1-4, maximum 13). In 17 topics at least one author was involved in at least two of the overlapping meta-analyses. No characteristics of the index meta-analyses were associated with the potential for overlapping meta-analyses. Among pairs of overlapping meta-analyses in 20 randomly selected topics, 13 of the more recent meta-analyses did not include any additional outcomes. In three of the four topics with eight or more published meta-analyses, many meta-analyses examined only a subset of the eligible interventions or indications/settings covered by the index meta-analysis. Conversely, for statins in the prevention of atrial fibrillation after cardiac surgery, 11 meta-analyses were published with similar eligibility criteria for interventions and setting: there was still variability on which studies were included, but the results were always similar or even identical across meta-analyses. While some independent replication of meta-analyses by different teams is possibly useful, the overall picture suggests that there is a waste of efforts with many topics covered by multiple overlapping meta-analyses.
Fuzzy Random λ-Mean SAD Portfolio Selection Problem: An Ant Colony Optimization Approach
NASA Astrophysics Data System (ADS)
Thakur, Gour Sundar Mitra; Bhattacharyya, Rupak; Mitra, Swapan Kumar
2010-10-01
To reach an investment goal, one has to select a combination of securities from portfolios containing a large number of securities. Past records of a security alone do not guarantee its future return. Because many uncertain factors directly or indirectly influence the stock market, and some newer stock markets do not have enough historical data, experts' expectations and experience must be combined with past records to generate an effective portfolio selection model. In this paper the return of each security is assumed to be a Fuzzy Random Variable Set (FRVS), where returns are sets of random numbers that are in turn fuzzy numbers. A new λ-Mean Semi Absolute Deviation (λ-MSAD) portfolio selection model is developed. The investors' subjective opinions on the rate of return of each security are taken into consideration by introducing a pessimistic-optimistic parameter vector λ. The λ-MSAD model is preferred because it uses the absolute deviation of the portfolio's rate of return, rather than the variance, as the measure of risk. Since the model reduces to a Linear Programming Problem (LPP), it can be solved much faster than quadratic programming problems. Ant Colony Optimization (ACO), a paradigm for designing meta-heuristic algorithms for combinatorial optimization problems, is used to solve the portfolio selection problem. Data from the BSE are used for illustration.
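As a rough illustration of the deterministic core of this formulation, the following Python sketch solves a crisp mean semi-absolute-deviation portfolio LP with scipy; the fuzzy random returns, the λ parameter vector and the ant colony solver are not reproduced, and the return scenarios and target return are made-up assumptions.

```python
# Minimal sketch: crisp mean semi-absolute-deviation (MSAD) portfolio LP, the
# deterministic core of the lambda-MSAD model described above. The fuzzy random
# returns and the ant colony solver are not reproduced; the historical return
# scenarios and the target return are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
T, n = 60, 5                      # 60 return scenarios, 5 securities
returns = rng.normal(0.01, 0.05, size=(T, n))
mu = returns.mean(axis=0)
target = 0.008                    # required expected portfolio return

# Decision vector z = [x_1..x_n, d_1..d_T]; minimize (1/T) * sum(d_t)
c = np.concatenate([np.zeros(n), np.full(T, 1.0 / T)])

# Downside-deviation constraints: (mu - r_t) @ x <= d_t for every scenario t
A_ub = np.hstack([-(returns - mu), -np.eye(T)])
b_ub = np.zeros(T)
# Expected-return constraint: mu @ x >= target
A_ub = np.vstack([A_ub, np.concatenate([-mu, np.zeros(T)])])
b_ub = np.append(b_ub, -target)

# Budget constraint: weights sum to one
A_eq = np.concatenate([np.ones(n), np.zeros(T)]).reshape(1, -1)
b_eq = [1.0]

bounds = [(0, 1)] * n + [(0, None)] * T
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("weights:", np.round(res.x[:n], 3), "MSAD:", round(res.fun, 5))
```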
Polynomial order selection in random regression models via penalizing adaptively the likelihood.
Corrales, J D; Munilla, S; Cantet, R J C
2015-08-01
Orthogonal Legendre polynomials (LP) are used to model the shape of additive genetic and permanent environmental effects in random regression models (RRM). Frequently, the Akaike (AIC) and the Bayesian (BIC) information criteria are employed to select LP order. However, it has been theoretically shown that neither AIC nor BIC is simultaneously optimal in terms of consistency and efficiency. Thus, the goal was to introduce a method, 'penalizing adaptively the likelihood' (PAL), as a criterion to select LP order in RRM. Four simulated data sets and real data (60,513 records, 6675 Colombian Holstein cows) were employed. Nested models were fitted to the data, and AIC, BIC and PAL were calculated for all of them. Results showed that PAL and BIC identified the true LP order for the additive genetic and permanent environmental effects with probability one, but AIC tended to favour over-parameterized models. Conversely, when the true model was unknown, PAL selected the best model with higher probability than AIC. In the latter case, BIC never favoured the best model. To summarize, PAL selected a correct model order regardless of whether the 'true' model was within the set of candidates. © 2015 Blackwell Verlag GmbH.
Ma, Li; Fan, Suohai
2017-03-14
The random forests algorithm is a classifier with broad applicability and robustness against overfitting, but it still has some drawbacks. Therefore, to improve the performance of random forests, this paper addresses imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that combining Clustering Using Representatives (CURE) with the original synthetic minority oversampling technique (SMOTE) is effective compared with classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid RF (random forests) algorithm is proposed for feature selection and parameter optimization, which uses the minimum out-of-bag (OOB) error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms (hybrid genetic-random forests, hybrid particle swarm-random forests and hybrid fish swarm-random forests) achieve the minimum OOB error and show the best generalization ability. The training set produced by the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise, so better classification results are produced by this feasible and effective algorithm. Moreover, the hybrid algorithms' F-value, G-mean, AUC and OOB results surpass those of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
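A minimal sketch of the OOB-error objective described above, assuming synthetic imbalanced data and a plain grid search standing in for the genetic, particle swarm and fish swarm searches:

```python
# Sketch: use the out-of-bag (OOB) error as the objective when tuning random
# forest parameters, as the hybrid RF algorithms above do. A plain grid search
# stands in for the genetic / particle-swarm / fish-swarm searches; the data
# set and the parameter grid are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)  # imbalanced data

best = None
for n_trees in (100, 300):
    for max_feat in (3, 6, 12):
        rf = RandomForestClassifier(n_estimators=n_trees, max_features=max_feat,
                                    oob_score=True, random_state=0, n_jobs=-1)
        rf.fit(X, y)
        oob_error = 1.0 - rf.oob_score_          # objective to minimize
        if best is None or oob_error < best[0]:
            best = (oob_error, n_trees, max_feat)

print("lowest OOB error %.3f with n_estimators=%d, max_features=%d" % best)
```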
Affect-Aware Adaptive Tutoring Based on Human-Automation Etiquette Strategies.
Yang, Euijung; Dorneich, Michael C
2018-06-01
We investigated adapting the interaction style of intelligent tutoring system (ITS) feedback based on human-automation etiquette strategies. Most ITSs adapt the content difficulty level, adapt the feedback timing, or provide extra content when they detect cognitive or affective decrements. Our previous work demonstrated that changing the interaction style via different feedback etiquette strategies has differential effects on students' motivation, confidence, satisfaction, and performance. The best etiquette strategy was also determined by user frustration. Based on these findings, a rule set was developed that systemically selected the proper etiquette strategy to address one of four learning factors (motivation, confidence, satisfaction, and performance) under two different levels of user frustration. We explored whether etiquette strategy selection based on this rule set (systematic) or random changes in etiquette strategy for a given level of frustration affected the four learning factors. Participants solved mathematics problems under different frustration conditions with feedback that adapted dynamic changes in etiquette strategies either systematically or randomly. The results demonstrated that feedback with etiquette strategies chosen systematically via the rule set could selectively target and improve motivation, confidence, satisfaction, and performance more than changing etiquette strategies randomly. The systematic adaptation was effective no matter the level of frustration for the participant. If computer tutors can vary the interaction style to effectively mitigate negative emotions, then ITS designers would have one more mechanism in which to design affect-aware adaptations that provide the proper responses in situations where human emotions affect the ability to learn.
USDA-ARS?s Scientific Manuscript database
(Co)variance components for calving ease and stillbirth in US Holsteins were estimated using a single-trait threshold animal model and two different sets of data edits. Six sets of approximately 250,000 records each were created by randomly selecting herd codes without replacement from the data used...
Rekully, Cameron M; Faulkner, Stefan T; Lachenmyer, Eric M; Cunningham, Brady R; Shaw, Timothy J; Richardson, Tammi L; Myrick, Michael L
2018-03-01
An all-pairs method is used to analyze phytoplankton fluorescence excitation spectra. An initial set of nine phytoplankton species is analyzed in pairwise fashion to select two optical filter sets, and then the two filter sets are used to explore variations among a total of 31 species in a single-cell fluorescence imaging photometer. Results are presented in terms of pair analyses; we report that 411 of the 465 possible pairings of the larger group of 31 species can be distinguished using the initial nine-species-based selection of optical filters. A bootstrap analysis based on the larger data set shows that the distribution of possible pair separation results based on a randomly selected nine-species initial calibration set is strongly peaked in the 410-415 pair separation range, consistent with our experimental result. Further, the result for filter selection using all 31 species is also 411 pair separations. The set of phytoplankton fluorescence excitation spectra is intuitively high in rank due to the number and variety of pigments that contribute to the spectrum. However, the results in this report are consistent with an effective rank as determined by a variety of heuristic and statistical methods in the range of 2-3. These results are reviewed in consideration of how consistent the filter selections are from model to model for the data presented here. We discuss the common observation that rank is generally found to be relatively low even in many seemingly complex circumstances, so that it may be productive to assume a low rank from the beginning. If a low-rank hypothesis is valid, then relatively few samples are needed to explore an experimental space. Under very restricted circumstances for uniformly distributed samples, the minimum number for an initial analysis might be as low as 8-11 random samples for 1-3 factors.
An overview of the Columbia Habitat Monitoring Program's (CHaMP) spatial-temporal design framework
We briefly review the concept of a master sample applied to stream networks in which a randomized set of stream sites is selected across a broad region to serve as a list of sites from which a subset of sites is selected to achieve multiple objectives of specific designs. The Col...
Selection Dynamics in Joint Matching to Rate and Magnitude of Reinforcement
ERIC Educational Resources Information Center
McDowell, J. J.; Popa, Andrei; Calvin, Nicholas T.
2012-01-01
Virtual organisms animated by a selectionist theory of behavior dynamics worked on concurrent random interval schedules where both the rate and magnitude of reinforcement were varied. The selectionist theory consists of a set of simple rules of selection, recombination, and mutation that act on a population of potential behaviors by means of a…
Improved Compressive Sensing of Natural Scenes Using Localized Random Sampling
Barranca, Victor J.; Kovačič, Gregor; Zhou, Douglas; Cai, David
2016-01-01
Compressive sensing (CS) theory demonstrates that by using uniformly-random sampling, rather than uniformly-spaced sampling, higher quality image reconstructions are often achievable. Considering that the structure of sampling protocols has such a profound impact on the quality of image reconstructions, we formulate a new sampling scheme motivated by physiological receptive field structure, localized random sampling, which yields significantly improved CS image reconstructions. For each set of localized image measurements, our sampling method first randomly selects an image pixel and then measures its nearby pixels with probability depending on their distance from the initially selected pixel. We compare the uniformly-random and localized random sampling methods over a large space of sampling parameters, and show that, for the optimal parameter choices, higher quality image reconstructions can be consistently obtained by using localized random sampling. In addition, we argue that the localized random CS optimal parameter choice is stable with respect to diverse natural images, and scales with the number of samples used for reconstruction. We expect that the localized random sampling protocol helps to explain the evolutionarily advantageous nature of receptive field structure in visual systems and suggests several future research areas in CS theory and its application to brain imaging. PMID:27555464
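A minimal numpy sketch of the localized random sampling scheme as described, assuming a Gaussian falloff for the distance-dependent inclusion probability (the paper's exact profile may differ):

```python
# Sketch of one localized random measurement mask: pick a random center pixel,
# then include nearby pixels with a probability that decays with distance from
# it. The Gaussian falloff and its width are illustrative assumptions.
import numpy as np

def localized_random_mask(height, width, sigma=3.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    cy, cx = rng.integers(height), rng.integers(width)      # random center pixel
    yy, xx = np.mgrid[0:height, 0:width]
    dist2 = (yy - cy) ** 2 + (xx - cx) ** 2
    p_include = np.exp(-dist2 / (2.0 * sigma ** 2))         # distance-based probability
    return rng.random((height, width)) < p_include          # boolean sampling mask

rng = np.random.default_rng(0)
masks = [localized_random_mask(64, 64, sigma=3.0, rng=rng) for _ in range(128)]
# Each mask defines one linear measurement: the sum of image pixels under the mask.
print("average pixels per measurement:", np.mean([m.sum() for m in masks]))
```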
He, Yi; Xiao, Yi; Liwo, Adam; Scheraga, Harold A
2009-10-01
We explored the energy-parameter space of our coarse-grained UNRES force field for large-scale ab initio simulations of protein folding, to obtain good initial approximations for hierarchical optimization of the force field with new virtual-bond-angle bending and side-chain-rotamer potentials which we recently introduced to replace the statistical potentials. 100 sets of energy-term weights were generated randomly, and good sets were selected by carrying out replica-exchange molecular dynamics simulations of two peptides with a minimal alpha-helical and a minimal beta-hairpin fold, respectively: the tryptophan cage (PDB code: 1L2Y) and tryptophan zipper (PDB code: 1LE1). Eight sets of parameters produced native-like structures of these two peptides. These eight sets were tested on two larger proteins: the engrailed homeodomain (PDB code: 1ENH) and FBP WW domain (PDB code: 1E0L); two sets were found to produce native-like conformations of these proteins. These two sets were tested further on a larger set of nine proteins with alpha or alpha + beta structure and found to locate native-like structures of most of them. These results demonstrate that, in addition to finding reasonable initial starting points for optimization, an extensive search of parameter space is a powerful method to produce a transferable force field. Copyright 2009 Wiley Periodicals, Inc.
Efficient encapsulation of proteins with random copolymers.
Nguyen, Trung Dac; Qiao, Baofu; Olvera de la Cruz, Monica
2018-06-12
Membraneless organelles are aggregates of disordered proteins that form spontaneously to promote specific cellular functions in vivo. The possibility of synthesizing membraneless organelles out of cells will therefore enable fabrication of protein-based materials with functions inherent to biological matter. Since random copolymers contain various compositions and sequences of solvophobic and solvophilic groups, they are expected to function in nonbiological media similarly to a set of disordered proteins in membraneless organelles. Interestingly, the internal environment of these organelles has been noted to behave more like an organic solvent than like water. Therefore, an adsorbed layer of random copolymers that mimics the function of disordered proteins could, in principle, protect and enhance the proteins' enzymatic activity even in organic solvents, which are ideal when the products and/or the reactants have limited solubility in aqueous media. Here, we demonstrate via multiscale simulations that random copolymers efficiently incorporate proteins into different solvents with the potential to optimize their enzymatic activity. We investigate the key factors that govern the ability of random copolymers to encapsulate proteins, including the adsorption energy, copolymer average composition, and solvent selectivity. The adsorbed polymer chains have remarkably similar sequences, indicating that the proteins are able to select certain sequences that best reduce their exposure to the solvent. We also find that the protein surface coverage decreases when the fluctuation in the average distance between the protein adsorption sites increases. The results herein set the stage for computational design of random copolymers for stabilizing and delivering proteins across multiple media.
Prediction of aquatic toxicity mode of action using linear discriminant and random forest models.
Martin, Todd M; Grulke, Christopher M; Young, Douglas M; Russom, Christine L; Wang, Nina Y; Jackson, Crystal R; Barron, Mace G
2013-09-23
The ability to determine the mode of action (MOA) for a diverse group of chemicals is a critical part of ecological risk assessment and chemical regulation. However, existing MOA assignment approaches in ecotoxicology have been limited to relatively few MOAs, have high uncertainty, or rely on professional judgment. In this study, machine learning algorithms (linear discriminant analysis and random forest) were used to develop models for assigning aquatic toxicity MOA. These methods were selected because they have been shown to correlate diverse data sets and provide an indication of the most important descriptors. A data set of MOA assignments for 924 chemicals was developed using a combination of high-confidence assignments, international consensus classifications, ASTER (ASsessment Tools for the Evaluation of Risk) predictions, and weight-of-evidence professional judgment based on an assessment of structure and literature information. The overall data set was randomly divided into a training set (75%) and a validation set (25%) and then used to develop linear discriminant analysis (LDA) and random forest (RF) MOA assignment models. The LDA and RF models had high internal concordance and specificity and produced overall prediction accuracies ranging from 84.5 to 87.7% for the validation set. These results demonstrate that computational chemistry approaches can be used to determine acute toxicity MOAs across a large range of structures and mechanisms.
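A hedged sketch of the modelling workflow (75/25 split, LDA and RF compared on the validation set), using a synthetic descriptor matrix in place of the 924-chemical data set:

```python
# Sketch of the modeling workflow above: split a descriptor matrix 75/25, fit a
# linear discriminant analysis model and a random forest, and compare
# validation accuracy. The descriptor matrix and MOA labels are synthetic
# placeholders, not the 924-chemical data set.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=924, n_features=30, n_informative=10,
                           n_classes=6, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                            stratify=y, random_state=1)

for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("RF", RandomForestClassifier(n_estimators=500, random_state=1))]:
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_val, model.predict(X_val))
    print(f"{name} validation accuracy: {acc:.3f}")
```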
Humphreys, Keith; Blodgett, Janet C.; Wagner, Todd H.
2014-01-01
Background Observational studies of Alcoholics Anonymous’ (AA) effectiveness are vulnerable to self-selection bias because individuals choose whether or not to attend AA. The present study therefore employed an innovative statistical technique to derive a selection bias-free estimate of AA’s impact. Methods Six datasets from 5 National Institutes of Health-funded randomized trials (one with two independent parallel arms) of AA facilitation interventions were analyzed using instrumental variables models. Alcohol dependent individuals in one of the datasets (n = 774) were analyzed separately from the rest of sample (n = 1582 individuals pooled from 5 datasets) because of heterogeneity in sample parameters. Randomization itself was used as the instrumental variable. Results Randomization was a good instrument in both samples, effectively predicting increased AA attendance that could not be attributed to self-selection. In five of the six data sets, which were pooled for analysis, increased AA attendance that was attributable to randomization (i.e., free of self-selection bias) was effective at increasing days of abstinence at 3-month (B = .38, p = .001) and 15-month (B = 0.42, p = .04) follow-up. However, in the remaining dataset, in which pre-existing AA attendance was much higher, further increases in AA involvement caused by the randomly assigned facilitation intervention did not affect drinking outcome. Conclusions For most individuals seeking help for alcohol problems, increasing AA attendance leads to short and long term decreases in alcohol consumption that cannot be attributed to self-selection. However, for populations with high pre-existing AA involvement, further increases in AA attendance may have little impact. PMID:25421504
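The instrumental-variables logic can be sketched as a simple two-stage least squares with randomization as the instrument; the data below are simulated, and the published models' covariates and standard-error corrections are omitted.

```python
# Sketch of the instrumental-variables logic above: randomization predicts AA
# attendance (first stage), and predicted attendance, purged of self-selection,
# predicts days abstinent (second stage). Data are simulated; the published
# models' covariates and standard-error corrections are omitted.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1582
z = rng.integers(0, 2, n)                     # random assignment (instrument)
confound = rng.normal(size=n)                 # unobserved self-selection factor
attendance = 5 + 4 * z + 2 * confound + rng.normal(size=n)
abstinence = 20 + 0.4 * attendance + 3 * confound + rng.normal(size=n)

# First stage: attendance ~ random assignment
first = sm.OLS(attendance, sm.add_constant(z)).fit()
attendance_hat = first.fittedvalues

# Second stage: abstinence ~ predicted attendance
second = sm.OLS(abstinence, sm.add_constant(attendance_hat)).fit()
naive = sm.OLS(abstinence, sm.add_constant(attendance)).fit()
print("naive OLS effect:", round(naive.params[1], 3),
      "  2SLS effect:", round(second.params[1], 3), "(true effect 0.4)")
```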
Measuring socioeconomic status in multicountry studies: results from the eight-country MAL-ED study
2014-01-01
Background There is no standardized approach to comparing socioeconomic status (SES) across multiple sites in epidemiological studies. This is particularly problematic when cross-country comparisons are of interest. We sought to develop a simple measure of SES that would perform well across diverse, resource-limited settings. Methods A cross-sectional study was conducted with 800 children aged 24 to 60 months across eight resource-limited settings. Parents were asked to respond to a household SES questionnaire, and the height of each child was measured. A statistical analysis was done in two phases. First, the best approach for selecting and weighting household assets as a proxy for wealth was identified. We compared four approaches to measuring wealth: maternal education, principal components analysis, Multidimensional Poverty Index, and a novel variable selection approach based on the use of random forests. Second, the selected wealth measure was combined with other relevant variables to form a more complete measure of household SES. We used child height-for-age Z-score (HAZ) as the outcome of interest. Results Mean age of study children was 41 months, 52% were boys, and 42% were stunted. Using cross-validation, we found that random forests yielded the lowest prediction error when selecting assets as a measure of household wealth. The final SES index included access to improved water and sanitation, eight selected assets, maternal education, and household income (the WAMI index). A 25% difference in the WAMI index was positively associated with a difference of 0.38 standard deviations in HAZ (95% CI 0.22 to 0.55). Conclusions Statistical learning methods such as random forests provide an alternative to principal components analysis in the development of SES scores. Results from this multicountry study demonstrate the validity of a simplified SES index. With further validation, this simplified index may provide a standard approach for SES adjustment across resource-limited settings. PMID:24656134
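A rough sketch of the asset-selection step, assuming simulated asset and HAZ data: rank candidate assets by random forest importance, keep the top eight, then check the cross-validated error of a model restricted to them.

```python
# Sketch of the asset-selection step above: rank candidate household assets by
# random forest importance for predicting child HAZ, and keep the top eight.
# The asset matrix and HAZ values are simulated stand-ins for the MAL-ED data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_children, n_assets = 800, 25
assets = rng.integers(0, 2, size=(n_children, n_assets))      # owned / not owned
wealth = assets[:, :8].sum(axis=1)                             # latent wealth
haz = -2.0 + 0.15 * wealth + rng.normal(0, 1.0, n_children)    # height-for-age Z

rf = RandomForestRegressor(n_estimators=500, random_state=7)
rf.fit(assets, haz)
top8 = np.argsort(rf.feature_importances_)[::-1][:8]
print("selected asset indices:", sorted(top8.tolist()))

# Cross-validated error of an HAZ model restricted to the selected assets
score = cross_val_score(RandomForestRegressor(n_estimators=200, random_state=7),
                        assets[:, top8], haz, cv=5,
                        scoring="neg_mean_squared_error").mean()
print("5-fold CV MSE with selected assets:", round(-score, 3))
```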
Covariate Selection for Multilevel Models with Missing Data
Marino, Miguel; Buxton, Orfeu M.; Li, Yi
2017-01-01
Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457
The Effect of Data Format on Integration of Performance Data into Angoff Judgments
ERIC Educational Resources Information Center
Clauser, Brian E.; Mee, Janet; Margolis, Melissa J.
2013-01-01
This study investigated the extent to which the performance data format impacted data use in Angoff standard setting exercises. Judges from two standard settings (a total of five panels) were randomly assigned to one of two groups. The full-data group received two types of data: (1) the proportion of examinees selecting each option and (2) plots…
ERIC Educational Resources Information Center
Tanglang, Nebath; Ibrahim, Aminu Kazeem
2015-01-01
The study adopted an ex-post facto research design. Randomization sampling technique was used to select 346 undergraduate distance learners and the learners were grouped into four, High and Low Goal setter learners and High and Low Decision-making skills learners. The instruments for data collection were Undergraduate Academic Goal Setting Scale…
A social preference valuations set for EQ-5D health states in Flanders, Belgium.
Cleemput, Irina
2010-04-01
This study aimed at deriving a preference valuation set for EQ-5D health states from the general Flemish public in Belgium. A EuroQol valuation instrument with 16 health states to be valued on a visual analogue scale was sent to a random sample of 2,754 adults. The initial response rate was 35%. Eventually, 548 (20%) respondents provided useable valuations for modeling. Valuations for 245 health states were modeled using a random effects model. The selection of the model was based on two criteria: health state valuations must be consistent, and the difference with the directly observed valuations must be small. A model including a value decrement if any health dimension of the EQ-5D is on the worst level was selected to construct the social health state valuation set. A comparison with health state valuations from other countries showed similarities, especially with those from New Zealand. The use of a single preference valuation set across different health economic evaluations within a country is highly preferable to increase their usability for policy makers. This study contributes to the standardization of outcome measurement in economic evaluations in Belgium.
The quality of care in occupational therapy: an assessment of selected Michigan hospitals.
Kirchman, M M
1979-07-01
In this study, a methodology was developed and tested for assessing the quality of care in occupational therapy between educational and noneducational clinical settings, as measured by process and outcome. An instrument was constructed for an external audit of the hospital record. Standards drafted by the investigator were established as normative by a panel of experts for use in judging the programs. Hospital records of 84 patients with residual hemiparesis or hemiplegia in three noneducational settings and of 100 patients with similar diagnoses in two educational clinical settings from selected Michigan facilities were chosen by proportionate stratified random sampling. The process study showed that occupational therapy was of significantly higher quality in the educational settings. The outcome study did not show significant differences between types of settings. Implications for education and practice are discussed.
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively for gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the ensemble and bagging concepts used in random forest but adopts the backward elimination strategy that is the rationale of the RFE algorithm. The rationale is that building ensemble SVM models on randomly drawn bootstrap samples from the training set produces different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate features is based upon the rankings of multiple SVM models instead of one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing nearly balanced bootstrap samples. Our experiments show that ESVM-RFE for gene selection substantially increased classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that ESVM-RFE achieves on average 9% better accuracy than SVM-RFE, and 5% better than a random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters in the selected data. PMID:27304923
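A minimal sketch of the ESVM-RFE idea, assuming synthetic expression data and plain bootstrap resampling in place of the paper's balanced bootstrap:

```python
# Sketch of the ESVM-RFE idea above: run SVM-RFE on several bootstrap samples,
# aggregate the per-model feature rankings into one ranking, and select genes
# by the aggregated rank. The expression matrix is synthetic and the balanced
# bootstrap of the paper is simplified to plain resampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=200, n_informative=15,
                           random_state=0)
rng = np.random.default_rng(0)
n_models, rank_sum = 10, np.zeros(X.shape[1])

for _ in range(n_models):
    idx = rng.integers(0, len(y), len(y))                 # bootstrap sample
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=20, step=0.1)
    rfe.fit(X[idx], y[idx])
    rank_sum += rfe.ranking_          # 1 = kept; larger values were eliminated earlier

ensemble_rank = rank_sum / n_models
selected = np.argsort(ensemble_rank)[:20]                 # 20 best-ranked features
print("ensemble-selected feature indices:", sorted(selected.tolist()))
```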
On the information content of hydrological signatures and their relationship to catchment attributes
NASA Astrophysics Data System (ADS)
Addor, Nans; Clark, Martyn P.; Prieto, Cristina; Newman, Andrew J.; Mizukami, Naoki; Nearing, Grey; Le Vine, Nataliya
2017-04-01
Hydrological signatures, which are indices characterizing hydrologic behavior, are increasingly used for the evaluation, calibration and selection of hydrological models. Their key advantage is to provide more direct insights into specific hydrological processes than aggregated metrics (e.g., the Nash-Sutcliffe efficiency). A plethora of signatures now exists, which enable characterizing a variety of hydrograph features, but also makes the selection of signatures for new studies challenging. Here we propose that the selection of signatures should be based on their information content, which we estimated using several approaches, all leading to similar conclusions. To explore the relationship between hydrological signatures and the landscape, we extended a previously published data set of hydrometeorological time series for 671 catchments in the contiguous United States, by characterizing the climatic conditions, topography, soil, vegetation and stream network of each catchment. This new catchment attributes data set will soon be in open access, and we are looking forward to introducing it to the community. We used this data set in a data-learning algorithm (random forests) to explore whether hydrological signatures could be inferred from catchment attributes alone. We find that some signatures can be predicted remarkably well by random forests and, interestingly, the same signatures are well captured when simulating discharge using a conceptual hydrological model. We discuss what this result reveals about our understanding of hydrological processes shaping hydrological signatures. We also identify which catchment attributes exert the strongest control on catchment behavior, in particular during extreme hydrological events. Overall, climatic attributes have the most significant influence, and strongly condition how well hydrological signatures can be predicted by random forests and simulated by the hydrological model. In contrast, soil characteristics at the catchment scale are not found to be significant predictors by random forests, which raises questions on how to best use soil data for hydrological modeling, for instance for parameter estimation. We finally demonstrate that signatures with high spatial variability are poorly captured by random forests and model simulations, which makes their regionalization delicate. We conclude with a ranking of signatures based on their information content, and propose that the signatures with high information content are best suited for model calibration, model selection and understanding hydrologic similarity.
Scott, J.C.
1990-01-01
Computer software was written to randomly select sites for a ground-water-quality sampling network. The software uses digital cartographic techniques and subroutines from a proprietary geographic information system. The report presents the approaches, computer software, and sample applications. It is often desirable to collect ground-water-quality samples from various areas in a study region that have different values of a spatial characteristic, such as land-use or hydrogeologic setting. A stratified network can be used for testing hypotheses about relations between spatial characteristics and water quality, or for calculating statistical descriptions of water-quality data that account for variations that correspond to the spatial characteristic. In the software described, a study region is subdivided into areal subsets that have a common spatial characteristic to stratify the population into several categories from which sampling sites are selected. Different numbers of sites may be selected from each category of areal subsets. A population of potential sampling sites may be defined by either specifying a fixed population of existing sites, or by preparing an equally spaced population of potential sites. In either case, each site is identified with a single category, depending on the value of the spatial characteristic of the areal subset in which the site is located. Sites are selected from one category at a time. One of two approaches may be used to select sites. Sites may be selected randomly, or the areal subsets in the category can be grouped into cells and sites selected randomly from each cell.
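A simplified, non-GIS sketch of the stratified selection: an equally spaced grid of potential sites, a toy category array standing in for the digital land-use or hydrogeologic overlay, and a fixed number of sites drawn at random per category.

```python
# Sketch of the stratified selection described above: build an equally spaced
# grid of potential sites, attach to each the category of the areal subset it
# falls in, and draw a requested number of sites at random from each category.
# The GIS overlay is replaced by a toy category array.
import numpy as np

rng = np.random.default_rng(1)
xs, ys = np.meshgrid(np.arange(0, 50), np.arange(0, 40))        # equally spaced grid
sites = np.column_stack([xs.ravel(), ys.ravel()])
# Toy spatial characteristic: three land-use / hydrogeologic categories
category = rng.integers(0, 3, size=len(sites))

n_per_category = {0: 10, 1: 15, 2: 5}                           # sites wanted per category
selected = []
for cat, n_sites in n_per_category.items():
    pool = np.flatnonzero(category == cat)
    selected.append(rng.choice(pool, size=n_sites, replace=False))
selected = np.concatenate(selected)

print("first selected site coordinates:")
print(sites[selected][:5], "...")
```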
ERIC Educational Resources Information Center
Pierce, Thomas B., Jr.; And Others
1990-01-01
A survey assessed time spent in the community and/or on unstructured activities by randomly selected individuals in Intermediate Care Facilities for the Mentally Retarded (ICF/MR) (N=20) or minigroup home settings (N=20). Individuals in ICF/MR homes spent more time in the community with staff and made fewer choices of unstructured activities.…
Changes in Mobility of Children with Cerebral Palsy over Time and across Environmental Settings
ERIC Educational Resources Information Center
Tieman, Beth L.; Palisano, Robert J.; Gracely, Edward J.; Rosenbaum, Peter L.; Chiarello, Lisa A.; O'Neil, Margaret E.
2004-01-01
This study examined changes in mobility methods of children with cerebral palsy (CP) over time and across environmental settings. Sixty-two children with CP, ages 6-14 years and classified as levels II-IV on the Gross Motor Function Classification System, were randomly selected from a larger data base and followed for three to four years. On each…
Goloboff, Pablo A
2014-10-01
Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the "selected" and "informative" addition sequences) produce much better results than the standard random or closest addition sequences. These new methods for Wagner trees and for moving around plateaus can be useful when analyzing phylogenomic data sets formed by concatenation of genes with uneven taxon representation ("sparse" supermatrices), which are likely to present a tree landscape with extensive plateaus. Copyright © 2014 Elsevier Inc. All rights reserved.
Passenger Flow Analysis, 1978. Riverside Line, MBTA.
DOT National Transportation Integrated Search
1981-08-01
In order to complete a set of passenger flow estimates for use in a simulation model of a light rail line, a count of passenger movement was made at randomly selected stations in the underground section. Above-ground stations had been studied a year ...
An active learning representative subset selection method using net analyte signal.
He, Zhonghai; Ma, Zhenhe; Luan, Jingmin; Cai, Xi
2018-05-05
To guarantee accurate predictions, representative samples are needed when building a calibration model for spectroscopic measurements. However, in general, it is not known whether a sample is representative prior to measuring its concentration, which is both time-consuming and expensive. In this paper, a method to determine whether a sample should be selected into a calibration set is presented. The selection is based on the difference of Euclidean norm of net analyte signal (NAS) vector between the candidate and existing samples. First, the concentrations and spectra of a group of samples are used to compute the projection matrix, NAS vector, and scalar values. Next, the NAS vectors of candidate samples are computed by multiplying projection matrix with spectra of samples. Scalar value of NAS is obtained by norm computation. The distance between the candidate set and the selected set is computed, and samples with the largest distance are added to selected set sequentially. Last, the concentration of the analyte is measured such that the sample can be used as a calibration sample. Using a validation test, it is shown that the presented method is more efficient than random selection. As a result, the amount of time and money spent on reference measurements is greatly reduced. Copyright © 2018 Elsevier B.V. All rights reserved.
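The sequential selection step can be sketched as follows, assuming the scalar NAS values have already been computed via the projection step; the choice of starting sample (the largest NAS norm) is an added assumption.

```python
# Sketch of the sequential selection step described above: given a scalar net
# analyte signal (NAS) value per candidate spectrum, repeatedly add the
# candidate whose distance from the already selected set is largest. The NAS
# scalars are assumed precomputed (via the projection-matrix step); here they
# are simulated numbers, and the starting sample is an assumption.
import numpy as np

def select_by_nas(nas_values, n_select):
    nas = np.asarray(nas_values, dtype=float)
    selected = [int(np.argmax(nas))]            # start from the largest NAS norm
    while len(selected) < n_select:
        # distance of every candidate to its nearest already-selected sample
        dists = np.min(np.abs(nas[:, None] - nas[selected][None, :]), axis=1)
        dists[selected] = -np.inf               # never re-select a sample
        selected.append(int(np.argmax(dists)))
    return selected

rng = np.random.default_rng(3)
nas_scalars = rng.gamma(shape=2.0, scale=1.0, size=200)   # simulated NAS norms
calibration_ids = select_by_nas(nas_scalars, n_select=20)
print("samples to send for reference measurement:", calibration_ids[:10], "...")
```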
Influence of Protein Abundance on High-Throughput Protein-Protein Interaction Detection
2009-06-05
the interaction data sets we determined, via comparisons with strict randomized simulations, the propensity for essential proteins to selectively...and analysis of high-quality PPI data sets. Materials and Methods We analyzed protein interaction networks for yeast and E. coli determined from Y2H...we reinvestigated the centrality-lethality rule, which implies that proteins having more interactions are more likely to be essential. From analysis
Training a whole-book LSTM-based recognizer with an optimal training set
NASA Astrophysics Data System (ADS)
Soheili, Mohammad Reza; Yousefi, Mohammad Reza; Kabir, Ehsanollah; Stricker, Didier
2018-04-01
Despite recent progress in OCR technologies, whole-book recognition is still a challenging task, particularly for old and historical books, where unknown typefaces and the low quality of paper and print add to the challenge. Pre-trained recognizers and generic methods therefore do not usually perform up to the required standards, and performance typically degrades for larger-scale recognition tasks, such as an entire book. Methods with reportedly low error rates turn out to require a great deal of manual correction. Generally, such methodologies do not make effective use of concepts such as redundancy in whole-book recognition. In this work, we propose to train Long Short Term Memory (LSTM) networks on a minimal training set obtained from the book to be recognized. We show that by clustering all the sub-words in the book and using the sub-word cluster centers as the training set for the LSTM network, we can train models that outperform an identical network trained on randomly selected pages of the book. In our experiments, we also show that although the sub-word cluster centers are equivalent to about 8 pages of text for a 101-page book, an LSTM network trained on such a set performs competitively with an identical network trained on a set of 60 randomly selected pages of the book.
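A hedged sketch of the training-set construction, assuming sub-word feature vectors are already available; feature extraction and the LSTM training itself are not shown.

```python
# Sketch of the training-set construction described above: cluster feature
# vectors of all sub-word images in the book and keep, per cluster, the
# sub-word closest to the cluster center as a training sample. The feature
# matrix is a random placeholder for real sub-word descriptors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

rng = np.random.default_rng(0)
subword_features = rng.normal(size=(5000, 64))    # one row per sub-word image

n_clusters = 200                                  # size of the minimal training set
km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(subword_features)

# Index of the real sub-word nearest each cluster center
train_idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, subword_features)
train_idx = np.unique(train_idx)
print("sub-words selected for LSTM training:", len(train_idx))
```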
Yet another method for triangulation and contouring for automated cartography
NASA Technical Reports Server (NTRS)
De Floriani, L.; Falcidieno, B.; Nagy, G.; Pienovi, C.
1982-01-01
An algorithm is presented for hierarchical subdivision of a set of three-dimensional surface observations. The data structure used for obtaining the desired triangulation is also singularly appropriate for extracting contours. Some examples are presented, and the results obtained are compared with those given by Delaunay triangulation. The data points selected by the algorithm provide a better approximation to the desired surface than do randomly selected points.
NASA Astrophysics Data System (ADS)
Sirait, Kamson; Tulus; Budhiarti Nababan, Erna
2017-12-01
Clustering methods that have high accuracy and time efficiency are necessary for the filtering process. One method that is well known and widely applied in clustering is K-Means Clustering. In its application, the choice of the initial cluster centers greatly affects the results of the K-Means algorithm. This research compares the results of K-Means Clustering with starting centroids determined randomly and by a KD-Tree method. On a data set of 1000 student academic records used to classify potential dropouts, random initial centroid determination gives an SSE of 952972 for the quality variable and 232.48 for the GPA variable, whereas initial centroid determination by KD-Tree gives an SSE of 504302 for the quality variable and 214.37 for the GPA variable. The smaller SSE values indicate that K-Means Clustering with KD-Tree initial centroid selection has better accuracy than K-Means Clustering with random initial centroid selection.
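A small sketch comparing K-Means SSE (inertia) under random initial centroids versus an informed initialization; since the KD-Tree seeding routine is not spelled out above, k-means++ stands in for the non-random alternative, and the data are simulated.

```python
# Sketch comparing K-Means SSE (inertia) under purely random initial centroids
# and under an informed initialization. k-means++ stands in for the KD-Tree
# seeding used above; the student data are simulated.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=4, n_features=2, random_state=0)

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=4, init=init, n_init=1, random_state=0).fit(X)
    print(f"init={init:10s}  SSE (inertia): {km.inertia_:.1f}")
```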
Dual-state modulation of the contextual cueing effect: Evidence from eye movement recordings.
Zhao, Guang; Liu, Qiang; Jiao, Jun; Zhou, Peiling; Li, Hong; Sun, Hong-jin
2012-06-08
The repeated configurations of random elements induce a better search performance than that of the displays of novel random configurations. The mechanism of such contextual cueing effect has been investigated through the use of the RT × Set Size function. There are divergent views on whether the contextual cueing effect is driven by attentional guidance or facilitation of initial perceptual processing or response selection. To explore this question, we used eye movement recording in this study, which offers information about the substages of the search task. The results suggest that the contextual cueing effect is contributed mainly by attentional guidance, and facilitation of response selection also plays a role.
TOWARDS USING STABLE SPERMATOZOAL RNAS FOR PROGNOSTIC ASSESSMENT OF MALE FACTOR FERTILITY
Objective: To establish the stability of spermatozoal RNAs as a means to validate their use as a male fertility marker. Design: Semen samples were randomly selected for 1 of 3 cryopreservation treatments. Setting: An academic research environment. Patient(s): Men aged...
PCA-LBG-based algorithms for VQ codebook generation
NASA Astrophysics Data System (ADS)
Tsai, Jinn-Tsong; Yang, Po-Yuan
2015-04-01
Vector quantisation (VQ) codebooks are generated by combining principal component analysis (PCA) algorithms with Linde-Buzo-Gray (LBG) algorithms. All training vectors are grouped according to the projected values of the principal components. The PCA-LBG-based algorithms include (1) PCA-LBG-Median, which selects the median vector of each group, (2) PCA-LBG-Centroid, which adopts the centroid vector of each group, and (3) PCA-LBG-Random, which randomly selects a vector from each group. The LBG algorithm then refines the codebook, starting from the improved initial codebook supplied by the PCA-based grouping. The PCA performs an orthogonal transformation to convert a set of potentially correlated variables into a set of variables that are not linearly correlated. Because the orthogonal transformation efficiently distinguishes test image vectors, the proposed PCA-LBG-based algorithms are expected to outperform conventional algorithms in designing VQ codebooks. The experimental results confirm that the proposed PCA-LBG-based algorithms indeed obtain better results compared to existing methods reported in the literature.
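A rough sketch of the PCA-LBG-Centroid variant, assuming random vectors in place of image block vectors and a k-means pass as the LBG-style refinement:

```python
# Sketch of the PCA-LBG-Centroid variant described above: group training
# vectors by their projection onto the first principal component, use each
# group's centroid as an initial codeword, then refine with a k-means pass as
# an LBG-style refinement. Training vectors are random stand-ins for image
# block vectors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
vectors = rng.normal(size=(4096, 16))            # e.g. 4x4 image blocks
codebook_size = 32

# Group vectors into equal-count bins along the first principal component
proj = PCA(n_components=1).fit_transform(vectors).ravel()
order = np.argsort(proj)
groups = np.array_split(order, codebook_size)
initial_codebook = np.array([vectors[g].mean(axis=0) for g in groups])

# LBG-style refinement (k-means with the PCA-derived initial codebook)
km = KMeans(n_clusters=codebook_size, init=initial_codebook, n_init=1).fit(vectors)
print("distortion with PCA-seeded codebook:", round(km.inertia_, 1))
```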
Combined rule extraction and feature elimination in supervised classification.
Liu, Sheng; Patel, Ronak Y; Daga, Pankaj R; Liu, Haining; Fu, Gang; Doerksen, Robert J; Chen, Yixin; Wilkins, Dawn E
2012-09-01
There are a vast number of biology related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.
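A sketch in the spirit of CRF, assuming synthetic data: forest leaves are encoded as binary rule features and a 1-norm (L1) regularized logistic regression keeps only a small subset of them; this is a simplified stand-in, not the authors' exact formulation.

```python
# Sketch in the spirit of CRF above: encode each random forest leaf as a binary
# rule feature, then fit an L1- (1-norm-) regularized logistic regression so
# only a small number of rules keep nonzero weight. Synthetic data; a
# simplified stand-in rather than the authors' exact formulation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=500, n_features=30, n_informative=6,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=20, max_depth=3, random_state=0)
forest.fit(X, y)

# Each (tree, leaf) pair becomes one binary rule feature
leaves = forest.apply(X)                              # shape (n_samples, n_trees)
rules = OneHotEncoder().fit_transform(leaves)

selector = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector.fit(rules, y)
kept = np.flatnonzero(selector.coef_[0])
print(f"{rules.shape[1]} candidate rules, {len(kept)} kept by the 1-norm penalty")
```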
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Y; Yu, J; Yeung, V
Purpose: Artificial neural networks (ANN) can be used to discover complex relations within datasets to help with medical decision making. This study aimed to develop an ANN method to predict two-year overall survival of patients with peri-ampullary cancer (PAC) following resection. Methods: Data were collected from 334 patients with PAC following resection treated in our institutional pancreatic tumor registry between 2006 and 2012. The dataset contains 14 variables including age, gender, T-stage, tumor differentiation, positive-lymph-node ratio, positive resection margins, chemotherapy, radiation therapy, and tumor histology. After censoring for two-year survival analysis, 309 patients were left, of which 44 patients (∼15%) were randomly selected to form the testing set. The remaining 265 cases were randomly divided into a training set (211 cases, ∼80% of 265) and a validation set (54 cases, ∼20% of 265) 20 times to build 20 ANN models. Each ANN has one hidden layer with 5 units. The 20 ANN models were ranked according to their concordance index (c-index) of prediction on the validation sets. To further improve prediction, the top 10% of ANN models were selected, and their outputs averaged for prediction on the testing set. Results: By random division, the 44 cases in the testing set and the remaining 265 cases had approximately equal two-year survival rates, 36.4% and 35.5% respectively. The 20 ANN models, which were trained and validated on the 265 cases, yielded mean c-indexes of 0.59 and 0.63 on the validation sets and the testing set, respectively. The c-index was 0.72 when the two best ANN models (top 10%) were used for prediction on the testing set. The c-index of Cox regression analysis was 0.63. Conclusion: ANN improved survival prediction for patients with PAC. More patient data and further analysis of additional factors may be needed for a more robust model, which will help guide physicians in providing optimal post-operative care. This project was supported by PA CURE Grant.
Genetic gains in the UENF-14 popcorn population with recurrent selection.
Freitas, I L J; do Amaral Júnior, A T; Freitas, S P; Cabral, P D S; Ribeiro, R M; Gonçalves, L S A
2014-01-21
The popcorn breeding program of Universidade Estadual do Norte Fluminense Darcy Ribeiro aims to provide farmers a cultivar with desirable agronomic traits, particularly with respect to grain yield (GY) and popping expansion (PE). We evaluated full-sib families from the seventh cycle of recurrent selection and estimated the genetic progress with respect to GY and PE. Eight traits were evaluated in 200 full-sib families that were randomized into blocks with two replicates per set in two contrasting environments, Campos dos Goytacazes and Itaocara, located in north and northwest Rio de Janeiro State, respectively. There were significant differences between sets in families with respect to all traits evaluated, which indicates genetic variability that may be explored in future cycles. Using random economic weights in the selection of superior progenies, the Mulamba and Mock index showed gains for PE and GY of 5.11 and 7.78%, respectively. Significant PE and GY increases were found when comparing the evolution of mean values of these two parameters that were assessed at cycles C₀-C₆ and predicted for C₇. Thus, an advanced-cycle popcorn cultivar with genotypic superiority for the main traits of economic interest can be made available to farmers in Rio de Janeiro State.
Decorrelation of the true and estimated classifier errors in high-dimensional settings.
Hanczar, Blaise; Hua, Jianping; Dougherty, Edward R
2007-01-01
The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity which refers to the precision of error estimation is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three error estimators commonly used (leave-one-out cross-validation, k-fold cross-validation, and .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known-feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes. We will observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features, with the better correlation between the latter two showing no general trend, but differing for different models.
Reward and uncertainty in exploration programs
NASA Technical Reports Server (NTRS)
Kaufman, G. M.; Bradley, P. G.
1971-01-01
A set of variables which are crucial to the economic outcome of petroleum exploration are discussed. These are treated as random variables; the values they assume indicate the number of successes that occur in a drilling program and determine, for a particular discovery, the unit production cost and net economic return if that reservoir is developed. In specifying the joint probability law for those variables, extreme and probably unrealistic assumptions are made. In particular, the different random variables are assumed to be independently distributed. Using postulated probability functions and specified parameters, values are generated for selected random variables, such as reservoir size. From this set of values the economic magnitudes of interest, net return and unit production cost are computed. This constitutes a single trial, and the procedure is repeated many times. The resulting histograms approximate the probability density functions of the variables which describe the economic outcomes of an exploratory drilling program.
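The Monte Carlo procedure can be sketched in a few lines, with all distributions and dollar figures as illustrative assumptions and with the independence assumption mirroring the abstract's admittedly extreme one.

```python
# Sketch of the Monte Carlo procedure described above: draw the number of
# discoveries and the size of each discovered reservoir as independent random
# variables, compute the net return of the drilling program for each trial,
# and repeat many times to approximate its distribution. All numbers are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_wells = 10000, 20
p_success = 0.15                    # chance any single well finds a reservoir
well_cost = 2.0                     # cost per well drilled ($M)
value_per_unit = 0.5                # net value per unit of reserves developed ($M)

successes = rng.binomial(n_wells, p_success, size=n_trials)
net_return = np.empty(n_trials)
for i, k in enumerate(successes):
    reserves = rng.lognormal(mean=2.0, sigma=1.0, size=k)     # sizes of the k finds
    net_return[i] = value_per_unit * reserves.sum() - well_cost * n_wells

print("mean net return: %.1f $M" % net_return.mean())
print("P(program loses money): %.2f" % (net_return < 0).mean())
```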
The variability of software scoring of the CDMAM phantom associated with a limited number of images
NASA Astrophysics Data System (ADS)
Yang, Chang-Ying J.; Van Metter, Richard
2007-03-01
Software scoring approaches provide an attractive alternative to human evaluation of CDMAM images from digital mammography systems, particularly for annual quality control testing as recommended by the European Protocol for the Quality Control of the Physical and Technical Aspects of Mammography Screening (EPQCM). Methods for correlating CDCOM-based results with human observer performance have been proposed. A common feature of all methods is the use of a small number (at most eight) of CDMAM images to evaluate the system. This study focuses on the potential variability in the estimated system performance that is associated with these methods. Sets of 36 CDMAM images were acquired under carefully controlled conditions from three different digital mammography systems. The threshold visibility thickness (TVT) for each disk diameter was determined using previously reported post-analysis methods from the CDCOM scorings for a randomly selected group of eight images for one measurement trial. This random selection process was repeated 3000 times to estimate the variability in the resulting TVT values for each disk diameter. The results from using different post-analysis methods, different random selection strategies and different digital systems were compared. Additional variability of the 0.1 mm disk diameter was explored by comparing the results from two different image data sets acquired under the same conditions from the same system. The magnitude and the type of error estimated for experimental data was explained through modeling. The modeled results also suggest a limitation in the current phantom design for the 0.1 mm diameter disks. Through modeling, it was also found that, because of the binomial statistic nature of the CDMAM test, the true variability of the test could be underestimated by the commonly used method of random re-sampling.
Generation of kth-order random toposequences
NASA Astrophysics Data System (ADS)
Odgers, Nathan P.; McBratney, Alex. B.; Minasny, Budiman
2008-05-01
The model presented in this paper derives toposequences from a digital elevation model (DEM). It is written in ArcInfo Macro Language (AML). The toposequences are called kth-order random toposequences, because they take a random path uphill to the top of a hill and downhill to a stream or valley bottom from a randomly selected seed point, and they are located in a streamshed of order k according to a particular stream-ordering system. We define a kth-order streamshed as the area of land that drains directly to a stream segment of stream order k. The model attempts to optimise the spatial configuration of a set of derived toposequences iteratively by using simulated annealing to maximise the total sum of distances between each toposequence hilltop in the set. The user is able to select the order, k, of the derived toposequences. Toposequences are useful for determining soil sampling locations for use in collecting soil data for digital soil mapping applications. Sampling locations can be allocated according to equal elevation or equal-distance intervals along the length of the toposequence, for example. We demonstrate the use of this model for a study area in the Hunter Valley of New South Wales, Australia. Of the 64 toposequences derived, 32 were first-order random toposequences according to Strahler's stream-ordering system, and 32 were second-order random toposequences. The model that we present in this paper is an efficient method for sampling soil along soil toposequences. The soils along a toposequence are related to each other by the topography they are found in, so soil data collected by this method is useful for establishing soil-landscape rules for the preparation of digital soil maps.
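A language-neutral sketch of the spatial optimisation step (here in Python rather than AML), assuming made-up candidate seed coordinates: simulated annealing swaps one chosen seed at a time to maximise the total sum of pairwise distances.

```python
# Sketch of the spatial optimisation step described above: choose a subset of
# candidate hilltop (seed) points so that the total sum of pairwise distances
# between chosen points is maximised, using simulated annealing. Tracing the
# uphill/downhill paths on the DEM (done in AML in the original model) is not
# reproduced; the candidate coordinates and annealing schedule are assumptions.
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.uniform(0, 10000, size=(300, 2))   # candidate seed points (metres)
k = 32                                              # number of toposequences wanted

def total_pairwise_distance(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(-1)).sum() / 2.0

chosen = rng.choice(len(candidates), size=k, replace=False)
score = total_pairwise_distance(candidates[chosen])
temperature = 1000.0
for step in range(5000):
    proposal = chosen.copy()
    proposal[rng.integers(k)] = rng.integers(len(candidates))   # swap one seed
    if len(set(proposal)) < k:
        continue                                    # skip duplicate selections
    new_score = total_pairwise_distance(candidates[proposal])
    if new_score > score or rng.random() < np.exp((new_score - score) / temperature):
        chosen, score = proposal, new_score
    temperature *= 0.999                            # cooling schedule

print("sum of pairwise distances between seeds: %.0f m" % score)
```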
Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark
2015-01-01
Single-nucleotide polymorphism (SNP) selection and identification are among the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy on some data sets of moderate size, RF still struggles in GWAS when selecting informative SNPs and building accurate prediction models. In this paper, we propose a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection in GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNP group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only SNPs from these two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node of a tree. This approach enables one to generate more accurate trees with a lower prediction error while helping to avoid overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprising 408,803 SNPs and Alzheimer case-control data comprising 380,157 SNPs) and 10 gene data sets demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing state-of-the-art random forest methods. The top 25 SNPs identified by the proposed model in the Parkinson data set include four genes of interest associated with neurological disorders. The presented approach has been shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail to detect. The new RF works well for data where the number of case-control subjects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experimental results demonstrated the effectiveness of the proposed RF model, which outperformed state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods.
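A rough sketch of the two-stage idea, with made-up p-value cut-offs, tree counts and subspace sizes (the real ts-RF samples the subspace per node rather than per tree, and its grouping rule is more refined): rank SNPs by a univariate test, split them into highly and weakly informative groups, and force every feature subspace to contain some highly informative SNPs. It assumes enough SNPs pass the cut-offs.

import numpy as np
from scipy.stats import chi2_contingency
from sklearn.tree import DecisionTreeClassifier

def snp_pvalues(X, y):
    # univariate chi-square test of each SNP genotype (0/1/2) against case/control status
    pvals = np.ones(X.shape[1])
    for j in range(X.shape[1]):
        table = np.array([[np.sum((X[:, j] == g) & (y == c)) for c in (0, 1)] for g in (0, 1, 2)])
        table = table[table.sum(axis=1) > 0]          # drop genotype rows that never occur
        if table.shape[0] > 1:
            pvals[j] = chi2_contingency(table)[1]
    return pvals

def quality_subspace(pvals, mtry, strong_cut=1e-3, weak_cut=0.05, frac_strong=0.5, rng=None):
    # always include some highly informative SNPs, fill the rest from the weakly informative group
    rng = rng or np.random.default_rng()
    strong = np.where(pvals < strong_cut)[0]
    weak = np.where((pvals >= strong_cut) & (pvals < weak_cut))[0]
    n_strong = min(len(strong), int(mtry * frac_strong))
    n_weak = min(len(weak), mtry - n_strong)
    return np.concatenate([rng.choice(strong, n_strong, replace=False),
                           rng.choice(weak, n_weak, replace=False)])

def fit_quality_forest(X, y, n_trees=100, mtry=50, seed=0):
    rng = np.random.default_rng(seed)
    pvals = snp_pvalues(X, y)
    forest = []
    for _ in range(n_trees):
        feats = quality_subspace(pvals, mtry, rng=rng)
        boot = rng.integers(0, len(y), len(y))                 # bootstrap sample of subjects
        forest.append((feats, DecisionTreeClassifier().fit(X[boot][:, feats], y[boot])))
    return forest

def predict(forest, X):
    # majority vote over trees, assuming binary case/control labels
    votes = np.mean([tree.predict(X[:, feats]) for feats, tree in forest], axis=0)
    return (votes >= 0.5).astype(int)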
Eid, Wael E; Pottala, James V
2010-01-01
To develop a receiver operating characteristic (ROC) curve of glycosylated hemoglobin (HbA1c) for diagnosing diabetes mellitus within a chronic disease management system. A case-control study including medical records from January 1, 1997, to December 31, 2005, was conducted at the Sioux Falls Veterans Affairs Medical Center. Medical records for the case group (patients with diabetes) were selected based on 1 of 3 criteria: International Classification of Diseases, Ninth Revision, Clinical Modification or Current Procedural Terminology codes specific for type 1 and type 2 diabetes; patients' use of medications (oral hypoglycemic agents, antidiabetes agents, or insulin); or results from random blood or plasma glucose tests (at least 2 measurements of blood glucose ≥ 200 mg/dL). Records for the control group were selected based on patients having HbA1c measured, but not meeting the above diagnostic criteria for diabetes during the study period. Records for cases and controls were randomly frequency-matched, one-to-one. The control group was randomly divided into 5 sets of an equal number of records. Five sets of an equal number of cases were then randomly selected from the total number of cases. Each test data set included 1 case group and 1 control group, resulting in 5 independent data sets. In total, 5040 patient records met the case definition in the diabetes registry. Records of 15 patients who were prescribed metformin only, but did not meet any other case criteria, were reviewed and excluded after determining the patients were not diabetic. The control group consisted of 5 sets of 616 records each (totaling 3080 records), and the case group consisted of 5 sets of 616 records each (totaling 3080 records). Thus, each of the 5 independent data sets of 1 case group and 1 control group contained 1232 records. The case group was predominantly composed of white men (mean age, 69 years; mean body mass index, 31 kg/m²). Demographic data were similar for control patients. The ROC curve revealed that an HbA1c ≥ 6.3% (mean + 1 SD) offered the most accurate cutoff value for diagnosing type 2 diabetes mellitus, with the following statistical values: C statistic, 0.78; sensitivity, 70%; specificity, 85%; and positive likelihood ratio, 4.6 (95% confidence interval, 4.2-5.0). An HbA1c value ≥ 6.3% may be a useful benchmark for diagnosing diabetes mellitus within a chronic disease management system and may be a useful tool for monitoring high-risk populations.
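As a hedged illustration of reading a cutoff from an ROC curve (the study used mean + 1 SD; Youden's J below is just one common alternative), with entirely simulated HbA1c values rather than the registry data:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
# toy HbA1c values (%) for 616 controls (y = 0) and 616 cases (y = 1); the distributions are assumed
hba1c = np.concatenate([rng.normal(5.6, 0.5, 616), rng.normal(7.5, 1.5, 616)])
y = np.concatenate([np.zeros(616), np.ones(616)])
fpr, tpr, thresholds = roc_curve(y, hba1c)
j = np.argmax(tpr - fpr)                      # Youden's J: maximise sensitivity + specificity - 1
print(f"C statistic = {roc_auc_score(y, hba1c):.2f}, cutoff = {thresholds[j]:.1f}%, "
      f"sensitivity = {tpr[j]:.2f}, specificity = {1 - fpr[j]:.2f}")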
Bürger, R; Gimelfarb, A
1999-01-01
Stabilizing selection for an intermediate optimum is generally considered to deplete genetic variation in quantitative traits. However, conflicting results from various types of models have been obtained. While classical analyses assuming a large number of independent additive loci with individually small effects indicated that no genetic variation is preserved under stabilizing selection, several analyses of two-locus models showed the contrary. We perform a complete analysis of a generalization of Wright's two-locus quadratic-optimum model and investigate numerically the ability of quadratic stabilizing selection to maintain genetic variation in additive quantitative traits controlled by up to five loci. A statistical approach is employed by choosing randomly 4000 parameter sets (allelic effects, recombination rates, and strength of selection) for a given number of loci. For each parameter set we iterate the recursion equations that describe the dynamics of gamete frequencies starting from 20 randomly chosen initial conditions until an equilibrium is reached, record the quantities of interest, and calculate their corresponding mean values. As the number of loci increases from two to five, the fraction of the genome expected to be polymorphic declines surprisingly rapidly, and the loci that are polymorphic increasingly are those with small effects on the trait. As a result, the genetic variance expected to be maintained under stabilizing selection decreases very rapidly with increased number of loci. The equilibrium structure expected under stabilizing selection on an additive trait differs markedly from that expected under selection with no constraints on genotypic fitness values. The expected genetic variance, the expected polymorphic fraction of the genome, as well as other quantities of interest, are only weakly dependent on the selection intensity and the level of recombination. PMID:10353920
GuiTope: an application for mapping random-sequence peptides to protein sequences.
Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert
2012-01-03
Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
Zhao, Jiangsan; Bodner, Gernot; Rewald, Boris
2016-01-01
Phenotyping local crop cultivars is becoming more and more important, as they are an important genetic source for breeding – especially in regard to inherent root system architectures. Machine learning algorithms are promising tools to assist in the analysis of complex data sets; novel approaches are needed to apply them to root phenotyping data of mature plants. A greenhouse experiment was conducted in large, sand-filled columns to differentiate 16 European Pisum sativum cultivars based on 36 manually derived root traits. Through combining random forest and support vector machine models, machine learning algorithms were successfully used for unbiased identification of most distinguishing root traits and subsequent pairwise cultivar differentiation. Up to 86% of pea cultivar pairs could be distinguished based on the top five important root traits (Timp5); the composition of Timp5 differed widely between cultivar pairs. Selecting top important root traits (Timp) provided a significantly improved classification compared to using all available traits or randomly selected trait sets. The most frequent Timp of mature pea cultivars was total surface area of lateral roots originating from tap root segments at 0–5 cm depth. The high classification rate implies that culturing did not lead to a major loss of variability in root system architecture in the studied pea cultivars. Our results illustrate the potential of machine learning approaches for unbiased (root) trait selection and cultivar classification based on rather small, complex phenotypic data sets derived from pot experiments. Powerful statistical approaches are essential to make use of the increasing amount of (root) phenotyping information, integrating the complex trait sets describing crop cultivars. PMID:27999587
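A compressed sketch of one way to combine the two model families mentioned above, with hypothetical variable names (X holding the 36 manually derived root traits, y the cultivar labels): a random forest ranks trait importance for a given cultivar pair, and an SVM then classifies that pair using only the top five traits. This illustrates the general strategy, not the authors' exact pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def pairwise_cultivar_accuracy(X, y, cultivar_a, cultivar_b, n_top=5):
    # keep only the plants of the two cultivars being compared
    mask = np.isin(y, [cultivar_a, cultivar_b])
    Xp, yp = X[mask], y[mask]
    # rank root traits by random-forest importance and keep the top few (Timp)
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(Xp, yp)
    top = np.argsort(rf.feature_importances_)[::-1][:n_top]
    # classify the cultivar pair with an SVM restricted to the selected traits
    accuracy = cross_val_score(SVC(kernel="rbf"), Xp[:, top], yp, cv=5).mean()
    return top, accuracy

The five-fold cross-validation assumes at least five replicate plants per cultivar.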
Random Bits Forest: a Strong Classifier/Regressor for Big Data
NASA Astrophysics Data System (ADS)
Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li
2016-07-01
Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed well in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).
Quantification of moving target cyber defenses
NASA Astrophysics Data System (ADS)
Farris, Katheryn A.; Cybenko, George
2015-05-01
Current network and information systems are static, making it simple for attackers to maintain an advantage. Adaptive defenses, such as Moving Target Defenses (MTD), have been developed as potential "game-changers" in an effort to increase the attacker's workload. With many new methods being developed, it is difficult to accurately quantify and compare their overall costs and effectiveness. This paper compares the tradeoffs between current approaches to the quantification of MTDs. We present results from an expert opinion survey on quantifying the overall effectiveness, upfront and operating costs of a select set of MTD techniques. We find that gathering informed scientific opinions can be advantageous for evaluating such new technologies as it offers a more comprehensive assessment. We end by presenting a coarse ordering of a set of MTD techniques from most to least dominant. We found that seven of the 23 methods rank as the more dominant techniques, five of which are address space layout randomization or instruction set randomization techniques. The remaining two techniques are applicable to software and computer platforms. Among the techniques that performed the worst are those primarily aimed at network randomization.
NASA Astrophysics Data System (ADS)
Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.
2015-03-01
During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the corresponding reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
Optimizing the availability of a buffered industrial process
Martz, Jr., Harry F.; Hamada, Michael S.; Koehler, Arthur J.; Berg, Eric C.
2004-08-24
A computer-implemented process determines optimum configuration parameters for a buffered industrial process. A population size is initialized by randomly selecting a first set of design and operation values associated with subsystems and buffers of the buffered industrial process to form a set of operating parameters for each member of the population. An availability discrete event simulation (ADES) is performed on each member of the population to determine the product-based availability of each member. A new population is formed having members with a second set of design and operation values related to the first set of design and operation values through a genetic algorithm and the product-based availability determined by the ADES. Subsequent population members are then determined by iterating the genetic algorithm with product-based availability determined by ADES to form improved design and operation values from which the configuration parameters are selected for the buffered industrial process.
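The optimization loop described above can be sketched as a plain genetic algorithm wrapped around a stand-in for the availability discrete event simulation (ADES); the parameter count, bounds, crossover/mutation rates and the toy fitness function below are assumptions made only so the sketch runs.

import random

def simulate_availability(params):
    # stand-in for ADES: a made-up smooth function of the design/operation values;
    # the real model would run a discrete event simulation of the buffered process
    return sum(1.0 - 1.0 / (1.0 + p) for p in params) / len(params)

def genetic_search(n_params=6, pop_size=40, generations=50, bounds=(0.0, 10.0), seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(*bounds) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=simulate_availability, reverse=True)
        parents = ranked[: pop_size // 2]                    # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params)                 # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                           # mutate one value occasionally
                child[rng.randrange(n_params)] = rng.uniform(*bounds)
            children.append(child)
        pop = parents + children
    return max(pop, key=simulate_availability)

best = genetic_search()
print("best design/operation values:", [round(v, 2) for v in best])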
Kuznetsova, Olga M; Tymofyeyev, Yevgen
2014-04-30
In open-label studies, partial predictability of permuted block randomization provides potential for selection bias. To lessen the selection bias in two-arm studies with equal allocation, a number of allocation procedures that limit the imbalance in treatment totals at a pre-specified level but do not require the exact balance at the ends of the blocks were developed. In studies with unequal allocation, however, the task of designing a randomization procedure that sets a pre-specified limit on imbalance in group totals is not resolved. Existing allocation procedures either do not preserve the allocation ratio at every allocation or do not include all allocation sequences that comply with the pre-specified imbalance threshold. Kuznetsova and Tymofyeyev described the brick tunnel randomization for studies with unequal allocation that preserves the allocation ratio at every step and, in the two-arm case, includes all sequences that satisfy the smallest possible imbalance threshold. This article introduces wide brick tunnel randomization for studies with unequal allocation that allows all allocation sequences with imbalance not exceeding any pre-specified threshold while preserving the allocation ratio at every step. In open-label studies, allowing a larger imbalance in treatment totals lowers selection bias because of the predictability of treatment assignments. The applications of the technique in two-arm and multi-arm open-label studies with unequal allocation are described. Copyright © 2013 John Wiley & Sons, Ltd.
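To make the constraint concrete, here is a toy allocation scheme for a 2:1 study that refuses any assignment pushing the imbalance in group totals beyond a chosen threshold and otherwise draws arms with probabilities proportional to the target ratio. It only illustrates the idea of a pre-specified imbalance bound; it is not the brick tunnel or wide brick tunnel procedure, and it does not preserve the unconditional allocation ratio at every step the way those procedures do.

import random

def imbalance(counts, step, ratio):
    # largest deviation of the group totals from the target allocation after `step` subjects
    total = sum(ratio)
    return max(abs(counts[a] - step * ratio[a] / total) for a in range(len(ratio)))

def constrained_allocation(n, ratio=(2, 1), max_imbalance=2.0, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(ratio)
    sequence = []
    for step in range(1, n + 1):
        allowed = []
        for arm in range(len(ratio)):
            trial = counts.copy()
            trial[arm] += 1
            if imbalance(trial, step, ratio) <= max_imbalance:
                allowed.append(arm)
        if not allowed:                       # fall back to the least-imbalancing arm
            allowed = [min(range(len(ratio)),
                           key=lambda a: imbalance([c + (i == a) for i, c in enumerate(counts)],
                                                   step, ratio))]
        arm = rng.choices(allowed, weights=[ratio[a] for a in allowed])[0]
        counts[arm] += 1
        sequence.append(arm)
    return sequence

print(constrained_allocation(12))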
40 CFR 86.1823-08 - Durability demonstration procedures for exhaust emissions.
Code of Federal Regulations, 2014 CFR
2014-07-01
... judgement, a catalyst aging bench that follows the SBC and delivers the appropriate exhaust flow, exhaust... set must consist of randomly procured vehicles from actual customer use. The vehicles selected for... submit an analysis which evaluates whether the durability objective will be achieved for the vehicle...
Factors Associated with Abnormal Eating Attitudes among Greek Adolescents
ERIC Educational Resources Information Center
Bilali, Aggeliki; Galanis, Petros; Velonakis, Emmanuel; Katostaras, Theofanis
2010-01-01
Objective: To estimate the prevalence of abnormal eating attitudes among Greek adolescents and identify possible risk factors associated with these attitudes. Design: Cross-sectional, school-based study. Setting: Six randomly selected schools in Patras, southern Greece. Participants: The study population consisted of 540 Greek students aged 13-18…
ERIC Educational Resources Information Center
Richardson, Ian M.
1990-01-01
A possible syllabus for English for Science and Technology is suggested based upon a set of causal relations, arising from a logical description of the presuppositional rhetoric of scientific passages that underlie most semantic functions. An empirical study is reported of the semantic functions present in 52 randomly selected passages.…
Perceptions of the Performance of Community College Faculty: Dissertation Research Findings.
ERIC Educational Resources Information Center
Vickers, Mozelle Carver
A sample of 30 instructors nominated as effective teachers to the Piper Foundation and 31 randomly selected control instructors from the same 14 Texas colleges were evaluated by students, former students, administrators, peers, and the instructors themselves. The research instrument incorporated ten sets of characteristics expressed as polar…
PATTERN PREDICTION OF ACADEMIC SUCCESS.
ERIC Educational Resources Information Center
LUNNEBORG, CLIFFORD E.; LUNNEBORG, PATRICIA W.
A technique of pattern analysis which emphasizes the development of more effective ways of scoring a given set of variables was formulated. To the original variables were successively added two-, three-, and four-variable patterns, and the increase in predictive efficiency was assessed. Randomly selected high school seniors who had participated in the…
The Relationship Between Self Concept and Marital Adjustment.
ERIC Educational Resources Information Center
Hall, William M., Jr.; Valine, Warren J.
The purpose of this study was to investigate the relationship between self concept and marital adjustment for married students and their spouses in a commuter college setting. The sample consisted of a random selection of 50 "both spouses commuting" couples, 50 "husband only commuting" couples, and 50 "wife only…
Inequity of Human Services: The Rural Tennessee Dilemma.
ERIC Educational Resources Information Center
Tennessee State Univ., Nashville.
Davidson, Williamson, Rutherford, and Cheatham counties of Tennessee were the setting for a study that sought to determine the types of health and social services provided to residents of rural areas and to assess the present status of the service delivery system. Interviews with both agency representatives and randomly selected household…
Gill, C O; McGinnis, J C; Bryant, J
1998-07-21
The microbiological effects on the product of the series of operations for skinning the hindquarters of beef carcasses at three packing plants were assessed. Samples were obtained at each plant from randomly selected carcasses, by swabbing specified sites related to opening cuts, rump skinning or flank skinning operations, randomly selected sites along the lines of the opening cuts, or randomly selected sites on the skinned hindquarters of carcasses. A set of 25 samples of each type was collected at each plant, with the collection of a single sample from each selected carcass. Aerobic counts, coliforms and Escherichia coli were enumerated in each sample, and a log mean value was estimated for each set of 25 counts on the assumption of a log normal distribution of the counts. The data indicated that the hindquarters skinning operations at plant A were hygienically inferior to those at the other two plants, with mean numbers of coliforms and E. coli being about two orders of magnitude greater, and aerobic counts being an order of magnitude greater on the skinned hindquarters of carcasses from plant A than on those from plants B or C. The data further indicated that the operation for cutting open the skin at plant C was hygienically superior to the equivalent operation at plant B, but that the operations for skinning the rump and flank at plant B were hygienically superior to the equivalent operations at plant C. The findings suggest that objective assessment of the microbiological effects on carcasses of beef carcass dressing processes will be required to ensure that Hazard Analysis: Critical Control Point and Quality Management Systems are operated to control the microbiological condition of carcasses.
LQTA-QSAR: a new 4D-QSAR methodology.
Martins, João Paulo A; Barbosa, Euzébio G; Pasqualoto, Kerly F M; Ferreira, Márcia M C
2009-06-01
A novel 4D-QSAR approach which makes use of the molecular dynamics (MD) trajectories and topology information retrieved from the GROMACS package is presented in this study. This new methodology, named LQTA-QSAR (LQTA, Laboratório de Quimiometria Teórica e Aplicada), has a module (LQTAgrid) that calculates intermolecular interaction energies at each grid point considering probes and all aligned conformations resulting from MD simulations. These interaction energies are the independent variables or descriptors employed in a QSAR analysis. The comparison of the proposed methodology to other 4D-QSAR and CoMFA formalisms was performed using a set of forty-seven glycogen phosphorylase b inhibitors (data set 1) and a set of forty-four MAP p38 kinase inhibitors (data set 2). The QSAR models for both data sets were built using the ordered predictor selection (OPS) algorithm for variable selection. Model validation was carried out applying y-randomization and leave-N-out cross-validation in addition to the external validation. PLS models for data sets 1 and 2 provided the following statistics: q² = 0.72, r² = 0.81 for 12 variables selected and 2 latent variables, and q² = 0.82, r² = 0.90 for 10 variables selected and 5 latent variables, respectively. Visualization of the descriptors in 3D space was successfully interpreted from the chemical point of view, supporting the applicability of this new approach in rational drug design.
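The y-randomization check mentioned above can be sketched with scikit-learn's PLS regression on toy data (the descriptor matrix, activity values, component count and number of shuffles are all invented): a model fitted to shuffled activities should give a q² near or below zero, otherwise the original q² is suspect.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

def q2(X, y, n_components=2, cv=7):
    # leave-group-out cross-validated coefficient of determination
    pred = cross_val_predict(PLSRegression(n_components=n_components), X, y, cv=cv).ravel()
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(47, 12))                                   # 47 inhibitors x 12 selected descriptors (toy)
y = X[:, :3] @ np.array([0.8, -0.5, 0.3]) + rng.normal(0, 0.3, 47)
print("q2 (real responses):", round(q2(X, y), 2))
null_q2 = [q2(X, rng.permutation(y)) for _ in range(20)]        # y-randomization
print("q2 (shuffled responses), mean:", round(float(np.mean(null_q2)), 2))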
Assessing the accuracy and stability of variable selection ...
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used, or stepwise procedures are employed which iteratively add/remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating dataset consists of the good/poor condition of n=1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p=212) of landscape features from the StreamCat dataset. Two types of RF models are compared: a full variable set model with all 212 predictors, and a reduced variable set model selected using a backwards elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors, and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substanti
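A minimal version of the backwards-elimination idea, using the forest's out-of-bag score to pick the retained variable set (the predictor matrix, drop fraction and stopping point are placeholders). As the abstract cautions, this selection should itself sit inside each external cross-validation fold when estimating accuracy, otherwise the error estimate is optimistic.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_backward_elimination(X, y, drop_frac=0.2, min_vars=5, seed=0):
    keep = np.arange(X.shape[1])
    best_keep, best_oob = keep, -np.inf
    while len(keep) >= min_vars:
        rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=seed)
        rf.fit(X[:, keep], y)
        if rf.oob_score_ > best_oob:                      # remember the best-scoring subset so far
            best_keep, best_oob = keep.copy(), rf.oob_score_
        order = np.argsort(rf.feature_importances_)       # ascending importance
        keep = keep[order[max(1, int(drop_frac * len(keep))):]]   # drop the least important block
    return best_keep, best_oob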
Stacked Denoising Autoencoders Applied to Star/Galaxy Classification
NASA Astrophysics Data System (ADS)
Qin, Hao-ran; Lin, Ji-ming; Wang, Jun-yi
2017-04-01
In recent years, the deep learning algorithm, with the characteristics of strong adaptability, high accuracy, and structural complexity, has become more and more popular, but it has not yet been used in astronomy. In order to solve the problem that the star/galaxy classification accuracy is high for the bright source set, but low for the faint source set of the Sloan Digital Sky Survey (SDSS) data, we introduced the new deep learning algorithm, namely the SDA (stacked denoising autoencoder) neural network and the dropout fine-tuning technique, which can greatly improve the robustness and antinoise performance. We randomly selected the bright source sets and faint source sets from the SDSS DR12 and DR7 data with spectroscopic measurements, and preprocessed them. Then, we randomly selected, without replacement, training sets and testing sets from the bright source sets and faint source sets. Finally, using these training sets we trained SDA models of the bright sources and faint sources in the SDSS DR7 and DR12, respectively. We compared the test result of the SDA model on the DR12 testing set with the test results of the Library for Support Vector Machines (LibSVM), J48 decision tree, Logistic Model Tree (LMT), Support Vector Machine (SVM), Logistic Regression, and Decision Stump algorithms, and compared the test result of the SDA model on the DR7 testing set with the test results of six kinds of decision trees. The experiments show that the SDA has better classification accuracy than the other machine learning algorithms for the faint source sets of DR7 and DR12. In particular, when the completeness function is used as the evaluation index, the correctness rate of the SDA improved by about 15% compared with the decision tree algorithms for the faint source set of SDSS-DR7.
Treatment selection in a randomized clinical trial via covariate-specific treatment effect curves.
Ma, Yunbei; Zhou, Xiao-Hua
2017-02-01
For time-to-event data in a randomized clinical trial, we proposed two new methods for selecting an optimal treatment for a patient based on the covariate-specific treatment effect curve, which is used to represent the clinical utility of a predictive biomarker. To select an optimal treatment for a patient with a specific biomarker value, we proposed pointwise confidence intervals for each covariate-specific treatment effect curve and for the difference between the covariate-specific treatment effect curves of two treatments. Furthermore, to select an optimal treatment for a future biomarker-defined subpopulation of patients, we proposed confidence bands for each covariate-specific treatment effect curve and for the difference between each pair of covariate-specific treatment effect curves over a fixed interval of biomarker values. We constructed the confidence bands based on a resampling technique. We also conducted simulation studies to evaluate the finite-sample properties of the proposed estimation methods. Finally, we illustrated the application of the proposed methods in a real-world data set.
Ribeiro, R M; do Amaral Júnior, A T; Gonçalves, L S A; Candido, L S; Silva, T R C; Pena, G F
2012-05-15
As part of the Universidade Estadual do Norte Fluminense recurrent selection program of popcorn, we evaluated full-sib families of the sixth cycle of recurrent selection and estimated genetic progress for grain yield and expansion capacity. We assessed 200 full-sib families for 10 agronomic traits, in a randomized block design, with two replications within sets in two environments: Campos dos Goytacazes and Itaocara, in the State of Rio de Janeiro, Brazil. There were significant differences among families within sets for all traits, indicating genetic variability that could be exploited in future cycles. In the selection of superior progenies, the Mulamba and Mock index gave the best gains for popping expansion (PE) and grain yield (GY), with values of 10.97 and 15.30%, respectively, using random economic weights. By comparing the evolution of the means obtained for PE and GY in the cycles C0, C1, C2, C3, C4, C5, and predicted for C6, a steady increase was observed for both PE and GY, with the addition of 1.71 mL/g (R² = 0.93) and 192.87 kg/ha (R² = 0.88), respectively, in each cycle. Given the good performance of this popcorn population in successive cycles of intrapopulation recurrent selection, we expect that a productive variety with high expansion capacity will soon be available for producers in the north and northwest regions of Rio de Janeiro State, Brazil.
The distribution of genetic variance across phenotypic space and the response to selection.
Blows, Mark W; McGuigan, Katrina
2015-05-01
The role of adaptation in biological invasions will depend on the availability of genetic variation for traits under selection in the new environment. Although genetic variation is present for most traits in most populations, selection is expected to act on combinations of traits, not individual traits in isolation. The distribution of genetic variance across trait combinations can be characterized by the empirical spectral distribution of the genetic variance-covariance (G) matrix. Empirical spectral distributions of G from a range of trait types and taxa all exhibit a characteristic shape; some trait combinations have large levels of genetic variance, while others have very little genetic variance. In this study, we review what is known about the empirical spectral distribution of G and show how it predicts the response to selection across phenotypic space. In particular, trait combinations that form a nearly null genetic subspace with little genetic variance respond only inconsistently to selection. We go on to set out a framework for understanding how the empirical spectral distribution of G may differ from the random expectations that have been developed under random matrix theory (RMT). Using a data set containing a large number of gene expression traits, we illustrate how hypotheses concerning the distribution of multivariate genetic variance can be tested using RMT methods. We suggest that the relative alignment between novel selection pressures during invasion and the nearly null genetic subspace is likely to be an important component of the success or failure of invasion, and for the likelihood of rapid adaptation in small populations in general. © 2014 John Wiley & Sons Ltd.
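The comparison between evolvability along the direction of selection and along random directions can be written down in a few lines. Everything here is synthetic: a hypothetical G matrix with a dominant "size" axis carrying most of the genetic variance, and a selection gradient constructed to be orthogonal to that axis, mirroring the nonalignment described above.

import numpy as np

rng = np.random.default_rng(0)
loadings = rng.normal(0.8, 0.1, 10)                          # trait loadings on a hypothetical size axis
G = 2.0 * np.outer(loadings, loadings) + 0.1 * np.eye(10)    # most genetic variance lies on one axis
evals = np.linalg.eigvalsh(G)[::-1]
print("fraction of genetic variance on the leading axis:", round(evals[0] / evals.sum(), 2))

def evolvability(G, direction):
    d = direction / np.linalg.norm(direction)
    return float(d @ G @ d)

size_axis = loadings / np.linalg.norm(loadings)
beta = rng.normal(size=10)
beta -= (beta @ size_axis) * size_axis                       # selection orthogonal to the size axis
random_dirs = rng.normal(size=(1000, 10))
print("evolvability along selection:", round(evolvability(G, beta), 3))
print("mean evolvability, random directions:",
      round(float(np.mean([evolvability(G, d) for d in random_dirs])), 3))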
Zeng, Xueqiang; Luo, Gang
2017-12-01
Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
Wolde, Mistire; Tarekegn, Getahun; Kebede, Tedla
2018-05-01
Point-of-care glucometer (PoCG) devices play a significant role in self-monitoring of the blood sugar level, particularly in the follow-up of high blood sugar therapeutic response. The aim of this study was to evaluate blood glucose test results performed with four randomly selected glucometers on diabetes and control subjects versus standard wet chemistry (hexokinase) methods in Addis Ababa, Ethiopia. A prospective cross-sectional study was conducted on 200 randomly selected study participants (100 participants with diabetes and 100 healthy controls). Four randomly selected PoCG devices (CareSens N, DIAVUE Prudential, On Call Extra, i-QARE DS-W) were evaluated against the hexokinase method and the ISO 15197:2003 and ISO 15197:2013 standards. The minimum and maximum blood sugar values were recorded by CareSens N (21 mg/dl) and the hexokinase method (498.8 mg/dl), respectively. The mean sugar values of all PoCG devices except On Call Extra showed significant differences compared with the reference hexokinase method. Meanwhile, all four PoCG devices had a strong positive correlation (>80%) with the reference method (hexokinase). On the other hand, none of the four PoCG devices fulfilled the minimum accuracy requirements set by the ISO 15197:2003 and ISO 15197:2013 standards. In addition, the linear regression analysis revealed that all four selected PoCG devices overestimated the glucose concentrations. Overall, the measurements from the four selected PoCG devices correlated poorly with the standard reference method. Therefore, before introducing PoCG devices to the market, there should be a standardized evaluation platform for validation. Further similar large-scale studies on other PoCG devices also need to be undertaken.
Effect of Expanding Medicaid for Parents on Children’s Health Insurance Coverage
DeVoe, Jennifer E.; Marino, Miguel; Angier, Heather; O’Malley, Jean P.; Crawford, Courtney; Nelson, Christine; Tillotson, Carrie J.; Bailey, Steffani R.; Gallia, Charles; Gold, Rachel
2016-01-01
IMPORTANCE In the United States, health insurance is not universal. Observational studies show an association between uninsured parents and children. This association persisted even after expansions in child-only public health insurance. Oregon’s randomized Medicaid expansion for adults, known as the Oregon Experiment, created a rare opportunity to assess causality between parent and child coverage. OBJECTIVE To estimate the effect on a child’s health insurance coverage status when (1) a parent randomly gains access to health insurance and (2) a parent obtains coverage. DESIGN, SETTING, AND PARTICIPANTS Oregon Experiment randomized natural experiment assessing the results of Oregon’s 2008 Medicaid expansion. We used generalized estimating equation models to examine the longitudinal effect of a parent randomly selected to apply for Medicaid on their child’s Medicaid or Children’s Health Insurance Program (CHIP) coverage (intent-to-treat analyses). We used per-protocol analyses to understand the impact on children’s coverage when a parent was randomly selected to apply for and obtained Medicaid. Participants included 14 409 children aged 2 to 18 years whose parents participated in the Oregon Experiment. EXPOSURES For intent-to-treat analyses, the date a parent was selected to apply for Medicaid was considered the date the child was exposed to the intervention. In per-protocol analyses, exposure was defined as whether a selected parent obtained Medicaid. MAIN OUTCOMES AND MEASURES Children’s Medicaid or CHIP coverage, assessed monthly and in 6-month intervals relative to their parent’s selection date. RESULTS In the immediate period after selection, the number of covered children whose parents were selected to apply increased significantly from 3830 (61.4%) to 4152 (66.6%), compared with a nonsignificant change from 5049 (61.8%) to 5044 (61.7%) for children whose parents were not selected to apply. Children whose parents were randomly selected to apply for Medicaid had 18% higher odds of being covered in the first 6 months after the parent’s selection compared with children whose parents were not selected (adjusted odds ratio [AOR] = 1.18; 95% CI, 1.10–1.27). The effect remained significant during months 7 to 12 (AOR = 1.11; 95% CI, 1.03–1.19); months 13 to 18 showed a positive but not significant effect (AOR = 1.07; 95% CI, 0.99–1.14). Children whose parents were selected and obtained coverage had more than double the odds of having coverage compared with children whose parents were not selected and did not gain coverage (AOR = 2.37; 95% CI, 2.14–2.64). CONCLUSIONS AND RELEVANCE Children’s odds of having Medicaid or CHIP coverage increased when their parents were randomly selected to apply for Medicaid. Children whose parents were selected and subsequently obtained coverage benefited most. This study demonstrates a causal link between parents’ access to Medicaid coverage and their children’s coverage. PMID:25561041
Shaping Attention with Reward: Effects of Reward on Space- and Object-Based Selection
Shomstein, Sarah; Johnson, Jacoba
2014-01-01
The contribution of rewarded actions to automatic attentional selection remains obscure. We hypothesized that some forms of automatic orienting, such as object-based selection, can be completely abandoned in lieu of reward maximizing strategy. While presenting identical visual stimuli to the observer, in a set of two experiments, we manipulate what is being rewarded (different object targets or random object locations) and the type of reward received (money or points). It was observed that reward alone guides attentional selection, entirely predicting behavior. These results suggest that guidance of selective attention, while automatic, is flexible and can be adjusted in accordance with external non-sensory reward-based factors. PMID:24121412
Assessing Multivariate Constraints to Evolution across Ten Long-Term Avian Studies
Teplitsky, Celine; Tarka, Maja; Møller, Anders P.; Nakagawa, Shinichi; Balbontín, Javier; Burke, Terry A.; Doutrelant, Claire; Gregoire, Arnaud; Hansson, Bengt; Hasselquist, Dennis; Gustafsson, Lars; de Lope, Florentino; Marzal, Alfonso; Mills, James A.; Wheelwright, Nathaniel T.; Yarrall, John W.; Charmantier, Anne
2014-01-01
Background: In a rapidly changing world, it is of fundamental importance to understand processes constraining or facilitating adaptation through microevolution. As different traits of an organism covary, genetic correlations are expected to affect evolutionary trajectories. However, only limited empirical data are available. Methodology/Principal Findings: We investigate the extent to which multivariate constraints affect the rate of adaptation, focusing on four morphological traits often shown to harbour large amounts of genetic variance and considered to be subject to limited evolutionary constraints. Our data set includes unique long-term data for seven bird species and a total of 10 populations. We estimate population-specific matrices of genetic correlations and multivariate selection coefficients to predict evolutionary responses to selection. Using Bayesian methods that facilitate the propagation of errors in estimates, we compare (1) the rate of adaptation based on predicted response to selection when including genetic correlations with predictions from models where these genetic correlations were set to zero and (2) the multivariate evolvability in the direction of current selection to the average evolvability in random directions of the phenotypic space. We show that genetic correlations on average decrease the predicted rate of adaptation by 28%. Multivariate evolvability in the direction of current selection was systematically lower than average evolvability in random directions of space. These significant reductions in the rate of adaptation and reduced evolvability were due to a general nonalignment of selection and genetic variance, notably orthogonality of directional selection with the size axis along which most (60%) of the genetic variance is found. Conclusions: These results suggest that genetic correlations can impose significant constraints on the evolution of avian morphology in wild populations. This could have important impacts on evolutionary dynamics and hence population persistence in the face of rapid environmental change. PMID:24608111
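A small numeric sketch of the comparison between predicted responses with and without genetic correlations, using the multivariate breeder's equation Δz = Gβ on invented values (four traits, moderate positive correlations, a selection gradient with mixed signs); the percentage printed is only one crude way to express the reduction in the predicted rate of adaptation.

import numpy as np

G = np.array([[1.0, 0.7, 0.6, 0.5],        # hypothetical additive genetic covariance matrix
              [0.7, 1.0, 0.6, 0.5],
              [0.6, 0.6, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
beta = np.array([0.3, -0.2, 0.1, -0.25])   # directional selection partly opposed across traits
response_full = G @ beta                   # predicted response with genetic correlations
response_nocorr = np.diag(np.diag(G)) @ beta   # same G but with correlations set to zero
reduction = 1.0 - np.linalg.norm(response_full) / np.linalg.norm(response_nocorr)
print(f"predicted rate of adaptation reduced by {100 * reduction:.0f}% when correlations are included")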
Family Meals among New Zealand Young People: Relationships with Eating Behaviors and Body Mass Index
ERIC Educational Resources Information Center
Utter, Jennifer; Denny, Simon; Robinson, Elizabeth; Fleming, Terry; Ameratunga, Shanthi; Grant, Sue
2013-01-01
Objective: To examine the relationship between family meals and nutrition behaviors of adolescents. Design: Secondary analysis of Youth'07, a nationally representative survey. Setting: Secondary schools in New Zealand. Participants: Randomly selected adolescents (aged 13-17 years, n = 9,107) completed a multimedia and anonymous survey about their…
The Relationship of Somatotype to Source Credibility.
ERIC Educational Resources Information Center
Toomb, Kevin; Divers, Lawrence T.
A study was designed to measure the effects of the source's body type--endomorph (fat), mesomorph (muscular), and ectomorph (thin)--in relation to his perceived credibility by the receiver. Five hundred subjects were randomly selected from a basic communication course and, in groups of twenty in a classroom setting, were each given a…
ERIC Educational Resources Information Center
Borick, Timothy J.
2011-01-01
This study examined school psychologists' assessment and intervention practices regarding ADHD. Five hundred school psychologists who practiced in a school setting and were regular members of the National Association of School Psychologists were randomly selected to complete and return a questionnaire titled Assessment and Intervention Practices…
Physical Education in Primary Schools: Classroom Teachers' Perceptions of Benefits and Outcomes
ERIC Educational Resources Information Center
Morgan, Philip J.; Hansen, Vibeke
2008-01-01
Objective: The aim of the current study was to examine the perceptions of classroom teachers regarding the benefits and outcomes of their PE programs. Design: Cross-sectional. Setting: Thirty eight randomly selected primary schools in New South Wales (NSW), Australia. Method: A mixed-mode methodology was utilized, incorporating semi-structured…
Examining Evolutions in the Adoption of Metacognitive Regulation in Reciprocal Peer Tutoring Groups
ERIC Educational Resources Information Center
De Backer, Liesje; Van Keer, Hilde; Moerkerke, Beatrijs; Valcke, Martin
2016-01-01
We aimed to investigate how metacognitive regulation is characterised during collaborative learning in a higher education reciprocal peer tutoring (RPT) setting. Sixty-four Educational Sciences students participated in a semester-long RPT-intervention and tutored one another in small groups of six. All sessions of five randomly selected RPT-groups…
47 CFR 1.9050 - Who may sign spectrum leasing notifications and applications.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 47 Telecommunication 1 2014-10-01 2014-10-01 false Who may sign spectrum leasing notifications and... PROCEDURE Grants by Random Selection Spectrum Leasing General Policies and Procedures § 1.9050 Who may sign spectrum leasing notifications and applications. Under the rules set forth in this subpart, certain...
47 CFR 1.9050 - Who may sign spectrum leasing notifications and applications.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 47 Telecommunication 1 2013-10-01 2013-10-01 false Who may sign spectrum leasing notifications and... PROCEDURE Grants by Random Selection Spectrum Leasing General Policies and Procedures § 1.9050 Who may sign spectrum leasing notifications and applications. Under the rules set forth in this subpart, certain...
47 CFR 1.9050 - Who may sign spectrum leasing notifications and applications.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 47 Telecommunication 1 2012-10-01 2012-10-01 false Who may sign spectrum leasing notifications and... PROCEDURE Grants by Random Selection Spectrum Leasing General Policies and Procedures § 1.9050 Who may sign spectrum leasing notifications and applications. Under the rules set forth in this subpart, certain...
The Quest for Quality. Sixteen Forms of Heresy in Higher Education.
ERIC Educational Resources Information Center
Goodlad, Sinclair
This book is an exploration of the current debate about quality in higher education. Using a construct of "heresies," it suggests a set of guiding principles in four key areas of university life: curriculum (because selecting what is worth learning in universities is not random); teaching methods (because universities offer opportunities…
Landing the Best Trustees in Your Boardroom
ERIC Educational Resources Information Center
Shultz, Susan F.
2004-01-01
Isn't it ironic that the one organization with the power to set the direction for schools, the board of education, is the one organization whose members are often randomly selected, rarely evaluated and almost never held accountable to measurable standards of excellence? The author's premise is simple: Better boards of education mean better…
Challenges of Attending E-Learning Studies in Nigeria
ERIC Educational Resources Information Center
Bugi, Stephan Z.
2012-01-01
This study set out to find out what challenges the E-leaner faces in the Nigerian environment. Survey research design was used to obtain the opinion of 200 randomly selected E-learners in Kaduna metropolis. Their responses revealed that the most prominent challenges they face are, Inadequate Power supply, Internet connectivity problems, Efficacy…
Exploring Bullying: An Early Childhood Perspective from Mainland China
ERIC Educational Resources Information Center
Arndt, Janet S.; Luo, Nili
2008-01-01
This article explores bullying in mainland China. The authors conducted a study to determine the existence of a problem with bullying in younger Chinese children. Samples included 40 randomly selected, early childhood educators serving children ages 2 through 6, located in 10 different urban school settings along the Yangzi River. The authors…
Predictors of Career Adaptability Skill among Higher Education Students in Nigeria
ERIC Educational Resources Information Center
Ebenehi, Amos Shaibu; Rashid, Abdullah Mat; Bakar, Ab Rahim
2016-01-01
This paper examined predictors of career adaptability skill among higher education students in Nigeria. A sample of 603 higher education students randomly selected from six colleges of education in Nigeria participated in this study. A set of self-reported questionnaire was used for data collection, and multiple linear regression analysis was used…
NASA Astrophysics Data System (ADS)
Liu, Fei; He, Yong; Wang, Li
2007-11-01
In order to implement the fast discrimination of milk tea powders with different internal qualities, visible and near infrared (Vis/NIR) spectroscopy combined with effective wavelengths (EWs) and a BP neural network (BPNN) was investigated as a new approach. Five brands of milk tea were obtained; 225 samples were selected randomly for the calibration set, while 75 samples were used for the validation set. The EWs were selected according to x-loading weights and regression coefficients from PLS analysis after preprocessing. A total of 18 EWs (400, 401, 452, 453, 502, 503, 534, 535, 594, 595, 635, 636, 688, 689, 987, 988, 995 and 996 nm) were selected as the inputs of the BPNN model. Performance was assessed on the calibration and validation sets. The threshold error of prediction was set as +/-0.1, and excellent precision was achieved, with recognition ratios of 100% for the calibration set and 98.7% for the validation set. The prediction results indicated that the EWs reflected the main characteristics of milk tea of different brands based on Vis/NIR spectroscopy and the BPNN model, and the EWs would be useful for the development of a portable instrument to discriminate the variety and detect the adulteration of instant milk tea powders.
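A sketch of the modelling step under stated assumptions: spectra are stored as a samples-by-wavelengths matrix on a 1 nm grid, the 18 effective wavelengths from the abstract are picked out as inputs, and scikit-learn's MLPClassifier stands in for the paper's BP neural network. The spectra and brand labels below are random placeholders, so the printed accuracy is meaningless except as a demonstration that the pipeline runs.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
wavelengths = np.arange(400, 1000)                 # hypothetical 1 nm grid, 400-999 nm
X = rng.random((300, wavelengths.size))            # placeholder spectra: 300 samples x 600 bands
y = rng.integers(0, 5, 300)                        # five brands (toy labels)
ews = [400, 401, 452, 453, 502, 503, 534, 535, 594, 595,
       635, 636, 688, 689, 987, 988, 995, 996]     # effective wavelengths from the abstract
cols = [np.where(wavelengths == w)[0][0] for w in ews]
X_ew = X[:, cols]
X_cal, X_val, y_cal, y_val = train_test_split(X_ew, y, train_size=225, random_state=0)
bpnn = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X_cal, y_cal)
print("validation recognition ratio:", bpnn.score(X_val, y_val))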
A new mosaic method for three-dimensional surface
NASA Astrophysics Data System (ADS)
Yuan, Yun; Zhu, Zhaokun; Ding, Yongjun
2011-08-01
Three-dimensional (3-D) data mosaicking is an indispensable step in surface measurement and digital terrain map generation. To address the problem of mosaicking locally unorganized point clouds with only coarse registration and many mismatched points, a new RANSAC-based mosaic method for 3-D surfaces is proposed. Each iteration of the method proceeds through random sampling with an additional shape constraint, data normalization of the cloud points, absolute orientation, data denormalization of the cloud points, inlier counting, and so on. After N random sample trials, the largest consensus set is selected, and finally the model is re-estimated using all the points in the selected subset. The minimal subset is composed of three non-collinear points, which form a triangle. The shape of the triangle is considered in the random sample selection in order to make the selection reasonable. A new coordinate system transformation algorithm presented in this paper is used to avoid singularities. The whole rotation transformation between the two coordinate systems can be solved by two rotations, each expressed as an Euler angle vector with an explicit physical meaning. Both simulated and real data are used to demonstrate the correctness and validity of this mosaic method. The method has better noise immunity owing to its robust estimation property, and high accuracy because the shape constraint is added to the random sampling and data normalization is added to the absolute orientation. The method is applicable to high-precision measurement of three-dimensional surfaces and also to 3-D terrain mosaicking.
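A condensed RANSAC sketch for estimating the rigid transform between two overlapping point patches with putative (possibly wrong) correspondences. It follows the outline above of sampling, shape constraint and consensus counting, but uses an SVD-based absolute orientation instead of the paper's Euler-angle formulation and omits the normalization/denormalization steps; the thresholds and iteration counts are placeholders.

import numpy as np

def rigid_transform(P, Q):
    # least-squares rotation R and translation t with R @ P_i + t close to Q_i (absolute orientation)
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cQ - R @ cP

def ransac_mosaic(P, Q, n_iter=1000, thresh=0.05, min_area=1e-3, seed=0):
    # P, Q: corresponding 3D points from two overlapping patches (may contain mismatches)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(P), bool)
    for _ in range(n_iter):
        idx = rng.choice(len(P), 3, replace=False)
        a, b, c = P[idx]
        if np.linalg.norm(np.cross(b - a, c - a)) / 2 < min_area:
            continue                         # shape constraint: skip near-collinear triangles
        R, t = rigid_transform(P[idx], Q[idx])
        err = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # re-estimate the transform from the largest consensus set
    R, t = rigid_transform(P[best_inliers], Q[best_inliers])
    return R, t, best_inliers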
Multilabel learning via random label selection for protein subcellular multilocations prediction.
Wang, Xiao; Li, Guo-Zheng
2013-01-01
Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular localization methods are only used to deal with single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they only adopt a simple strategy, that is, transforming the multilocation proteins to multiple proteins with a single location, which does not take correlations among different subcellular locations into account. In this paper, a novel method named random label selection (RALS) (multilabel learning via RALS), which extends the simple binary relevance (BR) method, is proposed to learn from multilocation proteins in an effective and efficient way. RALS does not explicitly find the correlations among labels, but rather implicitly attempts to learn the label correlations from data by augmenting the original feature space with randomly selected labels as additional input features. Through a fivefold cross-validation test on a benchmark data set, we demonstrate that our proposed method, which takes label correlations into consideration, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations really exist and contribute to the improvement of prediction performance. Experimental results on two benchmark data sets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multilocations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for public use.
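A bare-bones reading of the augmentation idea, with invented details: each label's binary classifier sees the original features plus a few randomly selected other labels as extra inputs. How the unknown augmented labels are supplied at prediction time is not described here and is handled below by a simple iterative self-prediction loop, which is an assumption of this sketch rather than the paper's scheme.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_rals(X, Y, n_aug=3, seed=0):
    # X: feature matrix; Y: binary label indicator matrix (n_samples x n_labels),
    # assumed to contain positive and negative examples for every label
    rng = np.random.default_rng(seed)
    models = []
    for j in range(Y.shape[1]):
        others = [k for k in range(Y.shape[1]) if k != j]
        aug = rng.choice(others, size=min(n_aug, len(others)), replace=False)
        X_aug = np.hstack([X, Y[:, aug]])          # randomly selected labels become extra features
        models.append((aug, LogisticRegression(max_iter=1000).fit(X_aug, Y[:, j])))
    return models

def predict_rals(models, X, n_rounds=2):
    # the augmented label features are unknown at test time; start from zeros and self-iterate
    Y_hat = np.zeros((X.shape[0], len(models)))
    for _ in range(n_rounds):
        Y_new = np.empty_like(Y_hat)
        for j, (aug, clf) in enumerate(models):
            Y_new[:, j] = clf.predict(np.hstack([X, Y_hat[:, aug]]))
        Y_hat = Y_new
    return Y_hat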
Frequentist Model Averaging in Structural Equation Modelling.
Jin, Shaobo; Ankargren, Sebastian
2018-06-04
Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a [Formula: see text] test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean-squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.
Rincent, R; Laloë, D; Nicolas, S; Altmann, T; Brunel, D; Revilla, P; Rodríguez, V M; Moreno-Gonzalez, J; Melchinger, A; Bauer, E; Schoen, C-C; Meyer, N; Giauffret, C; Bauland, C; Jamin, P; Laborde, J; Monod, H; Flament, P; Charcosset, A; Moreau, L
2012-10-01
Genomic selection refers to the use of genotypic information for predicting breeding values of selection candidates. A prediction formula is calibrated with the genotypes and phenotypes of reference individuals constituting the calibration set. The size and the composition of this set are essential parameters affecting the prediction reliabilities. The objective of this study was to maximize reliabilities by optimizing the calibration set. Different criteria based on the diversity or on the prediction error variance (PEV) derived from the realized additive relationship matrix-best linear unbiased predictions model (RA-BLUP) were used to select the reference individuals. For the latter, we considered the mean of the PEV of the contrasts between each selection candidate and the mean of the population (PEVmean) and the mean of the expected reliabilities of the same contrasts (CDmean). These criteria were tested with phenotypic data collected on two diversity panels of maize (Zea mays L.) genotyped with a 50k SNPs array. In the two panels, samples chosen based on CDmean gave higher reliabilities than random samples for various calibration set sizes. CDmean also appeared superior to PEVmean, which can be explained by the fact that it takes into account the reduction of variance due to the relatedness between individuals. Selected samples were close to optimality for a wide range of trait heritabilities, which suggests that the strategy presented here can efficiently sample subsets in panels of inbred lines. A script to optimize reference samples based on CDmean is available on request.
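To convey the flavour of criterion-based calibration set design, here is a greedy sketch that picks reference individuals so as to minimise a crude PEV-like quantity, the mean conditional variance of the unphenotyped candidates given the selected set under a GBLUP-style model. It is not the CDmean or PEVmean criterion of the study (fixed effects and the contrast structure are ignored, and lambda is an assumed variance ratio), and the exhaustive greedy search would be slow for real panels.

import numpy as np

def mean_conditional_variance(A, train, lam):
    # mean of diag( A_cc - A_ct (A_tt + lam*I)^(-1) A_tc ) over unphenotyped candidates
    cand = [i for i in range(A.shape[0]) if i not in train]
    Att = A[np.ix_(train, train)] + lam * np.eye(len(train))
    Act = A[np.ix_(cand, train)]
    cond = A[np.ix_(cand, cand)] - Act @ np.linalg.solve(Att, Act.T)
    return float(np.mean(np.diag(cond)))

def greedy_calibration_set(A, n_train, lam=1.0):
    selected, remaining = [], list(range(A.shape[0]))
    for _ in range(n_train):
        scores = {i: mean_conditional_variance(A, selected + [i], lam) for i in remaining}
        best = min(scores, key=scores.get)          # individual whose addition helps most
        selected.append(best)
        remaining.remove(best)
    return selected

# toy usage with a random marker-derived relationship matrix
rng = np.random.default_rng(0)
M = rng.normal(size=(60, 200))                      # 60 individuals x 200 markers (toy)
A = (M @ M.T) / M.shape[1]
print(greedy_calibration_set(A, n_train=10))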
Training set optimization under population structure in genomic selection.
Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E
2015-01-01
Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of predictor error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, the most phenotypic variation captured by a sampling method in the TRS is desirable. The wheat dataset showed mild population structure, and CDmean and stratified CDmean methods showed the highest accuracies for all the traits except for test weight and heading date. The rice dataset had strong population structure and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion used to optimize the TRS seems to depend on the interaction of trait architecture and population structure.
Defining an essence of structure determining residue contacts in proteins.
Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael
2009-12-01
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to-date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Ca RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts-such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling" that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Ca RMSD with as little as 8% of the native contacts (Ca-Ca and Cb-Cb). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.
Defining an Essence of Structure Determining Residue Contacts in Proteins
Sathyapriya, R.; Duarte, Jose M.; Stehr, Henning; Filippis, Ioannis; Lappe, Michael
2009-01-01
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to-date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Ca RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts—such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling” that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Ca RMSD with as little as 8% of the native contacts (Ca-Ca and Cb-Cb). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking. PMID:19997489
CNN-BLPred: a Convolutional neural network based predictor for β-Lactamases (BL) and their classes.
White, Clarence; Ismail, Hamid D; Saigo, Hiroto; Kc, Dukka B
2017-12-28
The β-Lactamase (BL) enzyme family is an important class of enzymes that plays a key role in bacterial resistance to antibiotics. As the number of newly identified BL enzymes is increasing daily, it is imperative to develop a computational tool to classify the newly identified BL enzymes into one of their classes. There are two types of classification of BL enzymes: Molecular Classification and Functional Classification. Existing computational methods only address Molecular Classification and the performance of these existing methods is unsatisfactory. We addressed the unsatisfactory performance of the existing methods by implementing a Deep Learning approach called Convolutional Neural Network (CNN). We developed CNN-BLPred, an approach for the classification of BL proteins. The CNN-BLPred uses Gradient Boosted Feature Selection (GBFS) in order to select the ideal feature set for each BL classification. Based on the rigorous benchmarking of CNN-BLPred using both leave-one-out cross-validation and independent test sets, CNN-BLPred performed better than the other existing algorithms. Compared with other architectures of CNN, Recurrent Neural Network, and Random Forest, the simple CNN architecture with only one convolutional layer performs the best. After feature extraction, we were able to remove ~95% of the 10,912 features using Gradient Boosted Trees. During 10-fold cross validation, we increased the accuracy of the classic BL predictions by 7%. We also increased the accuracy of Class A, Class B, Class C, and Class D performance by an average of 25.64%. The independent test results followed a similar trend. We implemented a deep learning algorithm known as Convolutional Neural Network (CNN) to develop a classifier for BL classification. Combined with feature selection on an exhaustive feature set and using balancing methods such as Random Oversampling (ROS), Random Undersampling (RUS) and Synthetic Minority Oversampling Technique (SMOTE), CNN-BLPred performs significantly better than existing algorithms for BL classification.
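A hedged sketch of the feature-selection-then-classification pipeline is shown below; gradient-boosted trees rank an exhaustive synthetic feature set and a random forest stands in for the one-layer CNN, since the point here is the Gradient Boosted Feature Selection step rather than the network architecture. The data, feature counts and selection threshold are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a large protein-feature matrix (e.g. k-mer counts)
X, y = make_classification(n_samples=600, n_features=500, n_informative=20,
                           random_state=0)

# Gradient-boosted trees rank features; SelectFromModel keeps the strongest
# ones, discarding most of the exhaustive feature set.
selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=100, random_state=0),
    threshold="mean")

# A random forest stands in here for the one-layer CNN used in the paper.
model = make_pipeline(selector,
                      RandomForestClassifier(n_estimators=300, random_state=0))
scores = cross_val_score(model, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f}")
```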
Hybrid feature selection for supporting lightweight intrusion detection systems
NASA Astrophysics Data System (ADS)
Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin
2017-08-01
Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. These features slow down the process of training and testing in network traffic classification. Therefore, a hybrid feature selection approach combining wrapper and filter selection is designed in this paper to build a lightweight intrusion detection system. Two main phases are involved in this method. The first phase conducts a preliminary search for an optimal subset of features, in which the chi-square feature selection is utilized. The selected set of features from the previous phase is further refined in the second phase in a wrapper manner, in which the Random Forest (RF) is used to guide the selection process and retain an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. The experimental results on NSL-KDD datasets show that our approach results in higher detection accuracy as well as faster training and testing processes.
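A minimal sketch of such a two-phase, filter-then-wrapper selection follows; it uses a synthetic dataset rather than NSL-KDD, a chi-square filter for the preliminary subset, and a random forest whose importances refine the final feature set. The cut-offs (k=30, median importance) are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for network traffic features (not NSL-KDD)
X, y = make_classification(n_samples=2000, n_features=60, n_informative=12,
                           random_state=0)
X = MinMaxScaler().fit_transform(X)          # chi2 needs non-negative features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Phase 1 (filter): chi-square test keeps a preliminary subset of features.
filt = SelectKBest(chi2, k=30).fit(X_tr, y_tr)
keep = filt.get_support(indices=True)

# Phase 2 (wrapper): a random forest ranks the surviving features and
# retains only those above the median importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[:, keep], y_tr)
refined = keep[rf.feature_importances_ > np.median(rf.feature_importances_)]

# Final lightweight detection model on the refined feature set.
final = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[:, refined], y_tr)
print("features kept:", len(refined),
      "accuracy:", accuracy_score(y_te, final.predict(X_te[:, refined])))
```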
NASA Astrophysics Data System (ADS)
Fridrich, Jessica; Goljan, Miroslav; Lisonek, Petr; Soukal, David
2005-03-01
In this paper, we show that the communication channel known as writing in memory with defective cells is a relevant information-theoretical model for a specific case of passive warden steganography when the sender embeds a secret message into a subset C of the cover object X without sharing the selection channel C with the recipient. The set C could be arbitrary, determined by the sender from the cover object using a deterministic, pseudo-random, or a truly random process. We call this steganography "writing on wet paper" and realize it using low-density random linear codes with the encoding step based on the LT process. The importance of writing on wet paper for covert communication is discussed within the context of adaptive steganography and perturbed quantization steganography. Heuristic arguments supported by tests using blind steganalysis indicate that the wet paper steganography provides improved steganographic security for embedding in JPEG images and is less vulnerable to attacks when compared to existing methods with shared selection channels.
Knowledge diffusion of dynamical network in terms of interaction frequency.
Liu, Jian-Guo; Zhou, Qing; Guo, Qiang; Yang, Zhen-Hua; Xie, Fei; Han, Jing-Ti
2017-09-07
In this paper, we present a knowledge diffusion (SKD) model for dynamic networks that takes into account the interaction frequency, which is commonly used to measure social closeness. A set of agents, initially interconnected to form a random network, either exchange knowledge with their neighbors or move toward a new location through an edge-rewiring procedure. Knowledge exchange between agents is governed by a transfer rule: with probability p, the target node preferentially selects one neighbor to exchange knowledge with according to their interaction frequency rather than the knowledge distance; otherwise, with probability 1 - p, the target node preferentially builds a new link with a second-order neighbor or selects one node in the system at random. The simulation results show that, compared with a null model defined by a random selection mechanism and the traditional knowledge diffusion (TKD) model driven by knowledge distance, knowledge spreads faster under the SKD model driven by interaction frequency. In particular, the network structure under SKD evolves toward an assortative one, which is a fundamental feature of social networks. This work is helpful for a deeper understanding of the coevolution of knowledge diffusion and network structure.
Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space
Bustos-Korts, Daniela; Malosetti, Marcos; Chapman, Scott; Biddulph, Ben; van Eeuwijk, Fred
2016-01-01
Genome-enabled prediction provides breeders with the means to increase the number of genotypes that can be evaluated for selection. One of the major challenges in genome-enabled prediction is how to construct a training set of genotypes from a calibration set that represents the target population of genotypes, where the calibration set is composed of a training and validation set. A random sampling protocol of genotypes from the calibration set will lead to low quality coverage of the total genetic space by the training set when the calibration set contains population structure. As a consequence, predictive ability will be affected negatively, because some parts of the genotypic diversity in the target population will be under-represented in the training set, whereas other parts will be over-represented. Therefore, we propose a training set construction method that uniformly samples the genetic space spanned by the target population of genotypes, thereby increasing predictive ability. To evaluate our method, we constructed training sets alongside the identification of corresponding genomic prediction models for four genotype panels that differed in the amount of population structure they contained (maize Flint, maize Dent, wheat, and rice). Training sets were constructed using uniform sampling, stratified-uniform sampling, stratified sampling and random sampling. We compared these methods with a method that maximizes the generalized coefficient of determination (CD). Several training set sizes were considered. We investigated four genomic prediction models: multi-locus QTL models, GBLUP models, combinations of QTL and GBLUPs, and Reproducing Kernel Hilbert Space (RKHS) models. For the maize and wheat panels, construction of the training set under uniform sampling led to a larger predictive ability than under stratified and random sampling. The results of our methods were similar to those of the CD method. For the rice panel, all training set construction methods led to similar predictive ability, a reflection of the very strong population structure in this panel. PMID:27672112
Physical key-protected one-time pad
Horstmeyer, Roarke; Judkewitz, Benjamin; Vellekoop, Ivo M.; Assawaworrarit, Sid; Yang, Changhuei
2013-01-01
We describe an encrypted communication principle that forms a secure link between two parties without electronically saving either of their keys. Instead, random cryptographic bits are kept safe within the unique mesoscopic randomness of two volumetric scattering materials. We demonstrate how a shared set of patterned optical probes can generate 10 gigabits of statistically verified randomness between a pair of unique 2 mm3 scattering objects. This shared randomness is used to facilitate information-theoretically secure communication following a modified one-time pad protocol. Benefits of volumetric physical storage over electronic memory include the inability to probe, duplicate or selectively reset any bits without fundamentally altering the entire key space. Our ability to securely couple the randomness contained within two unique physical objects can extend to strengthen hardware required by a variety of cryptographic protocols, which is currently a critically weak link in the security pipeline of our increasingly mobile communication culture. PMID:24345925
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A.; Burgueño, Juan; Pérez-Rodríguez, Paulino; de los Campos, Gustavo
2016-01-01
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have higher prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. PMID:27793970
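To make the Kronecker construction above concrete, here is a small sketch (not the authors' Bayesian implementation) that builds a linear GBLUP kernel from markers and combines it with a hypothetical between-environment genetic correlation matrix; the dimensions, marker coding and correlation values are invented for illustration, and the sampling of the full Bayesian model is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n_lines, n_markers, n_env = 100, 500, 3

# Linear (GBLUP) genomic kernel from centred marker scores
Z = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
Z -= Z.mean(axis=0)
G = Z @ Z.T / n_markers

# Hypothetical genetic correlation matrix between the three environments
E = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

# Covariance of the stacked genetic effects u (all lines in env 1, then env 2, ...):
# Cov(u) = sigma_u^2 * (E kron G), the Kronecker structure of the first model.
# A Gaussian-kernel (GK) variant would replace G with exp(-d_ij^2 / theta).
sigma_u2 = 1.0
K_GE = sigma_u2 * np.kron(E, G)
print(K_GE.shape)   # (300, 300): n_env * n_lines genetic effects
```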
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models.
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A; Burgueño, Juan; Pérez-Rodríguez, Paulino; de Los Campos, Gustavo
2017-01-05
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance-covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have higher prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. Copyright © 2017 Cuevas et al.
Selecting promising treatments in randomized Phase II cancer trials with an active control.
Cheung, Ying Kuen
2009-01-01
The primary objective of Phase II cancer trials is to evaluate the potential efficacy of a new regimen in terms of its antitumor activity in a given type of cancer. Due to advances in oncology therapeutics and heterogeneity in the patient population, such evaluation can be interpreted objectively only in the presence of a prospective control group of an active standard treatment. This paper deals with the design problem of Phase II selection trials in which several experimental regimens are compared to an active control, with an objective to identify an experimental arm that is more effective than the control or to declare futility if no such treatment exists. Conducting a multi-arm randomized selection trial is a useful strategy to prioritize experimental treatments for further testing when many candidates are available, but the sample size required in such a trial with an active control could raise feasibility concerns. In this study, we extend the sequential probability ratio test for normal observations to the multi-arm selection setting. The proposed methods, allowing frequent interim monitoring, offer high likelihood of early trial termination, and as such enhance enrollment feasibility. The termination and selection criteria have closed form solutions and are easy to compute with respect to any given set of error constraints. The proposed methods are applied to design a selection trial in which combinations of sorafenib and erlotinib are compared to a control group in patients with non-small-cell lung cancer using a continuous endpoint of change in tumor size. The operating characteristics of the proposed methods are compared to that of a single-stage design via simulations: The sample size requirement is reduced substantially and is feasible at an early stage of drug development.
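The following is a minimal illustration of a Wald-type sequential probability ratio test for one experimental arm against the control, with normally distributed interim differences and known variance; the boundary constants use the classical Wald approximations, and the multi-arm selection and futility machinery of the proposed design is not reproduced here. The effect size, sigma and error rates are placeholder values.

```python
import numpy as np

def sprt(x_exp, x_ctrl, delta, sigma, alpha=0.05, beta=0.20):
    """Wald SPRT of H0: mean difference = 0 vs H1: mean difference = delta,
    applied to paired interim differences with known sigma."""
    upper = np.log((1 - beta) / alpha)       # cross -> evidence for the experimental arm
    lower = np.log(beta / (1 - alpha))       # cross -> stop for futility
    llr = 0.0
    for i, (xe, xc) in enumerate(zip(x_exp, x_ctrl), start=1):
        d = xe - xc
        llr += delta * (d - delta / 2) / sigma**2   # log-likelihood ratio increment
        if llr >= upper:
            return "select experimental arm", i
        if llr <= lower:
            return "futility", i
    return "continue / max sample reached", len(x_exp)

# Illustrative effect size and error rates; negative delta = tumour shrinkage
rng = np.random.default_rng(0)
exp_arm = rng.normal(loc=-0.4, scale=1.0, size=100)   # e.g. change in tumour size
ctrl_arm = rng.normal(loc=0.0, scale=1.0, size=100)
print(sprt(exp_arm, ctrl_arm, delta=-0.4, sigma=np.sqrt(2)))
```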
Jackknifing Techniques for Evaluation of Equating Accuracy. Research Report. ETS RR-09-39
ERIC Educational Resources Information Center
Haberman, Shelby J.; Lee, Yi-Hsuan; Qian, Jiahe
2009-01-01
Grouped jackknifing may be used to evaluate the stability of equating procedures with respect to sampling error and with respect to changes in anchor selection. Properties of grouped jackknifing are reviewed for simple-random and stratified sampling, and its use is described for comparisons of anchor sets. Application is made to examples of item…
Assessing the Warm Glow Effect in Contingent Valuations for Public Libraries
ERIC Educational Resources Information Center
Lee, Soon-Jae; Chung, Hye-Kyung; Jung, Eun-Joo
2010-01-01
This article aims to present evidence of the warm glow effect in a public library setting. More specifically, it tests whether individual respondents with different values for the warm glow component report different values for their willingness to pay (WTP). The data come from a contingent valuation survey conducted on randomly selected citizens…
USDA-ARS?s Scientific Manuscript database
During normal bacterial DNA replication, gene duplication and amplification (GDA) events occur randomly at a low frequency in the genome throughout a population. In the absence of selection, GDA events that increase the number of copies of a bacterial gene (or a set of genes) are lost. Antibiotic ...
Knowledge about HIV and AIDS among Young South Africans in the Capricorn District, Limpopo Province
ERIC Educational Resources Information Center
Melwa, Irene T.; Oduntan, Olalekan A.
2012-01-01
Objective: To assess the basic knowledge about HIV and AIDS among young South Africans in the Capricorn District of Limpopo Province, South Africa. Design: A questionnaire-based cohort study, involving data collection from senior high school students. Setting: Randomly selected high schools in the Capricorn District, Limpopo Province, South…
ERIC Educational Resources Information Center
Gavaravarapu, Subba Rao M.; Vemula, Sudershan R.; Rao, Pratima; Mendu, Vishnu Vardhana Rao; Polasa, Kalpagam
2009-01-01
Objective: To understand food safety knowledge, perceptions, and practices of adolescent girls. Design: Focus group discussions (FGDs) with 32 groups selected using stratified random sampling. Setting: Four South Indian states. Participants: Adolescent girls (10-19 years). Phenomena of Interest: Food safety knowledge, perceptions, and practices.…
ERIC Educational Resources Information Center
Scott, Joseph J.; Hansen, Vibeke; Morgan, Philip J.; Plotnikoff, Ronald C.; Lubans, David R.
2018-01-01
Objective: To explore young people's perceptions of pedometers and investigate behaviours exhibited while being monitored. Design: Qualitative design using six focus groups with participants (mean age 14.7 years). Setting: Study participants (n = 24) were randomly selected from a previous study of 123 young people aged 14-15 years from three…
Short Form of the Developmental Behaviour Checklist
ERIC Educational Resources Information Center
Taffe, John R.; Gray, Kylie M.; Einfeld, Stewart L.; Dekker, Marielle C.; Koot, Hans M.; Emerson, Eric; Koskentausta, Terhi; Tonge, Bruce J.
2007-01-01
A 24-item short form of the 96-item Developmental Behaviour Checklist was developed to provide a brief measure of Total Behaviour Problem Score for research purposes. The short form Developmental Behaviour Checklist (DBC-P24) was chosen for low bias and high precision from among 100 randomly selected item sets. The DBC-P24 was developed from…
A Qualitative Study of Irish Teachers' Perspective of Student Substance Use
ERIC Educational Resources Information Center
Van Hout, Marie Claire; Connor, Sean
2008-01-01
Research Aim: This research aimed to provide an anecdotal perception of student substance use according to the teachers' personal experience in the Irish secondary level educational setting. Methodology: Sampling Interviews were conducted with teachers (n=95) at 10 randomly selected schools in County Carlow in the South East of Ireland, as part of…
Agent Reward Shaping for Alleviating Traffic Congestion
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Agogino, Adrian
2006-01-01
Traffic congestion problems provide a unique environment to study how multi-agent systems promote desired system level behavior. What is particularly interesting in this class of problems is that no individual action is intrinsically "bad" for the system but that combinations of actions among agents lead to undesirable outcomes. As a consequence, agents need to learn how to coordinate their actions with those of other agents, rather than learn a particular set of "good" actions. This problem is ubiquitous in various traffic problems, including selecting departure times for commuters, routes for airlines, and paths for data routers. In this paper we present a multi-agent approach to two traffic problems, where for each driver, an agent selects the most suitable action using reinforcement learning. The agent rewards are based on concepts from collectives and aim to provide the agents with rewards that are both easy to learn and that, if learned, lead to good system level behavior. In the first problem, we study how agents learn the best departure times of drivers in a daily commuting environment and how following those departure times alleviates congestion. In the second problem, we study how agents learn to select desirable routes to improve traffic flow and minimize delays for all drivers. In both sets of experiments, agents using collective-based rewards produced near optimal performance (93-96% of optimal) whereas agents using system rewards (63-68%) barely outperformed random action selection (62-64%) and agents using local rewards (48-72%) performed worse than random in some instances.
Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N
2018-05-15
In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 AD were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. In the second blind dataset, we achieved a four-class classification accuracy of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
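A rough sketch of the "several random forests over feature subsets, fused by majority voting" idea follows; the synthetic data and the three feature subsets are stand-ins for the morphological MRI feature groups, and the fusion step is reduced to a simple plurality vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for morphological MRI features of four diagnostic groups
X, y = make_classification(n_samples=400, n_features=300, n_informative=30,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Hypothetical feature subsets (e.g. whole set, "left", "right" feature groups)
subsets = {"all": slice(None),
           "first_half": slice(0, 150),
           "second_half": slice(150, 300)}

preds = []
for cols in subsets.values():
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_tr[:, cols], y_tr)
    preds.append(rf.predict(X_te[:, cols]))

# Ensemble by majority voting across the subset-specific forests
P = np.vstack(preds)
vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, P)
print("ensemble accuracy:", (vote == y_te).mean())
```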
NASA Astrophysics Data System (ADS)
Chemura, Abel; Mutanga, Onisimo; Dube, Timothy
2017-08-01
Water management is an important component in agriculture, particularly for perennial tree crops such as coffee. Proper detection and monitoring of water stress therefore plays an important role not only in mitigating the associated adverse impacts on crop growth and productivity but also in reducing expensive and environmentally unsustainable irrigation practices. Current methods for water stress detection in coffee production mainly involve monitoring plant physiological characteristics and soil conditions. In this study, we tested the ability of selected wavebands in the VIS/NIR range to predict plant water content (PWC) in coffee using the random forest algorithm. An experiment was set up such that coffee plants were exposed to different levels of water stress and reflectance and plant water content measured. In selecting appropriate parameters, cross-correlation identified 11 wavebands, reflectance difference identified 16 and reflectance sensitivity identified 22 variables related to PWC. Only three wavebands (485 nm, 670 nm and 885 nm) were identified by at least two methods as significant. The selected wavebands were trained (n = 36) and tested on independent data (n = 24) after being integrated into the random forest algorithm to predict coffee PWC. The results showed that the reflectance sensitivity selected bands performed the best in water stress detection (r = 0.87, RMSE = 4.91% and pBias = 0.9%), when compared to reflectance difference (r = 0.79, RMSE = 6.19 and pBias = 2.5%) and cross-correlation selected wavebands (r = 0.75, RMSE = 6.52 and pBias = 1.6). These results indicate that it is possible to reliably predict PWC using wavebands in the VIS/NIR range that correspond with many of the available multispectral scanners using random forests and further research at field and landscape scale is required to operationalize these findings.
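The following sketch illustrates, on simulated reflectance values, the final modelling step: training a random forest on a few selected wavebands and scoring it with the same statistics (r, RMSE and percent bias). The band names and the relation generating plant water content are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
n = 60                                           # 36 training + 24 test plants
bands = {"R485": rng.uniform(0.02, 0.10, n),     # reflectance at selected bands
         "R670": rng.uniform(0.03, 0.12, n),
         "R885": rng.uniform(0.30, 0.55, n)}
X = np.column_stack(list(bands.values()))
# Simulated plant water content, loosely driven by the NIR/red contrast
pwc = 55 + 40 * (X[:, 2] - X[:, 1]) + rng.normal(0, 3, n)

X_tr, X_te, y_tr, y_te = X[:36], X[36:], pwc[:36], pwc[36:]
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)

r = np.corrcoef(y_te, pred)[0, 1]
rmse = np.sqrt(np.mean((pred - y_te) ** 2))
pbias = 100 * np.sum(pred - y_te) / np.sum(y_te)   # percent bias
print(f"r={r:.2f}  RMSE={rmse:.2f}%  pBias={pbias:.1f}%")
```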
Messier, S P; Callahan, L F; Golightly, Y M; Keefe, F J
2015-05-01
The objective was to develop a set of "best practices" for use as a primer for those interested in entering the clinical trials field for lifestyle diet and/or exercise interventions in osteoarthritis (OA), and as a set of recommendations for experienced clinical trials investigators. A subcommittee of the non-pharmacologic therapies committee of the OARSI Clinical Trials Working Group was selected by the Steering Committee to develop a set of recommended principles for non-pharmacologic diet/exercise OA randomized clinical trials. Topics were identified for inclusion by co-authors and reviewed by the subcommittee. Resources included authors' expert opinions, traditional search methods including MEDLINE (via PubMed), and previously published guidelines. Suggested steps and considerations for study methods (e.g., recruitment and enrollment of participants, study design, intervention and assessment methods) were recommended. The recommendations set forth in this paper provide a guide from which a research group can design a lifestyle diet/exercise randomized clinical trial in patients with OA. Copyright © 2015 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.
Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets.
Shuryak, Igor
2017-01-01
The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected "signal"; (5) using several machine learning methods to test the "signal's" sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation.
Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets
Shuryak, Igor
2017-01-01
The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected “signal”; (5) using several machine learning methods to test the “signal’s” sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation. PMID:28068401
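Technique (1) above, adding synthetic noise variables as an importance benchmark, can be sketched as follows; the simulated radioecological variables and sample size are invented and do not reproduce either of the published data sets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 40                                    # deliberately small, as in radioecology
radiation = rng.lognormal(0, 1, n)
ph = rng.normal(6.5, 0.5, n)
chromium = 0.5 * radiation + rng.normal(0, 0.5, n)   # correlated co-contaminant
# Simulated outcome: abundance declines with radiation, plus noise
abundance = np.exp(2 - 0.6 * np.log(radiation) + rng.normal(0, 0.3, n))

X = np.column_stack([radiation, ph, chromium,
                     rng.standard_normal((n, 3))])   # 3 pure-noise benchmarks
names = ["radiation", "pH", "chromium", "noise1", "noise2", "noise3"]

rf = RandomForestRegressor(n_estimators=1000, random_state=0).fit(X, abundance)
for name, imp in sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:10s} {imp:.3f}")
# Real predictors are trusted only if they clearly outrank the noise variables.
```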
Selecting materialized views using random algorithm
NASA Astrophysics Data System (ADS)
Zhou, Lijuan; Hao, Zhongxiao; Liu, Chi
2007-04-01
The data warehouse is a repository of information collected from multiple, possibly heterogeneous, autonomous distributed databases. The information stored at the data warehouse is in the form of views, referred to as materialized views. The selection of the materialized views is one of the most important decisions in designing a data warehouse. Materialized views are stored in the data warehouse for the purpose of efficiently implementing on-line analytical processing queries. The first issue for the user to consider is query response time. In this paper, we therefore develop algorithms to select a set of views to materialize in a data warehouse in order to minimize the total view maintenance cost under the constraint of a given query response time. We call this the query-cost view-selection problem. First, the cost graph and cost model of the query-cost view-selection problem are presented. Second, methods for selecting materialized views using random algorithms are presented. The genetic algorithm is applied to the materialized view selection problem. However, as the genetic process evolves, producing legal solutions becomes more and more difficult, so many solutions are eliminated and the time needed to produce them grows. Therefore, an improved algorithm is presented in this paper, which combines a simulated annealing algorithm with the genetic algorithm to solve the query-cost view-selection problem. Finally, simulation experiments were conducted to test the effectiveness and efficiency of our algorithms. The experiments show that the given methods can provide near-optimal solutions in limited time and work better in practical cases. Randomized algorithms will become invaluable tools for data warehouse evolution.
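A minimal sketch of the simulated-annealing component of this query-cost view-selection problem is given below; the per-view query and maintenance costs, the response-time budget, the infeasibility penalty and the cooling schedule are all toy choices, and the genetic-algorithm hybridisation is not shown.

```python
import math
import random

random.seed(0)
n_views = 12
query_cost = [random.uniform(5, 50) for _ in range(n_views)]   # toy cost saved per query if materialized
maint_cost = [random.uniform(1, 20) for _ in range(n_views)]   # toy upkeep cost if materialized
MAX_RESPONSE = 180.0                                            # query response-time budget

def response_time(sel):
    # Queries on non-materialized views pay their full cost
    return sum(c for c, s in zip(query_cost, sel) if not s)

def objective(sel):
    # Total maintenance cost, with a penalty for violating the response-time constraint
    m = sum(c for c, s in zip(maint_cost, sel) if s)
    return m + (1e6 if response_time(sel) > MAX_RESPONSE else 0.0)

sel = [random.random() < 0.5 for _ in range(n_views)]
best, best_cost, T = sel[:], objective(sel), 50.0
while T > 0.01:
    cand = sel[:]
    cand[random.randrange(n_views)] ^= True          # flip one view in/out
    delta = objective(cand) - objective(sel)
    if delta < 0 or random.random() < math.exp(-delta / T):
        sel = cand
        if objective(sel) < best_cost:
            best, best_cost = sel[:], objective(sel)
    T *= 0.995                                        # geometric cooling
print(best, round(best_cost, 1))
```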
Probabilistic pathway construction.
Yousofshahi, Mona; Lee, Kyongbum; Hassoun, Soha
2011-07-01
Expression of novel synthesis pathways in host organisms amenable to genetic manipulations has emerged as an attractive metabolic engineering strategy to overproduce natural products, biofuels, biopolymers and other commercially useful metabolites. We present a pathway construction algorithm for identifying viable synthesis pathways compatible with balanced cell growth. Rather than exhaustive exploration, we investigate probabilistic selection of reactions to construct the pathways. Three different selection schemes are investigated for the selection of reactions: high metabolite connectivity, low connectivity and uniformly random. For all case studies, which involved a diverse set of target metabolites, the uniformly random selection scheme resulted in the highest average maximum yield. When compared to an exhaustive search enumerating all possible reaction routes, our probabilistic algorithm returned nearly identical distributions of yields, while requiring far less computing time (minutes vs. years). The pathways identified by our algorithm have previously been confirmed in the literature as viable, high-yield synthesis routes. Prospectively, our algorithm could facilitate the design of novel, non-native synthesis routes by efficiently exploring the diversity of biochemical transformations in nature. Copyright © 2011 Elsevier Inc. All rights reserved.
Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies
Theis, Fabian J.
2017-01-01
Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
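As a rough illustration of the inverse-probability resampling idea (not the exact stochastic inverse-probability oversampling or parametric bagging procedures implemented in sambia), the sketch below replicates each sampled unit a Poisson number of times with mean equal to the inverse of its selection probability before training a random forest; the two-phase design and selection probabilities are simulated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
n = 1000
X = rng.normal(size=(n, 4))
y = (X[:, 0] + rng.normal(scale=1.0, size=n) > 2.0).astype(int)   # relatively rare outcome

# Simulated two-phase design: cases are oversampled with known probabilities
p_select = np.where(y == 1, 0.9, 0.2)
in_sample = rng.random(n) < p_select
Xs, ys, ps = X[in_sample], y[in_sample], p_select[in_sample]

# Inverse-probability resampling: each sampled unit is replicated a random
# number of times with expectation 1/p, roughly restoring the source mixture.
reps = rng.poisson(1.0 / ps)
X_corr = np.repeat(Xs, reps, axis=0)
y_corr = np.repeat(ys, reps)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_corr, y_corr)
print("mean predicted risk on new (nonstratified) data:",
      rf.predict_proba(rng.normal(size=(5000, 4)))[:, 1].mean().round(3))
```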
Evolving optimised decision rules for intrusion detection using particle swarm paradigm
NASA Astrophysics Data System (ADS)
Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.
2012-12-01
The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic patterns and classifies them as normal or anomalous. The objective of this article is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree models, is introduced to detect anomalous network patterns. In particular, the proposed swarm optimisation-based approach selects the instances that compose the training set, and the optimised decision tree operates over this training set, producing classification rules with improved coverage, classification capability and generalisation ability. Experiments with the Knowledge Discovery and Data mining (KDD) data set, which contains information on traffic patterns during normal and intrusive behaviour, show that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithms.
Full-custom design of split-set data weighted averaging with output register for jitter suppression
NASA Astrophysics Data System (ADS)
Jubay, M. C.; Gerasta, O. J.
2015-06-01
A full-custom design of an element selection algorithm, named Split-set Data Weighted Averaging (SDWA), is implemented in a 90 nm CMOS technology Synopsys library. SDWA is applied to seven unit elements (3-bit) using a thermometer-coded input. Split-set DWA is an improved DWA algorithm which caters to the requirement for randomization along with long-term equal element usage. Randomization and equal element usage improve the spectral response of the unit elements through a higher spurious-free dynamic range (SFDR) without significantly degrading the signal-to-noise ratio (SNR). Being full-custom, the design is brought to the transistor level and a custom chip layout is also provided, with a total area of 0.3 mm2, a power consumption of 0.566 mW, and simulation at a 50 MHz clock frequency. In this implementation, SDWA is successfully derived and improved by introducing a register at the output that suppresses the jitter introduced at the final stage due to switching loops and successive delays.
Eliminating Survivor Bias in Two-stage Instrumental Variable Estimators.
Vansteelandt, Stijn; Walter, Stefan; Tchetgen Tchetgen, Eric
2018-07-01
Mendelian randomization studies commonly focus on elderly populations. This makes the instrumental variables analysis of such studies sensitive to survivor bias, a type of selection bias. A particular concern is that the instrumental variable conditions, even when valid for the source population, may be violated for the selective population of individuals who survive the onset of the study. This is potentially very damaging because Mendelian randomization studies are known to be sensitive to bias due to even minor violations of the instrumental variable conditions. Interestingly, the instrumental variable conditions continue to hold within certain risk sets of individuals who are still alive at a given age when the instrument and unmeasured confounders exert additive effects on the exposure, and moreover, the exposure and unmeasured confounders exert additive effects on the hazard of death. In this article, we will exploit this property to derive a two-stage instrumental variable estimator for the effect of exposure on mortality, which is insulated against the above described selection bias under these additivity assumptions.
Seismic random noise attenuation method based on empirical mode decomposition of Hausdorff dimension
NASA Astrophysics Data System (ADS)
Yan, Z.; Luan, X.
2017-12-01
Introduction: Empirical mode decomposition (EMD) is a noise suppression algorithm based on wave-field separation, exploiting the scale differences between the effective signal and noise. However, because the complexity of the real seismic wave field produces serious mode aliasing, denoising with this method alone is neither ideal nor effective. Building on the multi-scale decomposition of the EMD algorithm and combining it with Hausdorff dimension constraints, we propose a new method for seismic random noise attenuation. First, we apply the EMD algorithm to adaptively decompose seismic data and obtain a series of intrinsic mode functions (IMFs) at different scales. Based on the difference in Hausdorff dimension between effective signals and random noise, we identify the IMF components mixed with random noise. We then use a threshold correlation filtering process to separate the valid signal and the random noise effectively. Compared with the traditional EMD method, the results show that the new method has a better suppression effect on seismic random noise. The implementation process: The EMD algorithm is used to decompose seismic signals into IMF sets and to analyse their spectra. Since most of the random noise is high-frequency noise, the IMF sets can be divided into three categories: the first comprises the effective-wave components at larger scales; the second comprises the noise components at smaller scales; the third comprises the IMF components containing a mixture of signal and random noise. The third kind of IMF component is then processed by the Hausdorff dimension algorithm, with an appropriate time-window size, initial step and increment chosen to calculate the instantaneous Hausdorff dimension of each component. The dimension of the random noise lies between 1.0 and 1.05, while that of the effective wave lies between 1.05 and 2.0. On this basis, and according to the dimension difference between random noise and effective signal, we extract the sample points whose fractal dimension is less than or equal to 1.05 from each IMF component in order to separate the residual noise. Using the IMF components after dimension filtering together with the effective-wave IMF components retained in the first selection for reconstruction, we obtain the de-noised result.
Heo, Moonseong; Meissner, Paul; Litwin, Alain H; Arnsten, Julia H; McKee, M Diane; Karasz, Alison; McKinley, Paula; Rehm, Colin D; Chambers, Earle C; Yeh, Ming-Chin; Wylie-Rosett, Judith
2017-01-01
Comparative effectiveness research trials in real-world settings may require participants to choose between preferred intervention options. A randomized clinical trial with parallel experimental and control arms is straightforward and regarded as a gold standard design, but by design it forces and anticipates the participants to comply with a randomly assigned intervention regardless of their preference. Therefore, the randomized clinical trial may impose impractical limitations when planning comparative effectiveness research trials. To accommodate participants' preferences if they are expressed, and to maintain randomization, we propose an alternative design that allows participants' preference after randomization, which we call a "preference option randomized design (PORD)". In contrast to other preference designs, which ask whether or not participants consent to the assigned intervention after randomization, the crucial feature of the preference option randomized design is its unique informed consent process before randomization. Specifically, the consent process informs participants that they can opt out and switch to the other intervention only if, after randomization, they actively express the desire to do so. Participants who do not independently express an explicit alternate preference, or who assent to the randomly assigned intervention, are considered not to have an alternate preference. In sum, the preference option randomized design intends to maximize retention, minimize the possibility of forced assignment for any participant, and maintain randomization by allowing participants with no or equal preference to represent random assignments. This design scheme makes it possible to define five effects that are interconnected through common design parameters (comparative, preference, selection, intent-to-treat, and overall/as-treated) and that collectively guide decision making between interventions. Statistical power functions for testing all these effects are derived, and simulations verified the validity of the power functions under normal and binomial distributions.
NASA Astrophysics Data System (ADS)
He, Song-Bing; Ben Hu; Kuang, Zheng-Kun; Wang, Dong; Kong, De-Xin
2016-11-01
Adenosine receptors (ARs) are potential therapeutic targets for Parkinson’s disease, diabetes, pain, stroke and cancers. Prediction of subtype selectivity is therefore important from both therapeutic and mechanistic perspectives. In this paper, we introduced a shape similarity profile as molecular descriptor, namely three-dimensional biologically relevant spectrum (BRS-3D), for AR selectivity prediction. Pairwise regression and discrimination models were built with the support vector machine methods. The average determination coefficient (r2) of the regression models was 0.664 (for test sets). The 2B-3 (A2B vs A3) model performed best with q2 = 0.769 for training sets (10-fold cross-validation), and r2 = 0.766, RMSE = 0.828 for test sets. The models’ robustness and stability were validated with 100 times resampling and 500 times Y-randomization. We compared the performance of BRS-3D with 3D descriptors calculated by MOE. BRS-3D performed as good as, or better than, MOE 3D descriptors. The performances of the discrimination models were also encouraging, with average accuracy (ACC) 0.912 and MCC 0.792 (test set). The 2A-3 (A2A vs A3) selectivity discrimination model (ACC = 0.882 and MCC = 0.715 for test set) outperformed an earlier reported one (ACC = 0.784). These results demonstrated that, through multiple conformation encoding, BRS-3D can be used as an effective molecular descriptor for AR subtype selectivity prediction.
High capacity low delay packet broadcasting multiaccess schemes for satellite repeater systems
NASA Astrophysics Data System (ADS)
Bose, S. K.
1980-12-01
Demand assigned packet radio schemes using satellite repeaters can achieve high capacities but often exhibit relatively large delays under low traffic conditions when compared to random access. Several schemes which improve delay performance at low traffic but which have high capacity are presented and analyzed. These schemes allow random access attempts by users who are waiting for channel assignments. The performance of these schemes is considered in the context of a multiple point communication system carrying fixed length messages between geographically distributed (ground) user terminals which are linked via a satellite repeater. Channel assignments are done following a BCC queueing discipline by a (ground) central controller on the basis of requests correctly received over a collision type access channel. In TBACR Scheme A, some of the forward message channels are set aside for random access transmissions; the rest are used in a demand assigned mode. Schemes B and C operate all their forward message channels in a demand assignment mode but, by means of appropriate algorithms for trailer channel selection, allow random access attempts on unassigned channels. The latter scheme also introduces framing and slotting of the time axis to implement a more efficient algorithm for trailer channel selection than the former.
Kraschnewski, Jennifer L; Keyserling, Thomas C; Bangdiwala, Shrikant I; Gizlice, Ziya; Garcia, Beverly A; Johnston, Larry F; Gustafson, Alison; Petrovic, Lindsay; Glasgow, Russell E; Samuel-Hodge, Carmen D
2010-01-01
Studies of type 2 translation, the adaption of evidence-based interventions to real-world settings, should include representative study sites and staff to improve external validity. Sites for such studies are, however, often selected by convenience sampling, which limits generalizability. We used an optimized probability sampling protocol to select an unbiased, representative sample of study sites to prepare for a randomized trial of a weight loss intervention. We invited North Carolina health departments within 200 miles of the research center to participate (N = 81). Of the 43 health departments that were eligible, 30 were interested in participating. To select a representative and feasible sample of 6 health departments that met inclusion criteria, we generated all combinations of 6 from the 30 health departments that were eligible and interested. From the subset of combinations that met inclusion criteria, we selected 1 at random. Of 593,775 possible combinations of 6 counties, 15,177 (3%) met inclusion criteria. Sites in the selected subset were similar to all eligible sites in terms of health department characteristics and county demographics. Optimized probability sampling improved generalizability by ensuring an unbiased and representative sample of study sites.
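The protocol lends itself to a very small sketch: enumerate all combinations of 6 of the 30 eligible and interested sites, keep the combinations meeting the inclusion criteria, and draw one at random. The site attributes and the criteria below are hypothetical placeholders, not the study's actual eligibility rules.

```python
import random
from itertools import combinations

random.seed(2010)
# Hypothetical stand-ins for the 30 eligible, interested health departments
departments = [{"id": i,
                "rural": random.random() < 0.6,
                "population": random.randint(20_000, 400_000)}
               for i in range(30)]

def meets_criteria(combo):
    # Placeholder inclusion criteria: at least 2 rural sites and a
    # combined county population under 1.5 million
    return (sum(d["rural"] for d in combo) >= 2
            and sum(d["population"] for d in combo) < 1_500_000)

eligible = [c for c in combinations(departments, 6) if meets_criteria(c)]
chosen = random.choice(eligible)                   # unbiased draw from the eligible subset
print(len(eligible), "eligible combinations;",
      "selected sites:", sorted(d["id"] for d in chosen))
```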
Accounting for selection bias in association studies with complex survey data.
Wirth, Kathleen E; Tchetgen Tchetgen, Eric J
2014-05-01
Obtaining representative information from hidden and hard-to-reach populations is fundamental to describe the epidemiology of many sexually transmitted diseases, including HIV. Unfortunately, simple random sampling is impractical in these settings, as no registry of names exists from which to sample the population at random. However, complex sampling designs can be used, as members of these populations tend to congregate at known locations, which can be enumerated and sampled at random. For example, female sex workers may be found at brothels and street corners, whereas injection drug users often come together at shooting galleries. Despite the logistical appeal, complex sampling schemes lead to unequal probabilities of selection, and failure to account for this differential selection can result in biased estimates of population averages and relative risks. However, standard techniques to account for selection can lead to substantial losses in efficiency. Consequently, researchers implement a variety of strategies in an effort to balance validity and efficiency. Some researchers fully or partially account for the survey design, whereas others do nothing and treat the sample as a realization of the population of interest. We use directed acyclic graphs to show how certain survey sampling designs, combined with subject-matter considerations unique to individual exposure-outcome associations, can induce selection bias. Finally, we present a novel yet simple maximum likelihood approach for analyzing complex survey data; this approach optimizes statistical efficiency at no cost to validity. We use simulated data to illustrate this method and compare it with other analytic techniques.
Randomized Prediction Games for Adversarial Machine Learning.
Rota Bulo, Samuel; Biggio, Battista; Pillai, Ignazio; Pelillo, Marcello; Roli, Fabio
In spam and malware detection, attackers exploit randomization to obfuscate malicious data and increase their chances of evading detection at test time, e.g., malware code is typically obfuscated using random strings or byte sequences to hide known exploits. Interestingly, randomization has also been proposed to improve security of learning algorithms against evasion attacks, as it results in hiding information about the classifier to the attacker. Recent work has proposed game-theoretical formulations to learn secure classifiers, by simulating different evasion attacks and modifying the classification function accordingly. However, both the classification function and the simulated data manipulations have been modeled in a deterministic manner, without accounting for any form of randomization. In this paper, we overcome this limitation by proposing a randomized prediction game, namely, a noncooperative game-theoretic formulation in which the classifier and the attacker make randomized strategy selections according to some probability distribution defined over the respective strategy set. We show that our approach allows one to improve the tradeoff between attack detection and false alarms with respect to the state-of-the-art secure classifiers, even against attacks that are different from those hypothesized during design, on application examples including handwritten digit recognition, spam, and malware detection.
Composing Music with Complex Networks
NASA Astrophysics Data System (ADS)
Liu, Xiaofan; Tse, Chi K.; Small, Michael
In this paper we study the network structure in music and attempt to compose music artificially. Networks are constructed with nodes and edges corresponding to musical notes and their co-occurrences. We analyze sample compositions from Bach, Mozart, Chopin, as well as other types of music including Chinese pop music. We observe remarkably similar properties in all networks constructed from the selected compositions. Power-law exponents of degree distributions, mean degrees, clustering coefficients, mean geodesic distances, etc. are reported. With the network constructed, music can be created by using a biased random walk algorithm, which begins with a randomly chosen note and selects the subsequent notes according to a simple set of rules that compares the weights of the edges, weights of the nodes, and/or the degrees of nodes. The newly created music from complex networks will be played in the presentation.
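A minimal sketch of the kind of weight-biased random walk described above, assuming a toy note co-occurrence network rather than one built from the analyzed compositions:

```python
# Minimal sketch (not the authors' implementation): nodes are notes, edge weights are
# co-occurrence counts, and the next note is drawn with probability proportional to
# the outgoing edge weights. The network below is a hypothetical toy example.
import random

network = {
    "C": {"E": 4, "G": 2, "A": 1},
    "E": {"G": 3, "C": 2},
    "G": {"C": 5, "A": 1},
    "A": {"C": 2, "E": 1},
}

def compose(network, length=16, seed=0):
    random.seed(seed)
    note = random.choice(list(network))          # start from a randomly chosen note
    melody = [note]
    for _ in range(length - 1):
        nxt, weights = zip(*network[note].items())
        note = random.choices(nxt, weights=weights, k=1)[0]  # weight-biased step
        melody.append(note)
    return melody

print(" ".join(compose(network)))
```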
Blocking for Sequential Political Experiments
Moore, Sally A.
2013-01-01
In typical political experiments, researchers randomize a set of households, precincts, or individuals to treatments all at once, and characteristics of all units are known at the time of randomization. However, in many other experiments, subjects “trickle in” to be randomized to treatment conditions, usually via complete randomization. To take advantage of the rich background data that researchers often have (but underutilize) in these experiments, we develop methods that use continuous covariates to assign treatments sequentially. We build on biased coin and minimization procedures for discrete covariates and demonstrate that our methods outperform complete randomization, producing better covariate balance in simulated data. We then describe how we selected and deployed a sequential blocking method in a clinical trial and demonstrate the advantages of our having done so. Further, we show how that method would have performed in two larger sequential political trials. Finally, we compare causal effect estimates from differences in means, augmented inverse propensity weighted estimators, and randomization test inversion. PMID:24143061
Burkey, Matthew D; Hosein, Megan; Morton, Isabella; Purgato, Marianna; Adi, Ahmad; Kurzrok, Mark; Kohrt, Brandon A; Tol, Wietse A
2018-04-06
Most of the evidence for psychosocial interventions for disruptive behaviour problems comes from Western, high-income countries. The transferability of this evidence to culturally diverse, low-resource settings with few mental health specialists is unknown. We conducted a systematic review with random-effects meta-analysis of randomized controlled trials examining the effects of psychosocial interventions on reducing behaviour problems among children (under 18) living in low- and middle-income countries (LMIC). Twenty-six randomized controlled trials (representing 28 psychosocial interventions), evaluating 4,441 subjects, met selection criteria. Fifteen (54%) prevention interventions targeted general or at-risk populations, whereas 13 (46%) treatment interventions targeted children selected for elevated behaviour problems. Most interventions were delivered in group settings (96%) and half (50%) were administered by non-specialist providers. The overall effect (standardized mean difference, SMD) of prevention studies was -0.25 (95% confidence interval (CI): -0.41 to -0.09; I²: 78%) and of treatment studies was -0.56 (95% CI: -0.51 to -0.24; I²: 74%). Subgroup analyses demonstrated effectiveness for child-focused (SMD: -0.35; 95% CI: -0.57 to -0.14) and behavioural parenting interventions (SMD: -0.43; 95% CI: -0.66 to -0.20), and that interventions were effective across age ranges. Our meta-analysis supports the use of psychosocial interventions as a feasible and effective way to reduce disruptive behaviour problems among children in LMIC. Our study provides strong evidence for child-focused and behavioural parenting interventions, interventions across age ranges and interventions delivered in groups. Additional research is needed on training and supervision of non-specialists and on implementation of effective interventions in LMIC settings. © 2018 Association for Child and Adolescent Mental Health.
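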
Wyman, Peter A; Henry, David; Knoblauch, Shannon; Brown, C Hendricks
2015-10-01
The dynamic wait-listed design (DWLD) and regression point displacement design (RPDD) address several challenges in evaluating group-based interventions when there is a limited number of groups. Both DWLD and RPDD utilize efficiencies that increase statistical power and can enhance balance between community needs and research priorities. The DWLD blocks on more time units than traditional wait-listed designs, thereby increasing the proportion of a study period during which intervention and control conditions can be compared, and can also improve logistics of implementing intervention across multiple sites and strengthen fidelity. We discuss DWLDs in the larger context of roll-out randomized designs and compare them with their cousin, the Stepped Wedge design. The RPDD uses archival data on the population of settings from which intervention unit(s) are selected to create expected posttest scores for units receiving intervention, to which actual posttest scores are compared. High pretest-posttest correlations give the RPDD statistical power for assessing intervention impact even when one or a few settings receive intervention. RPDD works best when archival data are available over a number of years prior to and following intervention. If intervention units were not randomly selected, propensity scores can be used to control for non-random selection factors. Examples are provided of the DWLD and RPDD used to evaluate, respectively, suicide prevention training (QPR) in 32 schools and a violence prevention program (CeaseFire) in two Chicago police districts over a 10-year period. How DWLD and RPDD address common threats to internal and external validity, as well as their limitations, is discussed.
featsel: A framework for benchmarking of feature selection algorithms and cost functions
NASA Astrophysics Data System (ADS)
Reis, Marcelo S.; Estrela, Gustavo; Ferreira, Carlos Eduardo; Barrera, Junior
In this paper, we introduce featsel, a framework for benchmarking of feature selection algorithms and cost functions. This framework allows the user to deal with the search space as a Boolean lattice and has its core coded in C++ for computational efficiency purposes. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. Besides, this framework already comes with dozens of algorithms and cost functions for benchmarking experiments. We also provide illustrative examples, in which featsel outperforms the popular Weka workbench in feature selection procedures on data sets from the UCI Machine Learning Repository.
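For intuition only, the sketch below shows what treating the search space as a Boolean lattice means in practice: enumerate feature subsets and score each one with a pluggable cost function. This is not featsel's C++/Perl interface; the cross-validated classifier used as the cost function is an arbitrary stand-in.

```python
# Illustrative sketch only (not featsel's API): enumerate the Boolean lattice of
# feature subsets and score each with a pluggable cost function, here the
# cross-validated error of a simple classifier on a small UCI dataset.
from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

def cost(feature_subset):
    # Cost = 1 - mean CV accuracy; the empty set gets the worst possible cost.
    if not feature_subset:
        return 1.0
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, list(feature_subset)], y, cv=5).mean()
    return 1.0 - acc

subsets = [frozenset(c) for r in range(n_features + 1)
           for c in combinations(range(n_features), r)]
best = min(subsets, key=cost)
print("best subset:", sorted(best), "cost:", round(cost(best), 3))
```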
Shultz, Mary
2006-01-01
Introduction: Given the common use of acronyms and initialisms in the health sciences, searchers may be entering these abbreviated terms rather than full phrases when searching online systems. The purpose of this study is to evaluate how various MEDLINE Medical Subject Headings (MeSH) interfaces map acronyms and initialisms to the MeSH vocabulary. Methods: The interfaces used in this study were: the PubMed MeSH database, the PubMed Automatic Term Mapping feature, the NLM Gateway Term Finder, and Ovid MEDLINE. Acronyms and initialisms were randomly selected from 2 print sources. The test data set included 415 randomly selected acronyms and initialisms whose related meanings were found to be MeSH terms. Each acronym and initialism was entered into each MEDLINE MeSH interface to determine if it mapped to the corresponding MeSH term. Separately, 46 commonly used acronyms and initialisms were tested. Results: While performance differed widely, the success rates were low across all interfaces for the randomly selected terms. The common acronyms and initialisms tested at higher success rates across the interfaces, but the differences between the interfaces remained. Conclusion: Online interfaces do not always map medical acronyms and initialisms to their corresponding MeSH phrases. This may lead to inaccurate results and missed information if acronyms and initialisms are used in search strategies. PMID:17082832
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from random sampling and the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
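The article demonstrates these ideas with R functions; as a rough analogue only (scikit-learn in Python, not the R packages the author uses), the sketch below contrasts a single tree with a random forest whose diversity comes from bootstrap sampling and a restricted set of candidate split variables:

```python
# Rough Python analogue of the single-tree-versus-forest comparison described above;
# the dataset and hyperparameters are illustrative choices, not the author's example.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0).fit(X_tr, y_tr)   # bagging + feature subsetting

print("single tree accuracy  :", round(tree.score(X_te, y_te), 3))
print("random forest accuracy:", round(forest.score(X_te, y_te), 3))
```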
Evolutionary constraints and the neutral theory. [mutation-caused nucleotide substitutions in DNA
NASA Technical Reports Server (NTRS)
Jukes, T. H.; Kimura, M.
1984-01-01
The neutral theory of molecular evolution postulates that nucleotide substitutions inherently take place in DNA as a result of point mutations followed by random genetic drift. In the absence of selective constraints, the substitution rate reaches the maximum value set by the mutation rate. The rate in globin pseudogenes is about 5 × 10⁻⁹ substitutions per site per year in mammals. Rates slower than this indicate the presence of constraints imposed by negative (natural) selection, which rejects and discards deleterious mutations.
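The reasoning behind that maximum rate (a standard result of the neutral theory, not spelled out in the abstract): in a diploid population of size N with neutral mutation rate μ per site per generation, 2Nμ new mutations arise each generation and each fixes with probability 1/(2N), so the substitution rate equals the mutation rate:

```latex
k \;=\; 2N\mu \times \frac{1}{2N} \;=\; \mu .
```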
Mahar, Benazeer; Kumar, Ramesh; Rizvi, Narjis; Bahalkani, Habib Akhtar; Haq, Mahboobul; Soomro, Jamila
2012-01-01
Information, education and communication (IEC) provided by the health care provider to the pregnant woman during the antenatal visit is crucial for a healthier outcome of pregnancy. This study analysed the quality and quantity of antenatal visits at a private and a public hospital of Bahawalpur, Pakistan. Exit interviews were conducted with 216 pregnant women using a validated, reliable and pre-tested adapted questionnaire. The first participant was selected by simple random sampling; for the rest of the sample, systematic random sampling was used, selecting every 7th woman for interview. Ethical considerations were observed. Average communication time between a pregnant woman and her healthcare provider was 3 minutes in the public and 8 minutes in the private hospital. IEC mainly focused on diet and nutrition (86% in the private and 53% in the public hospital), while advice on family planning after delivery was discussed with 13% versus 7% of women in the public and private settings. None of the respondents in either facility received advice or counselling on breastfeeding and neonatal care. Birth preparedness components were discussed with women in both the public and private hospitals. In both settings, antenatal clients did not receive information, education and communication according to World Health Organization guidelines. The quality and quantity of IEC during antenatal care were found to be very poor in both public and private sector hospitals of urban Pakistan.
ERIC Educational Resources Information Center
Nutting, Paul A.; And Others
Six Indian Health Service (IHS) units, chosen in a non-random manner, were evaluated via a quality assessment methodology currently under development by the IHS Office of Research and Development. A set of seven health problems (tracers) was selected to represent major health problems, and clinical algorithms (process maps) were constructed for…
ERIC Educational Resources Information Center
Hopkins, Layne Victor
Certain transitivity relationships formulated from reversible operations were examined. Thirty randomly selected fifth grade students received instructional episodes, developed for each identified behavioral objective and its inverse (on unspecified content), presented via the IBM 1500 Instructional Computer System. It was found that students who…
Motivation and Quality of Work Life among Secondary School EFL Teachers
ERIC Educational Resources Information Center
Baleghizadeh, Sasan; Gordani, Yahya
2012-01-01
This study set out to investigate the relationship between quality of work life and teacher motivation among 160 secondary school English as a foreign language (EFL) teachers in Tehran, Iran. In addition, 30 of the participants were randomly selected to take part in follow-up interviews which asked why they felt the way they reported. The results…
Changes in Badminton Game Play across Developmental Skill Levels among High School Students
ERIC Educational Resources Information Center
Wang, Jianyu; Liu, Wenhao
2012-01-01
The study examined changes in badminton game play across developmental skill levels among high school students in a physical education setting. Videotapes of badminton game play of 80 students (40 boys and 40 girls) in the four developmental skill levels (each skill level had 10 boys and 10 girls) were randomly selected from a database associated…
Structural Relations among Spirituality, Religiosity, and Thriving in Adolescence
ERIC Educational Resources Information Center
Dowling, Elizabeth M.; Gestsdottir, Steinunn; Anderson, Pamela M.; Von Eye, Alexander; Almerigi, Jason; Lerner, Richard M.
2004-01-01
Using the randomly selected subsample of 1,000 youth (472 boys, M age = 12.2 years, SD = 1.5; 528 girls, M age = 12.1 years, SD = 1.4) drawn by Dowling, Gestsdottir, Anderson, von Eye, and Lerner (in press) from a Search Institute (1984) archival data set, Young Adolescents and Their Parents (YAP), this research employed structural equation…
The Accuracy of Estimated Total Test Statistics. Final Report.
ERIC Educational Resources Information Center
Kleinke, David J.
In a post-mortem study of item sampling, 1,050 examinees were divided into ten groups 50 times. Each time, their papers were scored on four different sets of item samples from a 150-item test of academic aptitude. These samples were selected using (a) unstratified random sampling and stratification on (b) content, (c) difficulty, and (d) both.…
ERIC Educational Resources Information Center
Bird, Yelena; Moraros, John; Olsen, Larry K.; Coronado, Gloria D.; Thompson, Beti
2006-01-01
Objective: To assess the smoking behaviors, beliefs about the risks of smoking, and exposure to ETS among adolescents in Juarez, Mexico. Methods: A cross-sectional study was conducted with sixth-grade students (N=506), aged 11-13 years old, attending 6 randomly selected schools. Schools were classified by school setting and SES. Results:…
Health Promotion Intervention for Hygienic Disposal of Children's Faeces in a Rural Area of Nigeria
ERIC Educational Resources Information Center
Jinadu, M. K.; Adegbenro, C. A.; Esmai, A. O.; Ojo, A. A.; Oyeleye, B. A.
2007-01-01
Objective: Community-based health promotion intervention for improving unhygienic disposal of children's faeces was conducted in a rural area of Nigeria. Setting: The study was conducted in Ife South Local Government area of Osun State, Nigeria. Design: The study was conducted in 10 randomly selected rural villages: five control and five active.…
Baseline Survey of Sun Protection Policies and Practices in Primary School Settings in New Zealand
ERIC Educational Resources Information Center
Reeder, A. I.; Jopson, J. A.; Gray, A.
2009-01-01
The SunSmart Schools Accreditation Programme (SSAP) was launched as a national programme in October 2005 to help reduce the risk of excessive child exposure to ultraviolet radiation. As part of the need for evaluation, this paper reports the findings of a national survey of a randomly selected sample of approximately 12% of New Zealand primary…
ERIC Educational Resources Information Center
Tang, Eunice Lai-Yiu; Lee, John Chi-Kin; Chun, Cecilia Ka-Wai
2012-01-01
This study sets out to investigate how pre-service ESL teachers shape their beliefs in the process of experimenting with new teaching methods introduced in the teacher education programme. A 4-year longitudinal study was conducted with four randomly selected ESL pre-service teachers. Their theoretical orientations of ESL instruction were tracked…
Looking, Smiling, Laughing, and Moving in Restaurants: Sex and Age Differences.
ERIC Educational Resources Information Center
Adams, Robert M.; Kirkevold, Barbara
Body movements and facial expressions of males and females in a restaurant setting were examined, with the goal of documenting differences in frequency as a function of age and sex. The subjects (N=197 males and N=131 females) were seated in three Seattle fast food restaurants and were selected on a semi-random basis and then observed for three…
PSO algorithm enhanced with Lozi Chaotic Map - Tuning experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pluhacek, Michal; Senkerik, Roman; Zelinka, Ivan
2015-03-10
This paper investigates the effect of tuning the control parameters of the Lozi chaotic map employed as a chaotic pseudo-random number generator for the particle swarm optimization (PSO) algorithm. Three different benchmark functions are selected from the IEEE CEC 2013 competition benchmark set. The Lozi map is extensively tuned and the performance of PSO is evaluated.
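A minimal sketch of the underlying idea, assuming illustrative parameter values rather than the authors' tuned settings: iterate the Lozi map and rescale its output into [0, 1] so it can stand in for the uniform random numbers a PSO velocity update would normally draw.

```python
# Minimal sketch (not the authors' implementation): the Lozi map
#   x_{n+1} = 1 - a*|x_n| + b*y_n,  y_{n+1} = x_n
# used as a chaotic pseudo-random number stream. Parameters a, b and the rescaling
# bounds are illustrative assumptions.
def lozi_stream(a=1.7, b=0.5, x=0.1, y=0.1, burn_in=100):
    """Yield chaotic pseudo-random numbers in [0, 1] from the Lozi map."""
    lo, hi = -2.0, 2.0                      # crude bounds used only for rescaling
    n = 0
    while True:
        x, y = 1.0 - a * abs(x) + b * y, x  # one Lozi map iteration
        n += 1
        if n > burn_in:
            yield min(max((x - lo) / (hi - lo), 0.0), 1.0)

gen = lozi_stream()
print([round(next(gen), 3) for _ in range(5)])
```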
ERIC Educational Resources Information Center
Moeller, Michelle; Day, Scott L.; Rivera, Beverly D.
2004-01-01
This study explores a group of inmates' perceptions of their correctional education and environment based on Fetterman's 1994 idea of empowerment evaluation. A group of 16 male inmates were randomly selected from GED and ABE courses in a high minimum correctional facility in Illinois. A self-administered questionnaire included 5 topics:…
2013-01-01
Background: Meat quality involves many traits, such as marbling, tenderness, juiciness, and backfat thickness, all of which require attention from livestock producers. Backfat thickness improvement by means of traditional selection techniques in Canchim beef cattle has been challenging due to its low heritability, and it is measured late in an animal’s life. Therefore, the implementation of new methodologies for identification of single nucleotide polymorphisms (SNPs) linked to backfat thickness is an important strategy for genetic improvement of carcass and meat quality. Results: The set of SNPs identified by the random forest approach explained as much as 50% of the deregressed estimated breeding value (dEBV) variance associated with backfat thickness, and a small set of 5 SNPs was able to explain 34% of the dEBV variance for backfat thickness. Several quantitative trait loci (QTL) for fat-related traits were found in the surrounding areas of the SNPs, as well as many genes with roles in lipid metabolism. Conclusions: These results provided a better understanding of the backfat deposition and regulation pathways, and can be considered a starting point for future implementation of a genomic selection program for backfat thickness in Canchim beef cattle. PMID:23738659
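A hedged sketch of the general workflow (simulated genotypes and an invented phenotype, not the Canchim data): rank SNPs by random forest importance and check how much variance a small selected panel explains on held-out animals.

```python
# Illustrative sketch only: random-forest importance ranking of SNPs for a
# continuous, breeding-value-like phenotype. Genotypes, effect sizes and sample
# sizes below are simulated assumptions, not the study's data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n_animals, n_snps = 400, 500
X = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)  # 0/1/2 genotype codes
beta = np.zeros(n_snps)
beta[:5] = [0.8, -0.6, 0.5, 0.4, -0.3]                          # 5 simulated causal SNPs
y = X @ beta + rng.normal(0, 1.0, n_animals)                    # dEBV-like phenotype

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

top = np.argsort(rf.feature_importances_)[::-1][:5]             # small candidate SNP panel
panel = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr[:, top], y_tr)
print("top-ranked SNPs:", top.tolist())
print("variance explained by 5-SNP panel (R^2):",
      round(r2_score(y_te, panel.predict(X_te[:, top])), 2))
```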
Chen, Jiaqing; Zhang, Pei; Lv, Mengying; Guo, Huimin; Huang, Yin; Zhang, Zunjian; Xu, Fengguo
2017-05-16
Data reduction techniques in gas chromatography-mass spectrometry-based untargeted metabolomics have made the subsequent data analysis workflow more lucid. However, the normalization process still perplexes researchers, and its effects are often ignored. In order to reveal the influence of the normalization method, five representative normalization methods (mass spectrometry total useful signal, median, probabilistic quotient normalization, remove unwanted variation-random, and systematic ratio normalization) were compared in three real data sets of different types. First, data reduction techniques were used to refine the original data. Then, quality control samples and relative log abundance plots were utilized to evaluate the unwanted variations and the efficiency of the normalization process. Furthermore, the potential biomarkers screened out by the Mann-Whitney U test, receiver operating characteristic curve analysis, random forest, and the feature selection algorithm Boruta in the differently normalized data sets were compared. The results indicated that the determination of the normalization method was difficult because the commonly accepted rules were easy to fulfill but different normalization methods had unforeseen influences on both the kind and number of potential biomarkers. Lastly, an integrated strategy for normalization method selection was recommended.
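For concreteness, a minimal sketch of one of the compared methods, probabilistic quotient normalization (PQN), applied to a hypothetical feature table; the study's full pipeline (data reduction, QC samples, relative log abundance plots) is not reproduced here.

```python
# Minimal PQN sketch on an invented feature table (rows = samples, columns = features);
# the reference spectrum here is simply the median sample, an assumption of this sketch.
import numpy as np

def pqn(X, eps=1e-12):
    """Probabilistic quotient normalization of an (n_samples, n_features) matrix."""
    reference = np.median(X, axis=0)                         # reference spectrum
    quotients = (X + eps) / (reference + eps)                # feature-wise quotients
    dilution = np.median(quotients, axis=1, keepdims=True)   # per-sample dilution factor
    return X / dilution

rng = np.random.default_rng(0)
X = np.abs(rng.lognormal(mean=2.0, sigma=0.5, size=(6, 100)))
X[3:] *= 3.0                                                 # simulate a dilution/batch shift
print("per-sample medians before:", np.round(np.median(X, axis=1), 1))
print("per-sample medians after :", np.round(np.median(pqn(X), axis=1), 1))
```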
Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch
2017-06-06
An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development and predictions for test set samples using an applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
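As a generic illustration of the outlier-handling step (synthetic one-descriptor data, not the metal oxide solar-cell libraries, and plain regression rather than the authors' full QSAR workflow), scikit-learn's RANSACRegressor fits on consensus inliers and flags outlying samples:

```python
# RANSAC on synthetic data: fit a linear model on the consensus inlier set and
# report which samples were flagged as outliers. Data and thresholds are invented.
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))           # one "descriptor"
y = 3.0 * X.ravel() + rng.normal(0, 0.3, 200)   # clean activity values
y[:20] += rng.normal(8, 2, 20)                  # 20 gross outliers

ransac = RANSACRegressor(random_state=0).fit(X, y)
print("estimated slope:", round(ransac.estimator_.coef_[0], 2))
print("samples flagged as outliers:", int((~ransac.inlier_mask_).sum()))
```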
Primary care quality: community health center and health maintenance organization.
Shi, Leiyu; Starfield, Barbara; Xu, Jiahong; Politzer, Robert; Regan, Jerrilyn
2003-08-01
This study compares the primary health care quality of community health centers (CHCs) and health maintenance organizations (HMOs) in South Carolina to elucidate the quality of CHC performance relative to mainstream settings such as the HMO. Mail surveys were used to obtain data from 350 randomly selected HMO users. Surveys with follow-up interviews were conducted to obtain data from 540 randomly selected CHC users. A validated adult primary care assessment tool was used in both surveys. Multivariate analyses were performed to assess the association of health care setting (HMO versus CHC) with primary care quality while controlling for sociodemographic and health care characteristics. After controlling for sociodemographic and health care use measures, CHC patients demonstrated higher scores in several primary care domains (ongoing care, coordination of service, comprehensiveness, and community orientation) as well as total primary care performance. Users of CHC are more likely than HMO users to rate their primary health care provider as good, except in the area of ease of first contact. The positive rating of the CHC is particularly impressive after taking into account that many CHC users have characteristics associated with poorer ratings of care.
Zhang, Haixia; Zhao, Junkang; Gu, Caijiao; Cui, Yan; Rong, Huiying; Meng, Fanlong; Wang, Tong
2015-05-01
A study of medical expenditure and its influencing factors among students enrolled in Urban Resident Basic Medical Insurance (URBMI) in Taiyuan indicated that non-response bias and selection bias coexist in the dependent variable of the survey data. Unlike previous studies that focused on only one missing mechanism, this study suggests a two-stage method that deals with both missing mechanisms simultaneously by combining multiple imputation with a sample selection model. A total of 1 190 questionnaires were returned by students (or their parents) selected in child care settings, schools and universities in Taiyuan by stratified cluster random sampling in 2012. In the returned questionnaires, 2.52% of the dependent variable was not missing at random (NMAR) and 7.14% was missing at random (MAR). First, multiple imputation was conducted for the MAR values using the complete data; then a sample selection model was used to correct for NMAR in the multiple imputation, and a multi-factor analysis model was established. Based on 1 000 resamplings, the best scheme for filling the random missing values at this missing proportion was the predictive mean matching (PMM) method. With this optimal scheme, the two-stage analysis was conducted. Finally, it was found that the factors influencing annual medical expenditure among students enrolled in URBMI in Taiyuan included population group, annual household gross income, affordability of medical insurance expenditure, chronic disease, seeking medical care in hospital, seeking medical care in a community health center or private clinic, hospitalization, hospitalization canceled for certain reasons, self-medication and the acceptable proportion of self-paid medical expenditure. The two-stage method combining multiple imputation with a sample selection model can effectively deal with non-response bias and selection bias in the dependent variable of survey data.
How does epistasis influence the response to selection?
Barton, N H
2017-01-01
Much of quantitative genetics is based on the ‘infinitesimal model', under which selection has a negligible effect on the genetic variance. This is typically justified by assuming a very large number of loci with additive effects. However, it applies even when genes interact, provided that the number of loci is large enough that selection on each of them is weak relative to random drift. In the long term, directional selection will change allele frequencies, but even then, the effects of epistasis on the ultimate change in trait mean due to selection may be modest. Stabilising selection can maintain many traits close to their optima, even when the underlying alleles are weakly selected. However, the number of traits that can be optimised is apparently limited to ~4Ne by the ‘drift load', and this is hard to reconcile with the apparent complexity of many organisms. Just as for the mutation load, this limit can be evaded by a particular form of negative epistasis. A more robust limit is set by the variance in reproductive success. This suggests that selection accumulates information most efficiently in the infinitesimal regime, when selection on individual alleles is weak, and comparable with random drift. A review of evidence on selection strength suggests that although most variance in fitness may be because of alleles with large Nes, substantial amounts of adaptation may be because of alleles in the infinitesimal regime, in which epistasis has modest effects. PMID:27901509
Extension of mixture-of-experts networks for binary classification of hierarchical data.
Ng, Shu-Kay; McLachlan, Geoffrey J
2007-09-01
For many applied problems in the context of medically relevant artificial intelligence, the data collected exhibit a hierarchical or clustered structure. Ignoring the interdependence between hierarchical data can result in misleading classification. In this paper, we extend the mechanism for mixture-of-experts (ME) networks for binary classification of hierarchical data. Another extension is to quantify cluster-specific information on data hierarchy by random effects via the generalized linear mixed-effects model (GLMM). The extension of ME networks is implemented by allowing for correlation in the hierarchical data in both the gating and expert networks via the GLMM. The proposed model is illustrated using a real thyroid disease data set. In our study, we consider 7652 thyroid diagnosis records from 1984 to early 1987 with complete information on 20 attribute values. We obtain 10 independent random splits of the data into a training set and a test set in the proportions 85% and 15%. The test sets are used to assess the generalization performance of the proposed model, based on the percentage of misclassifications. For comparison, the results obtained from the ME network with independence assumption are also included. With the thyroid disease data, the misclassification rate on test sets for the extended ME network is 8.9%, compared to 13.9% for the ME network. In addition, based on model selection methods described in Section 2, a network with two experts is selected. These two expert networks can be considered as modeling two groups of patients with high and low incidence rates. Significant variation among the predicted cluster-specific random effects is detected in the patient group with low incidence rate. It is shown that the extended ME network outperforms the ME network for binary classification of hierarchical data. With the thyroid disease data, useful information on the relative log odds of patients with diagnosed conditions at different periods can be evaluated. This information can be taken into consideration for the assessment of treatment planning of the disease. The proposed extended ME network thus facilitates a more general approach to incorporate data hierarchy mechanism in network modeling.
Involving patients in setting priorities for healthcare improvement: a cluster randomized trial
2014-01-01
Background: Patients are increasingly seen as active partners in healthcare. While patient involvement in individual clinical decisions has been extensively studied, no trial has assessed how patients can effectively be involved in collective healthcare decisions affecting the population. The goal of this study was to test the impact of involving patients in setting healthcare improvement priorities for chronic care at the community level. Methods: Design: Cluster randomized controlled trial. Local communities were randomized to intervention (priority setting with patient involvement) and control sites (no patient involvement). Setting: Communities in a Canadian region were required to set priorities for improving chronic disease management in primary care, from a list of 37 validated quality indicators. Intervention: Patients were consulted in writing, before participating in face-to-face deliberation with professionals. Control: Professionals established priorities among themselves, without patient involvement. Participants: A total of 172 individuals from six communities participated in the study, including 83 chronic disease patients, and 89 health professionals. Outcomes: The primary outcome was the level of agreement between patients’ and professionals’ priorities. Secondary outcomes included professionals’ intention to use the selected quality indicators, and the costs of patient involvement. Results: Priorities established with patients were more aligned with core generic components of the Medical Home and Chronic Care Model, including: access to primary care, self-care support, patient participation in clinical decisions, and partnership with community organizations (p < 0.01). Priorities established by professionals alone placed more emphasis on the technical quality of single disease management. The involvement intervention fostered mutual influence between patients and professionals, which resulted in a 41% increase in agreement on common priorities (95% CI: +12% to +58%, p < 0.01). Professionals’ intention to use the selected quality indicators was similar in intervention and control sites. Patient involvement increased the costs of the prioritization process by 17%, and required 10% more time to reach consensus on common priorities. Conclusions: Patient involvement can change priorities driving healthcare improvement at the population level. Future research should test the generalizability of these findings to other contexts, and assess its impact on patient care. Trial registration: The Netherlands National Trial Register #NTR2496. PMID:24555508
Personal Bankruptcy After Traumatic Brain or Spinal Cord Injury: The Role of Medical Debt
Relyea-Chew, Annemarie; Hollingworth, William; Chan, Leighton; Comstock, Bryan A.; Overstreet, Karen A.; Jarvik, Jeffrey G.
2012-01-01
Objective: To estimate the prevalence of medical debt among traumatic brain injury (TBI) and spinal cord injury (SCI) patients who discharged their debts through bankruptcy. Design: A cross-sectional comparison of bankruptcy filings of injured versus randomly selected bankruptcy petitioners. Setting: Patients hospitalized with SCI or TBI (1996–2002) and personal bankruptcy petitioners (2001–2004) in western Washington State. Participants: Subjects (N=186) who filed for bankruptcy, comprising 93 patients with previous SCI or TBI and 93 randomly selected bankruptcy petitioners. Interventions: Not applicable. Main Outcome Measures: Medical and nonmedical debt, assets, income, expenses, and employment recorded in the bankruptcy petition. Results: Five percent of randomly selected petitioners and 26% of petitioners with TBI or SCI had substantial medical debt (debt that accounted for more than 20% of all unsecured debts). SCI and TBI petitioners had fewer assets and were more likely to be receiving government income assistance at the time of bankruptcy than controls. SCI and TBI patients with a higher blood alcohol content at injury were more likely to have substantial medical debts (odds ratio=2.70; 95% confidence interval, 1.04–7.00). Conclusions: Medical debt plays an important role in some bankruptcies after TBI or SCI. We discuss policy options for reducing financial distress after serious injury. PMID:19254605
Koh, Bongyeun; Hong, Sunggi; Kim, Soon-Sim; Hyun, Jin-Sook; Baek, Milye; Moon, Jundong; Kwon, Hayran; Kim, Gyoungyong; Min, Seonggi; Kang, Gu-Hyun
2016-01-01
The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty index (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination.
NASA Astrophysics Data System (ADS)
Li, Xiayue; Curtis, Farren S.; Rose, Timothy; Schober, Christoph; Vazquez-Mayagoitia, Alvaro; Reuter, Karsten; Oberhofer, Harald; Marom, Noa
2018-06-01
We present Genarris, a Python package that performs configuration space screening for molecular crystals of rigid molecules by random sampling with physical constraints. For fast energy evaluations, Genarris employs a Harris approximation, whereby the total density of a molecular crystal is constructed via superposition of single molecule densities. Dispersion-inclusive density functional theory is then used for the Harris density without performing a self-consistency cycle. Genarris uses machine learning for clustering, based on a relative coordinate descriptor developed specifically for molecular crystals, which is shown to be robust in identifying packing motif similarity. In addition to random structure generation, Genarris offers three workflows based on different sequences of successive clustering and selection steps: the "Rigorous" workflow is an exhaustive exploration of the potential energy landscape, the "Energy" workflow produces a set of low energy structures, and the "Diverse" workflow produces a maximally diverse set of structures. The latter is recommended for generating initial populations for genetic algorithms. Here, the implementation of Genarris is reported and its application is demonstrated for three test cases.
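The "Diverse" workflow is described as clustering followed by selection; the sketch below shows that generic idea (k-means on a descriptor matrix, keeping the structure nearest each cluster centre) and does not use Genarris's actual API, descriptors, or energy evaluation.

```python
# Generic "maximally diverse subset via clustering" sketch; the descriptor matrix is
# random placeholder data, not Genarris's relative coordinate descriptor.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 16))        # hypothetical per-structure descriptors

def diverse_subset(X, n_select=20, seed=0):
    km = KMeans(n_clusters=n_select, n_init=10, random_state=seed).fit(X)
    # Index of the structure nearest to each cluster centre.
    return pairwise_distances_argmin(km.cluster_centers_, X)

picked = diverse_subset(descriptors)
print("selected structure indices:", sorted(picked.tolist()))
```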
[Evidence-based obstetric conduct. Severe preeclampsia: aggressive or expectant management?].
Briceño Pérez, Carlos; Briceño Sanabria, Liliana
2007-02-01
In severe preeclampsia, delivery is undertaken immediately without considering fetal condition. For some decades there has been agreement on hospitalization, but no agreement between expectant and aggressive management. Here these two management approaches are reviewed from an evidence-based medicine perspective. Fifteen non-randomized, non-controlled trials in the English literature and 4 in the Latin American literature report a 10-14 day prolongation of pregnancy without increased maternal morbidity under conservative management, but they were criticized for non-random patient selection and lack of controls. Two randomized controlled trials showed improvement in neonatal outcomes, with no change in maternal outcomes, under expectant management. One systematic review of these two trials concluded that there are insufficient data for any reliable recommendation and proposed that longer trials are necessary. In the United States, the working group of the National High Blood Pressure Education Program considers expectant management possible only in a select group of women between 23 and 32 weeks of gestation. The American College of Obstetricians and Gynecologists recommends this management in a tertiary care setting or in consultation with an obstetrician-gynecologist with training in high-risk pregnancies. The present proposal for expectant management of severe preeclampsia remote from term is summarized.
Barnado, April; Casey, Carolyn; Carroll, Robert J; Wheless, Lee; Denny, Joshua C; Crofford, Leslie J
2017-05-01
To study systemic lupus erythematosus (SLE) in the electronic health record (EHR), we must accurately identify patients with SLE. Our objective was to develop and validate novel EHR algorithms that use International Classification of Diseases, Ninth Revision (ICD-9), Clinical Modification codes, laboratory testing, and medications to identify SLE patients. We used Vanderbilt's Synthetic Derivative, a de-identified version of the EHR, with 2.5 million subjects. We selected all individuals with at least 1 SLE ICD-9 code (710.0), yielding 5,959 individuals. To create a training set, 200 subjects were randomly selected for chart review. A subject was defined as a case if diagnosed with SLE by a rheumatologist, nephrologist, or dermatologist. Positive predictive values (PPVs) and sensitivity were calculated for combinations of code counts of the SLE ICD-9 code, a positive antinuclear antibody (ANA), ever use of medications, and a keyword of "lupus" in the problem list. The algorithms with the highest PPV were each internally validated using a random set of 100 individuals from the remaining 5,759 subjects. The algorithm with the highest PPV at 95% in the training set and 91% in the validation set was 3 or more counts of the SLE ICD-9 code, ANA positive (≥1:40), and ever use of both disease-modifying antirheumatic drugs and steroids, while excluding individuals with systemic sclerosis and dermatomyositis ICD-9 codes. We developed and validated the first EHR algorithm that incorporates laboratory values and medications with the SLE ICD-9 code to identify patients with SLE accurately. © 2016, American College of Rheumatology.
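A hedged sketch of the best-performing rule as stated (at least 3 counts of the SLE ICD-9 code, ANA of 1:40 or higher, ever-use of both a DMARD and a steroid, with systemic sclerosis and dermatomyositis codes excluded); the DataFrame and its column names are hypothetical stand-ins, not the Synthetic Derivative schema.

```python
# Rule-based EHR phenotyping sketch with invented columns and patients; it mirrors the
# criteria described in the abstract but is not the authors' actual code or schema.
import pandas as pd

patients = pd.DataFrame({
    "patient_id":     [1, 2, 3],
    "sle_icd9_count": [4, 5, 1],       # counts of the SLE ICD-9 code (710.0)
    "ana_titer":      [160, 80, 40],   # reciprocal titer; 40 means 1:40
    "ever_dmard":     [True, True, False],
    "ever_steroid":   [True, True, True],
    "has_ssc_code":   [False, True, False],   # systemic sclerosis ICD-9 code present
    "has_dm_code":    [False, False, False],  # dermatomyositis ICD-9 code present
})

algorithm = (
    (patients["sle_icd9_count"] >= 3)
    & (patients["ana_titer"] >= 40)
    & patients["ever_dmard"] & patients["ever_steroid"]
    & ~patients["has_ssc_code"] & ~patients["has_dm_code"]
)
print(patients.loc[algorithm, "patient_id"].tolist())   # patients classified as SLE
```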
Harpur, Brock A; Zayed, Amro
2013-07-01
The genomes of eusocial insects have a reduced complement of immune genes-an unusual finding considering that sociality provides ideal conditions for disease transmission. The following three hypotheses have been invoked to explain this finding: 1) social insects are attacked by fewer pathogens, 2) social insects have effective behavioral or 3) novel molecular mechanisms for combating pathogens. At the molecular level, these hypotheses predict that canonical innate immune pathways experience a relaxation of selective constraint. A recent study of several innate immune genes in ants and bees showed a pattern of accelerated amino acid evolution, which is consistent with either positive selection or a relaxation of constraint. We studied the population genetics of innate immune genes in the honey bee Apis mellifera by partially sequencing 13 genes from the bee's Toll pathway (∼10.5 kb) and 20 randomly chosen genes (∼16.5 kb) sequenced in 43 diploid workers. Relative to the random gene set, Toll pathway genes had significantly higher levels of amino acid replacement mutations segregating within A. mellifera and fixed between A. mellifera and A. cerana. However, levels of diversity and divergence at synonymous sites did not differ between the two gene sets. Although we detect strong signs of balancing selection on the pathogen recognition gene pgrp-sa, many of the genes in the Toll pathway show signatures of relaxed selective constraint. These results are consistent with the reduced complement of innate immune genes found in social insects and support the hypothesis that some aspect of eusociality renders canonical innate immunity superfluous.
Peer-Selected “Best Papers”—Are They Really That “Good”?
Wainer, Jacques; Eckmann, Michael; Rocha, Anderson
2015-01-01
Background: Peer evaluation is the cornerstone of science evaluation. In this paper, we analyze whether or not a form of peer evaluation, the pre-publication selection of the best papers in Computer Science (CS) conferences, is better than random, when considering future citations received by the papers. Methods: Considering 12 conferences (for several years), we collected the citation counts from Scopus for both the best papers and the non-best papers. For a different set of 17 conferences, we collected the data from Google Scholar. For each data set, we computed the proportion of cases whereby the best paper has more citations. We also compare this proportion for years before 2010 and after to evaluate if there is a propaganda effect. Finally, we count the proportion of best papers that are in the top 10% and 20% most cited for each conference instance. Results: The probability that a best paper will receive more citations than a non-best paper is 0.72 (95% CI = 0.66, 0.77) for the Scopus data, and 0.78 (95% CI = 0.74, 0.81) for the Scholar data. There are no significant changes in the probabilities for different years. Also, 51% of the best papers are among the top 10% most cited papers in each conference/year, and 64% of them are among the top 20% most cited. Discussion: There is strong evidence that the selection of best papers in Computer Science conferences is better than a random selection, and that a significant number of the best papers are among the top cited papers in the conference. PMID:25789480
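The headline statistic is a pairwise comparison; a toy version with invented citation counts (not the Scopus or Scholar data) looks like this:

```python
# Toy version of the statistic: the proportion of (best, non-best) pairs within one
# conference/year in which the best paper has strictly more citations (ties count as 0).
from itertools import product

best_papers     = [55, 12, 40]        # invented citation counts for best papers
non_best_papers = [3, 20, 8, 15, 30]  # invented counts for the remaining papers

pairs = list(product(best_papers, non_best_papers))
prob = sum(b > nb for b, nb in pairs) / len(pairs)
print(f"P(best paper cited more than a non-best paper) = {prob:.2f}")
```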
Peer-selected "best papers"-are they really that "good"?
Wainer, Jacques; Eckmann, Michael; Rocha, Anderson
2015-01-01
Peer evaluation is the cornerstone of science evaluation. In this paper, we analyze whether or not a form of peer evaluation, the pre-publication selection of the best papers in Computer Science (CS) conferences, is better than random, when considering future citations received by the papers. Considering 12 conferences (for several years), we collected the citation counts from Scopus for both the best papers and the non-best papers. For a different set of 17 conferences, we collected the data from Google Scholar. For each data set, we computed the proportion of cases whereby the best paper has more citations. We also compare this proportion for years before 2010 and after to evaluate if there is a propaganda effect. Finally, we count the proportion of best papers that are in the top 10% and 20% most cited for each conference instance. The probability that a best paper will receive more citations than a non best paper is 0.72 (95% CI = 0.66, 0.77) for the Scopus data, and 0.78 (95% CI = 0.74, 0.81) for the Scholar data. There are no significant changes in the probabilities for different years. Also, 51% of the best papers are among the top 10% most cited papers in each conference/year, and 64% of them are among the top 20% most cited. There is strong evidence that the selection of best papers in Computer Science conferences is better than a random selection, and that a significant number of the best papers are among the top cited papers in the conference.
Mendis, Shanthi; Abegunde, Dele; Oladapo, Olulola; Celletti, Francesca; Nordet, Porfirio
2004-01-01
To assess the capacity of health-care facilities in a low-resource setting to implement the absolute risk approach for assessment of cardiovascular risk in hypertensive patients and effective management of hypertension. A descriptive cross-sectional study in Egbeda and Oluyole local government areas of Oyo State in Nigeria in 56 randomly selected primary- (n = 42) and secondary-level (n = 2) health-care and private health-care (n = 12) facilities. One thousand consecutive, known hypertensives attending the selected facilities for follow-up, and health-care providers working in the above randomly selected facilities, were interviewed. About two-thirds of hypertensives utilized primary-care centers both for diagnosis and for follow-up. Laboratory and other investigations to exclude secondary hypertension or to assess target organ damage were not available in the majority of facilities, particularly in primary care. A considerable knowledge and awareness gap related to hypertension and its complications was found, both among patients and health-care providers. Blood pressure control rates were poor (28% with systolic blood pressure (SBP) < 140 mmHg and diastolic blood pressure (DBP) < 90 mmHg) and drug prescription patterns were neither evidence-based nor cost-effective. The majority of patients (73%) in this low socio-economic group (mean monthly income 73 US dollars) had to pay fully, out of their own pocket, for consultations and medications. If the absolute risk approach for assessment of risk and effective management of hypertension is to be implemented in low-resource settings, appropriate policy measures need to be taken to improve the competency of health-care providers, to provide basic laboratory facilities and to develop affordable financing mechanisms.
Artificial neural network study on organ-targeting peptides
NASA Astrophysics Data System (ADS)
Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun
2010-01-01
We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). VHSE descriptor produces statistically significant training models and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.
CrowdPhase: crowdsourcing the phase problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jorda, Julien; Sawaya, Michael R.; Yeates, Todd O., E-mail: yeates@mbi.ucla.edu
The idea of attacking the phase problem by crowdsourcing is introduced. Using an interactive, multi-player, web-based system, participants work simultaneously to select phase sets that correspond to better electron-density maps in order to solve low-resolution phasing problems. The human mind innately excels at some complex tasks that are difficult to solve using computers alone. For complex problems amenable to parallelization, strategies can be developed to exploit human intelligence in a collective form: such approaches are sometimes referred to as ‘crowdsourcing’. Here, a first attempt at a crowdsourced approach for low-resolution ab initio phasing in macromolecular crystallography is proposed. A collaborative online game named CrowdPhase was designed, which relies on a human-powered genetic algorithm, where players control the selection mechanism during the evolutionary process. The algorithm starts from a population of ‘individuals’, each with a random genetic makeup, in this case a map prepared from a random set of phases, and tries to cause the population to evolve towards individuals with better phases based on Darwinian survival of the fittest. Players apply their pattern-recognition capabilities to evaluate the electron-density maps generated from these sets of phases and to select the fittest individuals. A user-friendly interface, a training stage and a competitive scoring system foster a network of well trained players who can guide the genetic algorithm towards better solutions from generation to generation via gameplay. CrowdPhase was applied to two synthetic low-resolution phasing puzzles and it was shown that players could successfully obtain phase sets in the 30° phase error range and corresponding molecular envelopes showing agreement with the low-resolution models. The successful preliminary studies suggest that with further development the crowdsourcing approach could fill a gap in current crystallographic methods by making it possible to extract meaningful information in cases where limited resolution might otherwise prevent initial phasing.
Batch Effect Confounding Leads to Strong Bias in Performance Estimates Obtained by Cross-Validation
Delorenzi, Mauro
2014-01-01
Background: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences (“batch effects”) as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. Focus: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. Data: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., ‘control’) or group 2 (e.g., ‘treated’). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. Methods: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data. PMID:24967636
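A sketch of a nested cross-validation scheme of the kind described (synthetic data; a univariate F-test filter stands in for the Wilcoxon or lasso selection used in the study): feature selection and parameter tuning happen inside the inner loop, and the outer loop provides the performance estimate.

```python
# Nested cross-validation sketch with synthetic data; classifier, filter and grid are
# illustrative choices, not the study's exact setup.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=0)

pipe = Pipeline([("select", SelectKBest(f_classif)),   # univariate feature selection
                 ("clf", SVC())])
param_grid = {"select__k": [10, 50, 100], "clf__C": [0.1, 1, 10]}

inner = GridSearchCV(pipe, param_grid, cv=3)             # tuning + selection (inner loop)
outer_scores = cross_val_score(inner, X, y, cv=5)         # performance estimate (outer loop)
print("nested CV accuracy:", round(outer_scores.mean(), 3))
```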
47 CFR 1.1602 - Designation for random selection.
Code of Federal Regulations, 2010 CFR
2010-10-01
47 Telecommunication 1 (2010-10-01): § 1.1602 Designation for random selection. Federal Communications Commission, General Practice and Procedure, Random Selection Procedures for Mass Media Services, General Procedures…
47 CFR 1.1602 - Designation for random selection.
Code of Federal Regulations, 2011 CFR
2011-10-01
47 Telecommunication 1 (2011-10-01): § 1.1602 Designation for random selection. Federal Communications Commission, General Practice and Procedure, Random Selection Procedures for Mass Media Services, General Procedures…
A Preliminary Study on Motivation and Gender in CLIL and Non-CLIL Types of Instruction
ERIC Educational Resources Information Center
Fontecha, Almudena Fernández; Alonso, Andrés Canga
2014-01-01
This paper sets out to enquire about gender-based differences in motivation towards EFL in two different types of instruction, i.e. CLIL and EFL (non-CLIL). The study was carried out with Spanish students in the 4th year of Primary education, randomly selected from two mixed-gender schools in La Rioja (Spain). The results show that non-CLIL learners are…
ERIC Educational Resources Information Center
Taylor, Matthew J.; Merritt, Stephanie M.; Austin, Chammie C.
2013-01-01
A model of negative affect and alcohol use was replicated on a sample of African-American high school students. Participants (N = 5,086) were randomly selected from a previously collected data set and consisted of 2,253 males and 2,833 females residing in both rural and urban locations. Multivariate analysis of covariance and structural equation…
ERIC Educational Resources Information Center
Vermillion, James E.
The presence of artifactual bias in analysis of covariance (ANCOVA) and in matching nonequivalent control group (NECG) designs was empirically investigated. The data set was obtained from a study of the effects of a television program on children from three day care centers in Mexico in which the subjects had been randomly selected within centers.…
ERIC Educational Resources Information Center
Rose, Pamela
2010-01-01
This study examined the relationship of adult 4-H volunteers' perceived leadership styles of 4-H Youth Development Educators to the adult 4-H volunteer sense of empowerment. There were 498 Oregon adult 4-H volunteers randomly selected to participate. Participants rated the leadership style of their 4-H Youth Development Educator (YDE) using Bass…
Borysko, Petro; Moroz, Yurii S; Vasylchenko, Oleksandr V; Hurmach, Vasyl V; Starodubtseva, Anastasia; Stefanishena, Natalia; Nesteruk, Kateryna; Zozulya, Sergey; Kondratov, Ivan S; Grygorenko, Oleksandr O
2018-05-09
A combination approach of fragment screening and "SAR by catalog" was used for the discovery of bromodomain-containing protein 4 (BRD4) inhibitors. Initial screening of a 3695-fragment library against bromodomain 1 of BRD4 using a thermal shift assay (TSA), followed by initial hit validation, resulted in 73 fragment hits, which were used to construct a follow-up library selected from an available screening collection. Additionally, analogs of inactive fragments, as well as a set of randomly selected compounds, were also prepared (3 × 3200 compounds in total). Screening of the resulting sets using TSA, followed by re-testing at several concentrations, a counter-screen, and a TR-FRET assay, resulted in 18 confirmed hits. Compounds derived from the initial fragment set showed a better hit rate than the other two sets. Finally, building dose-response curves revealed three compounds with IC50 = 1.9-7.4 μM. For these compounds, binding sites and conformations in BRD4 (4UYD) have been determined by docking. Copyright © 2018 Elsevier Ltd. All rights reserved.
Zhao, Wenle; Weng, Yanqiu; Wu, Qi; Palesch, Yuko
2012-01-01
To evaluate the performance of randomization designs under various parameter settings and trial sample sizes, and to identify optimal designs with respect to both treatment imbalance and allocation randomness, we evaluate 260 design scenarios from 14 randomization designs under 15 sample sizes ranging from 10 to 300, using three measures for imbalance and three measures for randomness. The maximum absolute imbalance and the correct guess (CG) probability are selected to assess the trade-off performance of each randomization design. As measured by the maximum absolute imbalance and the CG probability, we found that the performances of the 14 randomization designs are located in a closed region with the upper boundary (worst case) given by Efron's biased coin design (BCD) and the lower boundary (best case) from Soares and Wu's big stick design (BSD). Designs close to the lower boundary provide a smaller imbalance and a higher randomness than designs close to the upper boundary. Our research suggests that optimization of randomization design is possible based on quantified evaluation of imbalance and randomness. Based on the maximum imbalance and CG probability, the BSD, Chen's biased coin design with imbalance tolerance method, and Chen's Ehrenfest urn design perform better than the popularly used permuted block design, EBCD, and Wei's urn design. Copyright © 2011 John Wiley & Sons, Ltd.
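As a rough illustration of the trade-off described above, the sketch below simulates Efron's biased coin design and the big stick design and reports the maximum absolute imbalance and the correct-guess probability; the parameters (allocation probability 2/3, imbalance tolerance 3, 100 subjects) are illustrative and not the study's settings:

```python
# Compare two restricted randomization designs on imbalance and predictability.
import random

def simulate(design, n=100, reps=2000, seed=1):
    random.seed(seed)
    max_imb, correct = 0, 0
    for _ in range(reps):
        d = 0                                   # count(A) - count(B)
        for _ in range(n):
            # convergence guessing strategy: guess the under-represented arm
            guess = "A" if d < 0 else "B" if d > 0 else random.choice("AB")
            arm = design(d)
            correct += (arm == guess)
            d += 1 if arm == "A" else -1
            max_imb = max(max_imb, abs(d))
    return max_imb, correct / (reps * n)        # (max absolute imbalance, CG probability)

def efron_bcd(d, p=2/3):                        # biased coin favouring the under-represented arm
    if d == 0:
        return random.choice("AB")
    return random.choices("AB", weights=[p, 1 - p] if d < 0 else [1 - p, p])[0]

def big_stick(d, b=3):                          # fair coin until |imbalance| reaches the tolerance b
    return ("A" if d < 0 else "B") if abs(d) >= b else random.choice("AB")

for name, design in [("Efron's BCD", efron_bcd), ("Big stick", big_stick)]:
    print(name, simulate(design))
```

Printing the two pairs of numbers side by side gives a simple quantified comparison of imbalance control versus allocation randomness for the two designs.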
A review of selection-based tests of abiotic surrogates for species representation.
Beier, Paul; Sutcliffe, Patricia; Hjort, Jan; Faith, Daniel P; Pressey, Robert L; Albuquerque, Fabio
2015-06-01
Because conservation planners typically lack data on where species occur, environmental surrogates--including geophysical settings and climate types--have been used to prioritize sites within a planning area. We reviewed 622 evaluations of the effectiveness of abiotic surrogates in representing species in 19 study areas. Sites selected using abiotic surrogates represented more species than an equal number of randomly selected sites in 43% of tests (55% for plants) and on average improved on random selection of sites by about 8% (21% for plants). Environmental diversity (ED) (42% median improvement on random selection) and biotically informed clusters showed promising results and merit additional testing. We suggest 4 ways to improve performance of abiotic surrogates. First, analysts should consider a broad spectrum of candidate variables to define surrogates, including rarely used variables related to geographic separation, distance from coast, hydrology, and within-site abiotic diversity. Second, abiotic surrogates should be defined at fine thematic resolution. Third, sites (the landscape units prioritized within a planning area) should be small enough to ensure that surrogates reflect species' environments and to produce prioritizations that match the spatial resolution of conservation decisions. Fourth, if species inventories are available for some planning units, planners should define surrogates based on the abiotic variables that most influence species turnover in the planning area. Although species inventories increase the cost of using abiotic surrogates, a modest number of inventories could provide the data needed to select variables and evaluate surrogates. Additional tests of nonclimate abiotic surrogates are needed to evaluate the utility of conserving nature's stage as a strategy for conservation planning in the face of climate change. © 2015 Society for Conservation Biology.
Predicting the accuracy of ligand overlay methods with Random Forest models.
Nandigam, Ravi K; Evans, David A; Erickson, Jon A; Kim, Sangtae; Sutherland, Jeffrey J
2008-12-01
The accuracy of binding mode prediction using standard molecular overlay methods (ROCS, FlexS, Phase, and FieldCompare) is studied. Previous work has shown that simple decision tree modeling can be used to improve accuracy by selection of the best overlay template. This concept is extended to the use of Random Forest (RF) modeling for template and algorithm selection. An extensive data set of 815 ligand-bound X-ray structures representing 5 gene families was used for generating ca. 70,000 overlays using four programs. RF models, trained using standard measures of ligand and protein similarity and Lipinski-related descriptors, are used for automatically selecting the reference ligand and overlay method maximizing the probability of reproducing the overlay deduced from X-ray structures (i.e., using RMSD ≤ 2 Å as the criterion for success). RF model scores are highly predictive of overlay accuracy, and their use in template and method selection produces correct overlays in 57% of cases for 349 overlay ligands not used for training RF models. The inclusion in the models of protein sequence similarity enables the use of templates bound to related protein structures, yielding useful results even for proteins having no available X-ray structures.
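A schematic sketch of this kind of workflow follows (synthetic data; the descriptor layout and the binary success label are assumptions for illustration, not the study's actual feature set): a Random Forest is trained to predict overlay success and then used to score candidate template/method combinations.

```python
# Score candidate (template, method) pairs with a Random Forest and pick the best one.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((5000, 12))            # hypothetical ligand/protein similarity + Lipinski descriptors
y = rng.integers(0, 2, 5000)          # 1 if the overlay reproduced the X-ray pose (RMSD <= 2 angstroms)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

candidates = rng.random((40, 12))               # descriptors for candidate (template, method) pairs
scores = rf.predict_proba(candidates)[:, 1]     # predicted probability of a correct overlay
best_pair = int(np.argmax(scores))              # select the highest-scoring combination
```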
Required number of records for ASCE/SEI 7 ground-motion scaling procedure
Reyes, Juan C.; Kalkan, Erol
2011-01-01
The procedures and criteria in 2006 IBC (International Council of Building Officials, 2006) and 2007 CBC (International Council of Building Officials, 2007) for the selection and scaling of ground motions for use in nonlinear response history analysis (RHA) of structures are based on ASCE/SEI 7 provisions (ASCE, 2005, 2010). According to ASCE/SEI 7, earthquake records should be selected from events of magnitudes, fault distance, and source mechanisms that comply with the maximum considered earthquake, and then scaled so that the average value of the 5-percent-damped response spectra for the set of scaled records is not less than the design response spectrum over the period range from 0.2Tn to 1.5Tn (where Tn is the fundamental vibration period of the structure, in seconds). If at least seven ground-motions are analyzed, the design values of engineering demand parameters (EDPs) are taken as the average of the EDPs determined from the analyses. If fewer than seven ground-motions are analyzed, the design values of EDPs are taken as the maximum values of the EDPs. ASCE/SEI 7 requires a minimum of three ground-motions. These limits on the number of records in the ASCE/SEI 7 procedure are based on engineering experience, rather than on a comprehensive evaluation. This study statistically examines the required number of records for the ASCE/SEI 7 procedure, such that the scaled records provide accurate, efficient, and consistent estimates of "true" structural responses. Based on elastic-perfectly-plastic and bilinear single-degree-of-freedom systems, the ASCE/SEI 7 scaling procedure is applied to 480 sets of ground-motions. The number of records in these sets varies from three to ten. The records in each set were selected either (i) randomly, (ii) considering their spectral shapes, or (iii) considering their spectral shapes and design spectral-acceleration value, A(Tn). As compared to benchmark (that is, "true") responses from unscaled records using a larger catalog of ground-motions, it is demonstrated that the ASCE/SEI 7 scaling procedure is overly conservative if fewer than seven ground-motions are employed. Utilizing seven or more randomly selected records provides a more accurate estimate of the EDPs accompanied by reduced record-to-record variability of the responses. Consistency in accuracy and efficiency is achieved only if records are selected on the basis of their spectral shape and A(Tn).
47 CFR 1.1603 - Conduct of random selection.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 47 Telecommunication 1 2010-10-01 2010-10-01 false Conduct of random selection. 1.1603 Section 1.1603 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1603 Conduct of random selection. The...
47 CFR 1.1603 - Conduct of random selection.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 1 2011-10-01 2011-10-01 false Conduct of random selection. 1.1603 Section 1.1603 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1603 Conduct of random selection. The...
Study of Dynamic Characteristics of Aeroelastic Systems Utilizing Randomdec Signatures
NASA Technical Reports Server (NTRS)
Chang, C. S.
1975-01-01
The feasibility of utilizing the random decrement method in conjunction with a signature analysis procedure to determine the dynamic characteristics of an aeroelastic system for the purpose of on-line prediction of potential onset of flutter was examined. Digital computer programs were developed to simulate sampled response signals of a two-mode aeroelastic system. Simulated response data were used to test the random decrement method. A special curve-fit approach was developed for analyzing the resulting signatures. A number of numerical 'experiments' were conducted on the combined processes. The method is capable of determining frequency and damping values accurately from randomdec signatures of carefully selected lengths.
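A compact sketch (not the original code; level-crossing triggering is assumed) of the random decrement idea: segments beginning wherever the response crosses a trigger level are averaged, so the random forcing tends to cancel and a free-decay-like signature remains.

```python
# Random decrement (randomdec) signature from a noisy response record.
import numpy as np

def randomdec(x, level, seg_len):
    """Average all segments of x that start at upward crossings of the trigger level."""
    starts = np.where((x[:-1] < level) & (x[1:] >= level))[0] + 1
    segments = [x[s:s + seg_len] for s in starts if s + seg_len <= len(x)]
    return np.mean(segments, axis=0)

# Synthetic two-mode response buried in noise, then its randomdec signature.
t = np.linspace(0, 20, 4000)
rng = np.random.default_rng(0)
x = (np.exp(-0.05 * t) * np.sin(2 * np.pi * 1.5 * t)
     + 0.5 * np.exp(-0.08 * t) * np.sin(2 * np.pi * 4.0 * t)
     + rng.normal(0, 0.3, t.size))
signature = randomdec(x, level=x.std(), seg_len=400)
```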
Robust local search for spacecraft operations using adaptive noise
NASA Technical Reports Server (NTRS)
Fukunaga, Alex S.; Rabideau, Gregg; Chien, Steve
2004-01-01
Randomization is a standard technique for improving the performance of local search algorithms for constraint satisfaction. However, it is well known that local search algorithms are sensitive to the noise values selected. We investigate the use of an adaptive noise mechanism in an iterative repair-based planner/scheduler for spacecraft operations. Preliminary results indicate that adaptive noise makes the use of randomized repair moves safe and robust; that is, using adaptive noise makes it possible to consistently achieve performance comparable with the best tuned noise setting without the need for manually tuning the noise parameter.
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms
NASA Astrophysics Data System (ADS)
Yadav, B.; Hatfield, K.
2017-12-01
We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment-scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including a digital elevation model, soil, land use, and climate data, and other publicly available ancillary geospatial data. 80% of the catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
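A minimal sketch of the best-performing setup follows (synthetic data standing in for the six catchment attributes named above; grid values are illustrative): an extremely randomized trees regressor tuned by grid search with k-fold cross-validation and scored on a held-out 20% test set.

```python
# Extremely randomized trees with grid-searched hyperparameters and a held-out test set.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.random((800, 6))    # elevation, slope, sand fraction, permeability, temperature, precipitation
y = rng.random(800)         # baseflow index in [0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
grid = GridSearchCV(ExtraTreesRegressor(random_state=0),
                    {"n_estimators": [200, 500], "max_features": [2, 4, 6]},
                    cv=5)
grid.fit(X_train, y_train)
print("held-out r-square:", grid.score(X_test, y_test))
```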
How many records should be used in ASCE/SEI-7 ground motion scaling procedure?
Reyes, Juan C.; Kalkan, Erol
2012-01-01
U.S. national building codes refer to the ASCE/SEI-7 provisions for selecting and scaling ground motions for use in nonlinear response history analysis of structures. Because the limiting values for the number of records in the ASCE/SEI-7 are based on engineering experience, this study examines the required number of records statistically, such that the scaled records provide accurate, efficient, and consistent estimates of “true” structural responses. Based on elastic–perfectly plastic and bilinear single-degree-of-freedom systems, the ASCE/SEI-7 scaling procedure is applied to 480 sets of ground motions; the number of records in these sets varies from three to ten. As compared to benchmark responses, it is demonstrated that the ASCE/SEI-7 scaling procedure is conservative if fewer than seven ground motions are employed. Utilizing seven or more randomly selected records provides a more accurate estimate of the responses. Selecting records based on their spectral shape and design spectral acceleration increases the accuracy and efficiency of the procedure.
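A simplified sketch of the scaling rule follows (a single group scale factor is used here for brevity, and all spectra are hypothetical arrays): the set-average 5%-damped response spectrum must not fall below the design spectrum anywhere in the 0.2Tn to 1.5Tn period range.

```python
# Smallest scale factor making the set-average spectrum >= design spectrum over 0.2Tn-1.5Tn.
import numpy as np

def group_scale_factor(record_spectra, design_spectrum, periods, Tn):
    in_range = (periods >= 0.2 * Tn) & (periods <= 1.5 * Tn)
    mean_spec = record_spectra.mean(axis=0)
    return np.max(design_spectrum[in_range] / mean_spec[in_range])

# Hypothetical spectra: 7 records on a common period grid and a flat design spectrum.
periods = np.linspace(0.05, 4.0, 200)
rng = np.random.default_rng(0)
records = rng.uniform(0.2, 1.0, size=(7, periods.size))
design = np.full(periods.size, 0.8)

alpha = group_scale_factor(records, design, periods, Tn=1.0)
scaled_average = alpha * records.mean(axis=0)     # now >= design over the period range
```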
Selection of habitats by Emperor Geese during brood rearing
Schmutz, J.A.
2001-01-01
Although forage quality strongly affects gosling growth and consequently juvenile survival, the relative use of different plant communities by brood rearing geese has been poorly studied. On the Yukon-Kuskokwim Delta, Alaska, population growth and juvenile recruitment of Emperor Geese (Chen canagica) are comparatively low, and it is unknown whether their selection of habitats during brood rearing differs from other goose species. Radio-telemetry was used to document the use of habitats by 56 families of Emperor Geese in a 70 km2 portion of the Yukon-Kuskokwim Delta during brood rearing in 1994-1996. When contrasted with available habitats (a set of six habitat classes), as estimated from 398 random sampling locations, Emperor Geese strongly selected Saline Ponds, Mudflat, and Ramenskii Meadow habitats and avoided Levee Meadow, Bog Meadow, and Sedge Meadow. These selected habitats were the most saline, comprised one-third of the study area, and 43% of all locations were in Ramenskii Meadow. I contrasted these Emperor Goose locations with habitats used by the composite goose community, as inferred from the presence of goose feces at random locations. The marked difference between groups in this comparison implied that Cackling Canada Geese (Branta canadensis minima) and Greater White-fronted Geese (Anser albifrons) collectively selected much different brood rearing habitats than Emperor Geese. Received 20 February 2001, accepted 18 April 2001.
Protein Loop Structure Prediction Using Conformational Space Annealing.
Heo, Seungryong; Lee, Juyong; Joo, Keehyoung; Shin, Hang-Cheol; Lee, Jooyoung
2017-05-22
We have developed a protein loop structure prediction method by combining a new energy function, which we call E_PLM (energy for protein loop modeling), with the conformational space annealing (CSA) global optimization algorithm. The energy function includes stereochemistry, dynamic fragment assembly, distance-scaled finite ideal gas reference (DFIRE), and generalized orientation- and distance-dependent terms. For the conformational search of loop structures, we used the CSA algorithm, which has been quite successful in dealing with various hard global optimization problems. We assessed the performance of E_PLM with two widely used loop-decoy sets, Jacobson and RAPPER, and compared the results against the DFIRE potential. The accuracy of model selection from a pool of loop decoys as well as de novo loop modeling starting from randomly generated structures was examined separately. For the selection of a nativelike structure from a decoy set, E_PLM was more accurate than DFIRE in the case of the Jacobson set and had similar accuracy in the case of the RAPPER set. In terms of sampling more nativelike loop structures, E_PLM outperformed E_DFIRE for both decoy sets. This new approach equipped with E_PLM and CSA can serve as the state-of-the-art de novo loop modeling method.
Beesley, Tom; Hanafi, Gunadi; Vadillo, Miguel A; Shanks, David R; Livesey, Evan J
2018-05-01
Two experiments examined biases in selective attention during contextual cuing of visual search. When participants were instructed to search for a target of a particular color, overt attention (as measured by the location of fixations) was biased strongly toward distractors presented in that same color. However, when participants searched for targets that could be presented in 1 of 2 possible colors, overt attention was not biased between the different distractors, regardless of whether these distractors predicted the location of the target (repeating) or did not (randomly arranged). These data suggest that selective attention in visual search is guided only by the demands of the target detection task (the attentional set) and not by the predictive validity of the distractor elements. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Instrumentation for investigation of corona discharges from insulated wires
NASA Technical Reports Server (NTRS)
Doreswamy, C. V.; Crowell, C. S.
1975-01-01
A coaxial cylinder configuration is used to investigate the effect of corona impulses on the deterioration of electrical insulation. The corona currents flowing through the resistance develop a voltage which is fed to the measuring set-up. The value of this resistance is made equal to the surge impedance of the coaxial cylinder set-up in order to prevent reflections. This instrumentation includes a phase shifter and a Schmitt trigger and is designed to sample, measure, and display corona impulses occurring during any predetermined sampling period of a randomly selectable half cycle of the 60 Hz high voltage wave.
Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration
Liu, Bo; Chen, Sanfeng; Li, Shuai; Liang, Yongsheng
2012-01-01
In this paper, a new framework, called Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making under uncertainty is proposed by incorporating non-adaptive, data-independent Random Projections and nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction framework in which high-dimensional data are projected onto a random lower-dimensional subspace via spherically random rotation and coordinate sampling. KLSPI introduces the kernel trick into the LSPI framework for Reinforcement Learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In this approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random bases. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI, while at lower computational costs. The theoretical foundation underlying this approach is a fast approximation of Singular Value Decomposition (SVD). Finally, simulation results are exhibited on benchmark MDP domains, which confirm gains both in computation time and in performance in large feature spaces. PMID:22736969
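A small illustration of the random-projection step (generic scikit-learn usage, not the CKRL implementation): high-dimensional features are mapped onto a much lower-dimensional random subspace before any learning takes place.

```python
# Data-independent, non-adaptive dimensionality reduction by Gaussian random projection.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
features = rng.random((500, 10000))            # 500 states described by 10,000 raw features

projector = GaussianRandomProjection(n_components=256, random_state=0)
low_dim = projector.fit_transform(features)    # projection onto a 256-dimensional random subspace
print(low_dim.shape)                           # (500, 256)
```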
Büchi, S; Straub, S; Schwager, U
2010-12-01
Although there is much talk about shared decision making and individualized goal setting, there is a lack of knowledge and know-how in their realization in daily clinical practice. In particular, easily applicable tools to support person-centred, individualized goal-setting processes are lacking. The semistructured, theory-driven use of PRISM (Pictorial Representation of Illness and Self Measure) is presented and discussed for three selected psychiatric inpatients with complex psychiatric problems. PRISM sustains a person-centred, individualized process of goal setting and treatment and reinforces the active participation of patients. The process of visualisation and synchronous documentation was evaluated positively by patients and clinicians. The visual goal setting requires 30 to 45 minutes. In patients with complex psychiatric illness, PRISM was used successfully to improve individual goal setting. Specific effects of PRISM visualisation are currently being evaluated in a randomized controlled trial.
Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation
NASA Astrophysics Data System (ADS)
Tangaro, Sabina; Amoroso, Nicola; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Paolo, Inglese; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto
2015-01-01
Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic resonance imaging (MRI) scans can show these variations and therefore can be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer's disease and other neurological and psychiatric diseases. However, its use requires accurate, robust, and reproducible delineation of hippocampal structures. Fully automatic methods usually take a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper, we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, namely (ii) sequential forward selection and (iii) sequential backward elimination; and (iv) an embedded method based on the Random Forest classifier, on a set of 10 T1-weighted brain MRIs, tested on an independent set of 25 subjects. The resulting segmentations were compared with manual reference labelling. By using only 23 features for each voxel (sequential backward elimination) we obtained state-of-the-art performance comparable to that of the standard tool FreeSurfer.
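A schematic sketch (synthetic voxel features; not the study's pipeline) contrasting two of the approaches above, a Kolmogorov-Smirnov filter and a Random-Forest-based embedded selection, each retaining 23 of 315 per-voxel features:

```python
# Filter vs. embedded feature selection on synthetic per-voxel features.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 315))     # hypothetical per-voxel features
y = rng.integers(0, 2, 1000)         # hippocampus / background label

# Filter method: Kolmogorov-Smirnov test per feature, keep the 23 smallest p-values.
ks_p = np.array([ks_2samp(X[y == 0, j], X[y == 1, j]).pvalue for j in range(X.shape[1])])
filter_top = np.argsort(ks_p)[:23]

# Embedded method: Random Forest importances, keep the 23 most important features.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
embedded_top = np.argsort(rf.feature_importances_)[::-1][:23]
```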
Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.
Nath, Abhigyan; Subbiah, Karthikeyan
2015-12-01
Lipocalins are short in sequence length and perform several important biological functions. These proteins have less than 20% sequence similarity among paralogs. Experimentally identifying them is an expensive and time-consuming process. Computational methods based on sequence similarity for assigning putative members to this family are also largely ineffective because of the low sequence similarity among its members. Consequently, machine learning methods become a viable alternative for their prediction, using sequence- and structure-derived features as the input. Ideally, any machine learning based prediction method must be trained with all possible variations in the input feature vector (all the sub-class input patterns) to achieve perfect learning. Near-perfect learning can be achieved by training the model with diverse types of input instances belonging to different regions of the entire input space. Furthermore, the prediction performance can be improved by balancing the training set, as imbalanced data sets tend to bias predictions towards the majority class and its sub-classes. This paper aims to achieve (i) high generalization ability, without any classification bias, through diversified and balanced training sets, and (ii) enhanced prediction accuracy by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we first used the unsupervised K-means clustering algorithm to create diversified clusters of input patterns and created a diversified and balanced training set by selecting an equal number of patterns from each of these clusters. Finally, a probability-based classifier fusion scheme was applied to a boosted random forest algorithm (which produced greater sensitivity) and a k-nearest neighbour algorithm (which produced greater specificity) to achieve better predictive performance than that of the individual base classifiers. The performance of models trained on the K-means-preprocessed training set is far better than that of models trained on randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set, and a sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second blind test set. These results establish that diversifying the training set improves the performance of predictive models through superior generalization ability, and that balancing the training set improves prediction accuracy. For smaller data sets, unsupervised K-means-based sampling can be a more effective technique to increase generalization than the usual random splitting method. Copyright © 2015 Elsevier Ltd. All rights reserved.
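A minimal sketch of the sampling idea (K-means within each class, then an equal draw from every cluster); the cluster count and draw size below are illustrative, not the paper's settings:

```python
# Build a diversified, balanced training subset by drawing equally from K-means clusters.
import numpy as np
from sklearn.cluster import KMeans

def diversified_balanced_subset(X, y, k=10, per_cluster=5, seed=0):
    """Pick an equal number of instances from each K-means cluster of each class."""
    rng = np.random.default_rng(seed)
    idx = []
    for label in np.unique(y):
        members = np.where(y == label)[0]
        clusters = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[members])
        for c in range(k):
            pool = members[clusters == c]
            take = min(per_cluster, len(pool))
            idx.extend(rng.choice(pool, take, replace=False))
    return np.array(idx)

X = np.random.default_rng(1).random((400, 30))      # synthetic feature vectors
y = np.random.default_rng(2).integers(0, 2, 400)    # synthetic class labels
train_idx = diversified_balanced_subset(X, y)
```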
Nidheesh, N; Abdul Nazeer, K A; Ameer, P M
2017-12-01
Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on their performance on ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
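An illustrative sketch of the key idea (the density and separation rules below are assumptions for demonstration, not the authors' exact procedure): pick dense, well-separated points as the initial centroids so repeated runs produce the same result.

```python
# Density-based selection of initial centroids for K-Means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def density_based_init(X, k, n_neighbors=10):
    """Pick dense, well-separated points as initial centroids (illustrative rule)."""
    dist, _ = NearestNeighbors(n_neighbors=n_neighbors).fit(X).kneighbors(X)
    density = 1.0 / dist.mean(axis=1)           # small mean neighbour distance = dense region
    order = np.argsort(density)[::-1]           # densest points first
    min_sep = np.median(dist)                   # crude separation threshold
    centroids = [X[order[0]]]
    for i in order[1:]:
        if len(centroids) == k:
            break
        if all(np.linalg.norm(X[i] - c) > min_sep for c in centroids):
            centroids.append(X[i])
    assert len(centroids) == k, "not enough well-separated points; lower min_sep"
    return np.array(centroids)

X = np.random.default_rng(0).random((300, 5))
init = density_based_init(X, k=4)
labels = KMeans(n_clusters=4, init=init, n_init=1).fit_predict(X)   # deterministic start
```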
Improving the performance of minimizers and winnowing schemes
Marçais, Guillaume; Pellow, David; Bork, Daniel; Orenstein, Yaron; Shamir, Ron; Kingsford, Carl
2017-01-01
Motivation: The minimizers scheme is a method for selecting k-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g. too many k-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, and the natural lexicographic ordering of k-mers used by minimizers was recognized as their origin. Many software tools using minimizers employ ad hoc variations of the lexicographic order to alleviate those issues. Results: We provide an in-depth analysis of the effect of k-mer ordering on the performance of the minimizers technique. By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worse behaviors. Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order. This analysis also settles negatively a conjecture (by Schleimer et al.) on the expected density of minimizers in a random sequence. Availability and Implementation: The software used for this analysis is available on GitHub: https://github.com/gmarcais/minimizers.git. Contact: gmarcais@cs.cmu.edu or carlk@cs.cmu.edu PMID:28881970
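A compact sketch of the minimizers scheme under a randomized k-mer ordering (a random hash rank stands in for a universal-hitting-set-based order; parameters k and w are illustrative):

```python
# Window minimizers of a DNA sequence using a randomized k-mer ordering.
import random

def minimizers(seq, k=7, w=10, seed=0):
    """Return positions of window minimizers under a random rank order (not lexicographic)."""
    rng = random.Random(seed)
    rank = {}                                 # consistent random rank per distinct k-mer
    def order(kmer):
        if kmer not in rank:
            rank[kmer] = rng.random()
        return rank[kmer]
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for i in range(len(kmers) - w + 1):
        window = kmers[i:i + w]
        j = min(range(w), key=lambda t: (order(window[t]), t))   # leftmost smallest rank
        selected.add(i + j)
    return sorted(selected)

print(minimizers("ACGTACGTTGCAGCTAGCTAGGCTA"))
```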
Zawbaa, Hossam M.; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander
2016-01-01
Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms (antlion optimization, a binary version of antlion optimization, grey wolf optimization, and social spider optimization) are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, the LASSO algorithm is also used for comparison. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find the minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression trees, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205
The CO₂ GAP Project--CO₂ GAP as a prognostic tool in emergency departments.
Shetty, Amith L; Lai, Kevin H; Byth, Karen
2010-12-01
To determine whether the CO₂ GAP [(a-ET) PCO₂] value differs consistently in patients presenting with shortness of breath to the ED requiring ventilatory support; to determine a cut-off value of the CO₂ GAP that is consistently associated with the measured outcome; and to compare its performance against other derived variables. This prospective observational study was conducted in the ED on a convenience sample of 412 of 759 patients who underwent concurrent arterial blood gas and ETCO₂ (end-tidal CO₂) measurement. They were randomized to a training set of 312 patients and a validation set of 100 patients. The primary outcome of interest was the need for ventilatory support, and secondary outcomes were admission to a high dependency unit or death during the stay in the ED. The randomly selected training set was used to select cut-points for the possible predictors, that is, the CO₂ GAP, CO₂ gradient, physiologic dead space and A-a gradient. The sensitivity, specificity and predictive values of these predictors were validated in the validation set of 100 patients. Analysis of the receiver operating characteristic curves revealed that the CO₂ GAP performed significantly better than the arterial-alveolar gradient in patients requiring ventilatory support (area under the curve 0.950 vs 0.726). A CO₂ GAP ≥10 was associated with assisted ventilation outcomes when applied to the validation set (100% sensitivity, 70% specificity). The CO₂ GAP [(a-ET) PCO₂] differs significantly in patients requiring assisted ventilation when presenting with shortness of breath to EDs, and further research addressing the prognostic value of the CO₂ GAP in this specific respect is required. © 2010 The Authors. EMA © 2010 Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine.
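A toy illustration of this style of analysis (simulated values, not the study's data): derive a cut-point on a training sample from the ROC curve and report the area under the curve.

```python
# ROC analysis and cut-point selection on simulated CO2 GAP values.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
gap = np.r_[rng.normal(14, 4, 60), rng.normal(6, 3, 252)]   # hypothetical CO2 GAP values
vent = np.r_[np.ones(60), np.zeros(252)]                    # 1 = needed ventilatory support

print("AUC:", roc_auc_score(vent, gap))
fpr, tpr, thresholds = roc_curve(vent, gap)
cut = thresholds[np.argmax(tpr - fpr)]                      # Youden's J cut-point
print("chosen cut-point:", cut)
```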
2016-01-01
Purpose: The goal of this study was to characterize the difficulty index of the items in the skills test components of the class I and II Korean emergency medical technician licensing examination (KEMTLE), which requires examinees to select items randomly. Methods: The results of 1,309 class I KEMTLE examinations and 1,801 class II KEMTLE examinations in 2013 were subjected to analysis. Items from the basic and advanced skills test sections of the KEMTLE were compared to determine whether some were significantly more difficult than others. Results: In the class I KEMTLE, all 4 of the items on the basic skills test showed significant variation in difficulty index (P<0.01), as well as 4 of the 5 items on the advanced skills test (P<0.05). In the class II KEMTLE, 4 of the 5 items on the basic skills test showed significantly different difficulty indices (P<0.01), as well as all 3 of the advanced skills test items (P<0.01). Conclusion: In the skills test components of the class I and II KEMTLE, the procedure in which examinees randomly select questions should be revised to require examinees to respond to a set of fixed items in order to improve the reliability of the national licensing examination. PMID:26883810
Multidimensional density shaping by sigmoids.
Roth, Z; Baram, Y
1996-01-01
An estimate of the probability density function of a random vector is obtained by maximizing the output entropy of a feedforward network of sigmoidal units with respect to the input weights. Classification problems can be solved by selecting the class associated with the maximal estimated density. Newton's optimization method, applied to the estimated density, yields a recursive estimator for a random variable or a random sequence. A constrained connectivity structure yields a linear estimator, which is particularly suitable for "real time" prediction. A Gaussian nonlinearity yields a closed-form solution for the network's parameters, which may also be used for initializing the optimization algorithm when other nonlinearities are employed. A triangular connectivity between the neurons and the input, which is naturally suggested by the statistical setting, reduces the number of parameters. Applications to classification and forecasting problems are demonstrated.
Frequency of RNA–RNA interaction in a model of the RNA World
STRIGGLES, JOHN C.; MARTIN, MATTHEW B.; SCHMIDT, FRANCIS J.
2006-01-01
The RNA World model for prebiotic evolution posits the selection of catalytic/template RNAs from random populations. The mechanisms by which these random populations could be generated de novo are unclear. Non-enzymatic and RNA-catalyzed nucleic acid polymerizations are poorly processive, which means that the resulting short-chain RNA population could contain only limited diversity. Nonreciprocal recombination of smaller RNAs provides an alternative mechanism for the assembly of larger species with concomitantly greater structural diversity; however, the frequency of any specific recombination event in a random RNA population is limited by the low probability of an encounter between any two given molecules. This low probability could be overcome if the molecules capable of productive recombination were redundant, with many nonhomologous but functionally equivalent RNAs being present in a random population. Here we report fluctuation experiments to estimate the redundancy of the set of RNAs in a population of random sequences that are capable of non-Watson-Crick interaction with another RNA. Parallel SELEX experiments showed that at least one in 10⁶ random 20-mers binds to the P5.1 stem–loop of Bacillus subtilis RNase P RNA with affinities equal to that of its naturally occurring partner. This high frequency predicts that a single RNA in an RNA World would encounter multiple interacting RNAs within its lifetime, supporting recombination as a plausible mechanism for prebiotic RNA evolution. The large number of equivalent species implies that the selection of any single interacting species in the RNA World would be a contingent event, i.e., one resulting from historical accident. PMID:16495233
Amaral Júnior, A T; Freitas Júnior, S P; Rangel, R M; Pena, G F; Ribeiro, R M; Morais, R C; Schuelter, A R
2010-03-02
We estimated genetic gains for popcorn varieties using selection indexes in a fourth cycle of intrapopulation recurrent selection developed on the campus of the Universidade Estadual do Norte Fluminense. Two hundred full-sib families were obtained from the popcorn population UNB-2U of the third recurrent selection cycle. The progenies were evaluated in a randomized block design with two replications at sites in two different environments: the Colégio Estadual Agrícola Antônio Sarlo, in Campos dos Goytacazes, and the Empresa de Pesquisa Agropecuária do Estado do Rio de Janeiro (PESAGRO-RIO), in Itaocara, both in the State of Rio de Janeiro. There were significant differences between families within sets in all traits, indicating genetic variability that could be exploited in future cycles. Thirty full-sib families were selected to continue the program. The selection indexes used to predict the gains were those of Mulamba and Mock, Smith and Hazel. The best results were obtained with the Mulamba and Mock index, which allowed the prediction of negative gains for the traits number of diseased ears and ears attacked by pests, number of broken plants and lodging, and ears with poor husk cover. It also provided higher gains for popping expansion and grain yield than the other indexes, giving values of 10.55 and 8.50%, respectively, based on tentatively assigned random weights.
A stratified two-stage sampling design for digital soil mapping in a Mediterranean basin
NASA Astrophysics Data System (ADS)
Blaschek, Michael; Duttmann, Rainer
2015-04-01
The quality of environmental modelling results often depends on reliable soil information. In order to obtain soil data in an efficient manner, several sampling strategies are at hand depending on the level of prior knowledge and the overall objective of the planned survey. This study focuses on the collection of soil samples considering available continuous secondary information in an undulating, 16 km²-sized river catchment near Ussana in southern Sardinia (Italy). A design-based, stratified, two-stage sampling design has been applied aiming at the spatial prediction of soil property values at individual locations. The stratification was based on quantiles from density functions of two land-surface parameters - topographic wetness index and potential incoming solar radiation - derived from a digital elevation model. Combined with four main geological units, the applied procedure led to 30 different classes in the given test site. Up to six polygons of each available class were selected randomly, excluding polygons smaller than 1 ha to avoid incorrect location of the points in the field. Further exclusion rules were applied before polygon selection, masking out roads and buildings using a 20 m buffer. The selection procedure was repeated ten times and the set of polygons with the best geographical spread was chosen. Finally, exact point locations were selected randomly from inside the chosen polygon features. A second selection based on the same stratification and following the same methodology (selecting one polygon instead of six) was made in order to create an appropriate validation set. Supplementary samples were obtained during a second survey focusing on polygons that had either not been considered during the first phase at all or were not adequately represented with respect to feature size. In total, both field campaigns produced an interpolation set of 156 samples and a validation set of 41 points. The selection of sample point locations was done using ESRI software (ArcGIS) extended by Hawth's Tools and, later, its replacement, the Geospatial Modelling Environment (GME). 88% of all desired points could actually be reached in the field and were successfully sampled. Our results indicate that the sampled calibration and validation sets are representative of each other and could be successfully used as interpolation data for spatial prediction purposes. With respect to soil textural fractions, for instance, equal multivariate means and variance homogeneity were found for the two datasets, as evidenced by non-significant (P > 0.05) Hotelling T²-test (2.3 with df1 = 3, df2 = 193) and Bartlett's test statistics (6.4 with df = 6). The multivariate prediction of clay, silt and sand content using a neural network residual cokriging approach reached an explained variance level of 56%, 47% and 63%. Thus, the presented case study is a successful example of considering readily available continuous information on soil forming factors such as geology and relief as stratifying variables for designing sampling schemes in digital soil mapping projects.
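A simplified sketch of the first-stage polygon selection follows (a hypothetical polygon list stands in for the GIS layers; the 1 ha exclusion threshold and six-polygons-per-class rule are taken from the description above). Drawing the exact point inside each chosen polygon would additionally require GIS tooling.

```python
# Stratified first-stage sampling: up to six random polygons per stratum, excluding small polygons.
import random

# Hypothetical polygon list: (polygon_id, stratum_class, area_ha)
polygons = [(i, f"class_{i % 30}", random.uniform(0.2, 12.0)) for i in range(600)]

def select_polygons(polygons, per_class=6, min_area_ha=1.0, seed=0):
    rng = random.Random(seed)
    by_class = {}
    for pid, cls, area in polygons:
        if area >= min_area_ha:                       # exclusion rule: drop polygons < 1 ha
            by_class.setdefault(cls, []).append(pid)
    return {cls: rng.sample(ids, min(per_class, len(ids))) for cls, ids in by_class.items()}

chosen = select_polygons(polygons)                    # dict: stratum class -> selected polygon ids
```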
Method for nonlinear optimization for gas tagging and other systems
Chen, Ting; Gross, Kenny C.; Wegerich, Stephan
1998-01-01
A method and system for providing nuclear fuel rods with a configuration of isotopic gas tags. The method includes selecting a true location of a first gas tag node, selecting initial locations for the remaining n-1 nodes using target gas tag compositions, generating a set of random gene pools with L nodes, applying a Hopfield network to compute an energy, or cost, for each of the L gene pools, and using selected constraints to establish minimum energy states that identify optimal gas tag nodes, with each energy compared to a convergence threshold; upon identifying a gas tag node, the procedure continues to establish the next gas tag node until all remaining n nodes have been established.
Method for nonlinear optimization for gas tagging and other systems
Chen, T.; Gross, K.C.; Wegerich, S.
1998-01-06
A method and system are disclosed for providing nuclear fuel rods with a configuration of isotopic gas tags. The method includes selecting a true location of a first gas tag node, selecting initial locations for the remaining n-1 nodes using target gas tag compositions, generating a set of random gene pools with L nodes, applying a Hopfield network to compute an energy, or cost, for each of the L gene pools, and using selected constraints to establish minimum energy states that identify optimal gas tag nodes, with each energy compared to a convergence threshold; upon identifying a gas tag node, the procedure continues to establish the next gas tag node until all remaining n nodes have been established. 6 figs.
Heidema, A Geert; Boer, Jolanda M A; Nagelkerke, Nico; Mariman, Edwin C M; van der A, Daphne L; Feskens, Edith J M
2006-04-21
Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation of large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN), and several non-parametric methods, which include the set association approach, the combinatorial partitioning method (CPM), the restricted partitioning method (RPM), the multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods for association studies with large numbers of predictor variables. GPNN, on the other hand, may be a useful approach to select and model important predictors, but its performance in selecting the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and the random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset with an important contribution to disease. The combinatorial methods give more insight into combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. As the non-parametric methods have different strengths and weaknesses, we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.
Involving patients in setting priorities for healthcare improvement: a cluster randomized trial.
Boivin, Antoine; Lehoux, Pascale; Lacombe, Réal; Burgers, Jako; Grol, Richard
2014-02-20
Patients are increasingly seen as active partners in healthcare. While patient involvement in individual clinical decisions has been extensively studied, no trial has assessed how patients can effectively be involved in collective healthcare decisions affecting the population. The goal of this study was to test the impact of involving patients in setting healthcare improvement priorities for chronic care at the community level. Cluster randomized controlled trial. Local communities were randomized to intervention (priority setting with patient involvement) and control sites (no patient involvement). Communities in a Canadian region were required to set priorities for improving chronic disease management in primary care, from a list of 37 validated quality indicators. Patients were consulted in writing, before participating in face-to-face deliberation with professionals. Professionals established priorities among themselves, without patient involvement. A total of 172 individuals from six communities participated in the study, including 83 chronic disease patients and 89 health professionals. The primary outcome was the level of agreement between patients' and professionals' priorities. Secondary outcomes included professionals' intention to use the selected quality indicators, and the costs of patient involvement. Priorities established with patients were more aligned with core generic components of the Medical Home and Chronic Care Model, including: access to primary care, self-care support, patient participation in clinical decisions, and partnership with community organizations (p < 0.01). Priorities established by professionals alone placed more emphasis on the technical quality of single disease management. The involvement intervention fostered mutual influence between patients and professionals, which resulted in a 41% increase in agreement on common priorities (95%CI: +12% to +58%, p < 0.01). Professionals' intention to use the selected quality indicators was similar in intervention and control sites. Patient involvement increased the costs of the prioritization process by 17%, and required 10% more time to reach consensus on common priorities. Patient involvement can change priorities driving healthcare improvement at the population level. Future research should test the generalizability of these findings to other contexts, and assess its impact on patient care. The Netherlands National Trial Register #NTR2496.
A prediction of templates in the auditory cortex system
NASA Astrophysics Data System (ADS)
Ghanbeigi, Kimia
In this study, variation of human auditory evoked mismatch field amplitudes in response to complex tones, as a function of the removal of single partials in the onset period, was investigated. It was determined that: 1) elimination of a single frequency in a sound stimulus plays a significant role in human brain sound recognition; 2) by comparing the mismatches of the brain response due to a single frequency elimination in the "Starting Transient" and the "Sustained Part" of the sound stimulus, the brain is found to be more sensitive to frequency elimination in the Starting Transient. This study involved 4 healthy subjects with normal hearing. Neural activity was recorded with whole-head MEG during stimulus presentation. Verification of the spatial location in the auditory cortex was obtained by comparison with MRI images. In the first set of stimuli, rare ('deviant') tones were randomly embedded in a string of repetitive ('standard') tones with five selected onset frequencies, with randomly varying inter-stimulus intervals. In the deviant tones, one of the frequency components was omitted relative to the standard tones during the onset period. The frequency of the test partial of the complex tone was intentionally selected to preclude its reinsertion by generation of harmonics or combination tones due to either the nonlinearity of the ear, the electronic equipment or the brain processing. In the second set of stimuli, time structured as above, rare ('deviant') tones, for which one of the selected frequencies was omitted in the sustained tone, were embedded in a string of repetitive ('standard') tones with five selected sustained frequency components. In both measurements, the careful frequency selection precluded reinsertion of the omitted partial by generation of harmonics or combination tones due to the nonlinearity of the ear, the electronic equipment and brain processing. The same considerations for selecting the test frequency partial were applied. Results: By comparing the MMN of the two data sets, the relative contribution to sound recognition of the omitted partial frequency components in the onset and sustained regions has been determined. Conclusion: The presence of significant mismatch negativity, due to neural activity of the auditory cortex, emphasizes that the brain recognizes the elimination of a single frequency among carefully chosen anharmonic frequencies. This mismatch is more significant if the single frequency elimination occurs in the onset period.
Bestrashniy, Jessica Rutledge Bruce Musselman; Nguyen, Viet Nhung; Nguyen, Thi Loi; Pham, Thi Lieu; Nguyen, Thu Anh; Pham, Duc Cuong; Nghiem, Le Phuong Hoa; Le, Thi Ngoc Anh; Nguyen, Binh Hoa; Nguyen, Kim Cuong; Nguyen, Huy Dung; Buu, Tran Ngoc; Le, Thi Nhung; Nguyen, Viet Hung; Dinh, Ngoc Sy; Britton, Warwick John; Marks, Guy Barrington; Fox, Greg James
2018-06-23
Patients completing treatment for tuberculosis (TB) in high-prevalence settings face a risk of developing recurrent disease. This has important consequences for public health, given its association with drug resistance and a poor prognosis. Previous research has implicated individual factors such as smoking, alcohol use, HIV, poor treatment adherence, and drug-resistant disease as risk factors for recurrence. However, little is known about how these factors co-act to produce recurrent disease. Furthermore, factors related to the index disease episode may mean that higher-burden, low-resource settings are more prone to recurrent disease that could be preventable. We conducted a case-control study nested within a cohort of consecutively enrolled adults who were being treated for smear-positive pulmonary TB in 70 randomly selected district clinics in Vietnam. Cases were patients with recurrent TB, identified by follow-up from the parent cohort study. Controls were selected from the cohort by random sampling. Information on demographic, clinical and disease-related characteristics was obtained by interview; additional information was extracted from clinic registries. Logistic regression, with stepwise selection, was used to develop a fully adjusted model for the odds of recurrence of TB. We recruited 10,964 patients between October 2010 and July 2013. Median follow-up was 988 days. At the end of follow-up, 505 patients (4.7%) with recurrence were identified as cases and 630 other patients were randomly selected as controls. Predictors of recurrence included multidrug-resistant (MDR)-TB (adjusted odds ratio 79.6; 95% CI: 25.1-252.0), self-reported prior TB therapy (aOR=2.5; 95% CI: 1.7-3.5), and incomplete adherence (aOR=1.9; 95% CI 1.1-3.1). Index disease treatment history is a leading determinant of relapse among patients with TB in Vietnam. Further research is required to identify interventions that will reduce the risk of recurrent disease and enhance its early detection within high-risk populations. Copyright © 2018. Published by Elsevier Ltd.
Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter
2014-01-13
Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.
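A toy sketch of the interpolation idea (the tabulated means and standard deviations below are hypothetical stand-ins for the pre-calculated grid of MFE distributions): given a candidate sequence's nucleotide composition, interpolate the normal parameters and compute a P-value for its observed MFE.

```python
# P-value of an observed MFE against an interpolated normal distribution of random-sequence MFEs.
from scipy.stats import norm

# Hypothetical pre-computed (mean, sd) of MFE for random sequences at two GC fractions.
table = {0.40: (-18.0, 4.5), 0.60: (-24.0, 5.0)}

def mfe_pvalue(mfe, gc):
    (g1, (m1, s1)), (g2, (m2, s2)) = sorted(table.items())
    t = (gc - g1) / (g2 - g1)                    # linear interpolation weight
    mu, sd = m1 + t * (m2 - m1), s1 + t * (s2 - s1)
    return norm.cdf(mfe, loc=mu, scale=sd)       # P(random MFE <= observed MFE)

print(mfe_pvalue(-35.2, gc=0.52))                # small value suggests a pre-miRNA-like structure
```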
Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C.
2014-01-01
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology. PMID:24921649
Chen, Guocai; Cairelli, Michael J; Kilicoglu, Halil; Shin, Dongwook; Rindflesch, Thomas C
2014-06-01
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.
González-Recio, O; Jiménez-Montero, J A; Alenda, R
2013-01-01
In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. The results showed that the modification of the original boosting algorithm could be run in 1% of the time used with the original algorithm and with negligible differences in accuracy and bias. This modification may be used to speed up the calculation of genome-assisted evaluation in large data sets such as those obtained from consortia. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
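To make the "random boosting" idea concrete, the sketch below implements one plausible reading of it: each boosting round fits a small regression tree to the current residuals using only a random subset of marker columns, and the tree's contribution is shrunk before being added to the committee. The learner choice, parameter values and function names are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of random boosting: sequential weak learners fit to
# residuals, each restricted to a random subset of SNP columns, with shrinkage.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def random_boosting(X, y, n_rounds=200, subset_frac=0.1, shrinkage=0.1, depth=2, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pred = np.full(n, y.mean())                      # start from the phenotype mean
    committee = []                                   # (columns, fitted learner) pairs
    for _ in range(n_rounds):
        cols = rng.choice(p, size=max(1, int(subset_frac * p)), replace=False)
        residuals = y - pred                         # what the committee has not explained yet
        learner = DecisionTreeRegressor(max_depth=depth).fit(X[:, cols], residuals)
        pred += shrinkage * learner.predict(X[:, cols])
        committee.append((cols, learner))
    return {"base": y.mean(), "shrinkage": shrinkage, "committee": committee}

def rb_predict(model, X):
    out = np.full(X.shape[0], model["base"])
    for cols, learner in model["committee"]:
        out += model["shrinkage"] * learner.predict(X[:, cols])
    return out
```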
Automatic learning-based beam angle selection for thoracic IMRT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amit, Guy; Marshall, Andrea; Purdie, Thomas G., E-mail: tom.purdie@rmp.uhn.ca
Purpose: The treatment of thoracic cancer using external beam radiation requires an optimal selection of the radiation beam directions to ensure effective coverage of the target volume and to avoid unnecessary treatment of normal healthy tissues. Intensity modulated radiation therapy (IMRT) planning is a lengthy process, which requires the planner to iterate between choosing beam angles, specifying dose–volume objectives and executing IMRT optimization. In thorax treatment planning, where there are no class solutions for beam placement, beam angle selection is performed manually, based on the planner’s clinical experience. The purpose of this work is to propose and study a computationally efficient framework that utilizes machine learning to automatically select treatment beam angles. Such a framework may be helpful for reducing the overall planning workload. Methods: The authors introduce an automated beam selection method, based on learning the relationships between beam angles and anatomical features. Using a large set of clinically approved IMRT plans, a random forest regression algorithm is trained to map a multitude of anatomical features into an individual beam score. An optimization scheme is then built to select and adjust the beam angles, considering the learned interbeam dependencies. The validity and quality of the automatically selected beams were evaluated using the manually selected beams from the corresponding clinical plans as the ground truth. Results: The analysis included 149 clinically approved thoracic IMRT plans. For a randomly selected test subset of 27 plans, IMRT plans were generated using automatically selected beams and compared to the clinical plans. The comparison of the predicted and the clinical beam angles demonstrated a good average correspondence between the two (angular distance 16.8° ± 10°, correlation 0.75 ± 0.2). The dose distributions of the semiautomatic and clinical plans were equivalent in terms of primary target volume coverage and organ at risk sparing and were superior to plans produced with fixed sets of common beam angles. The great majority of the automatic plans (93%) were approved as clinically acceptable by three radiation therapy specialists. Conclusions: The results demonstrated the feasibility of utilizing a learning-based approach for automatic selection of beam angles in thoracic IMRT planning. The proposed method may assist in reducing the manual planning workload, while sustaining plan quality.
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)
2001-01-01
Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
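A rough sketch of the input-decimation idea follows: for each class, the features most correlated with that class's indicator are retained, one base classifier is trained per decimated subset, and the ensemble averages their predicted probabilities. The feature-scoring rule, base learner and n_keep value are illustrative assumptions rather than the authors' formulation.

```python
# A hedged sketch of an input-decimation ensemble: one decimated feature
# subset (and one base classifier) per class, combined by averaging
# predicted class probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

def input_decimation_ensemble(X, y, n_keep=10):
    members = []
    for c in np.unique(y):
        indicator = (y == c).astype(float)
        # absolute correlation of every feature with this class's indicator
        corr = np.nan_to_num([abs(np.corrcoef(X[:, j], indicator)[0, 1])
                              for j in range(X.shape[1])])
        cols = np.argsort(corr)[::-1][:n_keep]        # keep the top features for class c
        clf = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        members.append((cols, clf))
    return members

def ide_predict(members, X):
    probs = np.mean([clf.predict_proba(X[:, cols]) for cols, clf in members], axis=0)
    return members[0][1].classes_[probs.argmax(axis=1)]
```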
Serotonin Augmentation Reduces Response to Attack in Aggressive Individuals
Berman, Mitchell E.; McCloskey, Michael S.; Fanning, Jennifer R.; Schumacher, Julie A.; Coccaro, Emil F.
2009-01-01
We tested the theory that central serotonin (5- hydroxytryptamine, or 5-HT) activity regulates aggression by modulating response to provocation. Eighty men and women (40 with and 40 without a history of aggression) were randomly assigned to receive either 40 mg of paroxetine (to acutely augment serotonergic activity) or a placebo, administered using double-blind procedures. Aggression was assessed during a competitive reaction time game with a fictitious opponent. Shocks were selected by the participant and opponent before each trial, with the loser on each trial receiving the shock set by the other player. Provocation was manipulated by having the opponent select increasingly intense shocks for the participant and eventually an ostensibly severe shock toward the end of the trials. Aggression was measured by the number of severe shocks set by the participant for the opponent. As predicted, aggressive responding after provocation was attenuated by augmentation of serotonin in individuals with a pronounced history of aggression. PMID:19422623
[Mokken scaling of the Cognitive Screening Test].
Diesfeldt, H F A
2009-10-01
The Cognitive Screening Test (CST) is a twenty-item orientation questionnaire in Dutch, that is commonly used to evaluate cognitive impairment. This study applied Mokken Scale Analysis, a non-parametric set of techniques derived from item response theory (IRT), to CST-data of 466 consecutive participants in psychogeriatric day care. The full item set and the standard short version of fourteen items both met the assumptions of the monotone homogeneity model, with scalability coefficient H = 0.39, which is considered weak. In order to select items that would fulfil the assumption of invariant item ordering or the double monotonicity model, the subjects were randomly partitioned into a training set (50% of the sample) and a test set (the remaining half). By means of an automated item selection eleven items were found to measure one latent trait, with H = 0.67 and item H coefficients larger than 0.51. Cross-validation of the item analysis in the remaining half of the subjects gave comparable values (H = 0.66; item H coefficients larger than 0.56). The selected items involve year, place of residence, birth date, the monarch's and prime minister's names, and their predecessors. Applying optimal discriminant analysis (ODA) it was found that the full set of twenty CST items performed best in distinguishing two predefined groups of patients of lower or higher cognitive ability, as established by an independent criterion derived from the Amsterdam Dementia Screening Test. The chance corrected predictive value or prognostic utility was 47.5% for the full item set, 45.2% for the fourteen items of the standard short version of the CST, and 46.1% for the homogeneous, unidimensional set of selected eleven items. The results of the item analysis support the application of the CST in cognitive assessment, and revealed a more reliable 'short' version of the CST than the standard short version (CST14).
The influence of negative training set size on machine learning-based virtual screening.
Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J
2014-01-01
The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of the dynamics of those variations allowed us to recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of a particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.
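The experimental design described above can be sketched as a simple loop over negative-to-positive ratios, tracking precision, recall and MCC on a fixed held-out set. The classifier, ratios and data handling below are placeholders for illustration, not the authors' protocol or fingerprints.

```python
# An illustrative sketch: fixed actives are combined with increasingly large
# random samples of decoys, and precision, recall and MCC are tracked on a
# held-out test set (y_test assumed to be 0/1 labels).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, matthews_corrcoef

def negative_size_scan(X_pos, X_neg_pool, X_test, y_test, ratios=(1, 2, 5, 10), seed=0):
    rng = np.random.default_rng(seed)
    results = {}
    for r in ratios:
        n_neg = min(len(X_neg_pool), r * len(X_pos))
        idx = rng.choice(len(X_neg_pool), size=n_neg, replace=False)
        X = np.vstack([X_pos, X_neg_pool[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(n_neg)])
        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
        pred = clf.predict(X_test)
        results[r] = {"precision": precision_score(y_test, pred),
                      "recall": recall_score(y_test, pred),
                      "mcc": matthews_corrcoef(y_test, pred)}
    return results
```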
The influence of negative training set size on machine learning-based virtual screening
2014-01-01
Background The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. Results The impact of this rather neglected aspect of machine learning methods application was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluating parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of the dynamics of those variations allowed us to recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. Conclusions In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of a particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening. PMID:24976867
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
Use of simulation to compare the performance of minimization with stratified blocked randomization.
Toorawa, Robert; Adena, Michael; Donovan, Mark; Jones, Steve; Conlon, John
2009-01-01
Minimization is an alternative method to stratified permuted block randomization, which may be more effective at balancing treatments when there are many strata. However, its use in the regulatory setting for industry trials remains controversial, primarily due to the difficulty in interpreting conventional asymptotic statistical tests under restricted methods of treatment allocation. We argue that the use of minimization should be critically evaluated when designing the study for which it is proposed. We demonstrate by example how simulation can be used to investigate whether minimization improves treatment balance compared with stratified randomization, and how much randomness can be incorporated into the minimization before any balance advantage is no longer retained. We also illustrate by example how the performance of the traditional model-based analysis can be assessed, by comparing the nominal test size with the observed test size over a large number of simulations. We recommend that the assignment probability for the minimization be selected using such simulations. Copyright (c) 2008 John Wiley & Sons, Ltd.
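For readers unfamiliar with minimization, the sketch below shows a simplified Pocock-Simon-style assignment rule with a tunable assignment probability, which is the quantity the authors recommend calibrating by simulation. The factor encoding, arm labels and probability value are illustrative assumptions, not a validated trial algorithm.

```python
# A simplified sketch of minimization: the new patient is assigned to the arm
# that minimizes total marginal imbalance across stratification factors,
# with probability p_best; otherwise another arm is chosen at random.
import random

def minimization_assign(patient, history, arms=("A", "B"), p_best=0.8):
    """patient: dict of stratification factors, e.g. {'site': 3, 'sex': 'F'}.
    history: list of (factors_dict, assigned_arm) for already-randomized patients."""
    def imbalance(candidate_arm):
        total = 0
        for factor, level in patient.items():
            # marginal counts of this factor level on each arm, counting the candidate
            counts = {a: sum(1 for f, arm in history
                             if f[factor] == level and arm == a) for a in arms}
            counts[candidate_arm] += 1
            total += max(counts.values()) - min(counts.values())
        return total

    scores = {a: imbalance(a) for a in arms}
    best = min(scores, key=scores.get)
    if random.random() < p_best:
        return best                                        # balance-minimizing choice
    return random.choice([a for a in arms if a != best])   # random element of the scheme
```

Running such an assignment rule over many simulated trials, and comparing the resulting marginal imbalances and observed test sizes with those of stratified blocked randomization, is essentially the simulation exercise recommended above.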
Artificial neural networks modelling the prednisolone nanoprecipitation in microfluidic reactors.
Ali, Hany S M; Blagden, Nicholas; York, Peter; Amani, Amir; Brook, Toni
2009-06-28
This study employs artificial neural networks (ANNs) to create a model to identify relationships between variables affecting drug nanoprecipitation using microfluidic reactors. The input variables examined were saturation levels of prednisolone, solvent and antisolvent flow rates, microreactor inlet angles and internal diameters, while particle size was the single output. ANNs software was used to analyse a set of data obtained by random selection of the variables. The developed model was then assessed using a separate set of validation data and provided good agreement with the observed results. The antisolvent flow rate was found to have the dominant role on determining final particle size.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition accuracy; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
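As a small illustration of the decision-fusion step, the sketch below implements weighted majority voting over the labels predicted by several base classifiers; the weights (e.g. cross-validated F1 scores), class labels and example votes are invented for illustration only.

```python
# A minimal weighted-majority-vote fusion: each classifier's vote is weighted
# and the class accumulating the largest weighted vote wins per sample.
import numpy as np

def weighted_majority_vote(predictions, weights, classes):
    """predictions: (n_classifiers, n_samples) array-like of predicted labels."""
    predictions = np.asarray(predictions)
    scores = np.zeros((predictions.shape[1], len(classes)))
    for clf_pred, w in zip(predictions, weights):
        for ci, c in enumerate(classes):
            scores[:, ci] += w * (clf_pred == c)    # add this classifier's weighted vote
    return np.array(classes)[scores.argmax(axis=1)]

# Example: three classifiers voting over four samples (placeholder labels/weights)
preds = [["walk", "run", "sit", "sit"],
         ["walk", "walk", "sit", "run"],
         ["run",  "run",  "sit", "sit"]]
print(weighted_majority_vote(preds, weights=[0.9, 0.7, 0.8], classes=["walk", "run", "sit"]))
```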
A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.
Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh
2018-04-26
Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
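A hedged sketch of the clustering-based CV idea is shown below: samples are clustered on their features and whole clusters are held out, so test folds are more "distinct" from the training data than under random CV. The clustering method, fold count and regression model are illustrative stand-ins, not the authors' pipeline.

```python
# Clustering-based cross-validation (CCV) sketch: cluster samples, then hold
# out whole clusters via GroupKFold instead of random splits.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.linear_model import Ridge

def clustering_based_cv(X, y, n_clusters=5):
    # assign each sample to a cluster; clusters play the role of "conditions"
    groups = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    cv = GroupKFold(n_splits=n_clusters)
    # scores are typically lower (less optimistic) than under random CV
    return cross_val_score(Ridge(), X, y, cv=cv, groups=groups, scoring="r2")
```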
NASA Astrophysics Data System (ADS)
Liu, Zhangjun; Liu, Zenghui
2018-06-01
This paper develops a hybrid approach of spectral representation and random function for simulating stationary stochastic vector processes. In the proposed approach, the high-dimensional random variables included in the original spectral representation (OSR) formula can be effectively reduced to only two elementary random variables by introducing random functions that serve as random constraints. Based on this, a satisfactory simulation accuracy can be guaranteed by selecting a small representative point set of the elementary random variables. The probability information of the stochastic excitations can be fully captured by just several hundred sample functions generated by the proposed approach. Therefore, combined with the probability density evolution method (PDEM), the approach enables dynamic response analysis and reliability assessment of engineering structures. For illustrative purposes, a stochastic turbulence wind velocity field acting on a frame-shear-wall structure is simulated by constructing three types of random functions to demonstrate the accuracy and efficiency of the proposed approach. Careful and in-depth studies concerning the probability density evolution analysis of the wind-induced structural response have been conducted so as to better illustrate the application prospects of the proposed approach. Numerical examples also show that the proposed approach possesses good robustness.
Random covering of the circle: the configuration-space of the free deposition process
NASA Astrophysics Data System (ADS)
Huillet, Thierry
2003-12-01
Consider a circle of circumference 1. Throw at random n points, sequentially, on this circle and append clockwise an arc (or rod) of length s to each such point. The resulting random set (the free gas of rods) is a collection of a random number of clusters with random sizes. It models a free deposition process on a 1D substrate. For such processes, we shall consider the occurrence times (number of rods) and probabilities, as n grows, of the following configurations: those avoiding rod overlap (the hard-rod gas), those for which the largest gap is smaller than rod length s (the packing gas), those (parking configurations) for which hard rod and packing constraints are both fulfilled and covering configurations. Special attention is paid to the statistical properties of each such (rare) configuration in the asymptotic density domain when ns = ρ, for some finite density ρ of points. Using results from spacings in the random division of the circle, explicit large deviation rate functions can be computed in each case from state equations. Lastly, a process consisting in selecting at random one of these specific equilibrium configurations (called the observable) can be modelled. When particularized to the parking model, this system produces parking configurations differently from Rényi's random sequential adsorption model.
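The configurations described above are easy to explore numerically. The toy Monte Carlo sketch below throws n uniform points on the unit circle, computes the circular spacings between consecutive rod starts, and estimates the probabilities of the hard-rod (no overlap) and covering configurations; the chosen n, s and trial count are arbitrary illustrations, not values from the paper.

```python
# Toy Monte Carlo for the free deposition process on the unit circle.
import numpy as np

def config_probabilities(n, s, trials=100_000, seed=0):
    """Estimate P(hard-rod configuration) and P(covering configuration)."""
    rng = np.random.default_rng(seed)
    hard, cover = 0, 0
    for _ in range(trials):
        x = np.sort(rng.random(n))                   # rod left endpoints on [0, 1)
        gaps = np.diff(np.append(x, x[0] + 1.0))     # circular spacings between starts
        if np.all(gaps >= s):
            hard += 1        # no rod reaches the next start: no overlap anywhere
        if np.all(gaps <= s):
            cover += 1       # every arc reaches the next start: the circle is covered
    return hard / trials, cover / trials

print(config_probabilities(n=10, s=0.05))
```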
ERIC Educational Resources Information Center
Huynh, Huynh
By noting that a Rasch or two parameter logistic (2PL) item belongs to the exponential family of random variables and that the probability density function (pdf) of the correct response (X=1) and the incorrect response (X=0) are symmetric with respect to the vertical line at the item location, it is shown that the conjugate prior for ability is…
Service-Oriented Node Scheduling Scheme for Wireless Sensor Networks Using Markov Random Field Model
Cheng, Hongju; Su, Zhihuang; Lloret, Jaime; Chen, Guolong
2014-01-01
Future wireless sensor networks are expected to provide various sensing services and energy efficiency is one of the most important criterions. The node scheduling strategy aims to increase network lifetime by selecting a set of sensor nodes to provide the required sensing services in a periodic manner. In this paper, we are concerned with the service-oriented node scheduling problem to provide multiple sensing services while maximizing the network lifetime. We firstly introduce how to model the data correlation for different services by using Markov Random Field (MRF) model. Secondly, we formulate the service-oriented node scheduling issue into three different problems, namely, the multi-service data denoising problem which aims at minimizing the noise level of sensed data, the representative node selection problem concerning with selecting a number of active nodes while determining the services they provide, and the multi-service node scheduling problem which aims at maximizing the network lifetime. Thirdly, we propose a Multi-service Data Denoising (MDD) algorithm, a novel multi-service Representative node Selection and service Determination (RSD) algorithm, and a novel MRF-based Multi-service Node Scheduling (MMNS) scheme to solve the above three problems respectively. Finally, extensive experiments demonstrate that the proposed scheme efficiently extends the network lifetime. PMID:25384005
T-wave end detection using neural networks and Support Vector Machines.
Suárez-León, Alexander Alexeis; Varon, Carolina; Willems, Rik; Van Huffel, Sabine; Vázquez-Seisdedos, Carlos Román
2018-05-01
In this paper we propose a new approach for detecting the end of the T-wave in the electrocardiogram (ECG) using Neural Networks and Support Vector Machines. Both Multilayer Perceptron (MLP) neural networks and Fixed-Size Least-Squares Support Vector Machines (FS-LSSVM) were used as regression algorithms to determine the end of the T-wave. Different strategies for selecting the training set such as random selection, k-means, robust clustering and maximum quadratic (Rényi) entropy were evaluated. Individual parameters were tuned for each method during training and the results are given for the evaluation set. A comparison between MLP and FS-LSSVM approaches was performed. Finally, a fair comparison of the FS-LSSVM method with other state-of-the-art algorithms for detecting the end of the T-wave was included. The experimental results show that FS-LSSVM approaches are more suitable as regression algorithms than MLP neural networks. Despite the small training sets used, the FS-LSSVM methods outperformed the state-of-the-art techniques. FS-LSSVM can be successfully used as a T-wave end detection algorithm in ECG even with small training set sizes. Copyright © 2018 Elsevier Ltd. All rights reserved.
Defining fitness in an uncertain world.
Crewe, Paul; Gratwick, Richard; Grafen, Alan
2018-04-01
The recently elucidated definition of fitness employed by Fisher in his fundamental theorem of natural selection is combined with reproductive values as appropriately defined in the context of both random environments and continuing fluctuations in the distribution over classes in a class-structured population. We obtain astonishingly simple results, generalisations of the Price Equation and the fundamental theorem, that show natural selection acting only through the arithmetic expectation of fitness over all uncertainties, in contrast to previous studies with fluctuating demography, in which natural selection looks rather complicated. Furthermore, our setting permits each class to have its characteristic ploidy, thus covering haploidy, diploidy and haplodiploidy at the same time; and allows arbitrary classes, including continuous variables such as condition. The simplicity is achieved by focussing just on the effects of natural selection on genotype frequencies: while other causes are present in the model, and the effect of natural selection is assessed in their presence, these causes will have their own further effects on genotype frequencies that are not assessed here. Also, Fisher's uses of reproductive value are shown to have two ambivalences, and a new axiomatic foundation for reproductive value is endorsed. The results continue the formal darwinism project, and extend support for the individual-as-maximising-agent analogy to finite populations with random environments and fluctuating class-distributions. The model may also lead to improved ways to measure fitness in real populations.
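For orientation only, the classical (deterministic) Price Equation that the paper generalises to random environments can be written in its usual textbook form; the notation below (fitness w_i, trait value z_i, bars for population means) is the conventional one and is not quoted from the paper.

```latex
% Standard Price Equation; the paper derives a generalisation for random
% environments and fluctuating class-distributions.
\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}\!\left[\,w_i\,\Delta z_i\,\right]
```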
Komócsi, András; Aradi, Dániel; Kehl, Dániel; Ungi, Imre; Thury, Attila; Pintér, Tünde; Di Nicolantonio, James J.; Tornyos, Adrienn
2014-01-01
Introduction Superior outcomes with transradial (TRPCI) versus transfemoral coronary intervention (TFPCI) in the setting of acute ST-segment elevation myocardial infarction (STEMI) have been suggested by earlier studies. However, this effect was not evident in randomized controlled trials (RCTs), suggesting a possible allocation bias in observational studies. Since important studies with heterogeneous results regarding mortality have been published recently, we aimed to perform an updated review and meta-analysis on the safety and efficacy of TRPCI compared to TFPCI in the setting of STEMI. Material and methods Electronic databases were searched for relevant studies from January 1993 to November 2012. Outcome parameters of RCTs were pooled with the DerSimonian-Laird random-effects model. Results Twelve RCTs involving 5,124 patients were identified. According to the pooled analysis, TRPCI was associated with a significant reduction in major bleeding (odds ratio (OR): 0.52 (95% confidence interval (CI) 0.38–0.71, p < 0.0001)). The risk of mortality and major adverse events was significantly lower after TRPCI (OR = 0.58 (95% CI: 0.43–0.79), p = 0.0005 and OR = 0.67 (95% CI: 0.52–0.86), p = 0.002 respectively). Conclusions Robust data from randomized clinical studies indicate that TRPCI reduces both ischemic and bleeding complications in STEMI. These findings support the preferential use of radial access for primary PCI. PMID:24904651
Komócsi, András; Aradi, Dániel; Kehl, Dániel; Ungi, Imre; Thury, Attila; Pintér, Tünde; Di Nicolantonio, James J; Tornyos, Adrienn; Vorobcsuk, András
2014-05-12
Superior outcomes with transradial (TRPCI) versus transfemoral coronary intervention (TFPCI) in the setting of acute ST-segment elevation myocardial infarction (STEMI) have been suggested by earlier studies. However, this effect was not evident in randomized controlled trials (RCTs), suggesting a possible allocation bias in observational studies. Since important studies with heterogeneous results regarding mortality have been published recently, we aimed to perform an updated review and meta-analysis on the safety and efficacy of TRPCI compared to TFPCI in the setting of STEMI. Electronic databases were searched for relevant studies from January 1993 to November 2012. Outcome parameters of RCTs were pooled with the DerSimonian-Laird random-effects model. Twelve RCTs involving 5,124 patients were identified. According to the pooled analysis, TRPCI was associated with a significant reduction in major bleeding (odds ratio (OR): 0.52 (95% confidence interval (CI) 0.38-0.71, p < 0.0001)). The risk of mortality and major adverse events was significantly lower after TRPCI (OR = 0.58 (95% CI: 0.43-0.79), p = 0.0005 and OR = 0.67 (95% CI: 0.52-0.86), p = 0.002 respectively). Robust data from randomized clinical studies indicate that TRPCI reduces both ischemic and bleeding complications in STEMI. These findings support the preferential use of radial access for primary PCI.
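For concreteness, the DerSimonian-Laird random-effects pooling used above can be sketched as follows; the study-level log odds ratios and variances in the example call are invented placeholders, not the trial data from the review.

```python
# DerSimonian-Laird random-effects pooling of log odds ratios.
import numpy as np

def dersimonian_laird(log_or, var):
    log_or, var = np.asarray(log_or, float), np.asarray(var, float)
    w = 1.0 / var                                   # fixed-effect (inverse-variance) weights
    theta_fe = np.sum(w * log_or) / np.sum(w)
    q = np.sum(w * (log_or - theta_fe) ** 2)        # Cochran's Q heterogeneity statistic
    k = len(log_or)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)              # method-of-moments between-study variance
    w_re = 1.0 / (var + tau2)                       # random-effects weights
    theta = np.sum(w_re * log_or) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    ci = (np.exp(theta - 1.96 * se), np.exp(theta + 1.96 * se))
    return np.exp(theta), ci, tau2                  # pooled OR, 95% CI, tau^2

# Placeholder inputs: per-trial log odds ratios and their variances
print(dersimonian_laird([-0.7, -0.5, -0.9, -0.4], [0.10, 0.08, 0.15, 0.12]))
```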
Learning accurate and interpretable models based on regularized random forests regression
2014-01-01
Background Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationship of data, but are generally hard for human to interpret. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120
NASA Astrophysics Data System (ADS)
Deng, Chengbin; Wu, Changshan
2013-12-01
Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
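The two least-squares steps described above can be sketched compactly: endmember spectra are first solved from sample pixels with known abundances, and those spectra are then used to unmix every pixel by unconstrained SMA. Array shapes and function names are illustrative; the fully constrained variant and the machine-learning comparisons are omitted here.

```python
# Least-squares endmember derivation followed by unconstrained spectral
# mixture analysis (SMA); shapes and data are placeholders.
import numpy as np

def derive_endmembers(abundances, reflectances):
    """abundances: (n_samples, n_endmembers); reflectances: (n_samples, n_bands).
    Solves abundances @ E ~= reflectances for the endmember spectra E."""
    E, *_ = np.linalg.lstsq(abundances, reflectances, rcond=None)
    return E                                        # (n_endmembers, n_bands)

def unconstrained_sma(E, pixels):
    """pixels: (n_pixels, n_bands) -> fractional abundances (n_pixels, n_endmembers)."""
    A, *_ = np.linalg.lstsq(E.T, pixels.T, rcond=None)
    return A.T
```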
Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael; ...
2016-12-01
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Jason K.; Oyen, Diane Adele; Chertkov, Michael
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. Finally, we demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.
Zhai, Fuhua; Raver, C. Cybele; Jones, Stephanie M.
2012-01-01
The role of subsequent school contexts in the long-term effects of early childhood interventions has received increasing attention, but has been understudied in the literature. Using data from the Chicago School Readiness Project (CSRP), a cluster-randomized controlled trial conducted in Head Start programs, we investigate whether the intervention had differential effects on academic and behavioral outcomes in kindergarten if children attended high- or low-performing schools subsequent to the preschool intervention year. To address the issue of selection bias, we adopt an innovative method, principal score matching, and control for a set of child, mother, and classroom covariates. We find that exposure to the CSRP intervention in the Head Start year had significant effects on academic and behavioral outcomes in kindergarten for children who subsequently attended high-performing schools, but no significant effects on children attending low-performing schools. Policy implications of the findings are discussed. PMID:22773872
Tucker, Jalie A.; Reed, Geoffrey M.
2008-01-01
This paper examines the utility of evidentiary pluralism, a research strategy that selects methods in service of content questions, in the context of rehabilitation psychology. Hierarchical views that favor randomized controlled clinical trials (RCTs) over other evidence are discussed, and RCTs are considered as they intersect with issues in the field. RCTs are vital for establishing treatment efficacy, but whether they are uniformly the best evidence to inform practice is critically evaluated. We argue that because treatment is only one of several variables that influence functioning, disability, and participation over time, an expanded set of conceptual and data analytic approaches should be selected in an informed way to support an expanded research agenda that investigates therapeutic and extra-therapeutic influences on rehabilitation processes and outcomes. The benefits of evidentiary pluralism are considered, including helping close the gap between the narrower clinical rehabilitation model and a public health disability model. KEY WORDS: evidence-based practice, evidentiary pluralism, rehabilitation psychology, randomized controlled trials PMID:19649150
Chen, Henry W; Du, Jingcheng; Song, Hsing-Yi; Liu, Xiangyu; Jiang, Guoqian
2018-01-01
Background Today, there is an increasing need to centralize and standardize electronic health data within clinical research as the volume of data continues to balloon. Domain-specific common data elements (CDEs) are emerging as a standard approach to clinical research data capturing and reporting. Recent efforts to standardize clinical study CDEs have been of great benefit in facilitating data integration and data sharing. The importance of the temporal dimension of clinical research studies has been well recognized; however, very few studies have focused on the formal representation of temporal constraints and temporal relationships within clinical research data in the biomedical research community. In particular, temporal information can be extremely powerful to enable high-quality cancer research. Objective The objective of the study was to develop and evaluate an ontological approach to represent the temporal aspects of cancer study CDEs. Methods We used CDEs recorded in the National Cancer Institute (NCI) Cancer Data Standards Repository (caDSR) and created a CDE parser to extract time-relevant CDEs from the caDSR. Using the Web Ontology Language (OWL)–based Time Event Ontology (TEO), we manually derived representative patterns to semantically model the temporal components of the CDEs using an observing set of randomly selected time-related CDEs (n=600) to create a set of TEO ontological representation patterns. In evaluating TEO’s ability to represent the temporal components of the CDEs, this set of representation patterns was tested against two test sets of randomly selected time-related CDEs (n=425). Results It was found that 94.2% (801/850) of the CDEs in the test sets could be represented by the TEO representation patterns. Conclusions In conclusion, TEO is a good ontological model for representing the temporal components of the CDEs recorded in caDSR. Our representative model can harness the Semantic Web reasoning and inferencing functionalities and present a means for temporal CDEs to be machine-readable, streamlining meaningful searches. PMID:29472179
O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John Bo
2008-10-29
We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024-1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581-590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6 degrees C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, epsilon of 0.21) and an RMSE of 45.1 degrees C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3 degrees C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5 degrees C, R2 of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.
Kids'Cam: An Objective Methodology to Study the World in Which Children Live.
Signal, Louise N; Smith, Moira B; Barr, Michelle; Stanley, James; Chambers, Tim J; Zhou, Jiang; Duane, Aaron; Jenkin, Gabrielle L S; Pearson, Amber L; Gurrin, Cathal; Smeaton, Alan F; Hoek, Janet; Ni Mhurchu, Cliona
2017-09-01
This paper reports on a new methodology to objectively study the world in which children live. The primary research study (Kids'Cam Food Marketing) illustrates the method; numerous ancillary studies include exploration of children's exposure to alcohol, smoking, "blue" space and gambling, and their use of "green" space, transport, and sun protection. One hundred sixty-eight randomly selected children (aged 11-13 years) recruited from 16 randomly selected schools in Wellington, New Zealand used wearable cameras and GPS units for 4 days, recording imagery every 7 seconds and longitude/latitude locations every 5 seconds. Data were collected from July 2014 to June 2015. Analysis commenced in 2015 and is ongoing. Bespoke software was used to manually code images for variables of interest including setting, marketing media, and product category to produce variables for statistical analysis. GPS data were extracted and cleaned in ArcGIS, version 10.3 for exposure spatial analysis. Approximately 1.4 million images and 2.2 million GPS coordinates were generated (most were usable) from many settings including the difficult to measure aspects of exposures in the home, at school, and during leisure time. The method is ethical, legal, and acceptable to children and the wider community. This methodology enabled objective analysis of the world in which children live. The main arm examined the frequency and nature of children's exposure to food and beverage marketing and provided data on difficult to measure settings. The methodology will likely generate robust evidence facilitating more effective policymaking to address numerous public health concerns. Copyright © 2017. Published by Elsevier Inc.
Perianth organization and intra-specific floral variability.
Herrera, J; Arista, M; Ortiz, P L
2008-11-01
Floral symmetry and fusion of perianth parts are factors that contribute to fine-tune the match between flowers and their animal pollination vectors. In the present study, we investigated whether the possession of a sympetalous (fused) corolla and bilateral symmetry of flowers translate into decreased intra-specific variability as a result of natural stabilizing selection exerted by pollinators. Average size of the corolla and intra-specific variability were determined in two sets of southern Spanish entomophilous plant species. In the first set, taxa were paired by family to control for the effect of phylogeny (phylogenetically independent contrasts), whereas in the second set species were selected at random. Flower size data from a previous study (with different species) were also used to test the hypothesis that petal fusion contributes to decrease intra-specific variability. In the phylogenetically independent contrasts, floral symmetry was a significant correlate of intra-specific variation, with bilaterally symmetrical flowers showing more constancy than radially symmetrical flowers (i.e. unsophisticated from a functional perspective). As regards petal fusion, species with fused petals were on average more constant than choripetalous species, but the difference was not statistically significant. The reanalysis of data from a previous study yielded largely similar results, with a distinct effect of symmetry on variability, but no effect of petal fusion. The randomly-chosen species sample, on the other hand, failed to reveal any significant effect of either symmetry or petal fusion on intra-specific variation. The problem of low-statistical power in this kind of analysis, and the difficulty of testing an evolutionary hypothesis that involves phenotypic traits with a high degree of morphological correlation is discussed.
Fingerprint recognition of alien invasive weeds based on the texture character and machine learning
NASA Astrophysics Data System (ADS)
Yu, Jia-Jia; Li, Xiao-Li; He, Yong; Xu, Zheng-Hao
2008-11-01
A multi-spectral imaging technique based on texture analysis and machine learning was proposed to discriminate alien invasive weeds with similar outlines but different categories. The objectives of this study were to investigate the feasibility of using multi-spectral imaging, especially the near-infrared (NIR) channel (800 nm+/-10 nm), to find the weeds' fingerprints, and to validate the performance with specific eigenvalues from the co-occurrence matrix. Veronica polita Pries, Veronica persica Poir, longtube ground ivy and Laminum amplexicaule Linn. were selected in this study; they have different effects in the field and are alien invasive species in China. 307 weed leaves' images were randomly selected for the calibration set, while the remaining 207 samples were used for the prediction set. All images were pretreated by a Wallis filter to adjust the noise caused by uneven lighting. A gray level co-occurrence matrix was applied to extract the texture character, which shows the density, randomness, correlation, contrast and homogeneity of texture with different algorithms. Three channels (green channel by 550 nm+/-10 nm, red channel by 650 nm+/-10 nm and NIR channel by 800 nm+/-10 nm) were respectively calculated to get the eigenvalues. Least-squares support vector machines (LS-SVM) were applied to discriminate the categories of weeds by the eigenvalues from the co-occurrence matrix. Finally, a recognition ratio of 83.35% was obtained with the NIR channel, better than the results with the green channel (76.67%) and red channel (69.46%). The prediction results of 81.35% indicated that the selected eigenvalues reflected the main characteristics of the weeds' fingerprint based on multi-spectral imaging (especially the NIR channel) and the LS-SVM model.
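The texture pipeline can be sketched along the following lines, using scikit-image's co-occurrence-matrix utilities (spelled greycomatrix/greycoprops in older releases); a standard RBF support vector machine stands in here for the LS-SVM used in the study, and all parameter values are illustrative assumptions.

```python
# GLCM texture features per leaf image, fed to a classifier; an RBF SVM is
# used here as a stand-in for LS-SVM.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(gray_image, distances=(1,), angles=(0, np.pi / 2)):
    """gray_image: 2-D uint8 array, e.g. the 8-bit NIR channel of a leaf image."""
    glcm = graycomatrix(gray_image, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

def train_weed_classifier(images, labels):
    X = np.vstack([glcm_features(img) for img in images])
    return SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, labels)
```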
Sariyar, Murat; Hoffmann, Isabell; Binder, Harald
2014-02-26
Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to incorporate interactions into such prediction models. In this feasibility study, we present building blocks for evaluating and incorporating interactions terms in high-dimensional time-to-event settings, especially for settings in which it is computationally too expensive to check all possible interactions. We use a boosting technique for estimation of effects and the following building blocks for pre-selecting interactions: (1) resampling, (2) random forests and (3) orthogonalization as a data pre-processing step. In a simulation study, the strategy that uses all building blocks is able to detect true main effects and interactions with high sensitivity in different kinds of scenarios. The main challenge are interactions composed of variables that do not represent main effects, but our findings are also promising in this regard. Results on real world data illustrate that effect sizes of interactions frequently may not be large enough to improve prediction performance, even though the interactions are potentially of biological relevance. Screening interactions through random forests is feasible and useful, when one is interested in finding relevant two-way interactions. The other building blocks also contribute considerably to an enhanced pre-selection of interactions. We determined the limits of interaction detection in terms of necessary effect sizes. Our study emphasizes the importance of making full use of existing methods in addition to establishing new ones.
Improving the performance of minimizers and winnowing schemes.
Marçais, Guillaume; Pellow, David; Bork, Daniel; Orenstein, Yaron; Shamir, Ron; Kingsford, Carl
2017-07-15
The minimizers scheme is a method for selecting k-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g. too many k-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, and the natural lexicographic ordering of k-mers used by minimizers was recognized as their origin. Many software tools using minimizers employ ad hoc variations of the lexicographic order to alleviate those issues. We provide an in-depth analysis of the effect of k-mer ordering on the performance of the minimizers technique. By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worse behaviors. Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order. This analysis also settles negatively a conjecture (by Schleimer et al.) on the expected density of minimizers in a random sequence. The software used for this analysis is available on GitHub: https://github.com/gmarcais/minimizers.git . gmarcais@cs.cmu.edu or carlk@cs.cmu.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
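A compact sketch of the minimizers scheme with a pluggable k-mer ordering is given below; swapping the lexicographic rank for a hashed, pseudo-random rank follows the recommendation above. Window and k-mer sizes are arbitrary, and the hashing choice is an illustrative assumption rather than the authors' implementation.

```python
# Minimizers with a pluggable k-mer ordering: the smallest k-mer (under the
# chosen rank) in every window of w consecutive k-mers is selected.
import hashlib

def kmer_rank_lex(kmer):
    return kmer                                        # plain lexicographic order

def kmer_rank_random(kmer):
    return hashlib.sha1(kmer.encode()).hexdigest()     # deterministic pseudo-random order

def minimizers(seq, k=7, w=10, rank=kmer_rank_random):
    """Return the set of (position, k-mer) minimizers over all windows of w k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        pos = min(range(w), key=lambda j: rank(window[j]))
        selected.add((start + pos, window[pos]))
    return selected

print(sorted(minimizers("ACGTACGTTAGCATGCATCGATCGGATTACA")))
```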
NASA Astrophysics Data System (ADS)
Li, Zhe; Feng, Jinchao; Liu, Pengyu; Sun, Zhonghua; Li, Gang; Jia, Kebin
2018-05-01
Temperature is usually treated as a nuisance fluctuation in near-infrared spectral measurement, and chemometric methods have been studied extensively to correct for the effect of temperature variations. However, temperature can also be considered a constructive parameter that provides detailed chemical information when it is changed systematically during the measurement. Our group has investigated the relationship between temperature-induced spectral variation (TSVC) and normalized squared temperature. In this study, we focused on the influence of the temperature distribution in the calibration set. A multi-temperature calibration set selection (MTCS) method was proposed to improve prediction accuracy by considering the temperature distribution of calibration samples. Furthermore, a double-temperature calibration set selection (DTCS) method was proposed based on the MTCS method and the relationship between TSVC and normalized squared temperature. We compared the prediction performance of PLS models based on random sampling and on the proposed methods. Experimental results showed that prediction performance was improved by the proposed methods. Therefore, the MTCS and DTCS methods are promising alternatives for improving prediction accuracy in near-infrared spectral measurement.
Surgical treatment of secondary peritonitis : A continuing problem.
van Ruler, O; Boermeester, M A
2017-01-01
Secondary peritonitis remains associated with high mortality and morbidity rates. Treatment of secondary peritonitis is challenging even in modern medicine. Surgical intervention for source control remains the cornerstone of treatment, besides adequate antimicrobial therapy and resuscitation. A randomized clinical trial showed that relaparotomy on demand (ROD) after initial emergency surgery is the preferred treatment strategy, irrespective of the severity and extent of peritonitis. The effective and safe use of ROD requires intensive monitoring of the patient in a setting where diagnostic tests and decision making about relaparotomy are guaranteed round the clock. The lack of knowledge on timely and adequate patient selection, together with the lack of use of simple but reliable monitoring tools, seems to hamper full implementation of ROD. The accuracy of the relaparotomy decision tool is reasonable for prediction of ongoing peritonitis and selection for computed tomography (CT). The value of CT in an early postoperative phase is unclear. Future research and innovative technologies should focus on the additive value of CT in cases of operated secondary peritonitis and on the further optimization of bedside prediction tools to enhance adequate patient selection for intervention in a multidisciplinary setting.
Evolution of basic equations for nearshore wave field
ISOBE, Masahiko
2013-01-01
In this paper, a systematic, overall view of theories for periodic waves of permanent form, such as Stokes and cnoidal waves, is described first with their validity ranges. To deal with random waves, a method for estimating directional spectra is given. Then, various wave equations are introduced according to the assumptions included in their derivations. The mild-slope equation is derived for combined refraction and diffraction of linear periodic waves. Various parabolic approximations and time-dependent forms are proposed to include randomness and nonlinearity of waves as well as to simplify numerical calculation. Boussinesq equations are the equations developed for calculating nonlinear wave transformations in shallow water. Nonlinear mild-slope equations are derived as a set of wave equations to predict transformation of nonlinear random waves in the nearshore region. Finally, wave equations are classified systematically for a clear theoretical understanding and appropriate selection for specific applications.
Naugle, Alecia Larew; Barlow, Kristina E; Eblen, Denise R; Teter, Vanessa; Umholtz, Robert
2006-11-01
The U.S. Food Safety and Inspection Service (FSIS) tests sets of samples of selected raw meat and poultry products for Salmonella to ensure that federally inspected establishments meet performance standards defined in the pathogen reduction-hazard analysis and critical control point system (PR-HACCP) final rule. In the present report, sample set results are described and associations between set failure and set and establishment characteristics are identified for 4,607 sample sets collected from 1998 through 2003. Sample sets were obtained from seven product classes: broiler chicken carcasses (n = 1,010), cow and bull carcasses (n = 240), market hog carcasses (n = 560), steer and heifer carcasses (n = 123), ground beef (n = 2,527), ground chicken (n = 31), and ground turkey (n = 116). Of these 4,607 sample sets, 92% (4,255) were collected as part of random testing efforts (A sets), and 93% (4,166) passed. However, the percentage of positive samples relative to the maximum number of positive results allowable in a set increased over time for broilers but decreased or stayed the same for the other product classes. Three factors associated with set failure were identified: establishment size, product class, and year. Set failures were more likely early in the testing program (relative to 2003). Small and very small establishments were more likely to fail than large ones. Set failure was less likely in ground beef than in other product classes. Despite an overall decline in set failures through 2003, these results highlight the need for continued vigilance to reduce Salmonella contamination in broiler chicken and continued implementation of programs designed to assist small and very small establishments with PR-HACCP compliance issues.
A General Method for Predicting Amino Acid Residues Experiencing Hydrogen Exchange
Wang, Boshen; Perez-Rathke, Alan; Li, Renhao; Liang, Jie
2018-01-01
Information on protein hydrogen exchange can help delineate key regions involved in protein-protein interactions and provides important insight towards determining functional roles of genetic variants and their possible mechanisms in disease processes. Previous studies have shown that the degree of hydrogen exchange is affected by hydrogen bond formations, solvent accessibility, proximity to other residues, and experimental conditions. However, a general predictive method for identifying residues capable of hydrogen exchange transferable to a broad set of proteins is lacking. We have developed a machine learning method based on random forest that can predict whether a residue experiences hydrogen exchange. Using data from the Start2Fold database, which contains information on 13,306 residues (3,790 of which experience hydrogen exchange and 9,516 which do not exchange), our method achieves good performance. Specifically, we achieve an overall out-of-bag (OOB) error, an unbiased estimate of the test set error, of 20.3 percent. Using a randomly selected test data set consisting of 500 residues experiencing hydrogen exchange and 500 which do not, our method achieves an accuracy of 0.79, a recall of 0.74, a precision of 0.82, and an F1 score of 0.78.
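A hedged sketch of this style of evaluation, using scikit-learn on synthetic stand-in data (the actual structural features and Start2Fold labels are not reproduced here):

```python
# Random forest with an out-of-bag (OOB) error estimate plus accuracy, recall,
# precision and F1 on a held-out balanced test set. X and y are synthetic
# placeholders for the structure-derived features and exchange labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=13306, n_features=30, weights=[0.72, 0.28],
                           random_state=0)
# hold out a balanced test set of 500 positives and 500 negatives
pos = np.where(y == 1)[0][:500]
neg = np.where(y == 0)[0][:500]
test = np.concatenate([pos, neg])
train = np.setdiff1d(np.arange(len(y)), test)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X[train], y[train])
print("OOB error:", 1 - rf.oob_score_)

pred = rf.predict(X[test])
print("accuracy :", accuracy_score(y[test], pred))
print("recall   :", recall_score(y[test], pred))
print("precision:", precision_score(y[test], pred))
print("F1       :", f1_score(y[test], pred))
```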
Yu, Xiaonan; Liu, Bin; Pei, Yuru; Xu, Tianmin
2014-05-01
To establish an objective method for evaluating facial attractiveness from a set of orthodontic photographs. One hundred eight malocclusion patients randomly selected from six universities in China were randomly divided into nine groups, with each group containing an equal number of patients with Class I, II, and III malocclusions. Sixty-nine expert Chinese orthodontists ranked photographs of the patients (frontal, lateral, and frontal smiling photos) before and after orthodontic treatment from "most attractive" to "least attractive" in each group. A weighted mean ranking was then calculated for each patient, based on which a three-point scale was created. Procrustes superimposition was conducted on 101 landmarks identified on the photographs. A support vector regression (SVR) function was set up according to the coordinate values of identified landmarks of each photographic set and its corresponding grading. Its predictive ability was tested for each group in turn. The average coincidence rate obtained for comparisons of the subjective ratings with the SVR evaluation was 71.8% according to 18 verification tests. Geometric morphometrics combined with SVR may be a prospective method for objective comprehensive evaluation of facial attractiveness in the near future.
For whom should we use selective decontamination of the digestive tract?
de Smet, Anne Marie G A; Bonten, Marc J M; Kluytmans, Jan A J W
2012-04-01
This review discusses the relevant studies on selective decontamination of the digestive tract (SDD) published between 2009 and mid-2011. In a multicenter cluster-randomized cross-over study in the Netherlands, SDD and selective oropharyngeal decontamination (SOD) were associated with higher survival at day 28, with a lower incidence of ICU-acquired bacteremia and with less acquisition of respiratory tract colonization with antibiotic-resistant pathogens, compared to standard care. A post-hoc analysis of this study suggests that SDD might be more effective in surgical patients and SOD in nonsurgical patients. In a randomized study, perioperative use of SDD in patients undergoing gastrointestinal surgery was associated with lower incidences of anastomotic leakages. A Cochrane meta-analysis, which did not include any of the aforementioned studies, reported a reduction in respiratory tract infections with topical antibiotics alone and higher survival rates when topical antibiotics were combined with parenteral antibiotics. Recent studies show that in ICUs with low levels of antibiotic resistance, SDD and SOD improved patient outcome and reduced infections and carriage with antibiotic-resistant pathogens. The effect in settings with higher levels of antibiotic resistance remains to be determined, as does the efficacy of SDD and SOD in specific patient groups.
McAllister, S; Wiem Lestari, B; Sujatmiko, B; Siregar, A; Sihaloho, E D; Fathania, D; Dewi, N F; Koesoemadinata, R C; Hill, P C; Alisjahbana, B
2017-09-21
Setting: A community health clinic catchment area in the eastern part of Bandung City, Indonesia. Objective: To evaluate the feasibility of two different screening interventions using community health workers (CHWs) in detecting tuberculosis (TB) cases. Design: This was a feasibility study of 1) house-to-house TB symptom screening of five randomly selected 'neighbourhoods' in the catchment area, and 2) selected screening of household contacts of TB index patients and their neighbouring households. Acceptability was assessed through focus group discussions with key stakeholders. Results: Of 5100 individuals screened in randomly selected neighbourhoods, 48 (0.9%) reported symptoms, of whom 38 provided sputum samples; no positive TB was found. No TB cases were found among the 88 household contacts or the 423 neighbourhood contacts. With training, regular support and supervision from research staff and local community health centre staff, CHWs were able to undertake screening effectively, and almost all householders were willing to participate. Conclusion: The use of CHWs for TB screening could be integrated into routine practice relatively easily in Indonesia. The effectiveness of this would need further exploration, particularly with the use of improved diagnostics such as chest X-ray and sputum culture.
NASA Astrophysics Data System (ADS)
Mishra, Aashwin; Iaccarino, Gianluca
2017-11-01
In spite of their deficiencies, RANS models represent the workhorse for industrial investigations into turbulent flows. In this context, it is essential to provide diagnostic measures to assess the quality of RANS predictions. To this end, the primary step is to identify feature importances amongst massive sets of potentially descriptive and discriminative flow features. This aids the physical interpretability of the resultant discrepancy model and its extensibility to similar problems. Recent investigations have utilized approaches such as Random Forests, Support Vector Machines and the Least Absolute Shrinkage and Selection Operator for feature selection. With examples, we exhibit how such methods may not be suitable for turbulent flow datasets. The underlying rationale, such as the correlation bias and the required conditions for the success of penalized algorithms, are discussed with illustrative examples. Finally, we provide alternate approaches using convex combinations of regularized regression approaches and randomized sub-sampling in combination with feature selection algorithms, to infer model structure from data. This research was supported by the Defense Advanced Research Projects Agency under the Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) project (technical monitor: Dr Fariba Fahroo).
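One way to read the proposed alternative, sketched below, is stability selection: repeatedly fit a penalized regression (here LASSO) on random subsamples and keep only features selected in a large fraction of rounds. The data, penalty and threshold are synthetic placeholders, not the turbulence features from the study.

```python
# Illustrative combination of a penalized regression with randomized
# sub-sampling: features are retained only if selected consistently.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 300, 40
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 5] + rng.normal(size=n)

n_rounds, frac, alpha = 100, 0.5, 0.1
counts = np.zeros(p)
for _ in range(n_rounds):
    idx = rng.choice(n, size=int(frac * n), replace=False)
    fit = Lasso(alpha=alpha).fit(X[idx], y[idx])
    counts += (np.abs(fit.coef_) > 1e-8)

stability = counts / n_rounds          # selection frequency per feature
selected = np.where(stability > 0.8)[0]
print("stably selected features:", selected)   # expect features 0 and 5
```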
Use of Peritoneal Dialysis in AKI: A Systematic Review
Chionh, Chang Yin; Soni, Sachin S.; Finkelstein, Fredric O.; Ronco, Claudio
2013-01-01
Background and objectives The role of peritoneal dialysis in the management of AKI is not well defined, although it remains frequently used, especially in low-resource settings. A systematic review was performed to describe outcomes in AKI treated with peritoneal dialysis and compare peritoneal dialysis with extracorporeal blood purification, such as continuous or intermittent hemodialysis. Design, setting, participants, & measurements MEDLINE, CINAHL, and Central Register of Controlled Trials were searched in July of 2012. Eligible studies selected were observational cohort or randomized adult population studies on peritoneal dialysis in the setting of AKI. The primary outcome of interest was all-cause mortality. Summary estimates of odds ratio were obtained using a random effects model. Results Of 982 citations, 24 studies (n=1556 patients) were identified. The overall methodological quality was low. Thirteen studies described patients (n=597) treated with peritoneal dialysis only; pooled mortality was 39.3%. In 11 studies (7 cohort studies and 4 randomized trials), patients received peritoneal dialysis (n=392, pooled mortality=58.0%) or extracorporeal blood purification (n=567, pooled mortality=56.1%). In the cohort studies, there was no difference in mortality between peritoneal dialysis and extracorporeal blood purification (odds ratio, 0.96; 95% confidence interval, 0.53 to 1.71). In four randomized trials, there was also no difference in mortality (odds ratio, 1.50; 95% confidence interval, 0.46 to 4.86); however, heterogeneity was significant (I2=73%, P=0.03). Conclusions There is currently no evidence to suggest significant differences in mortality between peritoneal dialysis and extracorporeal blood purification in AKI. There is a need for good-quality evidence in this important area.
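As a purely illustrative companion to the pooled estimates quoted above, the following sketch shows a DerSimonian-Laird random-effects pooling of odds ratios; the four 2x2 tables are invented for the example and are not the trials from the review.

```python
# DerSimonian-Laird random-effects pooling of odds ratios (invented data).
import numpy as np

# (events_PD, n_PD, events_EBP, n_EBP) per study
studies = [(20, 50, 22, 55), (35, 80, 40, 78), (15, 40, 18, 45), (30, 60, 28, 62)]

log_or, var = [], []
for a, n1, c, n2 in studies:
    b, d = n1 - a, n2 - c
    log_or.append(np.log((a * d) / (b * c)))
    var.append(1/a + 1/b + 1/c + 1/d)
log_or, var = np.array(log_or), np.array(var)

w = 1 / var                                   # fixed-effect weights
q = np.sum(w * (log_or - np.sum(w * log_or) / np.sum(w)) ** 2)
df = len(studies) - 1
tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1 / (var + tau2)                       # random-effects weights
pooled = np.sum(w_re * log_or) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print("pooled OR:", np.exp(pooled),
      "95% CI:", np.exp(pooled - 1.96 * se), "-", np.exp(pooled + 1.96 * se))
```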
Data-driven confounder selection via Markov and Bayesian networks.
Häggström, Jenny
2018-06-01
To estimate a causal effect on an outcome without bias, unconfoundedness is often assumed. If there is sufficient knowledge on the underlying causal structure then existing confounder selection criteria can be used to select subsets of the observed pretreatment covariates, X, sufficient for unconfoundedness, if such subsets exist. Here, estimation of these target subsets is considered when the underlying causal structure is unknown. The proposed method is to model the causal structure by a probabilistic graphical model, for example, a Markov or Bayesian network, estimate this graph from observed data and select the target subsets given the estimated graph. The approach is evaluated by simulation both in a high-dimensional setting where unconfoundedness holds given X and in a setting where unconfoundedness only holds given subsets of X. Several common target subsets are investigated and the selected subsets are compared with respect to accuracy in estimating the average causal effect. The proposed method is implemented with existing software that can easily handle high-dimensional data, in terms of large samples and large number of covariates. The results from the simulation study show that, if unconfoundedness holds given X, this approach is very successful in selecting the target subsets, outperforming alternative approaches based on random forests and LASSO, and that the subset estimating the target subset containing all causes of outcome yields the smallest MSE in the average causal effect estimation.
Susukida, Ryoko; Crum, Rosa M; Stuart, Elizabeth A; Ebnesajjad, Cyrus; Mojtabai, Ramin
2016-07-01
To compare the characteristics of individuals participating in randomized controlled trials (RCTs) of treatments of substance use disorder (SUD) with individuals receiving treatment in usual care settings, and to provide a summary quantitative measure of differences between characteristics of these two groups of individuals using propensity score methods. Design Analyses using data from RCT samples from the National Institute of Drug Abuse Clinical Trials Network (CTN) and target populations of patients drawn from the Treatment Episodes Data Set-Admissions (TEDS-A). Settings Multiple clinical trial sites and nation-wide usual SUD treatment settings in the United States. A total of 3592 individuals from 10 CTN samples and 1 602 226 individuals selected from TEDS-A between 2001 and 2009. Measurements The propensity scores for enrolling in the RCTs were computed based on the following nine observable characteristics: sex, race/ethnicity, age, education, employment status, marital status, admission to treatment through criminal justice, intravenous drug use and the number of prior treatments. Findings The proportion of those with ≥ 12 years of education and the proportion of those who had full-time jobs were significantly higher among RCT samples than among target populations (in seven and nine trials, respectively, at P < 0.001). The pooled difference in the mean propensity scores between the RCTs and the target population was 1.54 standard deviations and was statistically significant at P < 0.001. In the United States, individuals recruited into randomized controlled trials of substance use disorder treatments appear to be very different from individuals receiving treatment in usual care settings. Notably, RCT participants tend to have more years of education and a greater likelihood of full-time work compared with people receiving care in usual care settings.
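A minimal sketch of the propensity-score summary described here, assuming a logistic model and simulated covariates rather than the CTN/TEDS-A data:

```python
# Logistic model for the probability of being in the RCT sample, followed by
# the standardized difference in mean propensity scores between groups.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_rct, n_pop, k = 400, 5000, 9
X_rct = rng.normal(loc=0.4, size=(n_rct, k))     # RCT participants
X_pop = rng.normal(loc=0.0, size=(n_pop, k))     # usual-care population
X = np.vstack([X_rct, X_pop])
z = np.concatenate([np.ones(n_rct), np.zeros(n_pop)])   # 1 = in RCT

ps = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(X)[:, 1]
ps_rct, ps_pop = ps[z == 1], ps[z == 0]

pooled_sd = np.sqrt((ps_rct.var(ddof=1) + ps_pop.var(ddof=1)) / 2)
std_diff = (ps_rct.mean() - ps_pop.mean()) / pooled_sd
print("standardized difference in propensity scores:", round(std_diff, 2))
```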
Selection Gradients, the Opportunity for Selection, and the Coefficient of Determination
Moorad, Jacob A.; Wade, Michael J.
2013-01-01
We derive the relationship between R2 (the coefficient of determination), selection gradients, and the opportunity for selection for univariate and multivariate cases. Our main result is to show that the portion of the opportunity for selection that is caused by variation for any trait is equal to the product of its selection gradient and its selection differential. This relationship is a corollary of the first and second fundamental theorems of natural selection, and it permits one to investigate the portions of the total opportunity for selection that are involved in directional selection, stabilizing (and diversifying) selection, and correlational selection, which is important to morphological integration. It also allows one to determine the fraction of fitness variation not explained by variation in measured phenotypes and therefore attributable to random (or, at least, unknown) influences. We apply our methods to a human data set to show how sex-specific mating success as a component of fitness variance can be decoupled from that owing to prereproductive mortality. By quantifying linear sources of sexual selection and quadratic sources of sexual selection, we illustrate that the former is stronger in males, while the latter is stronger in females.
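The univariate algebra behind the stated result can be sketched as follows, using standard notation (relative fitness w, trait z, selection differential s = Cov(z, w), gradient β, opportunity for selection I = Var(w)) rather than the paper's own derivation:

```latex
% Univariate sketch: regressing relative fitness on the trait partitions the
% opportunity for selection, and the explained part equals beta times s.
\[
  w = \bar{w} + \beta\,(z - \bar{z}) + \varepsilon
  \quad\Rightarrow\quad
  \operatorname{Var}(w) = \beta^{2}\operatorname{Var}(z) + \operatorname{Var}(\varepsilon),
\]
\[
  \beta^{2}\operatorname{Var}(z)
  = \frac{\operatorname{Cov}(z,w)}{\operatorname{Var}(z)}\cdot\operatorname{Cov}(z,w)
  = \beta\, s ,
  \qquad
  R^{2} = \frac{\beta\, s}{I}.
\]
```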
Seehaus, Frank; Schwarze, Michael; Flörkemeier, Thilo; von Lewinski, Gabriela; Kaptein, Bart L; Jakubowitz, Eike; Hurschler, Christof
2016-05-01
Implant migration can be accurately quantified by model-based Roentgen stereophotogrammetric analysis (RSA), using an implant surface model to locate the implant relative to the bone. In a clinical situation, a single reverse engineering (RE) model for each implant type and size is used. It is unclear to what extent the accuracy and precision of migration measurement is affected by implant manufacturing variability unaccounted for by a single representative model. Individual RE models were generated for five short-stem hip implants of the same type and size. Two phantom analyses and one clinical analysis were performed: "Accuracy-matched models": one stem was assessed, and the results from the original RE model were compared with randomly selected models. "Accuracy-random model": each of the five stems was assessed and analyzed using one randomly selected RE model. "Precision-clinical setting": implant migration was calculated for eight patients, and all five available RE models were applied to each case. For the two phantom experiments, the 95% CI of the bias ranged from -0.28 mm to 0.30 mm for translation and -2.3° to 2.5° for rotation. In the clinical setting, precision is less than 0.5 mm and 1.2° for translation and rotation, respectively, except for rotations about the proximodistal axis (<4.1°). High accuracy and precision of model-based RSA can be achieved and are not biased by using a single representative RE model. At least for implants similar in shape to the investigated short-stem, individual models are not necessary.
Schroy, Paul C; Duhovic, Emir; Chen, Clara A; Heeren, Timothy C; Lopez, William; Apodaca, Danielle L; Wong, John B
2016-05-01
Eliciting patient preferences within the context of shared decision making has been advocated for colorectal cancer (CRC) screening, yet providers often fail to comply with patient preferences that differ from their own. To determine whether risk stratification for advanced colorectal neoplasia (ACN) influences provider willingness to comply with patient preferences when selecting a desired CRC screening option. Randomized controlled trial. Asymptomatic, average-risk patients due for CRC screening in an urban safety net health care setting. Patients were randomized 1:1 to a decision aid alone (n = 168) or decision aid plus risk assessment (n = 173) arm between September 2012 and September 2014. The primary outcome was concordance between patient preference and test ordered; secondary outcomes included patient satisfaction with the decision-making process, screening intentions, test completion rates, and provider satisfaction. Although providers perceived risk stratification to be useful in selecting an appropriate screening test for their average-risk patients, no significant differences in concordance were observed between the decision aid alone and decision aid plus risk assessment groups (88.1% v. 85.0%, P = 0.40) or high- and low-risk groups (84.5% v. 87.1%, P = 0.51). Concordance was highest for colonoscopy and relatively low for tests other than colonoscopy, regardless of study arm or risk group. Failure to comply with patient preferences was negatively associated with satisfaction with the decision-making process, screening intentions, and test completion rates. Limitations include the single-institution setting and the lack of provider education about the utility of risk stratification in their decision making. Providers perceived risk stratification to be useful in their decision making but often failed to comply with patient preferences for tests other than colonoscopy, even among those deemed to be at low risk of ACN.
Mixing rates and limit theorems for random intermittent maps
NASA Astrophysics Data System (ADS)
Bahsoun, Wael; Bose, Christopher
2016-04-01
We study random transformations built from intermittent maps on the unit interval that share a common neutral fixed point. We focus mainly on random selections of Pomeau-Manneville-type maps T_α using the full parameter range 0 < α < ∞ in general. We derive a number of results around a common theme that illustrates in detail how the constituent map that is fastest mixing (i.e. has the smallest α), combined with details of the randomizing process, determines the asymptotic properties of the random transformation. Our key result (theorem 1.1) establishes sharp estimates on the position of return time intervals for the quenched dynamics. The main applications of this estimate are to limit laws (in particular, CLT and stable laws, depending on the parameters chosen in the range 0 < α < 1) for the associated skew product; these are detailed in theorem 3.2. Since our estimates in theorem 1.1 also hold for 1 ≤ α < ∞, we study a second class of random transformations derived from piecewise affine Gaspard-Wang maps, prove existence of an infinite (σ-finite) invariant measure and study the corresponding correlation asymptotics. To the best of our knowledge, this latter kind of result is completely new in the setting of random transformations.
Extended observability of linear time-invariant systems under recurrent loss of output data
NASA Technical Reports Server (NTRS)
Luck, Rogelio; Ray, Asok; Halevi, Yoram
1989-01-01
Recurrent loss of sensor data in integrated control systems of an advanced aircraft may occur under different operating conditions that include detected frame errors and queue saturation in computer networks, and bad data suppression in signal processing. This paper presents an extension of the concept of observability based on a set of randomly selected nonconsecutive outputs in finite-dimensional, linear, time-invariant systems. Conditions for testing extended observability have been established.
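A hedged numerical illustration of the idea (not the paper's formal test): for a discrete-time LTI system, the state can be recovered from outputs at a set of possibly nonconsecutive time steps when the stacked matrix of C A^k terms has full column rank. The matrices and the set of available samples below are arbitrary placeholders.

```python
# Rank check for observability from randomly selected, nonconsecutive outputs
# of x_{k+1} = A x_k, y_k = C x_k.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))
C = rng.normal(size=(1, n))                      # single output

available = [0, 3, 5, 9, 12]                     # time steps with valid data
obs = np.vstack([C @ np.linalg.matrix_power(A, k) for k in available])
print("observable from available samples:", np.linalg.matrix_rank(obs) == n)
```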
2004-03-01
By definition, efficiency is the amount of time that the processing element is gainfully employed, calculated using the ratio of the... The algorithm employs an interesting form of tournament selection called Pareto domination tournaments, in which two members of the population are chosen at random and they... Because it has a set of solutions, using a template for each solution is not feasible, so the MOMGA employs a different competitive template during the...
Kindt, Karlijn C. M.; Kleinjan, Marloes; Janssens, Jan M. A. M.; Scholte, Ron H. J.
2014-01-01
A randomized controlled trial was conducted among a potential high-risk group of 1,343 adolescents from low-income areas in The Netherlands to test the effectiveness of the depression prevention program Op Volle Kracht (OVK) as provided by teachers in a school setting. The results showed no main effect of the program on depressive symptoms at one-year follow-up. A moderation effect was found for parental psychopathology; adolescents who had parents with psychopathology and received the OVK program had fewer depressive symptoms compared to adolescents with parents with psychopathology in the control condition. No moderating effects on depressive symptoms were found for gender, ethnic background, and level of baseline depressive symptoms. An iatrogenic effect of the intervention was found on the secondary outcome of clinical depressive symptoms. Based on the low level of reported depressive symptoms at baseline, it seems that our sample might not meet the characteristics of a high-risk selective group for depressive symptoms. Therefore, no firm conclusions can be drawn about the selective potential of the OVK depression prevention program. In its current form, the OVK program should not be implemented on a large scale in the natural setting for non-high-risk adolescents. Future research should focus on high-risk participants, such as children of parents with psychopathology.
Lu, Timothy Tehua; Lao, Oscar; Nothnagel, Michael; Junge, Olaf; Freitag-Wolf, Sandra; Caliebe, Amke; Balascakova, Miroslava; Bertranpetit, Jaume; Bindoff, Laurence Albert; Comas, David; Holmlund, Gunilla; Kouvatsi, Anastasia; Macek, Milan; Mollet, Isabelle; Nielsen, Finn; Parson, Walther; Palo, Jukka; Ploski, Rafal; Sajantila, Antti; Tagliabracci, Adriano; Gether, Ulrik; Werge, Thomas; Rivadeneira, Fernando; Hofman, Albert; Uitterlinden, André Gerardus; Gieger, Christian; Wichmann, Heinz-Erich; Ruether, Andreas; Schreiber, Stefan; Becker, Christian; Nürnberg, Peter; Nelson, Matthew Roberts; Kayser, Manfred; Krawczak, Michael
2009-07-01
Genetic matching potentially provides a means to alleviate the effects of incomplete Mendelian randomization in population-based gene-disease association studies. We therefore evaluated the genetic-matched pair study design on the basis of genome-wide SNP data (309,790 markers; Affymetrix GeneChip Human Mapping 500K Array) from 2457 individuals, sampled at 23 different recruitment sites across Europe. Using pair-wise identity-by-state (IBS) as a matching criterion, we tried to derive a subset of markers that would allow identification of the best overall matching (BOM) partner for a given individual, based on the IBS status for the subset alone. However, our results suggest that, by following this approach, the prediction accuracy is only notably improved by the first 20 markers selected, and increases proportionally to the marker number thereafter. Furthermore, in a considerable proportion of cases (76.0%), the BOM of a given individual, based on the complete marker set, came from a different recruitment site than the individual itself. A second marker set, specifically selected for ancestry sensitivity using singular value decomposition, performed even more poorly and was no more capable of predicting the BOM than randomly chosen subsets. This leads us to conclude that, at least in Europe, the utility of the genetic-matched pair study design depends critically on the availability of comprehensive genotype information for both cases and controls.
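For readers unfamiliar with the matching criterion, a small sketch of pairwise identity-by-state (IBS) from 0/1/2-coded genotypes and the resulting best overall matching (BOM) partner is given below; the array sizes are placeholders rather than the 500K array data.

```python
# Pairwise identity-by-state from genotype counts: IBS at a locus is 1, 0.5 or
# 0 for sharing 2, 1 or 0 alleles; the criterion is the mean over loci.
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_snp = 50, 10000
G = rng.integers(0, 3, size=(n_ind, n_snp))      # genotypes 0/1/2

def ibs(g1, g2):
    """Mean proportion of alleles shared identical-by-state across loci."""
    return np.mean(1.0 - np.abs(g1 - g2) / 2.0)

# best overall matching (BOM) partner of individual 0
scores = np.array([ibs(G[0], G[j]) if j != 0 else -np.inf for j in range(n_ind)])
print("BOM partner of individual 0:", scores.argmax(), "IBS =", scores.max())
```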
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mathee, Angela
Introduction: Lead exposure in shooting ranges has been under scrutiny for decades, but no information in this regard is available in respect of African settings, and in South Africa specifically. The aim of this study was to determine the blood lead levels in the users of randomly selected private shooting ranges in South Africa's Gauteng province. Methods: An analytical cross sectional study was conducted, with participants recruited from four randomly selected shooting ranges and three archery ranges as a comparator group. Results: A total of 118 (87 shooters and 31 archers) were included in the analysis. Shooters had significantly higher blood lead levels (BLL) compared to archers, with 36/85 (42.4%) of shooters versus 2/34 (5.9%) of archers found to have a BLL ≥10 μg/dl (p<0.001). Conclusion: Shooting ranges may constitute an important site of elevated exposure to lead. Improved ventilation, low levels of awareness of lead hazards, poor housekeeping, and inadequate personal hygiene facilities and practices at South African shooting ranges need urgent attention. Highlights: • This is the first study, to our knowledge, of lead exposure in shooting ranges in an African setting. • This study indicates highly elevated lead exposure amongst the users of certain private shooting ranges in South Africa. • Lead exposure may be a serious, yet under-studied, source of adult lead exposure in South Africa, and possibly elsewhere on the African continent.
George, Steven Z; Teyhen, Deydre S; Wu, Samuel S; Wright, Alison C; Dugan, Jessica L; Yang, Guijun; Robinson, Michael E; Childs, John D
2009-07-01
The general population has a pessimistic view of low back pain (LBP), and evidence-based information has been used to positively influence LBP beliefs in previously reported mass media studies. However, there is a lack of randomized trials investigating whether LBP beliefs can be modified in primary prevention settings. This cluster randomized clinical trial investigated the effect of an evidence-based psychosocial educational program (PSEP) on LBP beliefs for soldiers completing military training. A military setting was selected for this clinical trial, because LBP is a common cause of soldier disability. Companies of soldiers (n = 3,792) were recruited, and cluster randomized to receive a PSEP or no education (control group, CG). The PSEP consisted of an interactive seminar, and soldiers were issued the Back Book for reference material. The primary outcome measure was the back beliefs questionnaire (BBQ), which assesses inevitable consequences of and ability to cope with LBP. The BBQ was administered before randomization and 12 weeks later. A linear mixed model was fitted for the BBQ at the 12-week follow-up, and a generalized linear mixed model was fitted for the dichotomous outcomes on BBQ change of greater than two points. Sensitivity analyses were performed to account for drop out. BBQ scores (potential range: 9-45) improved significantly from baseline of 25.6 +/- 5.7 (mean +/- SD) to 26.9 +/- 6.2 for those receiving the PSEP, while there was a significant decline from 26.1 +/- 5.7 to 25.6 +/- 6.0 for those in the CG. The adjusted mean BBQ score at follow-up for those receiving the PSEP was 1.49 points higher than those in the CG (P < 0.0001). The adjusted odds ratio of BBQ improvement of greater than two points for those receiving the PSEP was 1.51 (95% CI = 1.22-1.86) times that of those in the CG. BBQ improvement was also mildly associated with race and college education. Sensitivity analyses suggested minimal influence of drop out. In conclusion, soldiers that received the PSEP had an improvement in their beliefs related to the inevitable consequences of and ability to cope with LBP. This is the first randomized trial to show positive influence on LBP beliefs in a primary prevention setting, and these findings have potentially important public health implications for prevention of LBP.
Perrin, Ellen C; Sheldrick, R Christopher; McMenamy, Jannette M; Henson, Brandi S; Carter, Alice S
2014-01-01
Disruptive behavior disorders, such as attention-deficit/hyperactivity disorder and oppositional defiant disorder, are common and stable throughout childhood. These disorders cause long-term morbidity but benefit from early intervention. While symptoms are often evident before preschool, few children receive appropriate treatment during this period. Group parent training, such as the Incredible Years program, has been shown to be effective in improving parenting strategies and reducing children's disruptive behaviors. Because they already monitor young children's behavior and development, primary care pediatricians are in a good position to intervene early when indicated. To investigate the feasibility and effectiveness of parent-training groups delivered to parents of toddlers in pediatric primary care settings. This randomized clinical trial was conducted at 11 diverse pediatric practices in the Greater Boston area. A total of 273 parents of children between 2 and 4 years old who acknowledged disruptive behaviors on a 20-item checklist were included. A 10-week Incredible Years parent-training group co-led by a research clinician and a pediatric staff member. Self-reports and structured videotaped observations of parent and child behaviors conducted prior to, immediately after, and 12 months after the intervention. A total of 150 parents were randomly assigned to the intervention or the waiting-list group. An additional 123 parents were assigned to receive intervention without a randomly selected comparison group. Compared with the waiting-list group, greater improvement was observed in both intervention groups (P < .05). No differences were observed between the randomized and the nonrandomized intervention groups. Self-reports and structured observations provided evidence of improvements in parenting practices and child disruptive behaviors that were attributable to participation in the Incredible Years groups. This study demonstrated the feasibility and effectiveness of parent-training groups conducted in pediatric office settings to reduce disruptive behavior in toddlers. clinicaltrials.gov Identifier: NCT00402857.
Beksinska, Mags E; Smit, Jenni; Greener, Ross; Todd, Catherine S; Lee, Mei-ling Ting; Maphumulo, Virginia; Hoffmann, Vivian
2015-02-01
In low-income settings, many women and girls face activity restrictions during menses, owing to lack of affordable menstrual products. The menstrual cup (MC) is a nonabsorbent reusable cup that collects menstrual blood. We assessed the acceptability and performance of the MPower® MC compared to pads or tampons among women in a low-resource setting. We conducted a randomized two-period crossover trial at one site in Durban, South Africa, between January and November 2013. Participants aged 18-45 years with regular menstrual cycles were eligible for inclusion if they had no intention of becoming pregnant, were using an effective contraceptive method, had water from the municipal system as their primary water source, and had no sexually transmitted infections. We used a computer-generated randomization sequence to assign participants to one of two sequences of menstrual product use, with allocation concealed only from the study investigators. Participants used each method over three menstrual cycles (total 6 months) and were interviewed at baseline and monthly follow-up visits. The product acceptability outcome compared product satisfaction question scores using an ordinal logistic regression model with individual random effects. This study is registered on the South African Clinical Trials database: number DOH-27-01134273. Of 124 women assessed, 110 were eligible and randomly assigned to selected menstrual products. One hundred and five women completed all follow-up visits. By comparison to pads/tampons (usual product used), the MC was rated significantly better for comfort, quality, menstrual blood collection, appearance, and preference. Both of these comparative outcome measures, along with likelihood of continued use, recommending the product, and future purchase, increased for the MC over time. MC acceptance in a population of novice users, many with limited experience with tampons, indicates that there is a pool of potential users in low-resource settings.
Analysis of Machine Learning Techniques for Heart Failure Readmissions.
Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M
2016-11-01
The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates.
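A rough sketch of this kind of comparison, with simulated data in place of the trial records and scikit-learn models standing in for the tuned algorithms:

```python
# Random forest versus logistic regression for a binary readmission outcome,
# compared by C statistic (AUC) on a 50% validation split with bootstrapping.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=40, weights=[0.75, 0.25],
                           random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
lr = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)

rng = np.random.default_rng(0)
c_rf, c_lr = [], []
for _ in range(100):                              # 100 bootstrapped iterations
    idx = rng.choice(len(y_va), size=len(y_va), replace=True)
    c_rf.append(roc_auc_score(y_va[idx], rf.predict_proba(X_va[idx])[:, 1]))
    c_lr.append(roc_auc_score(y_va[idx], lr.predict_proba(X_va[idx])[:, 1]))

print("mean C statistic, random forest      :", np.mean(c_rf))
print("mean C statistic, logistic regression:", np.mean(c_lr))
```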
Dimensions of Oppression in the Lives of Impoverished Black Women Who Use Drugs
Windsor, Liliane Cambraia; Benoit, Ellen; Dunlap, Eloise
2010-01-01
Oppression against Black women continues to be a significant problem in the United States. The purpose of this study is to use grounded theory to identify multiple dimensions of oppression experienced by impoverished Black women who use drugs by examining several settings in which participants experience oppression. Three case studies of drug using, impoverished Black women were randomly selected from two large scale consecutive ethnographic studies conducted in New York City from 1998 to 2005. Analysis revealed five dimensions of oppression occurring within eight distinct settings. While dimensions constitute different manifestations of oppression, settings represented areas within participants’ lives or institutions with which participants interact. Dimensions of oppression included classism, sexism, familism, racism, and drugism. Settings included the school system, correction system, welfare system, housing and neighborhood, relationship with men, family, experiences with drug use, and employment. Findings have important implications for social justice, welfare, drug, and justice system policy.
Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K
2014-10-01
Quantitative Structure-Activity Relationship (QSAR) models for binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of the GABA(A) receptor complex were calculated using two machine learning methods: artificial neural network (ANN) and support vector machine (SVM) techniques. The models obtained were compared with those obtained using multiple linear regression (MLR) analysis. The descriptor selection and model building were performed with 10-fold cross-validation using the training data set. The SVM and MLR coefficient of determination values are 0.944 and 0.879, respectively, for the training set and are higher than those of the ANN models. Though the SVM model shows improvement in training set fitting, the ANN model was superior to SVM and MLR in predicting the test set. A randomization test was employed to check the suitability of the models.
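An illustrative sketch of such a comparison with 10-fold cross-validation, using scikit-learn and a synthetic descriptor matrix in place of the flavonoid descriptors:

```python
# Support vector regression and multiple linear regression scored by 10-fold
# cross-validated R^2; data are a synthetic stand-in for the QSAR descriptors.
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, KFold
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=78, n_features=20, noise=5.0, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

models = {"SVM (SVR)": make_pipeline(StandardScaler(), SVR(C=100.0)),
          "MLR": LinearRegression()}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(name, "mean 10-fold R^2:", round(r2.mean(), 3))
```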
Old-growth and mature forests near spotted owl nests in western Oregon
NASA Technical Reports Server (NTRS)
Ripple, William J.; Johnson, David H.; Hershey, K. T.; Meslow, E. Charles
1995-01-01
We investigated how the amount of old-growth and mature forest influences the selection of nest sites by northern spotted owls (Strix occidentalis caurina) in the Central Cascade Mountains of Oregon. We used 7 different plot sizes to compare the proportion of mature and old-growth forest between 30 nest sites and 30 random sites. The proportion of old-growth and mature forest was significantly greater at nest sites than at random sites for all plot sizes (P less than or equal to 0.01). Thus, management of the spotted owl might require setting the percentage of old-growth and mature forest retained from harvesting at least 1 standard deviation above the mean for the 30 nest sites we examined.
A Random Finite Set Approach to Space Junk Tracking and Identification
2014-09-03
Vo, Ba-Ngu; Vo, Ba-Tuong
Final report covering the period 31 January 2013 to 29 April 2014.
Christen, William G.; Glynn, Robert J.; Gaziano, J. Michael; Darke, Amy K.; Crowley, John J.; Goodman, Phyllis J.; Lippman, Scott M.; Lad, Thomas E.; Bearden, James D.; Goodman, Gary E.; Minasian, Lori M.; Thompson, Ian M.; Blanke, Charles D.; Klein, Eric A.
2014-01-01
Importance Observational studies suggest a role for dietary nutrients such as vitamin E and selenium in cataract prevention. However, the results of randomized trials of vitamin E supplements and cataract have been disappointing, and are not yet available for selenium. Objective To test whether long-term supplementation with selenium and vitamin E affects the incidence of cataract in a large cohort of men. Design, Setting, and Participants The SELECT Eye Endpoints (SEE) study was an ancillary study of the SWOG-coordinated Selenium and Vitamin E Cancer Prevention Trial (SELECT), a randomized, placebo-controlled, four arm trial of selenium and vitamin E conducted among 35,533 men aged 50 years and older for African Americans and 55 and older for all other men, at 427 participating sites in the US, Canada, and Puerto Rico. A total of 11,267 SELECT participants from 128 SELECT sites participated in the SEE ancillary study. Intervention Individual supplements of selenium (200 µg/d from L-selenomethionine) and vitamin E (400 IU/d of all rac-α-tocopheryl acetate). Main Outcome Measures Incident cataract, defined as a lens opacity, age-related in origin, responsible for a reduction in best-corrected visual acuity to 20/30 or worse based on self-report confirmed by medical record review, and cataract extraction, defined as the surgical removal of an incident cataract. Results During a mean (SD) of 5.6 (1.2) years of treatment and follow-up, 389 cases of cataract were documented. There were 185 cataracts in the selenium group and 204 in the no selenium group (hazard ratio [HR], 0.91; 95 percent confidence interval [CI], 0.75 to 1.11; P=.37). For vitamin E, there were 197 cases in the treated group and 192 in the placebo group (HR, 1.02; CI, 0.84 to 1.25; P=.81). Similar results were observed for cataract extraction. Conclusions and Relevance These randomized trial data from a large cohort of apparently healthy men indicate that long-term daily supplementation with selenium and/or vitamin E is unlikely to have a large beneficial effect on age-related cataract.
This report is a description of field work and data analysis results comparing a design comparable to systematic site selection with one based on random selection of sites. The report is expected to validate the use of random site selection in the bioassessment program for the O...
Muñoz, Irene; Henriques, Dora; Jara, Laura; Johnston, J Spencer; Chávez-Galarza, Julio; De La Rúa, Pilar; Pinto, M Alice
2017-07-01
The honeybee (Apis mellifera) has been threatened by multiple factors including pests and pathogens, pesticides and loss of locally adapted gene complexes due to replacement and introgression. In western Europe, the genetic integrity of the native A. m. mellifera (M-lineage) is endangered due to trading and intensive queen breeding with commercial subspecies of eastern European ancestry (C-lineage). Effective conservation actions require reliable molecular tools to identify pure-bred A. m. mellifera colonies. Microsatellites have been preferred for identification of A. m. mellifera stocks across conservation centres. However, owing to high throughput, easy transferability between laboratories and low genotyping error, SNPs promise to become popular. Here, we compared the resolving power of a widely utilized microsatellite set to detect structure and introgression with that of different sets that combine a variable number of SNPs selected for their information content and genomic proximity to the microsatellite loci. Contrary to every SNP data set, microsatellites did not discriminate between the two lineages in the PCA space. Mean introgression proportions were identical across the two marker types, although at the individual level, microsatellites' performance was relatively poor at the upper range of Q-values, a result reflected by their lower precision. Our results suggest that SNPs are more accurate and powerful than microsatellites for identification of A. m. mellifera colonies, especially when they are selected by information content.
The frequency and behavioral outcomes of goal choices in the self-management of diabetes.
Estabrooks, Paul A; Nelson, Candace C; Xu, Stanley; King, Diane; Bayliss, Elizabeth A; Gaglio, Bridget; Nutting, Paul A; Glasgow, Russell E
2005-01-01
The purpose of this study was to determine the frequency and effectiveness of behavioral goal choices in the self-management of diabetes and to test goal-setting theory hypotheses that self-selection and behavioral specificity of goals are key to enhancing persistence. Participants with type 2 diabetes in a randomized controlled trial (n = 422) completed baseline behavioral assessments using a clinic-based, interactive, self-management CD-ROM that allowed them to select a behavioral goal and receive mail and telephone support for the initial 6 months of the trial followed by additional behavioral assessments. Frequency of behavioral goal selection and 6-month behavioral data were collected. Approximately 49%, 27%, and 24% of the participants, respectively, set goals to increase physical activity (PA), reduce fat intake, or increase fruits and vegetables (F&V) consumed. At baseline, participants who selected PA, reduced fat consumption, or F&V were significantly, and respectively, less active, consumed more dietary fat, and ate fewer F&V regardless of demographic characteristics. Participants who selected a reduced-fat goal showed a significantly larger decrease than did those that selected PA or F&V goals. Participants who selected an F&V goal showed significant changes in F&V consumption. Participants who selected a PA goal demonstrated significant changes in days of moderate and vigorous physical activity. When participants are provided with information on health behavior status and an option of behavioral goals for managing type 2 diabetes, they will select personally appropriate goals, resulting in significant behavioral changes over a 6-month period.
Marino, Miguel; Killerby, Marie; Lee, Soomi; Klein, Laura Cousino; Moen, Phyllis; Olson, Ryan; Kossek, Ellen Ernst; King, Rosalind; Erickson, Leslie; Berkman, Lisa F.; Buxton, Orfeu M.
2016-01-01
Objectives To evaluate the effects of a workplace-based intervention on actigraphic and self-reported sleep outcomes in an extended care setting. Design Cluster randomized trial. Setting Extended-care (nursing) facilities. Participants US employees and managers at nursing homes. Nursing homes were randomly selected to intervention or control settings. Intervention The Work, Family and Health Study developed an intervention aimed at reducing work-family conflict within a 4-month work-family organizational change process. Employees participated in interactive sessions with facilitated discussions, role-playing, and games designed to increase control over work processes and work time. Managers completed training in family-supportive supervision. Measurements Primary actigraphic outcomes included: total sleep duration, wake after sleep onset, nighttime sleep, variation in nighttime sleep, nap duration, and number of naps. Secondary survey outcomes included work-to-family conflict, sleep insufficiency, insomnia symptoms and sleep quality. Measures were obtained at baseline, 6-months and 12-months post-intervention. Results A total of 1,522 employees and 184 managers provided survey data at baseline. Managers and employees in the intervention arm showed no significant difference in sleep outcomes over time compared to control participants. Sleep outcomes were not moderated by work-to-family conflict or presence of children in the household for managers or employees. Age significantly moderated an intervention effect on nighttime sleep among employees (p=0.040), where younger employees benefited more from the intervention. Conclusion In the context of an extended-care nursing home workplace, the intervention did not significantly alter sleep outcomes in either managers or employees. Moderating effects of age were identified where younger employees’ sleep outcomes benefited more from the intervention.
Liu, Gui-Song; Guo, Hao-Song; Pan, Tao; Wang, Ji-Hua; Cao, Gan
2014-10-01
Based on Savitzky-Golay (SG) smoothing screening, principal component analysis (PCA) combined separately with supervised linear discriminant analysis (LDA) and unsupervised hierarchical clustering analysis (HCA) was used for non-destructive visible and near-infrared (Vis-NIR) detection for breed screening of transgenic sugarcane. A random and stability-dependent framework of calibration, prediction, and validation was proposed. A total of 456 samples of sugarcane leaves in the elongating stage were collected from the field, composed of 306 transgenic (positive) samples containing the Bt and Bar genes and 150 non-transgenic (negative) samples. A total of 156 samples (negative 50 and positive 106) were randomly selected as the validation set; the remaining samples (negative 100 and positive 200, a total of 300 samples) were used as the modeling set, and the modeling set was then subdivided into calibration (negative 50 and positive 100, a total of 150 samples) and prediction sets (negative 50 and positive 100, a total of 150 samples) 50 times. The number of SG smoothing points was expanded, while some higher-derivative modes were removed because of small absolute values, and a total of 264 smoothing modes were used for screening. The pairwise combinations of the first three principal components were used, and the optimal combination of principal components was then selected according to the model effect. Based on all divisions of calibration and prediction sets and all SG smoothing modes, the SG-PCA-LDA and SG-PCA-HCA models were established, and the model parameters were optimized based on the average prediction effect over all divisions to ensure modeling stability. Finally, model validation was performed on the validation set. With SG smoothing, the modeling accuracy and stability of PCA-LDA and PCA-HCA were significantly improved. For the optimal SG-PCA-LDA model, the recognition rates of positive and negative validation samples were 94.3% and 96.0%; they were 92.5% and 98.0% for the optimal SG-PCA-HCA model, respectively. Vis-NIR spectroscopic pattern recognition combined with SG smoothing could be used for accurate recognition of transgenic sugarcane leaves, and provides a convenient screening method for transgenic sugarcane breeding.
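A minimal sketch of the SG-PCA-LDA chain, assuming scipy/scikit-learn and simulated spectra; the smoothing parameters and component count are placeholders, not the 264 optimized modes described above.

```python
# Savitzky-Golay smoothing of spectra, PCA, then a supervised linear
# discriminant on the leading components; spectra are simulated.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pos, n_neg, n_wl = 306, 150, 400                  # samples x wavelengths
base = np.sin(np.linspace(0, 6, n_wl))
X = np.vstack([base + 0.05 * rng.normal(size=(n_pos, n_wl)) + 0.02,
               base + 0.05 * rng.normal(size=(n_neg, n_wl))])
y = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])

X_sg = savgol_filter(X, window_length=11, polyorder=2, deriv=1, axis=1)
scores = PCA(n_components=3).fit_transform(X_sg)

X_tr, X_te, y_tr, y_te = train_test_split(scores, y, test_size=1/3,
                                          random_state=0, stratify=y)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
print("validation recognition rate:", lda.score(X_te, y_te))
```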
Composition bias and the origin of ORFan genes
Yomtovian, Inbal; Teerakulkittipong, Nuttinee; Lee, Byungkook; Moult, John; Unger, Ron
2010-01-01
Motivation: Intriguingly, sequence analysis of genomes reveals that a large number of genes are unique to each organism. The origin of these genes, termed ORFans, is not known. Here, we explore the origin of ORFan genes by defining a simple measure called ‘composition bias’, based on the deviation of the amino acid composition of a given sequence from the average composition of all proteins of a given genome. Results: For a set of 47 prokaryotic genomes, we show that the amino acid composition bias of real proteins, random ‘proteins’ (created by using the nucleotide frequencies of each genome) and ‘proteins’ translated from intergenic regions are distinct. For ORFans, we observed a correlation between their composition bias and their relative evolutionary age. Recent ORFan proteins have compositions more similar to those of random ‘proteins’, while the compositions of more ancient ORFan proteins are more similar to those of the set of all proteins of the organism. This observation is consistent with an evolutionary scenario wherein ORFan genes emerged and underwent a large number of random mutations and selection, eventually adapting to the composition preference of their organism over time. Contact: ron@biocoml.ls.biu.ac.il Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20231229
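A minimal sketch of the composition-bias idea described above, assuming Euclidean distance between a protein's amino-acid composition vector and the genome-wide average composition; the authors' exact deviation measure may differ.

```python
# Illustrative "composition bias": deviation of one sequence's amino-acid
# composition from the genome-wide average. Euclidean distance is an assumed
# stand-in for whichever deviation measure the authors used.
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]

def composition_bias(seq, genome_avg):
    """Distance between a protein's composition and the genome average."""
    comp = composition(seq)
    return math.sqrt(sum((c - g) ** 2 for c, g in zip(comp, genome_avg)))

# genome_avg would be the pooled composition of all proteins of the genome:
proteome = ["MKTAYIAKQR", "MLSDEDFKAV", "MGSSHHHHHH"]   # toy example
genome_avg = composition("".join(proteome))
for p in proteome:
    print(p, round(composition_bias(p, genome_avg), 3))
```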
Parental bonding in men with alcohol disorders: a relationship with conduct disorder.
Joyce, P R; Sellman, D; Wells, E; Frampton, C M; Bushnell, J A; Oakley-Browne, M; Hornblow, A R
1994-09-01
Men from a clinical treatment setting suffering from alcohol dependence, and randomly selected men from the community diagnosed as having alcohol abuse and/or dependence, completed the Parental Bonding Instrument. The men from the alcohol treatment setting perceived both parents as having been uncaring and overprotective. In the general population sample, an uncaring and overprotective parental style was strongly associated with childhood conduct disorder, but not with alcohol disorder symptoms. This discrepancy in perceived parenting highlights the difficulties in extrapolating findings about aetiological factors for alcohol disorders from clinical samples. It also suggests that childhood conduct disorder and adult antisocial behaviour could influence which men with alcohol disorders receive inpatient treatment.
2014-01-01
Background Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than the MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation, allowing on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the set of parameters used to select potential miRNA candidates for experimental verification. PMID:24418292
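A small sketch of how pre-computed normal MFE distributions can be used to assign an MFE-based P-value to a candidate sequence; the interpolation of the normal parameters is represented by a placeholder function with invented values.

```python
# Sketch of the MFE-based P-value: given the (interpolated) mean and standard
# deviation of the MFE distribution of random sequences with the same
# composition, the P-value of an observed MFE is the normal lower-tail
# probability. The interpolation step is a placeholder with invented numbers.
from scipy.stats import norm

def interpolated_mfe_params(gc_fraction, length):
    """Placeholder for the pre-computed/interpolated normal parameters."""
    mean = -0.30 * length * (0.5 + gc_fraction)   # illustrative only
    sd = 0.05 * length
    return mean, sd

def mfe_pvalue(observed_mfe, gc_fraction, length):
    mean, sd = interpolated_mfe_params(gc_fraction, length)
    # lower tail: probability a random sequence folds at least this stably
    return norm.cdf(observed_mfe, loc=mean, scale=sd)

print(mfe_pvalue(observed_mfe=-42.0, gc_fraction=0.55, length=90))
```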
Clustering of financial time series with application to index and enhanced index tracking portfolio
NASA Astrophysics Data System (ADS)
Dose, Christian; Cincotti, Silvano
2005-09-01
A stochastic-optimization technique based on time series cluster analysis is described for index tracking and enhanced index tracking problems. Our methodology solves the problem in two steps, i.e., by first selecting a subset of stocks and then setting the weight of each stock as a result of an optimization process (asset allocation). The present formulation takes into account constraints on the number of stocks and on the fraction of capital invested in each of them, but does not include transaction costs. Computational results based on clustering selection are compared to those of random techniques and show the importance of clustering in noise reduction and robust forecasting applications, in particular for enhanced index tracking.
Kemoli, A M; van Amerongen, W E
2011-03-01
To determine the examiner's accuracy in selecting proximal carious lesions in primary molars for restoration using the atraumatic restorative treatment (ART) approach. Intervention study. CLINICAL SETTING AND PARTICIPANTS: A total of 804 six to eight year-olds from 30 rural schools in Kenya participated in the study. Three examiners selected a total of 1,280 suitable proximal carious lesions in primary molars after examining 6,002 children from 30 schools randomly selected out of 142 schools in two divisions. Seven operators randomly paired on a daily basis with eight assistants restored the lesions. An explanation was provided for any cavity that was not restored. Pre- and post-operative radiographs of the cavities were also taken for evaluation. The examiner's choice of suitable proximal cavities restorable using the ART approach was related to the decision made to either restore or not during the operative stage. The radiographic findings of the selected cavities were also compared to the decision made by the operator. The results obtained were used to determine the examiner's accuracy in selecting suitable proximal cavities for restoration using the ART approach. The majority of the children recruited in the study were excluded due to absenteeism, pulpal exposure or anxiety during the operative stage. Only 804 children received one restoration in their primary molars. The examiner's accuracy in selecting suitable ART-restorable cavities was 94.9% clinically and 91.7% based on radiographic analysis. A trained and diligent examiner has a very good chance of selecting proximal carious lesions restorable with the ART approach, without the threat of pulpal involvement during the excavation of caries.
Randomizing Roaches: Exploring the "Bugs" of Randomization in Experimental Design
ERIC Educational Resources Information Center
Wagler, Amy; Wagler, Ron
2014-01-01
Understanding the roles of random selection and random assignment in experimental design is a central learning objective in most introductory statistics courses. This article describes an activity, appropriate for a high school or introductory statistics course, designed to teach the concepts, values and pitfalls of random selection and assignment…
Parda, Natalia; Stępień, Małgorzata; Zakrzewska, Karolina; Madaliński, Kazimierz; Kołakowska, Agnieszka; Godzik, Paulina; Rosińska, Magdalena
2016-01-01
Objectives Response rate in public health programmes may be a limiting factor. It is important to first consider their delivery and acceptability for the target population. This study aimed at determining individual and unit-related factors associated with increased odds of non-response, based on hepatitis C virus screening in primary healthcare. Design Primary healthcare units (PHCUs) were extracted from the Register of Health Care Centres. Each of the PHCUs was to enrol adult patients selected on a random basis. Data on the recruitment of PHCUs and patients were analysed. Multilevel modelling was applied to investigate individual and unit-related factors associated with non-response. A multilevel logistic model was developed with fixed effects and only a random intercept for the unit. Preliminary analysis included a random effect for unit and each of the individual or PHCU covariates separately. For each of the PHCU covariates, we applied a two-level model with individual covariates, a unit random effect and a single fixed effect of this unit covariate. Setting This study was conducted in primary care units in selected provinces in Poland. Participants A total of 242 PHCUs and 24 480 adults were invited. Of them, 44 PHCUs and 20 939 patients agreed to participate. Both PHCUs and patients were randomly selected. Results Data on 44 PHCUs and 24 480 patients were analysed. PHCU-level factors and recruitment strategies were important predictors of non-response. The unit random effect was significant in all models. Larger and private units reported higher non-response rates, while for units with a history of running public health programmes the odds of non-response were lower. Proactive recruitment and more working hours devoted to the project and to each patient resulted in higher acceptance of the project. A higher number of personnel had no such effect. Conclusions Prior to the implementation of a public health programme, several factors that could hinder its execution should be addressed. PMID:27927665
Contrasting lexical similarity and formal definitions in SNOMED CT: consistency and implications.
Agrawal, Ankur; Elhanan, Gai
2014-02-01
To quantify the presence of and evaluate an approach for detection of inconsistencies in the formal definitions of SNOMED CT (SCT) concepts utilizing a lexical method. Utilizing SCT's Procedure hierarchy, we algorithmically formulated similarity sets: groups of concepts with similar lexical structure of their fully specified name. We formulated five random samples, each with 50 similarity sets: four based on a shared parameter (number of parents, attributes, groups, and all of the former), as well as a randomly selected control sample. All samples' sets were reviewed for types of formal definition inconsistencies: hierarchical, attribute assignment, attribute target values, groups, and definitional. For the Procedure hierarchy, 2111 similarity sets were formulated, covering 18.1% of eligible concepts. The evaluation revealed that 38% (control) to 70% (different relationships) of similarity sets within the samples exhibited significant inconsistencies. The rate of inconsistencies for the sample with different relationships was highly significant compared with the control, as was the number of attribute assignment and hierarchical inconsistencies within their respective samples. While the formal definitions of SCT are only a minor consideration at this time of the HITECH initiative, they are essential in the grand scheme of sophisticated, meaningful use of captured clinical data. However, a significant portion of the concepts in the most semantically complex hierarchy of SCT, the Procedure hierarchy, are modeled inconsistently in a manner that affects their computability. Lexical methods can efficiently identify such inconsistencies and possibly allow for their algorithmic resolution. Copyright © 2013 Elsevier Inc. All rights reserved.
Environmental diversity as a surrogate for species representation.
Beier, Paul; de Albuquerque, Fábio Suzart
2015-10-01
Because many species have not been described and most species ranges have not been mapped, conservation planners often use surrogates for conservation planning, but evidence for surrogate effectiveness is weak. Surrogates are well-mapped features such as soil types, landforms, occurrences of an easily observed taxon (discrete surrogates), and well-mapped environmental conditions (continuous surrogate). In the context of reserve selection, the idea is that a set of sites selected to span diversity in the surrogate will efficiently represent most species. Environmental diversity (ED) is a rarely used surrogate that selects sites to efficiently span multivariate ordination space. Because it selects across continuous environmental space, ED should perform better than discrete surrogates (which necessarily ignore within-bin and between-bin heterogeneity). Despite this theoretical advantage, ED appears to have performed poorly in previous tests of its ability to identify 50 × 50 km cells that represented vertebrates in Western Europe. Using an improved implementation of ED, we retested ED on Western European birds, mammals, reptiles, amphibians, and combined terrestrial vertebrates. We also tested ED on data sets for plants of Zimbabwe, birds of Spain, and birds of Arizona (United States). Sites selected using ED represented European mammals no better than randomly selected cells, but they represented species in the other 7 data sets with 20% to 84% effectiveness. This far exceeds the performance in previous tests of ED, and exceeds the performance of most discrete surrogates. We believe ED performed poorly in previous tests because those tests considered only a few candidate explanatory variables and used suboptimal forms of ED's selection algorithm. We suggest future work on ED focus on analyses at finer grain sizes more relevant to conservation decisions, explore the effect of selecting the explanatory variables most associated with species turnover, and investigate whether nonclimate abiotic variables can provide useful surrogates in an ED framework. © 2015 Society for Conservation Biology.
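As a rough illustration of selecting sites to span ordination space, the sketch below uses a greedy farthest-point (max-min) rule in a PCA ordination of environmental variables; it is a stand-in for, not a reproduction of, the ED selection algorithm evaluated in the study.

```python
# Greedy "span the ordination space" selection: place candidate cells in a PCA
# ordination of environmental variables and repeatedly pick the cell farthest
# from all cells already selected. This farthest-point heuristic is only an
# illustrative stand-in for the ED (often p-median) selection algorithm.
import numpy as np
from sklearn.decomposition import PCA

def ed_select(env_vars, n_sites):
    coords = PCA(n_components=min(3, env_vars.shape[1])).fit_transform(env_vars)
    # start from the cell closest to the centroid of the ordination
    selected = [int(np.argmin(np.linalg.norm(coords - coords.mean(0), axis=1)))]
    while len(selected) < n_sites:
        dists = np.min(
            np.linalg.norm(coords[:, None, :] - coords[None, selected, :], axis=2),
            axis=1)
        dists[selected] = -np.inf          # never re-pick a selected cell
        selected.append(int(np.argmax(dists)))
    return selected

rng = np.random.default_rng(1)
env = rng.normal(size=(500, 6))            # 500 candidate cells x 6 environmental variables
print(ed_select(env, n_sites=10))
```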
Thermodynamic method for generating random stress distributions on an earthquake fault
Barall, Michael; Harris, Ruth A.
2012-01-01
This report presents a new method for generating random stress distributions on an earthquake fault, suitable for use as initial conditions in a dynamic rupture simulation. The method employs concepts from thermodynamics and statistical mechanics. A pattern of fault slip is considered to be analogous to a micro-state of a thermodynamic system. The energy of the micro-state is taken to be the elastic energy stored in the surrounding medium. Then, the Boltzmann distribution gives the probability of a given pattern of fault slip and stress. We show how to decompose the system into independent degrees of freedom, which makes it computationally feasible to select a random state. However, due to the equipartition theorem, straightforward application of the Boltzmann distribution leads to a divergence which predicts infinite stress. To avoid equipartition, we show that the finite strength of the fault acts to restrict the possible states of the system. By analyzing a set of earthquake scaling relations, we derive a new formula for the expected power spectral density of the stress distribution, which allows us to construct a computer algorithm free of infinities. We then present a new technique for controlling the extent of the rupture by generating a random stress distribution thousands of times larger than the fault surface, and selecting a portion which, by chance, has a positive stress perturbation of the desired size. Finally, we present a new two-stage nucleation method that combines a small zone of forced rupture with a larger zone of reduced fracture energy.
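The following sketch illustrates one common way to generate a random field with a prescribed power spectral density: filter white noise in the Fourier domain by the square root of the PSD. The power-law exponent is an illustrative placeholder rather than the formula derived in the report, and the oversized-domain selection and two-stage nucleation steps are omitted.

```python
# Spectral synthesis of a random stress perturbation with a prescribed
# power-law PSD: transform white noise, scale by sqrt(PSD), transform back.
import numpy as np

def random_stress_field(nx, ny, dx=1.0, exponent=-2.0, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((ny, nx))          # real white noise
    kx = np.fft.fftfreq(nx, d=dx)
    ky = np.fft.fftfreq(ny, d=dx)
    k = np.sqrt(kx[None, :] ** 2 + ky[:, None] ** 2)
    k[0, 0] = np.inf                               # suppress the zero-wavenumber (mean) term
    amplitude = k ** (exponent / 2.0)              # sqrt of the assumed power-law PSD
    field = np.fft.ifft2(np.fft.fft2(noise) * amplitude).real
    return field / field.std()                     # normalize to unit variance

stress = random_stress_field(256, 256)
print(stress.shape, round(float(stress.std()), 3))
```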
NASA Technical Reports Server (NTRS)
Falls, L. W.; Crutcher, H. L.
1976-01-01
Transformation of statistics from one dimensional set to another involves linear functions of the original set of statistics. Similarly, linear functions will transform statistics within a dimensional set such that the new statistics are relevant to a new set of coordinate axes. A restricted case of the latter is the rotation of axes in a coordinate system involving any two correlated random variables. A special case is the transformation for horizontal wind distributions. Wind statistics are usually provided in terms of wind speed and direction (measured clockwise from north) or in east-west and north-south components. A direct application of this technique allows the determination of appropriate wind statistics parallel and normal to any preselected flight path of a space vehicle. Among the constraints for launching space vehicles are critical values selected from the distribution of the expected winds parallel to and normal to the flight path. These procedures are applied to space vehicle launches at Cape Kennedy, Florida.
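A short sketch of the underlying linear transformation: rotating the mean vector and covariance matrix of east/north wind components into along-track and cross-track components for a given flight-path azimuth (the sign convention below is assumed for illustration).

```python
# Rotate bivariate wind statistics (east, north components) into components
# parallel and normal to a flight path with azimuth measured clockwise from
# north: a linear transformation of the mean vector and covariance matrix.
import numpy as np

def rotate_wind_stats(mean_en, cov_en, azimuth_deg):
    """Return mean and covariance of (along-track, cross-track) wind."""
    a = np.deg2rad(azimuth_deg)
    # unit vectors along and normal (to the right) of the path, in (east, north) axes
    R = np.array([[np.sin(a),  np.cos(a)],     # along-track
                  [np.cos(a), -np.sin(a)]])    # cross-track
    mean_rot = R @ np.asarray(mean_en)
    cov_rot = R @ np.asarray(cov_en) @ R.T
    return mean_rot, cov_rot

mean_en = [3.0, -1.5]                       # mean east, north wind (m/s)
cov_en = [[16.0, 4.0], [4.0, 9.0]]          # covariance matrix (m^2/s^2)
print(rotate_wind_stats(mean_en, cov_en, azimuth_deg=105.0))
```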
2014-02-01
moisture level of 14% dry soil mass was maintained for the duration of the study by weekly additions of ASTM Type I water. Soil samples were collected...maintain the initial soil moisture level. One cluster of Orchard grass straw was harvested from a set of randomly selected replicate containers...decomposition is among the most integrating processes within the soil ecosystem because it involves complex interactions of soil microbial, plant, and
Application of random effects to the study of resource selection by animals.
Gillies, Cameron S; Hebblewhite, Mark; Nielsen, Scott E; Krawchuk, Meg A; Aldridge, Cameron L; Frair, Jacqueline L; Saher, D Joanne; Stevens, Cameron E; Jerde, Christopher L
2006-07-01
1. Resource selection estimated by logistic regression is used increasingly in studies to identify critical resources for animal populations and to predict species occurrence. 2. Most frequently, individual animals are monitored and pooled to estimate population-level effects without regard to group or individual-level variation. Pooling assumes that both observations and their errors are independent, and resource selection is constant given individual variation in resource availability. 3. Although researchers have identified ways to minimize autocorrelation, variation between individuals caused by differences in selection or available resources, including functional responses in resource selection, have not been well addressed. 4. Here we review random-effects models and their application to resource selection modelling to overcome these common limitations. We present a simple case study of an analysis of resource selection by grizzly bears in the foothills of the Canadian Rocky Mountains with and without random effects. 5. Both categorical and continuous variables in the grizzly bear model differed in interpretation, both in statistical significance and coefficient sign, depending on how a random effect was included. We used a simulation approach to clarify the application of random effects under three common situations for telemetry studies: (a) discrepancies in sample sizes among individuals; (b) differences among individuals in selection where availability is constant; and (c) differences in availability with and without a functional response in resource selection. 6. We found that random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection. Our empirical example and simulations demonstrate how including random effects in resource selection models can aid interpretation and address difficult assumptions limiting their generality. This approach will allow researchers to appropriately estimate marginal (population) and conditional (individual) responses, and account for complex grouping, unbalanced sample designs and autocorrelation.
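The sketch below illustrates the pooling problem the authors address, using simulated used/available data in which selection coefficients and sample sizes vary among individuals; a naive pooled logistic fit is contrasted with the average of per-individual fits. A proper random-effects fit would use a mixed-model routine, which is not shown.

```python
# Why pooling across individuals can mislead a resource selection analysis:
# individuals are simulated with different selection coefficients and
# unbalanced sample sizes, then the pooled logistic coefficient is compared
# with the mean of per-individual coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_animals = 8
sample_sizes = rng.integers(30, 400, size=n_animals)     # unbalanced designs
betas = rng.normal(loc=1.0, scale=1.5, size=n_animals)   # individual selection strength

X_all, y_all, per_animal = [], [], []
for n, beta in zip(sample_sizes, betas):
    x = rng.normal(size=(n, 1))                  # resource covariate at used/available points
    p = 1.0 / (1.0 + np.exp(-(-0.5 + beta * x[:, 0])))
    y = rng.binomial(1, p)                       # 1 = used, 0 = available
    X_all.append(x)
    y_all.append(y)
    per_animal.append(LogisticRegression().fit(x, y).coef_[0, 0])

pooled = LogisticRegression().fit(np.vstack(X_all), np.concatenate(y_all))
print("pooled coefficient:         ", round(pooled.coef_[0, 0], 2))
print("mean individual coefficient:", round(float(np.mean(per_animal)), 2))
```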
Visual statistical learning is not reliably modulated by selective attention to isolated events
Musz, Elizabeth; Weber, Matthew J.; Thompson-Schill, Sharon L.
2014-01-01
Recent studies of visual statistical learning (VSL) indicate that the visual system can automatically extract temporal and spatial relationships between objects. We report several attempts to replicate and extend earlier work (Turk-Browne et al., 2005) in which observers performed a cover task on one of two interleaved stimulus sets, resulting in learning of temporal relationships that occur in the attended stream, but not those present in the unattended stream. Across four experiments, we exposed observers to a similar or identical familiarization protocol, directing attention to one of two interleaved stimulus sets; afterward, we assessed VSL efficacy for both sets using either implicit response-time measures or explicit familiarity judgments. In line with prior work, we observe learning for the attended stimulus set. However, unlike previous reports, we also observe learning for the unattended stimulus set. When instructed to selectively attend to only one of the stimulus sets and ignore the other set, observers could extract temporal regularities for both sets. Our efforts to experimentally decrease this effect by changing the cover task (Experiment 1) or the complexity of the statistical regularities (Experiment 3) were unsuccessful. A fourth experiment using a different assessment of learning likewise failed to show an attentional effect. Simulations drawing random samples from our first three experiments (n=64) confirm that the distribution of attentional effects in our sample closely approximates the null. We offer several potential explanations for our failure to replicate earlier findings, and discuss how our results suggest limiting conditions on the relevance of attention to VSL. PMID:25172196
Muller, Julius; Parizotto, Eneida; Antrobus, Richard; Francis, James; Bunce, Campbell; Stranks, Amanda; Nichols, Marshall; McClain, Micah; Hill, Adrian V S; Ramasamy, Adaikalavan; Gilbert, Sarah C
2017-06-08
Influenza challenge trials are important for vaccine efficacy testing. Currently, disease severity is determined by self-reported scores to a list of symptoms which can be highly subjective. A more objective measure would allow for improved data analysis. Twenty-one volunteers participated in an influenza challenge trial. We calculated the daily sum of scores (DSS) for a list of 16 influenza symptoms. Whole blood collected at baseline and 24, 48, 72 and 96 h post challenge was profiled on Illumina HT12v4 microarrays. Changes in gene expression most strongly correlated with DSS were selected to train a Random Forest model and tested on two independent test sets consisting of 41 individuals profiled on a different microarray platform and 33 volunteers assayed by qRT-PCR. 1456 probes are significantly associated with DSS at 1% false discovery rate. We selected 19 genes with the largest fold change to train a random forest model. We observed good concordance between predicted and actual scores in the first test set (r = 0.57; RMSE = -16.1%) with the greatest agreement achieved on samples collected approximately 72 h post challenge. Therefore, we assayed samples collected at baseline and 72 h post challenge in the second test set by qRT-PCR and observed good concordance (r = 0.81; RMSE = -36.1%). We developed a 19-gene qRT-PCR panel to predict DSS, validated on two independent datasets. A transcriptomics based panel could provide a more objective measure of symptom scoring in future influenza challenge studies. Trial registration Samples were obtained from a clinical trial with the ClinicalTrials.gov Identifier: NCT02014870, first registered on December 5, 2013.
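A hedged sketch of the modelling step described above: choose a small gene panel, fit a random forest regressor to predict the daily sum of scores (DSS), and check correlation on held-out samples. Panel selection is simplified to largest mean absolute change from baseline rather than the study's FDR-controlled correlation filtering, and the data are random stand-ins.

```python
# Train a random forest on a small gene panel to predict symptom scores.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def train_dss_model(expr_change, dss, n_genes=19, seed=0):
    """expr_change: (samples x genes) change from baseline; dss: symptom scores."""
    rank = np.argsort(np.abs(expr_change).mean(axis=0))[::-1]
    panel = rank[:n_genes]                        # 19-gene panel, as in the study
    X_tr, X_te, y_tr, y_te = train_test_split(
        expr_change[:, panel], dss, test_size=0.3, random_state=seed)
    model = RandomForestRegressor(n_estimators=500, random_state=seed).fit(X_tr, y_tr)
    r = np.corrcoef(model.predict(X_te), y_te)[0, 1]   # concordance on held-out samples
    return model, panel, r

rng = np.random.default_rng(1)
expr_change = rng.normal(size=(84, 2000))         # stand-in expression changes
dss = rng.integers(0, 40, size=84).astype(float)  # stand-in daily sum of scores
model, panel, r = train_dss_model(expr_change, dss)
print(len(panel), round(float(r), 2))
```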
Estimation of reference intervals from small samples: an example using canine plasma creatinine.
Geffré, A; Braun, J P; Trumel, C; Concordet, D
2009-12-01
According to international recommendations, reference intervals should be determined from at least 120 reference individuals, which often are impossible to achieve in veterinary clinical pathology, especially for wild animals. When only a small number of reference subjects is available, the possible bias cannot be known and the normality of the distribution cannot be evaluated. A comparison of reference intervals estimated by different methods could be helpful. The purpose of this study was to compare reference limits determined from a large set of canine plasma creatinine reference values, and large subsets of this data, with estimates obtained from small samples selected randomly. Twenty sets each of 120 and 27 samples were randomly selected from a set of 1439 plasma creatinine results obtained from healthy dogs in another study. Reference intervals for the whole sample and for the large samples were determined by a nonparametric method. The estimated reference limits for the small samples were minimum and maximum, mean +/- 2 SD of native and Box-Cox-transformed values, 2.5th and 97.5th percentiles by a robust method on native and Box-Cox-transformed values, and estimates from diagrams of cumulative distribution functions. The whole sample had a heavily skewed distribution, which approached Gaussian after Box-Cox transformation. The reference limits estimated from small samples were highly variable. The closest estimates to the 1439-result reference interval for 27-result subsamples were obtained by both parametric and robust methods after Box-Cox transformation but were grossly erroneous in some cases. For small samples, it is recommended that all values be reported graphically in a dot plot or histogram and that estimates of the reference limits be compared using different methods.
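The sketch below computes several of the small-sample reference-limit estimators compared in the study (min/max, mean ± 2 SD on native and Box-Cox-transformed values, and nonparametric percentiles); the robust estimator and the cumulative-distribution-diagram method are not reproduced.

```python
# Small-sample reference limit estimates on a skewed sample of 27 values.
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

def reference_limits(values):
    values = np.asarray(values, dtype=float)
    est = {"min_max": (values.min(), values.max()),
           "mean_2sd": (values.mean() - 2 * values.std(ddof=1),
                        values.mean() + 2 * values.std(ddof=1)),
           "percentiles": tuple(np.percentile(values, [2.5, 97.5]))}
    transformed, lam = stats.boxcox(values)        # requires positive data
    t_lo = transformed.mean() - 2 * transformed.std(ddof=1)
    t_hi = transformed.mean() + 2 * transformed.std(ddof=1)
    # back-transform the Box-Cox limits to the original scale
    est["boxcox_mean_2sd"] = tuple(inv_boxcox(np.array([t_lo, t_hi]), lam))
    return est

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=4.5, sigma=0.3, size=27)   # skewed, creatinine-like stand-in
for name, (lo, hi) in reference_limits(sample).items():
    print(f"{name:>16}: {lo:7.1f} - {hi:7.1f}")
```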
CrowdPhase: crowdsourcing the phase problem
Jorda, Julien; Sawaya, Michael R.; Yeates, Todd O.
2014-01-01
The human mind innately excels at some complex tasks that are difficult to solve using computers alone. For complex problems amenable to parallelization, strategies can be developed to exploit human intelligence in a collective form: such approaches are sometimes referred to as ‘crowdsourcing’. Here, a first attempt at a crowdsourced approach for low-resolution ab initio phasing in macromolecular crystallography is proposed. A collaborative online game named CrowdPhase was designed, which relies on a human-powered genetic algorithm, where players control the selection mechanism during the evolutionary process. The algorithm starts from a population of ‘individuals’, each with a random genetic makeup, in this case a map prepared from a random set of phases, and tries to cause the population to evolve towards individuals with better phases based on Darwinian survival of the fittest. Players apply their pattern-recognition capabilities to evaluate the electron-density maps generated from these sets of phases and to select the fittest individuals. A user-friendly interface, a training stage and a competitive scoring system foster a network of well trained players who can guide the genetic algorithm towards better solutions from generation to generation via gameplay. CrowdPhase was applied to two synthetic low-resolution phasing puzzles and it was shown that players could successfully obtain phase sets in the 30° phase error range and corresponding molecular envelopes showing agreement with the low-resolution models. The successful preliminary studies suggest that with further development the crowdsourcing approach could fill a gap in current crystallographic methods by making it possible to extract meaningful information in cases where limited resolution might otherwise prevent initial phasing. PMID:24914965
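The following skeleton shows the general mechanism described above, a genetic algorithm whose selection step is delegated to an external judge (in CrowdPhase, players evaluating electron-density maps); the map generation, the player interface and the stand-in fitness function are invented purely so the example runs.

```python
# Genetic algorithm skeleton with an externally supplied selection step.
import random

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(phases, rate=0.05):
    return [p if random.random() > rate else random.uniform(0, 360) for p in phases]

def evolve(select, pop_size=20, n_phases=50, generations=30):
    population = [[random.uniform(0, 360) for _ in range(n_phases)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        survivors = select(population)                    # human-powered step in CrowdPhase
        population = [mutate(crossover(*random.sample(survivors, 2)))
                      for _ in range(pop_size)]
    return select(population)[0]

# Stand-in for player judgement: keep the half of the population whose phase
# sets are closest to a hidden "true" phase set (invented fitness).
truth = [random.uniform(0, 360) for _ in range(50)]
def judge(population):
    err = lambda ind: sum(min(abs(p - t), 360 - abs(p - t))
                          for p, t in zip(ind, truth))
    return sorted(population, key=err)[:len(population) // 2]

best = evolve(select=judge)
print(len(best))
```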
Wyman, Peter A.; Brown, C. Hendricks
2015-01-01
The dynamic wait-listed design (DWLD) and regression point displacement design (RPDD) address several challenges in evaluating group-based interventions when there is a limited number of groups. Both DWLD and RPDD utilize efficiencies that increase statistical power and can enhance balance between community needs and research priorities. The DWLD blocks on more time units than traditional wait-listed designs, thereby increasing the proportion of a study period during which intervention and control conditions can be compared, and can also improve logistics of implementing intervention across multiple sites and strengthen fidelity. We discuss DWLDs in the larger context of roll-out randomized designs and compare it with its cousin the Stepped Wedge design. The RPDD uses archival data on the population of settings from which intervention unit(s) are selected to create expected posttest scores for units receiving intervention, to which actual posttest scores are compared. High pretest-posttest correlations give the RPDD statistical power for assessing intervention impact even when one or a few settings receive intervention. RPDD works best when archival data are available over a number of years prior to and following intervention. If intervention units were not randomly selected, propensity scores can be used to control for nonrandom selection factors. Examples are provided of the DWLD and RPDD used to evaluate, respectively, suicide prevention training (QPR) in 32 schools and a violence prevention program (CeaseFire) in 2 Chicago police districts over a 10-year period. How DWLD and RPDD address common threats to internal and external validity, as well as their limitations, are discussed. PMID:25481512
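A minimal sketch of the regression point displacement idea: regress posttest on pretest across the untreated units and express the treated unit's outcome as a displacement from that regression line, scaled by the residual spread. This z-style summary is a simplification of the published RPDD test.

```python
# Regression point displacement: displacement of the treated unit from the
# pretest-posttest regression fitted on untreated comparison units.
import numpy as np

def rpdd_displacement(pre_control, post_control, pre_treated, post_treated):
    slope, intercept = np.polyfit(pre_control, post_control, deg=1)
    residuals = post_control - (intercept + slope * pre_control)
    expected = intercept + slope * pre_treated
    displacement = post_treated - expected
    return displacement, displacement / residuals.std(ddof=2)

rng = np.random.default_rng(3)
pre = rng.normal(50, 10, size=30)                    # archival rates for 30 untreated units
post = 0.9 * pre + rng.normal(0, 3, size=30)         # high pretest-posttest correlation
disp, z = rpdd_displacement(pre, post, pre_treated=55.0, post_treated=40.0)
print(round(float(disp), 2), round(float(z), 2))
```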
Active learning: a step towards automating medical concept extraction.
Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony
2016-03-01
This paper presents an automatic, active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort and (2) the robustness of the incremental active learning framework across different selection criteria and data sets are determined. The comparative performance of an active learning framework and a fully supervised approach was investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional random fields were used as the supervised method, with least confidence and information density as the 2 selection criteria for the active learning framework. The effect of incremental learning vs standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. The following 2 clinical data sets were used for evaluation: the Informatics for Integrating Biology and the Bedside/Veteran Affairs (i2b2/VA) 2010 natural language processing challenge and the Shared Annotated Resources/Conference and Labs of the Evaluation Forum (ShARe/CLEF) 2013 eHealth Evaluation Lab. The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared with the random sampling baseline, the saving is at least doubled. Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
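The sketch below shows pool-based active learning with least-confidence sampling; a generic probabilistic classifier stands in for the CRF sequence tagger, refitting stands in for incremental updates, and the information-density criterion is not shown.

```python
# Pool-based active learning with least-confidence query selection.
import numpy as np
from sklearn.linear_model import LogisticRegression

def least_confidence_loop(X_pool, y_pool, n_seed=20, batch=10, rounds=5, seed=0):
    rng = np.random.default_rng(seed)
    labelled = list(rng.choice(len(X_pool), size=n_seed, replace=False))
    unlabelled = [i for i in range(len(X_pool)) if i not in labelled]
    for _ in range(rounds):
        model = LogisticRegression(max_iter=1000).fit(X_pool[labelled], y_pool[labelled])
        proba = model.predict_proba(X_pool[unlabelled])
        confidence = proba.max(axis=1)                # confidence of the top prediction
        query = np.argsort(confidence)[:batch]        # least confident examples first
        picked = [unlabelled[i] for i in query]
        labelled.extend(picked)                       # oracle labels are revealed here
        unlabelled = [i for i in unlabelled if i not in picked]
    model = LogisticRegression(max_iter=1000).fit(X_pool[labelled], y_pool[labelled])
    return model, labelled

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)         # stand-in annotations
model, used = least_confidence_loop(X, y)
print("labelled examples used:", len(used))
```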
Strong adsorption of random heteropolymers on protein surfaces
NASA Astrophysics Data System (ADS)
Nguyen, Trung; Qiao, Baofu; Panganiban, Brian; Delre, Christopher; Xu, Ting; Olvera de La Cruz, Monica
Rational design of copolymers for stabilizing proteins' functionalities in unfavorable solvents and delivering nanoparticles through organic membranes demands a thorough understanding of how the proteins and colloids are encapsulated by a given type of copolymer. Random heteropolymers (RHPs), a special family of copolymers with random segment order, have long been recognized as a promising coating material due to their biomimetic behaviors while allowing for much flexibility in the synthesis procedure. Of practical importance is the ability to predict the conditions under which a given family of random heteropolymers would provide optimal encapsulation. Here we investigate the key factors that govern the adsorption of RHPs on the surface of a model protein. Using coarse-grained molecular simulations, we identify the conditions under which the model protein is fully covered by the polymers. We have examined the nanometer-level details of the adsorbed polymer chains and found a clear connection between the surface coverage and adsorption strength, solvent selectivity and the volume fraction of adsorbing monomers. The results in this work set the stage for further investigation on engineering biomimetic RHPs for stabilizing and delivering functional proteins across multiple media.
Fluoride resistance and transport by riboswitch-controlled CLC antiporters
Stockbridge, Randy B.; Lim, Hyun-Ho; Otten, Renee; Williams, Carole; Shane, Tania; Weinberg, Zasha; Miller, Christopher
2012-01-01
A subclass of bacterial CLC anion-transporting proteins, phylogenetically distant from long-studied CLCs, was recently shown to be specifically up-regulated by F-. We establish here that a set of randomly selected representatives from this “CLCF” clade protect Escherichia coli from F- toxicity, and that the purified proteins catalyze transport of F- in liposomes. Sequence alignments and membrane transport experiments using 19F NMR, osmotic response assays, and planar lipid bilayer recordings reveal four mechanistic traits that set CLCF proteins apart from all other known CLCs. First, CLCFs lack conserved residues that form the anion binding site in canonical CLCs. Second, CLCFs exhibit high anion selectivity for F- over Cl-. Third, at a residue thought to distinguish CLC channels and transporters, CLCFs bear a channel-like valine rather than a transporter-like glutamate, and yet are F-/H+ antiporters. Finally, F-/H+ exchange occurs with 1∶1 stoichiometry, in contrast to the usual value of 2∶1. PMID:22949689
Mor, Vincent; Volandes, Angelo E; Gutman, Roee; Gatsonis, Constantine; Mitchell, Susan L
2017-04-01
Background/Aims Nursing homes are complex healthcare systems serving an increasingly sick population. Nursing homes must engage patients in advance care planning, but do so inconsistently. Video decision support tools improved advance care planning in small randomized controlled trials. Pragmatic trials are increasingly employed in health services research, although not commonly in the nursing home setting to which they are well-suited. This report presents the design and rationale for a pragmatic cluster randomized controlled trial that evaluated the "real world" application of an Advance Care Planning Video Program in two large US nursing home healthcare systems. Methods PRagmatic trial Of Video Education in Nursing homes was conducted in 360 nursing homes (N = 119 intervention/N = 241 control) owned by two healthcare systems. Over an 18-month implementation period, intervention facilities were instructed to offer the Advance Care Planning Video Program to all patients. Control facilities employed usual advance care planning practices. Patient characteristics and outcomes were ascertained from Medicare Claims, Minimum Data Set assessments, and facility electronic medical record data. Intervention adherence was measured using a Video Status Report embedded into electronic medical record systems. The primary outcome was the number of hospitalizations/person-day alive among long-stay patients with advanced dementia or cardiopulmonary disease. The rationale for the approaches to facility randomization and recruitment, intervention implementation, population selection, data acquisition, regulatory issues, and statistical analyses are discussed. Results The large number of well-characterized candidate facilities enabled several unique design features including stratification on historical hospitalization rates, randomization prior to recruitment, and 2:1 control to intervention facilities ratio. Strong endorsement from corporate leadership made randomization prior to recruitment feasible with 100% participation of facilities randomized to the intervention arm. Critical regulatory issues included minimal risk determination, waiver of informed consent, and determination that nursing home providers were not engaged in human subjects research. Intervention training and implementation were initiated on 5 January 2016 using corporate infrastructures for new program roll-out guided by standardized training elements designed by the research team. Video Status Reports in facilities' electronic medical records permitted "real-time" adherence monitoring and corrective actions. The Centers for Medicare and Medicaid Services Virtual Research Data Center allowed for rapid outcomes ascertainment. Conclusion We must rigorously evaluate interventions to deliver more patient-focused care to an increasingly frail nursing home population. Video decision support is a practical approach to improve advance care planning. PRagmatic trial Of Video Education in Nursing homes has the potential to promote goal-directed care among millions of older Americans in nursing homes and establish a methodology for future pragmatic randomized controlled trials in this complex healthcare setting.
NASA Astrophysics Data System (ADS)
Castillo, Richard; Castillo, Edward; Fuentes, David; Ahmad, Moiz; Wood, Abbie M.; Ludwig, Michelle S.; Guerrero, Thomas
2013-05-01
Landmark point-pairs provide a strategy to assess deformable image registration (DIR) accuracy in terms of the spatial registration of the underlying anatomy depicted in medical images. In this study, we propose to augment a publicly available database (www.dir-lab.com) of medical images with large sets of manually identified anatomic feature pairs between breath-hold computed tomography (BH-CT) images for DIR spatial accuracy evaluation. Ten BH-CT image pairs were randomly selected from the COPDgene study cases. Each patient had received CT imaging of the entire thorax in the supine position at one-fourth dose normal expiration and maximum effort full dose inspiration. Using dedicated in-house software, an imaging expert manually identified large sets of anatomic feature pairs between images. Estimates of inter- and intra-observer spatial variation in feature localization were determined by repeat measurements of multiple observers over subsets of randomly selected features. 7298 anatomic landmark features were manually paired between the 10 sets of images. Quantity of feature pairs per case ranged from 447 to 1172. Average 3D Euclidean landmark displacements varied substantially among cases, ranging from 12.29 (SD: 6.39) to 30.90 (SD: 14.05) mm. Repeat registration of uniformly sampled subsets of 150 landmarks for each case yielded estimates of observer localization error, which ranged in average from 0.58 (SD: 0.87) to 1.06 (SD: 2.38) mm for each case. The additions to the online web database (www.dir-lab.com) described in this work will broaden the applicability of the reference data, providing a freely available common dataset for targeted critical evaluation of DIR spatial accuracy performance in multiple clinical settings. Estimates of observer variance in feature localization suggest consistent spatial accuracy for all observers across both four-dimensional CT and COPDgene patient cohorts.
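A short sketch of the two summary statistics reported above, assuming landmark coordinates are already expressed in millimetres: the mean and SD of 3D Euclidean displacements between paired points, and the repeat-localization (observer) error on a re-measured subset.

```python
# Landmark displacement and observer repeat-localization error statistics.
import numpy as np

def displacement_stats(pts_expiration, pts_inspiration):
    d = np.linalg.norm(pts_expiration - pts_inspiration, axis=1)
    return d.mean(), d.std(ddof=1)

def observer_error(first_pass, repeat_pass):
    e = np.linalg.norm(first_pass - repeat_pass, axis=1)
    return e.mean(), e.std(ddof=1)

rng = np.random.default_rng(0)
p_exp = rng.uniform(0, 300, size=(1000, 3))               # stand-in landmark set (mm)
p_insp = p_exp + rng.normal(8.0, 4.0, size=(1000, 3))     # stand-in displacements (mm)
print("displacement mean/SD:  ", displacement_stats(p_exp, p_insp))
subset = rng.choice(1000, size=150, replace=False)        # 150 re-measured points
repeat = p_insp[subset] + rng.normal(0, 0.7, size=(150, 3))
print("observer error mean/SD:", observer_error(p_insp[subset], repeat))
```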
Economic evaluations of single- versus double-embryo transfer in IVF.
Fiddelers, A A A; Severens, J L; Dirksen, C D; Dumoulin, J C M; Land, J A; Evers, J L H
2007-01-01
Multiple pregnancies lead to complications and induce high costs. The most successful way to decrease multiple pregnancies in IVF is to transfer only one embryo, which might reduce the efficacy of treatment. The objective of this review is to determine which embryo-transfer policy is most cost-effective: elective single-embryo transfer (eSET) or double-embryo transfer (DET). Several databases were searched for (cost* or econ*) and (single embryo* or double embryo* or one embryo* or two embryo* or elect* embryo or multip* embryo*). On the basis of five exclusion criteria, titles and abstracts were screened by two individual reviewers. The remaining papers were read for further selection, and data were extracted from the selected studies. A total of 496 titles were identified through the searches and resulted in the selection of one observational study and three randomized studies. Study characteristics, total costs and probability of live births were extracted. Besides this, cost-effectiveness and incremental cost-effectiveness were derived. It can be concluded that DET is the most expensive strategy. DET is also most effective if performed in one fresh cycle. eSET is only preferred from a cost-effectiveness point of view when performed in good prognosis patients and when frozen/thawed cycles are included. If frozen/thawed cycles are excluded, the choice between eSET and DET depends on how much society is willing to pay for one extra successful pregnancy.
CADASTER QSPR Models for Predictions of Melting and Boiling Points of Perfluorinated Chemicals.
Bhhatarai, Barun; Teetz, Wolfram; Liu, Tao; Öberg, Tomas; Jeliazkova, Nina; Kochev, Nikolay; Pukalov, Ognyan; Tetko, Igor V; Kovarich, Simona; Papa, Ester; Gramatica, Paola
2011-03-14
Quantitative structure property relationship (QSPR) studies of the melting point (MP) and boiling point (BP) of per- and polyfluorinated chemicals (PFCs) are presented. The training and prediction chemicals used for developing and validating the models were selected from the Syracuse PhysProp database and the literature. The available experimental data sets were split in two different ways: a) random selection on response value, and b) structural similarity verified by self-organizing map (SOM), in order to propose reliable predictive models, developed only on the training sets and externally verified on the prediction sets. Individual models based on linear and non-linear approaches, developed by different CADASTER partners using 0D-2D Dragon descriptors, E-state descriptors and fragment-based descriptors, as well as a consensus model and their predictions, are presented. In addition, the predictive performance of the developed models was verified on a blind external validation set (EV-set) prepared using the PERFORCE database, containing 15 MP and 25 BP data points, respectively. This database contains only long-chain perfluoroalkylated chemicals, particularly monitored by regulatory agencies such as US-EPA and EU-REACH. QSPR models with internal and external validation on two different external prediction/validation sets, and a study of the applicability domain highlighting the robustness and high accuracy of the models, are discussed. Finally, MPs for an additional 303 PFCs and BPs for 271 PFCs, for which experimental measurements are unknown, were predicted. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Qin, Zijian; Wang, Maolin; Yan, Aixia
2017-07-01
In this study, quantitative structure-activity relationship (QSAR) models using various descriptor sets and training/test set selection methods were explored to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors, using a multiple linear regression (MLR) and a support vector machine (SVM) method. A total of 512 HCV NS3/4A protease inhibitors and their IC50 values, all determined by the same FRET assay, were collected from the literature to build a dataset. All the inhibitors were represented with nine selected global descriptors and 12 2D property-weighted autocorrelation descriptors calculated with the program CORINA Symphony. The dataset was divided into a training set and a test set by a random and a Kohonen's self-organizing map (SOM) method. The correlation coefficients (r2) of the training and test sets were 0.75 and 0.72 for the best MLR model, and 0.87 and 0.85 for the best SVM model, respectively. In addition, a series of sub-dataset models were also developed. The performances of all the best sub-dataset models were better than those of the whole-dataset models. We believe that the combination of the best sub- and whole-dataset SVM models can be used as a reliable lead-design tool for new NS3/4A protease inhibitor scaffolds in a drug discovery pipeline. Copyright © 2017 Elsevier Ltd. All rights reserved.
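The sketch below contrasts the two training/test selection strategies mentioned above, a random split versus a structure-based split in descriptor space; KMeans is used as a simple stand-in for the Kohonen SOM, and the descriptor and activity values are random placeholders.

```python
# Random split vs cluster-based (diversity) split, each followed by an SVM regression.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def random_split(X, y, seed=0):
    return train_test_split(X, y, test_size=0.25, random_state=seed)

def cluster_split(X, y, n_clusters=40, seed=0):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    test_idx = [np.where(labels == c)[0][0] for c in range(n_clusters)]  # one member per cluster
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 21))          # 512 inhibitors x 21 stand-in descriptors
y = rng.normal(6.5, 1.0, size=512)      # stand-in pIC50 values
for name, split in [("random", random_split), ("cluster", cluster_split)]:
    X_tr, X_te, y_tr, y_te = split(X, y)
    model = SVR(C=10.0).fit(X_tr, y_tr)
    print(name, "r2 on test:", round(model.score(X_te, y_te), 2))
```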
Cheng, Yu-Huei
2014-12-01
Specific primers play an important role in polymerase chain reaction (PCR) experiments, and therefore it is essential to find specific primers of outstanding quality. Unfortunately, many PCR constraints must be inspected simultaneously, which makes specific primer selection difficult and time-consuming. This paper introduces a novel computational intelligence-based method, Teaching-Learning-Based Optimisation, to select specific and feasible primers. Primer selection was performed for specified PCR product lengths of 150-300 bp and 500-800 bp with three melting temperature formulae: Wallace's formula, Bolton and McCarthy's formula and SantaLucia's formula. The authors calculated the optimal frequency to estimate the quality of primer selection, based on a total of 500 runs for 50 random nucleotide sequences of 'Homo species' retrieved from the National Center for Biotechnology Information. The method was then fairly compared with the genetic algorithm (GA) and memetic algorithm (MA) for primer selection reported in the literature. The results show that the method easily found suitable primers satisfying the set primer constraints and performed better than the GA and the MA. Furthermore, the method was also compared with the common method Primer3 according to method type, primer presentation, parameter settings, speed and memory usage. In conclusion, it is an interesting primer selection method and a valuable tool for automatic high-throughput analysis. In the future, the usage of the primers in the wet lab needs to be validated carefully to increase the reliability of the method.
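One of the three melting-temperature formulae mentioned above, the Wallace rule (Tm = 2(A+T) + 4(G+C) °C), is easy to state in code; the constraint thresholds below are illustrative, and the Teaching-Learning-Based Optimisation search itself is not reproduced.

```python
# Wallace-rule melting temperature and a simple GC-content constraint check
# for candidate primers (illustrative thresholds only).
def wallace_tm(primer):
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def gc_fraction(primer):
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def passes_basic_constraints(primer, tm_range=(52, 62), gc_range=(0.40, 0.60)):
    return (tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and gc_range[0] <= gc_fraction(primer) <= gc_range[1])

for p in ["ATGCGTACGTTAGCCTGAAT", "GGGGGCCCCCGGGGGCCCCC"]:
    print(p, wallace_tm(p), round(gc_fraction(p), 2), passes_basic_constraints(p))
```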
Chao, Pei-Kuang; Wang, Chun-Li; Chan, Hsiao-Lung
2012-03-01
Predicting response after cardiac resynchronization therapy (CRT) has been a challenge for cardiologists. About 30% of patients selected by the standard selection criteria for CRT do not show a response after receiving the treatment. This study aimed to build an intelligent classifier to assist in identifying potential CRT responders from speckle-tracking radial strain based on echocardiograms. The echocardiograms analyzed were acquired before CRT from 26 patients who later received CRT. Sequential forward selection was performed on the parameters obtained by peak-strain timing and phase space reconstruction of speckle-tracking radial strain to find an optimal set of features for creating intelligent classifiers. Support vector machines (SVMs) with linear, quadratic, and polynomial kernels were tested to build classifiers that identify potential responders and non-responders for CRT from the selected features. Based on random sub-sampling validation, the best classification performance was a correct rate of about 95%, with 96-97% sensitivity and 93-94% specificity, achieved by applying an SVM with a quadratic kernel to a set of 3 parameters. The 3 selected parameters contain indexes extracted by both peak-strain timing and phase space reconstruction. An intelligent classifier with an average correct rate, sensitivity and specificity above 90% for assisting in identifying CRT responders was built from speckle-tracking radial strain. The classifier can be applied to provide objective support for patient selection for CRT. Copyright © 2011 Elsevier B.V. All rights reserved.
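A hedged sketch of the classifier-building step: sequential forward selection wrapped around an SVM with a quadratic (degree-2 polynomial) kernel, scored by repeated random sub-sampling. The candidate features here are random stand-ins for the strain-timing and phase-space parameters.

```python
# Sequential forward selection with a quadratic-kernel SVM, scored by
# repeated random sub-sampling cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score, ShuffleSplit
from sklearn.svm import SVC

def forward_select(X, y, n_features=3, seed=0):
    cv = ShuffleSplit(n_splits=20, test_size=0.3, random_state=seed)
    selected, remaining = [], list(range(X.shape[1]))
    best_acc = 0.0
    while len(selected) < n_features:
        scores = []
        for f in remaining:
            cols = selected + [f]
            acc = cross_val_score(SVC(kernel="poly", degree=2), X[:, cols], y, cv=cv).mean()
            scores.append((acc, f))
        best_acc, best_f = max(scores)           # greedily add the best new feature
        selected.append(best_f)
        remaining.remove(best_f)
    return selected, best_acc

rng = np.random.default_rng(0)
X = rng.normal(size=(26, 12))                    # 26 patients x 12 candidate parameters
y = rng.integers(0, 2, size=26)                  # responder / non-responder labels
print(forward_select(X, y))
```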
Statistical models for estimating daily streamflow in Michigan
Holtschlag, D.J.; Salehi, Habib
1992-01-01
Statistical models for estimating daily streamflow were analyzed for 25 pairs of streamflow-gaging stations in Michigan. Stations were paired by randomly choosing a station operated in 1989 at which 10 or more years of continuous flow data had been collected and at which flow is virtually unregulated; a nearby station was chosen where flow characteristics are similar. Streamflow data from the 25 randomly selected stations were used as the response variables; streamflow data at the nearby stations were used to generate a set of explanatory variables. Ordinary least-squares regression (OLSR) equations, autoregressive integrated moving-average (ARIMA) equations, and transfer function-noise (TFN) equations were developed to estimate the log transform of flow for the 25 randomly selected stations. The precision of each type of equation was evaluated on the basis of the standard deviation of the estimation errors. OLSR equations produce one set of estimation errors; ARIMA and TFN models each produce l sets of estimation errors corresponding to the forecast lead. The lead-l forecast is the estimate of flow l days ahead of the most recent streamflow used as a response variable in the estimation. In this analysis, the standard deviations of lead-l ARIMA and TFN forecast errors were generally lower than the standard deviation of OLSR errors for l < 2 days and l < 9 days, respectively. Composite estimates were computed as a weighted average of forecasts based on TFN equations and backcasts (forecasts of the reverse-ordered series) based on ARIMA equations. The standard deviation of composite errors varied throughout the length of the estimation interval and generally was at a maximum near the center of the interval. For comparison with OLSR errors, the mean standard deviation of composite errors was computed for intervals of length 1 to 40 days. The mean standard deviation of length-l composite errors was generally less than the standard deviation of the OLSR errors for l < 32 days. In addition, the composite estimates ensure a gradual transition between periods of estimated and measured flows. Model performance among stations of differing model error magnitudes was compared by computing ratios of the mean standard deviation of the length-l composite errors to the standard deviation of OLSR errors. The mean error ratio for the set of 25 selected stations was less than 1 for intervals l < 32 days. Considering the frequency characteristics of the length of intervals of estimated record in Michigan, the effective mean error ratio for intervals < 30 days was 0.52. Thus, for intervals of estimation of 1 month or less, the error of the composite estimate is substantially lower than the error of the OLSR estimate.
Neonatal Seizure Detection Using Deep Convolutional Neural Networks.
Ansari, Amir H; Cherian, Perumpillichira J; Caicedo, Alexander; Naulaers, Gunnar; De Vos, Maarten; Van Huffel, Sabine
2018-04-02
Identifying a core set of features is one of the most important steps in the development of an automated seizure detector. In most of the published studies describing features and seizure classifiers, the features were hand-engineered, which may not be optimal. The main goal of the present paper is using deep convolutional neural networks (CNNs) and random forest to automatically optimize feature selection and classification. The input of the proposed classifier is raw multi-channel EEG and the output is the class label: seizure/nonseizure. By training this network, the required features are optimized, while fitting a nonlinear classifier on the features. After training the network with EEG recordings of 26 neonates, five end layers performing the classification were replaced with a random forest classifier in order to improve the performance. This resulted in a false alarm rate of 0.9 per hour and seizure detection rate of 77% using a test set of EEG recordings of 22 neonates that also included dubious seizures. The newly proposed CNN classifier outperformed three data-driven feature-based approaches and performed similar to a previously developed heuristic method.
Complex network structure of musical compositions: Algorithmic generation of appealing music
NASA Astrophysics Data System (ADS)
Liu, Xiao Fan; Tse, Chi K.; Small, Michael
2010-01-01
In this paper we construct networks for music and attempt to compose music artificially. Networks are constructed with nodes and edges corresponding to musical notes and their co-occurring connections. We analyze classical music from Bach, Mozart, Chopin, as well as other types of music such as Chinese pop music. We observe remarkably similar properties in all networks constructed from the selected compositions. We conjecture that preserving the universal network properties is a necessary step in artificial composition of music. Power-law exponents of node degree, node strength and/or edge weight distributions, mean degrees, clustering coefficients, mean geodesic distances, etc. are reported. With the network constructed, music can be composed artificially using a controlled random walk algorithm, which begins with a randomly chosen note and selects the subsequent notes according to a simple set of rules that compares the weights of the edges, weights of the nodes, and/or the degrees of nodes. By generating a large number of compositions, we find that this algorithm generates music which has the necessary qualities to be subjectively judged as appealing.
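The following sketch implements a simplified version of the controlled random walk described above: build a weighted note co-occurrence network from a note sequence and draw each next note in proportion to outgoing edge weights (the paper's additional comparisons of node weights and degrees are omitted).

```python
# Weighted random walk on a note co-occurrence network.
import random

def build_network(note_sequence):
    edges = {}
    for a, b in zip(note_sequence, note_sequence[1:]):
        edges.setdefault(a, {})
        edges[a][b] = edges[a].get(b, 0) + 1      # edge weight = co-occurrence count
    return edges

def compose(edges, length=16, seed=0):
    random.seed(seed)
    note = random.choice(list(edges))
    melody = [note]
    for _ in range(length - 1):
        nxt = edges.get(note)
        if not nxt:                               # dead end: restart from a random note
            note = random.choice(list(edges))
        else:
            note = random.choices(list(nxt), weights=list(nxt.values()))[0]
        melody.append(note)
    return melody

corpus = ["C4", "E4", "G4", "E4", "C4", "D4", "E4", "F4", "E4", "D4", "C4", "G4"]
print(compose(build_network(corpus)))
```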
Fernández, Rubén Arroyo; García-Hermoso, Antonio; Solera-Martínez, Montserrat; Correa, Ma Teresa Martín; Morales, Asunción Ferri; Martínez-Vizcaíno, Vicente
2015-01-01
The aim of this meta-analysis was to evaluate the evidence of the effect of pelvic floor muscle training on urinary incontinence after radical prostatectomy. A bibliographic search was conducted in four databases. Studies were grouped according to the intervention program (muscle training versus control and individual home-based versus physiotherapist-guided muscle training). Eight studies were selected for meta-analysis after satisfying the selection criteria. The data show that pelvic floor muscle training improves continence rate in the short (RR=2.16; p<0.001), medium (RR=1.45; p=0.001) and long term (RR=1.23; p=0.019) after surgery. The number of randomized controlled trials and the heterogeneity in the study population and type of pelvic floor muscle training were the main limitations. Programs including at least three sets of 10 repetitions of muscle training daily appear to improve continence rate after radical prostatectomy. Our meta-analysis shows that individual home-based muscle training programs for urinary incontinence provide similar results to those of physiotherapist-guided programs, therefore being more cost-effective. © 2014 S. Karger AG, Basel.
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Weaver, Addie; Greeno, Catherine G; Goughler, Donald H; Yarzebinski, Kathleen; Zimmerman, Tina; Anderson, Carol
2013-07-01
This study examined the effect of using the Toyota Production System (TPS) to change intake procedures on treatment timeliness within a semi-rural community mental health clinic. One hundred randomly selected cases opened the year before the change and 100 randomly selected cases opened the year after the change were reviewed. An analysis of covariance demonstrated that changing intake procedures significantly decreased the number of days consumers waited for appointments (F(1,160) = 4.9; p = .03) from an average of 11 to 8 days. The pattern of difference on treatment timeliness was significantly different between adult and child programs (F(1,160) = 4.2; p = .04), with children waiting an average of 4 days longer than adults for appointments. Findings suggest that small system level changes may elicit important changes and that TPS offers a valuable model to improve processes within community mental health settings. Results also indicate that different factors drive adult and children's treatment timeliness.
Monte Carlo investigation of thrust imbalance of solid rocket motor pairs
NASA Technical Reports Server (NTRS)
Sforzini, R. H.; Foster, W. A., Jr.
1976-01-01
The Monte Carlo method of statistical analysis is used to investigate the theoretical thrust imbalance of pairs of solid rocket motors (SRMs) firing in parallel. Sets of the significant variables are selected using a random sampling technique and the imbalance calculated for a large number of motor pairs using a simplified, but comprehensive, model of the internal ballistics. The treatment of burning surface geometry allows for the variations in the ovality and alignment of the motor case and mandrel as well as those arising from differences in the basic size dimensions and propellant properties. The analysis is used to predict the thrust-time characteristics of 130 randomly selected pairs of Titan IIIC SRMs. A statistical comparison of the results with test data for 20 pairs shows the theory underpredicts the standard deviation in maximum thrust imbalance by 20% with variability in burning times matched within 2%. The range in thrust imbalance of Space Shuttle type SRM pairs is also estimated using applicable tolerances and variabilities and a correction factor based on the Titan IIIC analysis.
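The Monte Carlo treatment of pair imbalance can be sketched as below: draw each motor's significant variables from assumed tolerances, compute a thrust figure for each motor, and accumulate statistics over many pairs. The parameter list, distributions, and the crude thrust model are illustrative assumptions standing in for the study's comprehensive internal-ballistics model.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PAIRS = 130  # number of randomly selected motor pairs, as in the Titan IIIC analysis

def sample_motor():
    """Draw one motor's significant variables from assumed normal tolerances."""
    burn_rate = rng.normal(1.00, 0.010)    # normalised burn rate
    throat_area = rng.normal(1.00, 0.005)  # normalised throat area
    prop_mass = rng.normal(1.00, 0.008)    # normalised propellant mass
    return burn_rate, throat_area, prop_mass

def max_thrust(burn_rate, throat_area, prop_mass):
    # Crude stand-in for the internal-ballistics model: thrust scales with burn
    # rate and propellant mass and inversely with throat area (illustrative only).
    return 1.0e7 * burn_rate * prop_mass / throat_area  # newtons

imbalance = []
for _ in range(N_PAIRS):
    f_a = max_thrust(*sample_motor())
    f_b = max_thrust(*sample_motor())
    imbalance.append(abs(f_a - f_b))

imbalance = np.array(imbalance)
print(f"mean |max-thrust imbalance| = {imbalance.mean():.3e} N, "
      f"standard deviation = {imbalance.std(ddof=1):.3e} N")
```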
Ion channel gene expression predicts survival in glioma patients
Wang, Rong; Gurguis, Christopher I.; Gu, Wanjun; Ko, Eun A; Lim, Inja; Bang, Hyoweon; Zhou, Tong; Ko, Jae-Hong
2015-01-01
Ion channels are important regulators in cell proliferation, migration, and apoptosis. The malfunction and/or aberrant expression of ion channels may disrupt these important biological processes and influence cancer progression. In this study, we investigate the expression pattern of ion channel genes in glioma. We designate 18 ion channel genes that are differentially expressed in high-grade glioma as a prognostic molecular signature. This ion channel gene expression-based signature predicts glioma outcome in three independent validation cohorts. Interestingly, 16 of these 18 genes were down-regulated in high-grade glioma. This signature is independent of traditional clinical, molecular, and histological factors. Resampling tests indicate that the prognostic power of the signature outperforms random gene sets selected from the human genome in all the validation cohorts. More importantly, this signature performs better than random gene signatures selected from glioma-associated genes in two out of three validation datasets. This study implicates ion channels in brain cancer, thus expanding on knowledge of their roles in other cancers. Individualized profiling of ion channel gene expression serves as a superior and independent prognostic tool for glioma patients. PMID:26235283
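The resampling comparison against random gene sets can be sketched as a permutation-style test: repeatedly draw 18 random genes, score their prognostic power, and compare with the signature's score. The simulated data and the simple correlation-based stand-in score below are assumptions; the study itself used survival models.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients, n_genes, k = 120, 5000, 18

# Simulated expression matrix and survival times (illustrative assumption).
expr = rng.normal(size=(n_patients, n_genes))
survival = rng.exponential(scale=24.0, size=n_patients)

signature = rng.choice(n_genes, size=k, replace=False)  # stand-in for the 18-gene panel

def prognostic_score(gene_idx):
    """Stand-in score: |correlation| between mean signature expression and survival."""
    risk = expr[:, gene_idx].mean(axis=1)
    return abs(np.corrcoef(risk, survival)[0, 1])

observed = prognostic_score(signature)

# Resampling: how often does a random 18-gene set score at least as well?
n_resamples = 2000
null_scores = np.array([
    prognostic_score(rng.choice(n_genes, size=k, replace=False))
    for _ in range(n_resamples)
])
p_value = (null_scores >= observed).mean()
print(f"observed score = {observed:.3f}, resampling p = {p_value:.3f}")
```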
Weaver, A.; Greeno, C.G.; Goughler, D.H.; Yarzebinski, K.; Zimmerman, T.; Anderson, C.
2013-01-01
This study examined the effect of using the Toyota Production System (TPS) to change intake procedures on treatment timeliness within a semi-rural community mental health clinic. One hundred randomly selected cases opened the year before the change and one hundred randomly selected cases opened the year after the change were reviewed. An analysis of covariance (ANCOVA) demonstrated that changing intake procedures significantly decreased the number of days consumers waited for appointments (F(1,160)=4.9; p=.03) from an average of 11 days to 8 days. The pattern of difference on treatment timeliness was significantly different between adult and child programs (F(1,160)=4.2; p=.04), with children waiting an average of 4 days longer than adults for appointments. Findings suggest that small system level changes may elicit important changes and that TPS offers a valuable model to improve processes within community mental health settings. Results also indicate that different factors drive adult and children’s treatment timeliness. PMID:23576137
Exposure to lead in South African shooting ranges.
Mathee, Angela; de Jager, Pieter; Naidoo, Shan; Naicker, Nisha
2017-02-01
Lead exposure in shooting ranges has been under scrutiny for decades, but no information in this regard is available in respect of African settings, and in South Africa specifically. The aim of this study was to determine the blood lead levels in the users of randomly selected private shooting ranges in South Africa's Gauteng province. An analytical cross-sectional study was conducted, with participants recruited from four randomly selected shooting ranges and three archery ranges as a comparator group. A total of 118 participants (87 shooters and 31 archers) were included in the analysis. Shooters had significantly higher blood lead levels (BLL) compared to archers, with 36/85 (42.4%) of shooters versus 2/34 (5.9%) of archers found to have a BLL ≥10μg/dl (p<0.001). Shooting ranges may constitute an important site of elevated exposure to lead. Improved ventilation, low levels of awareness of lead hazards, poor housekeeping, and inadequate personal hygiene facilities and practices at South African shooting ranges need urgent attention. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Zhang, Ying; Sun, Jin; Zhang, Yun-Jiao; Chai, Qian-Yun; Zhang, Kang; Ma, Hong-Li; Wu, Xiao-Ke; Liu, Jian-Ping
2016-10-21
Although Traditional Chinese Medicine (TCM) has been widely used in clinical settings, a major challenge that remains in TCM is to evaluate its efficacy scientifically. This randomized controlled trial aims to evaluate the efficacy and safety of berberine in the treatment of patients with polycystic ovary syndrome. In order to improve the transparency and research quality of this clinical trial, we prepared this statistical analysis plan (SAP). The trial design, primary and secondary outcomes, and safety outcomes were declared to reduce selection biases in data analysis and result reporting. We specified detailed methods for data management and statistical analyses. Statistics in corresponding tables, listings, and graphs were outlined. The SAP provided more detailed information than the trial protocol on data management and statistical analysis methods. Any post hoc analyses could be identified via referring to this SAP, and the possible selection bias and performance bias will be reduced in the trial. This study is registered at ClinicalTrials.gov, NCT01138930, registered on 7 June 2010.
Utilizing Maximal Independent Sets as Dominating Sets in Scale-Free Networks
NASA Astrophysics Data System (ADS)
Derzsy, N.; Molnar, F., Jr.; Szymanski, B. K.; Korniss, G.
Dominating sets provide a key solution to various critical problems in networked systems, such as detecting, monitoring, or controlling the behavior of nodes. Motivated by graph theory literature [Erdos, Israel J. Math. 4, 233 (1966)], we studied maximal independent sets (MIS) as dominating sets in scale-free networks. We investigated the scaling behavior of the size of MIS in artificial scale-free networks with respect to multiple topological properties (size, average degree, power-law exponent, assortativity), evaluated its resilience to network damage resulting from random failure or targeted attack [Molnar et al., Sci. Rep. 5, 8321 (2015)], and compared its efficiency to previously proposed dominating set selection strategies. We showed that, despite its small set size, MIS provides very high resilience against network damage. Using extensive numerical analysis on both synthetic and real-world (social, biological, technological) network samples, we demonstrate that our method effectively satisfies four essential requirements of dominating sets for their practical applicability on large-scale real-world systems: (1) small set size, (2) minimal network information required for their construction scheme, (3) fast and easy computational implementation, and (4) resiliency to network damage. Supported by DARPA, DTRA, and NSF.
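A maximal independent set can be built greedily and, by maximality, is automatically a dominating set: every node outside the set has a neighbour inside it. The sketch below uses an illustrative Barabasi-Albert scale-free graph from networkx; the random greedy order is an assumption, not the authors' construction scheme.

```python
import random
import networkx as nx

def greedy_mis(G, seed=0):
    """Greedily build a maximal independent set: visit nodes in random order,
    adding a node whenever none of its neighbours is already in the set."""
    rng = random.Random(seed)
    order = list(G.nodes())
    rng.shuffle(order)
    mis = set()
    for v in order:
        if not any(u in mis for u in G[v]):
            mis.add(v)
    return mis

# Illustrative scale-free network, not one of the paper's real-world samples.
G = nx.barabasi_albert_graph(1000, m=3, seed=42)
mis = greedy_mis(G)

# Maximality implies domination: every node is in the set or adjacent to it.
dominated = all(v in mis or any(u in mis for u in G[v]) for v in G)
print(f"MIS size = {len(mis)} of {G.number_of_nodes()} nodes, dominating = {dominated}")
```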
Computer-aided diagnosis of melanoma using border and wavelet-based texture analysis.
Garnavi, Rahil; Aldeen, Mohammad; Bailey, James
2012-11-01
This paper presents a novel computer-aided diagnosis system for melanoma. The novelty lies in the optimised selection and integration of features derived from textural, border-based and geometrical properties of the melanoma lesion. The texture features are derived using wavelet decomposition, the border features are derived by constructing a boundary-series model of the lesion border and analysing it in the spatial and frequency domains, and the geometry features are derived from shape indexes. The optimised selection of features is achieved by using the Gain-Ratio method, which is shown to be computationally efficient for the melanoma diagnosis application. Classification is done through the use of four classifiers, namely Support Vector Machine, Random Forest, Logistic Model Tree and Hidden Naive Bayes. The proposed diagnostic system is applied to a set of 289 dermoscopy images (114 malignant, 175 benign) partitioned into train, validation and test image sets. The system achieves an accuracy of 91.26% and an AUC value of 0.937 when 23 features are used. Other important findings include (i) the clear advantage gained in complementing texture with border and geometry features, compared to using texture information only, and (ii) the higher contribution of texture features than border-based features in the optimised feature set.
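Gain-Ratio ranking of a discretised feature can be sketched as information gain divided by the feature's intrinsic (split) entropy. The toy binarised feature and labels below are assumptions for illustration, not the paper's wavelet, border or shape descriptors.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_ratio(feature, labels):
    """Gain ratio = information gain / intrinsic value, for a discrete feature."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    cond_entropy = sum(w * entropy(labels[feature == v])
                       for v, w in zip(values, weights))
    info_gain = entropy(labels) - cond_entropy
    intrinsic = entropy(feature)
    return info_gain / intrinsic if intrinsic > 0 else 0.0

# Toy example: a binarised texture feature vs. lesion class (illustrative only).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)                           # 0 = benign, 1 = malignant
feature = np.where(rng.random(200) < 0.7, labels, 1 - labels)   # noisy copy of the label

print(f"gain ratio = {gain_ratio(feature, labels):.3f}")
```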
Evaluation of complex community-based childhood obesity prevention interventions.
Karacabeyli, D; Allender, S; Pinkney, S; Amed, S
2018-05-16
Multi-setting, multi-component community-based interventions have shown promise in preventing childhood obesity; however, evaluation of these complex interventions remains a challenge. The objective of the study is to systematically review published methodological approaches to outcome evaluation for multi-setting community-based childhood obesity prevention interventions and synthesize a set of pragmatic recommendations. MEDLINE, CINAHL and PsycINFO were searched from inception to 6 July 2017. Papers were included if the intervention targeted children ≤18 years, engaged at least two community sectors and described their outcome evaluation methodology. A single reviewer conducted title and abstract scans, full article review and data abstraction. Directed content analysis was performed by three reviewers to identify prevailing themes. Thirty-three studies were included, and of these, 26 employed a quasi-experimental design; the remaining were randomized controlled trials. Body mass index was the most commonly measured outcome, followed by health behaviour change and psychosocial outcomes. Six themes emerged, highlighting advantages and disadvantages of active vs. passive consent, quasi-experimental vs. randomized controlled trials, longitudinal vs. repeat cross-sectional designs and the roles of process evaluation and methodological flexibility in evaluating complex interventions. Selection of study designs and outcome measures compatible with community infrastructure, accompanied by process evaluation, may facilitate successful outcome evaluation. © 2018 World Obesity Federation.
Velema, Elizabeth; Vyth, Ellis L; Steenhuis, Ingrid H M
2017-01-11
The worksite cafeteria is a suitable setting for interventions focusing on changing eating behavior, because many employees visit the worksite cafeteria regularly and a variety of interventions can be implemented there. The aim of this paper is to describe the intervention development and the design of the evaluation of an intervention to make the purchase behavior of employees in the worksite cafeteria healthier. The developed intervention, called "the worksite cafeteria 2.0", consists of a set of 19 strategies based on the theory of nudging and social marketing (marketing mix). The intervention will be evaluated in a real-life setting, that is, Dutch worksite cafeterias of different companies and with a number of contract catering organizations. The study is a randomized controlled trial (RCT), with 34 Dutch worksite cafeterias randomly allocated to the 12-week intervention or to the control group. Primary outcomes are sales data of selected product groups such as sandwiches, salads, snacks and bread topping. Secondary outcomes are employee satisfaction with the cafeteria and vitality. When executed, the described RCT will provide better insight into the effect of the intervention "the worksite cafeteria 2.0" on the purchasing behavior of Dutch employees in worksite cafeterias. Dutch Trial register: NTR5372.
HIV salvage therapy does not require nucleoside reverse transcriptase inhibitors: a randomized trial
Tashima, Karen T; Smeaton, Laura M; Fichtenbaum, Carl J; Andrade, Adriana; Eron, Joseph J; Gandhi, Rajesh T; Johnson, Victoria A; Klingman, Karin L; Ritz, Justin; Hodder, Sally; Santana, Jorge L; Wilkin, Timothy; Haubrich, Richard H
2015-01-01
Background Nucleoside reverse transcriptase inhibitors (NRTIs) are often included in antiretroviral (ARV) regimens in treatment-experienced patients in the absence of data from randomized trials. Objective To compare treatment success between participants who Omit versus Add NRTIs to an optimized ARV regimen of three or more agents. Design Multisite, randomized, controlled trial. Setting Outpatient HIV clinics. Participants HIV-infected patients with three-class ARV experience and/or viral resistance. Intervention Open-label optimized regimens (not including NRTIs) were selected based upon treatment history and susceptibility testing. Participants were randomized to Omit or Add NRTIs. Measurements The primary efficacy outcome was regimen failure through week 48, using a non-inferiority margin of 15%. The primary safety outcome was time to initial episode of severe sign/symptom or laboratory abnormality prior to discontinuation of NRTI assignment. Results 360 participants were randomized and 93% completed a week 48 visit. The cumulative probability of regimen failure was 29.8% in the Omit NRTI arm versus 25.9% in the Add NRTI arm (difference = 3.2%; 95% CI, −6.1 to 12.5). There were no significant differences in the primary safety endpoints or the proportion of participants with HIV RNA <50 copies/mL between arms. No deaths occurred in the Omit NRTIs arm, compared with 7 deaths in the Add NRTIs arm. Limitations Non-blinded study design; findings may not be applicable to resource-poor settings. Conclusion HIV-infected treatment-experienced patients starting a new optimized regimen can safely omit NRTIs without compromising virologic efficacy. Omitting NRTIs will reduce pill burden, cost, and toxicity in this patient population. PMID:26595748
Blessing of dimensionality: mathematical foundations of the statistical physics of data.
Gorban, A N; Tyukin, I Y
2018-04-28
The concentrations of measure phenomena were discovered as the mathematical background to statistical mechanics at the end of the nineteenth/beginning of the twentieth century and have been explored in mathematics ever since. At the beginning of the twenty-first century, it became clear that the proper utilization of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarizes recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median-level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us with such classifiers and determine a non-iterative (one-shot) procedure for their construction. This article is part of the theme issue 'Hilbert's sixth problem'. © 2018 The Author(s).
Blessing of dimensionality: mathematical foundations of the statistical physics of data
NASA Astrophysics Data System (ADS)
Gorban, A. N.; Tyukin, I. Y.
2018-04-01
The concentrations of measure phenomena were discovered as the mathematical background to statistical mechanics at the end of the nineteenth/beginning of the twentieth century and have been explored in mathematics ever since. At the beginning of the twenty-first century, it became clear that the proper utilization of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarizes recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension, and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median-level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the thin structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us with such classifiers and determine a non-iterative (one-shot) procedure for their construction. This article is part of the theme issue `Hilbert's sixth problem'.
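The one-shot corrector built from a linear Fisher discriminant can be sketched as follows: given a large random set and a single "error" sample, the discriminant direction is estimated from the (regularised) covariance of the set and thresholded so that the single point is separated from the rest. The dimension, sample size, and Gaussian data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 5000                       # high dimension, large random set

background = rng.normal(size=(n, d))   # i.i.d. random points ("correct" samples)
error_point = rng.normal(size=d)       # the single sample to be separated

# Fisher-style direction: inverse covariance applied to the difference of means,
# regularised for numerical stability.
cov = np.cov(background, rowvar=False) + 1e-3 * np.eye(d)
w = np.linalg.solve(cov, error_point - background.mean(axis=0))

# One-shot (non-iterative) threshold halfway between the projections.
proj_bg = background @ w
proj_err = error_point @ w
threshold = 0.5 * (proj_bg.max() + proj_err)

separated = proj_err > threshold and (proj_bg <= threshold).all()
print(f"error point linearly separated from all {n} background points: {separated}")
```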
Sellors, John; Kaczorowski, Janusz; Sellors, Connie; Dolovich, Lisa; Woodward, Christel; Willan, Andrew; Goeree, Ron; Cosby, Roxanne; Trim, Kristina; Sebaldt, Rolf; Howard, Michelle; Hardcastle, Linda; Poston, Jeff
2003-01-01
Background Pharmacists can improve patient outcomes in institutional and pharmacy settings, but little is known about their effectiveness as consultants to primary care physicians. We examined whether an intervention by a specially trained pharmacist could reduce the number of daily medication units taken by elderly patients, as well as costs and health care use. Methods We conducted a randomized controlled trial in family practices in 24 sites in Ontario. We randomly allocated 48 randomly selected family physicians (69.6% participation rate) to the intervention or the control arm, along with 889 (69.5% participation rate) of their randomly selected community-dwelling, elderly patients who were taking 5 or more medications daily. In the intervention group, pharmacists conducted face-to-face medication reviews with the patients and then gave written recommendations to the physicians to resolve any drug-related problems. Process outcomes included the number of drug-related problems identified among the senior citizens in the intervention arm and the proportion of recommendations implemented by the physicians. Results After 5 months, seniors in the intervention and control groups were taking a mean of 12.4 and 12.2 medication units per day respectively (p = 0.50). There were no statistically significant differences in health care use or costs between groups. A mean of 2.5 drug-related problems per senior was identified in the intervention arm. Physicians implemented or attempted to implement 72.3% (790/1093) of the recommendations. Interpretation The intervention did not have a significant effect on patient outcomes. However, physicians were receptive to the recommendations to resolve drug-related problems, suggesting that collaboration between physicians and pharmacists is feasible. PMID:12847034
Cecchini, M; Warin, L
2016-03-01
Food labels are considered a crucial component of strategies tackling unhealthy diets and obesity. This study aims at assessing the effectiveness of food labelling in increasing the selection of healthier products and in reducing calorie intake. In addition, this study compares the relative effectiveness of traffic light schemes, Guideline Daily Amount and other food labelling schemes. A comprehensive set of databases were searched to identify randomized studies. Studies reporting homogeneous outcomes were pooled together and analysed through meta-analyses. Publication bias was evaluated with a funnel plot. Food labelling would increase the number of people selecting a healthier food product by about 17.95% (confidence interval: +11.24% to +24.66%). Food labelling would also decrease calorie intake/choice by about 3.59% (confidence interval: -8.90% to +1.72%), but results are not statistically significant. Traffic light schemes are marginally more effective in increasing the selection of healthier options. Other food labels and Guideline Daily Amount follow. The available evidence did not allow studying the effects of single labelling schemes on calorie intake/choice. Findings of this study suggest that nutrition labelling may be an effective approach to empowering consumers in choosing healthier products. Interpretive labels, such as traffic light labels, may be more effective. © 2015 World Obesity.
Caries status in 16 year-olds with varying exposure to water fluoridation in Ireland.
Mullen, J; McGaffin, J; Farvardin, N; Brightman, S; Haire, C; Freeman, R
2012-12-01
Most of the Republic of Ireland's public water supplies have been fluoridated since the mid-1960s, while Northern Ireland has never been fluoridated, apart from some small short-lived schemes in east Ulster. This study examines dental caries status in 16 year-olds in a part of Ireland straddling fluoridated and non-fluoridated water supply areas and compares two methods of assessing the effectiveness of water fluoridation. The cross-sectional survey tested differences in caries status by two methods: (1) Estimated Fluoridation Status, as used previously in national and regional studies in the Republic and in the All-Island study of 2002; (2) Percentage Lifetime Exposure, a modification of a system described by Slade in 1995 and used in Australian caries research. Adolescents were selected for the study by a two-part random sampling process. Firstly, schools were selected in each area by creating three tiers based on school size and selecting schools randomly from each tier. Then 16-year-olds were randomly sampled from these schools, based on a pre-set sampling fraction for each tier of schools. With both systems of measurement, significantly lower caries levels were found in those children with the greatest exposure to fluoridated water when compared to those with the least exposure. The survey provides further evidence of the effectiveness of water fluoridation in reducing dental caries experience up to 16 years of age. The extra intricacies involved in using the Percentage Lifetime Exposure method did not provide much more information when compared to the simpler Estimated Fluoridation Status method.
Lancarotte, Inês; Nobre, Moacyr Roberto
2016-01-01
The aim of this study was to identify and reflect on the methods employed by studies focusing on intervention programs for the primordial and primary prevention of cardiovascular diseases. The PubMed, EMBASE, SciVerse Hub-Scopus, and Cochrane Library electronic databases were searched using the terms ‘effectiveness AND primary prevention AND risk factors AND cardiovascular diseases’ for systematic reviews, meta-analyses, randomized clinical trials, and controlled clinical trials in the English language. A descriptive analysis of the employed strategies, theories, frameworks, applied activities, and measurement of the variables was conducted. Nineteen primary studies were analyzed. Heterogeneity was observed in the outcome evaluations, not only in the selected domains but also in the indicators used to measure the variables. There was also a predominance of repeated cross-sectional survey design, differences in community settings, and variability related to the randomization unit when randomization was implemented as part of the sample selection criteria; furthermore, particularities related to measures, limitations, and confounding factors were observed. The employed strategies, including their advantages and limitations, and the employed theories and frameworks are discussed, and risk communication, as the key element of the interventions, is emphasized. A methodological process of selecting and presenting the information to be communicated is recommended, and a systematic theoretical perspective to guide the communication of information is advised. The risk assessment concept, its essential elements, and the relevant role of risk perception are highlighted. It is fundamental for communication that statements targeting other people’s understanding be prepared using systematic data. PMID:27982169
May, Larissa S.; Rothman, Richard E.; Miller, Loren G.; Brooks, Gillian; Zocchi, Mark; Zatorski, Catherine; Dugas, Andrea F.; Ware, Chelsea E.; Jordan, Jeanne A.
2017-01-01
OBJECTIVE To determine whether real-time availability of rapid molecular results of Staphylococcus aureus would impact emergency department clinician antimicrobial selection for adults with cutaneous abscesses. DESIGN We performed a prospective, randomized controlled trial comparing a rapid molecular test with standard of care culture-based testing. Follow-up telephone calls were made at between 2 and 7 days, 1 month, and 3 months after discharge. SETTING Two urban, academic emergency departments. PATIENTS Patients at least 18 years old presenting with a chief complaint of abscess, cellulitis, or insect bite and receiving incision and drainage were eligible. Seven hundred seventy-eight people were assessed for eligibility and 252 met eligibility criteria. METHODS Clinician antibiotic selection and clinical outcomes were evaluated. An ad hoc outcome of test performance was performed. RESULTS We enrolled 252 patients and 126 were randomized to receive the rapid test. Methicillin-susceptible S. aureus–positive patients receiving rapid test results were prescribed beta-lactams more often than controls (absolute difference, 14.5% [95% CI, 1.1%–30.1%]) whereas methicillin-resistant S. aureus–positive patients receiving rapid test results were more often prescribed anti–methicillin-resistant S. aureus antibiotics (absolute difference, 21.5% [95% CI, 10.1%–33.0%]). There were no significant differences between the 2 groups in 1-week or 3-month clinical outcomes. CONCLUSION Availability of rapid molecular test results after incision and drainage was associated with more-targeted antibiotic selection. TRIAL REGISTRATION clinicaltrials.gov Identifier: NCT01523899 PMID:26306996
Cheng, Qiang; Zhou, Hongbo; Cheng, Jie
2011-06-01
Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain a local optimum instead of the global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selector--which we call the Fisher-Markov selector--to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including a mid-dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure shows that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which, in fact, can be obtained with an explicit expression.
O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John BO
2008-01-01
Background We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024–1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581–590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Results Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6°C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, ε of 0.21) and an RMSE of 45.1°C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3°C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5°C, R2 of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. Conclusion With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors. PMID:18959785
Sperm competition games: sperm selection by females.
Ball, M A; Parker, G A
2003-09-07
We analyse a co-evolutionary sexual conflict game, in which males compete for fertilizations (sperm competition) and females operate sperm selection against unfavourable ejaculates (cryptic female choice). For simplicity, each female mates with two males per reproductive event, and the competing ejaculates are of two types, favourable (having high viability or success) or unfavourable (where progeny are less successful). Over evolutionary time, females can increase their level of sperm selection (measured as the proportion of unfavourable sperm eliminated) by paying a fecundity cost. Males can regulate sperm allocations depending on whether they will be favoured or disfavoured, but increasing sperm allocation reduces their mating rate. The resolution of this game depends on whether males are equal, or unequal. Males could be equal: each is favoured with probability p, reflecting the proportion of females in the population that favour his ejaculate (the 'random-roles' model); different males are favoured by different sets of females. Alternatively, males could be unequal: given males are perceived consistently by all females as two distinct types, favoured and disfavoured, where p is now the frequency of the favoured male type in the population (the 'constant-types' model). In both cases, the evolutionarily stable strategy (ESS) is for females initially to increase sperm selection from zero as the viability of offspring from unfavourable ejaculates falls below that of favourable ejaculates. But in the random-roles model, sperm selection decreases again towards zero as the unfavourable ejaculates become disastrous (i.e. as their progeny viability decreases towards zero). This occurs because males avoid expenditure in unfavourable matings, to conserve sperm for matings in the favoured role where their offspring have high viability, thus allowing females to relax sperm selection. If sperm selection is costly to females, ESS sperm selection is high across a region of intermediate viabilities. If it is uncostly, there is no ESS in this region unless sperm limitation (i.e. some eggs fail to be fertilized because sperm numbers are too low) is included in the model. In the constant-types model, no relaxation of sperm selection occurs at very low viabilities of disfavoured male progeny. If sperm selection is sufficiently costly, ESS sperm selection increases as progeny viability decreases down towards zero; but if it is uncostly, there is no ESS at the lowest viabilities, and unlike the random-roles model, this cannot be stabilized by including sperm limitation. Sperm allocations in the ESS regions differ between the two models. With random roles, males always allocate more sperm in the favoured role. With constant types, the male type that is favoured allocates less sperm than the disfavoured type. These results suggest that empiricists studying cryptic female choice and sperm allocation patterns need to determine whether sperm selection is applied differently, or consistently, on given males by different females in the same population.
The Nature and Meaning of Body Concepts in Everyday Language and Theoretical Discourse.
Pollio, Howard R; Finn, Mike; Custer, Morgun
2016-06-01
Within phenomenological philosophy four topics, (1) Body, (2) Time, (3) Others and the Social Order and (4) World, serve as the major contexts in which human perception, action and reflection take place. At present only three of these domains have been studied from an empirical perspective, leaving Body as the one domain requiring further analysis. Given this state of affairs, the purpose of the present study is to determine the everyday and theoretical meanings of body. To accomplish this task participants coded randomly selected body-related words into groups on the basis of having similar meanings. Once these groupings were established they were then evaluated by statistical clustering and multidimensional scaling procedures. Results indicated that it was possible to define the everyday meaning of the human experience of the body in terms of the following set of themes: (1) inside/outside, (2) visible/not visible, (3) vitality and activity, (4) instrument and object and (5) appearance and self-expression. Concerns about the representativeness of the words studied led to the development and use of individual word pools from which a set of 50 partially different words was randomly selected for each participant. Results indicated little difference between themes produced in the present study when compared with those of an earlier study. The specific themes derived from the present study were then related to embodiment issues as reflected in the philosophical writings of Merleau-Ponty, the psycholinguistic analyses of Lakoff and Johnson and experimental psychology.
NASA Astrophysics Data System (ADS)
Algrain, Marcelo C.; Powers, Richard M.
1997-05-01
A case study, written in a tutorial manner, is presented where a comprehensive computer simulation is developed to determine the driving factors contributing to spacecraft pointing accuracy and stability. Models for major system components are described. Among them are the spacecraft bus, attitude controller, reaction wheel assembly, star-tracker unit, inertial reference unit, and gyro drift estimators (Kalman filter). The predicted spacecraft performance is analyzed for a variety of input commands and system disturbances. The primary deterministic inputs are the desired attitude angles and rate set points. The stochastic inputs include random torque disturbances acting on the spacecraft, random gyro bias noise, gyro random walk, and star-tracker noise. These inputs are varied over a wide range to determine their effects on pointing accuracy and stability. The results are presented in the form of trade-off curves designed to facilitate the proper selection of subsystems so that overall spacecraft pointing accuracy and stability requirements are met.
Asiimwe, Stephen; Oloya, James; Song, Xiao; Whalen, Christopher C
2014-12-01
Unsupervised HIV self-testing (HST) has potential to increase knowledge of HIV status; however, its accuracy is unknown. To estimate the accuracy of unsupervised HST in field settings in Uganda, we performed a non-blinded, randomized controlled, non-inferiority trial of unsupervised compared with supervised HST among selected high HIV risk fisherfolk (22.1% HIV prevalence) in three fishing villages in Uganda between July and September 2013. The study enrolled 246 participants and randomized them in a 1:1 ratio to unsupervised HST or provider-supervised HST. In an intent-to-treat analysis, the HST sensitivity was 90% in the unsupervised arm and 100% among the provider-supervised, yielding a difference of -10% (90% CI -21, 1%); non-inferiority was not shown. In a per protocol analysis, the difference in sensitivity was -5.6% (90% CI -14.4, 3.3%) and did show non-inferiority. We conclude that unsupervised HST is feasible in rural Africa and may be non-inferior to provider-supervised HST.
Demaerschalk, Bart M; Brown, Robert D; Roubin, Gary S; Howard, Virginia J; Cesko, Eldina; Barrett, Kevin M; Longbottom, Mary E; Voeks, Jenifer H; Chaturvedi, Seemant; Brott, Thomas G; Lal, Brajesh K; Meschia, James F; Howard, George
2017-09-01
Multicenter clinical trials attempt to select sites that can move rapidly to randomization and enroll sufficient numbers of patients. However, there are few assessments of the success of site selection. In the CREST-2 (Carotid Revascularization and Medical Management for Asymptomatic Carotid Stenosis Trials), we assess factors associated with the time between site selection and authorization to randomize, the time between authorization to randomize and the first randomization, and the average number of randomizations per site per month. Potential factors included characteristics of the site, specialty of the principal investigator, and site type. For 147 sites, the median time between site selection to authorization to randomize was 9.9 months (interquartile range, 7.7, 12.4), and factors associated with early site activation were not identified. The median time between authorization to randomize and a randomization was 4.6 months (interquartile range, 2.6, 10.5). Sites with authorization to randomize in only the carotid endarterectomy study were slower to randomize, and other factors examined were not significantly associated with time-to-randomization. The recruitment rate was 0.26 (95% confidence interval, 0.23-0.28) patients per site per month. By univariate analysis, factors associated with faster recruitment were authorization to randomize in both trials, principal investigator specialties of interventional radiology and cardiology, pre-trial reported performance >50 carotid angioplasty and stenting procedures per year, status in the top half of recruitment in the CREST trial, and classification as a private health facility. Participation in StrokeNet was associated with slower recruitment as compared with the non-StrokeNet sites. Overall, selection of sites with high enrollment rates will likely require customization to align the sites selected to the factor under study in the trial. URL: http://www.clinicaltrials.gov. Unique identifier: NCT02089217. © 2017 American Heart Association, Inc.
ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping
2013-01-01
Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over- and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869
Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping
2014-02-15
Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. © 2013 Elsevier Inc. All rights reserved.
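K-Medoids-based undersampling of the majority class, identified above as the best-performing balancing strategy, can be sketched as below. Because scikit-learn has no built-in K-Medoids, the sketch approximates medoids by clustering the majority class with KMeans and keeping the real sample nearest to each centroid; this substitution and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)

# Synthetic imbalanced data standing in for an imaging-biomarker matrix.
X_major = rng.normal(0.0, 1.0, size=(600, 20))   # e.g. MCI cases (majority class)
X_minor = rng.normal(0.5, 1.0, size=(100, 20))   # e.g. AD cases (minority class)

k = len(X_minor)  # undersample the majority class down to the minority-class size
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_major)

# Medoid approximation: the actual majority sample closest to each cluster centre.
medoid_idx = pairwise_distances_argmin(km.cluster_centers_, X_major)
X_major_balanced = X_major[np.unique(medoid_idx)]

X_train = np.vstack([X_major_balanced, X_minor])
y_train = np.hstack([np.zeros(len(X_major_balanced)), np.ones(len(X_minor))])
print("balanced training set:", X_train.shape,
      "class counts:", np.bincount(y_train.astype(int)))
```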
Pugliese, Laura; Woodriff, Molly; Crowley, Olga; Lam, Vivian; Sohn, Jeremy; Bradley, Scott
2016-03-16
Rising rates of smartphone ownership highlight opportunities for improved mobile application usage in clinical trials. While current methods call for device provisioning, the "bring your own device" (BYOD) model permits participants to use personal phones, allowing for improved patient engagement and lowered operational costs. However, more evidence is needed to demonstrate the BYOD model's feasibility in research settings. To assess if CentrosHealth, a mobile application designed to support trial compliance, produces different outcomes in medication adherence and application engagement when distributed through study-provisioned devices compared to the BYOD model. 87 participants were randomly assigned to use the mobile application or to no intervention for a 28-day pilot study at a 2:1 randomization ratio (2 intervention: 1 control) and asked to consume a twice-daily probiotic supplement. The application users were further randomized into two groups: receiving the application on a personal "BYOD" or study-provided smartphone. In-depth interviews were performed in a randomly selected subset of the intervention group (five BYOD and five study-provided smartphone users). The BYOD subgroup showed significantly greater engagement than study-provided phone users, as shown by higher application use frequency and duration over the study period. The BYOD subgroup also demonstrated a significant effect of engagement on medication adherence for number of application sessions (unstandardized regression coefficient beta=0.0006, p=0.02) and time spent therein (beta=0.00001, p=0.03). Study-provided phone users showed higher initial adherence rates, but greater decline (5.7%) than BYOD users (0.9%) over the study period. In-depth interviews revealed that participants preferred the BYOD model over using study-provided devices. Results indicate that the BYOD model is feasible in health research settings and improves participant experience, calling for further BYOD model validity assessment. Although group differences in medication adherence decline were not statistically significant, the greater trend of decline in provisioned device users warrants further investigation to determine if trends reach significance over time. Significantly higher application engagement rates and the effect of engagement on medication adherence in the BYOD subgroup similarly imply that greater application engagement may correlate with better medication adherence over time.
Shireman, Theresa I; Mahnken, Jonathan D; Phadnis, Milind A; Ellerbeck, Edward F
2016-03-25
Within-class comparative effectiveness studies of β-blockers have not been performed in the chronic dialysis setting. With widespread cardiac disease in these patients and potential mechanistic differences within the class, we examined whether mortality and morbidity outcomes varied between cardio-selective and non-selective β-blockers. Retrospective observational study of within-class β-blocker exposure among a national cohort of new chronic dialysis patients (N = 52,922) with hypertension and dual eligibility (Medicare-Medicaid). New β-blocker users were classified according to their exclusive use of one of the subclasses. Outcomes were all-cause mortality (ACM) and cardiovascular morbidity and mortality (CVMM). The associations of cardio-selective and non-selective agents with outcomes were adjusted for baseline characteristics using Cox proportional hazards. There were 4938 new β-blocker users included in the ACM model and 4537 in the CVMM model: 77% on cardio-selective β-blockers. Exposure to cardio-selective and non-selective agents during the follow-up period was comparable, as measured by proportion of days covered (0.56 vs. 0.53 in the ACM model; 0.56 vs. 0.54 in the CVMM model). Use of cardio-selective β-blockers was associated with lower risk for mortality (AHR = 0.84; 99% CI = 0.72-0.97, p = 0.0026) and lower risk for CVMM events (AHR = 0.86; 99% CI = 0.75-0.99, p = 0.0042). Among new β-blocker users on chronic dialysis, cardio-selective agents were associated with a statistically significant 16% reduction in mortality and a 14% reduction in cardiovascular morbidity and mortality relative to non-selective β-blocker users. A randomized clinical trial would be appropriate to more definitively answer whether cardio-selective β-blockers are superior to non-selective β-blockers in the setting of chronic dialysis.
Comparison of Oral Reading Errors between Contextual Sentences and Random Words among Schoolchildren
ERIC Educational Resources Information Center
Khalid, Nursyairah Mohd; Buari, Noor Halilah; Chen, Ai-Hong
2017-01-01
This paper compares the oral reading errors between the contextual sentences and random words among schoolchildren. Two sets of reading materials were developed to test the oral reading errors in 30 schoolchildren (10.00±1.44 years). Set A comprised contextual sentences, while Set B encompassed random words. The schoolchildren were asked to…
Oster, Natalia V; Carney, Patricia A; Allison, Kimberly H; Weaver, Donald L; Reisch, Lisa M; Longton, Gary; Onega, Tracy; Pepe, Margaret; Geller, Berta M; Nelson, Heidi D; Ross, Tyler R; Tosteson, Anna N A; Elmore, Joann G
2013-02-05
Diagnostic test sets are a valuable research tool that contributes importantly to the validity and reliability of studies that assess agreement in breast pathology. In order to fully understand the strengths and weaknesses of any agreement and reliability study, however, the methods should be fully reported. In this paper we provide a step-by-step description of the methods used to create four complex test sets for a study of diagnostic agreement among pathologists interpreting breast biopsy specimens. We use the newly developed Guidelines for Reporting Reliability and Agreement Studies (GRRAS) as a basis to report these methods. Breast tissue biopsies were selected from the National Cancer Institute-funded Breast Cancer Surveillance Consortium sites. We used random sampling stratified according to the woman's age (40-49 vs. ≥50), parenchymal breast density (low vs. high) and the interpretation of the original pathologist. A 3-member panel of expert breast pathologists first independently interpreted each case using five primary diagnostic categories (non-proliferative changes, proliferative changes without atypia, atypical ductal hyperplasia, ductal carcinoma in situ, and invasive carcinoma). When the experts did not unanimously agree on a case diagnosis, a modified Delphi method was used to determine the reference standard consensus diagnosis. The final test cases were stratified and randomly assigned into one of four unique test sets. We found the GRRAS recommendations to be very useful in reporting diagnostic test set development and recommend inclusion of two additional criteria: 1) characterizing the study population and 2) describing the methods for reference diagnosis, when applicable.
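The stratified sampling and random assignment into four test sets can be sketched with pandas: cases are sampled within strata defined by age group, breast density and the original diagnosis, then shuffled and dealt round-robin into four sets. The case counts, stratum quotas and categories below are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_cases = 400

# Illustrative case registry standing in for the BCSC biopsy pool.
cases = pd.DataFrame({
    "case_id": np.arange(n_cases),
    "age_group": rng.choice(["40-49", ">=50"], size=n_cases),
    "density": rng.choice(["low", "high"], size=n_cases),
    "original_dx": rng.choice(
        ["non-proliferative", "proliferative", "ADH", "DCIS", "invasive"],
        size=n_cases),
})

# Stratified sample: up to 12 cases per stratum (an assumed quota).
sampled = (cases.groupby(["age_group", "density", "original_dx"], group_keys=False)
                .apply(lambda g: g.sample(n=min(len(g), 12), random_state=0)))

# Shuffle and deal the sampled cases round-robin into four test sets.
sampled = sampled.sample(frac=1.0, random_state=1).reset_index(drop=True)
sampled["test_set"] = sampled.index % 4 + 1
print(sampled["test_set"].value_counts().sort_index())
```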
Forecasting Space Weather-Induced GPS Performance Degradation Using Random Forest
NASA Astrophysics Data System (ADS)
Filjar, R.; Filic, M.; Milinkovic, F.
2017-12-01
Space weather and ionospheric dynamics have a profound effect on the positioning performance of the Global Navigation Satellite System (GNSS). However, the quantification of that effect is still the subject of scientific activities around the world. In the latest contribution to the understanding of space weather and ionospheric effects on satellite-based positioning performance, we conducted a study of several candidate forecasting methods for space weather-induced GPS positioning performance deterioration. First, a 5-day set of experimentally collected data was established, encompassing space weather and ionospheric activity indices (including the readings of Sudden Ionospheric Disturbance (SID) monitors, components of geomagnetic field strength, the global Kp index, the Dst index, GPS-derived Total Electron Content (TEC) samples, the standard deviation of TEC samples, and the sunspot number) and observations of GPS positioning error components (northing, easting, and height positioning error) derived from the Adriatic Sea IGS reference stations' RINEX raw pseudorange files in quiet space weather periods. This data set was split into training and test sub-sets. Then, a selected set of supervised machine learning methods based on Random Forest was applied to the experimentally collected data set in order to establish appropriate regional (Adriatic Sea) forecasting models for space weather-induced GPS positioning performance deterioration. The forecasting models were developed in the R/rattle statistical programming environment. The forecasting quality of the regional models was assessed, and conclusions were drawn on the advantages and shortcomings of regional forecasting models for space weather-caused GNSS positioning performance deterioration.
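A regional forecasting model of this kind (Random Forest regression from space-weather and ionospheric indices to a GPS positioning-error component) can be sketched with scikit-learn rather than R/rattle; the feature names and synthetic data below are assumptions, not the experimentally collected set.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5 * 24 * 60  # five days of 1-minute samples (illustrative)

# Synthetic stand-ins for the space weather and ionospheric activity indices.
X = pd.DataFrame({
    "kp": rng.uniform(0, 4, n),
    "dst": rng.normal(-10, 15, n),
    "tec": rng.uniform(5, 40, n),
    "tec_std": rng.uniform(0, 3, n),
    "sunspots": rng.integers(0, 120, n),
})
# Synthetic northing error loosely driven by the indices (an assumption).
y = 0.05 * X["tec"] + 0.2 * X["tec_std"] + rng.normal(0, 0.3, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"mean absolute error on the test sub-set: {mae:.3f} m")
```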
Classification of urine sediment based on convolution neural network
NASA Astrophysics Data System (ADS)
Pan, Jingjing; Jiang, Cunbo; Zhu, Tiantian
2018-04-01
By designing a new convolutional neural network framework, this paper removes the constraints of the original convolutional neural network framework, which requires large numbers of training samples and inputs of the same size. Moving and cropping the input images generates sub-images of the same size. Dropout is then applied to the generated sub-images, increasing the diversity of the samples and preventing overfitting. Proper subsets of the sub-image set are randomly selected such that each subset contains the same number of elements and no two subsets are identical. These proper subsets are used as inputs to the convolutional neural network. Through the convolution layers, pooling, the fully connected layer and the output layer, the classification loss rates of the test set and training set are obtained. In classification experiments on red blood cells, white blood cells and calcium oxalate crystals, the classification accuracy reached 97% or more.
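The moving-crop and subset-selection steps can be sketched with NumPy: slide a fixed window over each input image to produce equal-size sub-images, then draw several distinct proper subsets of that pool to feed the network. The image size, window size and stride are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def moving_crops(image, win=32, stride=16):
    """Move a window across the image and crop equal-size sub-images."""
    h, w = image.shape
    return np.array([image[r:r + win, c:c + win]
                     for r in range(0, h - win + 1, stride)
                     for c in range(0, w - win + 1, stride)])

image = rng.random((96, 128))          # stand-in for one urine-sediment image
subs = moving_crops(image)             # equal-size sub-images
print("sub-images:", subs.shape)       # (35, 32, 32) for these sizes

# Draw several distinct proper subsets of equal size from the sub-image pool.
subset_size, n_subsets, chosen = subs.shape[0] // 2, 4, set()
while len(chosen) < n_subsets:
    idx = tuple(sorted(rng.choice(subs.shape[0], size=subset_size, replace=False)))
    chosen.add(idx)                    # set membership guarantees no two subsets repeat
batches = [subs[list(idx)] for idx in chosen]
print("shape of each CNN input batch:", batches[0].shape)
```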
Whose data set is it anyway? Sharing raw data from randomized trials.
Vickers, Andrew J
2006-05-16
Sharing of raw research data is common in many areas of medical research, genomics being perhaps the most well-known example. In the clinical trial community investigators routinely refuse to share raw data from a randomized trial without giving a reason. Data sharing benefits numerous research-related activities: reproducing analyses; testing secondary hypotheses; developing and evaluating novel statistical methods; teaching; aiding design of future trials; meta-analysis; and, possibly, preventing error, fraud and selective reporting. Clinical trialists, however, sometimes appear overly concerned with being scooped and with misrepresentation of their work. Both possibilities can be avoided with simple measures such as inclusion of the original trialists as co-authors on any publication resulting from data sharing. Moreover, if we treat any data set as belonging to the patients who comprise it, rather than the investigators, such concerns fall away. Technological developments, particularly the Internet, have made data sharing generally a trivial logistical problem. Data sharing should come to be seen as an inherent part of conducting a randomized trial, similar to the way in which we consider ethical review and publication of study results. Journals and funding bodies should insist that trialists make raw data available, for example, by publishing data on the Web. If the clinical trial community continues to fail with respect to data sharing, we will only strengthen the public perception that we do clinical trials to benefit ourselves, not our patients.
NASA Astrophysics Data System (ADS)
Litvinenko, S. V.; Bielobrov, D. O.; Lysenko, V.; Skryshevsky, V. A.
2016-08-01
An electronic tongue based on an array of low-selectivity photovoltaic (PV) sensors and principal component analysis is proposed for the detection of various alcohol solutions. The sensor array is created by forming a p-n junction on a silicon wafer with a porous silicon layer on the opposite side. A dynamical set of sensors arises from the inhomogeneous distribution of the surface recombination rate across this porous silicon side. A photocurrent sensitive to molecular adsorption is induced by scanning this side with a laser beam. Water, ethanol, iso-propanol, and their mixtures were selected for testing. It is shown that exploiting the random dispersion of surface recombination rates across different spots on the rear side of the p-n junction, together with principal component analysis of the PV signals, allows the listed liquid substances and their mixtures to be identified.
Saunders, Gabrielle H; Biswas, Kousick; Serpi, Tracey; McGovern, Stephanie; Groer, Shirley; Stock, Eileen M; Magruder, Kathryn M; Storzbach, Daniel; Skelton, Kelly; Abrams, Thad; McCranie, Mark; Richerson, Joan; Dorn, Patricia A; Huang, Grant D; Fallon, Michael T
2017-11-01
Posttraumatic stress disorder (PTSD) is a leading cause of impairments in quality of life and functioning among Veterans. Service dogs have been promoted as an effective adjunctive intervention for PTSD; however, published research is limited, and design and implementation flaws in published studies limit valid conclusions. This paper describes the rationale for the study design, a detailed methodological description, and implementation challenges of a multisite randomized clinical trial examining the impact of service dogs on the functioning and quality of life of Veterans with PTSD. Trial design considerations prioritized participant and intervention (dog) safety, selection of an intervention comparison group that would optimize enrollment in all treatment arms, pragmatic methods to ensure healthy well-trained dogs, and the selection of outcomes for achieving scientific and clinical validity in a Veteran PTSD population. Since there is no blueprint for conducting a randomized clinical trial examining the impact of dogs on PTSD of this size and scope, it is our primary intent that the successful completion of this trial will set a benchmark for future trial design and scientific rigor, as well as guiding researchers aiming to better understand the role that dogs can have in the management of Veterans experiencing mental health conditions such as PTSD. Published by Elsevier Inc.
Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E
2014-06-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.
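The core expansion step can be illustrated with a simplified weighted Polya-urn sketch of the finite population Bayesian bootstrap idea; this shows the general mechanism only and is not the authors' exact algorithm.

```python
# Simplified weighted Polya-urn sketch: expand a weighted complex-sample of size n
# into a synthetic population of size N that can then be analyzed as a simple
# random sample. Weights play the role of inverse selection probabilities.
import numpy as np

def weighted_fpbb(values, weights, N, rng=np.random.default_rng(0)):
    values = np.asarray(values)
    counts = np.asarray(weights, dtype=float).copy()   # urn "mass" per sampled unit
    synthetic = []
    for _ in range(N):
        p = counts / counts.sum()
        i = rng.choice(len(values), p=p)
        synthetic.append(values[i])
        counts[i] += 1.0                                # Polya urn: reinforce the drawn unit
    return np.array(synthetic)

# Toy example: three sampled units with unequal selection weights
pop = weighted_fpbb(values=[2.3, 5.1, 7.8], weights=[10, 40, 50], N=1000)
print(pop.mean())
```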
Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.
2017-01-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608
Ensemble methods with simple features for document zone classification
NASA Astrophysics Data System (ADS)
Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing
2012-01-01
Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via features extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classifications of these blocks. We present a comparative evaluation of three ensemble based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.
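For readers unfamiliar with the ensemble methods compared above, the following minimal scikit-learn sketch contrasts bagging and boosting on a synthetic stand-in for the block-feature matrix; the data and settings are placeholders, not the paper's features or zones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy 5-class problem standing in for the five zone classes
X, y = make_classification(n_samples=2000, n_features=20, n_classes=5,
                           n_informative=10, random_state=0)

for name, clf in [
    ("bagging", BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)),
    ("boosting", AdaBoostClassifier(n_estimators=100, random_state=0)),
]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(name, round(scores.mean(), 3))
```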
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Ivanova, N; Barry, Kerrie
2007-01-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
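A toy sketch of the read-sampling step (randomly drawing fixed-length "reads" from isolate genome sequences) is shown below; the abundances and read length are illustrative and do not correspond to the published data sets.

```python
import random

def sample_reads(genomes, n_reads=10000, read_len=800, seed=1):
    """genomes: dict mapping genome name -> sequence string."""
    rng = random.Random(seed)
    names = list(genomes)
    reads = []
    for _ in range(n_reads):
        name = rng.choice(names)                       # equal abundance for simplicity
        seq = genomes[name]
        start = rng.randrange(0, len(seq) - read_len)
        reads.append((name, seq[start:start + read_len]))
    # Keeping the source genome name with each read allows fidelity checks
    # after assembly, gene prediction, and binning.
    return reads

toy_genomes = {"genomeA": "ACGT" * 500, "genomeB": "TTGCA" * 400}
print(len(sample_reads(toy_genomes, n_reads=10, read_len=50)))
```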
Use of simulated data sets to evaluate the fidelity of Metagenomicprocessing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri
2006-12-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
Shirazi, M; Zeinaloo, A A; Parikh, S V; Sadeghi, M; Taghva, A; Arbabi, M; Kashani, A Sabouri; Alaeddini, F; Lonka, K; Wahlström, R
2008-04-01
The Prochaska model of readiness to change has been proposed for use in educational interventions to improve medical care. To evaluate the impact on readiness to change of an educational intervention on management of depressive disorders based on a modified version of the Prochaska model in comparison with a standard programme of continuing medical education (CME). This is a randomized controlled trial within primary care practices in southern Tehran, Iran. The participants were 192 general physicians (GPs) working in primary care, recruited after random selection and randomized to intervention (n = 96) and control (n = 96). The intervention consisted of interactive, learner-centred educational methods in large and small group settings, depending on the GPs' stages of readiness to change. Change in stage of readiness to change, measured by the modified version of the Prochaska questionnaire, was the main outcome measure. The final number of participants was 78 (81%) in the intervention arm and 81 (84%) in the control arm. Significantly more GPs in the intervention group (57/96 = 59% versus 12/96 = 12%; P < 0.01) changed to higher stages of readiness to change. The intervention effect was 46 percentage points (P < 0.001) and 50 percentage points (P < 0.001) in the large and small group settings, respectively. Educational formats that suit different stages of learning can support primary care doctors in reaching higher stages of behavioural change in the topic of depressive disorders. Our findings have practical implications for conducting CME programmes in Iran and are possibly also applicable in other parts of the world.
47 CFR 1.1604 - Post-selection hearings.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 47 Telecommunication 1 2010-10-01 2010-10-01 false Post-selection hearings. 1.1604 Section 1.1604 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1604 Post-selection hearings. (a) Following the random...
47 CFR 1.1604 - Post-selection hearings.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 1 2011-10-01 2011-10-01 false Post-selection hearings. 1.1604 Section 1.1604 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1604 Post-selection hearings. (a) Following the random...
NASA Astrophysics Data System (ADS)
Tehrany, M. Sh.; Jones, S.
2017-10-01
This paper explores the influence of the extent and density of the inventory data on flood susceptibility mapping, examining the impact of different formats and extents of the flood inventory data on the final susceptibility map. The extreme 2011 Brisbane flood event was used as the case study. Logistic Regression (LR) was selected to perform the modelling, as it is a well-known algorithm in natural hazard modelling owing to its ease of interpretation, rapid processing time, and accurate measurement approach. The LR model was applied using both polygon and point formats of the inventory data. Random samples of 1000, 700, 500, 300, 100 and 50 points were selected, and susceptibility mapping was undertaken using each group of random points. The resultant maps were assessed visually and statistically using the Area Under the Curve (AUC) method. The prediction rates measured for the susceptibility maps produced by the polygon data and by 1000, 700, 500, 300, 100 and 50 random points were 63%, 76%, 88%, 80%, 74%, 71% and 65%, respectively. Evidently, using the polygon format of the inventory data did not lead to reasonable outcomes. In the case of random points, raising the number of points increased the prediction rates, except for 1000 points. Hence, minimum and maximum thresholds for the extent of the inventory must be set prior to the analysis. It is concluded that the extent and format of the inventory data are two of the influential components in the precision of the modelling.
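The experimental design can be sketched as follows: a logistic regression model is fitted on progressively larger random subsets of inventory points and scored by AUC. The conditioning factors and synthetic data below are placeholders, not the Brisbane data set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))   # stand-ins for conditioning factors (slope, elevation, ...)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=2000) > 0).astype(int)

for n_points in [50, 100, 300, 500, 700, 1000]:
    idx = rng.choice(len(X), size=n_points, replace=False)     # random inventory subset
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[idx], y[idx], test_size=0.3, random_state=0, stratify=y[idx])
    proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(n_points, "points -> AUC", round(roc_auc_score(y_te, proba), 3))
```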
Lin, Kuan-Cheng; Hsieh, Yi-Hsiu
2015-10-01
The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a problem of feature subset selection, such as combination optimization problems. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to problems of optimization in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.
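A simplified stand-in for the wrapper idea is sketched below: candidate feature subsets are evaluated with an SVM via cross-validation and the best subset is kept. The published method drives this search with endocrine-based PSO and ABC; the sketch replaces that search with plain random sampling and only illustrates the subset-evaluation ("fitness") step.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)   # stand-in for a UCI medical data set
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated SVM accuracy on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=5).mean()

best_mask, best_score = None, -1.0
for _ in range(50):                           # random search in place of EPSO/ABC
    mask = rng.random(X.shape[1]) < 0.5       # candidate binary feature subset
    score = fitness(mask)
    if score > best_score:
        best_mask, best_score = mask, score

print(best_mask.sum(), "features, CV accuracy", round(best_score, 3))
```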
Using machine learning for sequence-level automated MRI protocol selection in neuroradiology.
Brown, Andrew D; Marotta, Thomas R
2018-05-01
Incorrect imaging protocol selection can lead to important clinical findings being missed, contributing to both wasted health care resources and patient harm. We present a machine learning method for analyzing the unstructured text of clinical indications and patient demographics from magnetic resonance imaging (MRI) orders to automatically protocol MRI procedures at the sequence level. We compared 3 machine learning models - support vector machine, gradient boosting machine, and random forest - to a baseline model that predicted the most common protocol for all observations in our test set. The gradient boosting machine model significantly outperformed the baseline and demonstrated the best performance of the 3 models in terms of accuracy (95%), precision (86%), recall (80%), and Hamming loss (0.0487). This demonstrates the feasibility of automating sequence selection by applying machine learning to MRI orders. Automated sequence selection has important safety, quality, and financial implications and may facilitate improvements in the quality and safety of medical imaging service delivery.
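A hedged sketch of the overall approach follows (free-text order, TF-IDF features, gradient boosting classifier); the example order strings and protocol labels are invented, and the authors' feature engineering is not reproduced.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented MRI order texts and protocol labels for illustration only
orders = ["55M new onset seizures, rule out mass",
          "60F known glioma, follow-up",
          "32F MS follow-up, assess new lesions",
          "28M optic neuritis, query demyelination",
          "70M pulsatile tinnitus, evaluate IAC",
          "65F asymmetric hearing loss, IAC protocol"]
protocols = ["brain_tumour", "brain_tumour",
             "ms_protocol", "ms_protocol",
             "iac_protocol", "iac_protocol"]

vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(orders).toarray()       # dense matrix of word/bigram weights
clf = GradientBoostingClassifier(random_state=0).fit(X, protocols)

print(clf.predict(vec.transform(["45F relapsing MS, new symptoms"]).toarray()))
```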
Genetic analysis of groups of mid-infrared predicted fatty acids in milk.
Narayana, S G; Schenkel, F S; Fleming, A; Koeck, A; Malchiodi, F; Jamrozik, J; Johnston, J; Sargolzaei, M; Miglior, F
2017-06-01
The objective of this study was to investigate genetic variability of mid-infrared predicted fatty acid groups in Canadian Holstein cattle. Genetic parameters were estimated for 5 groups of fatty acids: short-chain (4 to 10 carbons), medium-chain (11 to 16 carbons), long-chain (17 to 22 carbons), saturated, and unsaturated fatty acids. The data set included 49,127 test-day records from 10,029 first-lactation Holstein cows in 810 herds. The random regression animal test-day model included days in milk, herd-test date, and age-season of calving (polynomial regression) as fixed effects, herd-year of calving, animal additive genetic effect, and permanent environment effects as random polynomial regressions, and random residual effect. Legendre polynomials of the third degree were selected for the fixed regression for age-season of calving effect and Legendre polynomials of the fourth degree were selected for the random regression for animal additive genetic, permanent environment, and herd-year effect. The average daily heritability over the lactation for the medium-chain fatty acid group (0.32) was higher than for the short-chain (0.24) and long-chain (0.23) fatty acid groups. The average daily heritability for the saturated fatty acid group (0.33) was greater than for the unsaturated fatty acid group (0.21). Estimated average daily genetic correlations were positive among all fatty acid groups and ranged from moderate to high (0.63-0.96). The genetic correlations illustrated similarities and differences in their origin and the makeup of the groupings based on chain length and saturation. These results provide evidence for the existence of genetic variation in mid-infrared predicted fatty acid groups, and the possibility of improving milk fatty acid profile through genetic selection in Canadian dairy cattle. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
2013-01-01
Background To evaluate the effectiveness of a new multifactorial intervention to improve health care for chronic ischemic heart disease patients in primary care. The strategy has two components: a) organizational for the patient/professional relationship and b) training for professionals. Methods/design Experimental study. Randomized clinical trial. Follow-up period: one year. Study setting: primary care, multicenter (15 health centers). For the intervention group 15 health centers are selected from those participating in ESCARVAL study. Once the center agreed to participate patients are randomly selected from the total amount of patients with ischemic heart disease registered in the electronic health records. For the control group a random sample of patients with ischemic heart disease is selected from all 72 health centers electronic records. Intervention components: a) Organizational intervention on the patient/professional relationship. Centered on the Chronic Care Model, the Stanford Expert Patient Program and the Kaiser Permanente model: Teamwork, informed and active patient, decision making shared with the patient, recommendations based on clinical guidelines, single electronic medical history per patient that allows the use of indicators for risk monitoring and stratification. b) Formative strategy for professionals: 4 face-to-face training workshops (one every 3 months), monthly update clinical sessions, online tutorial by a cardiologist, availability through the intranet of the action protocol and related documents. Measurements: Blood pressure, blood glucose, HbA1c, lipid profile and smoking. Frequent health care visits. Number of hospitalizations related to vascular disease. Therapeutic compliance. Drug use. Discussion This study aims to evaluate the efficacy of a multifactorial intervention strategy involving patients with ischemic heart disease for the improvement of the degree of control of the cardiovascular risk factors and of the quality of life, number of visits, and number of hospitalizations. Trial registration NCT01826929 PMID:23915267
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tai, An, E-mail: atai@mcw.edu; Liang, Zhiwen; Radiation Oncology Center, Wuhan Union Hospital, Huazhong University of Science and Technology, Wuhan
2013-08-01
Purpose: The purposes of this study were to quantify respiration-induced organ motions for pancreatic cancer patients and to explore strategies to account for these motions. Methods and Materials: Both 3-dimensional computed tomography (3DCT) and 4-dimensional computed tomography (4DCT) scans were acquired sequentially for 15 pancreatic cancer patients, including 10 randomly selected patients and 5 patients selected from a subgroup of patients with large tumor respiratory motions. 3DCTs were fused with 2 sets of 4DCT data at the end of exhale phase (50%) and the end of inhale phase (0%). The target was delineated on the 50% and 0% phase CT sets, and the organs at risk were drawn on the 3DCT. These contours were populated to the CT sets at other respiratory phases based on deformable image registration. Internal target volumes (ITV) were generated by tracing the target contours of all phases (ITV10), 3 phases of 0%, 20% and 50% (ITV3), and 2 phases of 0% and 50% (ITV2). ITVs generated from phase images were compared using percentage of volume overlap, Dice coefficient, geometric centers, and average surface distance. Results: Volume variations of pancreas, kidneys, and liver as a function of respiratory phases were small (<5%) during respiration. For the 10 randomly selected patients, peak-to-peak amplitudes of liver, left kidney, right kidney, and the target along the superior-inferior (SI) direction were 7.9 ± 3.2 mm, 7.1 ± 3.1 mm, 5.7 ± 3.2 mm, and 5.9 ± 2.8 mm, respectively. The percentage of volume overlap and Dice coefficient were 92% ± 1% and 96% ± 1% between ITV10 and ITV2 and 96% ± 1% and 98% ± 1% between ITV10 and ITV3, respectively. The percentage of volume overlap between ITV10 and ITV3 was 93.6 ± 1.1 for patients with tumor motion >8 mm. Conclusions: Appropriate motion management strategies are proposed for radiation treatment planning of pancreatic tumors based on magnitudes of tumor respiratory motions.
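The two ITV comparison metrics used above can be computed from aligned binary voxel masks as in the following sketch; the exact overlap convention used in the study may differ from the one coded here.

```python
import numpy as np

def overlap_and_dice(mask_a, mask_b):
    """mask_a, mask_b: boolean voxel arrays on the same grid (e.g. ITV10 and ITV3)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    pct_overlap = 100.0 * intersection / b.sum()        # fraction of mask_b covered by mask_a
    dice = 2.0 * intersection / (a.sum() + b.sum())     # Dice similarity coefficient
    return pct_overlap, dice

# Toy 3-D masks for demonstration
itv_full = np.zeros((20, 20, 20), dtype=bool); itv_full[5:15, 5:15, 5:15] = True
itv_sub = np.zeros((20, 20, 20), dtype=bool);  itv_sub[6:15, 5:15, 5:15] = True
print(overlap_and_dice(itv_full, itv_sub))
```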
Nygren, Peggy; Nelson, Heidi D.; Klein, Jonathan
2004-01-01
BACKGROUND We wanted to evaluate the benefits and harms of screening children in primary health care settings for abuse and neglect resulting from family violence by examining the evidence on the performance of screening instruments and the effectiveness of interventions. METHODS We searched for relevant studies in MEDLINE, PsycINFO, CINAHL, ERIC, Cochrane Controlled Trials Register, and reference lists. English language abstracts with original data about family violence against children focusing on screening and interventions initiated or based in health care settings were included. We extracted selected information about study design, patient populations and settings, methods of assessment or intervention, and outcome measures, and applied a set of criteria to evaluate study quality. RESULTS All instruments designed to screen for child abuse and neglect were directed to parents, particularly pregnant women. These instruments had fairly high sensitivity but low specificity when administered in high-risk study populations and have not been widely tested in other populations. Randomized controlled trials of frequent nurse home visitation programs beginning during pregnancy that address behavioral and psychological factors indicated improved abuse measures and outcomes. No studies were identified about interventions in older children or harms associated with screening and intervention. CONCLUSIONS No trials of the effectiveness of screening in a health care setting have been published. Clinician referrals to nurse home visitation during pregnancy and in early childhood may reduce abuse in selected populations. There are no studies about harms of screening and interventions. PMID:15083858
How precise are reported protein coordinate data?
Konagurthu, Arun S; Allison, Lloyd; Abramson, David; Stuckey, Peter J; Lesk, Arthur M
2014-03-01
Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally reported to greater precision than the experimental structure determinations have actually achieved. By using information theory and data compression to study the compressibility of protein atomic coordinates, it is possible to quantify the amount of randomness in the coordinate data and thereby to determine the realistic precision of the reported coordinates. On average, the value of each C(α) coordinate in a set of selected protein structures solved at a variety of resolutions is good to about 0.1 Å.
A Fundamental Relationship Between Genotype Frequencies and Fitnesses
Lachance, Joseph
2008-01-01
The set of possible postselection genotype frequencies in an infinite, randomly mating population is found. Geometric mean heterozygote frequency divided by geometric mean homozygote frequency equals two times the geometric mean heterozygote fitness divided by geometric mean homozygote fitness. The ratio of genotype frequencies provides a measure of genetic variation that is independent of allele frequencies. When this ratio does not equal two, either selection or population structure is present. Within-population HapMap data show population-specific patterns, while pooled data show an excess of homozygotes. PMID:18780726
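For the simplest biallelic case, the stated relationship can be written out directly from Hardy-Weinberg proportions; the short derivation below is a standard special case offered as an illustration, not a reproduction of the paper's general argument.

```latex
% Biallelic illustration. Before selection, random mating gives
%   P_{AA} = p^2, \quad P_{Aa} = 2pq, \quad P_{aa} = q^2 .
% After one round of viability selection with fitnesses w_{AA}, w_{Aa}, w_{aa},
% each frequency is multiplied by w_g/\bar{w}, so
\[
  \frac{P'_{Aa}}{\sqrt{P'_{AA}\,P'_{aa}}}
  = \frac{2pq\,w_{Aa}}{\sqrt{p^{2}q^{2}\,w_{AA}w_{aa}}}
  = 2\,\frac{w_{Aa}}{\sqrt{w_{AA}\,w_{aa}}} .
\]
% The heterozygote/homozygote frequency ratio equals twice the corresponding
% fitness ratio, independent of the allele frequencies p and q.
```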
Algorithms for selecting informative marker panels for population assignment.
Rosenberg, Noah A
2005-11-01
Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species--carp, cat, chicken, dog, fly, grayling, human, and maize--this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species--dog--producing this level of accuracy with only three markers, and the eighth species--human--requiring approximately 13-16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they all are suitable for practical use.
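A hedged sketch of a greedy forward search for an informative panel follows: at each step the locus that most improves cross-validated assignment accuracy of a simple classifier is added. This illustrates the greedy idea only; it is not the paper's exact algorithm or assignment model, and the genotype data below are random toy values.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def greedy_panel(X, y, panel_size=5):
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(panel_size):
        scores = []
        for j in remaining:
            cols = chosen + [j]
            scores.append((cross_val_score(GaussianNB(), X[:, cols], y, cv=3).mean(), j))
        best_score, best_j = max(scores)        # locus giving the best panel so far
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 30)).astype(float)   # toy 0/1/2 genotype matrix
y = rng.integers(0, 4, size=200)                        # toy source-population labels
print(greedy_panel(X, y))
```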
Duiverman, Marieke L; Windisch, Wolfram; Storre, Jan H; Wijkstra, Peter J
2016-04-01
Recently, clear benefits have been shown from long-term noninvasive ventilation (NIV) in stable chronic obstructive pulmonary disease (COPD) patients with chronic hypercapnic respiratory failure. In our opinion, these benefits are confirmed and nocturnal NIV using sufficiently high inspiratory pressures should be considered in COPD patients with chronic hypercapnic respiratory failure in stable disease, preferably combined with pulmonary rehabilitation. In contrast, clear benefits from (continuing) NIV at home after an exacerbation in patients who remain hypercapnic have not been shown. In this review we will discuss the results of five trials investigating the use of home nocturnal NIV in patients with prolonged hypercapnia after a COPD exacerbation with acute hypercapnic respiratory failure. Although some uncontrolled trials might have shown some benefits of this therapy, the largest randomized controlled trial did not show benefits in terms of hospital readmission or death. However, further studies are necessary to select the patients that optimally benefit, select the right moment to initiate home NIV, select the optimal ventilatory settings, and choose optimal follow-up programmes. Furthermore, there is insufficient knowledge about the optimal ventilatory settings in the post-exacerbation period. Finally, we are not well informed about exact reasons for readmission in patients on NIV, the course of the exacerbation and the treatment instituted. Careful follow-up will probably be necessary to detect deterioration early in patients on NIV. © The Author(s), 2016.
The emergence of collective phenomena in systems with random interactions
NASA Astrophysics Data System (ADS)
Abramkina, Volha
Emergent phenomena are one of the most profound topics in modern science, addressing the ways that collectivities and complex patterns appear due to multiplicity of components and simple interactions. Ensembles of random Hamiltonians allow one to explore emergent phenomena in a statistical way. In this work we adopt a shell model approach with a two-body interaction Hamiltonian. The sets of the two-body interaction strengths are selected at random, resulting in the two-body random ensemble (TBRE). Symmetries such as angular momentum, isospin, and parity entangled with complex many-body dynamics result in surprising order discovered in the spectrum of low-lying excitations. The statistical patterns exhibited in the TBRE are remarkably similar to those observed in real nuclei. Signs of almost every collective feature seen in nuclei, namely, pairing superconductivity, deformation, and vibration, have been observed in random ensembles [3, 4, 5, 6]. In what follows a systematic investigation of nuclear shape collectivities in random ensembles is conducted. The development of the mean field, its geometry, multipole collectivities and their dependence on the underlying two-body interaction are explored. Apart from the role of static symmetries such as SU(2) angular momentum and isospin groups, the emergence of dynamical symmetries including the seniority SU(2), rotational symmetry, as well as the Elliot SU(3) is shown to be an important precursor for the existence of geometric collectivities.
Halabi, Susan; Lin, Chen-Yen; Kelly, W. Kevin; Fizazi, Karim S.; Moul, Judd W.; Kaplan, Ellen B.; Morris, Michael J.; Small, Eric J.
2014-01-01
Purpose Prognostic models for overall survival (OS) for patients with metastatic castration-resistant prostate cancer (mCRPC) are dated and do not reflect significant advances in treatment options available for these patients. This work developed and validated an updated prognostic model to predict OS in patients receiving first-line chemotherapy. Methods Data from a phase III trial of 1,050 patients with mCRPC were used (Cancer and Leukemia Group B CALGB-90401 [Alliance]). The data were randomly split into training and testing sets. A separate phase III trial served as an independent validation set. Adaptive least absolute shrinkage and selection operator selected eight factors prognostic for OS. A predictive score was computed from the regression coefficients and used to classify patients into low- and high-risk groups. The model was assessed for its predictive accuracy using the time-dependent area under the curve (tAUC). Results The model included Eastern Cooperative Oncology Group performance status, disease site, lactate dehydrogenase, opioid analgesic use, albumin, hemoglobin, prostate-specific antigen, and alkaline phosphatase. Median OS values in the high- and low-risk groups, respectively, in the testing set were 17 and 30 months (hazard ratio [HR], 2.2; P < .001); in the validation set they were 14 and 26 months (HR, 2.9; P < .001). The tAUCs were 0.73 (95% CI, 0.70 to 0.73) and 0.76 (95% CI, 0.72 to 0.76) in the testing and validation sets, respectively. Conclusion An updated prognostic model for OS in patients with mCRPC receiving first-line chemotherapy was developed and validated on an external set. This model can be used to predict OS, as well as to better select patients to participate in trials on the basis of their prognosis. PMID:24449231
Mathematical models of cell factories: moving towards the core of industrial biotechnology.
Cvijovic, Marija; Bordel, Sergio; Nielsen, Jens
2011-09-01
Industrial biotechnology involves the utilization of cell factories for the production of fuels and chemicals. Traditionally, the development of highly productive microbial strains has relied on random mutagenesis and screening. The development of predictive mathematical models provides a new paradigm for the rational design of cell factories. Instead of selecting among a set of strains resulting from random mutagenesis, mathematical models allow the researchers to predict in silico the outcomes of different genetic manipulations and engineer new strains by performing gene deletions or additions leading to a higher productivity of the desired chemicals. In this review we aim to summarize the main modelling approaches of biological processes and illustrate the particular applications that they have found in the field of industrial microbiology. © 2010 The Authors. Journal compilation © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.
Investigation of a protein complex network
NASA Astrophysics Data System (ADS)
Mashaghi, A. R.; Ramezanpour, A.; Karimipour, V.
2004-09-01
The budding yeast Saccharomyces cerevisiae is the first eukaryote whose genome has been completely sequenced. It is also the first eukaryotic cell whose proteome (the set of all proteins) and interactome (the network of all mutual interactions between proteins) has been analyzed. In this paper we study the structure of the yeast protein complex network in which weighted edges between complexes represent the number of shared proteins. It is found that the network of protein complexes is a small world network with scale free behavior for many of its distributions. However we find that there are no strong correlations between the weights and degrees of neighboring complexes. To reveal non-random features of the network we also compare it with a null model in which the complexes randomly select their proteins. Finally we propose a simple evolutionary model based on duplication and divergence of proteins.
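The network construction and the null model described above can be sketched as follows; the complex contents are toy data, edge weights count shared proteins, and the null model re-draws each complex's proteins at random from the full proteome.

```python
import itertools
import random
import networkx as nx

def complex_network(complexes):
    """complexes: dict mapping complex name -> set of protein identifiers."""
    g = nx.Graph()
    g.add_nodes_from(complexes)
    for a, b in itertools.combinations(complexes, 2):
        shared = len(complexes[a] & complexes[b])
        if shared:
            g.add_edge(a, b, weight=shared)     # weight = number of shared proteins
    return g

def null_model(complexes, proteome, seed=0):
    """Each complex keeps its size but re-selects its proteins at random."""
    rng = random.Random(seed)
    return {name: set(rng.sample(sorted(proteome), len(members)))
            for name, members in complexes.items()}

complexes = {"C1": {"p1", "p2", "p3"}, "C2": {"p2", "p3", "p4"}, "C3": {"p5", "p6"}}
proteome = set().union(*complexes.values())
print(complex_network(complexes).edges(data=True))
print(complex_network(null_model(complexes, proteome)).edges(data=True))
```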
Evolution of Endovascular Therapy in Acute Stroke: Implications of Device Development
Balasubramaian, Adithya; Mitchell, Peter; Dowling, Richard
2015-01-01
Intravenous thrombolysis is an effective treatment for acute ischaemic stroke. However, vascular recanalization rates remain poor especially in the setting of large artery occlusion. On the other hand, endovascular intra-arterial therapy addresses this issue with superior recanalization rates compared with intravenous thrombolysis. Although previous randomized controlled studies of intra-arterial therapy failed to demonstrate superiority, the failings may be attributed to a combination of inferior intra-arterial devices and suboptimal selection criteria. The recent results of several randomized controlled trials have demonstrated significantly improved outcomes, underpinning the advantage of newer intra-arterial devices and superior recanalization rates, leading to renewed interest in establishing intra-arterial therapy as the gold standard for acute ischaemic stroke. The aim of this review is to outline the history and development of different intra-arterial devices and future directions in research. PMID:26060800
Computer simulation of the probability that endangered whales will interact with oil spills
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reed, M.; Jayko, K.; Bowles, A.
1987-03-01
A numerical model system was developed to assess quantitatively the probability that endangered bowhead and gray whales will encounter spilled oil in Alaskan waters. Bowhead and gray whale migration and diving-surfacing models, and an oil-spill trajectory model comprise the system. The migration models were developed from conceptual considerations, then calibrated with and tested against observations. The movement of a whale point is governed by a random walk algorithm which stochastically follows a migratory pathway. The oil-spill model, developed under a series of other contracts, accounts for transport and spreading behavior in open water and in the presence of sea ice. Historical wind records and heavy, normal, or light ice cover data sets are selected at random to provide stochastic oil-spill scenarios for whale-oil interaction simulations.
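A toy Monte Carlo sketch of the coupling described above is given below: a whale point follows a noisy drift along a migratory pathway while a slick spreads from a spill site, and encounters are counted over many stochastic scenarios. All geometries, rates, and the encounter radius are invented for illustration and bear no relation to the calibrated models.

```python
import numpy as np

def encounter_probability(n_sims=2000, n_steps=200, radius=5.0, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        whale = np.array([0.0, 0.0])
        slick_centre, slick_r = np.array([60.0, 10.0]), 1.0
        for _ in range(n_steps):
            whale += np.array([1.0, 0.0]) + rng.normal(scale=1.5, size=2)  # drift + random walk
            slick_r += 0.05                                                # slick spreading
            if np.linalg.norm(whale - slick_centre) < slick_r + radius:
                hits += 1
                break
    return hits / n_sims

print(encounter_probability())
```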
Selective reminding of prospective memory in Multiple Sclerosis.
McKeever, Joshua D; Schultheis, Maria T; Sim, Tiffanie; Goykhman, Jessica; Patrick, Kristina; Ehde, Dawn M; Woods, Steven Paul
2017-04-19
Multiple sclerosis (MS) is associated with prospective memory (PM) deficits, which may increase the risk of poor functional/health outcomes such as medication non-adherence. This study examined the potential benefits of selective reminding to enhance PM functioning in persons with MS. Twenty-one participants with MS and 22 healthy adults (HA) underwent a neuropsychological battery including a Selective Reminding PM (SRPM) experimental procedure. Participants were randomly assigned to either: (1) a selective reminding condition in which participants learn (to criterion) eight prospective memory tasks in a Selective Reminding format; or (2) a single trial encoding condition (1T). A significant interaction was demonstrated, with MS participants receiving greater benefit than HAs from the SR procedure in terms of PM performance. Across diagnostic groups, participants in the SR conditions (vs. 1T conditions) demonstrated significantly better PM performance. Individuals with MS were impaired relative to HAs in the 1T condition, but performance was statistically comparable in the SR condition. This preliminary study suggests that selective reminding can be used to enhance PM cue detection and retrieval in MS. The extent to which selective reminding of PM is effective in naturalistic settings and for health-related behaviours in MS remains to be determined.
Cheng, Zhanzhan; Zhou, Shuigeng; Wang, Yang; Liu, Hui; Guan, Jihong; Chen, Yi-Ping Phoebe
2016-05-18
Prediction of compound-protein interactions (CPIs) aims to find new compound-protein pairs where a protein is targeted by at least one compound, which is a crucial step in new drug design. Currently, a number of machine learning based methods have been developed to predict new CPIs in the literature. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine learning based approaches use the unknown interactions (not validated CPIs) selected randomly as the negative examples to train classifiers for predicting new CPIs. Obviously, this is not quite reasonable and unavoidably impacts the CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that learns from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the CPI/compound-protein pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, including random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), and three types of existing CPI prediction models. Source code, datasets and related documents of PUCPI are available at: http://admis.fudan.edu.cn/projects/pucpi.html.
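A minimal sketch of the biased-SVM idea for PU learning and of the tensor-product feature construction follows; the class weights, feature dimensions, and toy data are illustrative assumptions, not the PUCPI settings.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def pair_features(compound_vec, protein_vec):
    # Tensor product of compound features and protein features
    return np.outer(compound_vec, protein_vec).ravel()

# Toy data: 100 positive (known CPI) pairs and 400 unlabeled compound-protein pairs
X = np.array([pair_features(rng.random(8), rng.random(10)) for _ in range(500)])
y = np.array([1] * 100 + [0] * 400)        # 0 = unlabeled, treated as (noisy) negative

# Biased SVM: heavier misclassification cost on the known positives
clf = SVC(kernel="linear", class_weight={1: 10.0, 0: 1.0})
clf.fit(X, y)
print("recall on known positives:", (clf.predict(X[:100]) == 1).mean())
```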
Tjon-Kon-Fat, R I; Tajik, P; Zafarmand, M H; Bensdorp, A J; Bossuyt, P M M; Oosterhuis, G J E; van Golde, R; Repping, S; Lambers, M D A; Slappendel, E; Perquin, D; Pelinck, M J; Gianotten, J; Maas, J W M; Eijkemans, M J C; van der Veen, F; Mol, B W; van Wely, M
2017-05-01
Are there treatment selection markers that could aid in identifying couples, with unexplained or mild male subfertility, who would have better chances of a healthy child with IVF with single embryo transfer (IVF-SET) than with IUI with ovarian stimulation (IUI-OS)? We did not find any treatment selection markers that were associated with better chances of a healthy child with IVF-SET instead of IUI-OS in couples with unexplained or mild male subfertility. A recent trial, comparing IVF-SET to IUI-OS, found no evidence of a difference between live birth rates and multiple pregnancy rates. It was suggested that IUI-OS should remain the first-line treatment instead of IVF-SET in couples with unexplained or mild male subfertility and female age between 18 and 38 years. The question remains whether there are some couples that may have higher pregnancy chances if treated with IVF-SET instead of IUI. We performed our analyses on data from the INeS trial, where couples with unexplained or mild male subfertility and an unfavourable prognosis for natural conception were randomly allocated to IVF-SET, IVF in a modified natural cycle or IUI-OS. In view of the aim of this study, we only used data of the comparison between IVF-SET (201 couples) and IUI-OS (207 couples). We pre-defined the following baseline characteristics as potential treatment selection markers: female age, ethnicity, smoking status, type of subfertility (primary/secondary), duration of subfertility, BMI, pre-wash total motile count and Hunault prediction score. For each potential treatment selection marker, we explored the association with the chances of a healthy child after IVF-SET and IUI-OS and tested if there was an interaction with treatment. Given the exploratory nature of our analysis, we used a P-value of 0.1. None of the markers were associated with higher chances of a healthy child from IVF-SET compared to IUI-OS (P-value for interaction >0.10). Since this is the first large study that looked at potential treatment selection markers for IVF-SET compared to IUI-OS, we had no data on which to base a power calculation. The sample size was limited, making it difficult to detect any smaller associations. We could not identify couples with unexplained or mild male subfertility who would have had higher chances of a healthy child from immediate IVF-SET than from IUI-OS. As in the original trial IUI-OS had similar effectiveness and was less costly compared to IVF-SET, IUI-OS should remain the preferred first-line treatment in these couples. The study was supported by a grant from the Netherlands Organization for Health Research and Development, and a grant from the Netherlands' association of health care insurers. There are no conflicts of interest. The trial was registered at the Dutch trial registry (NTR939). © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg
2017-01-01
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404
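A rough sketch of the modelling set-up is shown below (one-hot encoded 50-nt 5′ UTRs fed to a small 1-D convolutional network predicting an expression score); the layer sizes are assumptions and do not reproduce the published architecture.

```python
import numpy as np
import tensorflow as tf

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, b in enumerate(seq):
        x[i, BASES.index(b)] = 1.0
    return x

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, 8, activation="relu", input_shape=(50, 4)),
    tf.keras.layers.Conv1D(64, 8, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),               # regression: growth-selection expression score
])
model.compile(optimizer="adam", loss="mse")

example = np.stack([one_hot("ACGT" * 12 + "AC")])   # one toy 50-nt 5' UTR
print(model.predict(example).shape)
```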
School-located Influenza Vaccinations for Adolescents: A Randomized Controlled Trial.
Szilagyi, Peter G; Schaffer, Stanley; Rand, Cynthia M; Goldstein, Nicolas P N; Vincelli, Phyllis; Hightower, A Dirk; Younge, Mary; Eagan, Ashley; Blumkin, Aaron; Albertin, Christina S; DiBitetto, Kristine; Yoo, Byung-Kwang; Humiston, Sharon G
2018-02-01
We aimed to evaluate the effect of school-located influenza vaccination (SLIV) on adolescents' influenza vaccination rates. In 2015-2016, we performed a cluster-randomized trial of adolescent SLIV in middle/high schools. We selected 10 pairs of schools (identical grades within pairs) and randomly allocated schools within pairs to SLIV or usual care control. At eight suburban SLIV schools, we sent parents e-mail notifications about upcoming SLIV clinics and promoted online immunization consent. At two urban SLIV schools, we sent parents (via student backpack fliers) paper immunization consent forms and information about SLIV. E-mails were unavailable at these schools. Local health department nurses administered nasal or injectable influenza vaccine at dedicated SLIV clinics and billed insurers. We compared influenza vaccination rates at SLIV versus control schools using school directories to identify the student sample in each school. We used the state immunization registry to determine receipt of influenza vaccination. The final sample comprised 17,650 students enrolled in the 20 schools. Adolescents at suburban SLIV schools had higher overall influenza vaccination rates than did adolescents at control schools (51% vs. 46%, p < .001; adjusted odds ratio = 1.27, 95% confidence interval 1.18-1.38, controlling for vaccination during the prior two seasons). No effect of SLIV was noted among urban schools on multivariate analysis. SLIV did not substitute for vaccinations in primary care or other settings; in suburban settings, SLIV was associated with increased vaccinations in primary care or other settings (adjusted odds ratio = 1.10, 95% confidence interval 1.02-1.19). SLIV in this community increased influenza vaccination rates among adolescents attending suburban schools. Copyright © 2018. Published by Elsevier Inc.
Dietary interventions to prevent and manage diabetes in worksite settings: a meta-analysis.
Shrestha, Archana; Karmacharya, Biraj Man; Khudyakov, Polyna; Weber, Mary Beth; Spiegelman, Donna
2018-01-25
The translation of lifestyle intervention to improve glucose tolerance into the workplace has been rare. The objective of this meta-analysis is to summarize the evidence for the effectiveness of dietary interventions in worksite settings on lowering blood sugar levels. We searched for studies in PubMed, Embase, Econlit, Ovid, Cochrane, Web of Science, and Cumulative Index to Nursing and Allied Health Literature. Search terms were as follows: (1) Exposure-based: nutrition/diet/dietary intervention/health promotion/primary prevention/health behavior/health education/food/program evaluation; (2) Outcome-based: diabetes/hyperglycemia/glucose/HbA1c/glycated hemoglobin; and (3) Setting-based: workplace/worksite/occupational/industry/job/employee. We manually searched review articles and reference lists of articles identified from 1969 to December 2016. We tested for between-studies heterogeneity and calculated the pooled effect sizes for changes in HbA1c (%) and fasting glucose (mg/dl) using random effect models for meta-analysis in 2016. A total of 17 articles out of 1663 initially selected articles were included in the meta-analysis. With a random-effects model, worksite dietary interventions led to a pooled -0.18% (95% CI, -0.29 to -0.06; P<0.001) difference in HbA1c. With the random-effects model, the interventions resulted in 2.60 mg/dl lower fasting glucose with borderline significance (95% CI: -5.27 to 0.08, P=0.06). In the multivariate meta-regression model, interventions with a high percentage of female participants and those that delivered the intervention directly to individuals, rather than through environmental changes, were more effective. Workplace dietary interventions can improve HbA1c. The effects were larger for interventions with a greater number of female participants and for individual-level interventions.
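The random-effects pooling used for effect sizes such as the HbA1c differences above can be sketched with a standard DerSimonian-Laird calculation; the per-study effects and variances below are invented placeholders.

```python
import numpy as np

def random_effects_pool(yi, vi):
    """DerSimonian-Laird pooled estimate and 95% CI from per-study effects yi and variances vi."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    w = 1.0 / vi                                   # fixed-effect (inverse-variance) weights
    y_fe = np.sum(w * yi) / np.sum(w)
    q = np.sum(w * (yi - y_fe) ** 2)               # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)       # between-study variance
    w_re = 1.0 / (vi + tau2)
    pooled = np.sum(w_re * yi) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

print(random_effects_pool(yi=[-0.25, -0.10, -0.30, -0.05], vi=[0.01, 0.02, 0.015, 0.03]))
```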
Marino, Miguel; Killerby, Marie; Lee, Soomi; Klein, Laura Cousino; Moen, Phyllis; Olson, Ryan; Kossek, Ellen Ernst; King, Rosalind; Erickson, Leslie; Berkman, Lisa F; Buxton, Orfeu M
2016-12-01
To evaluate the effects of a workplace-based intervention on actigraphic and self-reported sleep outcomes in an extended care setting. Cluster randomized trial. Extended-care (nursing) facilities. US employees and managers at nursing homes. Nursing homes were randomly selected to intervention or control settings. The Work, Family and Health Study developed an intervention aimed at reducing work-family conflict within a 4-month work-family organizational change process. Employees participated in interactive sessions with facilitated discussions, role-playing, and games designed to increase control over work processes and work time. Managers completed training in family-supportive supervision. Primary actigraphic outcomes included: total sleep duration, wake after sleep onset, nighttime sleep, variation in nighttime sleep, nap duration, and number of naps. Secondary survey outcomes included work-to-family conflict, sleep insufficiency, insomnia symptoms and sleep quality. Measures were obtained at baseline, 6-months and 12-months post-intervention. A total of 1,522 employees and 184 managers provided survey data at baseline. Managers and employees in the intervention arm showed no significant difference in sleep outcomes over time compared to control participants. Sleep outcomes were not moderated by work-to-family conflict or presence of children in the household for managers or employees. Age significantly moderated an intervention effect on nighttime sleep among employees (p=0.040), where younger employees benefited more from the intervention. In the context of an extended-care nursing home workplace, the intervention did not significantly alter sleep outcomes in either managers or employees. Moderating effects of age were identified where younger employees' sleep outcomes benefited more from the intervention.
[Corifollitropin alfa in women stimulated for the first time in in vitro fertilization programme].
Vraná-Mardešićová, N; Vobořil, J; Melicharová, L; Jelínková, L; Vilímová, Š; Mardešić, T
2017-01-01
To compare results after stimulation with corifollitropin alfa (Elonva) in an unselected group of women entering an in vitro fertilization (IVF) programme for the first time with results from Phase III randomized trials in selected groups of women. Prospective study. Sanatorium Pronatal, Praha. Forty unselected women with adequate ovarian reserve entering an IVF programme for the first time were stimulated with corifollitropin alfa and GnRH antagonists. The average age in the study group was 32.8 years (29-42 years); women younger than 36 years and weighing less than 60 kg received Elonva 100 µg, all others (age > 36 years, weight > 60 kg) Elonva 150 µg. Five days after egg retrieval one blastocyst was transferred (single embryo transfer, eSET). Our results were compared with the results in highly selected groups of women from Phase III randomized trials. After stimulation with corifollitropin alfa and GnRH antagonists, on average 10.6 (9.2 ± 4.2) eggs could be retrieved, of which 7.3 (6.6 ± 3.9) were M II oocytes (68.9%), and the fertilisation rate was 84.6%. After the first embryo transfer ("fresh" embryos and embryos from "freeze all" cycles) 14 pregnancies were achieved (37.8%); three further pregnancies were achieved later from the transfer of frozen-thawed embryos (cumulative pregnancy rate 45.9%). There were three abortions. No severe hyperstimulation syndrome occurred. Our results in an unselected group of women stimulated for the first time in an IVF programme with corifollitropin alfa are fully comparable with results published in randomized trials with selected groups of patients. Corifollitropin alfa in combination with a daily GnRH antagonist can be successfully used in normal-responder patients stimulated for the first time in an IVF programme. Keywords: corifollitropin alfa, GnRH antagonists, ovarian stimulation, pregnancy.
Pandis, Nikolaos; Polychronopoulou, Argy; Eliades, Theodore
2011-12-01
Randomization is a key step in reducing selection bias during the treatment allocation phase in randomized clinical trials. The process of randomization follows specific steps, which include generation of the randomization list, allocation concealment, and implementation of randomization. The phenomenon in the dental and orthodontic literature of characterizing treatment allocation as random is frequent; however, often the randomization procedures followed are not appropriate. Randomization methods assign, at random, treatment to the trial arms without foreknowledge of allocation by either the participants or the investigators thus reducing selection bias. Randomization entails generation of random allocation, allocation concealment, and the actual methodology of implementing treatment allocation randomly and unpredictably. Most popular randomization methods include some form of restricted and/or stratified randomization. This article introduces the reasons, which make randomization an integral part of solid clinical trial methodology, and presents the main randomization schemes applicable to clinical trials in orthodontics.
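As a small illustration of one restricted scheme mentioned above, a permuted-block randomization list for two arms can be generated as follows; the block size and arm labels are illustrative choices.

```python
import random

def blocked_randomization(n_participants, block_size=4, arms=("A", "B"), seed=42):
    """Generate an allocation sequence using permuted blocks with equal allocation."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)                 # equal allocation within every block
        sequence.extend(block)
    return sequence[:n_participants]

print(blocked_randomization(10))
```

In practice this list would be prepared by someone independent of recruitment and kept concealed (for example, in sequentially numbered opaque envelopes or a central service) so that allocation cannot be foreseen.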
Mean-deviation analysis in the theory of choice.
Grechuk, Bogdan; Molyboha, Anton; Zabarankin, Michael
2012-08-01
Mean-deviation analysis, along with the existing theories of coherent risk measures and dual utility, is examined in the context of the theory of choice under uncertainty, which studies rational preference relations for random outcomes based on different sets of axioms such as transitivity, monotonicity, continuity, etc. An axiomatic foundation of the theory of coherent risk measures is obtained as a relaxation of the axioms of the dual utility theory, and a further relaxation of the axioms is shown to lead to the mean-deviation analysis. Paradoxes arising from the sets of axioms corresponding to these theories and their possible resolutions are discussed, and application of the mean-deviation analysis to optimal risk sharing and portfolio selection in the context of rational choice is considered. © 2012 Society for Risk Analysis.
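The generic trade-off behind mean-deviation analysis can be written compactly; the following sketch uses my own notation (a penalty weight lambda and a deviation measure D) and only illustrates the idea, not the paper's formal axiomatization.

```latex
\[
  X \succeq Y
  \quad\Longleftrightarrow\quad
  \mathbb{E}[X] - \lambda\,\mathcal{D}(X) \;\ge\; \mathbb{E}[Y] - \lambda\,\mathcal{D}(Y),
  \qquad \lambda \ge 0,
\]
% where $\mathcal{D}$ is a deviation measure (e.g. standard deviation, mean
% absolute deviation, or a one-sided semideviation) satisfying
% $\mathcal{D}(X + c) = \mathcal{D}(X)$, $\mathcal{D}(aX) = a\,\mathcal{D}(X)$
% for $a > 0$, and $\mathcal{D}(X) \ge 0$.
```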
A feature-weighting account of priming in conjunction search.
Becker, Stefanie I; Horstmann, Gernot
2009-02-01
Previous research on the priming effect in conjunction search has shown that repeating the target and distractor features across displays speeds mean response times but does not improve search efficiency: Repetitions do not reduce the set size effect-that is, the effect of the number of distractor items-but only modulate the intercept of the search function. In the present study, we investigated whether priming modulates search efficiency when a conjunctively defined target randomly changes between red and green. The results from an eyetracking experiment show that repeating the target across trials reduced the set size effect and, thus, did enhance search efficiency. Moreover, the probability of selecting the target as the first item in the display was higher when the target-distractor displays were repeated across trials than when they changed. Finally, red distractors were selected more frequently than green distractors when the previous target had been red (and vice versa). Taken together, these results indicate that priming in conjunction search modulates processes concerned with guiding attention to the target, by assigning more attentional weight to features sharing the previous target's color.
Inter-rater reliability and review of the VA unresolved narratives.
Eagon, J. C.; Hurdle, J. F.; Lincoln, M. J.
1996-01-01
To better understand how VA clinicians use medical vocabulary in everyday practice, we set out to characterize terms generated in the Problem List module of the VA's DHCP system that were not mapped to terms in the controlled-vocabulary lexicon of DHCP. When entered terms fail to match those in the lexicon, a note is sent to a central repository. When our study started, the volume in that repository had reached 16,783 terms. We wished to characterize the potential reasons why these terms failed to match terms in the lexicon. After examining two small samples of randomly selected terms, we used group consensus to develop a set of rating criteria and a rating form. To be sure that the results of multiple reviewers could be confidently compared, we analyzed the inter-rater agreement of our rating process. Two raters used this form to rate the same 400 terms. We found that modifiers and numeric data were common and consistent reasons for failure to match, while others such as use of synonyms and absence of the concept from the lexicon were common but less consistently selected. PMID:8947642
Amperometric Glucose Sensors: Sources of Error and Potential Benefit of Redundancy
Castle, Jessica R.; Kenneth Ward, W.
2010-01-01
Amperometric glucose sensors have advanced the care of patients with diabetes and are being studied to control insulin delivery in the research setting. However, at times, currently available sensors demonstrate suboptimal accuracy, which can result from calibration error, sensor drift, or lag. Inaccuracy can be particularly problematic in a closed-loop glycemic control system. In such a system, the use of two sensors allows selection of the more accurate sensor as the input to the controller. In our studies in subjects with type 1 diabetes, the accuracy of the better of two sensors significantly exceeded the accuracy of a single, randomly selected sensor. If an array with three or more sensors were available, it would likely allow even better accuracy with the use of voting. PMID:20167187
DeBar, Lynn L.; Yarborough, Bobbi Jo; Striegel-Moore, Ruth H.; Rosselli, Francine; Perrin, Nancy; Wilson, G. Terence; Kraemer, Helena C.; Green, Rory; Lynch, Frances
2009-01-01
Objective To explore effects of various recruitment strategies on randomized clinical trial (RCT)-entry characteristics for patients with eating disorders within an everyday health-plan practice setting. Methods Randomly selected women, aged 25-50, in a Pacific Northwest HMO were invited to complete a self-report binge-eating screener for two treatment trials. We publicized the trials within the health plan to allow self-referral. Here, we report differences in eating-disorder status by mode and nature of recruitment (online, mail, self-referred) and assessment (comprehensive versus abbreviated), and possible differences in enrollee characteristics by recruitment strategy (self-referred versus study-outreach efforts). Results Few differences emerged among those recruited through outreach who responded by different modalities (internet versus mail), early versus late responders, and those enrolling under more comprehensive or abbreviated assessment. Self-referred participants were more likely to meet binge-eating thresholds and reported higher average BMI than those recruited by outreach who responded by mail; however, in most respects the groups were more similar than anticipated. Fewer than 1% of those initially contacted through outreach enrolled. Conclusions Aggressive outreach and screening is likely not feasible for broader dissemination in everyday practice settings, and the individuals it recruits have demographic and clinical characteristics more similar than anticipated to those recruited through more abbreviated and realistic screening procedures. PMID:19275947
Early Resilience Intervention for Combat-Related PTSD in Military Primary Healthcare Settings: A Randomized Trial of "DESTRESS-PC"
Engel, Charles (Principal Investigator)
2009-08-01
Direct-to-Consumer Prescription Drug Advertising and the Public
Bell, Robert A; Kravitz, Richard L; Wilkes, Michael S
1999-01-01
OBJECTIVE Drug manufacturers are intensely promoting their products directly to consumers, but the impact has not been widely studied. Consumers' awareness and understanding of, attitudes toward, and susceptibility to direct-to-consumer (DTC) drug advertising were examined. DESIGN Random-digit dialing telephone survey with a random household member selection procedure (completion and response rates, 58% and 69%, respectively). SETTING Respondents were interviewed while they were at their residences. PARTICIPANTS Complete data were obtained from 329 adults in Sacramento County, California. MEASUREMENTS AND MAIN RESULTS Outcome measures included awareness of advertisements for 10 selected drugs, misconceptions about DTC advertising, attitudes toward DTC ads, and behavioral responses to such promotions. The influence of demographic characteristics, health status, attitudes, beliefs, and media exposure on awareness and behaviors was examined. On average, respondents were aware of advertisements for 3.7 of the 10 drugs; awareness varied from 8% for Buspar (buspirone) to 72% for Claritin (loratadine). Awareness was associated with prescription drug use, media exposure, positive attitudes toward DTC advertising, poorer health, and insurance status. Substantial misconceptions were revealed; e.g., 43% thought that only “completely safe” drugs could be advertised. Direct-to-consumer advertisements had led one third of respondents to ask their physicians for drug information and one fifth to request a prescription. CONCLUSIONS Direct-to-consumer advertisements are reaching the public, but selectively so, and affecting their behaviors. Implications for public policy are examined. PMID:10571712
NASA Astrophysics Data System (ADS)
Fritz, Andreas; Enßle, Fabian; Zhang, Xiaoli; Koch, Barbara
2016-08-01
The present study analyses two earth observation sensors regarding their capability to model forest above-ground biomass and forest density. Our research is carried out at two different demonstration sites. The first is located in south-western Germany (region Karlsruhe) and the second in southern China in Jiangle County (Province Fujian). A set of spectral and spatial predictors is computed from both Sentinel-2A and WorldView-2 data. Window sizes in the range of 3×3 pixels to 21×21 pixels are computed in order to cover the full range of canopy sizes of mature forest stands. Textural predictors of first and second order (grey-level co-occurrence matrix) are calculated and further used within a feature selection procedure. Additionally, common spectral predictors from WorldView-2 and Sentinel-2A data, such as all relevant spectral bands and the NDVI, are integrated into the analyses. To examine the most important predictors, a predictor selection algorithm is applied to the data, in which the entire set of more than 1000 predictors is screened for the most important ones. Out of the original set only the most important predictors are then further analysed. Predictor selection is done with the Boruta package in R (Kursa and Rudnicki (2010)), whereas regression is computed with random forest. Prior to classification and regression, parameters are tuned by a repetitive model selection (100 runs) based on the .632 bootstrap. Both are implemented in the caret R package (Kuhn et al. (2016)). To account for the variability in the data set, 100 independent runs are performed. Within each run 80 percent of the data are used for training and the remaining 20 percent for independent validation. With the subset of original predictors, mapping of above-ground biomass is performed.
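The workflow above (predictor screening followed by random forest regression evaluated over 100 random 80/20 splits) can be sketched outside the R/caret/Boruta stack; the Python code below is an illustrative stand-in on synthetic data, with importance-based screening in place of Boruta.

```python
# Hedged sketch: repeated 80/20 random-split evaluation of a random forest
# regressor plus averaged feature importances (synthetic placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_plots, n_predictors = 200, 50                    # plots x spectral/textural predictors
X = rng.normal(size=(n_plots, n_predictors))
y = X[:, :5] @ rng.uniform(1, 3, size=5) + rng.normal(scale=2.0, size=n_plots)  # "biomass"

scores, importances = [], np.zeros(n_predictors)
for run in range(100):                             # 100 independent runs, 80/20 split
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=run)
    rf = RandomForestRegressor(n_estimators=300, random_state=run).fit(X_tr, y_tr)
    scores.append(r2_score(y_te, rf.predict(X_te)))
    importances += rf.feature_importances_

importances /= 100
top = np.argsort(importances)[::-1][:10]           # keep only the most important predictors
print(f"mean R2 over 100 runs: {np.mean(scores):.2f}")
print("indices of the retained predictors:", top.tolist())
```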
Grewal, Nivit; Singh, Shailendra; Chand, Trilok
2017-01-01
Owing to the innate noise in biological data sources, a single source or a single measure does not suffice for effective disease gene prioritization, so the integration of multiple data sources or the aggregation of multiple measures is needed. Aggregation operators combine multiple related data values into a single value such that the combined value reflects all the individual values. In this paper, an attempt is made to apply fuzzy aggregation to network-based disease gene prioritization and to investigate its effect under noise conditions. The study was conducted for a set of 15 blood disorders by fusing four different network measures, computed from the protein interaction network, using a selected set of aggregation operators and ranking the genes on the basis of the aggregated value. The aggregation operator-based rankings were compared with the "Random walk with restart" gene prioritization method. The impact of noise was also investigated by adding varying proportions of noise to the seed set. The results reveal that, for all the selected blood disorders, the Mean of Maximal operator outperformed the other aggregation operators for noisy as well as non-noisy data.
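A toy sketch of the score-level aggregation idea follows; the operator definitions, in particular "Mean of Maximal" read as the average of the k largest measure values, are my interpretation and may differ from the paper's exact formulations, and the gene names and scores are invented.

```python
# Toy sketch of aggregating several normalised network measures per gene.
import numpy as np

def aggregate(scores, operator="mean_of_maximal", k=2):
    """scores: 1-D array of normalised measure values for one gene."""
    s = np.asarray(scores, dtype=float)
    if operator == "min":
        return s.min()
    if operator == "max":
        return s.max()
    if operator == "mean":
        return s.mean()
    if operator == "mean_of_maximal":          # assumed: average of the k largest values
        return np.sort(s)[-k:].mean()
    raise ValueError(operator)

# Four measures (e.g. degree, closeness, betweenness, proximity to seed genes),
# already scaled to [0, 1], for three hypothetical candidate genes.
candidates = {"GENE_A": [0.9, 0.2, 0.8, 0.4],
              "GENE_B": [0.6, 0.6, 0.6, 0.6],
              "GENE_C": [0.1, 0.9, 0.3, 0.2]}

ranked = sorted(candidates, key=lambda g: aggregate(candidates[g]), reverse=True)
print("ranking under mean-of-maximal:", ranked)
```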
NASA Astrophysics Data System (ADS)
Zhang, Linna; Li, Gang; Sun, Meixiu; Li, Hongxiao; Wang, Zhennan; Li, Yingxin; Lin, Ling
2017-11-01
Identifying whole blood as either human or nonhuman is an important responsibility for import-export ports and inspection and quarantine departments. Analytical methods and DNA testing methods are usually destructive. Previous studies demonstrated that the visible diffuse reflectance spectroscopy method can achieve noncontact discrimination of human and nonhuman blood. An appropriate method for calibration set selection is very important for a robust quantitative model. In this paper, the Random Selection (RS) method and the Kennard-Stone (KS) method were applied to select samples for the calibration set. Moreover, a proper chemometric method can greatly improve the performance of a classification or quantification model. The Partial Least Squares Discriminant Analysis (PLSDA) method is commonly used for identification of blood species from spectroscopic data, and the Least Squares Support Vector Machine (LSSVM) has proved well suited to discriminant analysis. In this research, the PLSDA method and the LSSVM method were used for human blood discrimination. Compared with the results of the PLSDA method, the LSSVM method enhanced the performance of the identification models. The overall results indicate that the LSSVM method is more feasible for identifying human and animal blood species, and demonstrate that it is a reliable and robust method for human blood identification that can be more effective and accurate.
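The Kennard-Stone rule referred to above is a standard maximin procedure for picking a representative calibration set; the sketch below implements it generically on synthetic spectra and is not the authors' code.

```python
# Minimal Kennard-Stone calibration-set selection on placeholder spectra.
import numpy as np
from scipy.spatial.distance import cdist

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    d = cdist(X, X)
    # start from the two mutually most distant samples
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [i, j]
    remaining = [r for r in range(len(X)) if r not in selected]
    while len(selected) < n_select:
        # distance of every remaining sample to its nearest selected sample
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        nxt = remaining[int(np.argmax(min_d))]   # farthest-from-the-set sample
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

spectra = np.random.default_rng(1).normal(size=(60, 200))   # 60 samples x 200 wavelengths
calib_idx = kennard_stone(spectra, n_select=40)              # calibration set
valid_idx = sorted(set(range(60)) - set(calib_idx))          # held-out set
print(len(calib_idx), "calibration /", len(valid_idx), "validation samples")
```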
[Surgical treatment of secondary peritonitis: A continuing problem. German version].
van Ruler, O; Boermeester, M A
2016-01-01
Secondary peritonitis remains associated with high mortality and morbidity rates. Treatment of secondary peritonitis is still challenging even in the era of modern medicine. Surgical intervention for source control remains the cornerstone of treatment besides adequate antimicrobial therapy and when necessary intensive medical care measures and resuscitation. A randomized clinical trial showed that relaparotomy on demand (ROD) after initial emergency surgery was the preferred treatment strategy, irrespective of the severity and extent of peritonitis. The effective and safe use of ROD requires intensive monitoring of the patient in a setting where diagnostic tests and decision making about relaparotomy are guaranteed round the clock. The lack of knowledge on timely and adequate patient selection, together with the lack of use of easy but reliable monitoring tools seem to hamper full implementation of ROD. The accuracy of the relaparotomy decision tool is reasonable for prediction of the formation of peritonitis and necessary selection of patients for computed tomography (CT). The value of CT in the early postoperative phase is unclear. Future research and innovative technologies should focus on the additive value of CT after surgical treatment for secondary peritonitis and on the further optimization of bedside prediction tools to enhance adequate patient selection for interventions in a multidisciplinary setting.
Chemical name extraction based on automatic training data generation and rich feature set.
Yan, Su; Spangler, W Scott; Chen, Ying
2013-01-01
The automation of extracting chemical names from text has significant value to biomedical and life science research. A major barrier in this task is the difficulty of getting a sizable, good-quality data set to train a reliable entity extraction model. Another difficulty is the selection of informative features of chemical names, since comprehensive domain knowledge on chemistry nomenclature is required. Leveraging random text generation techniques, we explore the idea of automatically creating training sets for the task of chemical name extraction. Assuming the availability of an incomplete list of chemical names, called a dictionary, we are able to generate well-controlled, random, yet realistic chemical-like training documents. We statistically analyze the construction of chemical names based on the incomplete dictionary, and propose a series of new features, without relying on any domain knowledge. Compared to state-of-the-art models learned from manually labeled data and domain knowledge, our solution shows better or comparable results in annotating real-world data with less human effort. Moreover, we report an interesting observation about the language for chemical names. That is, both the structural and semantic components of chemical names follow a Zipfian distribution, which resembles many natural languages.
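A stripped-down illustration of the dictionary-driven training-data generation idea follows; the dictionary entries, background vocabulary, and B/I/O labelling scheme are placeholders of my own, not the paper's generator.

```python
# Toy illustration: plant dictionary terms into randomly generated background
# text and emit token-level labels suitable for training an extractor.
import random

dictionary = ["acetylsalicylic acid", "benzene", "2-methylpropan-1-ol"]   # incomplete list
background = "the sample was analysed and results were reported in the study".split()

def make_document(n_tokens=30, p_entity=0.15, seed=0):
    rng = random.Random(seed)
    tokens, labels = [], []                      # labels follow a simple B/I/O scheme
    while len(tokens) < n_tokens:
        if rng.random() < p_entity:
            name = rng.choice(dictionary).split()
            tokens.extend(name)
            labels.extend(["B-CHEM"] + ["I-CHEM"] * (len(name) - 1))
        else:
            tokens.append(rng.choice(background))
            labels.append("O")
    return tokens, labels

toks, labs = make_document(seed=42)
print(list(zip(toks, labs))[:10])
```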
Beccaria, Marco; Mellors, Theodore R; Petion, Jacky S; Rees, Christiaan A; Nasir, Mavra; Systrom, Hannah K; Sairistil, Jean W; Jean-Juste, Marc-Antoine; Rivera, Vanessa; Lavoile, Kerline; Severe, Patrice; Pape, Jean W; Wright, Peter F; Hill, Jane E
2018-02-01
Tuberculosis (TB) remains a global public health malady that claims almost 1.8 million lives annually. Diagnosis of TB represents perhaps one of the most challenging aspects of tuberculosis control. Gold standards for diagnosis of active TB (culture and nucleic acid amplification) are sputum-dependent; however, in up to a third of TB cases an adequate biological sputum sample is not readily available. The analysis of exhaled breath, as an alternative to sputum-dependent tests, has the potential to provide a simple, fast, non-invasive, and readily available diagnostic service that could positively change TB detection. Human breath was evaluated in the setting of active tuberculosis using thermal desorption-comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry. From the entire spectrum of volatile metabolites in breath, three random forest machine learning models were applied, generating a panel of 46 breath features. The twenty-two features common to all three random forest models were selected as a set that could distinguish subjects with confirmed pulmonary M. tuberculosis infection from people with pathologies other than TB. Copyright © 2018 Elsevier B.V. All rights reserved.
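One simple way to realize the "features common to several random forest models" step is sketched below on synthetic data; the resampling scheme and the top-15 cut-off are assumptions made purely for illustration.

```python
# Sketch: fit several random forests on resampled data, take each model's
# top-ranked features, and keep only the intersection (synthetic stand-ins).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 60))                       # 80 breath samples x 60 features
y = (X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=80) > 0).astype(int)

top_sets = []
for seed in range(3):                               # three forests on bootstrap resamples
    Xb, yb = resample(X, y, random_state=seed)
    rf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(Xb, yb)
    top = np.argsort(rf.feature_importances_)[::-1][:15]
    top_sets.append(set(top.tolist()))

common = set.intersection(*top_sets)                # features ranked highly by every model
print("features shared by all three models:", sorted(common))
```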
Motte, Anne-France; Diallo, Stéphanie; van den Brink, Hélène; Châteauvieux, Constance; Serrano, Carole; Naud, Carole; Steelandt, Julie; Alsac, Jean-Marc; Aubry, Pierre; Cour, Florence; Pellerin, Olivier; Pineau, Judith; Prognon, Patrice; Borget, Isabelle; Bonan, Brigitte; Martelli, Nicolas
2017-11-01
The aim of this study was to determine relevant items for reporting clinical trials on implantable medical devices (IMDs) and to identify reporting guidelines which include these items. A panel of experts identified the most relevant items for evaluating IMDs from an initial list based on reference papers. We then conducted a systematic review of articles indexed in MEDLINE. We retrieved reporting guidelines from the EQUATOR network's library for health research reporting. Finally, we screened these reporting guidelines to find those using our set of reporting items. Seven relevant reporting items were selected that related to four topics: randomization, learning curve, surgical setting, and device information. A total of 348 reporting guidelines were identified, among which 26 met our inclusion criteria. However, none of the 26 reporting guidelines presented all seven items together. The most frequently reported item was timing of randomization (65%). On the contrary, device information and learning curve effects were poorly specified. To our knowledge, this study is the first to identify specific items related to IMDs in reporting guidelines for clinical trials. We have shown that no existing reporting guideline is totally suitable for these devices. Copyright © 2017 Elsevier Inc. All rights reserved.
Automatic detection of atrial fibrillation in cardiac vibration signals.
Brueser, C; Diesel, J; Zink, M D H; Winter, S; Schauerte, P; Leonhardt, S
2013-01-01
We present a study on the feasibility of the automatic detection of atrial fibrillation (AF) from cardiac vibration signals (ballistocardiograms/BCGs) recorded by unobtrusive bedmounted sensors. The proposed system is intended as a screening and monitoring tool in home-healthcare applications and not as a replacement for ECG-based methods used in clinical environments. Based on BCG data recorded in a study with 10 AF patients, we evaluate and rank seven popular machine learning algorithms (naive Bayes, linear and quadratic discriminant analysis, support vector machines, random forests as well as bagged and boosted trees) for their performance in separating 30 s long BCG epochs into one of three classes: sinus rhythm, atrial fibrillation, and artifact. For each algorithm, feature subsets of a set of statistical time-frequency-domain and time-domain features were selected based on the mutual information between features and class labels as well as first- and second-order interactions among features. The classifiers were evaluated on a set of 856 epochs by means of 10-fold cross-validation. The best algorithm (random forests) achieved a Matthews correlation coefficient, mean sensitivity, and mean specificity of 0.921, 0.938, and 0.982, respectively.
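A hedged sketch of this evaluation pipeline, mutual-information-based feature selection feeding a random forest scored by 10-fold cross-validated Matthews correlation, is shown below with synthetic stand-in data rather than the BCG recordings.

```python
# Sketch: MI-ranked feature subset -> random forest -> 10-fold CV scored by MCC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(856, 40))                     # 856 epochs x 40 candidate features
y = rng.integers(0, 3, size=856)                   # 0=sinus rhythm, 1=AF, 2=artifact

clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=15),        # mutual-information feature subset
    RandomForestClassifier(n_estimators=300, random_state=0),
)
mcc = cross_val_score(clf, X, y, cv=10, scoring=make_scorer(matthews_corrcoef))
print(f"mean MCC over 10 folds: {mcc.mean():.3f}")
```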
Sheridan, Janie; Stewart, Joanna; Smart, Ros; McCormick, Ross
2012-01-01
To estimate the prevalence of risky drinking among customers in community pharmacies and to explore customer attitudes towards screening and brief intervention (SBI). Cross-sectional, anonymous survey, using random selection of community pharmacies in New Zealand to collect data via self-completion questionnaires with an opportunity to enter a prize draw. Participants were customers/patients attending the community pharmacy on a specific, randomly selected day (Monday to Friday) in one set week. The Alcohol Use Disorder Identification Test (AUDIT)-C with a cut-off score of 5 was used to measure risky drinking. Attitudes towards pharmacists engaging in SBI for risky drinkers were measured. 2384 completed customer/patient questionnaires were returned from 43 participating pharmacies. Almost 84% had ever drunk alcohol and, using a score of 5 or more as the cut-off, 30% of the sample would be considered risky drinkers. Attitudes towards pharmacists undertaking SBI were generally positive. Logistic regression with AUDIT-C status (positive or negative) as the dependent variable found that those taking medicines for mental health conditions or liver disease were more likely to score negative on the AUDIT-C, while smokers and those purchasing hangover cures were more likely than average to have a positive AUDIT-C screen. This study indicates there is scope for community pharmacists to undertake SBI for risky drinking, and that customers find this to be acceptable. Targeted screening may well be useful, in particular for smokers. Further research is required to explore the effectiveness of SBI for risky drinkers in this setting. © 2011 Australasian Professional Society on Alcohol and other Drugs.
Determination of the optimal number of components in independent components analysis.
Kassouf, Amine; Jouan-Rimbaud Bouveresse, Delphine; Rutledge, Douglas N
2018-03-01
Independent components analysis (ICA) may be considered as one of the most established blind source separation techniques for the treatment of complex data sets in analytical chemistry. Like other similar methods, the determination of the optimal number of latent variables, in this case, independent components (ICs), is a crucial step before any modeling. Therefore, validation methods are required in order to decide about the optimal number of ICs to be used in the computation of the final model. In this paper, three new validation methods are formally presented. The first one, called Random_ICA, is a generalization of the ICA_by_blocks method. Its specificity resides in the random way of splitting the initial data matrix into two blocks, and then repeating this procedure several times, giving a broader perspective for the selection of the optimal number of ICs. The second method, called KMO_ICA_Residuals is based on the computation of the Kaiser-Meyer-Olkin (KMO) index of the transposed residual matrices obtained after progressive extraction of ICs. The third method, called ICA_corr_y, helps to select the optimal number of ICs by computing the correlations between calculated proportions and known physico-chemical information about samples, generally concentrations, or between a source signal known to be present in the mixture and the signals extracted by ICA. These three methods were tested using varied simulated and experimental data sets and compared, when necessary, to ICA_by_blocks. Results were relevant and in line with expected ones, proving the reliability of the three proposed methods. Copyright © 2017 Elsevier B.V. All rights reserved.
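The Random_ICA idea of repeatedly splitting the samples into blocks and checking whether the extracted components agree can be sketched as follows; this follows my reading of the ICA_by_blocks generalization and uses scikit-learn's FastICA on random data, so the details will differ from the authors' implementation.

```python
# Sketch: split samples into two random blocks, run ICA on each, and check how
# well the extracted components match across blocks for a given number of ICs.
import numpy as np
from sklearn.decomposition import FastICA

def block_agreement(X, n_components, seed=0):
    """Weakest best-match correlation between the components of two random blocks."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    blocks = [X[idx[:half]], X[idx[half:2 * half]]]
    # one unmixing vector per component, expressed in variable space, per block
    W = [FastICA(n_components=n_components, random_state=seed, max_iter=1000)
         .fit(b).components_ for b in blocks]
    corr = np.abs(np.corrcoef(W[0], W[1])[:n_components, n_components:])
    return corr.max(axis=1).min()       # poorest of the best block-to-block matches

X = np.random.default_rng(2).normal(size=(120, 300))     # samples x variables (synthetic)
for n in range(1, 6):
    print(n, "ICs -> weakest block-to-block correlation:", round(block_agreement(X, n), 2))
```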
Automated encoding of clinical documents based on natural language processing.
Friedman, Carol; Shagina, Lyudmila; Lussier, Yves; Hripcsak, George
2004-01-01
The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI.72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
Accuracy of a novel multi-sensor board for measuring physical activity and energy expenditure
Lester, Jonathan; Migotsky, Sean; Goh, Jorming; Higgins, Lisa; Borriello, Gaetano
2011-01-01
The ability to relate physical activity to health depends on accurate measurement. Yet, none of the available methods are fully satisfactory due to several factors. This study examined the accuracy of a multi-sensor board (MSB) that infers activity types (sitting, standing, walking, stair climbing, and running) and estimates energy expenditure in 57 adults (32 females) aged 39.2 ± 13.5 years. In the laboratory, subjects walked and ran on a treadmill over a select range of speeds and grades for 3 min each (six stages in random order) while connected to a stationary calorimeter, preceded and followed by brief sitting and standing. On a different day, subjects completed scripted activities in the field connected to a portable calorimeter. The MSB was attached to a strap at the right hip. Subjects repeated one condition (randomly selected) on the third day. Accuracy of inferred activities compared with recorded activities (correctly identified activities/total activities × 100) was 97 and 84% in the laboratory and field, respectively. Absolute accuracy of energy expenditure [100 − |(kilocalories MSB − kilocalories calorimeter)/kilocalories calorimeter| × 100] was 89 and 76% in the laboratory and field, the latter being different (P < 0.05) from the calorimeter. Test–retest reliability for energy expenditure was significant in both settings (P < 0.0001; r = 0.97). In general, the MSB provides accurate measures of activity type in laboratory and field settings and energy expenditure during treadmill walking and running although the device underestimates energy expenditure in the field. PMID:21249383
Self-Organization of Temporal Structures — A Possible Solution for the Intervention Problem
NASA Astrophysics Data System (ADS)
von Lucadou, Walter
2006-10-01
The paper presents an experiment that is a conceptual replication of two earlier experiments which demonstrate entanglement correlations between a quantum physical random process and certain psychological variables of human observers. In the present study button-pushes were used as psychological variables. The button-pushes were performed by the subject with his or her left or right hand in order to "control" (according to the instruction) a random process that could be observed on a display. Each button-push started the next random event which, however, in reality, was independent of the button-pushes. The study consists of three independent sets of data (n = 386) that were gained with almost the same apparatus in three different experimental situations. The first data set serves as reference. It was an automatic control-run without subjects. The second set was produced mainly by subjects who asked to take part in a para-psychological experiment and who visited the "Parapsychological Counseling Office" in Freiburg especially for this purpose. Most of them were highly motivated persons who wanted to test their "psi ability". In this case the number of runs could be selected by the subjects before the experimental session. The third set of data (of the same size) was collected during two public exhibitions (at Basel and at Freiburg) where the visitors had the opportunity to participate in a "PK experiment". In this case the number of trials and runs was fixed in advance, but the duration of the experiment was dependent on the speed of button-pushes. The results corroborate the previous studies. The specific way in which the subjects pushed the buttons is highly significantly correlated with the independent random process. This correlation shows up for the momentarily generated random events as well as for the previous and the later runs during the experimental session. In a strict sense, only the correlations with the future random events can be interpreted as non-local correlations. The structure of the data, however, allows the conclusion that all observed correlations can be considered as entanglement-correlations. The number of entanglement-correlations was significantly higher for the highly motivated group (data set 2) than for the unselected group of the exhibition participants (data set 3). The latter, however, were not completely unsuccessful: a subgroup who showed "innovative" behavior also showed significant entanglement-correlations. It could further be shown that the structure of the matrix of entanglement-correlations is not stable in time and changes if the experiment is repeated. In comparison with previous correlation-experiments, no decline of the effect size was observed. These results are in agreement with the predictions of the "Weak Quantum Theory (WQT)" and the "Model of Pragmatic Information (MPI)". These models interpret the measured correlations as entanglement-correlations within a self-organizing, organizationally closed, psycho-physical system that exists during a certain time-interval (as long as the system is active). The entanglement-correlations cannot be considered as a causal influence (in the sense of a PK-influence) and thus are called "micro-synchronicity". After a short introduction (1.), the question of how non-local correlations can be created in psycho-physical systems is discussed (2.). In chapter (3.) the description of the experimental setting is given, and the apparatus (4.) and the randomness test of the random event generator (5.) are described.
Additionally, an overview of the structure of the data is given (6.) and the analysis methods are described (7.). In chapter (8.) the experimental hypotheses are formulated and the results are reported (9.). After the discussion of the results (10.), the conclusions (11.) of the study are presented.
Pulsed Nd:YAG laser selective ablation of surface enamel caries: II. Histology and clinical trials
NASA Astrophysics Data System (ADS)
Harris, David M.; Goodis, Harold E.; White, Joel M.; Arcoria, Charles J.; Simon, James; Burkart, John; Yessik, Michael J.; Myers, Terry D.
2000-03-01
High intensity infrared light from the pulsed Nd:YAG dental laser is absorbed by pigmented carious enamel and not absorbed by normal enamel. Therefore, this system is capable of selective removal of surface enamel caries. Safety and efficacy of the clinical procedure were evaluated in two sets of clinical trials at three dental schools. Carious lesions were randomized to drill or laser treatment. Pulp vitality, surface condition, preparations and restorations were evaluated by blinded evaluators. In Study 1, surface caries were removed from 104 third molars scheduled for extraction. One week post-treatment, teeth were extracted and the pulp was examined histologically. In Study 2, 90 patients with 422 lesions on 376 teeth were randomized to laser or drill and followed for six months. There were no adverse events and both clinical and histological evaluations of pulp vitality showed no abnormalities. Caries were removed in all conditions. A significantly greater number of preparations in the drill groups vs. laser groups entered dentin (drill = 11, laser = 1, p < 0.001). This indicates that the more conservative laser treatment removed the caries but not the sound enamel below the lesion.
Salas, Eric Ariel L; Valdez, Raul; Michel, Stefan
2017-11-01
We modeled summer and winter habitat suitability of Marco Polo argali in the Pamir Mountains in southeastern Tajikistan using these statistical algorithms: Generalized Linear Model, Random Forest, Boosted Regression Tree, Maxent, and Multivariate Adaptive Regression Splines. Using sheep occurrence data collected from 2009 to 2015 and a set of selected habitat predictors, we produced summer and winter habitat suitability maps and determined the important habitat suitability predictors for both seasons. Our results demonstrated that argali selected proximity to riparian areas and greenness as the two most relevant variables for summer, and the degree of slope (gentler slopes between 0° and 20°) and the Landsat temperature band for winter. The terrain roughness was also among the most important variables in summer and winter models. Aspect was only significant for winter habitat, with argali preferring south-facing mountain slopes. We evaluated various measures of model performance such as the Area Under the Curve (AUC) and the True Skill Statistic (TSS). Comparing the five algorithms, the AUC scored highest for Boosted Regression Tree in both summer (AUC = 0.94) and winter (AUC = 0.94) model runs. In contrast, Random Forest underperformed in both model runs.
Natural selection underlies apparent stress-induced mutagenesis in a bacteriophage infection model.
Yosef, Ido; Edgar, Rotem; Levy, Asaf; Amitai, Gil; Sorek, Rotem; Munitz, Ariel; Qimron, Udi
2016-04-18
The emergence of mutations following growth-limiting conditions underlies bacterial drug resistance, viral escape from the immune system and fundamental evolution-driven events. Intriguingly, whether mutations are induced by growth limitation conditions or are randomly generated during growth and then selected by growth limitation conditions remains an open question(1). Here, we show that bacteriophage T7 undergoes apparent stress-induced mutagenesis when selected for improved recognition of its host's receptor. In our unique experimental set-up, the growth limitation condition is physically and temporally separated from mutagenesis: growth limitation occurs while phage DNA is outside the host, and spontaneous mutations occur during phage DNA replication inside the host. We show that the selected beneficial mutations are not pre-existing and that the initial slow phage growth is enabled by the phage particle's low-efficiency DNA injection into the host. Thus, the phage particle allows phage populations to initially extend their host range without mutagenesis by virtue of residual recognition of the host receptor. Mutations appear during non-selective intracellular replication, and the frequency of mutant phages increases by natural selection acting on free phages, which are not capable of mutagenesis.
Correcting Evaluation Bias of Relational Classifiers with Network Cross Validation
2010-01-01
classification algorithms: simple random resampling (RRS), equal-instance random resampling (ERS), and network cross-validation (NCV). The first two... NCV procedure that eliminates overlap between test sets altogether. The procedure samples k disjoint test sets that will be used for evaluation... In each fold a training set of labelled nodes (a proportion propLabeled of the sample S) is drawn from the training pool, the inference set is the remainder of the network, and the triple (training set, test set, inference set) is added to the output collection F. NCV addresses
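A minimal sketch of the NCV sampling step, k disjoint test sets plus a per-fold training and inference set, is given below; the parameter name prop_labeled echoes the fragment above, but the implementation is my own reconstruction, not the report's code.

```python
# Sketch: carve k disjoint test sets out of the node set so no node is
# evaluated more than once, and build a (train, test, inference) triple per fold.
import random

def network_cv_folds(nodes, k=5, prop_labeled=0.3, seed=0):
    rng = random.Random(seed)
    nodes = list(nodes)
    rng.shuffle(nodes)
    fold_size = len(nodes) // k
    folds = []
    for i in range(k):
        test_set = set(nodes[i * fold_size:(i + 1) * fold_size])   # disjoint by construction
        train_pool = [n for n in nodes if n not in test_set]
        train_set = set(rng.sample(train_pool, int(prop_labeled * len(train_pool))))
        inference_set = set(nodes) - train_set                      # nodes the model must label
        folds.append((train_set, test_set, inference_set))
    return folds

folds = network_cv_folds(range(100), k=5)
print([len(tr) for tr, _, _ in folds], "train sizes;",
      [len(te) for _, te, _ in folds], "disjoint test sizes")
```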
Qin, Li-Xuan; Levine, Douglas A
2016-06-10
Accurate discovery of molecular biomarkers that are prognostic of a clinical outcome is an important yet challenging task, partly due to the combination of the typically weak genomic signal for a clinical outcome and the frequently strong noise due to microarray handling effects. Effective strategies to resolve this challenge are in dire need. We set out to assess the use of careful study design and data normalization for the discovery of prognostic molecular biomarkers. Taking progression free survival in advanced serous ovarian cancer as an example, we conducted empirical analysis on two sets of microRNA arrays for the same set of tumor samples: arrays in one set were collected using careful study design (that is, uniform handling and randomized array-to-sample assignment) and arrays in the other set were not. We found that (1) handling effects can confound the clinical outcome under study as a result of chance even with randomization, (2) the level of confounding handling effects can be reduced by data normalization, and (3) good study design cannot be replaced by post-hoc normalization. In addition, we provided a practical approach to define positive and negative control markers for detecting handling effects and assessing the performance of a normalization method. Our work showcased the difficulty of finding prognostic biomarkers for a clinical outcome of weak genomic signals, illustrated the benefits of careful study design and data normalization, and provided a practical approach to identify handling effects and select a beneficial normalization method. Our work calls for careful study design and data analysis for the discovery of robust and translatable molecular biomarkers.
Luechtefeld, Thomas; Maertens, Alexandra; McKim, James M; Hartung, Thomas; Kleensang, Andre; Sá-Rocha, Vanessa
2015-11-01
Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose-response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards would be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals' potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications as the dose-informed hidden Markov model strongly reduced " false-negatives" (i.e. extreme sensitizers as non-sensitizer) on all data sets. Copyright © 2015 John Wiley & Sons, Ltd.
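The recursive variable selection step can be approximated with scikit-learn's recursive feature elimination wrapped around a random forest, as in the hedged sketch below; the assay names and synthetic labels are invented placeholders, not the curated skin sensitization data.

```python
# Sketch: cross-validated recursive feature elimination with a random forest.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
assays = [f"assay_{i}" for i in range(12)]          # in silico / in chemico / in vitro readouts
X = pd.DataFrame(rng.normal(size=(150, 12)), columns=assays)
y = (X["assay_0"] + X["assay_3"] > 0).astype(int)   # toy sensitiser / non-sensitiser label

selector = RFECV(
    RandomForestClassifier(n_estimators=300, random_state=0),
    step=1,
    cv=StratifiedKFold(5),
    scoring="balanced_accuracy",
).fit(X, y)

kept = [a for a, keep in zip(assays, selector.support_) if keep]
print("assays retained by recursive feature elimination:", kept)
```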
Sørensen, Jette Led; Van der Vleuten, Cees; Lindschou, Jane; Gluud, Christian; Østergaard, Doris; LeBlanc, Vicki; Johansen, Marianne; Ekelund, Kim; Albrechtsen, Charlotte Krebs; Pedersen, Berit Woetman; Kjærgaard, Hanne; Weikop, Pia; Ottesen, Bent
2013-07-17
Unexpected obstetric emergencies threaten the safety of pregnant women. As emergencies are rare, they are difficult to learn. Therefore, simulation-based medical education (SBME) seems relevant. In non-systematic reviews on SBME, medical simulation has been suggested to be associated with improved learner outcomes. However, many questions on how SBME can be optimized remain unanswered. One unresolved issue is how 'in situ simulation' (ISS) versus 'off site simulation' (OSS) impact learning. ISS means simulation-based training in the actual patient care unit (in other words, the labor room and operating room). OSS means training in facilities away from the actual patient care unit, either at a simulation centre or in hospital rooms that have been set up for this purpose. The objective of this randomized trial is to study the effect of ISS versus OSS on individual learning outcome, safety attitude, motivation, stress, and team performance amongst multi-professional obstetric-anesthesia teams. The trial is a single-centre randomized superiority trial including 100 participants. The inclusion criteria were health-care professionals employed at the department of obstetrics or anesthesia at Rigshospitalet, Copenhagen, who were working on shifts and gave written informed consent. Exclusion criteria were managers with staff responsibilities, and staff who were actively taking part in preparation of the trial. The same obstetric multi-professional training was conducted in the two simulation settings. The experimental group was exposed to training in the ISS setting, and the control group in the OSS setting. The primary outcome is the individual score on a knowledge test. Exploratory outcomes are individual scores on a safety attitudes questionnaire, a stress inventory, salivary cortisol levels, an intrinsic motivation inventory, results from a questionnaire evaluating perceptions of the simulation and suggested changes needed in the organization, a team-based score on video-assessed team performance and on selected clinical performance. The perspective is to provide new knowledge on contextual effects of different simulation settings. ClinicalTrials.gov NCT01792674.
Coupling GIS and multivariate approaches to reference site selection for wadeable stream monitoring.
Collier, Kevin J; Haigh, Andy; Kelly, Johlene
2007-04-01
Geographic Information System (GIS) was used to identify potential reference sites for wadeable stream monitoring, and multivariate analyses were applied to test whether invertebrate communities reflected a priori spatial and stream type classifications. We identified potential reference sites in segments with unmodified vegetation cover adjacent to the stream and in >85% of the upstream catchment. We then used various landcover, amenity and environmental impact databases to eliminate sites that had potential anthropogenic influences upstream and that fell into a range of access classes. Each site identified by this process was coded by four dominant stream classes and seven zones, and 119 candidate sites were randomly selected for follow-up assessment. This process yielded 16 sites conforming to reference site criteria using a conditional-probabilistic design, and these were augmented by an additional 14 existing or special interest reference sites. Non-metric multidimensional scaling (NMS) analysis of percent abundance invertebrate data indicated significant differences in community composition among some of the zones and stream classes identified a priori providing qualified support for this framework in reference site selection. NMS analysis of a range standardised condition and diversity metrics derived from the invertebrate data indicated a core set of 26 closely related sites, and four outliers that were considered atypical of reference site conditions and subsequently dropped from the network. Use of GIS linked to stream typology, available spatial databases and aerial photography greatly enhanced the objectivity and efficiency of reference site selection. The multi-metric ordination approach reduced variability among stream types and bias associated with non-random site selection, and provided an effective way to identify representative reference sites.
Literature-based discovery of diabetes- and ROS-related targets
2010-01-01
Background Reactive oxygen species (ROS) are known mediators of cellular damage in multiple diseases including diabetic complications. Despite its importance, no comprehensive database is currently available for the genes associated with ROS. Methods We present ROS- and diabetes-related targets (genes/proteins) collected from the biomedical literature through a text mining technology. A web-based literature mining tool, SciMiner, was applied to 1,154 biomedical papers indexed with diabetes and ROS by PubMed to identify relevant targets. Over-represented targets in the ROS-diabetes literature were obtained through comparisons against randomly selected literature. The expression levels of nine genes, selected from the top ranked ROS-diabetes set, were measured in the dorsal root ganglia (DRG) of diabetic and non-diabetic DBA/2J mice in order to evaluate the biological relevance of literature-derived targets in the pathogenesis of diabetic neuropathy. Results SciMiner identified 1,026 ROS- and diabetes-related targets from the 1,154 biomedical papers (http://jdrf.neurology.med.umich.edu/ROSDiabetes/). Fifty-three targets were significantly over-represented in the ROS-diabetes literature compared to randomly selected literature. These over-represented targets included well-known members of the oxidative stress response including catalase, the NADPH oxidase family, and the superoxide dismutase family of proteins. Eight of the nine selected genes exhibited significant differential expression between diabetic and non-diabetic mice. For six genes, the direction of expression change in diabetes paralleled enhanced oxidative stress in the DRG. Conclusions Literature mining compiled ROS-diabetes related targets from the biomedical literature and led us to evaluate the biological relevance of selected targets in the pathogenesis of diabetic neuropathy. PMID:20979611
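The over-representation comparison against randomly selected literature is, in essence, a 2x2 count test per target; the sketch below uses Fisher's exact test with invented counts purely for illustration, not the study's actual frequencies.

```python
# Sketch: test whether one target is over-represented in the focus literature
# relative to a randomly selected background set of the same size.
from scipy.stats import fisher_exact

target_in_focus, focus_total = 40, 1154             # papers mentioning the target (ROS-diabetes set)
target_in_background, background_total = 8, 1154    # same target in randomly selected papers

table = [[target_in_focus, focus_total - target_in_focus],
         [target_in_background, background_total - target_in_background]]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio {odds_ratio:.1f}, one-sided p = {p_value:.2e}")
```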
Metabolite and transcript markers for the prediction of potato drought tolerance.
Sprenger, Heike; Erban, Alexander; Seddig, Sylvia; Rudack, Katharina; Thalhammer, Anja; Le, Mai Q; Walther, Dirk; Zuther, Ellen; Köhl, Karin I; Kopka, Joachim; Hincha, Dirk K
2018-04-01
Potato (Solanum tuberosum L.) is one of the most important food crops worldwide. Current potato varieties are highly susceptible to drought stress. In view of global climate change, selection of cultivars with improved drought tolerance and high yield potential is of paramount importance. Drought tolerance breeding of potato is currently based on direct selection according to yield and phenotypic traits and requires multiple trials under drought conditions. Marker-assisted selection (MAS) is cheaper, faster and reduces classification errors caused by noncontrolled environmental effects. We analysed 31 potato cultivars grown under optimal and reduced water supply in six independent field trials. Drought tolerance was determined as tuber starch yield. Leaf samples from young plants were screened for preselected transcript and nontargeted metabolite abundance using qRT-PCR and GC-MS profiling, respectively. Transcript marker candidates were selected from a published RNA-Seq data set. A Random Forest machine learning approach extracted metabolite and transcript markers for drought tolerance prediction with low error rates of 6% and 9%, respectively. Moreover, by combining transcript and metabolite markers, the prediction error was reduced to 4.3%. Feature selection from Random Forest models allowed model minimization, yielding a minimal combination of only 20 metabolite and transcript markers that were successfully tested for their reproducibility in 16 independent agronomic field trials. We demonstrate that a minimum combination of transcript and metabolite markers sampled at early cultivation stages predicts potato yield stability under drought largely independent of seasonal and regional agronomic conditions. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
An Overview of Randomization and Minimization Programs for Randomized Clinical Trials
Saghaei, Mahmoud
2011-01-01
Randomization is an essential component of sound clinical trials, which prevents selection biases and helps in blinding the allocations. Randomization is a process by which subsequent subjects are enrolled into trial groups only by chance, which essentially eliminates selection biases. A serious consequence of randomization is severe imbalance among the treatment groups with respect to some prognostic factors, which can invalidate the trial results or necessitate complex and usually unreliable secondary analysis to eradicate the source of imbalances. Minimization, on the other hand, tends to allocate in such a way as to minimize the differences among groups with respect to prognostic factors. Pure minimization is therefore completely deterministic; that is, one can predict the allocation of the next subject by knowing the factor levels of previously enrolled subjects and the characteristics of the next subject. To eliminate this predictability, it is necessary to include some elements of randomness in the minimization algorithms. In this article, brief descriptions of randomization and minimization are presented, followed by an introduction to selected randomization and minimization programs. PMID:22606659
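To make the minimization idea concrete, here is a generic Pocock-Simon-style sketch with a biased coin added for unpredictability; it is not the algorithm of any specific program reviewed in this article, and the factor names and probabilities are placeholders.

```python
# Sketch: biased-coin minimisation over two prognostic factors and two arms.
import random

ARMS = ("A", "B")
FACTORS = {"sex": ("F", "M"), "age": ("<65", ">=65")}
counts = {arm: {f: {lvl: 0 for lvl in levels} for f, levels in FACTORS.items()} for arm in ARMS}
rng = random.Random(7)

def allocate(subject, p_best=0.8):
    """Assign the arm that would minimise imbalance on the subject's factor
    levels, but only with probability p_best (the biased coin)."""
    imbalance = {}
    for arm in ARMS:
        other = ARMS[1] if arm == ARMS[0] else ARMS[0]
        # hypothetical imbalance if this subject were added to `arm`
        imbalance[arm] = sum((counts[arm][f][subject[f]] + 1) - counts[other][f][subject[f]]
                             for f in FACTORS)
    best = min(ARMS, key=lambda a: imbalance[a])
    arm = best if rng.random() < p_best else (ARMS[1] if best == ARMS[0] else ARMS[0])
    for f in FACTORS:
        counts[arm][f][subject[f]] += 1
    return arm

for subject in [{"sex": "F", "age": "<65"}, {"sex": "F", "age": ">=65"},
                {"sex": "M", "age": "<65"}, {"sex": "F", "age": "<65"}]:
    print(subject, "->", allocate(subject))
```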
Albert, Steven M; King, Jennifer; Dew, Mary Amanda; Begley, Amy; Anderson, Stewart; Karp, Jordan; Gildengers, Ari; Butters, Meryl; Reynolds, Charles F
2016-01-01
Addressing subthreshold depression (indicated prevention) and vulnerabilities that increase the risk of major depression or anxiety disorders (selective prevention) is important for protecting mental health in old age. The Depression-Agency Based Collaborative (Dep-ABC) is a prevention trial involving older adults recruited from aging services sites (home care agencies, senior housing, senior centers) who meet criteria for subthreshold depression and disability. Therefore, the authors examine the effectiveness of partnerships with aging services sites for recruiting at-risk older adults, the quality of recruitment and acceptability of the Dep-ABC assessment and intervention, and the baseline status of participants. Dep-ABC is a single-blind randomized controlled prevention trial set in aging services settings but with centralized screening, randomization, in-home assessments, and follow-up. Its intervention arm involves six to eight sessions of problem-solving therapy, in which older adults aged 60+ learn to break down problems that affect well-being and develop strategies to address them. We examined participation rates to assess quality of recruitment across sites and level of disability according to service use. Dep-ABC randomized 104 participants, 68.4% of eligible older adults. Screening using self-reported disability successfully netted a sample in which 74% received home care agency services, with remaining participants similarly impaired in structured self-reports of impairment and on observed performance tests. Direct outreach to aging services providers is an effective way to identify older adults with service needs at high risk of major depression. Problem-solving therapy is acceptable to this population and can be added to current services. Copyright © 2015 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.
Albert, Steven M.; King, Jennifer; Dew, Mary Amanda; Begley, Amy; Anderson, Stewart; Karp, Jordan; Gildengers, Ari; Butters, Meryl; Reynolds, Charles F.
2015-01-01
Background Addressing subthreshold depression (indicated prevention) as well as vulnerabilities that increase the risk of major depression or anxiety disorders (selective prevention) is important for protecting mental health in old age. The Depression-Agency Based Collaborative is a prevention trial involving older adults recruited from aging services sites (home care agencies, senior housing, senior centers) who meet criteria for subthreshold depression and disability. Objective To examine (i) the effectiveness of partnerships with aging services sites for recruiting at-risk older adults, (ii) the quality of recruitment and acceptability of the Dep-ABC assessment and intervention, and (iii) the baseline status of participants. Methods Dep-ABC is a single-blind randomized controlled prevention trial set in aging services settings but with centralized screening, randomization, in-home assessments, and follow-up. Its intervention arm involves 6–8 sessions of problem-solving therapy, in which older adults aged 60+ learn to break down problems that affect well-being and develop strategies to address them. We examined participation rates to assess quality of recruitment across sites and level of disability according to service use. Results Dep-ABC randomized 104 participants, 68.4% of eligible older adults. Screening using self-reported disability successfully netted a sample in which 74% received home care agency services, with the remaining participants similarly impaired in structured self-reports of impairment and on observed performance tests. Conclusions Direct outreach to aging services providers is an effective way to identify older adults with service needs at high risk of major depression. Problem-solving therapy is acceptable to this population and can be added to current services. PMID:26706911
Sciahbasi, Alessandro; Calabrò, Paolo; Sarandrea, Alessandro; Rigattieri, Stefano; Tomassini, Francesco; Sardella, Gennaro; Zavalloni, Dennis; Cortese, Bernardo; Limbruno, Ugo; Tebaldi, Matteo; Gagnor, Andrea; Rubartelli, Paolo; Zingarelli, Antonio; Valgimigli, Marco
2014-06-01
Radiation absorbed by interventional cardiologists is an important but frequently under-evaluated issue. The aim is to compare the radiation dose absorbed by interventional cardiologists during percutaneous coronary procedures for acute coronary syndromes performed via transradial versus transfemoral access. The randomized multicentre MATRIX (Minimizing Adverse Haemorrhagic Events by TRansradial Access Site and Systemic Implementation of angioX) trial has been designed to compare the clinical outcome of patients with acute coronary syndromes treated invasively according to the access site (transfemoral vs. transradial) and to the anticoagulant therapy (bivalirudin vs. heparin). Selected experienced interventional cardiologists involved in this study have been equipped with dedicated thermoluminescent dosimeters to evaluate the radiation dose absorbed during transfemoral, right transradial or left transradial access. For each access we evaluate the radiation dose absorbed at wrist, thorax and eye level. Consequently, the operator is equipped with three sets (transfemoral, right transradial or left transradial access) of three different dosimeters (wrist, thorax and eye dosimeter). The primary end-point of the study is the procedural radiation dose absorbed by operators at the thorax. An important secondary end-point is the procedural radiation dose absorbed by operators with the right versus the left radial approach. Patient randomization is performed according to the MATRIX protocol for the femoral or radial approach. A further randomization for the radial approach is performed to compare right and left transradial access. The RAD-MATRIX study should help to clarify the radiation exposure issue for interventional cardiologists comparing transradial and transfemoral access in the setting of acute coronary syndromes. Copyright © 2014 Elsevier Inc. All rights reserved.
Soares, Marta O.; Palmer, Stephen; Ades, Anthony E.; Harrison, David; Shankar-Hari, Manu; Rowan, Kathy M.
2015-01-01
Cost-effectiveness analysis (CEA) models are routinely used to inform health care policy. Key model inputs include relative effectiveness of competing treatments, typically informed by meta-analysis. Heterogeneity is ubiquitous in meta-analysis, and random effects models are usually used when there is variability in effects across studies. In the absence of observed treatment effect modifiers, various summaries from the random effects distribution (random effects mean, predictive distribution, random effects distribution, or study-specific estimate [shrunken or independent of other studies]) can be used depending on the relationship between the setting for the decision (population characteristics, treatment definitions, and other contextual factors) and the included studies. If covariates have been measured that could potentially explain the heterogeneity, then these can be included in a meta-regression model. We describe how covariates can be included in a network meta-analysis model and how the output from such an analysis can be used in a CEA model. We outline a model selection procedure to help choose between competing models and stress the importance of clinical input. We illustrate the approach with a health technology assessment of intravenous immunoglobulin for the management of adult patients with severe sepsis in an intensive care setting, which exemplifies how risk of bias information can be incorporated into CEA models. We show that the results of the CEA and value-of-information analyses are sensitive to the model and highlight the importance of sensitivity analyses when conducting CEA in the presence of heterogeneity. The methods presented extend naturally to heterogeneity in other model inputs, such as baseline risk. PMID:25712447
Welton, Nicky J; Soares, Marta O; Palmer, Stephen; Ades, Anthony E; Harrison, David; Shankar-Hari, Manu; Rowan, Kathy M
2015-07-01
Cost-effectiveness analysis (CEA) models are routinely used to inform health care policy. Key model inputs include relative effectiveness of competing treatments, typically informed by meta-analysis. Heterogeneity is ubiquitous in meta-analysis, and random effects models are usually used when there is variability in effects across studies. In the absence of observed treatment effect modifiers, various summaries from the random effects distribution (random effects mean, predictive distribution, random effects distribution, or study-specific estimate [shrunken or independent of other studies]) can be used depending on the relationship between the setting for the decision (population characteristics, treatment definitions, and other contextual factors) and the included studies. If covariates have been measured that could potentially explain the heterogeneity, then these can be included in a meta-regression model. We describe how covariates can be included in a network meta-analysis model and how the output from such an analysis can be used in a CEA model. We outline a model selection procedure to help choose between competing models and stress the importance of clinical input. We illustrate the approach with a health technology assessment of intravenous immunoglobulin for the management of adult patients with severe sepsis in an intensive care setting, which exemplifies how risk of bias information can be incorporated into CEA models. We show that the results of the CEA and value-of-information analyses are sensitive to the model and highlight the importance of sensitivity analyses when conducting CEA in the presence of heterogeneity. The methods presented extend naturally to heterogeneity in other model inputs, such as baseline risk. © The Author(s) 2015.
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database, has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic, estimates of the classifier performance than do random cross-validation schemes.
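The idea of supervised cross-validation, holding out whole subtypes rather than random examples, can be sketched with a group-wise split. This is a hedged illustration on synthetic data; the group labels and the 1-nearest-neighbour classifier are placeholders, not the benchmark's actual tasks.

```python
# Minimal sketch: cross-validation where entire subtypes (groups) are held out,
# so test examples are only distantly related to the training examples.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))            # sequence/structure features (synthetic)
y = rng.integers(0, 2, size=300)          # class membership (synthetic)
subtype = rng.integers(0, 10, size=300)   # known subtype within the hierarchy

scores = []
for train, test in GroupKFold(n_splits=5).split(X, y, groups=subtype):
    clf = KNeighborsClassifier(n_neighbors=1).fit(X[train], y[train])
    scores.append(clf.score(X[test], y[test]))
print("accuracy when novel subtypes are held out:", round(float(np.mean(scores)), 3))
```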
Grudniewicz, Agnes; Bhattacharyya, Onil; McKibbon, K Ann; Straus, Sharon E
2015-11-04
Printed educational materials (PEMs) are a frequently used tool to disseminate clinical information and attempt to change behavior within primary care. However, their effect on clinician behavior is limited. In this study, we explored how PEMs can be redesigned to better meet the needs of primary care physicians (PCPs) and whether usability and selection can be increased when design principles and user preferences are used. We redesigned a publicly available PEM using physician preferences, design principles, and graphic designer support. We invited PCPs to select their preferred document between the redesigned and original versions in a discrete choice experiment, followed by an assessment of usability with the System Usability Scale and a think-aloud process. We conducted this study in both a controlled and opportunistic setting to determine whether usability testing results vary by study location. Think-aloud data were thematically analyzed, and results were interpreted using the Technology Acceptance Model. One hundred and eighty-four PCPs participated in the discrete choice experiment at the 2014 Family Medicine Forum, a large Canadian conference for family physicians. Of these, 87.7% preferred the redesigned version. Follow-up interviews were held with a randomly selected group of seven participants. We repeated this in a controlled setting in Toronto, Canada, with a set of 14 participants. Using the System Usability Scale, we found that usability scores were significantly increased with the redesign (p < 0.001). We also found that when PCPs were given the choice between the two versions, they selected the redesigned version as their preferred PEM more often than the original (p < 0.001). Results did not appear to differ between the opportunistic and controlled setting. We used the results of the think-aloud process to add to a list of end user preferences developed in a previous study. We found that redesigning a PEM with user preferences and design principles can improve its usability and result in the PEM being selected more often than the original. We feel this finding supports the involvement of the user, application of design principles, and the assistance of a graphic designer in the development of PEMs.
Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guseva, Elizaveta; Zuckermann, Ronald N.; Dill, Ken A.
It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today’s protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition.
Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers
Guseva, Elizaveta; Zuckermann, Ronald N.; Dill, Ken A.
2017-01-01
It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today’s protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition. PMID:28831002
NASA Astrophysics Data System (ADS)
Castagnoli, Giuseppe
2017-05-01
The usual representation of quantum algorithms, limited to the process of solving the problem, is physically incomplete as it lacks the initial measurement. We extend it to the process of setting the problem. An initial measurement selects a problem setting at random, and a unitary transformation sends it into the desired setting. The extended representation must be with respect to Bob, the problem setter, and any external observer. It cannot be with respect to Alice, the problem solver. It would tell her the problem setting and thus the solution of the problem implicit in it. In the representation to Alice, the projection of the quantum state due to the initial measurement should be postponed until the end of the quantum algorithm. In either representation, there is a unitary transformation between the initial and final measurement outcomes. As a consequence, the final measurement of any ℛ-th part of the solution could select back in time a corresponding part of the random outcome of the initial measurement; the associated projection of the quantum state should be advanced by the inverse of that unitary transformation. This, in the representation to Alice, would tell her, before she begins her problem-solving action, that part of the solution. The quantum algorithm should be seen as a sum over classical histories in each of which Alice knows in advance one of the possible ℛ-th parts of the solution and performs the oracle queries still needed to find it, for the value of ℛ that explains the algorithm's speedup. We have a relation between retrocausality ℛ and the number of oracle queries needed to solve an oracle problem quantumly. All the oracle problems examined can be solved with any value of ℛ up to an upper bound attained by the optimal quantum algorithm. This bound is always in the vicinity of 1/2. Moreover, ℛ = 1/2 always provides the order of magnitude of the number of queries needed to solve the problem in an optimal quantum way. If this were true for any oracle problem, as is plausible, it would solve the quantum query complexity problem.
Wickenberg-Bolin, Ulrika; Göransson, Hanna; Fryknäs, Mårten; Gustafsson, Mats G; Isaksson, Anders
2006-03-13
Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust with good generalization performance to new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval of the error rate using repeated design and test sets selected from available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore, different methods for small sample performance estimation, such as a recently proposed procedure called Repeated Random Sampling (RRS), are also expected to result in heavily biased estimates, which in turn translate into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). Our simulations reveal that repeated designs and tests based on resampling in a fixed bag of samples yield a biased variance estimate. We also demonstrate that it is possible to obtain an improved variance estimate by means of a procedure that explicitly models how this bias depends on the number of samples used for testing. For the special case of repeated designs and tests using new samples for each design and test, we present an exact analytical expression for how the expected value of the bias decreases with the size of the test set. We show that via modeling and subsequent reduction of the small sample bias, it is possible to obtain an improved estimate of the variance of classifier performance between design sets. However, the uncertainty of the variance estimate is large in the simulations performed indicating that the method in its present form cannot be directly applied to small data sets.
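The resampling setting whose bias the authors analyse can be sketched as follows; the data, split sizes and number of repetitions are illustrative, and the bias-modelling step of RIDT itself is not reproduced here.

```python
# Hedged sketch: estimating the variance of classifier error across repeated
# design/test splits drawn from one fixed bag of samples. With only 10 test
# samples per split, this naive estimate overstates the true between-design
# variance, which is the bias the proposed procedure models and removes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 100))
y = np.array([0, 1] * 30)

errors = []
for seed in range(50):                          # repeated random design/test splits
    Xd, Xt, yd, yt = train_test_split(X, y, test_size=10, random_state=seed)
    err = 1 - LogisticRegression(max_iter=1000).fit(Xd, yd).score(Xt, yt)
    errors.append(err)

print("naive between-split variance estimate:", round(float(np.var(errors, ddof=1)), 4))
```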
Ghadie, Mohamed A; Japkowicz, Nathalie; Perkins, Theodore J
2015-08-15
Stem cell differentiation is largely guided by master transcriptional regulators, but it also depends on the expression of other types of genes, such as cell cycle genes, signaling genes, metabolic genes, trafficking genes, etc. Traditional approaches to understanding gene expression patterns across multiple conditions, such as principal components analysis or K-means clustering, can group cell types based on gene expression, but they do so without knowledge of the differentiation hierarchy. Hierarchical clustering can organize cell types into a tree, but in general this tree is different from the differentiation hierarchy itself. Given the differentiation hierarchy and gene expression data at each node, we construct a weighted Euclidean distance metric such that the minimum spanning tree with respect to that metric is precisely the given differentiation hierarchy. We provide a set of linear constraints that are provably sufficient for the desired construction and a linear programming approach to identify sparse sets of weights, effectively identifying genes that are most relevant for discriminating different parts of the tree. We apply our method to microarray gene expression data describing 38 cell types in the hematopoiesis hierarchy, constructing a weighted Euclidean metric that uses just 175 genes. However, we find that there are many alternative sets of weights that satisfy the linear constraints. Thus, in the style of random-forest training, we also construct metrics based on random subsets of the genes and compare them to the metric of 175 genes. We then report on the selected genes and their biological functions. Our approach offers a new way to identify genes that may have important roles in stem cell differentiation. tperkins@ohri.ca Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
The RANDOM computer program: A linear congruential random number generator
NASA Technical Reports Server (NTRS)
Miles, R. F., Jr.
1986-01-01
The RANDOM Computer Program is a FORTRAN program for generating random number sequences and testing linear congruential random number generators (LCGs). The linear congruential form of random number generator is discussed, and the selection of parameters of an LCG for a microcomputer is described. This document describes the following: (1) The RANDOM Computer Program; (2) RANDOM.MOD, the computer code needed to implement an LCG in a FORTRAN program; and (3) The RANCYCLE and the ARITH Computer Programs that provide computational assistance in the selection of parameters for an LCG. The RANDOM, RANCYCLE, and ARITH Computer Programs are written in Microsoft FORTRAN for the IBM PC microcomputer and its compatibles. With only minor modifications, the RANDOM Computer Program and its LCG can be run on most microcomputers or mainframe computers.
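For readers unfamiliar with the linear congruential form, a minimal sketch follows. The multiplier and modulus are the classic "minimal standard" values often used on small machines, not necessarily the parameters selected in the report.

```python
# Minimal LCG sketch: X_{n+1} = (a * X_n + c) mod m, mapped to (0, 1).
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=12345)
print([next(gen) / (2**31 - 1) for _ in range(5)])   # five pseudo-random uniforms
```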
Chinese manipulation for mechanical neck pain: a systematic review.
Lin, Jian Hua; Chiu, Thomas Tai Wing; Hu, Jia
2012-11-01
To assess whether Chinese manipulation improves pain, function/disability and global perceived effect in adults with acute/subacute/chronic neck pain. CAJ Full-text Database (Chinese), Wanfang Database (Chinese), Cochrane Database (English) and Medline (English). Literature searching was performed with the following keywords and their combination: 'manual therapy/bone setting/Chinese manipulation', 'neck/cervical pain', 'cervical vertebrae', 'cervical spondylosis/radiculopathy' and 'randomized controlled trial/review.' Two independent reviewers selected studies, extracted data and assessed risk of bias for each included study. Randomized controlled trials or quasi-randomized controlled trials on the effect of Chinese manipulation in treating adult patients with neck pain were selected. Mean differences with 95% confidence intervals (CI) were calculated. Quality of the evidence was assessed by the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach. Four studies (610 participants) were included in this review. There was very low-quality evidence suggesting that, compared to cervical traction in sitting, Chinese manipulation produced more immediate post-intervention pain relief (mean difference: -1.06; 95% CI: -1.37 to -0.75; P < 0.001) and improvement of global signs and symptoms (mean difference: -3.81; 95% CI: -4.71 to -2.91; P < 0.001). Very low-quality evidence showed that Chinese manipulation alone was superior to Chinese traditional massage in immediate post-intervention pain relief (mean difference: -2.02; 95% CI: -2.78 to -1.26; P < 0.001). There was limited evidence showing Chinese manipulation could produce short-term improvement for neck pain.
Efficiency of extracting stereo-driven object motions
Jain, Anshul; Zaidi, Qasim
2013-01-01
Most living things and many nonliving things deform as they move, requiring observers to separate object motions from object deformations. When the object is partially occluded, the task becomes more difficult because it is not possible to use two-dimensional (2-D) contour correlations (Cohen, Jain, & Zaidi, 2010). That leaves dynamic depth matching across the unoccluded views as the main possibility. We examined the role of stereo cues in extracting motion of partially occluded and deforming three-dimensional (3-D) objects, simulated by disk-shaped random-dot stereograms set at randomly assigned depths and placed uniformly around a circle. The stereo-disparities of the disks were temporally oscillated to simulate clockwise or counterclockwise rotation of the global shape. To dynamically deform the global shape, random disparity perturbation was added to each disk's depth on each stimulus frame. At low perturbation, observers reported rotation directions consistent with the global shape, even against local motion cues, but performance deteriorated at high perturbation. Using 3-D global shape correlations, we formulated an optimal Bayesian discriminator for rotation direction. Based on rotation discrimination thresholds, human observers were 75% as efficient as the optimal model, demonstrating that global shapes derived from stereo cues facilitate inferences of object motions. To complement reports of stereo and motion integration in extrastriate cortex, our results suggest the possibilities that disparity selectivity and feature tracking are linked, or that global motion selective neurons can be driven purely from disparity cues. PMID:23325345
Ranking the whole MEDLINE database according to a large training set using text indexing.
Suomela, Brian P; Andrade, Miguel A
2005-03-24
The MEDLINE database contains over 12 million references to scientific literature, with about 3/4 of recent articles including an abstract of the publication. Retrieval of entries using queries with keywords is useful for human users that need to obtain small selections. However, particular analyses of the literature or database developments may need the complete ranking of all the references in the MEDLINE database as to their relevance to a topic of interest. This report describes a method that does this ranking using the differences in word content between MEDLINE entries related to a topic and the whole of MEDLINE, in a computational time appropriate for an article search query engine. We tested the capabilities of our system to retrieve MEDLINE references which are relevant to the subject of stem cells. We took advantage of the existing annotation of references with terms from the MeSH hierarchical vocabulary (Medical Subject Headings, developed at the National Library of Medicine). A training set of 81,416 references was constructed by selecting entries annotated with the MeSH term stem cells or some child in its subtree. Frequencies of all nouns, verbs, and adjectives in the training set were computed and the ratios of word frequencies in the training set to those in the entire MEDLINE were used to score references. Self-consistency of the algorithm, benchmarked with a test set containing the training set and an equal number of references randomly selected from MEDLINE, was better using nouns (79%) than adjectives (73%) or verbs (70%). The evaluation of the system with 6,923 references not used for training, containing 204 articles relevant to stem cells according to a human expert, indicated a recall of 65% for a precision of 65%. This strategy appears to be useful for predicting the relevance of MEDLINE references to a given concept. The method is simple and can be used with any user-defined training set. Choice of the part of speech of the words used for classification has important effects on performance. Lists of words, scripts, and additional information are available from the web address http://www.ogic.ca/projects/ks2004/.
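The scoring idea, in which words that are frequent in the topic-specific training set relative to the background corpus raise a reference's score, can be sketched as below. The add-one smoothing and the toy corpora are assumptions for illustration, not the authors' exact scoring formula.

```python
# Hedged sketch: score a reference by the average ratio of a word's relative
# frequency in the topic training set to its relative frequency in the
# background corpus (here, toy stand-ins for the stem-cell set and MEDLINE).
from collections import Counter

def score(text, train_counts, background_counts):
    n_train = sum(train_counts.values())
    n_bg = sum(background_counts.values())
    words = text.lower().split()
    ratios = [((train_counts[w] + 1) / n_train) / ((background_counts[w] + 1) / n_bg)
              for w in words]                      # add-one smoothing for unseen words
    return sum(ratios) / len(ratios) if ratios else 0.0

train_counts = Counter("stem cells differentiate into blood stem cells".split())
background_counts = Counter("patients were randomly assigned to treatment groups".split())
print(score("Hematopoietic stem cells renew blood", train_counts, background_counts))
```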
Peng, Youyi; Keenan, Susan M; Zhang, Qiang; Kholodovych, Vladyslav; Welsh, William J
2005-03-10
Three-dimensional quantitative structure-activity relationship (3D-QSAR) models were constructed using comparative molecular field analysis (CoMFA) on a series of opioid receptor antagonists. To obtain statistically significant and robust CoMFA models, a sizable data set of naltrindole and naltrexone analogues was assembled by pooling biological and structural data from independent studies. A process of "leave one data set out", similar to the traditional "leave one out" cross-validation procedure employed in partial least squares (PLS) analysis, was utilized to study the feasibility of pooling data in the present case. These studies indicate that our approach yields statistically significant and highly predictive CoMFA models from the pooled data set of delta, mu, and kappa opioid receptor antagonists. All models showed excellent internal predictability and self-consistency: q² = 0.69/r² = 0.91 (delta), q² = 0.67/r² = 0.92 (mu), and q² = 0.60/r² = 0.96 (kappa). The CoMFA models were further validated using two separate test sets: one test set was selected randomly from the pooled data set, while the other test set was retrieved from other published sources. The overall excellent agreement between CoMFA-predicted and experimental binding affinities for a structurally diverse array of ligands across all three opioid receptor subtypes gives testimony to the superb predictive power of these models. CoMFA field analysis demonstrated that the variations in binding affinity of opioid antagonists are dominated by steric rather than electrostatic interactions with the three opioid receptor binding sites. The CoMFA steric-electrostatic contour maps corresponding to the delta, mu, and kappa opioid receptor subtypes reflected the characteristic similarities and differences in the familiar "message-address" concept of opioid receptor ligands. Structural modifications to increase selectivity for the delta over mu and kappa opioid receptors have been predicted on the basis of the CoMFA contour maps. The structure-activity relationships (SARs) together with the CoMFA models should find utility for the rational design of subtype-selective opioid receptor antagonists.
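"Leave one data set out" validation of a pooled model can be sketched with a group-wise split around a PLS regression. The synthetic descriptors, the grouping into three "studies" and the choice of five PLS components are placeholders, not the CoMFA setup itself.

```python
# Hedged sketch: hold out one contributing study at a time and compute a
# q2-style statistic on the omitted study, mimicking "leave one data set out".
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(3)
X = rng.normal(size=(90, 30))                        # field descriptors (synthetic)
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=90)   # binding-affinity-like response
study = np.repeat([0, 1, 2], 30)                     # which published study each compound came from

q2_values = []
for train, test in LeaveOneGroupOut().split(X, y, groups=study):
    model = PLSRegression(n_components=5).fit(X[train], y[train])
    pred = model.predict(X[test]).ravel()
    press = np.sum((y[test] - pred) ** 2)
    ss = np.sum((y[test] - y[train].mean()) ** 2)
    q2_values.append(1 - press / ss)
print("held-out q2 per omitted data set:", np.round(q2_values, 2))
```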
Hansen, Adam G.; Beauchamp, David A.
2014-01-01
Most predators eat only a subset of possible prey. However, studies evaluating diet selection rarely measure prey availability in a manner that accounts for temporal–spatial overlap with predators, the sensory mechanisms employed to detect prey, and constraints on prey capture. We evaluated the diet selection of cutthroat trout (Oncorhynchus clarkii) feeding on a diverse planktivore assemblage in Lake Washington to test the hypothesis that the diet selection of piscivores would reflect random (opportunistic) as opposed to non-random (targeted) feeding, after accounting for predator–prey overlap, visual detection and capture constraints. Diets of cutthroat trout were sampled in autumn 2005, when the abundance of transparent, age-0 longfin smelt (Spirinchus thaleichthys) was low, and 2006, when the abundance of smelt was nearly seven times higher. Diet selection was evaluated separately using depth-integrated and depth-specific (accounted for predator–prey overlap) prey abundance. The abundance of different prey was then adjusted for differences in detectability and vulnerability to predation to see whether these factors could explain diet selection. In 2005, cutthroat trout fed non-randomly by selecting against the smaller, transparent age-0 longfin smelt, but for the larger age-1 longfin smelt. After adjusting prey abundance for visual detection and capture, cutthroat trout fed randomly. In 2006, depth-integrated and depth-specific abundance explained the diets of cutthroat trout well, indicating random feeding. Feeding became non-random after adjusting for visual detection and capture. Cutthroat trout selected strongly for age-0 longfin smelt, but against similar sized threespine stickleback (Gasterosteus aculeatus) and larger age-1 longfin smelt in 2006. Overlap with juvenile sockeye salmon (O. nerka) was minimal in both years, and sockeye salmon were rare in the diets of cutthroat trout. The direction of the shift between random and non-random selection depended on the presence of a weak versus a strong year class of age-0 longfin smelt. These fish were easy to catch, but hard to see. When their density was low, poor detection could explain their rarity in the diet. When their density was high, poor detection was compensated by higher encounter rates with cutthroat trout, sufficient to elicit a targeted feeding response. The nature of the feeding selectivity of a predator can be highly dependent on fluctuations in the abundance and suitability of key prey.
Group Counseling With Emotionally Disturbed School Children in Taiwan.
ERIC Educational Resources Information Center
Chiu, Peter
The application of group counseling to emotionally disturbed school children in Chinese culture was examined. Two junior high schools located in Tao-Yuan Province were randomly selected with two eighth-grade classes randomly selected from each school. Ten emotionally disturbed students were chosen from each class and randomly assigned to two…
Sample Selection in Randomized Experiments: A New Method Using Propensity Score Stratified Sampling
ERIC Educational Resources Information Center
Tipton, Elizabeth; Hedges, Larry; Vaden-Kiernan, Michael; Borman, Geoffrey; Sullivan, Kate; Caverly, Sarah
2014-01-01
Randomized experiments are often seen as the "gold standard" for causal research. Despite the fact that experiments use random assignment to treatment conditions, units are seldom selected into the experiment using probability sampling. Very little research on experimental design has focused on how to make generalizations to well-defined…
On Measuring and Reducing Selection Bias with a Quasi-Doubly Randomized Preference Trial
ERIC Educational Resources Information Center
Joyce, Ted; Remler, Dahlia K.; Jaeger, David A.; Altindag, Onur; O'Connell, Stephen D.; Crockett, Sean
2017-01-01
Randomized experiments provide unbiased estimates of treatment effects, but are costly and time consuming. We demonstrate how a randomized experiment can be leveraged to measure selection bias by conducting a subsequent observational study that is identical in every way except that subjects choose their treatment--a quasi-doubly randomized…
van Rumste, Minouche M E; Custers, Inge M; van Wely, Madelon; Koks, Carolien A; van Weering, Hans G I; Beckers, Nicole G M; Scheffer, Gabrielle J; Broekmans, Frank J M; Hompes, Peter G A; Mochtar, Monique H; van der Veen, Fulco; Mol, Ben W J
2014-03-01
Couples with unexplained subfertility are often treated with intrauterine insemination (IUI) with ovarian stimulation, which carries the risk of multiple pregnancies. An explorative randomized controlled trial was performed comparing one cycle of IVF with elective single-embryo transfer (eSET) versus three cycles of IUI-ovarian stimulation in couples with unexplained subfertility and a poor prognosis for natural conception, to assess the economic burden of the treatment modalities. The main outcome measures were ongoing pregnancy rates and costs. This study randomly assigned 58 couples to IVF-eSET and 58 couples to IUI-ovarian stimulation. The ongoing pregnancy rates were 24% with IVF-eSET versus 21% with IUI-ovarian stimulation, with two and three multiple pregnancies, respectively. The mean cost per included couple was significantly different: €2781 with IVF-eSET and €1876 with IUI-ovarian stimulation (P<0.01). The additional costs per ongoing pregnancy were €2456 for IVF-eSET. In couples with unexplained subfertility, one cycle of IVF-eSET cost an additional €900 per couple compared with three cycles of IUI-ovarian stimulation, for no increase in ongoing pregnancy rates or decrease in multiple pregnancies. When IVF-eSET results in higher ongoing pregnancy rates, IVF would be the preferred treatment. Couples that have been trying to conceive unsuccessfully are often treated with intrauterine insemination (IUI) and medication to improve egg production (ovarian stimulation). This treatment carries the risk of multiple pregnancies like twins. We performed an explorative study among those couples that had a poor prognosis for natural conception. One cycle of IVF with transfer of one selected embryo (elective single-embryo transfer, eSET) was compared with three cycles of IUI-ovarian stimulation. The aim of this study was to assess the economic burden of both treatments. The main outcome measures were the number of ongoing pregnancies (beyond 12 weeks) and costs. We randomly assigned 58 couples to IVF-eSET and 58 couples to IUI-ovarian stimulation. The ongoing pregnancy rates were comparable: 24% with IVF-eSET versus 21% with IUI-ovarian stimulation. There were two multiple pregnancies with IVF-eSET and three multiple pregnancies with IUI-ovarian stimulation. The mean cost per included couple was significantly different, €2781 with IVF-eSET and €1876 with IUI-ovarian stimulation. The additional costs per ongoing pregnancy were €2456 for IVF-eSET. In couples with unexplained subfertility, one cycle of IVF-eSET cost an additional €900 per couple compared to three cycles of IUI-ovarian stimulation, for no increase in ongoing pregnancy rates or decrease in multiple pregnancies. We conclude that IUI-ovarian stimulation is the preferred treatment to start with. When IVF-eSET results in a higher ongoing pregnancy rate (>38%), IVF would be the preferred treatment. Copyright © 2013 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.
Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I
2017-01-01
A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
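The stability criterion mentioned above, Tanimoto similarity between selected feature sets, reduces to the Jaccard index of two sets. A minimal sketch with made-up feature indices:

```python
# Minimal sketch: Tanimoto (Jaccard) similarity between two selected feature sets.
def tanimoto(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

run1 = [5, 12, 40, 77]        # features selected in one run (illustrative)
run2 = [5, 12, 41, 77, 90]    # features selected in another run
print("stability:", round(tanimoto(run1, run2), 3))
```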
Lippman, Sheri A.; Shade, Starley B.; Hubbard, Alan E.
2011-01-01
Background Intervention effects estimated from non-randomized intervention studies are plagued by biases, yet social or structural intervention studies are rarely randomized. There are underutilized statistical methods available to mitigate biases due to self-selection, missing data, and confounding in longitudinal, observational data permitting estimation of causal effects. We demonstrate the use of Inverse Probability Weighting (IPW) to evaluate the effect of participating in a combined clinical and social STI/HIV prevention intervention on reduction of incident chlamydia and gonorrhea infections among sex workers in Brazil. Methods We demonstrate the step-by-step use of IPW, including presentation of the theoretical background, data set up, model selection for weighting, application of weights, estimation of effects using varied modeling procedures, and discussion of assumptions for use of IPW. Results 420 sex workers contributed data on 840 incident chlamydia and gonorrhea infections. Participants were compared with non-participants following application of inverse probability weights to correct for differences in covariate patterns between exposed and unexposed participants and between those who remained in the intervention and those who were lost to follow-up. Estimators using four model selection procedures provided estimates of intervention effect between odds ratio (OR) 0.43 (95% CI: 0.22-0.85) and 0.53 (95% CI: 0.26-1.1). Conclusions After correcting for selection bias, loss to follow-up, and confounding, our analysis suggests a protective effect of participating in the Encontros intervention. Evaluations of behavioral, social, and multi-level interventions to prevent STI can benefit by introduction of weighting methods such as IPW. PMID:20375927
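A generic IPW workflow of the kind described can be sketched in three steps: fit a propensity model for exposure, convert the fitted probabilities into stabilised weights, and fit a weighted outcome model. The simulated variables and logistic models below are placeholders, not the study's actual covariates or estimators.

```python
# Hedged IPW sketch on simulated data: participation depends on confounders,
# and the weighted outcome model recovers a (marginal) participation effect.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 400
age = rng.normal(30, 8, n)
baseline_risk = rng.binomial(1, 0.3, n)
participated = rng.binomial(1, 1 / (1 + np.exp(-(0.03 * (age - 30) + 0.5 * baseline_risk))))
infection = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 - 0.4 * participated + 0.6 * baseline_risk))))

# 1. Propensity of participation given measured confounders.
Z = np.column_stack([age, baseline_risk])
ps = LogisticRegression(C=1e6, max_iter=1000).fit(Z, participated).predict_proba(Z)[:, 1]

# 2. Stabilised inverse probability weights.
p_marg = participated.mean()
w = np.where(participated == 1, p_marg / ps, (1 - p_marg) / (1 - ps))

# 3. Weighted outcome model: the exposure coefficient approximates the marginal effect.
outcome = LogisticRegression(C=1e6, max_iter=1000).fit(
    participated.reshape(-1, 1), infection, sample_weight=w)
print("weighted odds ratio:", round(float(np.exp(outcome.coef_[0, 0])), 2))
```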
Wallace, Carol A; Giannini, Edward H; Huang, Bin; Itert, Lukasz; Ruperto, Nicolino
2011-07-01
To prospectively validate the preliminary criteria for clinical inactive disease (CID) in patients with select categories of juvenile idiopathic arthritis (JIA). We used the process for development of classification and response criteria recommended by the American College of Rheumatology Quality of Care Committee. Patient-visit profiles were extracted from the phase III randomized controlled trial of infliximab in polyarticular-course JIA (i.e., patients considered to resemble those with select categories of JIA) and sent to an international group of expert physician raters. Using the physician ratings as the gold standard, the sensitivity and specificity were calculated using the preliminary criteria. Modifications to the criteria were made, and these were sent to a larger group of pediatric rheumatologists to determine quantitative, face, and content validity. Variables weighted heaviest by physicians when making their judgment were the number of joints with active arthritis, erythrocyte sedimentation rate (ESR), physician's global assessment, and duration of morning stiffness. Three modifications were made: the definition of uveitis, the definition of abnormal ESR, and the addition of morning stiffness. These changes did not alter the accuracy of the preliminary set. The modified criteria, termed the "criteria for CID in select categories of JIA," have excellent feasibility and face, content, criterion, and discriminant validity to detect CID in select categories of JIA. The small changes made to the preliminary criteria set did not alter the area under the receiver operating characteristic curve (0.954) or accuracy (91%), but have increased face and content validity. Copyright © 2011 by the American College of Rheumatology.
Variable selection under multiple imputation using the bootstrap in a prognostic study
Heymans, Martijn W; van Buuren, Stef; Knol, Dirk L; van Mechelen, Willem; de Vet, Henrica CW
2007-01-01
Background Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. Method In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables data were missing in the range of 0 and 48.1%. We used four methods to investigate the influence of respectively sampling and imputation variation: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. Results We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined at the range of 0% (full model) to 90% of variable selection, bootstrap corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. Conclusion We recommend to account for both imputation and sampling variation in sets of missing data. The new procedure of combining MI with bootstrapping for variable selection, results in multivariable prognostic models with good performance and is therefore attractive to apply on data sets with missing values. PMID:17629912
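The combination of resampling and imputation for variable selection can be sketched by recording, over bootstrap samples, how often each candidate predictor survives a selection step. The single-chain imputer and the recursive-elimination rule below stand in for the multiple-imputation and selection procedures compared in the paper; the data and the cut-off are illustrative.

```python
# Hedged sketch: bootstrap + imputation + selection, summarised as inclusion frequencies.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, p = 200, 8
X = rng.normal(size=(n, p))
X[rng.random(size=X.shape) < 0.2] = np.nan            # ~20% missing values
y = rng.binomial(1, 0.5, n)                           # chronicity outcome (synthetic)

counts = np.zeros(p)
B = 50
for b in range(B):
    idx = rng.integers(0, n, n)                       # bootstrap sample with replacement
    Xb = IterativeImputer(random_state=b).fit_transform(X[idx])
    sel = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(Xb, y[idx])
    counts += sel.support_                            # which predictors were retained

inclusion_frequency = counts / B                      # e.g. keep predictors above 0.6
print(np.round(inclusion_frequency, 2))
```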
Sandilands, Euan A; Cameron, Sharon; Paterson, Frances; Donaldson, Sam; Briody, Lesley; Crowe, Jane; Donnelly, Julie; Thompson, Adrian; Johnston, Neil R; Mackenzie, Ivor; Uren, Neal; Goddard, Jane; Webb, David J; Megson, Ian L; Bateman, Nicholas; Eddleston, Michael
2012-02-03
Contrast-induced nephropathy is a common complication of contrast administration in patients with chronic kidney disease and diabetes. Its pathophysiology is not well understood; similarly, the role of intravenous or oral acetylcysteine is unclear. Randomized controlled trials to date have been conducted without detailed knowledge of the effect of acetylcysteine on renal function. We are conducting a detailed mechanistic study of acetylcysteine on normal and impaired kidneys, both with and without contrast. This information would guide the choice of dose, route, and appropriate outcome measure for future clinical trials in patients with chronic kidney disease. We designed a 4-part study. We have set up randomised controlled cross-over studies to assess the effect of intravenous (50 mg/kg/hr for 2 hrs before contrast exposure, then 20 mg/kg/hr for 5 hrs) or oral acetylcysteine (1200 mg twice daily for 2 days, starting the day before contrast exposure) on renal function in normal and diseased kidneys, and normal kidneys exposed to contrast. We have also set up a parallel-group randomized controlled trial to assess the effect of intravenous or oral acetylcysteine on patients with chronic kidney disease stage III undergoing elective coronary angiography. The primary outcome is change in renal blood flow; secondary outcomes include change in glomerular filtration rate, tubular function, urinary proteins, and oxidative balance. Contrast-induced nephropathy represents a significant source of hospital morbidity and mortality. Over the last ten years, acetylcysteine has been administered prior to contrast to reduce the risk of contrast-induced nephropathy. Randomized controlled trials, however, have not reliably demonstrated renoprotection; a recent large randomized controlled trial assessing a dose of oral acetylcysteine selected without mechanistic insight did not reduce the incidence of contrast-induced nephropathy. Our study should reveal the mechanism of effect of acetylcysteine on renal function and identify an appropriate route for future dose-response studies and, in time, randomized controlled trials. ClinicalTrials.gov: NCT00558142; EudraCT: 2006-003509-18.
Nunn, Andrew J; Rusen, I D; Van Deun, Armand; Torrea, Gabriela; Phillips, Patrick P J; Chiang, Chen-Yuan; Squire, S Bertel; Madan, Jason; Meredith, Sarah K
2014-09-09
In contrast to drug-sensitive tuberculosis, the guidelines for the treatment of multi-drug-resistant tuberculosis (MDR-TB) have a very poor evidence base; current recommendations, based on expert opinion, are that patients should be treated for a minimum of 20 months. A series of cohort studies conducted in Bangladesh identified a nine-month regimen with very promising results. There is a need to evaluate this regimen in comparison with the currently recommended regimen in a randomized controlled trial in a variety of settings, including patients with HIV-coinfection. STREAM is a multi-centre randomized trial of non-inferiority design comparing a nine-month regimen to the treatment currently recommended by the World Health Organization in patients with MDR pulmonary TB with no evidence on line probe assay of fluoroquinolone or kanamycin resistance. The nine-month regimen includes clofazimine and high-dose moxifloxacin and can be extended to 11 months in the event of delay in smear conversion. The primary outcome is based on the bacteriological status of the patients at 27 months post-randomization. Based on the assumption that the nine-month regimen will be slightly more effective than the control regimen and, given a 10% margin of non-inferiority, a total of 400 patients are required to be enrolled. Health economics data are being collected on all patients in selected sites. The results from the study in Bangladesh and cohorts in progress elsewhere are encouraging, but for this regimen to be recommended more widely than in a research setting, robust evidence is needed from a randomized clinical trial. Results from the STREAM trial together with data from ongoing cohorts should provide the evidence necessary to revise current recommendations for the treatment for MDR-TB. This trial was registered with clinicaltrials.gov (registration number: ISRCTN78372190) on 14 October 2010.
NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.
Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N
2016-11-01
The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
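A much-simplified sketch of the underlying idea, biasing the choice among synonymous codons so that a single parameter shifts the expected GC content while the amino acid sequence stays fixed, is shown below. The tiny codon table and the exponential weighting are illustrative assumptions; this is not the NullSeq algorithm itself.

```python
# Hedged sketch: sample synonymous codons with an exponential bias on their
# G+C count; larger beta pushes the sequence toward higher GC content.
import math
import random

CODONS = {  # tiny illustrative subset of the genetic code
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "F": ["TTT", "TTC"],
}

def sample_sequence(protein, beta, seed=0):
    rng = random.Random(seed)
    seq = []
    for aa in protein:
        codons = CODONS[aa]
        weights = [math.exp(beta * (c.count("G") + c.count("C"))) for c in codons]
        seq.append(rng.choices(codons, weights=weights)[0])
    return "".join(seq)

s = sample_sequence("MKGF", beta=1.0)
gc = (s.count("G") + s.count("C")) / len(s)
print(s, f"GC = {gc:.2f}")
```

In practice the bias parameter would be solved numerically so that the expected GC content matches the requested target, which is where the maximum-entropy formulation comes in.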
Ließ, Mareike; Schmidt, Johannes; Glaser, Bruno
2016-01-01
Tropical forests are significant carbon sinks and their soils' carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms, including the model tuning and predictor selection, were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m-2, displaying huge variability, with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models' predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward over backward selection procedures. Choosing predictors by their individual performance was outperformed by the two procedures that accounted for predictor interactions.
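The comparison protocol, tuning each learner and then comparing the tuned models via five repetitions of tenfold cross-validation, can be sketched as follows. Only two of the five algorithms are shown, and the predictor matrix, response and tuning grids are synthetic placeholders.

```python
# Hedged sketch: nested evaluation with tuning (inner CV) and comparison
# (5 x 10-fold outer CV) of regression learners on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(150, 25))         # terrain / spectral predictors (synthetic)
y = rng.gamma(2.0, 2.0, size=150)      # SOC-stock-like response (synthetic)

outer_cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)
candidates = {
    "random forest": GridSearchCV(RandomForestRegressor(random_state=0),
                                  {"max_features": [5, 10]}, cv=5),
    "boosted trees": GridSearchCV(GradientBoostingRegressor(random_state=0),
                                  {"learning_rate": [0.05, 0.1]}, cv=5),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=outer_cv, scoring="neg_mean_squared_error")
    print(f"{name}: RMSE = {np.sqrt(-scores).mean():.2f}")
```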
Advanced reliability methods for structural evaluation
NASA Technical Reports Server (NTRS)
Wirsching, P. H.; Wu, Y.-T.
1985-01-01
Fast probability integration (FPI) methods, which can yield approximate solutions to such general structural reliability problems as the computation of the probabilities of complicated functions of random variables, are known to require one-tenth the computer time of Monte Carlo methods for a probability level of 0.001; lower probabilities yield even more dramatic differences. A strategy is presented in which a computer routine is run k times with selected perturbed values of the variables to obtain k solutions for a response variable Y. An approximating polynomial is fit to the k 'data' sets, and FPI methods are employed for this explicit form.
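The strategy can be sketched as: run the expensive routine at k perturbed input values, fit an explicit polynomial to the k responses, and evaluate the small failure probability on the cheap surrogate. In the sketch below a large Monte Carlo sample on the surrogate stands in for the fast probability integration step, and the response function, design size and threshold are invented for illustration.

```python
# Hedged sketch: response-surface surrogate built from k perturbed runs of an
# "expensive" routine, then a cheap probability estimate on the surrogate.
import numpy as np

def expensive_response(x):                 # placeholder for the structural code
    return 3.0 + 0.8 * x[0] - 0.5 * x[1] + 0.1 * x[0] * x[1]

rng = np.random.default_rng(7)
k = 25
X_design = rng.normal(size=(k, 2))         # perturbed values of the random variables
Y = np.array([expensive_response(x) for x in X_design])

# Fit an explicit quadratic approximation Y ~ [1, x1, x2, x1^2, x2^2, x1*x2].
A = np.column_stack([np.ones(k), X_design, X_design**2, X_design[:, :1] * X_design[:, 1:]])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)

# Probability that the response drops below a threshold, evaluated on the surrogate.
Xs = rng.normal(size=(200_000, 2))
As = np.column_stack([np.ones(len(Xs)), Xs, Xs**2, Xs[:, :1] * Xs[:, 1:]])
print("P(Y < 0) ≈", np.mean(As @ coef < 0.0))
```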
A comparison of cord gingival displacement with the gingitage technique.
Tupac, R G; Neacy, K
1981-11-01
Fifteen young adult dogs were divided into three groups representing 0, 7- and 21-day healing periods. Randomly selected cuspid teeth were used to compare cord gingival displacement and gingitage techniques for subgingival tooth preparation and impression making. Clinical and histologic measurements were used as a basis for comparison. Results indicate that (1) the experimental teeth were clinically healthy at the beginning of the experiment, (2) clinical health of the gingival tissues was controlled throughout the course of the experiment, and (3) within this experimental setting, there was no significant difference between the cord gingival displacement technique and the gingitage technique.
SECIMTools: a suite of metabolomics data analysis tools.
Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M
2018-04-20
Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high throughput technology for untargeted analysis of metabolites. Open access, easy to use, analytic tools that are broadly accessible to the biological community need to be developed. While technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing of workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modulated modularity clustering), basic statistical analysis methods (partial least squares-discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator (LASSO) and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.
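One of the variable-selection building blocks named above (the elastic net) can be sketched on a synthetic feature-by-sample matrix; the dimensions, penalties and the use of scikit-learn rather than the SECIMTools wrappers are assumptions for illustration.

```python
# Hedged sketch: elastic-net variable selection on a synthetic metabolomics-like matrix.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(8)
X = rng.normal(size=(40, 300))                  # 40 samples x 300 metabolite features
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(scale=0.5, size=40)

model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5, max_iter=50_000).fit(X, y)
selected = np.flatnonzero(model.coef_)          # features with non-zero coefficients
print("features retained:", selected)
```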