Sample records for machine statistical availability

  1. Application of statistical machine translation to public health information: a feasibility study.

    PubMed

    Kirchhoff, Katrin; Turner, Anne M; Axelrod, Amittai; Saavedra, Francisco

    2011-01-01

    Accurate, understandable public health information is important for ensuring the health of the nation. The large portion of the US population with Limited English Proficiency is best served by translations of public-health information into other languages. However, a large number of health departments and primary care clinics face significant barriers to fulfilling federal mandates to provide multilingual materials to Limited English Proficiency individuals. This article presents a pilot study on the feasibility of using freely available statistical machine translation technology to translate health promotion materials. The authors gathered health-promotion materials in English from local and national public-health websites. Spanish versions were created by translating the documents using a freely available machine-translation website. Translations were rated for adequacy and fluency, analyzed for errors, manually corrected by a human posteditor, and compared with exclusively manual translations. Machine translation plus postediting took 15-53 min per document, compared to the reported days or even weeks for the standard translation process. A blind comparison of machine-assisted and human translations of six documents revealed overall equivalency between machine-translated and manually translated materials. The analysis of translation errors indicated that the most important errors were word-sense errors. The results indicate that machine translation plus postediting may be an effective method of producing multilingual health materials with equivalent quality but lower cost compared to manual translations.

  2. Application of statistical machine translation to public health information: a feasibility study

    PubMed Central

    Turner, Anne M; Axelrod, Amittai; Saavedra, Francisco

    2011-01-01

    Objective Accurate, understandable public health information is important for ensuring the health of the nation. The large portion of the US population with Limited English Proficiency is best served by translations of public-health information into other languages. However, a large number of health departments and primary care clinics face significant barriers to fulfilling federal mandates to provide multilingual materials to Limited English Proficiency individuals. This article presents a pilot study on the feasibility of using freely available statistical machine translation technology to translate health promotion materials. Design The authors gathered health-promotion materials in English from local and national public-health websites. Spanish versions were created by translating the documents using a freely available machine-translation website. Translations were rated for adequacy and fluency, analyzed for errors, manually corrected by a human posteditor, and compared with exclusively manual translations. Results Machine translation plus postediting took 15–53 min per document, compared to the reported days or even weeks for the standard translation process. A blind comparison of machine-assisted and human translations of six documents revealed overall equivalency between machine-translated and manually translated materials. The analysis of translation errors indicated that the most important errors were word-sense errors. Conclusion The results indicate that machine translation plus postediting may be an effective method of producing multilingual health materials with equivalent quality but lower cost compared to manual translations. PMID:21498805

  3. Competitive foods available in Pennsylvania public high schools.

    PubMed

    Probart, Claudia; McDonnell, Elaine; Weirich, J Elaine; Hartman, Terryl; Bailey-Davis, Lisa; Prabhakher, Vaheedha

    2005-08-01

    This study examined the types and extent of competitive foods available in public high schools in Pennsylvania. We developed, pilot tested, and distributed surveys to school foodservice directors in a random sample of 271 high schools in Pennsylvania. Two hundred twenty-eight surveys were returned, for a response rate of 84%. Statistical analyses were performed: Descriptive statistics were used to examine the extent of competitive food sales in Pennsylvania public high schools. The survey data were analyzed using SPSS software version 11.5.1 (2002, SPSS base 11.0 for Windows, SPSS Inc, Chicago, IL). A la carte sales provide almost $700/day to school foodservice programs, almost 85% of which receive no financial support from their school districts. The top-selling a la carte items are "hamburgers, pizza, and sandwiches." Ninety-four percent of respondents indicated that vending machines are accessible to students. The item most commonly offered in vending machines is bottled water (71.5%). While food items are less often available through school stores and club fund-raisers, candy is the item most commonly offered through these sources. Competitive foods are widely available in high schools. Although many of the items available are low in nutritional value, we found several of the top-selling a la carte options to be nutritious and bottled water the item most often identified as available through vending machines.

  4. The impact of the availability of school vending machines on eating behavior during lunch: the Youth Physical Activity and Nutrition Survey.

    PubMed

    Park, Sohyun; Sappenfield, William M; Huang, Youjie; Sherry, Bettylou; Bensyl, Diana M

    2010-10-01

    Childhood obesity is a major public health concern and is associated with substantial morbidities. Access to less-healthy foods might facilitate dietary behaviors that contribute to obesity. However, less-healthy foods are usually available in school vending machines. This cross-sectional study examined the prevalence of students buying snacks or beverages from school vending machines instead of buying school lunch and predictors of this behavior. Analyses were based on the 2003 Florida Youth Physical Activity and Nutrition Survey using a representative sample of 4,322 students in grades six through eight in 73 Florida public middle schools. Analyses included χ2 tests and logistic regression. The outcome measure was buying a snack or beverage from vending machines 2 or more days during the previous 5 days instead of buying lunch. The survey response rate was 72%. Eighteen percent of respondents reported purchasing a snack or beverage from a vending machine 2 or more days during the previous 5 school days instead of buying school lunch. Although healthier options were available, the most commonly purchased vending machine items were chips, pretzels/crackers, candy bars, soda, and sport drinks. More students chose snacks or beverages instead of lunch in schools where beverage vending machines were also available than did students in schools where beverage vending machines were unavailable: 19% and 7%, respectively (P≤0.05). The strongest risk factor for buying snacks or beverages from vending machines instead of buying school lunch was availability of beverage vending machines in schools (adjusted odds ratio=3.5; 95% confidence interval, 2.2 to 5.7). Other statistically significant risk factors were smoking, non-Hispanic black race/ethnicity, Hispanic ethnicity, and older age. Although healthier choices were available, the most common choices were the less-healthy foods. Schools should consider developing policies to reduce the availability of less-healthy choices in vending machines and to reduce access to beverage vending machines. Copyright © 2010 American Dietetic Association. Published by Elsevier Inc. All rights reserved.

  5. New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

    Treesearch

    L.R. Iverson; A.M. Prasad; A. Liaw

    2004-01-01

    More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...

  6. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.
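
    A minimal sketch of the framing this abstract describes, posing attack detection as binary classification of measurement vectors. The synthetic data, dimensions, and sparse injection model below are invented for illustration and are not the paper's IEEE test systems:

      # Classify measurement vectors as secure (0) or attacked (1); toy data.
      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      n, d = 2000, 30                                  # samples, meter readings
      secure = rng.normal(0, 1, (n, d))                # nominal measurement noise
      injection = rng.choice([0.0, 1.0], (n, d), p=[0.9, 0.1]) * rng.normal(3, 1, (n, d))
      attacked = secure + injection                    # sparse false-data injections
      X = np.vstack([secure, attacked])
      y = np.r_[np.zeros(n), np.ones(n)]

      Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
      clf = SVC(kernel="rbf").fit(Xtr, ytr)            # supervised batch learner
      print("detection accuracy:", clf.score(Xte, yte))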

  7. Statistical Machine Learning for Structured and High Dimensional Data

    DTIC Science & Technology

    2014-09-17

    AFRL-OSR-VA-TR-2014-0234 — Statistical Machine Learning for Structured and High Dimensional Data. Final report, Dec 2009 - Aug 2014 (dated 14-06-2014). Larry Wasserman, Carnegie Mellon University; contact John Lafferty. Research under this grant included the area of resource-constrained statistical estimation. Subject terms: machine learning, high-dimensional statistics.

  8. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
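
    The core idea, that any learner consistent for nonparametric regression of a 0/1 response consistently estimates P(y=1|x), can be sketched in a few lines. sklearn stands in here for the R packages the authors mention, and the data are synthetic:

      # Regressing a 0/1 response with a consistent nonparametric learner
      # yields consistent estimates of P(y=1 | x).
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.neighbors import KNeighborsRegressor

      X, y = make_classification(n_samples=3000, n_features=10, random_state=0)

      rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
      knn = KNeighborsRegressor(n_neighbors=50).fit(X, y)
      print("RF  P(y=1|x):", rf.predict(X[:5]).round(2))
      print("kNN P(y=1|x):", knn.predict(X[:5]).round(2))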

  9. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

    DTIC Science & Technology

    2015-09-12

    AFRL-AFOSR-VA-TR-2015-0278 — Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models. Final report. Katya Scheinberg; grant FA9550-11-1-0239. Subject terms: optimization, Derivative-Free Optimization, Statistical Machine Learning.

  10. Machine learning: Trends, perspectives, and prospects.

    PubMed

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.

  11. Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

    PubMed Central

    2013-01-01

    Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733

  12. Machine learning modelling for predicting soil liquefaction susceptibility

    NASA Astrophysics Data System (ADS)

    Samui, P.; Sitharam, T. G.

    2011-01-01

    This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first technique uses an Artificial Neural Network (ANN) based on multi-layer perceptrons (MLP) trained with the Levenberg-Marquardt backpropagation algorithm. The second uses a Support Vector Machine (SVM), a classification technique firmly grounded in statistical learning theory. ANN and SVM models have been developed to predict liquefaction susceptibility using corrected SPT blow count [(N1)60] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models to require only two parameters, (N1)60 and peak ground acceleration (amax/g), for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
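
    A rough sketch of the simplified two-parameter models, with invented (N1)60 and amax/g values and a toy decision boundary. Note that sklearn's MLP trains with gradient-based solvers; its default (adam) stands in for the paper's Levenberg-Marquardt, which sklearn does not provide:

      # SVM and MLP on the two simplified predictors; values are invented.
      import numpy as np
      from sklearn.neural_network import MLPClassifier
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      rng = np.random.default_rng(1)
      n160 = rng.uniform(2, 40, 300)                     # corrected SPT blow count (N1)60
      amax_g = rng.uniform(0.05, 0.6, 300)               # peak ground acceleration / g
      liquefied = (60 * amax_g > n160).astype(int)       # toy decision boundary
      X = np.column_stack([n160, amax_g])

      svm = make_pipeline(StandardScaler(), SVC()).fit(X, liquefied)
      # adam stands in for Levenberg-Marquardt, which sklearn does not provide:
      ann = make_pipeline(StandardScaler(),
                          MLPClassifier(max_iter=2000, random_state=1)).fit(X, liquefied)
      print("SVM:", svm.predict([[15, 0.3]]), " ANN:", ann.predict([[15, 0.3]]))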

  13. Effects of promotional materials on vending sales of low-fat items in teachers' lounges.

    PubMed

    Fiske, Amy; Cullen, Karen Weber

    2004-01-01

    This study examined the impact of an environmental intervention in the form of promotional materials and increased availability of low-fat items on vending machine sales. Ten vending machines were selected and randomly assigned to one of three conditions: control, or one of two experimental conditions. Vending machines in the two intervention conditions received three additional low-fat selections. Low-fat items were promoted at two levels: labels (intervention I), and labels plus signs (intervention II). The number of individual items sold and the total revenue generated was recorded weekly for each machine for 4 weeks. Use of promotional materials resulted in a small, but not significant, increase in the number of low-fat items sold, although machine sales were not significantly impacted by the change in product selection. Results of this study, although not statistically significant, suggest that environmental change may be a realistic means of positively influencing consumer behavior.

  14. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    PubMed Central

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784
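
    A toy version of the comparison protocol: hold out the last 18 observations of a monthly series (as in M3), forecast with a statistical benchmark and an ML model, and compare sMAPE. The series and the lag-window setup are illustrative assumptions, not the paper's experiment:

      # Toy M3-style comparison: 18-step holdout, sMAPE for a statistical
      # benchmark (random walk) vs. an ML model (MLP on 12 lagged values).
      import numpy as np
      from sklearn.neural_network import MLPRegressor

      def smape(actual, forecast):
          return 100 * np.mean(2 * np.abs(forecast - actual) / (np.abs(actual) + np.abs(forecast)))

      rng = np.random.default_rng(0)
      y = 100 + np.cumsum(rng.normal(0, 2, 140))        # invented monthly series
      train, test = y[:-18], y[-18:]

      naive = np.repeat(train[-1], 18)                  # statistical benchmark

      L = 12                                            # lag window
      Xtr = np.array([train[i:i + L] for i in range(len(train) - L)])
      mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                         random_state=0).fit(Xtr, train[L:])
      history, ml = list(train[-L:]), []
      for _ in range(18):                               # recursive multi-step forecast
          ml.append(float(mlp.predict([history[-L:]])[0]))
          history.append(ml[-1])

      print("sMAPE naive:", round(smape(test, naive), 2),
            "| sMAPE MLP:", round(smape(test, np.array(ml)), 2))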

  15. Principle of maximum entropy for reliability analysis in the design of machine components

    NASA Astrophysics Data System (ADS)

    Zhang, Yimin

    2018-03-01

    We studied the reliability of machine components with parameters that follow an arbitrary statistical distribution using the principle of maximum entropy (PME). We used PME to select the statistical distribution that best fits the available information. We also established a probability density function (PDF) and a failure probability model for the parameters of mechanical components using the concept of entropy and the PME. We obtained the first four moments of the state function for reliability analysis and design. Furthermore, we attained an estimate of the PDF with the fewest human bias factors using the PME. This function was used to calculate the reliability of the machine components, including a connecting rod, a vehicle half-shaft, a front axle, a rear axle housing, and a leaf spring, which have parameters that typically follow a non-normal distribution. Simulations were conducted for comparison. This study provides a design methodology for the reliability of mechanical components for practical engineering projects.
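
    The PME step the abstract relies on can be sketched numerically: given the first four moments, the maximum-entropy density takes the exponential-family form p(x) proportional to exp(sum_k lambda_k x^k), and the lambda_k are found by minimizing the convex dual log Z(lambda) - lambda . mu. The target moments below are assumed for illustration:

      # Maximum-entropy density matching four moments: p(x) ~ exp(sum lam_k x^k).
      import numpy as np
      from scipy.integrate import trapezoid
      from scipy.optimize import minimize

      mu = np.array([0.0, 1.0, 0.3, 3.5])      # assumed E[x], E[x^2], E[x^3], E[x^4]
      x = np.linspace(-6, 6, 2001)
      powers = np.vstack([x**k for k in range(1, 5)])

      def dual(lam):                            # convex dual: log Z(lam) - lam . mu
          logp = lam @ powers
          m = logp.max()
          return m + np.log(trapezoid(np.exp(logp - m), x)) - lam @ mu

      lam = minimize(dual, x0=[0.0, -0.5, 0.0, 0.0], method="Nelder-Mead",
                     options={"maxiter": 20000, "fatol": 1e-12}).x
      p = np.exp(lam @ powers)
      p /= trapezoid(p, x)                      # normalized maximum-entropy PDF
      print("recovered moments:",
            [round(float(trapezoid(p * x**k, x)), 3) for k in range(1, 5)])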

  16. Data-driven advice for applying machine learning to bioinformatics problems

    PubMed Central

    Olson, Randal S.; La Cava, William; Mustahsan, Zairah; Varik, Akshay; Moore, Jason H.

    2017-01-01

    As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems. PMID:29218881

  17. External validation of ADO, DOSE, COTE and CODEX at predicting death in primary care patients with COPD using standard and machine learning approaches.

    PubMed

    Morales, Daniel R; Flynn, Rob; Zhang, Jianguo; Trucco, Emmanuel; Quint, Jennifer K; Zutis, Kris

    2018-05-01

    Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender and discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years measured using logistic regression and a support vector machine learning (SVM) method of analysis. The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person years for men and women respectively. Complete data were available for 54879 patients to predict 1-year mortality. ADO performed the best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 ADO + COTE; c-statistic 0.727 DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. In primary care, ADO appears superior at predicting death in COPD. Performance of ADO and DOSE improved when combined with COTE comorbidities suggesting better models may be generated with additional data facilitated using novel approaches. Copyright © 2018. Published by Elsevier Ltd.
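
    The c-statistic reported throughout is the area under the ROC curve of predicted risk against observed death. A sketch with simulated ADO-like predictors (age, dyspnoea, airflow obstruction) standing in for the CPRD cohort; coefficients and prevalence are invented:

      # c-statistic (ROC AUC) of a logistic model on simulated ADO-like inputs.
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n = 5000
      X = np.column_stack([rng.normal(70, 10, n),         # age
                           rng.integers(0, 4, n),         # dyspnoea grade
                           rng.uniform(20, 90, n)])       # FEV1 % predicted
      logit = 0.05 * (X[:, 0] - 70) + 0.5 * X[:, 1] - 0.03 * (X[:, 2] - 50) - 2.5
      died = rng.random(n) < 1 / (1 + np.exp(-logit))     # simulated 1-year deaths

      Xtr, Xte, ytr, yte = train_test_split(X, died, random_state=0)
      risk = LogisticRegression().fit(Xtr, ytr).predict_proba(Xte)[:, 1]
      print("c-statistic:", round(roc_auc_score(yte, risk), 3))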

  18. Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach.

    PubMed

    Borghetti, Brett J; Giametta, Joseph J; Rusnock, Christina F

    2017-02-01

    We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.

  19. Reliability analysis of component of affination centrifugal 1 machine by using reliability engineering

    NASA Astrophysics Data System (ADS)

    Sembiring, N.; Ginting, E.; Darnello, T.

    2017-12-01

    A company producing refined sugar found that its production floor had not reached the target level of critical machine availability because the machines often suffered breakdowns, causing sudden losses of production time and production opportunities. This problem can be addressed with Reliability Engineering, in which a statistical approach to historical failure data is used to identify the distribution pattern. The method yields the reliability, failure rate, and availability of a machine over the scheduled maintenance interval. Distribution testing of the time-between-failures (MTTF) data showed a lognormal distribution for the flexible hose component and a Weibull distribution for the teflon cone lifting component. Distribution testing of the mean time to repair (MTTR) data showed an exponential distribution for the flexible hose component and a Weibull distribution for the teflon cone lifting component. On a replacement schedule of every 720 hours, the flexible hose component had a reliability of 0.2451 and an availability of 0.9960, while on a replacement schedule of every 1944 hours, the critical teflon cone lifting component had a reliability of 0.4083 and an availability of 0.9927.
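
    A sketch of the calculations described: fit a Weibull to time-between-failure data, evaluate reliability R(t) = exp(-(t/eta)^beta) at the replacement interval, and take steady-state availability as MTTF/(MTTF+MTTR). The failure and repair times below are invented, not the study's data:

      # Weibull fit to failure intervals, R(t) at the replacement interval,
      # and steady-state availability. Data below are invented.
      import numpy as np
      from scipy import stats

      ttf = np.array([510.0, 820.0, 1490.0, 2100.0, 2650.0, 3300.0])  # hours to failure
      ttr = np.array([3.0, 5.0, 4.0, 6.0, 8.0, 5.0])                  # repair hours

      beta, _, eta = stats.weibull_min.fit(ttf, floc=0)   # shape, (loc), scale
      t = 1944.0                                          # replacement interval (abstract)
      reliability = np.exp(-(t / eta) ** beta)            # Weibull R(t)
      mttf, mttr = ttf.mean(), ttr.mean()
      print(f"R({t:.0f} h) = {reliability:.4f}, "
            f"availability = {mttf / (mttf + mttr):.4f}")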

  20. Machine learning methods as a tool to analyse incomplete or irregularly sampled radon time series data.

    PubMed

    Janik, M; Bossew, P; Kurihara, O

    2018-07-15

    Machine learning is a class of statistical techniques which has proven to be a powerful tool for modelling the behaviour of complex systems, in which response quantities depend on assumed controls or predictors in a complicated way. In this paper, as our first purpose, we propose the application of machine learning to reconstruct incomplete or irregularly sampled time series of indoor radon (222Rn). The physical assumption underlying the modelling is that Rn concentration in the air is controlled by environmental variables such as air temperature and pressure. The algorithms "learn" from complete sections of the multivariate series, derive a dependence model and apply it to sections where the controls are available but not the response (Rn), and in this way complete the Rn series. Three machine learning techniques are applied in this study, namely random forest, its extension called the gradient boosting machine, and deep learning. For comparison, we apply classical multiple regression in a generalized linear model version. Performance of the models is evaluated through different metrics. The performance of the gradient boosting machine is found to be superior to that of the other techniques. By applying learning machines, we show, as our second purpose, that missing data or periods of Rn series can be reconstructed and resampled on a regular grid reasonably well, if data on appropriate physical controls are available. The techniques also identify to which degree the assumed controls contribute to imputing missing Rn values. Our third purpose, no less important from the viewpoint of physics, is identifying to which degree physical (in this case environmental) variables are relevant as Rn predictors, or in other words, which predictors explain most of the temporal variability of Rn. We show that the variables which contribute most to the Rn series reconstruction are temperature, relative humidity and day of the year. The first two are physical predictors, while "day of the year" is a statistical proxy or surrogate for missing or unknown predictors. Copyright © 2018 Elsevier B.V. All rights reserved.
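
    A compact sketch of the imputation scheme: train a gradient boosting machine on sections where both the controls and Rn are observed, predict Rn where only the controls are available, and read off control relevance from feature importances. The synthetic series below merely mimics the setup:

      # Impute missing Rn from controls; then inspect control relevance.
      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(0)
      hours = np.arange(24 * 120)
      temp = 15 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)
      rh = 60 - 1.5 * (temp - 15) + rng.normal(0, 3, hours.size)
      doy = hours // 24                                     # day of the year
      rn = 40 - 1.2 * temp + 0.3 * rh + 0.02 * doy + rng.normal(0, 3, hours.size)

      X = np.column_stack([temp, rh, doy])
      missing = rng.random(hours.size) < 0.2                # 20% of Rn unobserved
      gbm = GradientBoostingRegressor().fit(X[~missing], rn[~missing])
      rn_filled = np.where(missing, gbm.predict(X), rn)     # completed series

      print(dict(zip(["temperature", "humidity", "day_of_year"],
                     gbm.feature_importances_.round(3))))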

  1. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

    PubMed Central

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-01-01

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0. PMID:27892471
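
    The two-step structure of COMBI can be sketched as follows: a linear SVM ranks SNPs by weight magnitude, and only the top candidates are tested, with the multiple-testing correction applied to the reduced set. Genotypes, sizes, and the simple Bonferroni threshold are illustrative assumptions, not the published implementation:

      # Step 1: linear SVM screens SNPs; step 2: test only the screened subset.
      import numpy as np
      from scipy import stats
      from sklearn.svm import LinearSVC

      rng = np.random.default_rng(0)
      n, p, k = 600, 2000, 100                          # samples, SNPs, SNPs kept
      G = rng.integers(0, 3, (n, p)).astype(float)      # genotypes coded 0/1/2
      y = (G[:, :5].sum(axis=1) + rng.normal(0, 2, n) > 5).astype(int)  # 5 causal SNPs

      svm = LinearSVC(C=0.01, dual=False).fit(G, y)
      keep = np.argsort(-np.abs(svm.coef_[0]))[:k]      # candidate SNPs by |weight|

      def chi2_p(j):                                    # association test for SNP j
          table = np.histogram2d(G[:, j], y, bins=(3, 2))[0] + 1
          return stats.chi2_contingency(table)[1]

      pvals = np.array([chi2_p(j) for j in keep])
      print("reported SNPs:", np.sort(keep[pvals < 0.05 / k]))   # corrected for k tests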

  2. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies.

    PubMed

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-11-28

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.

  3. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

    NASA Astrophysics Data System (ADS)

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-11-01

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.

  4. AstroML: Python-powered Machine Learning for Astronomy

    NASA Astrophysics Data System (ADS)

    Vander Plas, Jake; Connolly, A. J.; Ivezic, Z.

    2014-01-01

    As astronomical data sets grow in size and complexity, automated machine learning and data mining methods are becoming an increasingly fundamental component of research in the field. The astroML project (http://astroML.org) provides a common repository for practical examples of the data mining and machine learning tools used and developed by astronomical researchers, written in Python. The astroML module contains a host of general-purpose data analysis and machine learning routines, loaders for openly-available astronomical datasets, and fast implementations of specific computational methods often used in astronomy and astrophysics. The associated website features hundreds of examples of these routines being used for analysis of real astronomical datasets, while the associated textbook provides a curriculum resource for graduate-level courses focusing on practical statistics, machine learning, and data mining approaches within Astronomical research. This poster will highlight several of the more powerful and unique examples of analysis performed with astroML, all of which can be reproduced in their entirety on any computer with the proper packages installed.

  5. Technical Report: Reference photon dosimetry data for Varian accelerators based on IROC-Houston site visit data.

    PubMed

    Kerns, James R; Followill, David S; Lowenstein, Jessica; Molineu, Andrea; Alvarez, Paola; Taylor, Paige A; Stingo, Francesco C; Kry, Stephen F

    2016-05-01

    Accurate data regarding linear accelerator (Linac) radiation characteristics are important for treatment planning system modeling as well as regular quality assurance of the machine. The Imaging and Radiation Oncology Core-Houston (IROC-H) has measured the dosimetric characteristics of numerous machines through their on-site dosimetry review protocols. Photon data are presented and can be used as a secondary check of acquired values, as a means to verify commissioning a new machine, or in preparation for an IROC-H site visit. Photon data from IROC-H on-site reviews from 2000 to 2014 were compiled and analyzed. Specifically, data from approximately 500 Varian machines were analyzed. Each dataset consisted of point measurements of several dosimetric parameters at various locations in a water phantom to assess the percentage depth dose, jaw output factors, multileaf collimator small field output factors, off-axis factors, and wedge factors. The data were analyzed by energy and parameter, with similarly performing machine models being assimilated into classes. Common statistical metrics are presented for each machine class. Measurement data were compared against other reference data where applicable. Distributions of the parameter data were shown to be robust and derive from a Student's t distribution. Based on statistical and clinical criteria, all machine models were able to be classified into two or three classes for each energy, except for 6 MV for which there were eight classes. Quantitative analysis of the measurements for 6, 10, 15, and 18 MV photon beams is presented for each parameter; supplementary material has also been made available which contains further statistical information. IROC-H has collected extensive data on Varian Linacs and the results of photon measurements from the past 15 years are presented. The data can be used as a comparison check of a physicist's acquired values. Acquired values that are well outside the expected distribution should be verified by the physicist to identify whether the measurements are valid. Comparison of values to this reference data provides a redundant check to help prevent gross dosimetric treatment errors.

  6. Information integration and diagnosis analysis of equipment status and production quality for machining process

    NASA Astrophysics Data System (ADS)

    Zan, Tao; Wang, Min; Hu, Jianzhong

    2010-12-01

    Machining status monitoring with multiple sensors can acquire and analyze machining-process information to implement abnormality diagnosis and fault warning. Statistical quality control is normally used to distinguish abnormal fluctuations from normal ones through statistical methods. This paper compares the advantages and disadvantages of the two methods and shows the necessity and feasibility of their integration and fusion. An approach is then put forward that integrates multi-sensor status monitoring and statistical process control, based on artificial intelligence, internet, and database techniques. Based on virtual instrument techniques, the authors developed the machining quality assurance system MoniSysOnline, which has been used to monitor the grinding process. By analyzing the quality data and acoustic emission (AE) signal information from the wheel-dressing process, the reason for machining quality fluctuation was identified. The experimental results indicate that the approach is suitable for status monitoring and analysis of the machining process.
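
    The statistical-process-control half of the proposed integration reduces, at its simplest, to control-chart logic such as the X-bar chart sketched below (toy measurements, 3-sigma limits):

      # X-bar chart: subgroup means outside 3-sigma limits raise an alarm.
      import numpy as np

      rng = np.random.default_rng(0)
      parts = rng.normal(10.00, 0.02, (25, 5))          # 25 subgroups of 5 parts (toy)
      parts[20:] += 0.04                                # simulated process shift

      xbar = parts.mean(axis=1)
      grand = xbar[:15].mean()                          # limits from in-control phase
      sigma = xbar[:15].std(ddof=1)
      ucl, lcl = grand + 3 * sigma, grand - 3 * sigma
      for i, m in enumerate(xbar):
          if not lcl <= m <= ucl:
              print(f"subgroup {i}: mean {m:.3f} outside [{lcl:.3f}, {ucl:.3f}] -> alarm")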

  7. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science).

    PubMed

    Zeng, Irene Sui Lan; Lumley, Thomas

    2018-01-01

    Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than the number of features, and the use of Bayesian approaches when there is prior knowledge to be integrated, are also included in the commentary. For the completeness of the review, a table of currently available software and packages from 23 publications for omics is summarized in the appendix.

  8. A novelty detection diagnostic methodology for gearboxes operating under fluctuating operating conditions using probabilistic techniques

    NASA Astrophysics Data System (ADS)

    Schmidt, S.; Heyns, P. S.; de Villiers, J. P.

    2018-02-01

    In this paper, a fault diagnostic methodology is developed which is able to detect, locate and trend gear faults under fluctuating operating conditions when only vibration data from a single transducer, measured on a healthy gearbox are available. A two-phase feature extraction and modelling process is proposed to infer the operating condition and based on the operating condition, to detect changes in the machine condition. Information from optimised machine and operating condition hidden Markov models are statistically combined to generate a discrepancy signal which is post-processed to infer the condition of the gearbox. The discrepancy signal is processed and combined with statistical methods for automatic fault detection and localisation and to perform fault trending over time. The proposed methodology is validated on experimental data and a tacholess order tracking methodology is used to enhance the cost-effectiveness of the diagnostic methodology.
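
    The discrepancy-signal idea can be sketched with an HMM trained only on healthy-condition features: the average log-likelihood of new data under that model drops when the condition changes. This assumes the hmmlearn package and uses random stand-in features rather than real vibration data:

      # Healthy-only HMM; average log-likelihood drops on faulty data.
      import numpy as np
      from hmmlearn.hmm import GaussianHMM   # assumes the hmmlearn package

      rng = np.random.default_rng(0)
      healthy = rng.normal(0.0, 1.0, (2000, 4))    # healthy-condition features (toy)
      faulty = rng.normal(0.8, 1.3, (200, 4))      # shifted statistics (toy fault)

      model = GaussianHMM(n_components=3, covariance_type="full",
                          random_state=0).fit(healthy)
      for name, segment in [("healthy", healthy[-200:]), ("faulty", faulty)]:
          print(name, "avg log-likelihood:", round(model.score(segment) / len(segment), 2))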

  9. Accelerometry-based classification of human activities using Markov modeling.

    PubMed

    Mannini, Andrea; Sabatini, Angelo Maria

    2011-01-01

    Accelerometers are a popular choice as body-motion sensors: the reason is partly in their capability of extracting information that is useful for automatically inferring the physical activity in which the human subject is involved, beside their role in feeding biomechanical parameters estimators. Automatic classification of human physical activities is highly attractive for pervasive computing systems, whereas contextual awareness may ease the human-machine interaction, and in biomedicine, whereas wearable sensor systems are proposed for long-term monitoring. This paper is concerned with the machine learning algorithms needed to perform the classification task. Hidden Markov Model (HMM) classifiers are studied by contrasting them with Gaussian Mixture Model (GMM) classifiers. HMMs incorporate the statistical information available on movement dynamics into the classification process, without discarding the time history of previous outcomes as GMMs do. An example of the benefits of the obtained statistical leverage is illustrated and discussed by analyzing two datasets of accelerometer time series.

  10. Algorithm of probabilistic assessment of fully-mechanized longwall downtime

    NASA Astrophysics Data System (ADS)

    Domrachev, A. N.; Rib, S. V.; Govorukhin, Yu M.; Krivopalov, V. G.

    2017-09-01

    The problem of increasing the load on a fully-mechanized longwall has several aspects, one of which is improving the efficiency of available stoping equipment by increasing the operating-time coefficient of the shearer and the other mining machines that form an integral part of the longwall set of equipment. The task of predicting the reliability indicators of stoping equipment is solved by statistical estimation of the parameters of exponential distributions for downtime and failure recovery. Accounting for downtime caused by accidents in the face workings is more difficult and, despite the statistical data on accidents in mine workings, no solution has been found to date. The authors propose assessing the probability of workings caving with a Poisson distribution and the duration of their restoration with a normal distribution. The above results confirm the possibility of implementing the approach proposed by the authors.
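
    The probabilistic ingredients named here are elementary to compute; a sketch with assumed parameter values (event rate, restoration mean and spread), chosen only for illustration:

      # Poisson caving count and normal restoration time; parameters assumed.
      from scipy import stats

      lam = 0.15                                    # expected cavings per month
      print(f"P(at least one caving per month) = {1 - stats.poisson.pmf(0, lam):.3f}")

      mu, sigma = 36.0, 10.0                        # restoration hours ~ Normal(mu, sigma)
      print(f"P(restoration > 48 h) = {stats.norm.sf(48, mu, sigma):.3f}")
      print(f"expected downtime = {lam * mu:.1f} h/month")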

  11. Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study

    ERIC Educational Resources Information Center

    Cer, Daniel

    2011-01-01

    The goal of this dissertation is to determine the best way to train a statistical machine translation system. I first develop a state-of-the-art machine translation system called Phrasal and then use it to examine a wide variety of potential learning algorithms and optimization criteria and arrive at two very surprising results. First, despite the…

  12. High School and Beyond. 1980 Sophomore Cohort. First Follow-Up (1982). [machine-readable data file].

    ERIC Educational Resources Information Center

    National Center for Education Statistics (ED), Washington, DC.

    The High School and Beyond 1980 Sophomore Cohort First Follow-Up (1982) data file is presented. The First Follow-Up Sophomore Cohort data tape consists of four related data files: (1) the student data file (including data availability flags, weights, questionnaire data, and composite variables); (2) Statistical Analysis System (SAS) control cards…

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shettel, D.L. Jr.; Langfeldt, S.L.; Youngquist, C.A.

    This report presents a Hydrogeochemical and Stream Sediment Reconnaissance of the Christian NTMS Quadrangle, Alaska. In addition to this abbreviated data release, more complete data are available to the public in machine-readable form. These machine-readable data, as well as quarterly or semiannual program progress reports containing further information on the HSSR program in general, or on the Los Alamos National Laboratory portion of the program in particular, are available from DOE's Technical Library at its Grand Junction Area Office. Presented in this data release are location data, field analyses, and laboratory analyses of several different sample media. For the sake of brevity, many field site observations have not been included in this volume; these data are, however, available on the magnetic tape. Appendices A through D describe the sample media and summarize the analytical results for each medium. The data have been subdivided by one of the Los Alamos National Laboratory sorting programs of Zinkl and others (1981a) into groups of stream-sediment, lake-sediment, stream-water, lake-water, and ground-water samples. For each group which contains a sufficient number of observations, statistical tables, tables of raw data, and 1:1,000,000 scale maps of pertinent elements have been included in this report. Also included are maps showing results of multivariate statistical analyses.

  14. Machine learning-based methods for prediction of linear B-cell epitopes.

    PubMed

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction facilitates immunologists in designing peptide-based vaccines, diagnostic tests, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable-length B-cell epitope prediction is still unsatisfactory. Fortunately, thanks to increasingly available verified epitope databases, bioinformaticians can apply machine learning-based algorithms to all curated data to design improved prediction tools for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noted that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulates a general way of constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of the support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, in addition to reviewing recently published papers, we introduce the fundamentals of B-cell epitopes and SVM techniques. In addition, an example of a linear B-cell epitope prediction system based on physicochemical features and amino acid combinations is illustrated in detail.

  15. Evaluation of machinability and flexural strength of a novel dental machinable glass-ceramic.

    PubMed

    Qin, Feng; Zheng, Shucan; Luo, Zufeng; Li, Yong; Guo, Ling; Zhao, Yunfeng; Fu, Qiang

    2009-10-01

    To evaluate the machinability and flexural strength of a novel dental machinable glass-ceramic (named PMC), and to compare its machinability with that of Vita Mark II and human enamel. The raw batch materials were selected and mixed. Four groups of novel glass-ceramics were formed at different nucleation temperatures and assigned to Group 1, Group 2, Group 3 and Group 4. The machinability of the four groups of novel glass-ceramics, Vita Mark II ceramic, and freshly extracted human premolars was compared by means of drilling depth measurement. A three-point bending test was used to measure the flexural strength of the novel glass-ceramics. The crystalline phases of the group with the best machinability were identified by X-ray diffraction. In terms of drilling depth, Group 2 of the novel glass-ceramics proved to have the largest drilling depth. There was no statistical difference among Group 1, Group 4 and the natural teeth. The drilling depth of Vita Mark II was statistically less than that of Group 1, Group 4 and the natural teeth. Group 3 had the least drilling depth. With respect to flexural strength, Group 2 exhibited the maximum; Group 1 was statistically weaker than Group 2; there was no statistical difference between Group 3 and Group 4, and they were the weakest materials. XRD of the Group 2 ceramic showed that a new type of dental machinable glass-ceramic containing calcium-mica had been developed by the present study; it was named PMC. PMC is promising for application as a dental machinable ceramic due to its good machinability and relatively high strength.

  16. Financial Statistics. Higher Education General Information Survey (HEGIS) [machine-readable data file].

    ERIC Educational Resources Information Center

    Center for Education Statistics (ED/OERI), Washington, DC.

    The Financial Statistics machine-readable data file (MRDF) is a subfile of the larger Higher Education General Information Survey (HEGIS). It contains basic financial statistics for over 3,000 institutions of higher education in the United States and its territories. The data are arranged sequentially by institution, with institutional…

  17. Multicopy programmable discrimination of general qubit states

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sentis, G.; Bagan, E.; Calsamiglia, J.

    2010-10-15

    Quantum state discrimination is a fundamental primitive in quantum statistics where one has to correctly identify the state of a system that is in one of two possible known states. A programmable discrimination machine performs this task when the pair of possible states is not a priori known but instead the two possible states are provided through two respective program ports. We study optimal programmable discrimination machines for general qubit states when several copies of states are available in the data or program ports. Two scenarios are considered: One in which the purity of the possible states is a priori known, and the fully universal one where the machine operates over generic mixed states of unknown purity. We find analytical results for both the unambiguous and minimum error discrimination strategies. This allows us to calculate the asymptotic performance of programmable discrimination machines when a large number of copies are provided and to recover the standard state discrimination and state comparison values as different limiting cases.

  18. Examining the Association Between School Vending Machines and Children's Body Mass Index by Socioeconomic Status.

    PubMed

    O'Hara, Jeffrey K; Haynes-Maslow, Lindsey

    2015-01-01

    To examine the association between vending machine availability in schools and body mass index (BMI) among subgroups of children based on gender, race/ethnicity, and socioeconomic status classifications. First-difference multivariate regressions were estimated using longitudinal fifth- and eighth-grade data from the Early Childhood Longitudinal Study. The specifications were disaggregated by gender, race/ethnicity, and family socioeconomic status classifications. Vending machine availability had a positive association (P < .10) with BMI among Hispanic male children and low-income Hispanic children. Living in an urban location (P < .05) and hours watching television (P < .05) were also positively associated with BMI for these subgroups. Supplemental Nutrition Assistance Program enrollment was negatively associated with BMI for low-income Hispanic students (P < .05). These findings were not statistically significant when using Bonferroni adjusted critical values. The results suggest that the school food environment could reinforce health disparities that exist for Hispanic male children and low-income Hispanic children. Copyright © 2015 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.

  19. The R package "sperrorest" : Parallelized spatial error estimation and variable importance assessment for geospatial machine learning

    NASA Astrophysics Data System (ADS)

    Schratz, Patrick; Herrmann, Tobias; Brenning, Alexander

    2017-04-01

    Computational and statistical prediction methods such as the support vector machine have gained popularity in remote-sensing applications in recent years and are often compared to more traditional approaches like maximum-likelihood classification. However, the accuracy assessment of such predictive models in a spatial context needs to account for the presence of spatial autocorrelation in geospatial data by using spatial cross-validation and bootstrap strategies instead of their now more widely used non-spatial equivalent. The R package sperrorest by A. Brenning [IEEE International Geoscience and Remote Sensing Symposium, 1, 374 (2012)] provides a generic interface for performing (spatial) cross-validation of any statistical or machine-learning technique available in R. Since spatial statistical models as well as flexible machine-learning algorithms can be computationally expensive, parallel computing strategies are required to perform cross-validation efficiently. The most recent major release of sperrorest therefore comes with two new features (aside from improved documentation): The first one is the parallelized version of sperrorest(), parsperrorest(). This function features two parallel modes to greatly speed up cross-validation runs. Both parallel modes are platform independent and provide progress information. par.mode = 1 relies on the pbapply package and calls interactively (depending on the platform) parallel::mclapply() or parallel::parApply() in the background. While forking is used on Unix-Systems, Windows systems use a cluster approach for parallel execution. par.mode = 2 uses the foreach package to perform parallelization. This method uses a different way of cluster parallelization than the parallel package does. In summary, the robustness of parsperrorest() is increased with the implementation of two independent parallel modes. A new way of partitioning the data in sperrorest is provided by partition.factor.cv(). This function gives the user the possibility to perform cross-validation at the level of some grouping structure. As an example, in remote sensing of agricultural land uses, pixels from the same field contain nearly identical information and will thus be jointly placed in either the test set or the training set. Other spatial sampling resampling strategies are already available and can be extended by the user.
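
    A concept sketch (in Python rather than the package's R interface) of what partition.factor.cv() provides: cross-validation partitioned at the level of a grouping factor, so that pixels from the same field never straddle the train/test split:

      # Grouped CV: whole fields go to either the training or the test fold.
      import numpy as np
      from sklearn.model_selection import GroupKFold, cross_val_score
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      n_fields, px = 40, 25
      field_id = np.repeat(np.arange(n_fields), px)           # grouping factor
      field_effect = rng.normal(0, 1, n_fields)[field_id]     # spatial autocorrelation
      X = np.column_stack([field_effect + rng.normal(0, 0.3, n_fields * px),
                           rng.normal(0, 1, n_fields * px)])
      y = (field_effect > 0).astype(int)

      scores = cross_val_score(SVC(), X, y, groups=field_id, cv=GroupKFold(n_splits=5))
      print("field-level CV accuracy per fold:", scores.round(2))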

  20. High School and Beyond. 1980 Senior Cohort. First Follow-Up (1982). [machine-readable data file].

    ERIC Educational Resources Information Center

    National Center for Education Statistics (ED), Washington, DC.

    The High School and Beyond 1980 Senior Cohort First Follow-Up (1982) Data File is presented. The First Follow-Up Senior Cohort data tape consists of four related data files: (1) the student data file (including data availability flags, weights, questionnaire data, and composite variables); (2) Statistical Analysis System (SAS) control cards for…

  1. Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

    PubMed

    Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R

    2017-01-01

    Injury narratives are now available in real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury in large batches of narratives, such as workers compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes with single-word and bi-gram models, Support Vector Machine, and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event-leading-to-injury classifications for a large workers compensation database. These algorithms are known to perform well on narrative text and are fairly easy to implement with off-the-shelf software such as Python. We propose human-machine learning ensemble approaches that maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine, whereby the triple ensemble (NB single-word = NB bi-gram = SVM, i.e., all three classifiers agree) had very high performance (0.93 overall sensitivity/positive predictive value, with high sensitivity and positive predictive values across both large and small categories), leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly. For large administrative datasets we propose incorporating methods based on human-machine pairings such as those used here, utilizing readily available off-the-shelf machine learning techniques and leaving only a fraction of narratives that require manual review. Human-machine ensemble methods are likely to improve performance over total manual coding. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
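
    The filtering step described above is straightforward to prototype. The following sketch uses scikit-learn (not the authors' pipeline) to train a regularized logistic regression on toy narratives and route the lowest-confidence 30% of predictions to manual review; the narratives, codes, and threshold are illustrative assumptions.

    ```python
    # Sketch of the human-machine filtering idea: auto-code narratives whose
    # classifier confidence is in the top 70%, route the bottom 30% to review.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    narratives = ["slipped on wet floor and fell", "cut finger on box cutter",
                  "lifted heavy box, strained back", "fell from ladder"] * 25
    codes = ["fall", "cut", "overexertion", "fall"] * 25

    X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(narratives)  # uni+bi-grams
    clf = LogisticRegression(max_iter=1000).fit(X, codes)

    strength = clf.predict_proba(X).max(axis=1)   # prediction strength per narrative
    cutoff = np.quantile(strength, 0.30)          # bottom 30% -> manual review
    auto_coded = strength > cutoff
    print(f"machine-coded: {auto_coded.sum()}, human review: {(~auto_coded).sum()}")
    ```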

  2. Astronomical Data Center Bulletin, volume 1, number 2

    NASA Technical Reports Server (NTRS)

    Nagy, T. A.; Warren, W. H., Jr.; Mead, J. M.

    1981-01-01

    Work in progress on astronomical catalogs is presented in 16 papers. Topics cover astronomical data center operations; automatic astronomical data retrieval at GSFC; interactive computer reference search of astronomical literature 1950-1976; formatting, checking, and documenting machine-readable catalogs; interactive catalog of UV, optical, and HI data for 201 Virgo cluster galaxies; machine-readable version of the general catalog of variable stars, third edition; galactic latitude and magnitude distribution of two astronomical catalogs; the catalog of open star clusters; infrared astronomical data base and catalog of infrared observations; the Air Force Geophysics Laboratory; revised magnetic tape of the N30 catalog of 5,268 standard stars; positional correlation of the Two-Micron Sky Survey and Smithsonian Astrophysical Observatory catalog sources; search capabilities for the catalog of stellar identifications (CSI) 1979 version; CSI statistics: blue magnitude versus spectral type; catalogs available from the Astronomical Data Center; and status report on machine-readable astronomical catalogs.

  3. Potential application of machine learning in health outcomes research and some statistical cautions.

    PubMed

    Crown, William H

    2015-03-01

    Traditional analytic methods are often ill-suited to the evolving world of health care big data characterized by massive volume, complexity, and velocity. In particular, methods are needed that can estimate models efficiently using very large datasets containing healthcare utilization data, clinical data, data from personal devices, and many other sources. Although very large, such datasets can also be quite sparse (e.g., device data may only be available for a small subset of individuals), which creates problems for traditional regression models. Many machine learning methods address such limitations effectively but are still subject to the usual sources of bias that commonly arise in observational studies. Researchers using machine learning methods such as lasso or ridge regression should assess these models using conventional specification tests. Copyright © 2015 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
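
    As a concrete illustration of the kind of method the author discusses, the sketch below fits a cross-validated lasso to a synthetic "wide" dataset (more predictors than observations) and checks it on a holdout split, one simple complement to the conventional specification tests the author recommends. The data and dimensions are invented for illustration.

    ```python
    # Minimal sketch: lasso on a wide dataset with many irrelevant predictors,
    # followed by a conventional holdout check. Synthetic data; illustrative only.
    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    n, p = 200, 500                                # more predictors than subjects
    X = rng.normal(size=(n, p))
    beta = np.zeros(p); beta[:5] = 2.0             # only 5 truly informative features
    y = X @ beta + rng.normal(size=n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LassoCV(cv=5).fit(X_tr, y_tr)
    print("nonzero coefficients:", np.sum(model.coef_ != 0))
    print("holdout R^2:", round(model.score(X_te, y_te), 3))
    ```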

  4. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment

    PubMed Central

    2011-01-01

    Background Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds, and the size of proprietary as well as public data sets is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms that are easily available to researchers without extensive machine learning knowledge. In keeping with the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source, state-of-the-art, high performance machine learning platform, interfacing multiple customized machine learning algorithms for both graphical programming and scripting, to be used for large-scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models by providing the full workflow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated workflow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient, data-set-specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge, as flexible applications can be created not only at a scripting level but also in a graphical programming environment. Conclusions AZOrange is a step towards meeting the need for an Open Source high performance machine learning platform supporting the efficient development of highly accurate QSAR models that fulfil regulatory requirements. PMID:21798025

  5. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment.

    PubMed

    Stålring, Jonna C; Carlsson, Lars A; Almeida, Pedro; Boyer, Scott

    2011-07-28

    Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds, and the size of proprietary as well as public data sets is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms that are easily available to researchers without extensive machine learning knowledge. In keeping with the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source, state-of-the-art, high performance machine learning platform, interfacing multiple customized machine learning algorithms for both graphical programming and scripting, to be used for large-scale development of QSAR models of regulatory quality, is of great value to the QSAR community. This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models by providing the full workflow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated workflow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient, data-set-specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge, as flexible applications can be created not only at a scripting level but also in a graphical programming environment. AZOrange is a step towards meeting the need for an Open Source high performance machine learning platform supporting the efficient development of highly accurate QSAR models that fulfil regulatory requirements.
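
    The generalized, automated hyper-parameter selection the abstract describes can be sketched generically. The example below uses scikit-learn's GridSearchCV as a stand-in for AZOrange's internal mechanism; the candidate models and grids are illustrative assumptions, not AZOrange's actual configuration.

    ```python
    # Generic sketch of automated model/hyper-parameter selection of the kind an
    # automated QSAR workflow performs; scikit-learn stands in for AZOrange here.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    candidates = {
        "random_forest": GridSearchCV(RandomForestClassifier(random_state=0),
                                      {"n_estimators": [100, 300]}, cv=5),
        "svm": GridSearchCV(SVC(), {"C": [0.1, 1, 10],
                                    "gamma": ["scale", 0.01]}, cv=5),
    }
    # Data-set-specific selection of the statistical method: keep the best CV score.
    for name, search in candidates.items():
        search.fit(X, y)
        print(name, search.best_params_, round(search.best_score_, 3))
    ```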

  6. Statistical complexity measure of pseudorandom bit generators

    NASA Astrophysics Data System (ADS)

    González, C. M.; Larrondo, H. A.; Rosso, O. A.

    2005-08-01

    Pseudorandom number generators (PRNG) are extensively used in Monte Carlo simulations, gambling machines and cryptography as substitutes for ideal random number generators (RNG). Each application imposes different statistical requirements on PRNGs. As L’Ecuyer clearly states, “the main goal for Monte Carlo methods is to reproduce the statistical properties on which these methods are based, whereas for gambling machines and cryptology, observing the sequence of output values for some time should provide no practical advantage for predicting the forthcoming numbers better than by just guessing at random”. Accordingly, several statistical test suites have been developed to analyze the sequences generated by PRNGs. In a recent paper a new statistical complexity measure [Phys. Lett. A 311 (2003) 126] was defined. Here we propose this measure as a randomness quantifier for PRNGs. The test is applied to three well-known and widely tested PRNGs available in the literature, all of them based on mathematical algorithms. Another PRNG, based on the 3D Lorenz chaotic dynamical system, is also analyzed. PRNGs based on chaos may be considered as models for physical noise sources, and important new results have recently been reported. All the design steps of this PRNG are described, and each stage increases the PRNG's randomness using a different strategy. It is shown that the MPR statistical complexity measure is capable of quantifying this randomness improvement. The PRNG based on the chaotic 3D Lorenz dynamical system is also evaluated using traditional digital signal processing tools for comparison.
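
    The MPR statistical complexity combines a normalized Shannon entropy H with a normalized Jensen-Shannon disequilibrium Q against the uniform distribution, C = Q·H. The sketch below implements this measure on a binned PRNG output stream; the bin count and the generator under test are illustrative choices, not those of the paper.

    ```python
    # Sketch of an MPR-style statistical complexity applied to a PRNG's output
    # histogram: H is normalized Shannon entropy, Q a normalized Jensen-Shannon
    # "disequilibrium" against the uniform distribution.
    import numpy as np

    def shannon(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def mpr_complexity(samples, n_bins=64):
        p, _ = np.histogram(samples, bins=n_bins, range=(0.0, 1.0))
        p = p / p.sum()
        pe = np.full(n_bins, 1.0 / n_bins)            # uniform reference
        H = shannon(p) / np.log(n_bins)               # normalized entropy
        js = shannon((p + pe) / 2) - shannon(p) / 2 - shannon(pe) / 2
        q0 = -2.0 / (((n_bins + 1) / n_bins) * np.log(n_bins + 1)
                     - 2 * np.log(2 * n_bins) + np.log(n_bins))
        return H, q0 * js * H                         # (entropy, complexity)

    # A good PRNG should give H near 1 and complexity near 0.
    u = np.random.default_rng(0).random(100_000)
    print(mpr_complexity(u))
    ```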

  7. A Survey of Statistical Machine Translation

    DTIC Science & Technology

    2007-04-01

    …methods are notoriously sensitive to domain differences, however, so the move to informal text is likely to present many interesting challenges… Och, Christoph Tillman, and Hermann Ney. Improved alignment models for statistical machine translation. In Proc. of EMNLP-VLC, pages 20–28, Jun 1999

  8. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in these gene sets. Results We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure for navigating the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user-friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online. Conclusion GOTree Machine has broad application in functional genomics, proteomics and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175

  9. [Comparison of machinability of two types of dental machinable ceramic].

    PubMed

    Fu, Qiang; Zhao, Yunfeng; Li, Yong; Fan, Xinping; Li, Yan; Lin, Xuefeng

    2002-11-01

    To address the problems of currently available dental machinable ceramics, a new type of calcium-mica glass-ceramic, PMC-I ceramic, was developed, and its machinability was compared quantitatively with that of Vita MKII. Moreover, the relationship between the strength and the machinability of PMC-I ceramic was studied. Samples of PMC-I ceramic were divided into four groups according to their nucleation procedures. 600-second drilling tests were conducted with high-speed steel tools (Ø = 2.3 mm) to measure the drilling depths of Vita MKII ceramic and PMC-I ceramic, using a constant drilling speed of 600 rpm and a constant axial load of 39.2 N. The 3-point bending strengths of the four groups of PMC-I ceramic were also recorded. The drilling depth of Vita MKII was 0.71 mm, while the depths of the four groups of PMC-I ceramic were 0.88 mm, 1.40 mm, 0.40 mm and 0.90 mm, respectively. Group B of PMC-I ceramic showed the largest depth, 1.40 mm, and was statistically different from the other groups and from Vita MKII. The strengths of the four groups of PMC-I ceramic were 137.7, 210.2, 118.0 and 106.0 MPa, respectively. The machinability of the newly developed dental machinable PMC-I ceramic meets clinical needs.

  10. Retrospective cohort study of a microelectronics and business machine facility.

    PubMed

    Silver, Sharon R; Pinkerton, Lynne E; Fleming, Donald A; Jones, James H; Allee, Steven; Luo, Lian; Bertke, Stephen J

    2014-04-01

    We examined health outcomes among 34,494 workers employed at a microelectronics and business machine facility from 1969 to 2001. Standardized mortality ratios (SMRs) and standardized incidence ratios were used to evaluate health outcomes in the cohort, and Cox regression modeling was used to evaluate relations between scores for occupational exposures and outcomes of a priori interest. Just over 17% of the cohort (5,966 people) had died through 2009. All-cause, all-cancer, and many cause-specific SMRs showed statistically significant deficits. In hourly males, SMRs were significantly elevated for non-Hodgkin's lymphoma and rectal cancer. Salaried males had excess testicular cancer incidence. Pleural cancer and mesothelioma excesses were observed in workers hired before 1969, but no available records substantiate use of asbestos in manufacturing processes. A positive, statistically significant relation was observed between exposure scores for tetrachloroethylene and nervous system diseases. Few significant exposure-outcome relations were observed, but risks from occupational exposures cannot be ruled out due to data limitations and the relative youth of the cohort. © 2013 Wiley Periodicals, Inc.

  11. Reversibility in Quantum Models of Stochastic Processes

    NASA Astrophysics Data System (ADS)

    Gier, David; Crutchfield, James; Mahoney, John; James, Ryan

    Natural phenomena such as time series of neural firing, orientation of layers in crystal stacking and successive measurements in spin systems are inherently probabilistic. The provably minimal classical models of such stochastic processes are ε-machines, which consist of internal states, transition probabilities between states and output values. The topological properties of the ε-machine for a given process characterize the structure, memory and patterns of that process. However, ε-machines are often not ideal because their statistical complexity (Cμ) is demonstrably greater than the excess entropy (E) of the processes they represent. Quantum models (q-machines) of the same processes can do better in that their statistical complexity (Cq) obeys the relation Cμ ≥ Cq ≥ E. q-machines can be constructed to consider longer lengths of strings, resulting in greater compression. With code words of sufficiently long length, the statistical complexity becomes time-symmetric, a feature apparently novel to this quantum representation. This result has ramifications for compression of classical information in quantum computing and quantum communication technology.

  12. Bias correction for selecting the minimal-error classifier from many machine learning models.

    PubMed

    Ding, Ying; Tang, Shaowu; Liao, Serena G; Jia, Jia; Oesterreich, Steffi; Lin, Yan; Tseng, George C

    2014-11-15

    Supervised machine learning is commonly applied in genomic research to construct a classifier from the training data that is generalizable to predict independent testing data. When test datasets are not available, cross-validation is commonly used to estimate the error rate. Many machine learning methods are available, and it is well known that no universally best method exists in general. It has been a common practice to apply many machine learning methods and report the method that produces the smallest cross-validation error rate. Theoretically, such a procedure produces a selection bias. Consequently, many clinical studies with moderate sample sizes (e.g. n = 30-60) risk reporting a falsely small cross-validation error rate that could not be validated later in independent cohorts. In this article, we illustrated the probabilistic framework of the problem and explored the statistical and asymptotic properties. We proposed a new bias correction method based on learning curve fitting by inverse power law (IPL) and compared it with three existing methods: nested cross-validation, weighted mean correction and the Tibshirani-Tibshirani procedure. All methods were compared in simulation datasets, five moderate-size real datasets and two large breast cancer datasets. The results showed that IPL outperforms the other methods in bias correction, with smaller variance, and has the additional advantage of extrapolating error estimates for larger sample sizes, a practical feature for deciding whether more samples should be recruited to improve the classifier and its accuracy. An R package, 'MLbias', and all source files are publicly available at tsenglab.biostat.pitt.edu/software.htm. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
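
    The inverse power law fit at the heart of the IPL method can be illustrated in a few lines: fit err(n) = a·n^(-b) + c to cross-validation error rates observed at several training-set sizes, then extrapolate. The sketch below uses scipy with made-up error values; it is not the MLbias implementation.

    ```python
    # Sketch of the inverse-power-law learning-curve idea: fit err(n) = a*n^-b + c
    # to CV error rates at several sample sizes, then extrapolate. The sample
    # sizes and error values below are invented for illustration.
    import numpy as np
    from scipy.optimize import curve_fit

    def ipl(n, a, b, c):
        return a * n**(-b) + c

    n_obs = np.array([30, 40, 50, 60, 80, 100])           # training-set sizes
    err = np.array([0.32, 0.29, 0.27, 0.26, 0.24, 0.23])  # CV error at each size

    (a, b, c), _ = curve_fit(ipl, n_obs, err, p0=[1.0, 0.5, 0.1], maxfev=10_000)
    print(f"asymptotic error estimate c = {c:.3f}")
    print(f"extrapolated error at n=200: {ipl(200, a, b, c):.3f}")
    ```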

  13. Statistical learning algorithms for identifying contrasting tillage practices with landsat thematic mapper data

    USDA-ARS?s Scientific Manuscript database

    Tillage management practices have a direct impact on water holding capacity, evaporation, carbon sequestration, and water quality. This study examines the feasibility of two statistical learning algorithms, the Least Squares Support Vector Machine (LSSVM) and the Relevance Vector Machine (RVM), for cla…

  14. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development

    PubMed Central

    Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer

    2015-01-01

    Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like from nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool that can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared using ten different measures; then, the ten best performing algorithms were selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885

  15. PredPsych: A toolbox for predictive machine learning-based approach in experimental psychology research.

    PubMed

    Koul, Atesh; Becchio, Cristina; Cavallo, Andrea

    2017-12-12

    Recent years have seen an increased interest in machine learning-based predictive methods for analyzing quantitative behavioral data in experimental psychology. While these methods can achieve relatively greater sensitivity than conventional univariate techniques, they still lack an established and accessible implementation. The aim of the current work was to build an open-source R toolbox, "PredPsych", that could make these methods readily available to all psychologists. PredPsych is a user-friendly R toolbox based on machine-learning predictive algorithms. In this paper, we present the framework of PredPsych via the analysis of a recently published multiple-subject motion capture dataset. In addition, we discuss examples of possible research questions that can be addressed with the machine-learning algorithms implemented in PredPsych but cannot easily be addressed with univariate statistical analysis. We anticipate that PredPsych will be of use to researchers with limited programming experience, not only in the field of psychology but also in clinical neuroscience, enabling computational assessment of putative bio-behavioral markers for both prognosis and diagnosis.

  16. BLS Machine-Readable Data and Tabulating Routines.

    ERIC Educational Resources Information Center

    DiFillipo, Tony

    This report describes the machine-readable data and tabulating routines that the Bureau of Labor Statistics (BLS) is prepared to distribute. An introduction discusses the LABSTAT (Labor Statistics) database and the BLS policy on release of unpublished data. Descriptions summarizing data stored in 25 files follow this format: overview, data…

  17. On the Stability of Jump-Linear Systems Driven by Finite-State Machines with Markovian Inputs

    NASA Technical Reports Server (NTRS)

    Patilkulkarni, Sudarshan; Herencia-Zapana, Heber; Gray, W. Steven; Gonzalez, Oscar R.

    2004-01-01

    This paper presents two mean-square stability tests for a jump-linear system driven by a finite-state machine with a first-order Markovian input process. The first test is based on conventional Markov jump-linear theory and avoids the use of any higher-order statistics. The second test is developed directly using the higher-order statistics of the machine's output process. The two approaches are illustrated with a simple model for a recoverable computer control system.
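
    For the first, conventional test, a standard mean-square stability criterion for a discrete-time Markov jump-linear system x_{k+1} = A_{θk} x_k is that the spectral radius of (Pᵀ ⊗ I) · blockdiag(A_i ⊗ A_i) be less than one. The sketch below implements that classical criterion on toy matrices; it is not the paper's finite-state-machine construction or its higher-order-statistics test.

    ```python
    # Hedged sketch of the classical mean-square stability test for a Markov
    # jump-linear system: MS-stable iff the spectral radius of
    # (P^T kron I) * blockdiag(A_i kron A_i) is strictly below 1.
    import numpy as np
    from scipy.linalg import block_diag

    A = [np.array([[0.9, 0.1], [0.0, 0.5]]),   # mode-1 dynamics (toy)
         np.array([[0.3, -0.2], [0.4, 0.8]])]  # mode-2 dynamics (toy)
    P = np.array([[0.7, 0.3],                  # Markov transition probabilities
                  [0.2, 0.8]])

    n = A[0].shape[0]
    big = np.kron(P.T, np.eye(n * n)) @ block_diag(*[np.kron(Ai, Ai) for Ai in A])
    rho = max(abs(np.linalg.eigvals(big)))
    print("spectral radius:", round(rho, 4),
          "-> MS-stable" if rho < 1 else "-> not MS-stable")
    ```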

  18. Modeling Stochastic Kinetics of Molecular Machines at Multiple Levels: From Molecules to Modules

    PubMed Central

    Chowdhury, Debashish

    2013-01-01

    A molecular machine is either a single macromolecule or a macromolecular complex. In spite of the striking superficial similarities between these natural nanomachines and their man-made macroscopic counterparts, there are crucial differences. Molecular machines in a living cell operate stochastically in an isothermal environment far from thermodynamic equilibrium. In this mini-review we present a catalog of the molecular machines and an inventory of the essential toolbox for theoretically modeling these machines. The tool kits include 1) nonequilibrium statistical-physics techniques for modeling machines and machine-driven processes; and 2) statistical-inference methods for reverse engineering a functional machine from the empirical data. The cell is often likened to a microfactory in which the machineries are organized in modular fashion; each module consists of strongly coupled multiple machines, but different modules interact weakly with each other. This microfactory has its own automated supply chain and delivery system. Buoyed by the success achieved in modeling individual molecular machines, we advocate integration of these models in the near future to develop models of functional modules. A system-level description of the cell from the perspective of molecular machinery (the mechanome) is likely to emerge from further integrations that we envisage here. PMID:23746505

  19. Detecting Visually Observable Disease Symptoms from Faces.

    PubMed

    Wang, Kuan; Luo, Jiebo

    2016-12-01

    Recent years have witnessed an increasing interest in the application of machine learning to clinical informatics and healthcare systems. A significant amount of research has been done on healthcare systems based on supervised learning. In this study, we present a generalized solution to detect visually observable symptoms on faces using semi-supervised anomaly detection combined with machine vision algorithms. We rely on disease-related statistical facts to detect abnormalities and classify them into multiple categories to narrow down the possible medical causes. Our method contrasts with most existing approaches, which are limited by the availability of labeled training data required for supervised learning, and therefore offers the major advantage of flagging any unusual and visually observable symptoms.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Angers, Crystal Plume; Bottema, Ryan; Buckley, Les

    Purpose: Treatment unit uptime statistics are typically used to monitor radiation equipment performance. The Ottawa Hospital Cancer Centre has introduced the use of Quality Control (QC) test success as a quality indicator for equipment performance and the overall health of the equipment QC program. Methods: Implemented in 2012, QATrack+ is used to record and monitor over 1100 routine machine QC tests each month for 20 treatment and imaging units (http://qatrackplus.com/). Using an SQL (structured query language) script, automated queries of the QATrack+ database are used to generate program metrics such as the number of QC tests executed and the percentage of tests passing, at tolerance, or at action. These metrics are compared against machine uptime statistics already reported within the program. Results: Program metrics for 2015 show good correlation between the pass rate of QC tests and uptime for a given machine. For the nine conventional linacs, the QC test success rate was consistently greater than 97%. The corresponding uptimes for these units are better than 98%. Machines that consistently show higher failure or tolerance rates in the QC tests have lower uptimes. This points to either poor machine performance requiring corrective action or to problems with the QC program. Conclusions: QATrack+ significantly improves the organization of QC data but can also aid in overall equipment management. Complementing machine uptime statistics with QC test metrics provides a more complete picture of overall machine performance and can be used to identify areas of improvement in the machine service and QC programs.
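
    The metric derivation amounts to aggregate queries over the QC test log. The sketch below shows the idea in Python with an in-memory SQLite table; the schema and status values are hypothetical and do not reflect the actual QATrack+ database layout.

    ```python
    # Sketch of the kind of automated query used to derive QC-program metrics
    # (tests executed; % passing / at tolerance / at action) per treatment unit.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE qc_test (unit TEXT, status TEXT);  -- status: pass/tolerance/action
    INSERT INTO qc_test VALUES
      ('linac1','pass'),('linac1','pass'),('linac1','tolerance'),
      ('linac2','pass'),('linac2','action');
    """)
    rows = con.execute("""
    SELECT unit,
           COUNT(*)                                     AS n_tests,
           100.0 * SUM(status = 'pass')      / COUNT(*) AS pct_pass,
           100.0 * SUM(status = 'tolerance') / COUNT(*) AS pct_tolerance,
           100.0 * SUM(status = 'action')    / COUNT(*) AS pct_action
    FROM qc_test GROUP BY unit
    """).fetchall()
    for r in rows:
        print(r)
    ```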

  1. Machine learning for outcome prediction of acute ischemic stroke post intra-arterial therapy.

    PubMed

    Asadi, Hamed; Dowling, Richard; Yan, Bernard; Mitchell, Peter

    2014-01-01

    Stroke is a major cause of death and disability. Accurately predicting stroke outcome from a set of predictive variables may identify high-risk patients and guide treatment approaches, leading to decreased morbidity. Logistic regression models allow for the identification and validation of predictive variables. However, advanced machine learning algorithms offer an alternative, in particular for large-scale multi-institutional data, with the advantage of easily incorporating newly available data to improve prediction performance. Our aim was to design and compare different machine learning methods capable of predicting the outcome of endovascular intervention in acute anterior circulation ischaemic stroke. We conducted a retrospective study of a prospectively collected database of acute ischaemic stroke treated by endovascular intervention. Using SPSS®, MATLAB®, and Rapidminer®, classical statistics as well as artificial neural network and support vector algorithms were applied to design a supervised machine capable of classifying these predictors into potential good and poor outcomes. These algorithms were trained, validated and tested using randomly divided data. We included 107 consecutive acute anterior circulation ischaemic stroke patients treated by endovascular technique. Sixty-six were male, and the mean age was 65.3 years. All the available demographic, procedural and clinical factors were included in the models. The final confusion matrix of the neural network demonstrated an overall congruency of ∼80% between the target and output classes, with favourable receiver operating characteristics. However, after optimisation, the support vector machine had a relatively better performance, with a root mean squared error of 2.064 (SD: ±0.408). We showed promising accuracy of outcome prediction using supervised machine learning algorithms, with potential for the incorporation of larger multicenter datasets, likely further improving prediction. Finally, we propose that a robust machine learning system can potentially optimise the selection process for endovascular versus medical treatment in the management of acute stroke.

  2. On-line Machine Learning and Event Detection in Petascale Data Streams

    NASA Astrophysics Data System (ADS)

    Thompson, David R.; Wagstaff, K. L.

    2012-01-01

    Traditional statistical data mining involves off-line analysis in which all data are available and equally accessible. However, petascale datasets have challenged this premise since it is often impossible to store, let alone analyze, the relevant observations. This has led the machine learning community to investigate adaptive processing chains where data mining is a continuous process. Here pattern recognition permits triage and followup decisions at multiple stages of a processing pipeline. Such techniques can also benefit new astronomical instruments such as the Large Synoptic Survey Telescope (LSST) and Square Kilometre Array (SKA) that will generate petascale data volumes. We summarize some machine learning perspectives on real time data mining, with representative cases of astronomical applications and event detection in high volume datastreams. The first is a "supervised classification" approach currently used for transient event detection at the Very Long Baseline Array (VLBA). It injects known signals of interest - faint single-pulse anomalies - and tunes system parameters to recover these events. This permits meaningful event detection for diverse instrument configurations and observing conditions whose noise cannot be well-characterized in advance. Second, "semi-supervised novelty detection" finds novel events based on statistical deviations from previous patterns. It detects outlier signals of interest while considering known examples of false alarm interference. Applied to data from the Parkes pulsar survey, the approach identifies anomalous "peryton" phenomena that do not match previous event models. Finally, we consider online light curve classification that can trigger adaptive followup measurements of candidate events. Classifier performance analyses suggest optimal survey strategies, and permit principled followup decisions from incomplete data. These examples trace a broad range of algorithm possibilities available for online astronomical data mining. This talk describes research performed at the Jet Propulsion Laboratory, California Institute of Technology. Copyright 2012, All Rights Reserved. U.S. Government support acknowledged.

  3. Modeling stochastic kinetics of molecular machines at multiple levels: from molecules to modules.

    PubMed

    Chowdhury, Debashish

    2013-06-04

    A molecular machine is either a single macromolecule or a macromolecular complex. In spite of the striking superficial similarities between these natural nanomachines and their man-made macroscopic counterparts, there are crucial differences. Molecular machines in a living cell operate stochastically in an isothermal environment far from thermodynamic equilibrium. In this mini-review we present a catalog of the molecular machines and an inventory of the essential toolbox for theoretically modeling these machines. The tool kits include 1) nonequilibrium statistical-physics techniques for modeling machines and machine-driven processes; and 2) statistical-inference methods for reverse engineering a functional machine from the empirical data. The cell is often likened to a microfactory in which the machineries are organized in modular fashion; each module consists of strongly coupled multiple machines, but different modules interact weakly with each other. This microfactory has its own automated supply chain and delivery system. Buoyed by the success achieved in modeling individual molecular machines, we advocate integration of these models in the near future to develop models of functional modules. A system-level description of the cell from the perspective of molecular machinery (the mechanome) is likely to emerge from further integrations that we envisage here. Copyright © 2013 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  4. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    PubMed

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  5. The Statistical Basis of Chemical Equilibria.

    ERIC Educational Resources Information Center

    Hauptmann, Siegfried; Menger, Eva

    1978-01-01

    Describes a machine which demonstrates the statistical bases of chemical equilibrium, and in doing so conveys insight into the connections among statistical mechanics, quantum mechanics, Maxwell Boltzmann statistics, statistical thermodynamics, and transition state theory. (GA)

  6. Adding Statistical Machine Translation Adaptation to Computer-Assisted Translation

    DTIC Science & Technology

    2013-09-01

    …are automatically searched and used to suggest possible translations; (2) spell-checkers; (3) glossaries; (4) dictionaries; (5) alignment and… matching against TMs to propose translations; spell-checking, glossary, and dictionary look-up; support for multiple file formats; regular expressions… on Telecommunications. Tehran, 2012, 822–826. Bertoldi, N.; Federico, M. Domain Adaptation for Statistical Machine Translation with Monolingual

  7. Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare.

    PubMed

    Mozaffari-Kermani, Mehran; Sur-Kolay, Susmita; Raghunathan, Anand; Jha, Niraj K

    2015-11-01

    Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, and thus any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. A hindered diagnosis may have life-threatening consequences and could cause distrust, while a false diagnosis may not only prompt users to distrust the machine-learning algorithm, and even abandon the entire system, but may also cause patient distress through false positive classifications. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increase the likelihood of classification into a specific class) or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks, based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness.
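
    A crude feel for why poisoning works can be had with a label-flipping toy experiment: appending mislabeled copies of training points measurably degrades test accuracy. The sketch below is only a simple stand-in for the paper's algorithm-independent attack procedure, run on synthetic data.

    ```python
    # Toy illustration of training-set poisoning: injecting mislabeled points
    # into the training data degrades test accuracy.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=600, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("clean accuracy:", round(clean.score(X_te, y_te), 3))

    # Poison: append copies of training points with deliberately wrong labels.
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X_tr), size=len(X_tr) // 3, replace=False)
    X_pois = np.vstack([X_tr, X_tr[idx]])
    y_pois = np.concatenate([y_tr, 1 - y_tr[idx]])

    poisoned = LogisticRegression(max_iter=1000).fit(X_pois, y_pois)
    print("poisoned accuracy:", round(poisoned.score(X_te, y_te), 3))
    ```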

  8. Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

    PubMed Central

    Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K

    2015-01-01

    Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis, and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95% CI 0.56–0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95% CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95% CI 0.50–0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratification of patients with cirrhosis and can be used to accurately identify patients at high risk for developing HCC. PMID:24169273

  9. Parental attitudes towards soft drink vending machines in high schools.

    PubMed

    Hendel-Paterson, Maia; French, Simone A; Story, Mary

    2004-10-01

    Soft drink vending machines are available in 98% of US high schools. However, few data are available about parents' opinions regarding the availability of soft drink vending machines in schools. Six focus groups with 33 parents at three suburban high schools were conducted to describe the perspectives of parents regarding soft drink vending machines in their children's high school. Parents viewed the issue of soft drink vending machines as a matter of their children's personal choice more than as an issue of a healthful school environment. However, parents were unaware of many important details about the soft drink vending machines in their children's school, such as the number and location of machines, hours of operation, types of beverages available, or whether the school had contracts with soft drink companies. Parents need more information about the number of soft drink vending machines at their children's school, the beverages available, the revenue generated by soft drink vending machine sales, and the terms of any contracts between the school and soft drink companies.

  10. Machine Learning for Flood Prediction in Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.

    2015-12-01

    With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct at significant financial cost. In addition, desktop modeling software and limited local server storage can impose restraints on the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable in this platform. Our project pushes these limits by testing the performance of two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, at predicting flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created using the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of deploying unbalanced training data sets based on rare extreme events.

  11. Analysis of Machine Learning Techniques for Heart Failure Readmissions.

    PubMed

    Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M

    2016-11-01

    The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
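
    The comparison design (derivation/validation split, bootstrapped C statistics) is easy to mimic. The sketch below contrasts logistic regression and a random forest by ROC AUC over bootstrapped validation samples of synthetic data; it does not reproduce the trial's covariates or the hierarchical hybrid models.

    ```python
    # Sketch of the comparison design: derive RF and LR models on half the data,
    # then compare C statistics (ROC AUC) over bootstrapped validation samples.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=25, weights=[0.75],
                               random_state=0)
    X_d, X_v, y_d, y_v = train_test_split(X, y, test_size=0.5, random_state=0)

    models = {"LR": LogisticRegression(max_iter=1000).fit(X_d, y_d),
              "RF": RandomForestClassifier(random_state=0).fit(X_d, y_d)}

    rng = np.random.default_rng(0)
    for name, m in models.items():
        aucs = []
        for _ in range(100):                   # 100 bootstrapped iterations
            idx = rng.integers(0, len(X_v), len(X_v))
            aucs.append(roc_auc_score(y_v[idx], m.predict_proba(X_v[idx])[:, 1]))
        print(name, "mean C statistic:", round(np.mean(aucs), 3))
    ```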

  12. Rare events modeling with support vector machine: Application to forecasting large-amplitude geomagnetic substorms and extreme events in financial markets.

    NASA Astrophysics Data System (ADS)

    Gavrishchaka, V. V.; Ganguli, S. B.

    2001-12-01

    Reliable forecasting of rare events in a complex dynamical system is a challenging problem that is important for many practical applications. Due to the nature of rare events, the data set available for constructing a statistical and/or machine learning model is often very limited and incomplete. Therefore, many widely used approaches, including such robust algorithms as neural networks, can easily become inadequate for rare-event prediction. Moreover, in many practical cases models with high-dimensional inputs are required. This limits applications of existing rare-event modeling techniques (e.g., extreme value theory) that focus on univariate cases and are not easily extended to multivariate ones. The support vector machine (SVM) is a machine learning system that can provide optimal generalization from very limited and incomplete training data sets and can efficiently handle high-dimensional data. These features may allow SVMs to be used to model rare events in some applications. We have applied an SVM-based system to the problems of large-amplitude substorm prediction and extreme-event forecasting in stock and currency exchange markets. Encouraging preliminary results will be presented, and other possible applications of the system will be discussed.
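
    One common practical device when training SVMs on rare events is class weighting, which raises the penalty on misclassified minority-class examples. The sketch below shows this on synthetic, highly imbalanced data; it is an illustration of the general idea, not the authors' forecasting system.

    ```python
    # Sketch of a class-weighted SVM on imbalanced data: rare events are
    # penalized more heavily, improving minority-class recall. Synthetic data
    # stands in for substorm or market-event records.
    from sklearn.datasets import make_classification
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=3000, n_features=12, weights=[0.97],
                               random_state=0)        # ~3% rare events
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    svm = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
    print(classification_report(y_te, svm.predict(X_te), digits=3))
    ```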

  13. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning high-dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging-based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier's decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is far less conservative than weight-based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging-based classification. PMID:26210913
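
    The weight-based permutation test that the paper critiques can be sketched directly: refit a linear SVM many times on label-permuted data and compare the observed weights against the resulting null distribution. The sketch below does this on synthetic data; the margin-aware statistic proposed in the paper is not shown.

    ```python
    # Sketch of weight-based permutation testing for a linear SVM: build a null
    # distribution for each feature weight by refitting on permuted labels.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 200))                    # 80 subjects, 200 features
    y = rng.integers(0, 2, 80)
    X[y == 1, :5] += 1.0                              # 5 truly informative features

    w_obs = LinearSVC(max_iter=20_000).fit(X, y).coef_.ravel()

    n_perm = 200
    null = np.empty((n_perm, X.shape[1]))
    for i in range(n_perm):                           # permutation null distribution
        null[i] = LinearSVC(max_iter=20_000).fit(X, rng.permutation(y)).coef_.ravel()

    p = (np.abs(null) >= np.abs(w_obs)).mean(axis=0)  # two-sided permutation p-values
    print("features with p < 0.05:", np.flatnonzero(p < 0.05)[:10])
    ```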

  14. Speech reconstruction using a deep partially supervised neural network.

    PubMed

    McLoughlin, Ian; Li, Jingjie; Song, Yan; Sharifzadeh, Hamid R

    2017-08-01

    Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network (DNN)-based systems have been hampered by the limited amount of training data available from individual voice-loss patients. The authors propose a novel DNN structure that allows a partially supervised training approach on spectral features from smaller data sets, yielding very good results compared with the current state-of-the-art.

  15. Evaluating the Security of Machine Learning Algorithms

    DTIC Science & Technology

    2008-05-20

    Two far-reaching trends in computing have grown in significance in recent years. First, statistical machine learning has entered the mainstream as a… computing applications. The growing intersection of these trends compels us to investigate how well machine learning performs under adversarial conditions… machine learning has a structure that we can use to build secure learning systems. This thesis makes three high-level contributions. First, we develop a

  16. Sharing brain mapping statistical results with the neuroimaging data model

    PubMed Central

    Maumet, Camille; Auer, Tibor; Bowring, Alexander; Chen, Gang; Das, Samir; Flandin, Guillaume; Ghosh, Satrajit; Glatard, Tristan; Gorgolewski, Krzysztof J.; Helmer, Karl G.; Jenkinson, Mark; Keator, David B.; Nichols, B. Nolan; Poline, Jean-Baptiste; Reynolds, Richard; Sochat, Vanessa; Turner, Jessica; Nichols, Thomas E.

    2016-01-01

    Only a tiny fraction of the data and metadata produced by an fMRI study is finally conveyed to the community. This lack of transparency not only hinders the reproducibility of neuroimaging results but also impairs future meta-analyses. In this work we introduce NIDM-Results, a format specification providing a machine-readable description of neuroimaging statistical results along with key image data summarising the experiment. NIDM-Results provides a unified representation of mass univariate analyses including a level of detail consistent with available best practices. This standardized representation allows authors to relay methods and results in a platform-independent regularized format that is not tied to a particular neuroimaging software package. Tools are available to export NIDM-Result graphs and associated files from the widely used SPM and FSL software packages, and the NeuroVault repository can import NIDM-Results archives. The specification is publicly available at: http://nidm.nidash.org/specs/nidm-results.html. PMID:27922621

  17. Kernel machines for epilepsy diagnosis via EEG signal classification: a comparative study.

    PubMed

    Lima, Clodoaldo A M; Coelho, André L V

    2011-10-01

    We carry out a systematic assessment of a suite of kernel-based learning machines for the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely, Gaussian and exponential radial basis functions) were considered, as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. We first quantitatively assess the impact of the choice of wavelet basis on the quality of the features extracted; four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations, whereby one can visually inspect their sensitivity to the type of feature and to the kernel function/parameter value. Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value, as well as the choice of the feature extractor, are critical decisions, although the choice of wavelet family seems less critical. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile emerged across all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). Copyright © 2011 Elsevier B.V. All rights reserved.
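
    The reported sensitivity profiles are essentially accuracy curves over kernel parameter values. A minimal way to reproduce one is to sweep the RBF radius and record cross-validated accuracy, as in the sketch below, where synthetic features stand in for the wavelet and Lyapunov EEG features.

    ```python
    # Sketch of a kernel-sensitivity profile: cross-validated accuracy of an RBF
    # SVM swept over kernel parameter values.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=400, n_features=21, random_state=0)

    for gamma in np.logspace(-4, 1, 6):          # sweep of kernel parameter values
        acc = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5).mean()
        print(f"gamma={gamma:.4f}  CV accuracy={acc:.3f}")
    ```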

  18. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning.

    PubMed

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-06-17

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-valued Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In the experiments, the best fault classification rate was obtained using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults.
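
    The statistical-feature stage that feeds the GDBM can be illustrated independently of any deep model: summarize each vibration record with time- and frequency-domain statistics. The feature set in the sketch below is a common choice and an assumption here, not necessarily the paper's exact list.

    ```python
    # Sketch of the statistical-feature step: summarize a vibration signal in the
    # time and frequency domains before any deep model sees it.
    import numpy as np
    from scipy.stats import kurtosis, skew

    def vibration_features(x, fs):
        """Basic time- and frequency-domain statistics of one vibration record."""
        spectrum = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        return {
            "rms": np.sqrt(np.mean(x**2)),
            "peak": np.max(np.abs(x)),
            "crest_factor": np.max(np.abs(x)) / np.sqrt(np.mean(x**2)),
            "kurtosis": kurtosis(x),
            "skewness": skew(x),
            "spectral_centroid": np.sum(freqs * spectrum) / np.sum(spectrum),
        }

    fs = 12_000                                   # sampling rate (Hz), assumed
    t = np.arange(0, 1, 1 / fs)
    x = np.sin(2 * np.pi * 157 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)
    print(vibration_features(x, fs))
    ```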

  19. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning

    PubMed Central

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-01-01

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-valued Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. In experiments, the fault classification rates achieved with this approach were 95.17% for the gearbox and 91.75% for the bearing system. The proposed approach was compared to standard methods such as a support vector machine, a GRBM, and a combination model; the best fault classification rate was obtained with the proposed model. The results show that deep learning with statistical feature extraction has substantial potential to improve the diagnosis of rotating machinery faults. PMID:27322273

  20. 41 CFR 101-26.509-2 - Requisitioning tabulating machine cards not available from Federal Supply Schedule contracts.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... machine cards not available from Federal Supply Schedule contracts. 101-26.509-2 Section 101-26.509-2... Programs § 101-26.509-2 Requisitioning tabulating machine cards not available from Federal Supply Schedule contracts. (a) Requisitions for tabulating machine cards covered by Federal Supply Schedule contracts which...

  1. Selected aspects of microelectronics technology and applications: Numerically controlled machine tools. Technology trends series no. 2

    NASA Astrophysics Data System (ADS)

    Sigurdson, J.; Tagerud, J.

    1986-05-01

    A UNIDO publication about machine tools with automatic control discusses the following: (1) numerical control (NC) machine tool perspectives, the definition of NC, flexible manufacturing systems, robots and their industrial application, research and development, and sensors; (2) experience in developing a capability in NC machine tools; (3) policy issues; (4) procedures for retrieval of relevant documentation from databases. Diagrams, statistics, and a bibliography are included.

  2. ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Antcheva, I.; /CERN; Ballintijn, M.

    2009-01-01

    ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting, while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. Central to these analysis tools are the histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. Results can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks (e.g., data mining in HEP) by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.
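
    A minimal PyROOT session illustrates the histogram fill/fit/save workflow described above; it assumes a local ROOT installation with its Python bindings available.

        import ROOT  # requires a ROOT installation with Python bindings

        h = ROOT.TH1F("h", "Gaussian sample;x;entries", 100, -4, 4)
        h.FillRandom("gaus", 10000)   # fill from ROOT's built-in Gaussian
        h.Fit("gaus", "Q")            # quiet fit of a Gaussian model
        c = ROOT.TCanvas("c")
        h.Draw()
        c.SaveAs("hist.pdf")          # high-quality graphics output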

  3. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max

    Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG 03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2, and longitudinal) with event rates between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized for endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance, using a sample size able to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1, with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC > 0.6, while all haematuria endpoints and longitudinal incontinence models produced AUROC < 0.6. Conclusions: Logistic regression and MARS were most likely to be the best-performing strategies for the prediction of urinary symptoms, with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.
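
    For endpoints with rare events, the oversampling-plus-model-comparison workflow can be sketched with scikit-learn and the third-party imbalanced-learn package. The data below are synthetic stand-ins, not the TROG 03.04-RADAR features.

        from imblearn.over_sampling import SMOTE  # imbalanced-learn package
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import StratifiedKFold, cross_val_score

        # ~5% event rate, mimicking a rare endpoint
        X, y = make_classification(n_samples=754, weights=[0.95], flip_y=0, random_state=1)
        # note: in a rigorous study, oversample inside each training fold only,
        # to avoid information leakage into the validation folds
        X_res, y_res = SMOTE(random_state=1).fit_resample(X, y)

        cv = StratifiedKFold(5, shuffle=True, random_state=1)
        for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                            ("random forest", RandomForestClassifier(n_estimators=300))]:
            auc = cross_val_score(model, X_res, y_res, cv=cv, scoring="roc_auc").mean()
            print(f"{name:14s} AUROC={auc:.3f}")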

  4. Statistical machine translation for biomedical text: are we there yet?

    PubMed

    Wu, Cuijun; Xia, Fei; Deleger, Louise; Solti, Imre

    2011-01-01

    In our paper we addressed the research question: "Has machine translation achieved sufficiently high quality to translate PubMed titles for patients?". We analyzed statistical machine translation output for six foreign-language/English translation pairs (bidirectionally). We built a high-performing in-house system and evaluated its output for each translation pair at large scale, both with automated BLEU scores and human judgment. In addition to the in-house system, we also evaluated Google Translate's performance specifically within the biomedical domain. We report high performance for the German-English, French-English, and Spanish-English bidirectional translation pairs for both Google Translate and our system.
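
    BLEU scoring of a single machine-translated sentence against a reference can be illustrated with NLTK; the sentence pair below is an invented example, not taken from the study.

        from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

        reference = "el cancer de mama es tratable".split()
        hypothesis = "el cancer de mama se puede tratar".split()
        score = sentence_bleu([reference], hypothesis,
                              smoothing_function=SmoothingFunction().method1)
        print(f"BLEU = {score:.3f}")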

  5. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

    PubMed

    Mørk, Søren; Holmes, Ian

    2012-03-01

    Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders, and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two most widely used model structures performed best in terms of statistical information criteria or prediction accuracy, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data, and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.
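
    Model comparison by statistical information criteria, as used above, reduces to simple formulas once each model's maximized log-likelihood is known. A minimal sketch, with illustrative numbers rather than the paper's results:

        import numpy as np

        def aic(log_lik, k):
            """Akaike information criterion: 2k - 2 ln L."""
            return 2 * k - 2 * log_lik

        def bic(log_lik, k, n):
            """Bayesian information criterion: k ln n - 2 ln L."""
            return k * np.log(n) - 2 * log_lik

        # comparing two hypothetical HMM structures on the same genome;
        # lower values indicate a better complexity/fit trade-off
        print(bic(log_lik=-1.23e6, k=180, n=4_600_000))
        print(bic(log_lik=-1.21e6, k=520, n=4_600_000))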

  6. Performance Analysis of Millimeter-Wave Multi-hop Machine-to-Machine Networks Based on Hop Distance Statistics

    PubMed Central

    2018-01-01

    As an intrinsic part of the Internet of Things (IoT) ecosystem, machine-to-machine (M2M) communications are expected to provide ubiquitous connectivity between machines. Millimeter-wave (mmWave) communication is another promising technology for future communication systems, alleviating the pressure of scarce spectrum resources. For this reason, in this paper we consider multi-hop M2M communications, where a machine-type communication (MTC) device with limited transmit power relays traffic to help other devices using mmWave. Specifically, we focus on hop distance statistics and their impact on system performance in multi-hop wireless networks (MWNs) with directional antenna arrays in mmWave for M2M communications. Unlike microwave systems, in mmWave communications the wireless channel suffers from blockage by obstacles that heavily attenuate line-of-sight signals, which may result in limited per-hop progress in MWNs. We consider two routing strategies aimed at different types of applications and derive the probability distributions of their hop distances. Moreover, we provide their baseline statistics under a blockage-free scenario to quantify the impact of blockages. Based on the hop distance analysis, we propose a method to estimate the end-to-end performance metrics (e.g., outage probability, hop count, and transmit energy) of mmWave MWNs, which provides important insights into mmWave MWN design without time-consuming and repetitive end-to-end simulation. PMID:29329248

  7. Performance Analysis of Millimeter-Wave Multi-hop Machine-to-Machine Networks Based on Hop Distance Statistics.

    PubMed

    Jung, Haejoon; Lee, In-Ho

    2018-01-12

    As an intrinsic part of the Internet of Things (IoT) ecosystem, machine-to-machine (M2M) communications are expected to provide ubiquitous connectivity between machines. Millimeter-wave (mmWave) communication is another promising technology for future communication systems, alleviating the pressure of scarce spectrum resources. For this reason, in this paper we consider multi-hop M2M communications, where a machine-type communication (MTC) device with limited transmit power relays traffic to help other devices using mmWave. Specifically, we focus on hop distance statistics and their impact on system performance in multi-hop wireless networks (MWNs) with directional antenna arrays in mmWave for M2M communications. Unlike microwave systems, in mmWave communications the wireless channel suffers from blockage by obstacles that heavily attenuate line-of-sight signals, which may result in limited per-hop progress in MWNs. We consider two routing strategies aimed at different types of applications and derive the probability distributions of their hop distances. Moreover, we provide their baseline statistics under a blockage-free scenario to quantify the impact of blockages. Based on the hop distance analysis, we propose a method to estimate the end-to-end performance metrics (e.g., outage probability, hop count, and transmit energy) of mmWave MWNs, which provides important insights into mmWave MWN design without time-consuming and repetitive end-to-end simulation.
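
    The qualitative effect of blockage on per-hop progress can be explored with a small Monte Carlo sketch. The exponential line-of-sight probability model, the parameter values, and the max-progress relay rule below are illustrative assumptions, not the paper's analytical derivation.

        import numpy as np

        rng = np.random.default_rng(0)
        R, beta, density, trials = 100.0, 1 / 141.4, 1e-3, 10_000  # range (m), blockage, nodes/m^2

        hops = []
        for _ in range(trials):
            n = rng.poisson(density * np.pi * R ** 2)      # candidate relays within range
            r = R * np.sqrt(rng.random(n))                 # uniform positions in the disc
            theta = rng.uniform(-np.pi, np.pi, n)
            los = rng.random(n) < np.exp(-beta * r)        # exponential LOS probability model
            progress = r * np.cos(theta)                   # progress toward a destination on +x
            ok = los & (progress > 0)
            if ok.any():
                hops.append(r[ok][np.argmax(progress[ok])])

        print(f"mean hop distance = {np.mean(hops):.1f} m over {len(hops)} successful hops")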

  8. CISN ShakeAlert Earthquake Early Warning System Monitoring Tools

    NASA Astrophysics Data System (ADS)

    Henson, I. H.; Allen, R. M.; Neuhauser, D. S.

    2015-12-01

    CISN ShakeAlert is a prototype earthquake early warning system being developed and tested by the California Integrated Seismic Network. The system has recently been expanded to support redundant data processing and communications. It now runs on six machines at three locations with ten Apache ActiveMQ message brokers linking together 18 waveform processors, 12 event association processes and 4 Decision Module alert processes. The system ingests waveform data from about 500 stations and generates many thousands of triggers per day, from which a small portion produce earthquake alerts. We have developed interactive web browser system-monitoring tools that display near real time state-of-health and performance information. This includes station availability, trigger statistics, communication and alert latencies. Connections to regional earthquake catalogs provide a rapid assessment of the Decision Module hypocenter accuracy. Historical performance can be evaluated, including statistics for hypocenter and origin time accuracy and alert time latencies for different time periods, magnitude ranges and geographic regions. For the ElarmS event associator, individual earthquake processing histories can be examined, including details of the transmission and processing latencies associated with individual P-wave triggers. Individual station trigger and latency statistics are available. Detailed information about the ElarmS trigger association process for both alerted events and rejected events is also available. The Google Web Toolkit and Map API have been used to develop interactive web pages that link tabular and geographic information. Statistical analysis is provided by the R-Statistics System linked to a PostgreSQL database.

  9. Coupling Matched Molecular Pairs with Machine Learning for Virtual Compound Optimization.

    PubMed

    Turk, Samo; Merget, Benjamin; Rippmann, Friedrich; Fulle, Simone

    2017-12-26

    Matched molecular pair (MMP) analyses are widely used in compound optimization projects to gain insights into structure-activity relationships (SAR). The analysis is traditionally done via statistical methods but can also be employed together with machine learning (ML) approaches to extrapolate to novel compounds. The MMP/ML method introduced here combines a fragment-based MMP implementation with different machine learning methods to obtain automated SAR decomposition and prediction. To test the prediction capabilities and model transferability, two different compound optimization scenarios were designed: (1) "new fragments", which occurs when exploring new fragments for a defined compound series, and (2) "new static core and transformations", which resembles, for instance, the identification of a new compound series. All of the machine learning methods employed achieved very good results, especially for the new-fragments case, but deep neural network models performed best overall, allowing reliable predictions also for the new static core and transformations scenario, where comprehensive SAR knowledge of the compound series is missing. Furthermore, we show that models trained on all available data have higher generalizability compared to models trained on focused series and can extend beyond the chemical space covered in the training data. Thus, coupling MMP with deep neural networks provides a promising approach to making high-quality predictions on various data sets and in different compound optimization scenarios.
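
    A generic sketch of the prediction side of such a pipeline, using RDKit Morgan fingerprints and a random forest, is shown below. It is not the authors' fragment-based MMP implementation, and the molecules and activity values are toy placeholders.

        import numpy as np
        from rdkit import Chem
        from rdkit.Chem import AllChem
        from sklearn.ensemble import RandomForestRegressor

        smiles = ["CCO", "CCN", "CCC", "CCCl"]   # toy series; real MMP sets are far larger
        activity = [5.1, 5.9, 4.8, 6.2]          # hypothetical pIC50 values

        def featurize(smi):
            mol = Chem.MolFromSmiles(smi)
            fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=1024)
            return np.array(fp)

        X = np.vstack([featurize(s) for s in smiles])
        model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, activity)
        print(model.predict(featurize("CCBr").reshape(1, -1)))  # unseen analogue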

  10. Machine Learning in Medicine

    PubMed Central

    Deo, Rahul C.

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  11. Machine Learning in the Presence of an Adversary: Attacking and Defending the SpamBayes Spam Filter

    DTIC Science & Technology

    2008-05-20

    Machine learning techniques are often used for decision making in security-critical applications such as intrusion detection and spam filtering. The defenses shown in this thesis work against the attacks developed against SpamBayes and are sufficiently generic to be easily extended to other statistical machine learning algorithms.

  12. Testing meta tagger

    DTIC Science & Technology

    2017-12-21

    Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.[1] Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term "machine learning" in 1959 while at IBM.[2] Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers; application areas include learning to rank and computer vision.

  13. Advances in Machine Learning and Data Mining for Astronomy

    NASA Astrophysics Data System (ADS)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  14. Machine learning Z2 quantum spin liquids with quasiparticle statistics

    NASA Astrophysics Data System (ADS)

    Zhang, Yi; Melko, Roger G.; Kim, Eun-Ah

    2017-12-01

    After decades of progress and effort, obtaining a phase diagram for a strongly correlated topological system still remains a challenge. Although in principle one could turn to Wilson loops and long-range entanglement, evaluating these nonlocal observables at many points in phase space can be prohibitively costly. With growing excitement over topological quantum computation comes the need for an efficient approach for obtaining topological phase diagrams. Here we turn to machine learning using quantum loop topography (QLT), a notion we have recently introduced. Specifically, we propose a construction of QLT that is sensitive to quasiparticle statistics. We then use mutual statistics between the spinons and visons to detect a Z2 quantum spin liquid in a multiparameter phase space. We successfully obtain the quantum phase boundary between the topological and trivial phases using a simple feed-forward neural network. Furthermore, we demonstrate advantages of our approach for the evaluation of phase diagrams relating to speed and storage. Such statistics-based machine learning of topological phases opens new efficient routes to studying topological phase diagrams in strongly correlated systems.
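
    The final classification step can be sketched with a small feed-forward network. Here the quantum loop topography features are replaced by synthetic Gaussian stand-ins, so this illustrates only the supervised-learning machinery, not the physics.

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)
        # stand-ins for QLT feature vectors sampled deep in the trivial (0)
        # and topological (1) phases
        X0 = rng.normal(0.0, 1.0, size=(500, 32))
        X1 = rng.normal(0.8, 1.0, size=(500, 32))
        X = np.vstack([X0, X1])
        y = np.r_[np.zeros(500), np.ones(500)]

        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
        # sweeping a tuning parameter across the phase space and locating where the
        # predicted label flips approximates the phase boundary, in the paper's spirit
        print(clf.score(X, y))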

  15. Statistical complex fatigue data for SAE 4340 steel and its use in design by reliability

    NASA Technical Reports Server (NTRS)

    Kececioglu, D.; Smith, J. L.

    1970-01-01

    A brief description of the complex fatigue machines used in the test program is presented. The data generated from these machines are given and discussed. Two methods of obtaining strength distributions from the data are also discussed. Then follows a discussion of the construction of statistical fatigue diagrams and their use in designing by reliability. Finally, some of the problems encountered in the test equipment and a corrective modification are presented.

  16. Virtual Manufacturing Techniques Designed and Applied to Manufacturing Activities in the Manufacturing Integration and Technology Branch

    NASA Technical Reports Server (NTRS)

    Shearrow, Charles A.

    1999-01-01

    One of the identified goals of EM3 is to implement virtual manufacturing by the time the year 2000 has ended. To realize this goal of a true virtual manufacturing enterprise, the initial development of a machinability database and its infrastructure must be completed. This will consist of containing the existing EM-NET problems and developing machine, tooling, and common-materials databases. To integrate the virtual manufacturing enterprise with normal day-to-day operations, a parallel virtual manufacturing machinability database, a virtual manufacturing database, a virtual manufacturing paradigm, an implementation/integration procedure, and testable verification models must be constructed. Common and virtual machinability databases will include the four distinct areas of machine tools, available tooling, common machine tool loads, and a materials database. The machine tools database will include the machine envelope, special machine attachments, tooling capacity, location within NASA-JSC or with a contractor, and availability/scheduling. The tooling database will include available standard tooling, custom in-house tooling, tool properties, and availability. The common materials database will include material thickness ranges, strengths, types, and their availability. The virtual manufacturing databases will consist of virtual machines and virtual tooling directly related to the common and machinability databases. The items to be completed are the design and construction of the machinability databases, a virtual manufacturing paradigm for NASA-JSC, an implementation timeline, a VNC model of one bridge mill, and troubleshooting of the existing software and hardware problems with EN4NET. The final step of this virtual manufacturing project will be to integrate other production sites into the databases, bringing JSC's EM3 into a position of becoming a clearing house for NASA's digital manufacturing needs and creating a true virtual manufacturing enterprise.
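
    A machinability database of the kind described could begin with a schema like the following sketch; the table and column names are assumptions drawn from the field lists above, not an existing NASA-JSC design.

        import sqlite3

        conn = sqlite3.connect("machinability.db")
        conn.executescript("""
        CREATE TABLE IF NOT EXISTS machine_tool (
            id             INTEGER PRIMARY KEY,
            name           TEXT NOT NULL,
            envelope_x_mm  REAL, envelope_y_mm REAL, envelope_z_mm REAL,  -- machine envelope
            attachments    TEXT,                                          -- special attachments
            tool_capacity  INTEGER,                                       -- tooling capacity
            location       TEXT,                                          -- NASA-JSC or contractor site
            available      INTEGER DEFAULT 1                              -- availability/scheduling flag
        );
        CREATE TABLE IF NOT EXISTS material (
            id               INTEGER PRIMARY KEY,
            type             TEXT NOT NULL,
            thickness_min_mm REAL, thickness_max_mm REAL,
            strength_mpa     REAL,
            available        INTEGER DEFAULT 1
        );
        """)
        conn.commit()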

  17. Gradient boosting machine for modeling the energy consumption of commercial buildings

    DOE PAGES

    Touzani, Samir; Granderson, Jessica; Fernandes, Samuel

    2017-11-26

    Accurate savings estimations are important to promote energy efficiency projects and demonstrate their cost-effectiveness. The increasing presence of advanced metering infrastructure (AMI) in commercial buildings has resulted in a rising availability of high-frequency interval data. These data can be used for a variety of energy efficiency applications such as demand response, fault detection and diagnosis, and heating, ventilation, and air conditioning (HVAC) optimization. This large amount of data has also opened the door to the use of advanced statistical learning models, which hold promise for providing accurate building baseline energy consumption predictions, and thus accurate savings estimations. The gradient boosting machine is a powerful machine learning algorithm that is gaining considerable traction in a wide range of data-driven applications, such as ecology, computer vision, and biology. In the present work an energy consumption baseline modeling method based on a gradient boosting machine was proposed. To assess the performance of this method, a recently published testing procedure was used on a large dataset of 410 commercial buildings. The model training periods were varied and several prediction accuracy metrics were used to evaluate the model's performance. The results show that using the gradient boosting machine model improved the R-squared prediction accuracy and the CV(RMSE) in more than 80 percent of the cases, when compared to an industry best practice model that is based on piecewise linear regression, and to a random forest algorithm.
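
    The baseline-modeling idea can be sketched in a few lines with scikit-learn's gradient boosting regressor. The load data and features below are synthetic, and CV(RMSE) is computed as the cross-validated RMSE normalized by the mean load.

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(0)
        hours = rng.integers(0, 168, 5000)              # hour of week
        temp = rng.normal(18, 8, 5000)                  # outdoor air temperature (C)
        load = (50 + 10 * np.sin(hours / 168 * 2 * np.pi)
                + 1.5 * np.maximum(temp - 15, 0)
                + rng.normal(0, 3, 5000))               # synthetic building load (kW)

        X = np.column_stack([hours, temp])
        pred = cross_val_predict(GradientBoostingRegressor(), X, load, cv=5)
        rmse = np.sqrt(np.mean((load - pred) ** 2))
        print(f"CV(RMSE) = {100 * rmse / load.mean():.1f}%")  # normalized RMSE used in M&V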

  18. Gradient boosting machine for modeling the energy consumption of commercial buildings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Touzani, Samir; Granderson, Jessica; Fernandes, Samuel

    Accurate savings estimations are important to promote energy efficiency projects and demonstrate their cost-effectiveness. The increasing presence of advanced metering infrastructure (AMI) in commercial buildings has resulted in a rising availability of high-frequency interval data. These data can be used for a variety of energy efficiency applications such as demand response, fault detection and diagnosis, and heating, ventilation, and air conditioning (HVAC) optimization. This large amount of data has also opened the door to the use of advanced statistical learning models, which hold promise for providing accurate building baseline energy consumption predictions, and thus accurate savings estimations. The gradient boosting machine is a powerful machine learning algorithm that is gaining considerable traction in a wide range of data-driven applications, such as ecology, computer vision, and biology. In the present work an energy consumption baseline modeling method based on a gradient boosting machine was proposed. To assess the performance of this method, a recently published testing procedure was used on a large dataset of 410 commercial buildings. The model training periods were varied and several prediction accuracy metrics were used to evaluate the model's performance. The results show that using the gradient boosting machine model improved the R-squared prediction accuracy and the CV(RMSE) in more than 80 percent of the cases, when compared to an industry best practice model that is based on piecewise linear regression, and to a random forest algorithm.

  19. The effect of the use of a TNF-alpha inhibitor in hypothermic machine perfusion on kidney function after transplantation.

    PubMed

    Diuwe, Piotr; Domagala, Piotr; Durlik, Magdalena; Trzebicki, Janusz; Chmura, Andrzej; Kwiatkowski, Artur

    2017-08-01

    One of the most important problems in transplantation medicine is the ischemia/reperfusion injury of the organs to be transplanted. The aim of the present study was to assess the effect of the tumor necrosis factor-alpha (TNF-alpha) inhibitor etanercept, administered during hypothermic machine perfusion, on renal allograft function and organ perfusion. No statistically significant differences were found in the impact of the applied intervention on kidney machine perfusion, during which the average flow and vascular resistance were evaluated. There were no statistically significant differences in the occurrence of delayed graft function (DGF). Fewer events of functional DGF and acute rejection episodes were observed in patients who received a kidney from the etanercept-treated Group A than in patients who received a kidney from the control Group B; however, the difference was not statistically significant. In summary, no effect of treatment with etanercept, a TNF-alpha inhibitor, during hypothermic machine perfusion on renal allograft survival or perfusion was detected in this study. However, treatment of the isolated organ may be important for the future of transplantation medicine. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Comprehensive machine learning analysis of Hydra behavior reveals a stable basal behavioral repertoire

    PubMed Central

    Taralova, Ekaterina; Dupre, Christophe; Yuste, Rafael

    2018-01-01

    Animal behavior has been studied for centuries, but few efficient methods are available to automatically identify and classify it. Quantitative behavioral studies have been hindered by the subjective and imprecise nature of human observation, and the slow speed of annotating behavioral data. Here, we developed an automatic behavior analysis pipeline for the cnidarian Hydra vulgaris using machine learning. We imaged freely behaving Hydra, extracted motion and shape features from the videos, and constructed a dictionary of visual features to classify pre-defined behaviors. We also identified unannotated behaviors with unsupervised methods. Using this analysis pipeline, we quantified 6 basic behaviors and found surprisingly similar behavior statistics across animals within the same species, regardless of experimental conditions. Our analysis indicates that the fundamental behavioral repertoire of Hydra is stable. This robustness could reflect a homeostatic neural control of "housekeeping" behaviors which could have been already present in the earliest nervous systems. PMID:29589829

  1. Statistical Capability Study of a Helical Grinding Machine Producing Screw Rotors

    NASA Astrophysics Data System (ADS)

    Holmes, C. S.; Headley, M.; Hart, P. W.

    2017-08-01

    Screw compressors depend for their efficiency and reliability on the accuracy of the rotors, and therefore on the machinery used in their production. The machinery has evolved over more than half a century in response to customer demands for production accuracy, efficiency, and flexibility, and is now at a high level on all three criteria. Production equipment and processes must be capable of maintaining accuracy over a production run, and this must be assessed statistically under strictly controlled conditions. This paper gives numerical data from such a study of an innovative machine tool and shows that it is possible to meet the demanding statistical capability requirements.
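    
    Statistical capability of this kind is conventionally summarized by the indices Cp = (USL - LSL) / 6σ and Cpk = min(USL - μ, μ - LSL) / 3σ. A minimal sketch follows, with hypothetical measurements and tolerance limits rather than the study's data.

        import numpy as np

        def capability(samples, lsl, usl):
            """Process capability indices from a sample of measurements."""
            mu, sigma = np.mean(samples), np.std(samples, ddof=1)
            cp = (usl - lsl) / (6 * sigma)
            cpk = min(usl - mu, mu - lsl) / (3 * sigma)
            return cp, cpk

        # hypothetical profile-error measurements (microns) from a production run
        rng = np.random.default_rng(1)
        measurements = rng.normal(0.0, 1.2, 50)
        print("Cp=%.2f  Cpk=%.2f" % capability(measurements, lsl=-5.0, usl=5.0))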

  2. Electricity Data Browser

    EIA Publications

    The Electricity Data Browser shows generation, consumption, fossil fuel receipts, stockpiles, retail sales, and electricity prices. The data appear on an interactive web page and are updated each month. The Electricity Data Browser includes all the datasets collected and published in EIA's Electric Power Monthly and allows users to perform dynamic charting of data sets as well as map the data by state. The data browser includes a series of reports that appear in the Electric Power Monthly and allows readers to drill down to plant level statistics, where available. All images and datasets are available for download. Users can also link to the data series in EIA's Application Programming Interface (API). An API makes our data machine-readable and more accessible to users.
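
    A sketch of pulling a series through the API from Python follows. The endpoint and series id are illustrative of EIA's documented pattern and should be checked against the current API documentation before use.

        import requests

        API_KEY = "YOUR_EIA_API_KEY"  # free registration at eia.gov
        # endpoint and series id are assumptions based on EIA's published API pattern
        url = "https://api.eia.gov/series/"
        params = {"api_key": API_KEY, "series_id": "ELEC.PRICE.US-ALL.M"}

        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        for date, value in resp.json()["series"][0]["data"][:3]:
            print(date, value)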

  3. Using statistical and machine learning to help institutions detect suspicious access to electronic health records.

    PubMed

    Boxwala, Aziz A; Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs.
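
    The modeling setup, cross-validated logistic regression and SVM scored by AUC, can be sketched with scikit-learn. The 26 features and labels below are synthetic stand-ins for the access-log data.

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import cross_val_predict
        from sklearn.svm import SVC

        # synthetic stand-in for 1291 labeled access events with 26 features
        X, y = make_classification(n_samples=1291, n_features=26, weights=[0.8], random_state=0)

        for name, model in [("LR", LogisticRegression(max_iter=1000)),
                            ("SVM", SVC(probability=True))]:
            scores = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
            print(f"{name}: AUC = {roc_auc_score(y, scores):.2f}")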

  4. Using statistical and machine learning to help institutions detect suspicious access to electronic health records

    PubMed Central

    Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    Objective To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. Methods From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. Results The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. Limitations The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. Conclusion The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs. PMID:21672912

  5. HUMAN DECISIONS AND MACHINE PREDICTIONS.

    PubMed

    Kleinberg, Jon; Lakkaraju, Himabindu; Leskovec, Jure; Ludwig, Jens; Mullainathan, Sendhil

    2018-02-01

    Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application. Yet comparing the algorithm to judges proves complicated. First, the available data are generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the variable the algorithm predicts; for instance, judges may care specifically about violent crimes or about racial inequities. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates. Moreover, all categories of crime, including violent crimes, show reductions; and these gains can be achieved while simultaneously reducing racial disparities. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals. JEL Codes: C10 (Econometric and statistical methods and methodology), C55 (Large datasets: Modeling and analysis), K40 (Legal procedure, the legal system, and illegal behavior).

  6. HUMAN DECISIONS AND MACHINE PREDICTIONS*

    PubMed Central

    Kleinberg, Jon; Lakkaraju, Himabindu; Leskovec, Jure; Ludwig, Jens; Mullainathan, Sendhil

    2018-01-01

    Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application. Yet comparing the algorithm to judges proves complicated. First, the available data are generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the variable the algorithm predicts; for instance, judges may care specifically about violent crimes or about racial inequities. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates. Moreover, all categories of crime, including violent crimes, show reductions; and these gains can be achieved while simultaneously reducing racial disparities. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals. JEL Codes: C10 (Econometric and statistical methods and methodology), C55 (Large datasets: Modeling and analysis), K40 (Legal procedure, the legal system, and illegal behavior) PMID:29755141

  7. The LET Procedure for Prosthetic Myocontrol: Towards Multi-DOF Control Using Single-DOF Activations.

    PubMed

    Nowak, Markus; Castellini, Claudio

    2016-01-01

    Simultaneous and proportional myocontrol of dexterous hand prostheses is to a large extent still an open problem. With the advent of commercially and clinically available multi-fingered hand prostheses, there are now more independent degrees of freedom (DOFs) in prostheses than can be effectively controlled using surface electromyography (sEMG), the current standard human-machine interface for hand amputees. In particular, it is uncertain whether several DOFs can be controlled simultaneously and proportionally by exclusively calibrating the intended activation of single DOFs. The problem is currently solved by training on all required combinations. However, as the number of available DOFs grows, this approach becomes overly long and poses a high cognitive burden on the subject. In this paper we present a novel approach to overcome this problem. Multi-DOF activations are artificially modelled from single-DOF ones using a simple linear combination of sEMG signals, which are then added to the training set. This procedure, which we named LET (Linearly Enhanced Training), provides an augmented data set to any machine-learning-based intent detection system. In two experiments involving intact subjects, one offline and one online, we trained a standard machine learning approach using the full data set containing single- and multi-DOF activations as well as using the LET-augmented data set in order to evaluate the performance of the LET procedure. The results indicate that the machine trained on the latter data set obtains worse results in the offline experiment compared to the machine trained on the full data set. However, the online implementation enables the user to perform multi-DOF tasks with almost the same precision as single-DOF tasks, without the need to explicitly train multi-DOF activations. Moreover, the parameters involved in the system are statistically uniform across subjects.
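
    The core LET step, synthesizing multi-DOF training samples as linear combinations of single-DOF sEMG samples, can be sketched as follows. The equal-weight combination, the ridge regressor, and all data are assumptions for illustration; the paper's specific learning method is not detailed in the abstract.

        import numpy as np
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(0)
        n, d = 200, 10                                    # samples per DOF, sEMG feature dim
        emg_dof1 = rng.random((n, d)); y_dof1 = np.tile([1.0, 0.0], (n, 1))
        emg_dof2 = rng.random((n, d)); y_dof2 = np.tile([0.0, 1.0], (n, 1))

        # LET step: model multi-DOF activations as a linear combination of
        # single-DOF sEMG samples, labeled with the combined activation
        emg_both = 0.5 * (emg_dof1 + emg_dof2)            # assumed equal weights
        y_both = y_dof1 + y_dof2

        X = np.vstack([emg_dof1, emg_dof2, emg_both])
        Y = np.vstack([y_dof1, y_dof2, y_both])
        model = Ridge().fit(X, Y)                         # stand-in intent detector
        print(model.predict(emg_both[:1]))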

  8. Machine learning to predict the occurrence of bisphosphonate-related osteonecrosis of the jaw associated with dental extraction: A preliminary report.

    PubMed

    Kim, Dong Wook; Kim, Hwiyoung; Nam, Woong; Kim, Hyung Jun; Cha, In-Ho

    2018-04-23

    The aim of this study was to build and validate five types of machine learning models that can predict the occurrence of BRONJ associated with dental extraction in patients taking bisphosphonates for the management of osteoporosis. A retrospective review of the medical records was conducted to obtain cases and controls for the study. A total of 125 patients, consisting of 41 cases and 84 controls, were selected for the study. Five machine learning prediction algorithms, namely a multivariable logistic regression model, a decision tree, a support vector machine, an artificial neural network, and a random forest, were implemented. The outputs of these models were compared with each other and also with conventional methods, such as serum CTX level. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. The performance of the machine learning models was significantly superior to conventional statistical methods and single predictors. The random forest model yielded the best performance (AUC = 0.973), followed by the artificial neural network (AUC = 0.915), support vector machine (AUC = 0.882), logistic regression (AUC = 0.844), decision tree (AUC = 0.821), drug holiday alone (AUC = 0.810), and CTX level alone (AUC = 0.630). Machine learning methods showed superior performance in predicting BRONJ associated with dental extraction compared to conventional statistical methods using drug holiday and serum CTX level. Machine learning can thus be applied in a wide range of clinical studies. Copyright © 2017. Published by Elsevier Inc.

  9. Risk estimation using probability machines.

    PubMed

    Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D

    2014-03-01

    Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data-generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. The models we propose make no assumptions about the data structure, and capture the patterns in the data by specifying only the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification, and the resultant estimation biases, as a logistic model. This methodology, which we call a "risk machine", inherits properties from the statistical machine from which it is derived.
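
    The counterfactual effect-size idea can be sketched with a random forest probability machine: predict the outcome probability for every subject with the exposure set to 1 and then to 0, and average the difference. The data below are synthetic, and the exposure column is an illustrative construction.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier

        X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
        X[:, 0] = (X[:, 0] > 0).astype(float)       # make predictor 0 a binary "exposure"

        rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

        # counterfactual effect size: average difference in predicted probability
        # when the exposure is set to 1 versus 0 for every subject
        X1, X0 = X.copy(), X.copy()
        X1[:, 0], X0[:, 0] = 1.0, 0.0
        effect = (rf.predict_proba(X1)[:, 1] - rf.predict_proba(X0)[:, 1]).mean()
        print(f"estimated risk difference = {effect:.3f}")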

  10. VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment

    PubMed Central

    Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z.; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark

    2012-01-01

    Summary: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. Availability and Implementation: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org. Contact: lukas.habegger@yale.edu or mark.gerstein@yale.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22743228

  11. Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours

    PubMed Central

    Slip, David J.; Hocking, David P.; Harcourt, Robert G.

    2016-01-01

    Constructing activity budgets for marine animals when they are at sea and cannot be directly observed is challenging, but recent advances in bio-logging technology offer solutions to this problem. Accelerometers can potentially identify a wide range of behaviours for animals based on unique patterns of acceleration. However, when analysing data derived from accelerometers, there are many statistical techniques available which, when applied to different data sets, produce different classification accuracies. We investigated a selection of supervised machine learning methods for interpreting behavioural data from captive otariids (fur seals and sea lions). We conducted controlled experiments with 12 seals, whose behaviours were filmed while they were wearing 3-axis accelerometers. From video we identified 26 behaviours that could be grouped into one of four categories (foraging, resting, travelling and grooming) representing key behaviour states for wild seals. We used data from 10 seals to train four predictive classification models: stochastic gradient boosting (GBM), random forests, a support vector machine using four different kernels, and a baseline model: penalised logistic regression. We then took the best parameters from each model and cross-validated the results on the two seals unseen so far. We also investigated the influence of feature statistics (describing some characteristic of the seal), testing the models both with and without these. Cross-validation accuracies were lower than training accuracy, but the SVM with a polynomial kernel was still able to classify seal behaviour with high accuracy (>70%). Adding feature statistics improved accuracies across all models tested. Most categories of behaviour (resting, grooming and feeding) were predicted with reasonable accuracy (52–81%) by the SVM, while travelling was poorly categorised (31–41%). These results show that model selection is important when classifying behaviour and that by using animal characteristics we can strengthen the overall accuracy. PMID:28002450

  12. Primary Sclerosing Cholangitis Risk Estimate Tool (PREsTo) Predicts Outcomes in PSC: A Derivation & Validation Study Using Machine Learning.

    PubMed

    Eaton, John E; Vesterhus, Mette; McCauley, Bryan M; Atkinson, Elizabeth J; Schlicht, Erik M; Juran, Brian D; Gossard, Andrea A; LaRusso, Nicholas F; Gores, Gregory J; Karlsen, Tom H; Lazaridis, Konstantinos N

    2018-05-09

    Improved methods are needed to risk stratify and predict outcomes in patients with primary sclerosing cholangitis (PSC). Therefore, we sought to derive and validate a new prediction model and compare its performance to existing surrogate markers. The model was derived using 509 subjects from a multicenter North American cohort and validated in an international multicenter cohort (n=278). Gradient boosting, a machine learning technique, was used to create the model. The endpoint was hepatic decompensation (ascites, variceal hemorrhage, or encephalopathy). Subjects with advanced PSC or cholangiocarcinoma at baseline were excluded. The PSC risk estimate tool (PREsTo) consists of 9 variables: bilirubin, albumin, serum alkaline phosphatase (SAP) times the upper limit of normal (ULN), platelets, AST, hemoglobin, sodium, patient age, and the number of years since PSC was diagnosed. Validation in an independent cohort confirms PREsTo accurately predicts decompensation (C statistic 0.90, 95% confidence interval (CI) 0.84-0.95) and performed well compared to the MELD score (C statistic 0.72, 95% CI 0.57-0.84), the Mayo PSC risk score (C statistic 0.85, 95% CI 0.77-0.92), and SAP < 1.5x ULN (C statistic 0.65, 95% CI 0.55-0.73). PREsTo remained accurate among individuals with a bilirubin < 2.0 mg/dL (C statistic 0.90, 95% CI 0.82-0.96) and when the score was re-applied later in the disease course (C statistic 0.82, 95% CI 0.64-0.95). PREsTo accurately predicts hepatic decompensation in PSC and exceeds the performance of other widely available, noninvasive prognostic scoring systems. This article is protected by copyright. All rights reserved. © 2018 by the American Association for the Study of Liver Diseases.

  13. Implementing Machine Learning in Radiology Practice and Research.

    PubMed

    Kohli, Marc; Prevedello, Luciano M; Filice, Ross W; Geis, J Raymond

    2017-04-01

    The purposes of this article are to describe concepts that radiologists should understand to evaluate machine learning projects, including common algorithms, supervised as opposed to unsupervised techniques, statistical pitfalls, and data considerations for training and evaluation, and to briefly describe ethical dilemmas and legal risk. Machine learning includes a broad class of computer programs that improve with experience. The complexity of creating, training, and monitoring machine learning indicates that the success of the algorithms will require radiologist involvement for years to come, leading to engagement rather than replacement.

  14. Pricing and Availability Intervention in Vending Machines at Four Bus Garages

    PubMed Central

    Hannan, Peter J; Harnack, Lisa J; Mitchell, Nathan R; Toomey, Traci L; Gerlach, Anne

    2009-01-01

    Objective To evaluate the effects of lowering prices and increasing availability on sales of healthy foods and beverages from 33 vending machines in four bus garages as part of a multi-component worksite obesity prevention intervention. Methods Availability of healthy items was increased to 50% and prices were lowered at least 10% in the vending machines in two metropolitan bus garages for an 18-month period. Two control garages offered vending choices at usual availability and prices. Sales data were collected monthly from each of the vending machines at the four garages. Results Increases in availability to 50% and price reductions of an average of 31% resulted in 10-42% higher sales of the healthy items. Employees were most price-responsive for snack purchases. Conclusions Greater availability and lower prices on targeted food and beverage items from vending machines was associated with greater purchases of these items over an eighteen-month period. Efforts to promote healthful food purchases in worksite settings should incorporate these two strategies. PMID:20061884

  15. Pricing and availability intervention in vending machines at four bus garages.

    PubMed

    French, Simone A; Hannan, Peter J; Harnack, Lisa J; Mitchell, Nathan R; Toomey, Traci L; Gerlach, Anne

    2010-01-01

    To evaluate the effects of lowering prices and increasing availability on sales of healthy foods and beverages from 33 vending machines in 4 bus garages as part of a multicomponent worksite obesity prevention intervention. Availability of healthy items was increased to 50% and prices were lowered at least 10% in the vending machines in two metropolitan bus garages for an 18-month period. Two control garages offered vending choices at usual availability and prices. Sales data were collected monthly from each of the vending machines at the four garages. Increases in availability to 50% and price reductions of an average of 31% resulted in 10% to 42% higher sales of the healthy items. Employees were most price-responsive for snack purchases. Greater availability and lower prices on targeted food and beverage items from vending machines were associated with greater purchases of these items over an 18-month period. Efforts to promote healthful food purchases in worksite settings should incorporate these two strategies.

  16. Machine Learning in Medicine.

    PubMed

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning their attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly adequate medical data sets and learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and to discuss how these might be overcome. © 2015 American Heart Association, Inc.

  17. Statistical downscaling of GCM simulations to streamflow using relevance vector machine

    NASA Astrophysics Data System (ADS)

    Ghosh, Subimal; Mujumdar, P. P.

    2008-01-01

    General circulation models (GCMs), the climate models often used in assessing the impact of climate change, operate on a coarse spatial scale, so their simulation results are of limited direct use for hydrology at the comparatively smaller river-basin scale. The article presents a methodology of statistical downscaling based on sparse Bayesian learning and the Relevance Vector Machine (RVM) to model streamflow at the river-basin scale for the monsoon period (June, July, August, September) using GCM-simulated climatic variables. NCEP/NCAR reanalysis data have been used for training the model to establish a statistical relationship between streamflow and climatic variables. The relationship thus obtained is used to project future streamflow from GCM simulations. The statistical methodology involves principal component analysis, fuzzy clustering and RVM. Different kernel functions are used for comparison purposes. The model is applied to the Mahanadi river basin in India. The results obtained using RVM are compared with those of the state-of-the-art Support Vector Machine (SVM) to present the advantages of RVMs over SVMs. A decreasing trend is observed for the monsoon streamflow of the Mahanadi, due to high projected surface warming under the CCSR/NIES GCM and the B2 scenario.
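
    scikit-learn ships no relevance vector machine, so the sketch below approximates the pipeline described above (dimension reduction of gridded climate predictors followed by sparse Bayesian regression of streamflow) with PCA plus ARDRegression as a related sparse Bayesian stand-in. All shapes and values are synthetic assumptions, and the fuzzy-clustering step is omitted.

        # Sketch of a downscaling pipeline: PCA on gridded climate predictors,
        # then a sparse Bayesian linear regression of streamflow on the
        # leading components (ARDRegression stands in for an RVM).
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import ARDRegression
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(1)
        X_climate = rng.normal(size=(240, 50))   # e.g. 240 monsoon months x 50 grid-point variables
        streamflow = X_climate[:, :3].sum(axis=1) + rng.normal(scale=0.3, size=240)

        model = make_pipeline(PCA(n_components=10), ARDRegression())
        model.fit(X_climate, streamflow)
        print("R^2 on training data:", model.score(X_climate, streamflow))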

  18. Methods, systems and apparatus for controlling operation of two alternating current (AC) machines

    DOEpatents

    Gallegos-Lopez, Gabriel [Torrance, CA; Nagashima, James M [Cerritos, CA; Perisic, Milun [Torrance, CA; Hiti, Silva [Redondo Beach, CA

    2012-06-05

    A system is provided for controlling two alternating current (AC) machines via a five-phase PWM inverter module. The system comprises a first control loop, a second control loop, and a current command adjustment module. The current command adjustment module operates in conjunction with the first control loop and the second control loop to continuously adjust current command signals that control the first AC machine and the second AC machine such that they share the input voltage available to them without compromising the target mechanical output power of either machine. In this way, even when the phase voltage available to either machine decreases, that machine still outputs its target mechanical output power.

  19. Solution of a tridiagonal system of equations on the finite element machine

    NASA Technical Reports Server (NTRS)

    Bostic, S. W.

    1984-01-01

    Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.
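
    For reference, the serial direct method that parallel tridiagonal solvers are usually measured against is the Thomas algorithm; a minimal sketch follows, with the function name and the 4x4 example chosen for illustration.

        # Serial direct solver for a tridiagonal system (Thomas algorithm).
        # a: subdiagonal (a[0] unused), b: main diagonal, c: superdiagonal
        # (c[-1] unused), d: right-hand side.
        def thomas(a, b, c, d):
            n = len(d)
            cp, dp = [0.0] * n, [0.0] * n
            cp[0] = c[0] / b[0]
            dp[0] = d[0] / b[0]
            for i in range(1, n):
                m = b[i] - a[i] * cp[i - 1]        # forward elimination
                cp[i] = c[i] / m if i < n - 1 else 0.0
                dp[i] = (d[i] - a[i] * dp[i - 1]) / m
            x = dp[:]
            for i in range(n - 2, -1, -1):         # back substitution
                x[i] -= cp[i] * x[i + 1]
            return x

        # 4x4 diagonally dominant example
        print(thomas([0, -1, -1, -1], [4, 4, 4, 4], [-1, -1, -1, 0], [5, 5, 5, 5]))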

  20. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    PubMed

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
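
    A minimal sketch of the decision-tree-induction side of the comparison, using scikit-learn on synthetic stand-ins for curriculum scores and a pass/fail CDL label (the study's actual 37-driver data set is not public, so all values here are assumptions).

        # Decision tree induction on toy curriculum scores, with 5-fold
        # cross-validated accuracy as the yardstick.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(2)
        scores = rng.uniform(50, 100, size=(37, 4))      # four hypothetical curriculum scores
        passed = (scores.mean(axis=1) + rng.normal(scale=5, size=37) > 75).astype(int)

        tree = DecisionTreeClassifier(max_depth=3, random_state=0)
        print("CV accuracy:", cross_val_score(tree, scores, passed, cv=5).mean())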

  1. Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

    NASA Astrophysics Data System (ADS)

    Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

    2010-10-01

    This study presents a new method for detecting cutting tool wear based on measured cutting force signals. A statistics-based method, the Integrated Kurtosis-based Algorithm for Z-Filter technique (I-kaz), was used to develop a regression model and a 3D graphic presentation of the I-kaz 3D coefficient during the machining process. The machining tests were carried out on a Colchester Master Tornado T4 CNC turning machine under dry cutting conditions. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from the machining operation were analyzed, and each has its own I-kaz 3D coefficient. This coefficient was examined and its relationship with the flank wear land (VB) was determined. A regression model was developed from this relationship, and the results of the regression model show that the I-kaz 3D coefficient value decreases as tool wear increases. The result can then be used for real-time tool wear monitoring.

  2. Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

    NASA Astrophysics Data System (ADS)

    Huang, Haiping

    2017-05-01

    Revealing hidden features in unlabeled data is called unsupervised feature learning, which plays an important role in pretraining a deep neural network. Here we provide a statistical mechanics analysis of unsupervised learning in a restricted Boltzmann machine with binary synapses. A message passing equation to infer the hidden feature is derived, and furthermore, variants of this equation are analyzed. A statistical analysis by replica theory describes the thermodynamic properties of the model. Our analysis confirms an entropy crisis preceding the non-convergence of the message passing equation, suggesting a discontinuous phase transition as a key characteristic of the restricted Boltzmann machine. A continuous phase transition is also confirmed, depending on the strength of the feature embedded in the data. The mean-field result under the replica symmetric assumption agrees with that obtained by running message passing algorithms on single instances of finite size. Interestingly, in an approximate Hopfield model, the entropy crisis is absent, and a continuous phase transition is observed instead. We also develop an iterative equation to infer the hyperparameter (temperature) hidden in the data, which in physics corresponds to iteratively imposing the Nishimori condition. Our study provides insights towards understanding the thermodynamic properties of restricted Boltzmann machine learning and, moreover, an important theoretical basis for building simplified deep networks.
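
    To make the model concrete, the sketch below trains a standard real-valued restricted Boltzmann machine with one step of contrastive divergence (CD-1) in plain numpy. The paper analyzes a binary-synapse variant via message passing, which is not reproduced here; sizes and data are illustrative.

        # Minimal RBM (no biases) trained with CD-1 on toy binary data.
        import numpy as np

        rng = np.random.default_rng(3)
        n_vis, n_hid, lr = 20, 8, 0.1
        W = rng.normal(scale=0.1, size=(n_vis, n_hid))
        data = (rng.random((100, n_vis)) < 0.5).astype(float)

        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        for epoch in range(50):
            v0 = data
            ph0 = sigmoid(v0 @ W)                        # hidden activation probabilities
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            v1 = sigmoid(h0 @ W.T)                       # one-step reconstruction
            ph1 = sigmoid(v1 @ W)
            W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(data)   # CD-1 gradient estimate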

  3. Machine learning for neuroimaging with scikit-learn.

    PubMed

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  4. Machine learning for neuroimaging with scikit-learn

    PubMed Central

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain. PMID:24600388
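
    A toy version of the supervised decoding setting described above, using only scikit-learn: relate a (synthetic) scans-by-voxels matrix to a two-condition label under cross-validation. The array shapes and the injected signal are assumptions, not real fMRI data.

        # Decoding sketch: linear SVM classifying two conditions from
        # high-dimensional "voxel" features, scored by cross-validation.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(4)
        voxels = rng.normal(size=(80, 1000))             # 80 scans x 1000 voxels
        condition = np.repeat([0, 1], 40)                # two experimental conditions
        voxels[condition == 1, :10] += 0.8               # weak signal in 10 voxels

        clf = LinearSVC(C=1.0, dual=False)
        print("decoding accuracy:", cross_val_score(clf, voxels, condition, cv=5).mean())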

  5. Phenotyping: Using Machine Learning for Improved Pairwise Genotype Classification Based on Root Traits

    PubMed Central

    Zhao, Jiangsan; Bodner, Gernot; Rewald, Boris

    2016-01-01

    Phenotyping local crop cultivars is becoming more and more important, as they are an important genetic source for breeding – especially in regard to inherent root system architectures. Machine learning algorithms are promising tools to assist in the analysis of complex data sets; novel approaches are needed to apply them to the root phenotyping data of mature plants. A greenhouse experiment was conducted in large, sand-filled columns to differentiate 16 European Pisum sativum cultivars based on 36 manually derived root traits. By combining random forest and support vector machine models, machine learning algorithms were successfully used for unbiased identification of the most distinguishing root traits and subsequent pairwise cultivar differentiation. Up to 86% of pea cultivar pairs could be distinguished based on the top five most important root traits (Timp5); Timp5 differed widely between cultivar pairs. Selecting the top important root traits (Timp) provided a significantly improved classification compared to using all available traits or randomly selected trait sets. The most frequent Timp of mature pea cultivars was the total surface area of lateral roots originating from tap root segments at 0–5 cm depth. The high classification rate implies that culturing did not lead to a major loss of variability in root system architecture in the studied pea cultivars. Our results illustrate the potential of machine learning approaches for unbiased (root) trait selection and cultivar classification based on rather small, complex phenotypic data sets derived from pot experiments. Powerful statistical approaches are essential to make use of the increasing amount of (root) phenotyping information, integrating the complex trait sets describing crop cultivars. PMID:27999587

  6. Resistance gene identification from Larimichthys crocea with machine learning techniques

    NASA Astrophysics Data System (ADS)

    Cai, Yinyin; Liao, Zhijun; Ju, Ying; Liu, Juan; Mao, Yong; Liu, Xiangrong

    2016-12-01

    Research on resistance genes (R-genes) plays a vital role in bioinformatics, as these genes enable organisms to cope with adverse changes in the external environment by forming the corresponding resistance proteins through transcription and translation. Identifying and predicting the R-genes of Larimichthys crocea (L. crocea) is therefore meaningful, with benefits for breeding and for the marine environment as well. Many of L. crocea's immune mechanisms have been explored by biological methods, yet much about them remains unclear. In order to move beyond this limited understanding of L. crocea's immune mechanisms and to detect new R-genes and R-gene-like genes, this paper proposes a combined prediction method that extracts features from the available genomic data and classifies them by machine learning. The effectiveness of the feature extraction and classification methods in identifying potential novel R-genes was evaluated, and different statistical analyses were used to explore the reliability of the prediction method, which can help us further understand the immune mechanisms of L. crocea against pathogens. A webserver called LCRG-Pred is available at http://server.malab.cn/rg_lc/.

  7. Recent advances in environmental data mining

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2016-04-01

    Due to the large amount and complexity of data available nowadays in geo- and environmental sciences, we face the need to develop and incorporate more robust and efficient methods for their analysis, modelling and visualization. An important part of these developments deals with the elaboration and application of a contemporary and coherent methodology following the process from data collection to the justification and communication of the results. Recent fundamental progress in machine learning (ML) can contribute considerably to the development of the emerging field of environmental data science. The present research highlights and investigates the different issues that can occur when dealing with environmental data mining using cutting-edge machine learning algorithms. In particular, the main attention is paid to the description of the self-consistent methodology and two efficient algorithms - Random Forest (RF; Breiman, 2001) and Extreme Learning Machines (ELM; Huang et al., 2006) - which have recently gained great popularity. Despite the fact that they are based on two different concepts, i.e. decision trees vs artificial neural networks, both produce promising results for complex, high-dimensional and non-linear data modelling. In addition, the study discusses several important issues of data-driven modelling, including feature selection and uncertainties. The approach considered is accompanied by simulated and real data case studies from renewable resources assessment and natural hazards tasks. In conclusion, the current challenges and future developments in statistical environmental data learning are discussed.
    References:
    - Breiman, L., 2001. Random Forests. Machine Learning 45 (1), 5-32.
    - Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1-3), 489-501.
    - Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data. EPFL Press, Lausanne, Switzerland, p. 392.
    - Leuenberger, M., Kanevski, M., 2015. Extreme Learning Machines for spatial environmental data. Computers and Geosciences 85, 64-73.
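
    Because an extreme learning machine is simple enough to state in a few lines, here is a sketch on synthetic data: a single hidden layer with fixed random weights and a least-squares readout, following the general recipe of Huang et al. (2006). The sizes and target function are illustrative assumptions.

        # Extreme Learning Machine sketch: random hidden features, fitted
        # readout via ordinary least squares.
        import numpy as np

        rng = np.random.default_rng(5)
        X = rng.uniform(-1, 1, size=(200, 3))            # e.g. spatial/environmental inputs
        y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.05, size=200)

        n_hidden = 50
        W = rng.normal(size=(3, n_hidden))               # random input weights (never trained)
        b = rng.normal(size=n_hidden)
        H = np.tanh(X @ W + b)                           # random hidden-layer features
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # only the readout is fitted

        print("training RMSE:", np.sqrt(np.mean((H @ beta - y) ** 2)))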

  8. 48 CFR 6104.402 - Filing claims [Rule 402].

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... number, facsimile machine number, and e-mail address, if available, of the claimant; (ii) The name, address, telephone number, facsimile machine number, and e-mail address, if available, of the agency...'s telephone number is: (202) 606-8800. The Clerk's facsimile machine number is: (202) 606-0019. The...

  9. 48 CFR 6104.402 - Filing claims [Rule 402].

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... number, facsimile machine number, and e-mail address, if available, of the claimant; (ii) The name, address, telephone number, facsimile machine number, and e-mail address, if available, of the agency...'s telephone number is: (202) 606-8800. The Clerk's facsimile machine number is: (202) 606-0019. The...

  10. 48 CFR 6104.402 - Filing claims [Rule 402].

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... number, facsimile machine number, and e-mail address, if available, of the claimant; (ii) The name, address, telephone number, facsimile machine number, and e-mail address, if available, of the agency...'s telephone number is: (202) 606-8800. The Clerk's facsimile machine number is: (202) 606-0019. The...

  11. Trends of Occupational Fatalities Involving Machines, United States, 1992–2010

    PubMed Central

    Marsh, Suzanne M.; Fosbroke, David E.

    2016-01-01

    Background This paper describes trends in occupational machine-related fatalities from 1992 through 2010. We examine temporal patterns by worker demographics, machine types (e.g., stationary, mobile), and industries. Methods We analyzed fatalities from Census of Fatal Occupational Injuries data provided by the Bureau of Labor Statistics to the National Institute for Occupational Safety and Health. We used injury source to identify machine-related incidents and Poisson regression to assess trends over the 19-year period. Results There was an average annual decrease of 2.8% in overall machine-related fatality rates from 1992 through 2010. Mobile machine-related fatality rates decreased an average of 2.6% annually and stationary machine-related rates decreased an average of 3.5% annually. Groups that continued to be at high risk included older workers; the self-employed; and workers in agriculture/forestry/fishing, construction, and mining. Conclusion Addressing dangers posed by tractors, excavators, and other mobile machines needs to continue. High-risk worker groups should receive targeted information on machine safety. PMID:26358658

  12. STATISTICAL EVALUATION OF CONFOCAL MICROSCOPY IMAGES

    EPA Science Inventory

    Abstract

    In this study the CV is defined as the SD/Mean of the population of beads or pixels. Flow cytometry uses the CV of beads to determine whether the machine is aligned correctly and performing properly. This CV concept for determining machine performance has been adapted to...

  13. A MOOC on Approaches to Machine Translation

    ERIC Educational Resources Information Center

    Costa-jussà, Marta R.; Formiga, Lluís; Torrillas, Oriol; Petit, Jordi; Fonollosa, José A. R.

    2015-01-01

    This paper describes the design, development, and analysis of a MOOC entitled "Approaches to Machine Translation: Rule-based, statistical and hybrid", and provides lessons learned and conclusions to be taken into account in the future. The course was developed within the Canvas platform, used by recognized European universities. It…

  14. Signal detection using support vector machines in the presence of ultrasonic speckle

    NASA Astrophysics Data System (ADS)

    Kotropoulos, Constantine L.; Pitas, Ioannis

    2002-04-01

    Support vector machines are a general algorithm based on the guaranteed risk bounds of statistical learning theory. They have found numerous applications, such as classification of brain PET images, optical character recognition, object detection, face verification, text categorization and so on. In this paper we propose the use of support vector machines to segment lesions in ultrasound images and we thoroughly assess their lesion detection ability. We demonstrate that trained support vector machines with a radial basis function kernel satisfactorily segment (unseen) ultrasound B-mode images as well as clinical ultrasonic images.

  15. Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach.

    PubMed

    Taylor, R Andrew; Pare, Joseph R; Venkatesh, Arjun K; Mowafi, Hani; Melnick, Edward R; Fleischman, William; Hall, M Kennedy

    2016-03-01

    Predictive analytics in emergency care has mostly been limited to the use of clinical decision rules (CDRs) in the form of simple heuristics and scoring systems. In the development of CDRs, limitations in analytic methods and concerns with usability have generally constrained models to a preselected small set of variables judged to be clinically relevant and to rules that are easily calculated. Furthermore, CDRs frequently suffer from questions of generalizability, take years to develop, and lack the ability to be updated as new information becomes available. Newer analytic and machine learning techniques capable of harnessing the large number of variables that are already available through electronic health records (EHRs) may better predict patient outcomes and facilitate automation and deployment within clinical decision support systems. In this proof-of-concept study, a local, big data-driven, machine learning approach is compared to existing CDRs and traditional analytic methods using the prediction of sepsis in-hospital mortality as the use case. This was a retrospective study of adult ED visits admitted to the hospital meeting criteria for sepsis from October 2013 to October 2014. Sepsis was defined as meeting criteria for systemic inflammatory response syndrome with an infectious admitting diagnosis in the ED. ED visits were randomly partitioned into an 80%/20% split for training and validation. A random forest model (machine learning approach) was constructed using over 500 clinical variables from data available within the EHRs of four hospitals to predict in-hospital mortality. The machine learning prediction model was then compared to a classification and regression tree (CART) model, logistic regression model, and previously developed prediction tools on the validation data set using area under the receiver operating characteristic curve (AUC) and chi-square statistics. There were 5,278 visits among 4,676 unique patients who met criteria for sepsis. Of the 4,222 patients in the training group, 210 (5.0%) died during hospitalization, and of the 1,056 patients in the validation group, 50 (4.7%) died during hospitalization. The AUCs with 95% confidence intervals (CIs) for the different models were as follows: random forest model, 0.86 (95% CI = 0.82 to 0.90); CART model, 0.69 (95% CI = 0.62 to 0.77); logistic regression model, 0.76 (95% CI = 0.69 to 0.82); CURB-65, 0.73 (95% CI = 0.67 to 0.80); MEDS, 0.71 (95% CI = 0.63 to 0.77); and mREMS, 0.72 (95% CI = 0.65 to 0.79). The random forest model AUC was statistically different from all other models (p ≤ 0.003 for all comparisons). In this proof-of-concept study, a local big data-driven, machine learning approach outperformed existing CDRs as well as traditional analytic techniques for predicting in-hospital mortality of ED patients with sepsis. Future research should prospectively evaluate the effectiveness of this approach and whether it translates into improved clinical outcomes for high-risk sepsis patients. The methods developed serve as an example of a new model for predictive analytics in emergency care that can be automated, applied to other clinical outcomes of interest, and deployed in EHRs to enable locally relevant clinical predictions. © 2015 by the Society for Academic Emergency Medicine.
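
    A schematic version of the comparison above: a random forest trained on many EHR-style predictors with an 80%/20% split, scored by AUC. The predictors, outcome, and effect sizes are synthetic assumptions; the study's 500+ EHR variables are not public.

        # Random forest mortality-prediction sketch with held-out AUC.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(6)
        X = rng.normal(size=(5000, 100))                      # stand-in for EHR variables
        risk = X[:, 0] + 0.5 * X[:, 1] * X[:, 2]              # nonlinear toy risk signal
        y = (risk + rng.normal(size=5000) > 2.5).astype(int)  # rare outcome (~5%)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))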

  16. Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data–Driven, Machine Learning Approach

    PubMed Central

    Taylor, R. Andrew; Pare, Joseph R.; Venkatesh, Arjun K.; Mowafi, Hani; Melnick, Edward R.; Fleischman, William; Hall, M. Kennedy

    2018-01-01

    Objectives Predictive analytics in emergency care has mostly been limited to the use of clinical decision rules (CDRs) in the form of simple heuristics and scoring systems. In the development of CDRs, limitations in analytic methods and concerns with usability have generally constrained models to a preselected small set of variables judged to be clinically relevant and to rules that are easily calculated. Furthermore, CDRs frequently suffer from questions of generalizability, take years to develop, and lack the ability to be updated as new information becomes available. Newer analytic and machine learning techniques capable of harnessing the large number of variables that are already available through electronic health records (EHRs) may better predict patient outcomes and facilitate automation and deployment within clinical decision support systems. In this proof-of-concept study, a local, big data–driven, machine learning approach is compared to existing CDRs and traditional analytic methods using the prediction of sepsis in-hospital mortality as the use case. Methods This was a retrospective study of adult ED visits admitted to the hospital meeting criteria for sepsis from October 2013 to October 2014. Sepsis was defined as meeting criteria for systemic inflammatory response syndrome with an infectious admitting diagnosis in the ED. ED visits were randomly partitioned into an 80%/20% split for training and validation. A random forest model (machine learning approach) was constructed using over 500 clinical variables from data available within the EHRs of four hospitals to predict in-hospital mortality. The machine learning prediction model was then compared to a classification and regression tree (CART) model, logistic regression model, and previously developed prediction tools on the validation data set using area under the receiver operating characteristic curve (AUC) and chi-square statistics. Results There were 5,278 visits among 4,676 unique patients who met criteria for sepsis. Of the 4,222 patients in the training group, 210 (5.0%) died during hospitalization, and of the 1,056 patients in the validation group, 50 (4.7%) died during hospitalization. The AUCs with 95% confidence intervals (CIs) for the different models were as follows: random forest model, 0.86 (95% CI = 0.82 to 0.90); CART model, 0.69 (95% CI = 0.62 to 0.77); logistic regression model, 0.76 (95% CI = 0.69 to 0.82); CURB-65, 0.73 (95% CI = 0.67 to 0.80); MEDS, 0.71 (95% CI = 0.63 to 0.77); and mREMS, 0.72 (95% CI = 0.65 to 0.79). The random forest model AUC was statistically different from all other models (p ≤ 0.003 for all comparisons). Conclusions In this proof-of-concept study, a local big data–driven, machine learning approach outperformed existing CDRs as well as traditional analytic techniques for predicting in-hospital mortality of ED patients with sepsis. Future research should prospectively evaluate the effectiveness of this approach and whether it translates into improved clinical outcomes for high-risk sepsis patients. The methods developed serve as an example of a new model for predictive analytics in emergency care that can be automated, applied to other clinical outcomes of interest, and deployed in EHRs to enable locally relevant clinical predictions. PMID:26679719

  17. Risk estimation using probability machines

    PubMed Central

    2014-01-01

    Background Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. Results We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data-generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. Conclusions The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share the properties of the statistical machine from which it is derived. PMID:24581306
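
    A sketch of the probability machine idea on synthetic data: use a random forest to estimate conditional outcome probabilities, then read off an effect size for one predictor as a counterfactual contrast (set the predictor to 1 and to 0 for everyone, and average the change in predicted probability). The variable names and the logistic data-generating model are assumptions.

        # Random forest probability machine with a counterfactual risk
        # difference for a binary exposure.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(7)
        n = 2000
        exposure = rng.integers(0, 2, size=n).astype(float)
        covariate = rng.normal(size=n)
        p = 1 / (1 + np.exp(-(0.8 * exposure + 0.5 * covariate)))  # logistic truth
        y = (rng.random(n) < p).astype(int)
        X = np.column_stack([exposure, covariate])

        rf = RandomForestClassifier(n_estimators=300, min_samples_leaf=20,
                                    random_state=0).fit(X, y)

        X1, X0 = X.copy(), X.copy()
        X1[:, 0], X0[:, 0] = 1.0, 0.0                   # counterfactual contrast
        risk_diff = (rf.predict_proba(X1)[:, 1] - rf.predict_proba(X0)[:, 1]).mean()
        print("estimated risk difference for exposure:", round(risk_diff, 3))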

  18. A comparison of machine learning and Bayesian modelling for molecular serotyping.

    PubMed

    Newton, Richard; Wernisch, Lorenz

    2017-08-11

    Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only a few samples available, a model-driven approach was the only option. In the meantime, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. We compare the performance of the original Bayesian model with two machine learning algorithms: gradient boosting machines and random forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes, owing to the large number of possible combinations. Most of the available training data comprise samples with only a single serotype. To overcome the lack of training data we implemented an iterative analysis, creating artificial training data for serotype mixtures by combining raw data from single-serotype arrays. With the enhanced training set, the machine learning algorithms outperform the original Bayesian model. However, for serotypes currently lacking sufficient training data the best-performing implementation was a combination of the results of the Bayesian model and the gradient boosting machine. As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological insights, which we illustrate with an example.

  19. A Constitutive Model for Creep Lifetime of PBO Braided Cord

    NASA Technical Reports Server (NTRS)

    Sterling, W. J.

    2007-01-01

    A constitutive model to describe the creep lifetime of PBO braided cord has been developed and fit to laboratory data. The model follows an approach proposed for p-aramid cord in similar applications, and has a Boltzmann-type representation that arises from consideration of the failure mechanism. The data were obtained using a hydraulic universal testing machine and were analyzed according to Weibull statistics using commercially available software. The application of concern to the author is NASA's Ultra-Long Duration Balloon and other gossamer spacecraft, but the motivations for the related p-aramid works suggest broader interest.
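
    The Weibull analysis mentioned above can be reproduced in outline with scipy: fit a two-parameter Weibull distribution (location fixed at zero) to a set of failure times. The lifetimes below are synthetic; the actual cord data are not public.

        # Two-parameter Weibull fit to toy creep-lifetime data.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(8)
        lifetimes = stats.weibull_min.rvs(c=2.5, scale=1000.0, size=60,
                                          random_state=rng)   # toy failure times (hours)

        shape, loc, scale = stats.weibull_min.fit(lifetimes, floc=0)
        print(f"Weibull shape={shape:.2f}, scale={scale:.1f}")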

  20. AstroML: "better, faster, cheaper" towards state-of-the-art data mining and machine learning

    NASA Astrophysics Data System (ADS)

    Ivezic, Zeljko; Connolly, Andrew J.; Vanderplas, Jacob

    2015-01-01

    We present AstroML, a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, matplotlib, and astropy, and distributed under an open license. AstroML contains a growing library of statistical and machine learning routines for analyzing astronomical data in Python, loaders for several open astronomical datasets (such as SDSS and other recent major surveys), and a large suite of examples of analyzing and visualizing astronomical datasets. AstroML is especially suitable for introducing undergraduate students to numerical research projects and for graduate students to rapidly undertake cutting-edge research. The long-term goal of astroML is to provide a community repository for fast Python implementations of common tools and routines used for statistical data analysis in astronomy and astrophysics (see http://www.astroml.org).

  1. 48 CFR 6104.402 - Filing claims [Rule 402].

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... number, and facsimile machine number, if available, of the claimant; (ii) The name, address, telephone number, and facsimile machine number, if available, of the agency employee who denied the claim; (iii) A... Clerk's facsimile machine number is: (202) 606-0019. The Board's working hours are 8:00 a.m. to 4:30 p.m...

  2. Understanding dental CAD/CAM for restorations - dental milling machines from a mechanical engineering viewpoint. Part A: chairside milling machines.

    PubMed

    Lebon, Nicolas; Tapie, Laurent; Duret, Francois; Attal, Jean-Pierre

    2016-01-01

    The dental milling machine is an important device in the dental CAD/CAM chain. Nowadays, dental numerically controlled (NC) milling machines are available for dental surgeries (chairside solutions). This article provides a mechanical engineering approach to NC milling machines to help dentists understand the role of this technology in digital dentistry practice. First, some technical concepts and definitions associated with NC milling machines are described from a mechanical engineering viewpoint. The technical and economic criteria of four chairside dental NC milling machines that are available on the market are then described. The technical criteria focus on the capacity of the embedded technologies of these milling machines to mill both the range of prosthetic materials and the types of restoration shapes. The economic criteria focus on investment costs and interoperability with third-party software. The clinical relevance of the technology is assessed in terms of the accuracy and integrity of the restoration.

  3. Three lessons for genetic toxicology from baseball analytics.

    PubMed

    Dertinger, Stephen D

    2017-07-01

    In many respects the evolution of baseball statistics mirrors advances made in the field of genetic toxicology. From its inception, baseball and statistics have been inextricably linked. Generations of players and fans have used a number of relatively simple measurements to describe team and individual player's current performance, as well as for historical record-keeping purposes. Over the years, baseball analytics has progressed in several important ways. Early advances were based on deriving more meaningful metrics from simpler forerunners. Now, technological innovations are delivering much deeper insights. Videography, radar, and other advances that include automatic player recognition capabilities provide the means to measure more complex and useful factors. Fielders' reaction times, efficiency of the route taken to reach a batted ball, and pitch-framing effectiveness come to mind. With the current availability of complex measurements from multiple data streams, multifactorial analyses occurring via machine learning algorithms have become necessary to make sense of the terabytes of data that are now being captured in every Major League Baseball game. Collectively, these advances have transformed baseball statistics from being largely descriptive in nature to serving data-driven, predictive roles. Whereas genetic toxicology has charted a somewhat parallel course, a case can be made that greater utilization of baseball's mindset and strategies would serve our scientific field well. This paper describes three useful lessons for genetic toxicology, courtesy of the field of baseball analytics: seek objective knowledge; incorporate multiple data streams; and embrace machine learning. Environ. Mol. Mutagen. 58:390-397, 2017. © 2017 Wiley Periodicals, Inc.

  4. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models

    PubMed Central

    Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael

    2017-01-01

    Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties–sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen–in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models–multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)–were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of redness, coloration and saturation were prominent predictors in digital soil mapping. Considering the increased availability of freely available Remote Sensing data (e.g. Landsat, SRTM, Sentinels), soil information at local and regional scales in data poor regions such as West Africa can be improved with relatively little financial and human resources. PMID:28114334

  5. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models.

    PubMed

    Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael

    2017-01-01

    Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties-sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen-in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models-multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)-were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of redness, coloration and saturation were prominent predictors in digital soil mapping. Considering the increased availability of freely available Remote Sensing data (e.g. Landsat, SRTM, Sentinels), soil information at local and regional scales in data poor regions such as West Africa can be improved with relatively little financial and human resources.
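
    A compact version of the model comparison above: multiple linear regression versus random forest regression under 5-fold cross-validation. Synthetic predictors stand in for the spectral and terrain covariates, and the target is a toy nonlinear signal rather than measured SOC.

        # MLR vs RFR on a nonlinear toy soil-property target.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(9)
        X = rng.normal(size=(300, 12))                       # spectral + terrain covariates
        y = X[:, 0] + np.sin(X[:, 1]) + 0.5 * X[:, 2] ** 2   # nonlinear "SOC" signal
        y += rng.normal(scale=0.3, size=300)

        for name, model in [("MLR", LinearRegression()),
                            ("RFR", RandomForestRegressor(n_estimators=200, random_state=0))]:
            r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
            print(f"{name}: mean CV R^2 = {r2:.2f}")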

  6. Research in image management and access

    NASA Technical Reports Server (NTRS)

    Vondran, Raymond F.; Barron, Billy J.

    1993-01-01

    Presently, the problem of overall library system design has been compounded by the accretion of both function and structure to a basic framework of requirements. While more device power has led to increased functionality, opportunities for reducing system complexity at the user interface level have not always been pursued with equal zeal. The purpose of this book is therefore to set forth and examine these opportunities, within the general framework of human factors research in man-machine interfaces. Human factors may be viewed as a series of trade-off decisions among four polarized objectives: machine resources versus user specifications, and functionality versus user requirements. In the past, a limiting factor was the availability of systems. However, in the last two years, over one hundred libraries supported by many different software configurations have been added to the Internet. This document includes a statistical analysis of human responses to five Internet library systems by key features, development of the ideal online catalog system, and ideal online catalog systems for libraries and information centers.

  7. Semi-supervised vibration-based classification and condition monitoring of compressors

    NASA Astrophysics Data System (ADS)

    Potočnik, Primož; Govekar, Edvard

    2017-09-01

    Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances are proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or are difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, very good classification performance can be obtained from NNs trained with Bayesian regularization, as well as from SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.

  8. Comprehensive machine learning analysis of Hydra behavior reveals a stable basal behavioral repertoire.

    PubMed

    Han, Shuting; Taralova, Ekaterina; Dupre, Christophe; Yuste, Rafael

    2018-03-28

    Animal behavior has been studied for centuries, but few efficient methods are available to automatically identify and classify it. Quantitative behavioral studies have been hindered by the subjective and imprecise nature of human observation, and the slow speed of annotating behavioral data. Here, we developed an automatic behavior analysis pipeline for the cnidarian Hydra vulgaris using machine learning. We imaged freely behaving Hydra, extracted motion and shape features from the videos, and constructed a dictionary of visual features to classify pre-defined behaviors. We also identified unannotated behaviors with unsupervised methods. Using this analysis pipeline, we quantified 6 basic behaviors and found surprisingly similar behavior statistics across animals within the same species, regardless of experimental conditions. Our analysis indicates that the fundamental behavioral repertoire of Hydra is stable. This robustness could reflect a homeostatic neural control of "housekeeping" behaviors which could have been already present in the earliest nervous systems. © 2018, Han et al.

  9. Parallel and Scalable Clustering and Classification for Big Data in Geosciences

    NASA Astrophysics Data System (ADS)

    Riedel, M.

    2015-12-01

    Machine learning, data mining, and statistical computing are common techniques for performing analyses in the earth sciences. This contribution will focus on two concrete and widely used data analytics methods suitable for analysing 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which enables the identification of outliers or interesting anomalies. A new open source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with the focus set on the support vector machine (SVM) algorithm, one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
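
    A minimal DBSCAN illustration of the anomaly-detection use sketched above: dense clusters receive labels, while sparse points come back as noise (label -1). The two Gaussian blobs and scattered outliers are synthetic, not the Koljoefjord measurements.

        # DBSCAN on toy 2-D data: clusters are labelled 0, 1, ...;
        # outliers are labelled -1.
        import numpy as np
        from sklearn.cluster import DBSCAN

        rng = np.random.default_rng(10)
        cluster_a = rng.normal(loc=0.0, scale=0.2, size=(100, 2))
        cluster_b = rng.normal(loc=3.0, scale=0.2, size=(100, 2))
        outliers = rng.uniform(-2, 5, size=(10, 2))
        X = np.vstack([cluster_a, cluster_b, outliers])

        labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
        print("points flagged as noise:", int((labels == -1).sum()))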

  10. A Hybrid dasymetric and machine learning approach to high-resolution residential electricity consumption modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morton, April M; Nagle, Nicholas N; Piburn, Jesse O

    As urban areas continue to grow and evolve in a world of increasing environmental awareness, the need for detailed information regarding residential energy consumption patterns has become increasingly important. Though current modeling efforts mark significant progress in the effort to better understand the spatial distribution of energy consumption, the majority of techniques are highly dependent on region-specific data sources and often require building- or dwelling-level details that are not publicly available for many regions in the United States. Furthermore, many existing methods do not account for errors in input data sources and may not accurately reflect inherent uncertainties in model outputs. We propose an alternative and more general hybrid approach to high-resolution residential electricity consumption modeling by merging a dasymetric model with a complementary machine learning algorithm. The method's flexible data requirements and statistical framework ensure both that the model is applicable to a wide range of regions and that it accounts for errors in input data sources.

  11. Advanced Telecommunications Technologies in Rural Communities: Factors Affecting Use.

    ERIC Educational Resources Information Center

    Leistritz, F. Larry; Allen, John C.; Johnson, Bruce B.; Olsen, Duane; Sell, Randy

    1997-01-01

    A survey of 2,000 rural residents in 6 states (36% response) found that 56% used answering machines, 48% fax machines, 46% personal computers, 27% cell phones, and 25% modems. Higher use was associated with higher income and education. Distance from the nearest metropolitan statistical area increased use. A large majority believed…

  12. OFFICE MACHINES USED IN BUSINESS TODAY.

    ERIC Educational Resources Information Center

    COOK, FRED S.; MALICHE, ELEANOR

    INTERVIEWS OF 239 BUSINESSES OF THE BAY CITY STANDARD METROPOLITAN STATISTICAL AREA OF MICHIGAN PROVIDED INFORMATION ON (1) THE TYPE AND NUMBER OF MACHINES USED IN BUSINESS, (2) THE TRAINING DEMANDED BY EMPLOYERS FOR PERSONNEL USING THIS OFFICE EQUIPMENT, (3) THE EXTENT OF ON-THE-JOB TRAINING GIVEN BY EMPLOYERS, (4) THE IMPLICATIONS FOR VOCATIONAL…

  13. Anomaly detection for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, Ben; Rau, Markus Michael; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen

    2015-10-01

    We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantity. We select 2.5 million `clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 `anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed `anomaly-removed' sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.
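
    A sketch of the anomaly-removal step described above: fit an elliptic envelope to the training features and drop the points it flags before fitting any redshift model. The features, contamination rate, and outlier placement are illustrative assumptions.

        # Elliptic-envelope preprocessing: keep only inliers for training.
        import numpy as np
        from sklearn.covariance import EllipticEnvelope

        rng = np.random.default_rng(11)
        clean = rng.normal(size=(1000, 5))              # e.g. photometric colours
        bad = rng.normal(loc=4.0, size=(30, 5))         # mislabelled / badly measured
        X = np.vstack([clean, bad])

        flags = EllipticEnvelope(contamination=0.03, random_state=0).fit_predict(X)
        X_train = X[flags == 1]                         # inliers only
        print("removed", int((flags == -1).sum()), "anomalous examples")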

  14. Choice Behavior of Nonpathological Women Playing Concurrently Available Slot Machines: Effect of Changes in Payback Percentages

    ERIC Educational Resources Information Center

    Weatherly, Jeffrey N.; Thompson, Bradley J.; Hodny, Marisa; Meier, Ellen

    2009-01-01

    In a simulated casino environment, 6 nonpathological women played concurrently available commercial slot machines programmed to pay out at different rates. Participants did not always demonstrate preferences for the higher paying machine. The data suggest that factors other than programmed or obtained rate of reinforcement may control gambling…

  15. 49 CFR 214.533 - Schedule of repairs subject to availability of parts.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Maintenance Machines and Hi-Rail Vehicles § 214.533 Schedule of repairs subject to availability of parts. (a... maintenance machine or a hi-rail vehicle by the end of the next business day following the report of the... maintenance machine or hi-rail vehicle within seven calendar days after receiving the necessary part. The...

  16. Applying Sparse Machine Learning Methods to Twitter: Analysis of the 2012 Change in Pap Smear Guidelines. A Sequential Mixed-Methods Study.

    PubMed

    Lyles, Courtney Rees; Godbehere, Andrew; Le, Gem; El Ghaoui, Laurent; Sarkar, Urmimala

    2016-06-10

    It is difficult to synthesize the vast amount of textual data available from social media websites. Capturing real-world discussions via social media could provide insights into individuals' opinions and the decision-making process. We conducted a sequential mixed-methods study to determine the utility of sparse machine learning techniques in summarizing Twitter dialogues. We chose a narrowly defined topic for this approach: cervical cancer discussions over a 6-month time period surrounding a change in Pap smear screening guidelines. We applied statistical methodologies, known as sparse machine learning algorithms, to summarize Twitter messages about cervical cancer before and after the 2012 change in Pap smear screening guidelines by the US Preventive Services Task Force (USPSTF). All messages containing the search terms "cervical cancer," "Pap smear," and "Pap test" were analyzed during two periods: (1) January 1-March 13, 2012, and (2) March 14-June 30, 2012. Topic modeling was used to discern the most common topics from each time period and to determine the singular value criterion for each topic. The results from the top 10 relevant topics were then qualitatively coded to determine how efficiently the clustering method grouped distinct ideas, and how the discussion differed before vs. after the change in guidelines. This machine learning method was effective in grouping the relevant discussion topics about cervical cancer during the respective time periods (~20% overall irrelevant content in both time periods). Qualitative analysis determined that a significant portion of the top discussion topics in the second time period directly reflected the USPSTF guideline change (eg, "New Screening Guidelines for Cervical Cancer"), and many topics in both time periods addressed basic screening promotion and education (eg, "It is Cervical Cancer Awareness Month! Click the link to see where you can receive a free or low cost Pap test."). We demonstrated that machine learning tools can be useful for analyzing cervical cancer prevention and screening discussions on Twitter. This method allowed us to show that significant information about cervical cancer screening is publicly available on social media sites. Moreover, we observed a direct impact of the guideline change within the Twitter messages.

  17. Cheminformatic models based on machine learning for pyruvate kinase inhibitors of Leishmania mexicana.

    PubMed

    Jamal, Salma; Scaria, Vinod

    2013-11-19

    Leishmaniasis is a neglected tropical disease which affects approximately 12 million individuals worldwide and is caused by the parasite Leishmania. The current drugs used in the treatment of leishmaniasis are highly toxic, and the widespread emergence of drug-resistant strains necessitates the development of new therapeutic options. The available high-throughput screen data have made it possible to generate computational predictive models which can assess the active scaffolds in a chemical library, followed by their ADME/toxicity properties, ahead of biological trials. In the present study, we have used publicly available, high-throughput screen datasets of chemical moieties which have been adjudged to target the pyruvate kinase enzyme of L. mexicana (LmPK). A machine learning approach was used to create computational models capable of predicting the biological activity of novel antileishmanial compounds. Further, we evaluated the molecules using a substructure-based approach to identify the common substructures contributing to their activity. We generated computational models based on machine learning methods and evaluated the performance of these models using various statistical figures of merit. The random forest based approach was determined to be the most sensitive and to have the best accuracy and ROC. We further added a substructure-based analysis of the molecules to identify potentially enriched substructures in the active dataset. We believe that the models developed in the present study would lead to a reduction in the cost and length of clinical studies, and hence newer drugs would appear faster in the market, providing better healthcare options to the patients.

  18. Availability of Vending Machines and School Stores in California Schools.

    PubMed

    Cisse-Egbuonye, Nafissatou; Liles, Sandy; Schmitz, Katharine E; Kassem, Nada; Irvin, Veronica L; Hovell, Melbourne F

    2016-01-01

    This study examined the availability of foods sold in vending machines and school stores in United States public and private schools, and associations of availability with students' food purchases and consumption. Descriptive analyses, chi-square tests, and Spearman product-moment correlations were conducted on data collected from 521 students aged 8 to 15 years recruited from orthodontic offices in California. Vending machines were more common in private schools than in public schools, whereas school stores were common in both private and public schools. The food items most commonly available in both vending machines and school stores in all schools were predominately foods of minimal nutritional value (FMNV). Participant report of availability of food items in vending machines and/or school stores was significantly correlated with (1) participant purchase of each item from those sources, except for energy drinks, milk, fruits, and vegetables; and (2) participants' friends' consumption of items at lunch, for 2 categories of FMNV (candy, cookies, or cake; soda or sports drinks). Despite the Child Nutrition and Women, Infants, and Children (WIC) Reauthorization Act of 2004, FMNV were still available in schools, and may be contributing to unhealthy dietary choices and ultimately to health risks. © 2015, American School Health Association.
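
    The two named tests are standard; a minimal illustration with hypothetical counts and ratings (not the study's data) might look like this:

    ```python
    # Sketch: chi-square test of independence and Spearman rank correlation.
    import numpy as np
    from scipy.stats import chi2_contingency, spearmanr

    # School type (rows: private, public) vs. vending machine present (yes, no)
    table = np.array([[45, 15],
                      [120, 90]])
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, p = {p:.4f}")

    # Reported availability of an item vs. reported purchase of that item
    availability = [2, 0, 1, 2, 0, 1, 2, 1]   # ordinal ratings per participant
    purchases    = [1, 0, 0, 2, 0, 1, 2, 1]
    rho, p2 = spearmanr(availability, purchases)
    print(f"Spearman rho = {rho:.2f}, p = {p2:.4f}")
    ```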

  19. Availability of vending machines and school stores in California schools

    PubMed Central

    Liles, Sandy; Schmitz, Katharine E.; Kassem, Nada O.F; Irvin, Veronica L; Hovell, Melbourne F.

    2015-01-01

    Background This study examined the availability of foods sold in vending machines and school stores in US public and private schools, and associations of availability with students' food purchases and consumption. Methods Descriptive analyses, chi-square tests, and Spearman product-moment correlations were conducted on data collected from 521 students aged 8 to 15 years recruited from orthodontic offices in California. Results Vending machines were more common in private schools than in public schools, while school stores were common in both private and public schools. The food items most commonly available in both vending machines and school stores in all schools were predominately foods of minimal nutritional value (FMNV). Participant report of availability of food items in vending machines and/or school stores was significantly correlated with: (1) participant purchase of each item from those sources, except for energy drinks, milk, fruits, and vegetables; and (2) participants' friends' consumption of items at lunch, for two categories of FMNV (candy, cookies, or cake; soda or sports drinks). Conclusions Despite the Child Nutrition and WIC Reauthorization Act of 2004, FMNV were still available in schools, and may be contributing to unhealthy dietary choices and ultimately to health risks. PMID:26645420

  20. Machine Learning Approaches for Predicting Radiation Therapy Outcomes: A Clinician's Perspective

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kang, John; Schwartz, Russell; Flickinger, John

    Radiation oncology has always been deeply rooted in modeling, from the early days of isoeffect curves to the contemporary Quantitative Analysis of Normal Tissue Effects in the Clinic (QUANTEC) initiative. In recent years, medical modeling for both prognostic and therapeutic purposes has exploded thanks to increasing availability of electronic data and genomics. One promising direction that medical modeling is moving toward is adopting the same machine learning methods used by companies such as Google and Facebook to combat disease. Broadly defined, machine learning is a branch of computer science that deals with making predictions from complex data through statistical models. These methods serve to uncover patterns in data and are actively used in areas such as speech recognition, handwriting recognition, face recognition, “spam” filtering (junk email), and targeted advertising. Although multiple radiation oncology research groups have shown the value of applied machine learning (ML), clinical adoption has been slow due to the high barrier to understanding these complex models by clinicians. Here, we present a review of the use of ML to predict radiation therapy outcomes from the clinician's point of view with the hope that it lowers the “barrier to entry” for those without formal training in ML. We begin by describing 7 principles that one should consider when evaluating (or creating) an ML model in radiation oncology. We next introduce 3 popular ML methods—logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN)—and critique 3 seminal papers in the context of these principles. Although current studies are in exploratory stages, the overall methodology has progressively matured, and the field is ready for larger-scale further investigation.

  1. Classification without labels: learning from mixed samples in high energy physics

    NASA Astrophysics Data System (ADS)

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    2017-10-01

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.
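
    A minimal sketch of the CWoLa setup on Gaussian toys (standing in for jet observables): the classifier sees only mixture labels during training, yet separates the pure classes at test time. The two mixtures only need different, unknown signal fractions:

    ```python
    # Sketch: classification without labels (CWoLa) on synthetic mixtures.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    def mixture(n, f_sig):
        """Draw n events: a fraction f_sig of 'signal', the rest 'background'."""
        n_sig = int(n * f_sig)
        sig = rng.normal(+1.0, 1.0, size=(n_sig, 2))
        bkg = rng.normal(-1.0, 1.0, size=(n - n_sig, 2))
        return np.vstack([sig, bkg])

    M1 = mixture(5000, 0.8)   # mixed sample 1 (signal-rich)
    M2 = mixture(5000, 0.2)   # mixed sample 2 (signal-poor)
    X = np.vstack([M1, M2])
    y = np.r_[np.ones(len(M1)), np.zeros(len(M2))]  # mixture labels only

    clf = GradientBoostingClassifier(random_state=0).fit(X, y)

    # Evaluate on pure, truth-labeled samples: the CWoLa training transfers.
    pure_sig = rng.normal(+1.0, 1.0, size=(2000, 2))
    pure_bkg = rng.normal(-1.0, 1.0, size=(2000, 2))
    X_test = np.vstack([pure_sig, pure_bkg])
    y_test = np.r_[np.ones(2000), np.zeros(2000)]
    print("AUC on pure classes:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
    ```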

  2. Classification without labels: learning from mixed samples in high energy physics

    DOE PAGES

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    2017-10-25

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.

  3. Classification without labels: learning from mixed samples in high energy physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.

  4. Relative Performance of Hardwood Sawing Machines

    Treesearch

    Philip H. Steele; Michael W. Wade; Steven H. Bullard; Philip A. Araman

    1991-01-01

    Only limited information has been available to hardwood sawmillers on the performance of their sawing machines. This study analyzes a large database of individual machine studies to provide detailed information on 6 machine types. These machine types were band headrig, circular headrig, band linebar resaw, vertical band splitter resaw, single arbor gang resaw and...

  5. Spectral feature extraction of EEG signals and pattern recognition during mental tasks of 2-D cursor movements for BCI using SVM and ANN.

    PubMed

    Bascil, M Serdar; Tesneli, Ahmet Y; Temurtas, Feyzullah

    2016-09-01

    Brain computer interface (BCI) is a new communication channel between man and machine. It identifies mental task patterns stored in the electroencephalogram (EEG): it extracts brain electrical activities recorded by EEG and transforms them into machine control commands. The main goal of BCI is to make assistive environmental devices, such as computers, available to paralyzed people and thus make their lives easier. This study deals with feature extraction and mental task pattern recognition for 2-D cursor control from EEG as an offline analysis approach. The hemispherical power density changes are computed and compared in the alpha-beta frequency bands using only mental imagination of cursor movements. First, power spectral density (PSD) features of the EEG signals are extracted, and the high-dimensional data are reduced by principal component analysis (PCA) and independent component analysis (ICA), which are statistical algorithms. In the last stage, all features are classified with two types of support vector machine (SVM), linear and least squares (LS-SVM), and three different artificial neural network (ANN) structures, learning vector quantization (LVQ), multilayer neural network (MLNN) and probabilistic neural network (PNN), and the mental task patterns are successfully identified via the k-fold cross validation technique.
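
    The processing chain described (PSD features, statistical dimensionality reduction, SVM classification with cross validation) can be sketched on synthetic two-class signals; the recordings, channels, and exact settings here are placeholders, not the study's:

    ```python
    # Sketch: Welch PSD features in the alpha-beta band -> PCA -> linear SVM.
    import numpy as np
    from scipy.signal import welch
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    fs, n_trials, n_samp = 128, 80, 512

    def trial(freq):
        t = np.arange(n_samp) / fs
        return np.sin(2 * np.pi * freq * t) + rng.normal(0, 1.0, n_samp)

    # Two mental-task classes with different dominant rhythms (10 Hz vs 22 Hz)
    signals = [trial(10) for _ in range(n_trials)] + [trial(22) for _ in range(n_trials)]
    y = np.r_[np.zeros(n_trials), np.ones(n_trials)]

    def psd_features(sig):
        f, p = welch(sig, fs=fs, nperseg=128)
        band = (f >= 8) & (f <= 30)          # alpha-beta range
        return p[band]

    X = np.array([psd_features(s) for s in signals])
    model = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="linear"))
    print("k-fold accuracy:", cross_val_score(model, X, y, cv=5).mean())
    ```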

  6. A wearable smartphone-based platform for real-time cardiovascular disease detection via electrocardiogram processing.

    PubMed

    Oresko, Joseph J; Duschl, Heather; Cheng, Allen C

    2010-05-01

    Cardiovascular disease (CVD) is the single leading cause of global mortality and is projected to remain so. Cardiac arrhythmia is a very common type of CVD and may indicate an increased risk of stroke or sudden cardiac death. The ECG is the most widely adopted clinical tool to diagnose and assess the risk of arrhythmia. ECGs measure and display the electrical activity of the heart from the body surface. During patients' hospital visits, however, arrhythmias may not be detected on standard resting ECG machines, since the condition may not be present at that moment in time. While Holter-based portable monitoring solutions offer 24-48 h ECG recording, they lack the capability of providing any real-time feedback for the thousands of heart beats they record, which must be tediously analyzed offline. In this paper, we seek to unite the portability of Holter monitors and the real-time processing capability of state-of-the-art resting ECG machines to provide an assistive diagnosis solution using smartphones. Specifically, we developed two smartphone-based wearable CVD-detection platforms capable of performing real-time ECG acquisition and display, feature extraction, and beat classification. Furthermore, the same statistical summaries available on resting ECG machines are provided.

  7. Diamond Tool Specific Wear Rate Assessment in Granite Machining by Means of Knoop Micro-Hardness and Process Parameters

    NASA Astrophysics Data System (ADS)

    Goktan, R. M.; Gunes Yılmaz, N.

    2017-09-01

    The present study was undertaken to investigate the potential usability of Knoop micro-hardness, both as a single parameter and in combination with operational parameters, for sawblade specific wear rate (SWR) assessment in the machining of ornamental granites. The sawing tests were performed on different commercially available granite varieties by using a fully instrumented side-cutting machine. During the sawing tests, two fundamental productivity parameters, namely the workpiece feed rate and cutting depth, were varied at different levels. The good correspondence observed between the measured Knoop hardness and SWR values for different operational conditions indicates that Knoop hardness has the potential to be used as a rock material property in preliminary wear estimations of diamond sawblades. Also, a multiple regression model directed to SWR prediction was developed which takes into account the Knoop hardness, cutting depth and workpiece feed rate. The relative contribution of each independent variable in the prediction of SWR was determined by using test statistics. The prediction accuracy of the established model was checked against new observations. The strong prediction performance of the model suggests that its framework may be applied to other granites and operational conditions for quantifying or differentiating the relative wear performance of diamond sawblades.
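
    A regression of this form is straightforward to reproduce in outline; the snippet below fits SWR on Knoop hardness, cutting depth, and feed rate with ordinary least squares on hypothetical data (the coefficients and units are illustrative only, not the paper's measurements):

    ```python
    # Sketch: multiple regression of SWR on hardness and process parameters.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 40
    knoop = rng.uniform(4500, 7500, n)     # Knoop micro-hardness (illustrative scale)
    depth = rng.uniform(10, 30, n)         # cutting depth (mm)
    feed = rng.uniform(0.4, 2.0, n)        # workpiece feed rate (m/min)
    swr = 0.001 * knoop + 0.2 * depth + 1.5 * feed + rng.normal(0, 0.8, n)

    X = sm.add_constant(np.column_stack([knoop, depth, feed]))
    fit = sm.OLS(swr, X).fit()
    print(fit.summary())                   # coefficients, t-statistics, R-squared
    ```

    The t-statistics in the summary play the role of the "test statistics" the abstract mentions for ranking each variable's contribution.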

  8. Specification of a new de-stoner machine: evaluation of machining effects on olive paste's rheology and olive oil yield and quality.

    PubMed

    Romaniello, Roberto; Leone, Alessandro; Tamborrino, Antonia

    2017-01-01

    An industrial prototype of a partial de-stoner machine was specified, built and implemented in an industrial olive oil extraction plant. The partial de-stoner machine was compared to the traditional mechanical crusher to assess its quantitative and qualitative performance. The extraction efficiency of the olive oil extraction plant, olive oil quality, sensory evaluation and rheological aspects were investigated. The results indicate that by using the partial de-stoner machine the extraction plant did not show statistical differences with respect to the traditional mechanical crushing. Moreover, the partial de-stoner machine allowed recovery of 60% of olive pits and the oils obtained were characterised by more marked green fruitiness, flavour and aroma than the oils produced using the traditional processing systems. The partial de-stoner machine removes the limitations of the traditional total de-stoner machine, opening new frontiers for the recovery of pits to be used as biomass. Moreover, the partial de-stoner machine permitted a significant reduction in the viscosity of the olive paste. © 2016 Society of Chemical Industry.

  9. HHsvm: fast and accurate classification of profile–profile matches identified by HHsearch

    PubMed Central

    Dlakić, Mensur

    2009-01-01

    Motivation: Recently developed profile–profile methods rival structural comparisons in their ability to detect homology between distantly related proteins. Despite this tremendous progress, many genuine relationships between protein families cannot be recognized as comparisons of their profiles result in scores that are statistically insignificant. Results: Using known evolutionary relationships among protein superfamilies in SCOP database, support vector machines were trained on four sets of discriminatory features derived from the output of HHsearch. Upon validation, it was shown that the automatic classification of all profile–profile matches was superior to fixed threshold-based annotation in terms of sensitivity and specificity. The effectiveness of this approach was demonstrated by annotating several domains of unknown function from the Pfam database. Availability: Programs and scripts implementing the methods described in this manuscript are freely available from http://hhsvm.dlakiclab.org/. Contact: mdlakic@montana.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19773335

  10. Progress with modeling activity landscapes in drug discovery.

    PubMed

    Vogt, Martin

    2018-04-19

    Activity landscapes (ALs) are representations and models of compound data sets annotated with a target-specific activity. In contrast to quantitative structure-activity relationship (QSAR) models, ALs aim at characterizing structure-activity relationships (SARs) on a large-scale level encompassing all active compounds for specific targets. The popularity of AL modeling has grown substantially with the public availability of large activity-annotated compound data sets. AL modeling crucially depends on molecular representations and similarity metrics used to assess structural similarity. Areas covered: The concepts of AL modeling are introduced and its basis in quantitatively assessing molecular similarity is discussed. The different types of AL modeling approaches are introduced. AL designs can broadly be divided into three categories: compound-pair based, dimensionality reduction, and network approaches. Recent developments for each of these categories are discussed focusing on the application of mathematical, statistical, and machine learning tools for AL modeling. AL modeling using chemical space networks is covered in more detail. Expert opinion: AL modeling has remained a largely descriptive approach for the analysis of SARs. Beyond mere visualization, the application of analytical tools from statistics, machine learning and network theory has aided in the sophistication of AL designs and provides a step forward in transforming ALs from descriptive to predictive tools. To this end, optimizing representations that encode activity relevant features of molecules might prove to be a crucial step.

  11. Proposed hybrid-classifier ensemble algorithm to map snow cover area

    NASA Astrophysics Data System (ADS)

    Nijhawan, Rahul; Raman, Balasubramanian; Das, Josodhir

    2018-01-01

    Metaclassification ensemble approaches are known to improve the prediction performance of snow-covered area mapping. The methodology adopted in this case is based on a neural network along with four state-of-the-art machine learning algorithms (support vector machine, artificial neural networks, spectral angle mapper, and K-means clustering) and a snow index (normalized difference snow index). An AdaBoost ensemble algorithm based on decision trees for snow-cover mapping is also proposed. According to the available literature, these methods have rarely been used for snow-cover mapping. Employing the above techniques, a study was conducted for the Raktavarn and Chaturangi Bamak glaciers, Uttarakhand, Himalaya, using a multispectral Landsat 7 ETM+ (enhanced thematic mapper) image. The study also compares the results with those obtained from statistical combination methods (majority rule and belief functions) and the accuracies of individual classifiers. Accuracy assessment is performed by computing the quantity and allocation disagreement, analyzing statistical measures (accuracy, precision, specificity, AUC, and sensitivity) and receiver operating characteristic curves. A total of 225 parameter combinations for the individual classifiers were trained and tested on the dataset, and the results were compared with the proposed approach. It was observed that the proposed methodology produced the highest classification accuracy (95.21%), close to the accuracy (94.01%) produced by the proposed AdaBoost ensemble algorithm. From these observations, it was concluded that the ensembles of classifiers produced better results than the individual classifiers.
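
    In outline, two of the combination schemes compared (a boosted decision-tree ensemble and a majority-rule vote over heterogeneous classifiers) can be sketched as below; synthetic feature vectors stand in for the Landsat bands, and none of the study's tuning is reproduced:

    ```python
    # Sketch: AdaBoost over decision trees and a majority-rule voting ensemble.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, n_features=7, random_state=0)  # 7 "bands"

    ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                             n_estimators=100, random_state=0)
    vote = VotingClassifier([("svm", SVC()),
                             ("ann", MLPClassifier(max_iter=1000, random_state=0)),
                             ("tree", DecisionTreeClassifier(max_depth=5))],
                            voting="hard")   # majority rule

    print("AdaBoost accuracy     :", cross_val_score(ada, X, y, cv=5).mean())
    print("Majority-vote accuracy:", cross_val_score(vote, X, y, cv=5).mean())
    ```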

  12. Kernel-based whole-genome prediction of complex traits: a review.

    PubMed

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
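
    As a concrete sketch of one kernel machine from this family, the snippet below applies Gaussian-kernel ridge regression to simulated 0/1/2 marker genotypes with an additive-plus-epistatic signal; real analyses would tune the kernel bandwidth and regularization, e.g., by cross validation:

    ```python
    # Sketch: whole-genome prediction with an RBF-kernel ridge regression.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, p = 300, 1000
    M = rng.integers(0, 3, size=(n, p)).astype(float)   # SNP genotypes coded 0/1/2
    beta = rng.normal(0, 0.05, p)                       # small additive effects
    y = M @ beta + 0.8 * M[:, 0] * M[:, 1] + rng.normal(0, 1.0, n)  # plus epistasis

    model = KernelRidge(kernel="rbf", gamma=1.0 / p, alpha=1.0)
    print("predictive R^2 (5-fold):",
          cross_val_score(model, M, y, cv=5, scoring="r2").mean())
    ```

    The nonparametric kernel picks up part of the non-additive term that a purely additive linear model would miss, which is the motivation the review describes.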

  13. Relative Kerf and Sawing Variation Values for Some Hardwood Sawing Machines

    Treesearch

    Philip H. Steele; Michael W. Wade; Steven H. Bullard; Philip A. Araman

    1992-01-01

    Information on the conversion efficiency of sawing machines is important to those involved in the management, maintenance, and design of sawmills. Little information on the conversion characteristics of hardwood sawing machines has been available. This study, based on 266 studies of 6 machine types, provides an analysis of the machine characteristics of kerf width,...

  14. Classification of Variable Objects in Massive Sky Monitoring Surveys

    NASA Astrophysics Data System (ADS)

    Woźniak, Przemek; Wyrzykowski, Łukasz; Belokurov, Vasily

    2012-03-01

    The era of great sky surveys is upon us. Over the past decade we have seen rapid progress toward a continuous photometric record of the optical sky. Numerous sky surveys are discovering and monitoring variable objects by the hundreds of thousands. Advances in detector, computing, and networking technology are driving applications of all shapes and sizes ranging from small all-sky monitors, through networks of robotic telescopes of modest size, to big glass facilities equipped with giga-pixel CCD mosaics. The Large Synoptic Survey Telescope will be the first peta-scale astronomical survey [18]. It will expand the volume of the parameter space available to us by three orders of magnitude and explore the mutable heavens down to an unprecedented level of sensitivity. Proliferation of large, multidimensional astronomical data sets is stimulating the work on new methods and tools to handle the identification and classification challenge [3]. Given exponentially growing data rates, automated classification of variability types is quickly becoming a necessity. Taking humans out of the loop not only eliminates the subjective nature of visual classification, but is also an enabling factor for time-critical applications. Full automation is especially important for studies of explosive phenomena such as γ-ray bursts that require rapid follow-up observations before the event is over. While there is a general consensus that machine learning will provide a viable solution, the available algorithmic toolbox remains underutilized in astronomy by comparison with other fields such as genomics or market research. Part of the problem is the nature of astronomical data sets, which tend to be dominated by a variety of irregularities. Not all algorithms can gracefully handle uneven time sampling, missing features, or sparsely populated high-dimensional spaces. More sophisticated algorithms and better tools available in standard software packages are required to facilitate the adoption of machine learning in astronomy. The goal of this chapter is to show a number of successful applications of state-of-the-art machine learning methodology to time-resolved astronomical data, illustrate what is possible today, and help identify areas for further research and development. After a brief comparison of the utility of various machine learning classifiers, the discussion focuses on support vector machines (SVM), neural nets, and self-organizing maps. Traditionally, to detect and classify transient variability astronomers used ad hoc scan statistics. These methods will remain important as feature extractors for input into generic machine learning algorithms. Experience shows that the performance of machine learning tools on astronomical data critically depends on the definition and quality of the input features, and that a considerable amount of preprocessing is required before standard algorithms can be applied. However, with continued investments of effort by a growing number of astro-informatics-savvy computer scientists and astronomers, the much-needed expertise and infrastructure are growing faster than ever.

  15. Operational planning using Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS)

    NASA Astrophysics Data System (ADS)

    O'Connor, Alison; Kirtman, Benjamin; Harrison, Scott; Gorman, Joe

    2016-05-01

    The US Navy faces several limitations when planning operations in regard to forecasting environmental conditions. Currently, mission analysis and planning tools rely heavily on short-term (less than a week) forecasts or long-term statistical climate products. However, newly available data in the form of weather forecast ensembles provides dynamical and statistical extended-range predictions that can produce more accurate predictions if ensemble members can be combined correctly. Charles River Analytics is designing the Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS), which performs data fusion over extended-range multi-model ensembles, such as the North American Multi-Model Ensemble (NMME), to produce a unified forecast for several weeks to several seasons in the future. We evaluated thirty years of forecasts using machine learning to select predictions for an all-encompassing and superior forecast that can be used to inform the Navy's decision planning process.

  16. SCENERY: a web application for (causal) network reconstruction from cytometry data

    PubMed Central

    Papoutsoglou, Georgios; Athineou, Giorgos; Lagani, Vincenzo; Xanthopoulos, Iordanis; Schmidt, Angelika; Éliás, Szabolcs; Tegnér, Jesper

    2017-01-01

    Abstract Flow and mass cytometry technologies can probe proteins as biological markers in thousands of individual cells simultaneously, providing unprecedented opportunities for reconstructing networks of protein interactions through machine learning algorithms. The network reconstruction (NR) problem has been well-studied by the machine learning community. However, the potentials of available methods remain largely unknown to the cytometry community, mainly due to their intrinsic complexity and the lack of comprehensive, powerful and easy-to-use NR software implementations specific for cytometry data. To bridge this gap, we present Single CEll NEtwork Reconstruction sYstem (SCENERY), a web server featuring several standard and advanced cytometry data analysis methods coupled with NR algorithms in a user-friendly, on-line environment. In SCENERY, users may upload their data and set their own study design. The server offers several data analysis options categorized into three classes of methods: data (pre)processing, statistical analysis and NR. The server also provides interactive visualization and download of results as ready-to-publish images or multimedia reports. Its core is modular and based on the widely-used and robust R platform allowing power users to extend its functionalities by submitting their own NR methods. SCENERY is available at scenery.csd.uoc.gr or http://mensxmachina.org/en/software/. PMID:28525568

  17. Application of machine learning and expert systems to Statistical Process Control (SPC) chart interpretation

    NASA Technical Reports Server (NTRS)

    Shewhart, Mark

    1991-01-01

    Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.

  18. Objective research of auscultation signals in Traditional Chinese Medicine based on wavelet packet energy and support vector machine.

    PubMed

    Yan, Jianjun; Shen, Xiaojing; Wang, Yiqin; Li, Fufeng; Xia, Chunming; Guo, Rui; Chen, Chunfeng; Shen, Qingwei

    2010-01-01

    This study aims at utilising the Wavelet Packet Transform (WPT) and the Support Vector Machine (SVM) algorithm to perform objective analysis and quantitative research on auscultation in Traditional Chinese Medicine (TCM) diagnosis. First, Wavelet Packet Decomposition (WPD) at level 6 was employed to split the auscultation signals into finer frequency bands. Then statistical analysis was performed based on the Wavelet Packet Energy (WPE) features extracted from the WPD coefficients. Furthermore, pattern recognition was used to distinguish the statistical feature values of mixed subjects' sample groups through SVM. Finally, the experimental results showed that the classification accuracies were at a high level.
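
    The described pipeline (level-6 WPD, normalized band energies, SVM) can be sketched with PyWavelets and scikit-learn; synthetic tones stand in for the auscultation recordings, and the wavelet choice is an assumption:

    ```python
    # Sketch: wavelet packet energy features at level 6, classified by an SVM.
    import numpy as np
    import pywt
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    fs, n_samp, n_per_class = 1000, 1024, 40

    def wpe_features(signal, wavelet="db4", level=6):
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
        nodes = wp.get_level(level, order="freq")        # terminal frequency bands
        energy = np.array([np.sum(node.data ** 2) for node in nodes])
        return energy / energy.sum()                     # normalized WPE vector

    def tone(freq):
        t = np.arange(n_samp) / fs
        return np.sin(2 * np.pi * freq * t) + 0.5 * rng.normal(size=n_samp)

    # Two synthetic "subject groups" with different spectral content
    signals = [tone(60) for _ in range(n_per_class)] + [tone(180) for _ in range(n_per_class)]
    X = np.array([wpe_features(s) for s in signals])
    y = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]

    print("k-fold accuracy:", cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
    ```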

  19. Machine learning patterns for neuroimaging-genetic studies in the cloud.

    PubMed

    Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand

    2014-01-01

    Brain imaging is a natural intermediate phenotype to understand the link between genetic information and behavior or brain pathology risk factors. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a 2-week deployment on hundreds of virtual machines.

  20. MSUSTAT.

    ERIC Educational Resources Information Center

    Mauriello, David

    1984-01-01

    Reviews an interactive statistical analysis package (designed to run on 8- and 16-bit machines that utilize CP/M 80 and MS-DOS operating systems), considering its features and uses, documentation, operation, and performance. The package consists of 40 general purpose statistical procedures derived from the classic textbook "Statistical…

  1. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  2. 3D Visualization of Machine Learning Algorithms with Astronomical Data

    NASA Astrophysics Data System (ADS)

    Kent, Brian R.

    2016-01-01

    We present innovative machine learning (ML) methods using unsupervised clustering with minimum spanning trees (MSTs) to study 3D astronomical catalogs. Utilizing Python code to build trees based on galaxy catalogs, we can render the results with the visualization suite Blender to produce interactive 360 degree panoramic videos. The catalogs and their ML results can be explored in a 3D space using mobile devices, tablets or desktop browsers. We compare the statistics of the MST results to a number of machine learning methods relating to optimization and efficiency.
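
    A minimal MST clustering sketch with SciPy (Gaussian blobs standing in for a 3D galaxy catalog): build the tree over pairwise distances, cut edges above a length threshold, and read clusters off the connected components. The threshold is an assumption; in practice it would be tuned to the catalog:

    ```python
    # Sketch: unsupervised clustering by pruning a minimum spanning tree.
    import numpy as np
    from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(0)
    pts = np.vstack([rng.normal(0.0, 0.3, (40, 3)),      # "group" A
                     rng.normal(3.0, 0.3, (40, 3))])     # "group" B

    D = squareform(pdist(pts))                 # pairwise distance matrix
    mst = minimum_spanning_tree(D).toarray()   # tree edge weights (0 = no edge)

    threshold = 1.0                            # cut edges longer than this
    pruned = np.where(mst > threshold, 0.0, mst)
    n_clusters, labels = connected_components(pruned, directed=False)
    print("clusters found:", n_clusters)       # expect 2 for this toy data
    ```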

  3. Machine Learning Prediction of the Energy Gap of Graphene Nanoflakes Using Topological Autocorrelation Vectors.

    PubMed

    Fernandez, Michael; Abreu, Jose I; Shi, Hongqing; Barnard, Amanda S

    2016-11-14

    The possibility of band gap engineering in graphene opens countless new opportunities for application in nanoelectronics. In this work, the energy gaps of 622 computationally optimized graphene nanoflakes were mapped to topological autocorrelation vectors using machine learning techniques. Machine learning modeling revealed that the most relevant correlations appear at topological distances in the range of 1 to 42 with prediction accuracy higher than 80%. The data-driven model can statistically discriminate between graphene nanoflakes with different energy gaps on the basis of their molecular topology.

  4. What subject matter questions motivate the use of machine learning approaches compared to statistical models for probability prediction?

    PubMed

    Binder, Harald

    2014-07-01

    This is a discussion of the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Availability of Vending Machines and School Stores in California Schools

    ERIC Educational Resources Information Center

    Cisse-Egbuonye, Nafissatou; Liles, Sandy; Schmitz, Katharine E.; Kassem, Nada; Irvin, Veronica L.; Hovell, Melbourne F.

    2016-01-01

    Background: This study examined the availability of foods sold in vending machines and school stores in United States public and private schools, and associations of availability with students' food purchases and consumption. Methods: Descriptive analyses, chi-square tests, and Spearman product-moment correlations were conducted on data collected…

  6. Special Issue: Big data and predictive computational modeling

    NASA Astrophysics Data System (ADS)

    Koutsourelakis, P. S.; Zabaras, N.; Girolami, M.

    2016-09-01

    The motivation for this special issue stems from the symposium on "Big Data and Predictive Computational Modeling" that took place at the Institute for Advanced Study, Technical University of Munich, during May 18-21, 2015. With a mindset firmly grounded in computational discovery, but a polychromatic set of viewpoints, several leading scientists, from physics and chemistry, biology, engineering, applied mathematics, scientific computing, neuroscience, statistics and machine learning, engaged in discussions and exchanged ideas for four days. This special issue contains a subset of the presentations. Video and slides of all the presentations are available on the TUM-IAS website http://www.tum-ias.de/bigdata2015/.

  7. 41 CFR 101-25.106 - Servicing of office machines.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    § 101-25.106 Servicing of office machines (41 CFR, Public Contracts and Property Management; Federal Property Management Regulations, General Policies). (a) The determination as to whether office machines should be serviced ... (9) inventory in relation to operating needs, i.e., availability of a reserve machine in case of breakdown ...

  8. Study of the Effect of Lubricant Emulsion Percentage and Tool Material on Surface Roughness in Machining of EN-AC 48000 Alloy

    NASA Astrophysics Data System (ADS)

    Soltani, E.; Shahali, H.; Zarepour, H.

    2011-01-01

    In this paper, the effect of machining parameters, namely lubricant emulsion percentage and tool material, on surface roughness has been studied in the machining of EN-AC 48000 aluminum alloy. EN-AC 48000 is an important alloy in industry, and its machining is of vital importance due to built-up edge formation and tool wear. An L9 Taguchi standard orthogonal array has been applied as the experimental design to investigate the effect of the factors and their interaction. Nine machining tests have been carried out with three random replications, resulting in 27 experiments. Three types of cutting tools, including coated carbide (CD1810), uncoated carbide (H10), and polycrystalline diamond (CD10), have been used in this research. The lubricant emulsion percentage was set at three levels: 3%, 5% and 10%. Statistical analysis has been employed to study the effect of the factors and their interactions using the ANOVA method. Moreover, the optimal factor levels have been determined through signal-to-noise (S/N) ratio analysis. Also, a regression model has been provided to predict the surface roughness. Finally, the results of the confirmation tests have been presented to verify the adequacy of the predictive model. In this research, surface quality was improved by 9% using lubricant and the statistical optimization method.
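
    For reference, the smaller-the-better signal-to-noise ratio used in Taguchi S/N analysis is S/N = -10 log10(mean(y^2)); the snippet below computes it for hypothetical surface roughness (Ra) replicates at the three emulsion levels (the numbers are illustrative, not the paper's measurements):

    ```python
    # Sketch: smaller-the-better Taguchi S/N ratio; higher S/N is better.
    import numpy as np

    ra_replicates = {             # hypothetical Ra values (um), 3 replications each
        "3% emulsion": [1.85, 1.92, 1.78],
        "5% emulsion": [1.42, 1.47, 1.39],
        "10% emulsion": [1.60, 1.66, 1.58],
    }
    for level, ys in ra_replicates.items():
        y = np.asarray(ys)
        sn = -10.0 * np.log10(np.mean(y ** 2))   # smaller-the-better form
        print(f"{level}: S/N = {sn:.2f} dB")
    ```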

  9. A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

    DTIC Science & Technology

    2017-09-01

    ... scale processes. However, this dissertation explores the efficacy of statistical post-processing methods downstream of these dynamical model components with a hierarchical multivariate Bayesian approach. Keywords: Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction.

  10. PSGMiner: A modular software for polysomnographic analysis.

    PubMed

    Umut, İlhan

    2016-06-01

    Sleep disorders affect a great percentage of the population. The diagnosis of these disorders is usually made by polysomnography. This paper details the development of new software to carry out feature extraction in order to perform robust analysis and classification of sleep events using polysomnographic data. The software, called PSGMiner, is a tool that visualizes, processes and classifies bioelectrical data. The purpose of this program is to provide researchers with a platform with which to test new hypotheses by creating tests to check for correlations that are not available in commercially available software. The software is freely available under the GPL3 License. PSGMiner is composed of a number of diverse modules such as feature extraction, annotation, and machine learning modules, all of which are accessible from the main module. Using the software, it is possible to extract features of polysomnography using digital signal processing and statistical methods and to perform different analyses. The features can be classified through the use of five classification algorithms. PSGMiner offers an architecture designed for integrating new methods. Automatic scoring, which is available in almost all commercial PSG software, is not inherently available in this program, though it can be implemented by two different methodologies (machine learning and algorithms). While similar software focuses on a certain signal or event and is composed of a small number of modules with no possibility of expansion, the software introduced here can handle all polysomnographic signals and events. The software simplifies the processing of polysomnographic signals for researchers and physicians who are not experts in computer programming. It can find correlations between different events which could help predict an oncoming event such as sleep apnea. The software could also be used for educational purposes. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. SIP: A Web-Based Astronomical Image Processing Program

    NASA Astrophysics Data System (ADS)

    Simonetti, J. H.

    1999-12-01

    I have written an astronomical image processing and analysis program designed to run over the internet in a Java-compatible web browser. The program, Sky Image Processor (SIP), is accessible at the SIP webpage (http://www.phys.vt.edu/SIP). Since nothing is installed on the user's machine, there is no need to download upgrades; the latest version of the program is always instantly available. Furthermore, the Java programming language is designed to work on any computer platform (any machine and operating system). The program could be used with students in web-based instruction or in a computer laboratory setting; it may also be of use in some research or outreach applications. While SIP is similar to other image processing programs, it is unique in some important respects. For example, SIP can load images from the user's machine or from the Web. An instructor can put images on a web server for students to load and analyze on their own personal computer. Or, the instructor can inform the students of images to load from any other web server. Furthermore, since SIP was written with students in mind, the philosophy is to present the user with the most basic tools necessary to process and analyze astronomical images. Images can be combined (by addition, subtraction, multiplication, or division), multiplied by a constant, smoothed, cropped, flipped, rotated, and so on. Statistics can be gathered for pixels within a box drawn by the user. Basic tools are available for gathering data from an image which can be used for performing simple differential photometry, or astrometry. Therefore, students can learn how astronomical image processing works. Since SIP is not part of a commercial CCD camera package, the program is written to handle the most common denominator image file, the FITS format.

  12. Scheduling job shop - A case study

    NASA Astrophysics Data System (ADS)

    Abas, M.; Abbas, A.; Khan, W. A.

    2016-08-01

    Scheduling in a job shop is important for efficient utilization of machines in the manufacturing industry. There are a number of algorithms available for scheduling of jobs, which depend on the machine tools, indirect consumables, and the jobs to be processed. In this paper a case study is presented for scheduling of jobs when parts are processed on available machines. Through time and motion study, setup time and operation time are measured as total processing time for a variety of products having different manufacturing processes. Based on due dates, different levels of priority are assigned to the jobs, and the jobs are scheduled on the basis of priority. In view of the measured processing times, the times for processing some new jobs are estimated, and an algorithm for efficient utilization of the available machines is proposed and validated.

  13. Food labeling; calorie labeling of articles of food in vending machines. Final rule.

    PubMed

    2014-12-01

    To implement the vending machine food labeling provisions of the Patient Protection and Affordable Care Act of 2010 (ACA), the Food and Drug Administration (FDA or we) is establishing requirements for providing calorie declarations for food sold from certain vending machines. This final rule will ensure that calorie information is available for certain food sold from a vending machine that does not permit a prospective purchaser to examine the Nutrition Facts Panel before purchasing the article, or does not otherwise provide visible nutrition information at the point of purchase. The declaration of accurate and clear calorie information for food sold from vending machines will make calorie information available to consumers in a direct and accessible manner to enable consumers to make informed and healthful dietary choices. This final rule applies to certain food from vending machines operated by a person engaged in the business of owning or operating 20 or more vending machines. Vending machine operators not subject to the rules may elect to be subject to the Federal requirements by registering with FDA.

  14. A Developmental Approach to Machine Learning?

    PubMed Central

    Smith, Linda B.; Slone, Lauren K.

    2017-01-01

    Visual learning depends on both the algorithms and the training material. This essay considers the natural statistics of infant- and toddler-egocentric vision. These natural training sets for human visual object recognition are very different from the training data fed into machine vision systems. Rather than equal experiences with all kinds of things, toddlers experience extremely skewed distributions with many repeated occurrences of a very few things. And though highly variable when considered as a whole, individual views of things are experienced in a specific order – with slow, smooth visual changes moment-to-moment, and developmentally ordered transitions in scene content. We propose that the skewed, ordered, biased visual experiences of infants and toddlers are the training data that allow human learners to develop a way to recognize everything, both the pervasively present entities and the rarely encountered ones. The joint consideration of real-world statistics for learning by researchers of human and machine learning seems likely to bring advances in both disciplines. PMID:29259573

  15. Feature recognition and detection for ancient architecture based on machine vision

    NASA Astrophysics Data System (ADS)

    Zou, Zheng; Wang, Niannian; Zhao, Peng; Zhao, Xuefeng

    2018-03-01

    Ancient architecture has very high historical and artistic value. Ancient buildings feature a wide variety of textures and decorative paintings that carry substantial historical meaning, so surveying and cataloging these compositional and decorative features plays an important role in subsequent research. Until recently, however, such components were catalogued mainly by hand, an inefficient method that consumes a great deal of labor and time. At present, supported by big data and GPU-accelerated training, machine vision with deep learning at its core has developed rapidly and is widely used in many fields. This paper proposes an approach to recognize and detect the textures, decorations and other features of ancient buildings based on machine vision. First, a large number of surface texture images of ancient building components are manually classified as a sample set. Then, a convolutional neural network is trained on the samples in order to obtain a classification detector. Finally, its precision is verified.

  16. Volumetric Verification of Multiaxis Machine Tool Using Laser Tracker

    PubMed Central

    Aguilar, Juan José

    2014-01-01

    This paper aims to present a method of volumetric verification in machine tools with linear and rotary axes using a laser tracker. Beyond a method for a particular machine, it presents a methodology that can be used in any machine type. Throughout this paper, the schema and kinematic model of a machine with three axes of movement (two linear and one rotary), including the measurement system and the nominal rotation matrix of the rotational axis, are presented. Using this, the machine tool volumetric error is obtained and nonlinear optimization techniques are employed to improve the accuracy of the machine tool. The verification provides mathematical, not physical, compensation, in less time than other methods of verification, by means of the indirect measurement of geometric errors of the machine from the linear and rotary axes. This paper presents an extensive study about the appropriateness and drawbacks of the regression function employed depending on the types of movement of the axes of any machine. In the same way, strengths and weaknesses of measurement methods and optimization techniques depending on the space available to place the measurement system are presented. These studies provide the most appropriate strategies to verify each machine tool taking into consideration its configuration and its available work space. PMID:25202744

  17. A Comparative Study of "Google Translate" Translations: An Error Analysis of English-to-Persian and Persian-to-English Translations

    ERIC Educational Resources Information Center

    Ghasemi, Hadis; Hashemian, Mahmood

    2016-01-01

    Both lack of time and the need to translate texts for numerous reasons brought about an increase in studying machine translation with a history spanning over 65 years. During the last decades, Google Translate, as a statistical machine translation (SMT), was in the center of attention for supporting 90 languages. Although there are many studies on…

  18. A New Mathematical Framework for Design Under Uncertainty

    DTIC Science & Technology

    2016-05-05

    Blending multiple information sources via auto-regressive stochastic modeling. A computationally efficient machine learning framework is developed based on ... and machine learning approaches; see Fig. 1. This will lead to a comprehensive description of system performance with less uncertainty than in the ... Bayesian optimization of super-cavitating hydrofoils: the goal of this study is to demonstrate the capabilities of statistical learning and ...

  19. A computational visual saliency model based on statistics and machine learning.

    PubMed

    Lin, Ru-Je; Lin, Wei-Song

    2014-08-01

    Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.

  20. Machine Learning Predictions of a Multiresolution Climate Model Ensemble

    NASA Astrophysics Data System (ADS)

    Anderson, Gemma J.; Lucas, Donald D.

    2018-05-01

    Statistical models of high-resolution climate models are useful for many purposes, including sensitivity and uncertainty analyses, but building them can be computationally prohibitive. We generated a unique multiresolution perturbed parameter ensemble of a global climate model. We use a novel application of a machine learning technique known as random forests to train a statistical model on the ensemble to make high-resolution model predictions of two important quantities: global mean top-of-atmosphere energy flux and precipitation. The random forests leverage cheaper low-resolution simulations, greatly reducing the number of high-resolution simulations required to train the statistical model. We demonstrate that high-resolution predictions of these quantities can be obtained by training on an ensemble that includes only a small number of high-resolution simulations. We also find that global annually averaged precipitation is more sensitive to resolution changes than to any of the model parameters considered.
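
    The idea, training one statistical model on a perturbed-parameter ensemble in which resolution is itself an input so that a few expensive high-resolution runs anchor the predictions, can be sketched as follows; the parameters, response, and sample sizes are synthetic placeholders, not the paper's ensemble:

    ```python
    # Sketch: random forest emulator of a multiresolution model ensemble.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n_lo, n_hi, n_params = 360, 40, 5
    theta = rng.uniform(0, 1, (n_lo + n_hi, n_params))   # perturbed model parameters
    res_flag = np.r_[np.zeros(n_lo), np.ones(n_hi)]      # 0 = low res, 1 = high res
    y = (theta @ np.array([2.0, -1.0, 0.5, 0.3, 0.1])    # synthetic response,
         + 0.4 * res_flag                                # e.g., TOA energy flux
         + rng.normal(0, 0.1, n_lo + n_hi))

    X = np.column_stack([theta, res_flag])
    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

    # Query the emulator for high-resolution behavior at new parameter settings
    theta_new = rng.uniform(0, 1, (5, n_params))
    print(rf.predict(np.column_stack([theta_new, np.ones(5)])))
    ```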

  1. Nowcasting Cloud Fields for U.S. Air Force Special Operations

    DTIC Science & Technology

    2017-03-01

    The application of Bayes’ Rule offers many advantages over Kernel Density Estimation (KDE) and other commonly used statistical post-processing methods ... reflectance and probability of cloud. A statistical post-processing technique is applied using Bayesian estimation to train the system from a set of past ... Keywords: nowcasting, low cloud forecasting, cloud reflectance, ISR, Bayesian estimation, statistical post-processing, machine learning.

  2. 49 CFR 214.533 - Schedule of repairs subject to availability of parts.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    § 214.533 Schedule of repairs subject to availability of parts (49 CFR, Transportation; Roadway Maintenance Machines and Hi-Rail Vehicles). (a) ... a roadway maintenance machine or a hi-rail vehicle by the end of the next business day following the report of the ...

  3. Machine Learning–Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis[W

    PubMed Central

    Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng

    2014-01-01

    Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA), and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. mlDNA first uses an ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes prior to network construction, by learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes are subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing–based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress–related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes. PMID:24520154
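
    mlDNA itself is an R package; as a language-agnostic illustration of its first stage, the sketch below trains a classifier on per-gene expression characteristics and keeps only the genes scored as "informative". All names and data here are synthetic stand-ins, not the package's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: rows are genes, columns are expression characteristics
# (the paper uses 32); labels mark known stress-related ("informative") genes.
rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 32))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy "informative" signal

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Score all genes and keep those whose predicted probability of being
# informative exceeds a cutoff; only these enter network construction.
scores = clf.predict_proba(X)[:, 1]
informative_idx = np.flatnonzero(scores > 0.5)
print(len(informative_idx), "genes retained for network comparison")
```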

  4. Pattern Activity Clustering and Evaluation (PACE)

    NASA Astrophysics Data System (ADS)

    Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna

    2012-06-01

    With the vast amount of network information available on the activities of people (e.g., motions, transportation routes, and site visits), there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors enabling grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information theory metrics to characterize behavior. A scenario is presented to demonstrate the concept and the metrics that could be useful for layered sensing behavior pattern learning and analysis. We leverage work on group tracking and on learning and clustering approaches, and utilize information-theoretic metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries and sensor management for data extraction, relations discovery, and situation analysis of existing data.

  5. Texture classification of lung computed tomography images

    NASA Astrophysics Data System (ADS)

    Pheng, Hang See; Shamsuddin, Siti M.

    2013-03-01

    Current development of algorithms in computer-aided diagnosis (CAD) schemes is growing rapidly to assist the radiologist in medical image interpretation. Texture analysis of computed tomography (CT) scans is one of the important preliminary stages in computerized detection and classification systems for lung cancer. Among the different types of image feature analysis, Haralick texture with a variety of statistical measures has been used widely in image texture description. The extraction of texture feature values is essential for a CAD system, especially in the classification of normal and abnormal tissue on cross-sectional CT images. This paper compares experimental results using texture extraction and different machine learning methods for the classification of normal and abnormal tissues in lung CT images. The machine learning methods involved in this assessment are Artificial Immune Recognition System (AIRS), Naive Bayes, Decision Tree (J48) and Backpropagation Neural Network. AIRS is found to provide high accuracy (99.2%) and sensitivity (98.0%) in the assessment. For experiment and testing purposes, publicly available datasets in the Reference Image Database to Evaluate Therapy Response (RIDER) are used as study cases.
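
    For readers unfamiliar with Haralick texture features, a minimal sketch follows, using scikit-image's grey-level co-occurrence matrix and a Naive Bayes classifier on synthetic patches; it illustrates the general pipeline only, not the paper's exact feature set or its AIRS classifier.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # 'greycomatrix' in skimage < 0.19
from sklearn.naive_bayes import GaussianNB

def glcm_features(patch, levels=16):
    """Haralick-style texture features from a grey-level co-occurrence matrix."""
    q = (patch.astype(float) / patch.max() * (levels - 1)).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Toy patches standing in for normal/abnormal lung CT regions.
rng = np.random.default_rng(3)
patches = [rng.integers(0, 256, (32, 32)) for _ in range(40)]
X = np.array([glcm_features(p) for p in patches])
y = np.array([0] * 20 + [1] * 20)           # hypothetical labels
clf = GaussianNB().fit(X, y)
print(clf.score(X, y))
```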

  6. Support vector machines to detect physiological patterns for EEG and EMG-based human-computer interaction: a review

    NASA Astrophysics Data System (ADS)

    Quitadamo, L. R.; Cavrini, F.; Sbernini, L.; Riillo, F.; Bianchi, L.; Seri, S.; Saggio, G.

    2017-02-01

    Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and the wide availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameter selection are reported, making it impossible to reproduce study analyses and results. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning the reviewed papers are listed in tables, and statistics of SVM use in the literature are presented. The suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.
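
    The review's central complaint, underreported SVM settings, suggests a simple remedy: select hyperparameters systematically and report them. A minimal sketch with scikit-learn follows; the feature matrix is a synthetic stand-in, not real EEG/EMG data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy feature matrix standing in for EEG/EMG trial features.
rng = np.random.default_rng(4)
X = rng.standard_normal((120, 10))
y = (X[:, 0] + 0.5 * rng.standard_normal(120) > 0).astype(int)

# Choose and report kernel, C and gamma explicitly so the analysis
# can be reproduced by others.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10],
                           "svc__gamma": ["scale", 0.01, 0.1]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```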

  7. Defense Logistics: Space-Available Travel Challenges May Be Exacerbated If Eligibility Expands

    DTIC Science & Technology

    2012-09-10

    space-available travelers’ use of terminal facilities results in additional maintenance costs for waiting areas, restrooms, and vending machines ...additional required maintenance. For example, additional travelers’ use of waiting areas, restrooms, and vending machines in the terminals could require

  8. Machine learning approach for automated screening of malaria parasite using light microscopic images.

    PubMed

    Das, Dev Kumar; Ghosh, Madhumala; Pal, Mallika; Maiti, Asok K; Chakraborty, Chandan

    2013-02-01

    The aim of this paper is to address the development of computer-assisted malaria parasite characterization and classification using a machine learning approach based on light microscopic images of peripheral blood smears. In doing this, microscopic image acquisition from stained slides, illumination correction and noise reduction, erythrocyte segmentation, feature extraction, feature selection and finally classification of different stages of malaria (Plasmodium vivax and Plasmodium falciparum) have been investigated. The erythrocytes are segmented using marker-controlled watershed transformation, and subsequently a total of ninety-six features describing the shape, size and texture of erythrocytes are extracted with respect to parasitemia-infected versus non-infected cells. Ninety-four features are found to be statistically significant in discriminating the six classes. A feature selection-cum-classification scheme has been devised by combining the F-statistic with statistical learning techniques, i.e., Bayesian learning and the support vector machine (SVM), in order to provide higher classification accuracy using the best set of discriminating features. Results show that the Bayesian approach provides the highest accuracy, i.e., 84%, for malaria classification by selecting the 19 most significant features, while SVM provides its highest accuracy, i.e., 83.5%, with the 9 most significant features. Finally, the performance of these two classifiers under the feature selection framework has been compared for malaria parasite classification. Copyright © 2012 Elsevier Ltd. All rights reserved.
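
    A compact sketch of the feature selection-cum-classification idea follows, ranking features by the F-statistic before an SVM; the 96-feature matrix and class labels are synthetic stand-ins, and the paper's Bayesian branch is omitted.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 96 shape/size/texture features per erythrocyte.
rng = np.random.default_rng(5)
X = rng.standard_normal((300, 96))
y = rng.integers(0, 6, 300)                 # six hypothetical infection classes
X[:, :9] += y[:, None] * 0.8                # make 9 features informative

# F-statistic ranking selects the most discriminating features before the SVM.
pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=9), SVC())
print(cross_val_score(pipe, X, y, cv=5).mean())
```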

  9. A Critical Review for Developing Accurate and Dynamic Predictive Models Using Machine Learning Methods in Medicine and Health Care.

    PubMed

    Alanazi, Hamdan O; Abdullah, Abdul Hanan; Qureshi, Kashif Naseer

    2017-04-01

    Recently, Artificial Intelligence (AI) has been used widely in the medicine and health care sector. In machine learning, classification or prediction is a major field of AI. Today, the study of existing predictive models based on machine learning methods is extremely active. Doctors need accurate predictions for the outcomes of their patients' diseases. In addition, for accurate predictions, timing is another significant factor that influences treatment decisions. In this paper, existing predictive models in medicine and health care are critically reviewed. Furthermore, the most widely used machine learning methods are explained, and the confusion between the statistical approach and machine learning is clarified. A review of related literature reveals that the predictions of existing predictive models differ even when the same dataset is used. Therefore, existing predictive models are essential, and current methods must be improved.

  10. Unified risk analysis of fatigue failure in ductile alloy components during all three stages of fatigue crack evolution process.

    PubMed

    Patankar, Ravindra

    2003-10-01

    Statistical fatigue life of a ductile alloy specimen is traditionally divided into three stages, namely, crack nucleation, small crack growth, and large crack growth. Crack nucleation and small crack growth show a wide variation and hence a big spread on the cycles-versus-crack-length graph. Large crack growth shows comparatively less variation. Therefore, different models are fitted to the different stages of the fatigue evolution process, thus treating the stages as different phenomena. With these independent models, it is impossible to predict one phenomenon based on the information available about another. Experimentally, it is easier to carry out crack length measurements of large cracks than of nucleating and small cracks. Thus, it is easier to collect statistical data for large crack growth than it would be, with painstaking effort, for crack nucleation and small crack growth. This article presents a fracture mechanics-based stochastic model of fatigue crack growth in ductile alloys that are commonly encountered in mechanical structures and machine components. The model has been validated by Ray (1998) against various statistical fatigue crack propagation data. Based on the model, this article proposes a technique to predict statistical information on fatigue crack nucleation and small crack growth properties from the statistical properties of large crack growth under constant-amplitude stress excitation, which can be obtained via experiments.
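
    The abstract does not reproduce the model's equations; for orientation, the standard deterministic skeleton of large-crack growth, which stochastic formulations of this kind randomize, is the Paris-Erdogan law:

```latex
% Deterministic skeleton commonly used for the large-crack stage
% (Paris-Erdogan law); stochastic formulations randomize relations
% of this type, so this is background, not the article's exact model.
\[
  \frac{da}{dN} = C \,(\Delta K)^{m}, \qquad
  \Delta K = Y \,\Delta\sigma \sqrt{\pi a},
\]
% where a is the crack length, N the cycle count, \Delta K the range of
% the stress intensity factor under constant-amplitude stress range
% \Delta\sigma, Y a geometry factor, and C, m material constants.
```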

  11. Applying Sparse Machine Learning Methods to Twitter: Analysis of the 2012 Change in Pap Smear Guidelines. A Sequential Mixed-Methods Study

    PubMed Central

    Godbehere, Andrew; Le, Gem; El Ghaoui, Laurent; Sarkar, Urmimala

    2016-01-01

    Background It is difficult to synthesize the vast amount of textual data available from social media websites. Capturing real-world discussions via social media could provide insights into individuals’ opinions and the decision-making process. Objective We conducted a sequential mixed methods study to determine the utility of sparse machine learning techniques in summarizing Twitter dialogues. We chose a narrowly defined topic for this approach: cervical cancer discussions over a 6-month time period surrounding a change in Pap smear screening guidelines. Methods We applied statistical methodologies, known as sparse machine learning algorithms, to summarize Twitter messages about cervical cancer before and after the 2012 change in Pap smear screening guidelines by the US Preventive Services Task Force (USPSTF). All messages containing the search terms “cervical cancer,” “Pap smear,” and “Pap test” were analyzed during: (1) January 1–March 13, 2012, and (2) March 14–June 30, 2012. Topic modeling was used to discern the most common topics from each time period and to determine the singular value criterion for each topic. The results from the top 10 relevant topics were then qualitatively coded to determine the efficiency of the clustering method in grouping distinct ideas, and how the discussion differed before versus after the change in guidelines. Results This machine learning method was effective in grouping the relevant discussion topics about cervical cancer during the respective time periods (~20% overall irrelevant content in both time periods). Qualitative analysis determined that a significant portion of the top discussion topics in the second time period directly reflected the USPSTF guideline change (eg, “New Screening Guidelines for Cervical Cancer”), and many topics in both time periods addressed basic screening promotion and education (eg, “It is Cervical Cancer Awareness Month! Click the link to see where you can receive a free or low cost Pap test.”) Conclusions We demonstrated that machine learning tools can be useful for analyzing cervical cancer prevention and screening discussions on Twitter, and that significant publicly available information about cervical cancer screening exists on social media sites. Moreover, we observed a direct impact of the guideline change within the Twitter messages. PMID:27288093
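
    As a rough illustration of topic extraction from short messages, the sketch below factors a TF-IDF term matrix with non-negative matrix factorization; this is a generic stand-in, not the study's sparse algorithms or its singular value criterion.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny corpus standing in for tweets from one time window.
tweets = [
    "New screening guidelines for cervical cancer released",
    "Free pap test this month for cervical cancer awareness",
    "Pap smear guidelines changed by task force",
    "Get a low cost pap test at the clinic",
]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(tweets)

# Factor the term matrix into a small number of topics and list top terms.
nmf = NMF(n_components=2, random_state=0).fit(X)
terms = tfidf.get_feature_names_out()
for k, comp in enumerate(nmf.components_):
    top = [terms[i] for i in comp.argsort()[::-1][:4]]
    print(f"topic {k}:", ", ".join(top))
```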

  12. Towards application of rule learning to the meta-analysis of clinical data: an example of the metabolic syndrome.

    PubMed

    Wojtusiak, Janusz; Michalski, Ryszard S; Simanivanh, Thipkesone; Baranova, Ancha V

    2009-12-01

    Systematic reviews and meta-analysis of published clinical datasets are an important part of medical research. By combining the results of multiple studies, meta-analysis is able to increase confidence in its conclusions, validate particular study results, and sometimes lead to new findings. Extensive theory has been built on how to aggregate results from multiple studies and arrive at statistically valid conclusions. Surprisingly, very little has been done to adopt advanced machine learning methods to support meta-analysis. In this paper we describe a novel machine learning methodology that is capable of inducing accurate and easy-to-understand attributional rules from aggregated data. Thus, the methodology can be used to support traditional meta-analysis in systematic reviews. Most machine learning applications give primary attention to the predictive accuracy of the learned knowledge, and lesser attention to its understandability. Here we employed attributional rules, a special form of rules that are relatively easy to interpret for medical experts who are not necessarily trained in statistics and meta-analysis. The methodology has been implemented and initially tested on a set of publicly available clinical data describing patients with metabolic syndrome (MS). The objective of this application was to determine rules describing combinations of clinical parameters used for metabolic syndrome diagnosis, and to develop rules for predicting whether particular patients are likely to develop secondary complications of MS. The aggregated clinical data were retrieved from 20 separate hospital cohorts that included 12 groups of patients with liver disease symptoms and 8 control groups of healthy subjects. A total of 152 attributes were used, most of which were, however, measured in different studies. The twenty most common attributes were selected for the rule learning process. By applying the developed rule learning methodology we arrived at several different possible rulesets that can be used to predict three considered complications of MS, namely nonalcoholic fatty liver disease (NAFLD), simple steatosis (SS), and nonalcoholic steatohepatitis (NASH).

  13. Profits, commercial food supplier involvement, and school vending machine snack food availability: implications for implementing the new competitive foods rule.

    PubMed

    Terry-McElrath, Yvonne M; Hood, Nancy E; Colabianchi, Natalie; O'Malley, Patrick M; Johnston, Lloyd D

    2014-07-01

    The 2013-2014 school year involved preparation for implementing the new US Department of Agriculture (USDA) competitive foods nutrition standards. An awareness of associations between commercial supplier involvement, food vending practices, and food vending item availability may assist schools in preparing for the new standards. Analyses used 2007-2012 questionnaire data from administrators of 814 middle and 801 high schools in the nationally representative Youth, Education, and Society study to examine prevalence of profit from and commercial involvement with vending machine food sales, and associations between such measures and food availability. Profits for the school district were associated with decreased low-nutrient, energy-dense (LNED) food availability and increased fruit/vegetable availability. Profits for the school and use of company suppliers were associated with increased LNED availability; company suppliers also were associated with decreased fruit/vegetable availability. Supplier "say" in vending food selection was associated with increased LNED availability and decreased fruit/vegetable availability. Results support (1) increased district involvement with school vending policies and practices, and (2) limited supplier "say" as to what items are made available in student-accessed vending machines. Schools and districts should pay close attention to which food items replace vending machine LNED foods following implementation of the new nutrition standards. © 2014, American School Health Association.

  14. Machine vision system for measuring conifer seedling morphology

    NASA Astrophysics Data System (ADS)

    Rigney, Michael P.; Kranzler, Glenn A.

    1995-01-01

    A PC-based machine vision system providing rapid measurement of bare-root tree seedling morphological features has been designed. The system uses backlighting and a 2048-pixel line- scan camera to acquire images with transverse resolutions as high as 0.05 mm for precise measurement of stem diameter. Individual seedlings are manually loaded on a conveyor belt and inspected by the vision system in less than 0.25 seconds. Designed for quality control and morphological data acquisition by nursery personnel, the system provides a user-friendly, menu-driven graphical interface. The system automatically locates the seedling root collar and measures stem diameter, shoot height, sturdiness ratio, root mass length, projected shoot and root area, shoot-root area ratio, and percent fine roots. Sample statistics are computed for each measured feature. Measurements for each seedling may be stored for later analysis. Feature measurements may be compared with multi-class quality criteria to determine sample quality or to perform multi-class sorting. Statistical summary and classification reports may be printed to facilitate the communication of quality concerns with grading personnel. Tests were conducted at a commercial forest nursery to evaluate measurement precision. Four quality control personnel measured root collar diameter, stem height, and root mass length on each of 200 conifer seedlings. The same seedlings were inspected four times by the machine vision system. Machine stem diameter measurement precision was four times greater than that of manual measurements. Machine and manual measurements had comparable precision for shoot height and root mass length.

  15. Effects of the sliding rehabilitation machine on balance and gait in chronic stroke patients - a controlled clinical trial.

    PubMed

    Byun, Seung-Deuk; Jung, Tae-Du; Kim, Chul-Hyun; Lee, Yang-Soo

    2011-05-01

    To investigate the effects of a sliding rehabilitation machine on balance and gait in chronic stroke patients. A non-randomized crossover design. Inpatient rehabilitation in a general hospital. Thirty patients with chronic stroke who had medium or high falling risk as determined by the Berg Balance Scale. Participants were divided into two groups and underwent four weeks of training. Group A (n = 15) underwent training with the sliding rehabilitation machine for two weeks with concurrent conventional training, followed by conventional training only for another two weeks. Group B (n = 15) underwent the same training in reverse order. The effect of the experimental period was defined as the sum of changes during training with sliding rehabilitation machine in each group, and the effect of the control period was defined as those during the conventional training only in each group. Functional Ambulation Category, Berg Balance Scale, Six-Minute Walk Test, Timed Up and Go Test, Korean Modified Barthel Index, Modified Ashworth Scale and Manual Muscle Test. Statistically significant improvements were observed in all parameters except Modified Ashworth Scale in the experimental period, but only in Six-Minute Walk Test (P < 0.01) in the control period. There were also statistically significant differences in the degree of change in all parameters in the experimental period as compared to the control period. The sliding rehabilitation machine may be a useful tool for the improvement of balance and gait abilities in chronic stroke patients.

  16. The association between the availability of sugar-sweetened beverage in school vending machines and its consumption among adolescents in California: a propensity score matching approach.

    PubMed

    Shi, Lu

    2010-01-01

    There is controversy over the degree to which banning sugar-sweetened beverage (SSB) sales at schools could decrease SSB intake. This paper uses the adolescent sample of the 2005 California Health Interview Survey to estimate the association between the availability of SSBs from school vending machines and the amount of SSB consumption. Propensity score stratification and kernel-based propensity score matching are used to address the selection bias inherent in cross-sectional data. Propensity score stratification shows that adolescents who had access to SSBs through their school vending machines consumed 0.170 more drinks of SSB than those who did not (P < .05). Kernel-based propensity score matching shows the SSB consumption difference to be 0.158 drinks on the prior day (P < .05). This paper strengthens the evidence for the association between SSB availability via school vending machines and actual SSB consumption, while future studies are needed to explore changes in other beverages after SSBs become less available.
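
    A minimal sketch of propensity score stratification follows, on synthetic data: a logistic model estimates each adolescent's probability of exposure to vending-machine SSBs, strata are formed on score quintiles, and within-stratum outcome differences are averaged. Variable names and the data-generating process are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for survey data: x = covariates, t = SSB available at
# school (treatment), y = drinks of SSB consumed on the prior day.
rng = np.random.default_rng(6)
n = 2000
x = rng.standard_normal((n, 4))
t = (x[:, 0] + rng.standard_normal(n) > 0).astype(int)
y = 0.17 * t + 0.3 * x[:, 0] + rng.standard_normal(n)

# Stage 1: estimate propensity scores P(t = 1 | x).
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Stage 2: stratify on propensity-score quintiles and average the
# within-stratum treated-minus-control differences in consumption.
df = pd.DataFrame({"t": t, "y": y, "stratum": pd.qcut(ps, 5, labels=False)})
effects = df.groupby("stratum").apply(
    lambda g: g.loc[g.t == 1, "y"].mean() - g.loc[g.t == 0, "y"].mean())
print("stratified effect estimate:", round(effects.mean(), 3))
```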

  17. Healthier choices in an Australian health service: a pre-post audit of an intervention to improve the nutritional value of foods and drinks in vending machines and food outlets

    PubMed Central

    2013-01-01

    Background Vending machines and shops located within health care facilities are a source of food and drinks for staff, visitors and outpatients and they have the potential to promote healthy food and drink choices. This paper describes perceptions of parents and managers of health-service located food outlets towards the availability and labelling of healthier food options and the food and drinks offered for sale in health care facilities in Australia. It also describes the impact of an intervention to improve availability and labelling of healthier foods and drinks for sale. Methods Parents (n = 168) and food outlet managers (n = 17) were surveyed. Food and drinks for sale in health-service operated food outlets (n = 5) and vending machines (n = 90) in health care facilities in the Hunter New England region of NSW were audited pre (2007) and post (2010/11) the introduction of policy and associated support to increase the availability of healthier choices. A traffic light system was used to classify foods from least (red) to most healthy choices (green). Results Almost all (95%) parents and most (65%) food outlet managers thought food outlets on health service sites should have signs clearly showing healthy choices. Parents (90%) also thought all food outlets on health service sites should provide mostly healthy items compared to 47% of managers. The proportion of healthier beverage slots in vending machines increased from 29% to 51% at follow-up and the proportion of machines that labelled healthier drinks increased from 0 to 26%. No outlets labelled healthier items at baseline compared to 4 out of 5 after the intervention. No changes were observed in the availability or labelling of healthier food in vending machines or the availability of healthier food or drinks in food outlets. Conclusions Baseline availability and labelling of healthier food and beverage choices for sale in health care facilities was poor in spite of the support of parents and outlet managers for such initiatives. The intervention encouraged improvements in the availability and labelling of healthier drinks but not foods in vending machines. PMID:24274916

  18. Healthier choices in an Australian health service: a pre-post audit of an intervention to improve the nutritional value of foods and drinks in vending machines and food outlets.

    PubMed

    Bell, Colin; Pond, Nicole; Davies, Lynda; Francis, Jeryl Lynn; Campbell, Elizabeth; Wiggers, John

    2013-11-25

    Vending machines and shops located within health care facilities are a source of food and drinks for staff, visitors and outpatients and they have the potential to promote healthy food and drink choices. This paper describes perceptions of parents and managers of health-service located food outlets towards the availability and labelling of healthier food options and the food and drinks offered for sale in health care facilities in Australia. It also describes the impact of an intervention to improve availability and labelling of healthier foods and drinks for sale. Parents (n = 168) and food outlet managers (n = 17) were surveyed. Food and drinks for sale in health-service operated food outlets (n = 5) and vending machines (n = 90) in health care facilities in the Hunter New England region of NSW were audited pre (2007) and post (2010/11) the introduction of policy and associated support to increase the availability of healthier choices. A traffic light system was used to classify foods from least (red) to most healthy choices (green). Almost all (95%) parents and most (65%) food outlet managers thought food outlets on health service sites should have signs clearly showing healthy choices. Parents (90%) also thought all food outlets on health service sites should provide mostly healthy items compared to 47% of managers. The proportion of healthier beverage slots in vending machines increased from 29% to 51% at follow-up and the proportion of machines that labelled healthier drinks increased from 0 to 26%. No outlets labelled healthier items at baseline compared to 4 out of 5 after the intervention. No changes were observed in the availability or labelling of healthier food in vending machines or the availability of healthier food or drinks in food outlets. Baseline availability and labelling of healthier food and beverage choices for sale in health care facilities was poor in spite of the support of parents and outlet managers for such initiatives. The intervention encouraged improvements in the availability and labelling of healthier drinks but not foods in vending machines.

  19. Are products sold in university vending machines nutritionally poor? A food environment audit.

    PubMed

    Grech, Amanda; Hebden, Lana; Roy, Rajshri; Allman-Farinelli, Margaret

    2017-04-01

    (i) To audit the nutritional composition, promotion and cost of products available from vending machines accessible to young adults; and (ii) to examine the relationship between product availability and sales. A cross-sectional analysis of snacks and beverages available and purchased at a large urban university was conducted between March and September 2014. Sales were electronically tracked for nine months. A total of 61 vending machines were identified; 95% (n = 864) of the available snacks and 49% of beverages (n = 455) were less-healthy items. The mean (SD) nutrient value of snacks sold was: energy 1173 kJ (437.5), saturated fat 5.36 g (3.6), sodium 251 mg (219), fibre 1.56 g (1.29) and energy density 20.16 kJ/g (2.34) per portion vended. There was a strong correlation between the availability of food and beverages and purchases (R² = 0.98, P < 0.001). Vending machines market and sell less-healthy food and beverages to university students. Efforts to improve the nutritional quality are indicated and afford an opportunity to improve the diet quality of young adults, a group at risk of obesity. © 2016 Dietitians Association of Australia.

  20. Understanding dental CAD/CAM for restorations--dental milling machines from a mechanical engineering viewpoint. Part B: labside milling machines.

    PubMed

    Lebon, Nicolas; Tapie, Laurent; Duret, Francois; Attal, Jean-Pierre

    2016-01-01

    Nowadays, dental numerical controlled (NC) milling machines are available for dental laboratories (labside solution) and dental production centers. This article provides a mechanical engineering approach to NC milling machines to help dental technicians understand the involvement of technology in digital dentistry practice. The technical and economic criteria are described for four labside and two production center dental NC milling machines available on the market. The technical criteria are focused on the capacities of the embedded technologies of milling machines to mill prosthetic materials and various restoration shapes. The economic criteria are focused on investment cost and interoperability with third-party software. The clinical relevance of the technology is discussed through the accuracy and integrity of the restoration. It can be asserted that dental production center milling machines offer a wider range of materials and types of restoration shapes than labside solutions, while labside solutions offer a wider range than chairside solutions. The accuracy and integrity of restorations may be improved as a function of the embedded technologies provided. However, the more complex the technical solutions available, the more skilled the user must be. Investment cost and interoperability with third-party software increase according to the quality of the embedded technologies implemented. Each private dental practice may decide which fabrication option to use depending on the scope of the practice.

  1. Machinability of titanium metal matrix composites (Ti-MMCs)

    NASA Astrophysics Data System (ADS)

    Aramesh, Maryam

    Titanium metal matrix composites (Ti-MMCs), as a new generation of materials, have various potential applications in the aerospace and automotive industries. The presence of ceramic particles enhances the physical and mechanical properties of the alloy matrix. However, the hard and abrasive nature of these particles causes various issues for machinability. Severe tool wear and short tool life are the most important drawbacks of machining this class of materials. There is very limited work in the literature regarding the machinability of this class of materials, especially in the areas of tool life estimation and tool wear. By far, polycrystalline diamond (PCD) tools appear to be the best choice for machining MMCs from the researchers' point of view. However, due to their high cost, economical alternatives are sought. Cubic boron nitride (CBN) inserts, the second hardest tools available, show superior characteristics such as great wear resistance, high hardness at elevated temperatures, a low coefficient of friction and a high melting point. Yet, so far, CBN tools have not been studied in the machining of Ti-MMCs. Here, a comprehensive study has been performed to explore the tool wear mechanisms of CBN inserts during turning of Ti-MMCs. The unique morphology of the worn faces of the tools was investigated for the first time, which led to new insights into the identification of chemical wear mechanisms during machining of Ti-MMCs. Utilizing the full tool life capacity of cutting tools is also very crucial, due to the considerable costs associated with suboptimal replacement of tools. This strongly motivates the development of a reliable model for tool life estimation under any cutting conditions. In this study, a novel model based on survival analysis methodology is developed to estimate the progressive states of tool wear under any cutting conditions during machining of Ti-MMCs. This statistical model takes into account the machining time in addition to the effect of cutting parameters. Promising results were obtained, showing very good agreement with the experimental results. Moreover, a more advanced model was constructed by adding tool wear as another variable to the previous model. A new model was thereby proposed for estimating the remaining life of worn inserts under different cutting conditions, using current tool wear data as an input. The results of this model were validated against the experiments and were well consistent with them.
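
    The thesis's survival model is not specified in the abstract; as a generic illustration of statistical tool-life estimation, the sketch below fits a two-parameter Weibull distribution to hypothetical insert lives with SciPy and reads off a reliability value.

```python
import numpy as np
from scipy import stats

# Hypothetical tool-life data (minutes to a wear criterion) for CBN inserts
# at one cutting condition; a real study would fit per-condition models.
lives = np.array([6.2, 7.9, 8.4, 9.1, 10.3, 11.0, 12.6, 14.2])

# Fit a two-parameter Weibull (location fixed at zero), a common
# survival-analysis choice for tool-life scatter.
shape, loc, scale = stats.weibull_min.fit(lives, floc=0)
print(f"shape (beta) = {shape:.2f}, scale (eta) = {scale:.2f} min")

# Reliability at t = 10 min: probability a tool survives beyond 10 minutes.
print("R(10) =", round(stats.weibull_min.sf(10, shape, loc, scale), 3))
```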

  2. When Machines Think: Radiology's Next Frontier.

    PubMed

    Dreyer, Keith J; Geis, J Raymond

    2017-12-01

    Artificial intelligence (AI), machine learning, and deep learning are terms now seen frequently, all of which refer to computer algorithms that change as they are exposed to more data. Many of these algorithms are surprisingly good at recognizing objects in images. The combination of large amounts of machine-consumable digital data, increased and cheaper computing power, and increasingly sophisticated statistical models combine to enable machines to find patterns in data in ways that are not only cost-effective but also potentially beyond humans' abilities. Building an AI algorithm can be surprisingly easy. Understanding the associated data structures and statistics, on the other hand, is often difficult and obscure. Converting the algorithm into a sophisticated product that works consistently in broad, general clinical use is complex and incompletely understood. To show how these AI products reduce costs and improve outcomes will require clinical translation and industrial-grade integration into routine workflow. Radiology has the chance to leverage AI to become a center of intelligently aggregated, quantitative, diagnostic information. Centaur radiologists, formed as a synergy of human plus computer, will provide interpretations using data extracted from images by humans and image-analysis computer algorithms, as well as the electronic health record, genomics, and other disparate sources. These interpretations will form the foundation of precision health care, or care customized to an individual patient. © RSNA, 2017.

  3. Machinery Bearing Fault Diagnosis Using Variational Mode Decomposition and Support Vector Machine as a Classifier

    NASA Astrophysics Data System (ADS)

    Rama Krishna, K.; Ramachandran, K. I.

    2018-02-01

    Crack propagation is a major cause of failure in rotating machines. It adversely affects productivity, safety, and machining quality. Hence, detecting the crack's severity accurately is imperative for the predictive maintenance of such machines. Fault diagnosis is an established approach to identifying faults by observing the non-linear behaviour of vibration signals at various operating conditions. In this work, we find the classification efficiencies for both the original and the reconstructed vibration signals. The reconstructed signals are obtained using Variational Mode Decomposition (VMD), by splitting the original signal into three intrinsic mode function components and framing them accordingly. Feature extraction, feature selection and feature classification are the three phases in obtaining the classification efficiencies. In the feature extraction phase, the statistical features of the original and reconstructed signals are computed individually. A few statistical parameters are selected in the feature selection phase and are classified using the SVM classifier. The obtained results show the best parameters and the appropriate kernel in the SVM classifier for detecting faults in bearings. Hence, we conclude that VMD preprocessing followed by SVM gives better results than SVM on the raw signals, owing to the denoising and filtering of the raw vibration signals.
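
    A hedged sketch of the downstream classification stage follows: time-domain statistical features computed per signal segment and fed to an SVM. The VMD reconstruction itself is assumed to have been done elsewhere; the segments here are synthetic.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def stat_features(sig):
    """Common time-domain statistics used in bearing-fault diagnosis."""
    rms = np.sqrt(np.mean(sig ** 2))
    return [sig.mean(), sig.std(), stats.skew(sig), stats.kurtosis(sig),
            rms, np.max(np.abs(sig)) / rms]   # RMS and crest factor

# Synthetic segments standing in for (VMD-reconstructed) vibration signals;
# class 1 adds impulsive content, mimicking a bearing defect.
rng = np.random.default_rng(7)
segs, labels = [], []
for k in (0, 1):
    for _ in range(40):
        s = rng.standard_normal(2048)
        if k:
            s[::128] += 5.0                 # periodic impacts
        segs.append(stat_features(s))
        labels.append(k)

X, y = np.array(segs), np.array(labels)
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```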

  4. Occupational Accidents with Agricultural Machinery in Austria.

    PubMed

    Kogler, Robert; Quendler, Elisabeth; Boxberger, Josef

    2016-01-01

    The number of recognized accidents with fatalities during agricultural and forestry work, despite better technology and coordinated prevention and training, is still very high in Austria. The accident scenarios in which people are injured vary widely across farms. The common causes of accidents in agriculture and forestry are loss of control of a machine, means of transport or handling equipment, hand-held tool, or object or animal, followed by slipping, stumbling and falling, and by the breakage, bursting, splitting, fall, or collapse of a material agent. In the literature, a number of studies of general (machine- and animal-related) and specific (machine-related) agricultural and forestry accident situations can be found that refer to different databases. Using the Austrian Workers' Compensation Board (AUVA) database of occupational accidents with different agricultural machinery over the period 2008-2010 in Austria, the main characteristics of the accident, the victim, and the employer, as well as variables on causes and circumstances, were statistically analyzed by frequency and context using the chi-square test and odds ratios. The aim of the study was to determine the information content and quality of the European Statistics on Accidents at Work (ESAW) variables in order to evaluate safety gaps and risks as well as the accidental man-machine interaction.
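
    The statistical machinery named here is standard; a minimal sketch with SciPy on a hypothetical 2x2 contingency table shows the chi-square test and odds-ratio computation. The counts are invented for illustration, not AUVA data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: machine type (tractor vs. other) by injury
# severity (severe vs. minor), standing in for ESAW-coded AUVA records.
table = np.array([[40, 160],
                  [25, 275]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")

# Odds ratio of severe injury for tractors relative to other machinery.
(a, b), (c, d) = table
print("odds ratio =", round((a * d) / (b * c), 2))
```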

  5. oneChannelGUI: a graphical interface to Bioconductor tools, designed for life scientists who are not familiar with R language.

    PubMed

    Sanges, Remo; Cordero, Francesca; Calogero, Raffaele A

    2007-12-15

    OneChannelGUI is an add-on Bioconductor package providing a new set of functions extending the capability of the affylmGUI package. This library provides a graphical user interface (GUI) for Bioconductor libraries to be used for quality control, normalization, filtering, statistical validation and data mining for single-channel microarrays. Affymetrix 3' expression (IVT) arrays as well as the new whole-transcript expression arrays, i.e. gene/exon 1.0 ST, are currently supported. oneChannelGUI is available for most platforms on which R runs, i.e. Windows and Unix-like machines. http://www.bioconductor.org/packages/2.0/bioc/html/oneChannelGUI.html

  6. Travelogue--a newcomer encounters statistics and the computer.

    PubMed

    Bruce, Peter

    2011-11-01

    Computer-intensive methods have revolutionized statistics, giving rise to new areas of analysis and expertise in predictive analytics, image processing, pattern recognition, machine learning, genomic analysis, and more. Interest naturally centers on the new capabilities the computer allows the analyst to bring to the table. This article, instead, focuses on the account of how computer-based resampling methods, with their relative simplicity and transparency, enticed one individual, untutored in statistics or mathematics, on a long journey into learning statistics, then teaching it, then starting an education institution.
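
    A minimal example of the kind of computer-intensive resampling method the article refers to, a bootstrap confidence interval for a mean, assuming nothing about the underlying distribution:

```python
import numpy as np

# A minimal bootstrap: resample the data with replacement many times to get
# a confidence interval for the mean without any distributional formula.
rng = np.random.default_rng(8)
data = rng.exponential(scale=2.0, size=50)      # hypothetical sample

boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```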

  7. Evaluation of liquefaction potential of soil based on standard penetration test using multi-gene genetic programming model

    NASA Astrophysics Data System (ADS)

    Muduli, Pradyut; Das, Sarat

    2014-06-01

    This paper discusses the evaluation of the liquefaction potential of soil based on a standard penetration test (SPT) dataset using an evolutionary artificial intelligence technique, multi-gene genetic programming (MGGP). The liquefaction classification accuracy (94.19%) of the developed liquefaction index (LI) model is found to be better than that of the available artificial neural network (ANN) model (88.37%) and on par with the available support vector machine (SVM) model (94.19%) on the basis of the testing data. Further, an empirical equation is presented using MGGP to approximate the unknown limit state function representing the cyclic resistance ratio (CRR) of soil based on the developed LI model. Using an independent database of 227 cases, the overall rates of successful prediction of the occurrence of liquefaction and non-liquefaction are found to be 87, 86, and 84% for the developed MGGP-based model, the available ANN model, and the statistical model, respectively, on the basis of the calculated factor of safety (Fs) against liquefaction occurrence.
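
    MGGP implementations vary; as a simplified, single-expression analogue, the sketch below uses the third-party gplearn library (assumed installed) for symbolic regression on synthetic SPT-like data. It is not the authors' model or dataset.

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor  # third-party GP library

# Synthetic stand-in for SPT records: x0 ~ blow count, x1 ~ stress ratio;
# the target mimics a liquefaction index. MGGP combines several evolved
# trees; gplearn evolves a single expression, so this is a simplification.
rng = np.random.default_rng(9)
X = rng.random((200, 2))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] * X[:, 0] + 0.05 * rng.standard_normal(200)

gp = SymbolicRegressor(population_size=500, generations=10,
                       function_set=("add", "sub", "mul"), random_state=0)
gp.fit(X, y)
print(gp._program)                         # the evolved symbolic expression
print("R^2 on training data:", round(gp.score(X, y), 3))
```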

  8. SCENERY: a web application for (causal) network reconstruction from cytometry data.

    PubMed

    Papoutsoglou, Georgios; Athineou, Giorgos; Lagani, Vincenzo; Xanthopoulos, Iordanis; Schmidt, Angelika; Éliás, Szabolcs; Tegnér, Jesper; Tsamardinos, Ioannis

    2017-07-03

    Flow and mass cytometry technologies can probe proteins as biological markers in thousands of individual cells simultaneously, providing unprecedented opportunities for reconstructing networks of protein interactions through machine learning algorithms. The network reconstruction (NR) problem has been well-studied by the machine learning community. However, the potentials of available methods remain largely unknown to the cytometry community, mainly due to their intrinsic complexity and the lack of comprehensive, powerful and easy-to-use NR software implementations specific for cytometry data. To bridge this gap, we present Single CEll NEtwork Reconstruction sYstem (SCENERY), a web server featuring several standard and advanced cytometry data analysis methods coupled with NR algorithms in a user-friendly, on-line environment. In SCENERY, users may upload their data and set their own study design. The server offers several data analysis options categorized into three classes of methods: data (pre)processing, statistical analysis and NR. The server also provides interactive visualization and download of results as ready-to-publish images or multimedia reports. Its core is modular and based on the widely-used and robust R platform allowing power users to extend its functionalities by submitting their own NR methods. SCENERY is available at scenery.csd.uoc.gr or http://mensxmachina.org/en/software/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. VirtualSpace: A vision of a machine-learned virtual space environment

    NASA Astrophysics Data System (ADS)

    Bortnik, J.; Sarno-Smith, L. K.; Chu, X.; Li, W.; Ma, Q.; Angelopoulos, V.; Thorne, R. M.

    2017-12-01

    Space borne instrumentation tends to come and go. A typical instrument will go through a phase of design and construction, be deployed on a spacecraft for several years while it collects data, and then be decommissioned and fade into obscurity. The data collected from that instrument will typically receive much attention while it is being collected, perhaps in the form of event studies, conjunctions with other instruments, or a few statistical surveys, but once the instrument or spacecraft is decommissioned, the data will be archived and receive progressively less attention with every passing year. This is the fate of all historical data, and will be the fate of data being collected by instruments even at the present time. But what if those instruments could come alive, and all be simultaneously present at any and every point in time and space? Imagine the scientific insights, and societal gains that could be achieved with a grand (virtual) heliophysical observatory that consists of every current and historical mission ever deployed? We propose that this is not just fantasy but is imminently doable with the data currently available, with the present computational resources, and with currently available algorithms. This project revitalizes existing data resources and lays the groundwork for incorporating data from every future mission to expand the scope and refine the resolution of the virtual observatory. We call this project VirtualSpace: a machine-learned virtual space environment.

  10. Foods Sold in School Vending Machines are Associated with Overall Student Dietary Intake

    PubMed Central

    Rovner, Alisha J.; Nansel, Tonja R.; Wang, Jing; Iannotti, Ronald J.

    2010-01-01

    Purpose To examine the association between foods sold in school vending machines and students’ dietary behaviors. Methods The 2005-2006 US Health Behavior in School Aged Children (HBSC) survey was administered to 6th to 10th graders and school administrators. Students’ dietary intake was estimated with a brief food frequency measure. Administrators completed questions about foods sold in vending machines. For each food intake behavior, a multilevel regression analysis modeled students (level 1) nested within schools (level 2), with the corresponding food sold in vending machines as the main predictor. Control variables included gender, grade, family affluence and school poverty. Analyses were conducted separately for 6th to 8th and 9th to 10th grades. Results Eighty-three percent of schools (152 schools, 5,930 students) had vending machines which primarily sold foods of minimal nutritional values (soft drinks, chips and sweets). In younger grades, availability of fruits/vegetables and chocolate/sweets was positively related to the corresponding food intake, with vending machine content and school poverty explaining 70.6% of between-school variation in fruit/vegetable consumption, and 71.7% in sweets consumption. In older grades, there was no significant effect of foods available in vending machines on reported consumption of those foods. Conclusions Vending machines are widely available in US public schools. In younger grades, school vending machines were related to students’ diets positively or negatively, depending on what was sold in them. Schools are in a powerful position to influence children’s diets; therefore attention to foods sold in them is necessary in order to try to improve children’s diets. PMID:21185519

  11. Food sold in school vending machines is associated with overall student dietary intake.

    PubMed

    Rovner, Alisha J; Nansel, Tonja R; Wang, Jing; Iannotti, Ronald J

    2011-01-01

    To examine the association between food sold in school vending machines and the dietary behaviors of students. The 2005-2006 U.S. Health Behavior in School-aged Children survey was administered to 6th to 10th graders and school administrators. Dietary intake in students was estimated with a brief food frequency measure. School administrators completed questions regarding food sold in vending machines. For each food intake behavior, a multilevel regression analysis modeled students (level 1) nested within schools (level 2), with the corresponding food sold in vending machines as the main predictor. Control variables included gender, grade, family affluence, and school poverty index. Analyses were conducted separately for 6th to 8th and 9th-10th grades. In all, 83% of the schools (152 schools; 5,930 students) had vending machines that primarily sold food of minimal nutritional values (soft drinks, chips, and sweets). In younger grades, availability of fruit and/or vegetables and chocolate and/or sweets was positively related to the corresponding food intake, with vending machine content and school poverty index providing an explanation for 70.6% of between-school variation in fruit and/or vegetable consumption and 71.7% in sweets consumption. Among the older grades, there was no significant effect of food available in vending machines on reported consumption of those food. Vending machines are widely available in public schools in the United States. In younger grades, school vending machines were either positively or negatively related to the diets of the students, depending on what was sold in them. Schools are in a powerful position to influence the diets of children; therefore, attention to the food sold at school is necessary to try to improve their diets. Copyright © 2011 Society for Adolescent Health and Medicine. All rights reserved.
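
    The multilevel structure described here, students nested within schools, maps naturally onto a random-intercept model. A sketch with statsmodels on synthetic data follows; variable names and effect sizes are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in: students (level 1) nested in schools (level 2);
# 'vend' indicates the food is sold in the school's vending machines.
rng = np.random.default_rng(10)
schools = np.repeat(np.arange(50), 40)
vend = np.repeat(rng.integers(0, 2, 50), 40)
school_eff = np.repeat(0.3 * rng.standard_normal(50), 40)
intake = 0.25 * vend + school_eff + rng.standard_normal(schools.size)
df = pd.DataFrame({"intake": intake, "vend": vend, "school": schools})

# Random-intercept model: fixed effect of vending availability,
# random effect for school.
model = smf.mixedlm("intake ~ vend", df, groups=df["school"]).fit()
print(model.summary())
```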

  12. Pre-use anesthesia machine check; certified anesthesia technician based quality improvement audit.

    PubMed

    Al Suhaibani, Mazen; Al Malki, Assaf; Al Dosary, Saad; Al Barmawi, Hanan; Pogoku, Mahdhav

    2014-01-01

    Quality assurance of providing a work-ready anesthesia machine in multiple operating rooms in a modern tertiary medical center in Riyadh. The aim of the following study is to maintain a high-quality environment for workers and patients in surgical operating rooms. A technician-based audit used key performance indicators to assure inspection and passing of the machine-worthiness test daily and between cases and, in case of unexpected failure, to provide quick replacement with another ready-to-use anesthetic machine. The anesthetic machines in all operating rooms are inspected daily and continuously, passed as ready by technicians, and verified by an anesthesiologist consultant or assistant consultant. The daily records of each machine were collected and then inspected by the quality improvement committee for descriptive analysis, reporting the degree of staff compliance with daily inspection as "met" items, machines replaced during use, and overall compliance. Descriptive statistics were produced using Microsoft Excel 2003 tables and graphs of sums and percentages of the items studied in this audit. The audit found a high compliance percentage and a low rate of machine replacement, the latter indicating quick machine switches when an unexpected machine state arose during use. The authors conclude that regular inspection and running the self-check recommended by the manufacturers can help avert the hazard of anesthesia machine failure during an operation. Furthermore, the ability to quickly replace an anesthesia machine when needed contributes to highly assured operative utilization of the man-machine interface in modern surgical operating rooms.

  13. Improved analyses using function datasets and statistical modeling

    Treesearch

    John S. Hogland; Nathaniel M. Anderson

    2014-01-01

    Raster modeling is an integral component of spatial analysis. However, conventional raster modeling techniques can require a substantial amount of processing time and storage space and have limited statistical functionality and machine learning algorithms. To address this issue, we developed a new modeling framework using C# and ArcObjects and integrated that framework...

  14. In vitro assessment of cutting efficiency and durability of zirconia removal diamond rotary instruments.

    PubMed

    Kim, Joon-Soo; Bae, Ji-Hyeon; Yun, Mi-Jung; Huh, Jung-Bo

    2017-06-01

    Recently, zirconia removal diamond rotary instruments have become commercially available for efficient cutting of zirconia. However, research of cutting efficiency and the cutting characteristics of zirconia removal diamond rotary instruments is limited. The purpose of this in vitro study was to assess and compare the cutting efficiency, durability, and diamond rotary instrument wear pattern of zirconia diamond removal rotary instruments with those of conventional diamond rotary instruments. In addition, the surface characteristics of the cut zirconia were assessed. Block specimens of 3 mol% yttrium cation-doped tetragonal zirconia polycrystal were machined 10 times for 1 minute each using a high-speed handpiece with 6 types of diamond rotary instrument from 2 manufacturers at a constant force of 2 N (n=5). An electronic scale was used to measure the lost weight after each cut in order to evaluate the cutting efficiency. Field emission scanning electron microscopy was used to evaluate diamond rotary instrument wear patterns and machined zirconia block surface characteristics. Data were statistically analyzed using the Kruskal-Wallis test, followed by the Mann-Whitney U test (α=.05). Zirconia removal fine grit diamond rotary instruments showed cutting efficiency that was reduced compared with conventional fine grit diamond rotary instruments. Diamond grit fracture was the most dominant diamond rotary instrument wear pattern in all groups. All machined zirconia surfaces were primarily subjected to plastic deformation, which is evidence of ductile cutting. Zirconia blocks machined with zirconia removal fine grit diamond rotary instruments showed the least incidence of surface flaws. Although zirconia removal diamond rotary instruments did not show improved cutting efficiency compared with conventional diamond rotary instruments, the machined zirconia surface showed smoother furrows of plastic deformation and fewer surface flaws. Copyright © 2016 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.

  15. Statistical Learning Analysis in Neuroscience: Aiming for Transparency

    PubMed Central

    Hanke, Michael; Halchenko, Yaroslav O.; Haxby, James V.; Pollmann, Stefan

    2009-01-01

    Encouraged by a rise of reciprocal interest between the machine learning and neuroscience communities, several recent studies have demonstrated the explanatory power of statistical learning techniques for the analysis of neural data. In order to facilitate a wider adoption of these methods, neuroscientific research needs to ensure a maximum of transparency to allow for comprehensive evaluation of the employed procedures. We argue that such transparency requires “neuroscience-aware” technology for the performance of multivariate pattern analyses of neural data that can be documented in a comprehensive, yet comprehensible way. Recently, we introduced PyMVPA, a specialized Python framework for machine learning based data analysis that addresses this demand. Here, we review its features and applicability to various neural data modalities. PMID:20582270

  16. Machine learning applications in proteomics research: how the past can boost the future.

    PubMed

    Kelchtermans, Pieter; Bittremieux, Wout; De Grave, Kurt; Degroeve, Sven; Ramon, Jan; Laukens, Kris; Valkenborg, Dirk; Barsnes, Harald; Martens, Lennart

    2014-03-01

    Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Manipulating Slot Machine Preference in Problem Gamblers through Contextual Control

    ERIC Educational Resources Information Center

    Nastally, Becky L.; Dixon, Mark R.; Jackson, James W.

    2010-01-01

    Pathological and nonpathological gamblers completed a task that assessed preference among 2 concurrently available slot machines. Subsequent assessments of choice were conducted after various attempts to transfer contextual functions associated with irrelevant characteristics of the slot machines. Results indicated that the nonproblem gambling…

  18. Effectiveness and efficiency of different weight machine-based strength training programmes for patients with hip or knee osteoarthritis: a protocol for a quasi-experimental controlled study in the context of health services research.

    PubMed

    Krauss, Inga; Müller, Gerhard; Steinhilber, Benjamin; Haupt, Georg; Janssen, Pia; Martus, Peter

    2017-01-01

    Osteoarthritis is a chronic musculoskeletal disease with a major impact on the individual and the healthcare system. As there is no cure, therapy aims at symptom relief and reduction of disease progression. Physical exercise has been defined as a core treatment for osteoarthritis. However, research questions related to dose response, sustainability of effects, economic efficiency and safety are still open and will be evaluated in this trial, which investigates a progressive weight machine-based strength training. This is a quasi-experimental controlled trial in the context of health services research. The intervention group (n=300) is recruited from participants in a programme offered to insurants of a health insurance company who suffer from hip or knee osteoarthritis. Potential participants for the control group are selected from the insurance database and contacted in writing according to predefined matching criteria. The final statistical twins among the control responders will be determined via propensity score matching (n=300). The training intervention comprises 24 supervised mandatory sessions (2/week) and another 12 facultative sessions (1/week). Exercises include resistance training for the lower extremity and core muscles by use of weight machines and small training devices. The training offer is available at two sites, which differ with respect to the weight machines in use, resulting in different dosage parameters. Primary outcomes are self-reported pain and function immediately after the 12-week intervention period. Health-related quality of life, self-efficacy, cost utility and safety will be evaluated as secondary outcomes. Secondary analysis will be undertaken with two strata related to study site. Participants will be followed up 6, 12 and 24 months after baseline. German Clinical Trial Register DRKS00009257. Pre-results.

  19. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

    PubMed

    Chen, Zhenyu; Li, Jianping; Wei, Liwei

    2007-10-01

    Recently, gene expression profiling using microarray techniques has been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise, and the number of genes is overwhelming relative to the number of available samples. This poses a great challenge for machine learning and statistical techniques. The support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver the user a transparent decision process. How to explain the computed solutions and present the extracted knowledge becomes a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple parameter learning problem. A shrinkage approach, 1-norm-based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach that uses the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets, the leukemia dataset and the colon tumor dataset, are used to demonstrate the performance of this approach. Using the small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.
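
    The core of the scheme above is a 1-norm (L1) shrinkage that drives most gene weights to zero, so the surviving genes are the selected features. The sketch below illustrates that idea on synthetic data using scikit-learn's L1-penalized linear SVM as a stand-in for the authors' linear-programming formulation; the data sizes and C value are assumptions.

    ```python
    # Sketch of 1-norm (L1) shrinkage for sparse gene selection, in the spirit
    # of the MK-SVM scheme described above. scikit-learn's L1-penalized linear
    # SVM stands in for the authors' linear-programming formulation.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(1)
    X = rng.standard_normal((60, 2000))     # 60 samples, 2000 genes (synthetic)
    y = rng.integers(0, 2, 60)
    X[y == 1, :10] += 1.0                   # 10 informative genes

    clf = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)
    selected = np.flatnonzero(clf.coef_[0]) # genes with nonzero weights survive
    print(f"{selected.size} genes selected:", selected[:10])
    ```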

  20. Compilation of DNA sequences of Escherichia coli (update 1991)

    PubMed Central

    Kröger, Manfred; Wahl, Ralf; Rice, Peter

    1991-01-01

    We have compiled the DNA sequence data for E.coli available from the GENBANK and EMBL data libraries and, over a period of several years, independently from the literature. This is the third listing, replacing the former listing and extending it by roughly one fifth. However, in order to save space this printed version contains DNA sequence information only. The complete compilation is now available in machine-readable form from the EMBL data library (ECD release 6). After deletion of all detected overlaps, a total of 1 492 282 individual bp is found to be determined up to the beginning of 1991. This corresponds to a total of 31.62% of the entire E.coli chromosome, which consists of about 4,720 kbp. This number may actually be higher by some extra 2.5% derived from lysogenic bacteriophage lambda and various DNA sequences included for statistical purposes only. PMID:2041799

  1. 48 CFR 52.223-13 - Acquisition of EPEAT®-Registered Imaging Equipment.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ...) Facsimile machine (fax machine)—A commercially available imaging product whose primary functions are... available imaging product with a sole function of the production of hard copy duplicates from graphic hard... functionally integrated components, that performs two or more of the core functions of copying, printing...

  2. Applications of Support Vector Machines In Chemo And Bioinformatics

    NASA Astrophysics Data System (ADS)

    Jayaraman, V. K.; Sundararajan, V.

    2010-10-01

    Conventional linear and nonlinear tools for classification, regression and data-driven modeling are being replaced at a rapid pace by newer techniques and tools based on artificial intelligence and machine learning. While linear techniques are not applicable to inherently nonlinear problems, the newer methods serve as attractive alternatives for solving real-life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and the structural risk minimization principle. SVM regression closely follows the classification methodology. In this work, recent applications of SVM in chemo- and bioinformatics are described with suitable illustrative examples.

  3. Design features and results from fatigue reliability research machines.

    NASA Technical Reports Server (NTRS)

    Lalli, V. R.; Kececioglu, D.; Mcconnell, J. B.

    1971-01-01

    The design, fabrication, development, operation, calibration and results from reversed-bending combined with steady-torque fatigue research machines are presented. Fifteen-centimeter-long, notched, SAE 4340 steel specimens are subjected to various combinations of these stresses and cycled to failure. Failure occurs when the crack in the notch passes through the specimen, automatically shutting down the test machine. These cycles-to-failure data are statistically analyzed to develop a probabilistic S-N diagram. These diagrams have many uses; a rotating-component design example given in the literature shows that minimum size and weight for a specified number of cycles and reliability can be calculated using these diagrams.
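
    The building block of a probabilistic S-N diagram is a fitted failure-probability distribution for cycles-to-failure at each stress level. The sketch below fits a Weibull distribution with SciPy; the Weibull model and the synthetic data are assumptions, not necessarily the statistical treatment used in the original study.

    ```python
    # Sketch: fitting a distribution to cycles-to-failure data at one stress
    # level, the building block of a probabilistic S-N diagram. The Weibull
    # model is a common assumption here, not necessarily the authors' choice.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    cycles = stats.weibull_min.rvs(c=2.0, scale=1e5, size=30, random_state=rng)

    shape, loc, scale = stats.weibull_min.fit(cycles, floc=0)
    # Cycles at which 10% of specimens are expected to have failed (B10 life)
    b10 = stats.weibull_min.ppf(0.10, shape, loc=loc, scale=scale)
    print(f"shape={shape:.2f}, scale={scale:.3g}, B10 life={b10:.3g} cycles")
    ```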

  4. Promoting the purchase of low-calorie foods from school vending machines: a cluster-randomized controlled study.

    PubMed

    Kocken, Paul L; Eeuwijk, Jennifer; Van Kesteren, Nicole M C; Dusseldorp, Elise; Buijs, Goof; Bassa-Dafesh, Zeina; Snel, Jeltje

    2012-03-01

    Vending machines account for food sales and revenue in schools. We examined 3 strategies for promoting the sale of lower-calorie food products from vending machines in high schools in the Netherlands. A school-based randomized controlled trial was conducted in 13 experimental schools and 15 control schools. Three strategies were tested within each experimental school: increasing the availability of lower-calorie products in vending machines, labeling products, and reducing the price of lower-calorie products. The experimental schools introduced the strategies in 3 consecutive phases, with phase 3 incorporating all 3 strategies. The control schools remained the same. The sales volumes from the vending machines were registered. Products were grouped into (1) extra foods containing empty calories, for example, candies and potato chips, (2) nutrient-rich basic foods, and (3) beverages. They were also divided into favorable, moderately unfavorable, and unfavorable products. Total sales volumes for experimental and control schools did not differ significantly for the extra and beverage products. Proportionally, the higher availability of lower-calorie extra products in the experimental schools led to higher sales of moderately unfavorable extra products than in the control schools, and to higher sales of favorable extra products in experimental schools where students have to stay during breaks. Together, availability, labeling, and price reduction raised the proportional sales of favorable beverages. Results indicate that when the availability of lower-calorie foods is increased and is also combined with labeling and reduced prices, students make healthier choices without buying more or fewer products from school vending machines. Changes to school vending machines help to create a healthy school environment. © 2012, American School Health Association.

  5. Constraining geostatistical models with hydrological data to improve prediction realism

    NASA Astrophysics Data System (ADS)

    Demyanov, V.; Rojas, T.; Christie, M.; Arnold, D.

    2012-04-01

    Geostatistical models reproduce spatial correlation based on the available on-site data and more general concepts about the modelled patterns, e.g. training images. One of the problems of modelling natural systems with geostatistics is maintaining realistic spatial features so that they agree with the physical processes in nature. Tuning the model parameters to the data may lead to geostatistical realisations with unrealistic spatial patterns, which would still honour the data. Such a model would result in poor predictions, even though it fits the available data well. Conditioning the model to a wider range of relevant data provides a remedy that avoids producing unrealistic features in spatial models. For instance, there are vast amounts of information about the geometries of river channels that can be used in describing fluvial environments. Relations between the geometrical channel characteristics (width, depth, wave length, amplitude, etc.) are complex and non-parametric and exhibit a great deal of uncertainty, which is important to propagate rigorously into the predictive model. These relations can be described within a Bayesian approach as multi-dimensional prior probability distributions. We propose a way to constrain multi-point statistics models with intelligent priors obtained from analysing a vast collection of contemporary river patterns based on previously published works. We applied machine learning techniques, namely neural networks and support vector machines, to extract multivariate non-parametric relations between geometrical characteristics of fluvial channels from the available data. An example demonstrates how ensuring geological realism helps to deliver more reliable predictions of a subsurface oil reservoir in a fluvial depositional environment.
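
    The sketch below shows one way such a non-parametric relation between channel geometry variables could be extracted with support vector regression, as the abstract describes. The width-depth power law and all parameter values are synthetic assumptions made purely for illustration.

    ```python
    # Sketch: learning a non-parametric width-depth relation for fluvial
    # channels with support vector regression. Data are synthetic.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    width = rng.uniform(10, 200, 300)                      # channel width, m
    depth = 0.3 * width**0.6 * rng.lognormal(0, 0.2, 300)  # noisy power law

    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
    model.fit(width.reshape(-1, 1), depth)

    # The fitted relation (with its residual spread) can then serve as a prior
    # constraining which geostatistical realisations are geologically realistic.
    print(model.predict(np.array([[50.0], [150.0]])))
    ```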

  6. Evaluation of Cepstrum Algorithm with Impact Seeded Fault Data of Helicopter Oil Cooler Fan Bearings and Machine Fault Simulator Data

    DTIC Science & Technology

    2013-02-01

    of a bearing must be put into practice. There are many potential methods, the most traditional being the use of statistical time-domain features...accelerate degradation to test multiple bearings to gain statistical relevance and extrapolate results to scale for field conditions. Temperature...as time statistics, frequency estimation to improve the fault frequency detection. For future investigations, one can further explore the

  7. Improving Statistical Machine Translation Through N-best List Re-ranking and Optimization

    DTIC Science & Technology

    2014-03-27

    of Master of Science in Cyber Operations Jordan S. Keefer, B.S.C.S. Second Lieutenant, USAF March 2014 DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC...Atlantic Trade Organization NIST National Institute of Standards and Technology NL natural language NSF National Science Foundation ix Acronym Definition...the machine translation problem. In 1964 the Director of the National Science Foundation (NSF), 4 Dr. Leland Haworth, commissioned a research team to

  8. Expert system and process optimization techniques for real-time monitoring and control of plasma processes

    NASA Astrophysics Data System (ADS)

    Cheng, Jie; Qian, Zhaogang; Irani, Keki B.; Etemad, Hossein; Elta, Michael E.

    1991-03-01

    To meet the ever-increasing demand of the rapidly growing semiconductor manufacturing industry, it is critical to have a comprehensive methodology integrating techniques for process optimization, real-time monitoring, and adaptive process control. To this end, we have developed an integrated knowledge-based approach combining the latest expert system technology, machine learning methods, and traditional statistical process control (SPC) techniques. This knowledge-based approach is advantageous in that it makes it possible for the tasks of process optimization and adaptive control to be performed consistently and predictably. Furthermore, this approach can be used to construct high-level, qualitative descriptions of processes and thus make the process behavior easy to monitor, predict, and control. Two software packages, RIST (Rule Induction and Statistical Testing) and KARSM (Knowledge Acquisition from Response Surface Methodology), have been developed and incorporated with two commercially available packages, G2 (a real-time expert system) and ULTRAMAX (a tool for sequential process optimization).

  9. Perceptual basis of evolving Western musical styles

    PubMed Central

    Rodriguez Zivic, Pablo H.; Shifres, Favio; Cecchi, Guillermo A.

    2013-01-01

    The brain processes temporal statistics to predict future events and to categorize perceptual objects. These statistics, called expectancies, are found in music perception, and they span a variety of different features and time scales. Specifically, there is evidence that music perception involves strong expectancies regarding the distribution of a melodic interval, namely, the distance between two consecutive notes within the context of another. The recent availability of a large Western music dataset, consisting of the historical record condensed as melodic interval counts, has opened new possibilities for data-driven analysis of musical perception. In this context, we present an analytical approach that, based on cognitive theories of music expectation and machine learning techniques, recovers a set of factors that accurately identifies historical trends and stylistic transitions between the Baroque, Classical, Romantic, and Post-Romantic periods. We also offer a plausible musicological and cognitive interpretation of these factors, allowing us to propose them as data-driven principles of melodic expectation. PMID:23716669

  10. The influence of maintenance quality of hemodialysis machines on hemodialysis efficiency.

    PubMed

    Azar, Ahmad Taher

    2009-01-01

    Several studies suggest that there is a correlation between dose of dialysis and machine maintenance. However, in spite of the current practice, there are conflicting reports regarding the relationship between dose of dialysis or patient outcome, and machine maintenance. In order to evaluate the impact of hemodialysis machine maintenance on dialysis adequacy Kt/V and session performance, data were processed on 134 patients on 3-times-per-week dialysis regimens by dividing the patients into four groups and also dividing the hemodialysis machines into four groups according to their year of installation. The equilibrated dialysis dose eq Kt/V, urea reduction ratio (URR) and the overall equipment effectiveness (OEE) were calculated in each group to show the effect of hemodialysis machine efficiency on the overall session performance. The average working time per machine per month was 270 hours. The cumulative number of hours according to the year of installation was: 26,122 hours for machines installed in 1998; 21,596 hours for machines installed in 1999; 8,362 hours for those installed in 2003; and 2,486 hours for those installed in 2005. The mean time between failures (MTBF) was 1.8, 2.1, 4.2 and 6 months for machines installed in 1999, 1998, 2003 and 2005, respectively. Statistical analysis demonstrated that the dialysis dose eq Kt/V and URR increased as the overall equipment effectiveness (OEE) increased with regular maintenance procedures. Maintenance has become one of the most expedient approaches to guarantee high machine dependability. The efficiency of the dialysis machine is relevant in assuring proper dialysis adequacy.

  11. Assessing a Novel Method to Reduce Anesthesia Machine Contamination: A Prospective, Observational Trial.

    PubMed

    Biddle, Chuck J; George-Gay, Beverly; Prasanna, Praveen; Hill, Emily M; Davis, Thomas C; Verhulst, Brad

    2018-01-01

    Anesthesia machines are known reservoirs of bacterial species, potentially contributing to healthcare-associated infections (HAIs). An inexpensive, disposable, nonpermeable, transparent anesthesia machine wrap (AMW) may reduce microbial contamination of the anesthesia machine. This study quantified the density and diversity of bacterial species found on anesthesia machines after terminal cleaning and between cases during actual anesthesia care to assess the impact of the AMW. We hypothesized reduced bioburden with the use of the AMW. In a prospective, experimental research design, the AMW was used in 11 surgical cases (intervention group) and not used in 11 control surgical cases. Cases were consecutively assigned to general surgical operating rooms. Seven frequently touched and difficult-to-disinfect "hot spots" were cultured on each machine preceding and following each case. The density and diversity of cultured colony-forming units (CFUs) between the covered and uncovered machines were compared using the Wilcoxon signed-rank test and Student's t-tests. There was a statistically significant reduction in CFU density and diversity when the AMW was employed. The protective effect of the AMW during regular anesthetic care provides a reliable and low-cost method to minimize the transmission of pathogens across patients and potentially reduces HAIs.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keller, J; Hardin, M; Giaddui, T

    Purpose: To test whether unified vendor-specified beam conformance for matched machines implies volumetric modulated arc radiotherapy (VMAT) delivery consistency. Methods: Twenty-two identical patient QA plans, eleven 6MV and eleven 15MV, were delivered to the Delta4 (Scandidos, Uppsala, Sweden) on two Varian TrueBEAM matched machines. Sixteen patient QA plans, nine 6MV and seven 10MV, were delivered to the Delta4 on two Elekta Agility matched machines. The percent dose deviation (%DDev), distance-to-agreement (DTA), and gamma analysis (γ) were collected for all plans, and the differences in measurements were tabulated between matched machines. A paired t-test analysis of the data with an alpha of 0.05 determined statistical significance. Power (P) was calculated to detect a difference of 5%; all data sets except the Elekta %DDev sets were strong, with power above 0.85. Results: The average differences for Varian machines (%DDev, DTA, and γ) are 6.4%, 1.6% and 2.7% for 6MV, respectively, and 8.0%, 0.6%, and 2.5% for 15MV. The average differences for matched Elekta machines (%DDev, DTA, and γ) are 10.2%, 0.6% and 0.9% for 6MV, respectively, and 7.0%, 1.9%, and 2.8% for 10MV. A paired t-test shows that for Varian the %DDev difference is significant for 6MV and 15MV (p-value6MV=0.019, P6MV=0.96; p-value15MV=0.0003, P15MV=0.86). Differences in DTA are insignificant for both 6MV and 15MV (p-value6MV=0.063, P6MV=1; p-value15MV=0.907, P15MV=1). Varian differences in gamma are significant for both energies (p-value6MV=0.025, P6MV=0.99; p-value15MV=0.013, P15MV=1). A paired t-test shows that for Elekta the difference in %DDev is significant for 6MV but not 10MV (p-value6MV=0.00065, P6MV=0.68; p-value10MV=0.262, P10MV=0.39). Differences in DTA are statistically insignificant (p-value6MV=0.803, P6MV=1; p-value10MV=0.269, P10MV=1). Elekta differences in gamma are significant for 10MV only (p-value6MV=0.094, P6MV=1; p-value10MV=0.011, P10MV=1). Conclusion: These results show that vendor-specified beam conformance across machines does not ensure equivalent patient-specific QA pass rates. Gamma differences are statistically significant in three of the four comparisons for the two pairs of vendor-matched machines.
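
    The statistical core of this record is a paired t-test on measurements of the same QA plans delivered on two matched machines. The sketch below reproduces that mechanic with SciPy; the pass-rate numbers are made up for illustration and are not the study's data.

    ```python
    # Sketch of the paired comparison used in this record: the same QA plans
    # delivered on two matched machines, compared with a paired t-test at
    # alpha = 0.05. Values below are invented for illustration.
    import numpy as np
    from scipy import stats

    machine_a = np.array([96.1, 98.4, 97.2, 95.8, 99.0, 96.7,
                          97.5, 98.1, 96.3, 97.9, 98.6])
    machine_b = np.array([94.8, 97.9, 95.1, 96.0, 98.2, 95.2,
                          96.8, 97.3, 95.0, 97.1, 97.8])

    t_stat, p_value = stats.ttest_rel(machine_a, machine_b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("difference between matched machines is statistically significant")
    ```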

  13. State but not District Nutrition Policies Are Associated with Less Junk Food in Vending Machines and School Stores in US Public Schools

    PubMed Central

    KUBIK, MARTHA Y.; WALL, MELANIE; SHEN, LIJUAN; NANNEY, MARILYN S.; NELSON, TOBEN F.; LASKA, MELISSA N.; STORY, MARY

    2012-01-01

    Background Policy that targets the school food environment has been advanced as one way to increase the availability of healthy food at schools and healthy food choice by students. Although both state- and district-level policy initiatives have focused on school nutrition standards, it remains to be seen whether these policies translate into healthy food practices at the school level, where student behavior will be impacted. Objective To examine whether state- and district-level nutrition policies addressing junk food in school vending machines and school stores were associated with less junk food in school vending machines and school stores. Junk food was defined as foods and beverages with low nutrient density that provide calories primarily through fats and added sugars. Design A cross-sectional study design was used to assess self-report data collected by computer-assisted telephone interviews or self-administered mail questionnaires from state-, district-, and school-level respondents participating in the School Health Policies and Programs Study 2006. The School Health Policies and Programs Study, administered every 6 years since 1994 by the Centers for Disease Control and Prevention, is considered the largest, most comprehensive assessment of school health policies and programs in the United States. Subjects/setting A nationally representative sample (n = 563) of public elementary, middle, and high schools was studied. Statistical analysis Logistic regression adjusted for school characteristics, sampling weights, and clustering was used to analyze data. Policies were assessed for strength (required prohibiting junk food, recommended prohibiting junk food, or neither) and for whether strength was similar for school vending machines and school stores. Results School vending machines and school stores were more prevalent in high schools (93%) than in middle (84%) and elementary (30%) schools. For state policies, elementary schools that required prohibiting junk food in school vending machines and school stores offered less junk food than elementary schools that neither required nor recommended prohibiting junk food (13% vs 37%; P = 0.006). Middle schools that required prohibiting junk food in vending machines and school stores offered less junk food than middle schools that recommended prohibiting junk food (71% vs 87%; P = 0.07). Similar associations were not evident for district-level policies or for high schools. Conclusions Policy may be an effective tool to decrease junk food in schools, particularly in elementary and middle schools. PMID:20630161

  14. An experimental study of factors affecting the selective inhibition of sintering process

    NASA Astrophysics Data System (ADS)

    Asiabanpour, Bahram

    Selective Inhibition of Sintering (SIS) is a new rapid prototyping method that builds parts on a layer-by-layer fabrication basis. SIS works by joining powder particles through sintering in the part's body, and by inhibiting sintering in selected powder areas. The objective of this research has been to improve the new SIS process, which was invented at USC. The process improvement is based on statistical design of experiments. Conducting the needed experiments required a working machine and related path-generator software. The machine and its control software were made available prior to this research; the path-generator algorithms and software had to be created. This program obtains model geometry data from a CAD file and generates an appropriate path file for the printer nozzle. It also generates a simulation file for path-file inspection using virtual prototyping. The activities related to the path generator constitute the first part of this research, which has resulted in an efficient path generator. In addition, to reach an acceptable level of accuracy, strength, and surface quality in the fabricated parts, all effective factors in the SIS process should be identified and controlled. Simultaneous analytical and experimental studies were conducted to recognize effective factors and to control the SIS process. Also, it was known that polystyrene was the most appropriate polymer powder and saturated potassium iodide the most effective inhibitor among the available candidate materials. In addition, statistical tools were applied to improve the desirable properties of the parts fabricated by the SIS process. An investigation of part strength was conducted using Response Surface Methodology (RSM), and a region of acceptable operating conditions for part strength was found. Then, through analysis of the experimental results, the impact of the factors on the final part surface quality and dimensional accuracy was modeled. After developing a desirability function model, process operating conditions for maximum desirability were identified. Finally, the desirability model was validated.

  15. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear, and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Since tuning the parameters involved in defining a rejection model is often a complex and delicate task, we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can help in applying complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and is therefore a good candidate for integration into data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.
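
    The generic idea of a rejection option is to refuse a prediction when the classifier is insufficiently confident, so samples from unknown classes are flagged rather than misassigned. The sketch below shows that idea with a simple probability threshold in Python; the paper's own rules target classifiers in R and are more elaborate, and the threshold and data here are assumptions.

    ```python
    # Minimal sketch of a rejection option: refuse to classify samples whose
    # maximum predicted class probability falls below a threshold. Generic
    # idea only, not the paper's specific empirical rules.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(4)
    X = rng.standard_normal((200, 50))
    y = rng.integers(0, 3, 200)            # three known classes
    X[y == 0, :5] += 1.5
    X[y == 1, 5:10] += 1.5

    clf = RandomForestClassifier(random_state=0).fit(X, y)

    X_new = rng.standard_normal((5, 50))   # possibly from an unknown class
    proba = clf.predict_proba(X_new)
    threshold = 0.6                        # assumed; the paper tunes this
    labels = np.where(proba.max(axis=1) >= threshold,
                      proba.argmax(axis=1), -1)   # -1 encodes "rejected"
    print(labels)
    ```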

  16. Designing a mathematical model for integrating dynamic cellular manufacturing into supply chain system

    NASA Astrophysics Data System (ADS)

    Aalaei, Amin; Davoudpour, Hamid

    2012-11-01

    This article presents the design of a new mathematical model for integrating dynamic cellular manufacturing into a supply chain system, with extensive coverage of important manufacturing features: multiple plant locations, multi-market allocation, and multi-period planning horizons with demand and part-mix variation and machine capacity. The main constraints are satisfaction of market demand in each period, machine availability, machine time capacity, worker assignment, available worker time, production volume for each plant, and the amounts allocated to each market. The aim of the proposed model is to minimize holding and outsourcing costs, inter-cell material handling cost, external transportation cost, procurement, maintenance and overhead cost of machines, setup cost, reconfiguration cost of machine installation and removal, and hiring, firing and salary costs for workers. To demonstrate the potential benefits of such a design, an example is presented using the proposed model.

  17. Vending machine assessment methodology. A systematic review.

    PubMed

    Matthews, Melissa A; Horacek, Tanya M

    2015-07-01

    The nutritional quality of food and beverage products sold in vending machines has been implicated as a contributing factor to the development of an obesogenic food environment. How comprehensive, reliable, and valid are the current assessment tools for vending machines to support or refute these claims? A systematic review was conducted to summarize, compare, and evaluate the current methodologies and available tools for vending machine assessment. A total of 24 relevant research studies published between 1981 and 2013 met the inclusion criteria for this review. The methodological variables reviewed in this study include assessment tool type, study location, machine accessibility, product availability, healthfulness criteria, portion size, price, product promotion, and quality of scientific practice. There were wide variations in the depth of the assessment methodologies and product healthfulness criteria utilized among the reviewed studies. Of the reviewed studies, 39% evaluated machine accessibility, 91% evaluated product availability, 96% established healthfulness criteria, 70% evaluated portion size, 48% evaluated price, 52% evaluated product promotion, and 22% evaluated the quality of scientific practice. Of all reviewed articles, 87% reached conclusions that provided insight into the healthfulness of vended products and/or the vending environment. Product healthfulness criteria and their complexity for snack and beverage products were also found to vary between the reviewed studies. These findings make it difficult to compare results between studies. A universal, valid, and reliable vending machine assessment tool that is comprehensive yet user-friendly is recommended. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation.

    PubMed

    Tran, Phuoc; Dinh, Dien; Nguyen, Hien T

    2016-01-01

    Chinese and Vietnamese are both languages in which words are not delimited by spaces. In machine translation, word segmentation is therefore often performed first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, whether words should be segmented at all is worth considering when translating between two languages that do not use spaces between words, such as Chinese and Vietnamese. Since Chinese-Vietnamese is a low-resource language pair, the sparse data problem is evident in translation systems for this pair, which makes the segmentation decision even more important. In this paper, we propose a new method for translating Chinese to Vietnamese based on a combination of the advantages of character-level and word-level translation. In addition, a hybrid approach that combines statistics and rules is used to translate at the word level, while at the character level a statistical translation is used. The experimental results showed that our method improved the performance of machine translation over that of character- or word-level translation alone.
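
    The practical difference between the two granularities is visible already at preprocessing time, as the sketch below shows for an unsegmented Chinese sentence. The hand-given word segmentation is a hypothetical dictionary lookup, not the output of a real segmenter, and the sentence is an invented example.

    ```python
    # Sketch of the preprocessing difference between character-level and
    # word-level statistical MT for an unsegmented Chinese sentence.
    sentence = "我喜欢机器翻译"        # "I like machine translation"

    # Character level: every character becomes a token.
    char_tokens = list(sentence)     # ['我', '喜', '欢', '机', '器', '翻', '译']

    # Word level: tokens follow a segmentation (here given by hand).
    word_tokens = ["我", "喜欢", "机器", "翻译"]

    # A character-level system trains on the first stream, a word-level system
    # on the second; the paper combines the advantages of both.
    print(" ".join(char_tokens))
    print(" ".join(word_tokens))
    ```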

  19. Effect of overglazed and polished surface finishes on the compressive fracture strength of machinable ceramic materials.

    PubMed

    Asai, Tetsuya; Kazama, Ryunosuke; Fukushima, Masayoshi; Okiji, Takashi

    2010-11-01

    Controversy prevails over the effect of overglazing on the fracture strength of ceramic materials. Therefore, the effects of different surface finishes on the compressive fracture strength of machinable ceramic materials were investigated in this study. Plates prepared from four commercial brands of ceramic materials were either surface-polished or overglazed (n=10 per ceramic material for each surface finish), and bonded to flat surfaces of human dentin using a resin cement. Loads at failure were determined and statistically analyzed using two-way ANOVA and the Bonferroni test. Although no statistical differences in load value were detected between the polished and overglazed groups (p>0.05), the fracture load of Vita Mark II was significantly lower than those of ProCAD and IPS Empress CAD, whereas that of IPS e.max CAD was significantly higher than the latter two ceramic materials (p<0.05). It was concluded that overglazed and polished surfaces produced similar compressive fracture strengths irrespective of the machinable ceramic material tested, and that fracture strength was material-dependent.

  20. 40 CFR 63.467 - Recordkeeping requirements.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... of a batch vapor or in-line solvent cleaning machine complying with the provisions of § 63.463 shall... for the lifetime of the machine. (1) Owner's manuals, or if not available, written maintenance and operating procedures, for the solvent cleaning machine and control equipment. (2) The date of installation...

  1. 40 CFR 63.467 - Recordkeeping requirements.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... of a batch vapor or in-line solvent cleaning machine complying with the provisions of § 63.463 shall... for the lifetime of the machine. (1) Owner's manuals, or if not available, written maintenance and operating procedures, for the solvent cleaning machine and control equipment. (2) The date of installation...

  2. 40 CFR 63.467 - Recordkeeping requirements.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... of a batch vapor or in-line solvent cleaning machine complying with the provisions of § 63.463 shall... for the lifetime of the machine. (1) Owner's manuals, or if not available, written maintenance and operating procedures, for the solvent cleaning machine and control equipment. (2) The date of installation...

  3. 40 CFR 63.467 - Recordkeeping requirements.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... of a batch vapor or in-line solvent cleaning machine complying with the provisions of § 63.463 shall... for the lifetime of the machine. (1) Owner's manuals, or if not available, written maintenance and operating procedures, for the solvent cleaning machine and control equipment. (2) The date of installation...

  4. Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

    PubMed

    Held, Elizabeth; Cape, Joshua; Tintle, Nathan

    2016-01-01

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk from genotypes to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to the radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease for both the radial support vector machine and logistic regression. The linear support vector machine was more robust to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
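
    A minimal version of the three-way comparison described above can be run with cross-validation, as sketched below on synthetic genotype-like data (0/1/2 allele counts). The data-generating process, sample sizes, and hyperparameters are assumptions, not the workshop's simulated data.

    ```python
    # Sketch: radial SVM vs linear SVM vs logistic regression on synthetic
    # genotype data, compared by 5-fold cross-validated accuracy.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    X = rng.integers(0, 3, size=(300, 100)).astype(float)  # 0/1/2 genotypes
    risk = X[:, :5].sum(axis=1)                            # 5 causal variants
    y = (risk + rng.normal(0, 1.5, 300) > risk.mean()).astype(int)

    models = {
        "radial SVM": SVC(kernel="rbf"),
        "linear SVM": SVC(kernel="linear"),
        "logistic regression": LogisticRegression(max_iter=1000),
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name}: {acc:.2f}")
    ```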

  5. AN EIGHT WEEK SEMINAR IN AN INTRODUCTION TO NUMERICAL CONTROL ON TWO- AND THREE-AXIS MACHINE TOOLS FOR VOCATIONAL AND TECHNICAL MACHINE TOOL INSTRUCTORS. FINAL REPORT.

    ERIC Educational Resources Information Center

    BOLDT, MILTON; POKORNY, HARRY

    THIRTY-THREE MACHINE SHOP INSTRUCTORS FROM 17 STATES PARTICIPATED IN AN 8-WEEK SEMINAR TO DEVELOP THE SKILLS AND KNOWLEDGE ESSENTIAL FOR TEACHING THE OPERATION OF NUMERICALLY CONTROLLED MACHINE TOOLS. THE SEMINAR WAS GIVEN FROM JUNE 20 TO AUGUST 12, 1966, WITH COLLEGE CREDIT AVAILABLE THROUGH STOUT STATE UNIVERSITY. THE PARTICIPANTS COMPLETED AN…

  6. Reducing lumber thickness variation using real-time statistical process control

    Treesearch

    Thomas M. Young; Brian H. Bond; Jan Wiedenbeck

    2002-01-01

    A technology feasibility study for reducing lumber thickness variation was conducted from April 2001 until March 2002 at two sawmills located in the southern U.S. A real-time statistical process control (SPC) system was developed that featured Wonderware human machine interface technology (HMI) with distributed real-time control charts for all sawing centers and...

  7. Survey of Commercially Available Computer-Readable Bibliographic Data Bases.

    ERIC Educational Resources Information Center

    Schneider, John H., Ed.; And Others

    This document contains the results of a survey of 94 U. S. organizations, and 36 organizations in other countries that were thought to prepare machine-readable data bases. Of those surveyed, 55 organizations (40 in U. S., 15 in other countries) provided completed camera-ready forms describing 81 commercially available, machine-readable data bases…

  8. 47 CFR 76.1700 - Records to be maintained by cable system operators.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... inspection file shall be available for public inspection at any time during regular business hours. (c) All... of any material in the public inspection file shall be available for machine reproduction upon.... Requests for machine copies shall be fulfilled at a location specified by the system operator, within a...

  9. The Association between the Availability of Sugar-Sweetened Beverage in School Vending Machines and Its Consumption among Adolescents in California: A Propensity Score Matching Approach

    PubMed Central

    Shi, Lu

    2010-01-01

    There is controversy over the degree to which banning sugar-sweetened beverage (SSB) sales at schools could decrease SSB intake. This paper uses the adolescent sample of the 2005 California Health Interview Survey to estimate the association between the availability of SSB from school vending machines and the amount of SSB consumed. Propensity score stratification and kernel-based propensity score matching are used to address the selection bias inherent in cross-sectional data. Propensity score stratification shows that adolescents who had access to SSB through their school vending machines consumed 0.170 more drinks of SSB than those who did not (P < .05). Kernel-based propensity score matching shows the SSB consumption difference to be 0.158 drinks on the prior day (P < .05). This paper strengthens the evidence for the association between SSB availability via school vending machines and actual SSB consumption, while future studies are needed to explore changes in other beverages after SSB becomes less available. PMID:20976298
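
    The mechanics of propensity score matching are simple to sketch: model the probability of exposure from confounders, then compare exposed units with unexposed units that have similar scores. The sketch below uses nearest-neighbour matching, a simpler variant than the stratification and kernel-based matching in the paper, on entirely synthetic data.

    ```python
    # Minimal propensity-score matching sketch with synthetic data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(6)
    X = rng.standard_normal((1000, 4))          # confounders (age, income, ...)
    treated = (X[:, 0] + rng.normal(0, 1, 1000) > 0).astype(int)  # SSB access
    outcome = 0.15 * treated + 0.5 * X[:, 0] + rng.normal(0, 1, 1000)

    # Step 1: propensity score = P(treated | confounders)
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

    # Step 2: match each treated unit to the control with the closest score
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    matches = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]

    # Step 3: average treatment effect on the treated
    att = (outcome[t_idx] - outcome[matches]).mean()
    print(f"estimated effect: {att:.3f}")
    ```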

  10. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics

    PubMed Central

    2016-01-01

    Background We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. Conclusions IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise. PMID:27729304

  11. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.

    PubMed

    Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita

    2016-10-11

    We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise.

  12. Statistical Analysis of NAS Parallel Benchmarks and LINPACK Results

    NASA Technical Reports Server (NTRS)

    Meuer, Hans-Werner; Simon, Horst D.; Strohmeier, Erich; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    In the last three years extensive performance data have been reported for parallel machines, based both on the NAS Parallel Benchmarks and on LINPACK. In this study we have used the reported benchmark results and performed a number of statistical experiments using factor, cluster, and regression analyses. In addition to the performance results of LINPACK and the eight NAS parallel benchmarks, we have also included the peak performance of the machine and the LINPACK n and n_(1/2) values. Some of the results and observations can be summarized as follows: 1) All benchmarks are strongly correlated with peak performance. 2) LINPACK and EP each have a unique signature. 3) The remaining NPB can be grouped into three groups as follows: (CG and IS), (LU and SP), and (MG, FT, and BT). Hence three (or four with EP) benchmarks are sufficient to characterize the overall NPB performance. Our poster presentation will follow a standard poster format and will present the data of our statistical analysis in detail.
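
    The grouping reported above can be reproduced in miniature by correlating benchmark results across machines and clustering benchmarks on a correlation-derived distance. The sketch below does this on synthetic stand-ins for the reported data; the machine counts and noise levels are assumptions.

    ```python
    # Sketch: correlate benchmarks across machines, then cluster them
    # hierarchically on a (1 - correlation) distance. Data are synthetic.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    rng = np.random.default_rng(7)
    n_machines = 25
    peak = rng.uniform(1, 100, n_machines)     # peak Gflop/s per machine
    benchmarks = ["LINPACK", "EP", "CG", "IS", "LU", "SP", "MG", "FT", "BT"]
    # Each benchmark tracks peak performance with its own noise level.
    results = np.array([peak * rng.uniform(0.2, 0.8) +
                        rng.normal(0, 5, n_machines) for _ in benchmarks])

    corr = np.corrcoef(results)                # benchmark x benchmark
    d = squareform(1 - corr, checks=False)     # condensed distance matrix
    z = linkage(d, method="average")
    print(dict(zip(benchmarks, fcluster(z, t=3, criterion="maxclust"))))
    ```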

  13. Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

    PubMed Central

    Belo, David; Gamboa, Hugo

    2017-01-01

    The paper presents an analysis of the accuracy of machine learning approaches applied to cardiac activity data. The study evaluates the possibility of diagnosing arterial hypertension by means of short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from arterial hypertension of degree II-III. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with radial basis, decision trees, and the naive Bayes classifier. Moreover, different methods of feature extraction are analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. Investigation results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of searching for a noncorrelated feature set achieved better results than a feature set based on principal components. PMID:28831239
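
    One plausible reading of the noncorrelated-feature-set search is a greedy filter that keeps a feature only if it is weakly correlated with all features kept so far, followed by discriminant analysis. The sketch below implements that reading on simulated HRV-like features; the paper's exact algorithm, the 0.7 correlation cutoff, and the data are assumptions.

    ```python
    # Sketch: greedy noncorrelated feature selection, then linear discriminant
    # analysis, evaluated by cross-validation. Features are simulated.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    X = rng.standard_normal((70, 53))        # 70 subjects, 53 candidate features
    y = np.r_[np.zeros(30, dtype=int), np.ones(40, dtype=int)]
    X[y == 1, :4] += 0.8                     # group difference in a few features

    # Keep a feature only if |r| < 0.7 with every feature already kept.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < 0.7 for k in kept):
            kept.append(j)

    acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, kept], y, cv=5).mean()
    print(f"{len(kept)} features kept, CV accuracy {acc:.2f}")
    ```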

  14. Identifying and Investigating Unexpected Response to Treatment: A Diabetes Case Study.

    PubMed

    Ozery-Flato, Michal; Ein-Dor, Liat; Parush-Shear-Yashuv, Naama; Aharonov, Ranit; Neuvirth, Hani; Kohn, Martin S; Hu, Jianying

    2016-09-01

    The availability of electronic health records creates fertile ground for developing computational models of various medical conditions. We present a new approach for detecting and analyzing patients with unexpected responses to treatment, building on machine learning and statistical methodology. Given a specific patient, we compute a statistical score for the deviation of the patient's response from responses observed in other patients having similar characteristics and medication regimens. These scores are used to define cohorts of patients showing deviant responses. Statistical tests are then applied to identify clinical features that correlate with these cohorts. We implement this methodology in a tool that is designed to assist researchers in the pharmaceutical field to uncover new features associated with reduced response to a treatment. It can also aid physicians by flagging patients who are not responding to treatment as expected and hence deserve more attention. The tool provides comprehensive visualizations of the analysis results and the supporting data, both at the cohort level and at the level of individual patients. We demonstrate the utility of our methodology and tool in a population of type II diabetic patients, treated with antidiabetic drugs, and monitored by the HbA1C test.
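
    One simple instantiation of the deviation score described above is to pool the responses of the most similar patients and standardise the observed response against them. The sketch below does this with nearest neighbours on synthetic HbA1c-change data; it illustrates the idea only and is not the authors' exact scoring method.

    ```python
    # Sketch of a deviation score for treatment response: z-score a patient's
    # observed response against responses of similar patients. Data synthetic.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(9)
    chars = rng.standard_normal((500, 6))    # patient characteristics
    response = 0.4 * chars[:, 0] + rng.normal(0, 0.5, 500)  # HbA1c reduction

    nn = NearestNeighbors(n_neighbors=30).fit(chars)

    def deviation_score(x, r_obs):
        """z-score of an observed response against similar patients."""
        _, idx = nn.kneighbors(x.reshape(1, -1))
        peers = response[idx[0]]
        return (r_obs - peers.mean()) / peers.std()

    # A strongly negative score flags a patient responding worse than expected.
    print(deviation_score(chars[0], r_obs=-1.5))
    ```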

  15. FSW of Aluminum Tailor Welded Blanks across Machine Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hovanski, Yuri; Upadhyay, Piyush; Carlson, Blair

    2015-02-16

    Development and characterization of friction stir welded aluminum tailor welded blanks was successfully carried out on three separate machine platforms. Each was a commercially available, gantry-style, multi-axis machine designed specifically for friction stir welding. Weld parameters were developed to support high volume production of dissimilar thickness aluminum tailor welded blanks at speeds of 3 m/min and greater. Parameters originally developed on an ultra-high-stiffness servo-driven machine were first transferred to a high-stiffness servo-hydraulic friction stir welding machine, and subsequently transferred to a purpose-built machine designed to accommodate thin sheet aluminum welding. The inherent beam stiffness, bearing compliance, and control system of each machine were distinctly unique, which posed specific challenges in transferring welding parameters across machine platforms. This work documents the challenges imposed by successfully transferring weld parameters from machine to machine, produced by different manufacturers and with unique control systems and interfaces.

  16. Measurement of W + bb and a search for MSSM Higgs bosons with the CMS detector at the LHC

    NASA Astrophysics Data System (ADS)

    O'Connor, Alexander Pinpin

    Tooling used to cure composite laminates in the aerospace and automotive industries must provide a dimensionally stable geometry throughout the thermal cycle applied during the part curing process. This requires that the Coefficient of Thermal Expansion (CTE) of the tooling materials match that of the composite being cured. The traditional tooling material for production applications is a nickel alloy. Poor machinability and high material costs increase the expense of metallic tooling made from nickel alloys such as 'Invar 36' or 'Invar 42'. Currently, metallic tooling is unable to meet the needs of applications requiring rapid affordable tooling solutions. In applications where the tooling is not required to have the durability provided by metals, such as for small area repair, an opportunity exists for non-metallic tooling materials like graphite, carbon foams, composites, or ceramics and machinable glasses. Nevertheless, efficient machining of brittle, non-metallic materials is challenging due to low ductility, porosity, and high hardness. The machining of a layup tool comprises a large portion of the final cost. Achieving maximum process economy requires optimization of the machining process in the given tooling material. Therefore, machinability of the tooling material is a critical aspect of the overall cost of the tool. In this work, three commercially available, brittle/porous, non-metallic candidate tooling materials were selected, namely: (AAC) Autoclaved Aerated Concrete, CB1100 ceramic block and Cfoam carbon foam. Machining tests were conducted in order to evaluate the machinability of these materials using end milling. Chip formation, cutting forces, cutting tool wear, machining induced damage, surface quality and surface integrity were investigated using High Speed Steel (HSS), carbide, diamond abrasive and Polycrystalline Diamond (PCD) cutting tools. Cutting forces were found to be random in magnitude, which was a result of material porosity. The abrasive nature of Cfoam produced rapid tool wear when using HSS and PCD type cutting tools. However, tool wear was not significant in AAC or CB1100 regardless of the type of cutting edge. Machining induced damage was observed in the form of macro-scale chipping and fracture in combination with micro-scale cracking. Transverse rupture test results revealed significant reductions in residual strength and damage tolerance in CB1100. In contrast, AAC and Cfoam showed no correlation between machining induced damage and a reduction in surface integrity. Cutting forces in machining were modeled for all materials. Cutting force regression models were developed based on Design of Experiment and Analysis of Variance. A mechanistic cutting force model was proposed based upon conventional end milling force models and statistical distributions of material porosity. In order to validate the model, predicted cutting forces were compared to experimental results. Predicted cutting forces agreed well with experimental measurements. Furthermore, over the range of cutting conditions tested, the proposed model was shown to have comparable predictive accuracy to empirically produced regression models; greatly reducing the number of cutting tests required to simulate cutting forces. Further, this work demonstrates a key adaptation of metallic cutting force models to brittle porous material; a vital step in the research into the machining of these materials using end milling.
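
    The mechanistic model described above scales a conventional end-milling force term by a statistical porosity distribution, so predicted forces fluctuate the way they do in porous brittle materials. The sketch below shows that idea in miniature; the specific cutting force, depth of cut, and beta-distributed porosity are assumed values, not the study's calibrated model.

    ```python
    # Sketch: a conventional milling force term scaled by random porosity
    # draws, mimicking the force scatter observed in porous brittle materials.
    import numpy as np

    rng = np.random.default_rng(11)
    Ks = 2000.0            # specific cutting force, N/mm^2 (assumed)
    b = 2.0                # axial depth of cut, mm (assumed)
    feed_per_tooth = 0.05  # mm (assumed)

    angles = np.linspace(0, np.pi, 200)        # tooth engagement sweep
    h = feed_per_tooth * np.sin(angles)        # uncut chip thickness
    porosity = rng.beta(2, 6, angles.size)     # local porosity fraction (assumed)

    # Force is carried only by the solid fraction of the material.
    force = Ks * b * h * (1 - porosity)
    print(f"mean {force.mean():.1f} N, peak {force.max():.1f} N")
    ```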

  17. Study on the Optimization and Process Modeling of the Rotary Ultrasonic Machining of Zerodur Glass-Ceramic

    NASA Astrophysics Data System (ADS)

    Pitts, James Daniel

    Rotary ultrasonic machining (RUM), a hybrid process combining ultrasonic machining and diamond grinding, was created to increase material removal rates in the fabrication of hard and brittle workpieces. The objective of this research was to experimentally derive empirical equations for the prediction of multiple machined-surface roughness parameters for helically pocketed, rotary ultrasonic machined Zerodur glass-ceramic workpieces by means of a systematic statistical experimental approach. A Taguchi parametric screening design of experiments was employed to systematically determine the RUM process parameters with the largest effect on mean surface roughness. Next, empirically determined equations for the seven common surface quality metrics were developed via Box-Behnken response surface experimental trials. Validation trials showed varying levels of agreement between predicted and measured surface roughness. The reductions in cutting force and tool wear associated with RUM, reported by previous researchers, were experimentally verified to extend to helical pocketing of Zerodur glass-ceramic.

  18. Improved nucleic acid descriptors for siRNA efficacy prediction.

    PubMed

    Sciabola, Simone; Cao, Qing; Orozco, Modesto; Faustino, Ignacio; Stanton, Robert V

    2013-02-01

    Although considerable progress has been made recently in understanding how gene silencing is mediated by the RNAi pathway, the rational design of effective sequences is still a challenging task. In this article, we demonstrate that including three-dimensional descriptors improved the discrimination between active and inactive small interfering RNAs (siRNAs) in a statistical model. Five descriptor types were used: (i) nucleotide position along the siRNA sequence, (ii) nucleotide composition in terms of presence/absence of specific combinations of di- and trinucleotides, (iii) nucleotide interactions by means of a modified auto- and cross-covariance function, (iv) nucleotide thermodynamic stability derived by the nearest neighbor model representation and (v) nucleic acid structure flexibility. The duplex flexibility descriptors are derived from extended molecular dynamics simulations, which are able to describe the sequence-dependent elastic properties of RNA duplexes, even for non-standard oligonucleotides. The matrix of descriptors was analysed using three statistical packages in R (partial least squares, random forest, and support vector machine), and the most predictive model was implemented in a modeling tool we have made publicly available through SourceForge. Our implementation of new RNA descriptors coupled with appropriate statistical algorithms resulted in improved model performance for the selection of siRNA candidates when compared with publicly available siRNA prediction tools and previously published test sets. Additional validation studies based on in-house RNA interference projects confirmed the robustness of the scoring procedure in prospective studies.
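
    For readers who want to reproduce the modeling step, a rough Python analogue is sketched below (the study itself used R packages); the descriptor matrix and activity labels are synthetic stand-ins for the positional, compositional, covariance, thermodynamic and flexibility descriptors described above.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 50))                 # 200 siRNAs x 50 descriptors (synthetic)
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # active vs. inactive labels (synthetic)

        models = [("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("SVM", SVC(kernel="rbf", C=1.0))]
        for name, model in models:
            acc = cross_val_score(model, X, y, cv=5).mean()
            print(f"{name}: mean CV accuracy = {acc:.2f}")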

  19. SQC: secure quality control for meta-analysis of genome-wide association studies.

    PubMed

    Huang, Zhicong; Lin, Huang; Fellay, Jacques; Kutalik, Zoltán; Hubaux, Jean-Pierre

    2017-08-01

    Due to the limited power of small-scale genome-wide association studies (GWAS), researchers tend to collaborate and establish a larger consortium in order to perform large-scale GWAS. Genome-wide association meta-analysis (GWAMA) is a statistical tool that aims to synthesize results from multiple independent studies to increase the statistical power and reduce false-positive findings of GWAS. However, it has been demonstrated that the aggregate data of individual studies are subject to inference attacks, hence privacy concerns arise when researchers share study data in GWAMA. In this article, we propose a secure quality control (SQC) protocol, which enables checking the quality of data in a privacy-preserving way without revealing sensitive information to a potential adversary. SQC employs state-of-the-art cryptographic and statistical techniques for privacy protection. We implement the solution in a meta-analysis pipeline with real data to demonstrate the efficiency and scalability on commodity machines. The distributed execution of SQC on a cluster of 128 cores for one million genetic variants takes less than one hour, which is a modest cost considering the 10-month time span usually observed for the completion of the QC procedure that includes timing of logistics. SQC is implemented in Java and is publicly available at https://github.com/acs6610987/secureqc. jean-pierre.hubaux@epfl.ch. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  20. Advances in industrial biopharmaceutical batch process monitoring: Machine-learning methods for small data problems.

    PubMed

    Tulsyan, Aditya; Garvin, Christopher; Ündey, Cenk

    2018-04-06

    Biopharmaceutical manufacturing comprises multiple distinct processing steps that require effective and efficient monitoring of many variables simultaneously in real-time. State-of-the-art real-time multivariate statistical batch process monitoring (BPM) platforms have been in use in recent years to ensure comprehensive monitoring is in place as a complementary tool for continued process verification to detect weak signals. This article addresses a longstanding, industry-wide problem in BPM, referred to as the "Low-N" problem, wherein a product has a limited production history. The current best industrial practice to address the Low-N problem is to switch from a multivariate to a univariate BPM until sufficient product history is available to build and deploy a multivariate BPM platform. Every batch run without a robust multivariate BPM platform poses a risk of not detecting potential weak signals developing in the process that might have an impact on process and product performance. In this article, we propose an approach to solve the Low-N problem by generating an arbitrarily large number of in silico batches through a combination of hardware exploitation and machine-learning methods. To the best of the authors' knowledge, this is the first article to provide a solution to the Low-N problem in biopharmaceutical manufacturing using machine-learning methods. Several industrial case studies from bulk drug substance manufacturing are presented to demonstrate the efficacy of the proposed approach for BPM under various Low-N scenarios. © 2018 Wiley Periodicals, Inc.
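
    The following is a deliberately naive illustration of the in silico batch idea, not the authors' method (which combines hardware exploitation with machine learning): fit a multivariate normal to a handful of historical batch profiles and sample as many synthetic batches as desired.

        import numpy as np

        rng = np.random.default_rng(0)
        # 4 historical batches x 30 time points: the Low-N situation (synthetic data).
        historical = rng.normal(loc=1.0, scale=0.05, size=(4, 30))

        mu = historical.mean(axis=0)
        # Regularize the covariance since N is far smaller than the dimension.
        cov = np.cov(historical, rowvar=False) + 1e-6 * np.eye(30)
        in_silico = rng.multivariate_normal(mu, cov, size=1000)  # 1000 synthetic batches
        print(in_silico.shape)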

  1. Application of the Teager-Kaiser energy operator in bearing fault diagnosis.

    PubMed

    Henríquez Rodríguez, Patricia; Alonso, Jesús B; Ferrer, Miguel A; Travieso, Carlos M

    2013-03-01

    Condition monitoring of rotating machines is important in the prevention of failures. As most machine malfunctions are related to bearing failures, several bearing diagnosis techniques have been developed. Some of them characterize the bearing vibration signal with statistical measures, and others extract the bearing fault characteristic frequency from the AM component of the vibration signal. In this paper, we propose to transform the vibration signal to the Teager-Kaiser domain and characterize it with statistical and energy-based measures. A bearing database with normal and faulty bearings is used. The diagnosis is performed with two classifiers: a neural network classifier and an LS-SVM classifier. Experiments show that the Teager-domain features outperform those based on the temporal or AM signal. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
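
    The discrete Teager-Kaiser energy operator is simple enough to state directly: psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below applies it to a synthetic vibration signal and extracts a few statistical features; the exact feature set used in the paper may differ.

        import numpy as np
        from scipy.stats import kurtosis, skew

        def teager_kaiser(x):
            """Discrete Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
            x = np.asarray(x, dtype=float)
            return x[1:-1] ** 2 - x[:-2] * x[2:]

        rng = np.random.default_rng(0)
        t = np.linspace(0, 1, 4096)
        # Synthetic stand-in for a bearing vibration signal: tone plus noise.
        signal = np.sin(2 * np.pi * 157 * t) + 0.1 * rng.normal(size=t.size)

        psi = teager_kaiser(signal)
        features = [psi.mean(), psi.std(), skew(psi), kurtosis(psi)]
        print(np.round(features, 4))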

  2. Stroke dynamics and frequency of 3 phacoemulsification machines.

    PubMed

    Tognetto, Daniele; Cecchini, Paolo; Leon, Pia; Di Nicola, Marta; Ravalico, Giuseppe

    2012-02-01

    To measure the working frequency and the stroke dynamics of the phaco tip of 3 phacoemulsification machines. University Eye Clinic of Trieste, Italy. Experimental study. A video wet fixture was assembled to measure the working frequency using a micro camera and a micropulsed strobe-light system. A different video wet fixture was created to measure tip displacement as vectorial movement at different phaco powers using a microscopic video apparatus. The working frequency of the Infiniti Ozil machine was 43.0 kHz in longitudinal mode and 31.6 kHz in torsional mode. The frequency of the Whitestar Signature machine was 29.0 kHz in longitudinal mode and 38.0 kHz with the Ellips FX handpiece. The Stellaris machine had a frequency of 28.8 kHz. The longitudinal stroke of the 3 machines at different phaco powers was statistically significantly different. The Stellaris machine had the highest stroke extent (139 μm). The lateral movement of the Infiniti Ozil and Whitestar Signature machines differed significantly. No movement on the y-axis was observed for the Infiniti Ozil machine in torsional mode. The elliptical path of the Ellips FX handpiece had different x and y components at different phaco powers. The 3 phaco machines performed differently in terms of working frequency and stroke dynamics. The knowledge of the peculiar lateral and elliptical path strokes of Infiniti and Whitestar Signature machines may allow the surgeon to fully use these features for lens removal. Copyright © 2012 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.

  3. Optical Measurements Of Diamond-Turned Surfaces

    NASA Astrophysics Data System (ADS)

    Politch, Jacob

    1989-07-01

    We describe here a system for measuring diamond-turned surfaces very accurately. This system is based on heterodyne interferometry and measures surface height variations with an accuracy of 4 Å and a spatial resolution of 1 micrometer. From the measured data we calculated the statistical properties of the surface, enabling us to identify the spatial frequencies caused by vibrations of the diamond-turning machine and of the measuring machine, as well as the frequency of the grid.
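
    A short sketch of the spectral step, assuming a synthetic one-dimensional height profile in place of the interferometric data: the power spectral density of the profile exposes periodic machining signatures such as a tool-feed ripple.

        import numpy as np

        dx = 1.0                                  # lateral sample spacing, micrometers (assumed)
        x = np.arange(4096) * dx
        rng = np.random.default_rng(0)
        profile = (0.4e-3 * np.sin(2 * np.pi * x / 25.0)   # 25 um tool-feed ripple
                   + 0.1e-3 * rng.normal(size=x.size))     # roughness background, um

        psd = np.abs(np.fft.rfft(profile - profile.mean())) ** 2
        freqs = np.fft.rfftfreq(x.size, d=dx)              # cycles per micrometer
        # Skip the DC bin when locating the dominant periodic signature.
        print(f"dominant spatial frequency: {freqs[psd[1:].argmax() + 1]:.4f} 1/um")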

  4. Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the Three Gorges Reservoir area, China

    NASA Astrophysics Data System (ADS)

    Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Li, Yuanyao; Catani, Filippo; Pourghasemi, Hamid Reza

    2018-03-01

    Landslides are a common natural hazard, responsible for extensive damage and losses in mountainous areas. In this study, Longju in the Three Gorges Reservoir area in China was taken as a case study for landslide susceptibility assessment in order to develop effective risk prevention and mitigation strategies. To begin, 202 landslides were identified, including 95 colluvial landslides and 107 rockfalls. Twelve landslide causal factor maps were prepared initially, and the relationship between these factors and each landslide type was analyzed using the information value model. The unimportant factors were then identified and eliminated using the information gain ratio technique. The landslide locations were randomly divided into two groups: 70% for training and 30% for verification. Two machine learning models, the support vector machine (SVM) and artificial neural network (ANN), and a multivariate statistical model, logistic regression (LR), were applied for landslide susceptibility modeling (LSM) of each type. The LSM index maps, obtained by combining the assessment results for the two landslide types, were classified into five levels. The performance of the LSMs was evaluated using the receiver operating characteristic curve and the Friedman test. Results show that the elimination of noise-generating factors and the separate modeling of each landslide type significantly increased the prediction accuracy. The machine learning models outperformed the multivariate statistical model, and the SVM model was found to be ideal for the case study area.
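
    A compact sketch of the comparison, assuming a synthetic causal-factor matrix in place of the Longju inventory: train on a 70/30 split and compare a machine learning model against the multivariate statistical model by ROC AUC.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(400, 12))                           # 12 causal factors (synthetic)
        y = (X[:, 0] - X[:, 3] + rng.normal(0, 0.5, 400) > 0).astype(int)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        for name, model in [("LR", LogisticRegression(max_iter=1000)),
                            ("SVM", SVC(probability=True, random_state=0))]:
            auc = roc_auc_score(y_te, model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
            print(f"{name}: ROC AUC = {auc:.3f}")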

  5. Global assessment of soil organic carbon stocks and spatial distribution of histosols: the Machine Learning approach

    NASA Astrophysics Data System (ADS)

    Hengl, Tomislav

    2016-04-01

    Preliminary results are presented for predicting the distribution of organic soils (Histosols) and soil organic carbon stock (in tonnes per ha) using global compilations of soil profiles (about 150,000 points) and covariates at 250 m spatial resolution (about 150 covariates; mainly MODIS seasonal land products, SRTM DEM derivatives, climatic images, and lithological, land cover and landform maps). We focus on a data-driven approach, i.e. Machine Learning techniques, which often require no prior knowledge about the distribution of the target variable or about possible relationships. Other advantages of using machine learning are (DOI: 10.1371/journal.pone.0125814): (1) all rules required to produce outputs are formalized; (2) the whole procedure is documented (the statistical model and associated computer script), enabling reproducible research; (3) predicted surfaces can make use of various information sources and can be optimized relative to all available quantitative point and covariate data; (4) there is more flexibility in terms of the spatial extent, resolution and support of requested maps; and (5) automated mapping is more cost-effective: once the system is operational, maintenance and production of updates are an order of magnitude faster and cheaper, so prediction maps can be updated and improved at ever shorter intervals. Some disadvantages of automated soil mapping based on Machine Learning are: (1) models are data-driven, and any serious blunders or artifacts in the input data can propagate to errors orders of magnitude larger than in expert-based systems; (2) fitting machine learning models is an order of magnitude more computationally demanding, and the computing effort can be tens of thousands of times higher than for, e.g., linear geostatistics; and (3) many machine learning models are fairly complex, often abstract, and their interpretation is not trivial, requiring special multidimensional/multivariable plotting and data mining tools. Results of model fitting using the R packages nnet and randomForest and the h2o software (machine learning functions) show that significant models can be fitted for soil classes, bulk density (R-square 0.76), soil organic carbon (R-square 0.62) and coarse fragments (R-square 0.59). Consequently, we were able to estimate soil organic carbon stock for the majority of the land mask (excluding permanent ice) and to detect patches of landscape containing mainly organic soils (peat and similar). Our results confirm that the tropical hotspots of soil organic carbon are the peatlands of Indonesia, northern Peru, the western Amazon and the Congo river basin. The majority of the world's soil organic carbon stock likely lies in the northern latitudes (tundra and taiga). The distribution of Histosols appears to be controlled mainly by climatic conditions (especially temperature regime and water vapor) and hydrologic position in the landscape. Predicted distributions of organic soils (probability of occurrence) and total soil organic carbon stock at resolutions of 1 km and 250 m are available via the SoilGrids.org project homepage.

  6. School Vending Machine Purchasing Behavior: Results from the 2005 YouthStyles Survey

    ERIC Educational Resources Information Center

    Thompson, Olivia M.; Yaroch, Amy L.; Moser, Richard P.; Rutten, Lila J. Finney; Agurs-Collins, Tanya

    2010-01-01

    Background: Competitive foods are often available in school vending machines. Providing youth with access to school vending machines, and thus competitive foods, is of concern, considering the continued high prevalence of childhood obesity: competitive foods tend to be energy dense and nutrient poor and can contribute to increased energy intake in…

  7. 48 CFR 6105.502 - Request for decision [Rule 502].

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ...) Include— (A) The name, address, telephone number, facsimile machine number, and e-mail address, if available, of the official making the request; (B) The name, address, telephone number, facsimile machine... Clerk's facsimile machine number is: (202) 606-0019. The Board's working hours are 8:00 a.m. to 4:30 p.m...

  8. 48 CFR 6105.502 - Request for decision [Rule 502].

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ...) Include— (A) The name, address, telephone number, facsimile machine number, and e-mail address, if available, of the official making the request; (B) The name, address, telephone number, facsimile machine... Clerk's facsimile machine number is: (202) 606-0019. The Board's working hours are 8:00 a.m. to 4:30 p.m...

  9. Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

    PubMed

    Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

    2009-12-01

    To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures while minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPVs (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reductions (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.

  10. Impact of Machine Virtualization on Timing Precision for Performance-critical Tasks

    NASA Astrophysics Data System (ADS)

    Karpov, Kirill; Fedotova, Irina; Siemens, Eduard

    2017-07-01

    In this paper we present a measurement study to characterize the impact of hardware virtualization on basic software timing, as well as on precise sleep operations of an operating system. We investigated how timer hardware is shared among heavily CPU-, I/O- and network-bound tasks on a virtual machine as well as on the host machine. VMware ESXi and QEMU/KVM were chosen as commonly used examples of hypervisor- and host-based models. Based on statistical parameters of the retrieved distributions, our results provide a very good estimation of timing behavior, which is essential for real-time and performance-critical applications such as image processing or real-time control.

  11. Supervised Machine Learning for Regionalization of Environmental Data: Distribution of Uranium in Groundwater in Ukraine

    NASA Astrophysics Data System (ADS)

    Govorov, Michael; Gienko, Gennady; Putrenko, Viktor

    2018-05-01

    In this paper, several supervised machine learning algorithms were explored to define homogeneous regions of concentration of uranium in surface waters in Ukraine using multiple environmental parameters. A previous study focused on finding the primary environmental parameters related to uranium in ground waters using several methods of spatial statistics and unsupervised classification. At this step, we refined the regionalization using Artificial Neural Network (ANN) techniques, including the Multilayer Perceptron (MLP), Radial Basis Function (RBF) networks, and Convolutional Neural Networks (CNN). The study focuses on building local ANN models, which may significantly improve the prediction results of machine learning algorithms by taking into consideration non-stationarity and autocorrelation in spatial data.

  12. Model Considerations for Memory-based Automatic Music Transcription

    NASA Astrophysics Data System (ADS)

    Albrecht, Štěpán; Šmídl, Václav

    2009-12-01

    The problem of automatic music description is considered. The recorded music is modeled as a superposition of known sounds from a library, weighted by unknown weights. Similar observation models are commonly used in statistics and machine learning, and many methods for estimating the weights are available. These methods differ in the assumptions imposed on the weights; in the Bayesian paradigm, these assumptions are typically expressed in the form of a prior probability density function (pdf) on the weights. In this paper, commonly used assumptions about the music signal are summarized and complemented by a new assumption. These assumptions are translated into pdfs and combined into a single prior density. The validity of the model is tested in simulation using synthetic data.
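
    As a concrete stand-in for the estimation step, the sketch below recovers nonnegative weights of library sounds from a mixed spectrum with nonnegative least squares; the paper instead encodes its assumptions as Bayesian priors on the weights, so this is only one common estimator for the same observation model.

        import numpy as np
        from scipy.optimize import nnls

        rng = np.random.default_rng(0)
        A = np.abs(rng.normal(size=(128, 10)))   # 10 library sound spectra (synthetic)
        w_true = np.zeros(10)
        w_true[[2, 7]] = [1.0, 0.5]              # two sounds actually present
        b = A @ w_true + 0.01 * rng.random(128)  # observed mixture spectrum

        w_hat, residual = nnls(A, b)             # nonnegative weight estimate
        print(np.round(w_hat, 2))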

  13. VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.

    PubMed

    Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark

    2012-09-01

    The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.

  14. A Genetic Algorithm for Flow Shop Scheduling with Assembly Operations to Minimize Makespan

    NASA Astrophysics Data System (ADS)

    Bhongade, A. S.; Khodke, P. M.

    2014-04-01

    Manufacturing systems in which several parts are processed through machining workstations and later assembled to form final products are common. Though such scheduling problems are solved using heuristics, available solution approaches can handle only moderately sized problems due to the large computation time required. In this work, a scheduling approach is developed for such flow-shop manufacturing systems with machining workstations followed by assembly workstations. The initial schedule is generated using the Disjunctive method, and a genetic algorithm (GA) is then applied to generate schedules for large problems. The GA is found to give near-optimal solutions based on the deviation of makespan from the lower bound. The lower bound on the makespan of such problems is estimated, and the percent deviation of makespan from this lower bound is used as a performance measure to evaluate the schedules. Computational experiments were conducted on problems developed using a fractional factorial orthogonal array, varying the number of parts per product, the number of products, and the number of workstations (up to 1,520 operations). A statistical analysis indicated the significance of all three factors considered. It is concluded that the GA method can obtain near-optimal makespan values.
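
    A compact sketch of a permutation GA for flow-shop makespan is given below (order crossover plus swap mutation on a synthetic processing-time matrix); the assembly stage and the Disjunctive-method seeding used in this work are omitted for brevity.

        import numpy as np

        rng = np.random.default_rng(0)
        P = rng.integers(1, 20, size=(8, 4))  # 8 jobs x 4 machines processing times (synthetic)

        def makespan(order):
            # Standard flow-shop recursion: C[i, m] = max(C[i-1, m], C[i, m-1]) + p.
            C = np.zeros((len(order) + 1, P.shape[1] + 1))
            for i, job in enumerate(order, 1):
                for m in range(1, P.shape[1] + 1):
                    C[i, m] = max(C[i - 1, m], C[i, m - 1]) + P[job, m - 1]
            return C[-1, -1]

        def order_crossover(p1, p2):
            # Copy a slice from parent 1, fill the rest in parent 2's order.
            a, b = sorted(rng.choice(len(p1), 2, replace=False))
            child = [-1] * len(p1)
            child[a:b] = p1[a:b]
            rest = [j for j in p2 if j not in child]
            for i in range(len(p1)):
                if child[i] == -1:
                    child[i] = rest.pop(0)
            return child

        pop = [list(rng.permutation(P.shape[0])) for _ in range(30)]
        for _ in range(100):                                   # generations
            pop.sort(key=makespan)
            elite = pop[:10]
            children = []
            while len(children) < 20:
                i, j = rng.choice(10, 2, replace=False)
                child = order_crossover(elite[i], elite[j])
                if rng.random() < 0.2:                         # swap mutation
                    a, b = rng.choice(len(child), 2, replace=False)
                    child[a], child[b] = child[b], child[a]
                children.append(child)
            pop = elite + children
        print("best makespan:", makespan(min(pop, key=makespan)))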

  15. ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition.

    PubMed

    Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W; Francis, Suzanna C; Fraser, Louise J; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka

    2015-01-01

    Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach in which a standard K-means clustering algorithm partitions a large set of reads into subsets at reasonable computational cost, providing several vectors of first-order statistics instead of only a single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via a mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
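
    A minimal sketch of the aggregation step, with synthetic read vectors standing in for real 16S data: K-means partitions the k-mer frequency vectors, and each cluster contributes its mean vector and mixture proportion to the downstream composition estimator.

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        reads = rng.random((5000, 64))                 # 5000 reads x 4^3 k-mer counts (k=3, synthetic)
        reads /= reads.sum(axis=1, keepdims=True)      # normalize counts to frequencies

        km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(reads)
        weights = np.bincount(km.labels_) / len(reads) # mixture proportions per cluster
        # Cluster centers (mean k-mer vectors) plus weights feed the estimator.
        print(km.cluster_centers_.shape, np.round(weights, 3))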

  16. An investigation of chatter and tool wear when machining titanium

    NASA Technical Reports Server (NTRS)

    Sutherland, I. A.

    1974-01-01

    The low thermal conductivity of titanium, together with the low contact area between chip and tool and the unusually high chip velocities, gives rise to high tool tip temperatures and accelerated tool wear. Machining speeds have to be considerably reduced to avoid these high temperatures, with a consequent loss of productivity. Restoring this lost productivity involves increasing other machining variables, such as feed and depth of cut, and can lead to another machining problem commonly known as chatter. This work acquaints users with these problems, examines the variables that may be encountered when machining a material like titanium, and advises the machine tool user on how to maximize the output from the machines and tooling available. Recommendations are made on ways of improving tolerances, reducing machine tool instability or chatter, and improving productivity. New tool materials, tool coatings, and coolants are reviewed and their relevance examined for machining titanium.

  17. Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.

    PubMed

    Hernandez, Troy; Yang, Jie

    2016-10-01

    The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code are available upon request.
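
    A rough sketch of the vectorization idea (the moments used here and the implementation details are assumptions; the paper's own implementation is in R, available on request from the authors): augment k-mer frequencies with normalized positional means and standard deviations, then classify with k-nearest neighbors.

        import numpy as np
        from itertools import product
        from sklearn.neighbors import KNeighborsClassifier

        def generalized_vector(seq, k=2):
            """Frequency plus positional mean/std for every k-mer (illustrative)."""
            kmers = ["".join(p) for p in product("ACGT", repeat=k)]
            n = len(seq) - k + 1
            vec = []
            for kmer in kmers:
                pos = [i for i in range(n) if seq[i:i + k] == kmer]
                freq = len(pos) / n
                mean = np.mean(pos) / n if pos else 0.0
                std = np.std(pos) / n if pos else 0.0
                vec += [freq, mean, std]
            return np.array(vec)

        # Toy usage: classify a query against two labeled reference sequences.
        refs = {"A": "ACGTACGTACGGACGT" * 4, "B": "TTGGCCAATTGGCCAA" * 4}
        X = np.array([generalized_vector(s) for s in refs.values()])
        knn = KNeighborsClassifier(n_neighbors=1).fit(X, list(refs.keys()))
        print(knn.predict([generalized_vector("ACGTACGGACGTACGT" * 4)]))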

  18. Discomfort analysis in computerized numeric control machine operations.

    PubMed

    Muthukumar, Krishnamoorthy; Sankaranarayanasamy, Krishnasamy; Ganguli, Anindya Kumar

    2012-06-01

    The introduction of computerized numeric control (CNC) technology in manufacturing industries has revolutionized the production process, but there are some health and safety problems associated with these machines. The present study aimed to investigate the extent of postural discomfort in CNC machine operators, and the relationship of this discomfort to the display and control panel height, with a view to validating the anthropometric recommendations for the location of the display and control panel in CNC machines. The postural discomforts associated with CNC machines were studied in 122 male operators using Corlett and Bishop's body part discomfort mapping, subject information, and discomfort levels at various time intervals from the start to the end of a shift. This information was collected using a questionnaire. Statistical analysis was carried out using ANOVA. Neck discomfort due to the positioning of the machine displays, and shoulder and arm discomfort due to the positioning of controls, were identified as common health issues in the operators of these machines. The study revealed that 45.9% of machine operators reported discomfort in the lower back, 41.8% in the neck, 22.1% in the upper back, 53.3% in the shoulder and arm, and 21.3% in the leg. Discomfort increased as the day progressed and was highest at the end of a shift; subject age had no effect on the tendency to experience discomfort.

  19. Discomfort Analysis in Computerized Numeric Control Machine Operations

    PubMed Central

    Sankaranarayanasamy, Krishnasamy; Ganguli, Anindya Kumar

    2012-01-01

    Objectives The introduction of computerized numeric control (CNC) technology in manufacturing industries has revolutionized the production process, but there are some health and safety problems associated with these machines. The present study aimed to investigate the extent of postural discomfort in CNC machine operators, and the relationship of this discomfort to the display and control panel height, with a view to validating the anthropometric recommendations for the location of the display and control panel in CNC machines. Methods The postural discomforts associated with CNC machines were studied in 122 male operators using Corlett and Bishop's body part discomfort mapping, subject information, and discomfort levels at various time intervals from the start to the end of a shift. This information was collected using a questionnaire. Statistical analysis was carried out using ANOVA. Results Neck discomfort due to the positioning of the machine displays, and shoulder and arm discomfort due to the positioning of controls, were identified as common health issues in the operators of these machines. The study revealed that 45.9% of machine operators reported discomfort in the lower back, 41.8% in the neck, 22.1% in the upper back, 53.3% in the shoulder and arm, and 21.3% in the leg. Conclusion Discomfort increased as the day progressed and was highest at the end of a shift; subject age had no effect on the tendency to experience discomfort. PMID:22993720

  20. Health-promoting vending machines: evaluation of a pediatric hospital intervention.

    PubMed

    Van Hulst, Andraea; Barnett, Tracie A; Déry, Véronique; Côté, Geneviève; Colin, Christine

    2013-01-01

    Taking advantage of a natural experiment made possible by the placement of health-promoting vending machines (HPVMs), we evaluated the impact of the intervention on consumers' attitudes toward and practices with vending machines in a pediatric hospital. Vending machines offering healthy snacks, meals, and beverages were developed to replace four vending machines offering the usual high-energy, low-nutrition fare. A pre- and post-intervention evaluation design was used; data were collected through exit surveys and six-week follow-up telephone surveys among potential vending machine users before (n=293) and after (n=226) placement of HPVMs. Chi-square statistics were used to compare pre- and post-intervention participants' responses. More than 90% of pre- and post-intervention participants were satisfied with their purchase. Post-intervention participants were more likely to state that nutritional content and appropriateness of portion size were elements that influenced their purchase. Overall, post-intervention participants were more likely than pre-intervention participants to perceive as healthy the options offered by the hospital vending machines. Thirty-three percent of post-intervention participants recalled two or more sources of information integrated in the HPVM concept. No differences were found between pre- and post-intervention participants' readiness to adopt healthy diets. While the HPVM project had challenges as well as strengths, vending machines offering healthy snacks are feasible in hospital settings.

  1. Statistical interpretation of machine learning-based feature importance scores for biomarker discovery.

    PubMed

    Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre

    2012-07-01

    Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple and fast, and their output is easily interpretable by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family-wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve between false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source codes of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
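
    One generic way to make relevance scores statistically interpretable is a label-permutation null, sketched below; this illustrates the general idea rather than any specific procedure evaluated in the paper.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(150, 20))
        y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)  # multivariate interacting effect

        observed = (RandomForestClassifier(n_estimators=200, random_state=0)
                    .fit(X, y).feature_importances_)
        # Null distribution: importances refitted on label-permuted data.
        null = np.array([RandomForestClassifier(n_estimators=200, random_state=i)
                         .fit(X, rng.permutation(y)).feature_importances_
                         for i in range(100)])
        # Empirical p-value per feature (with the usual +1 correction).
        pvals = (1 + (null >= observed).sum(axis=0)) / (1 + len(null))
        print(np.round(pvals[:5], 3))                      # small p => likely relevant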

  2. CMM Interim Check (U)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Montano, Joshua Daniel

    2015-03-23

    Coordinate Measuring Machines (CMM) are widely used in industry, throughout the Nuclear Weapons Complex and at Los Alamos National Laboratory (LANL) to verify part conformance to design definition. Calibration cycles for CMMs at LANL are predominantly one year in length. Unfortunately, several nonconformance reports have been generated to document the discovery of a certified machine found out of tolerance during a calibration closeout. In an effort to reduce risk to product quality two solutions were proposed – shorten the calibration cycle, which could be costly, or perform an interim check to monitor the machine’s performance between cycles. The CMM interim check discussed makes use of Renishaw’s Machine Checking Gauge. This off-the-shelf product simulates a large sphere within a CMM’s measurement volume and allows for error estimation. Data was gathered, analyzed, and simulated from seven machines in seventeen different configurations to create statistical process control run charts for on-the-floor monitoring.

  3. A Comparison of Machine Learning Approaches for Corn Yield Estimation

    NASA Astrophysics Data System (ADS)

    Kim, N.; Lee, Y. W.

    2017-12-01

    Machine learning is an efficient empirical method for classification and prediction, and it is another approach to crop yield estimation. The objective of this study is to estimate corn yield in the Midwestern United States by employing machine learning approaches such as the support vector machine (SVM), random forest (RF), and deep neural networks (DNN), and to perform a comprehensive comparison of their results. We constructed the database using satellite images from MODIS, the climate data of the PRISM climate group, and GLDAS soil moisture data. In addition, to examine the seasonal sensitivities of corn yields, two period groups were set up: May to September (MJJAS) and July and August (JA). Overall, the DNN showed the highest accuracy in terms of the correlation coefficient for the two period groups. The differences between our predictions and USDA yield statistics were about 10-11%.

  4. NASA's online machine aided indexing system

    NASA Technical Reports Server (NTRS)

    Silvester, June P.; Genuardi, Michael T.; Klingbiel, Paul H.

    1993-01-01

    This report describes the NASA Lexical Dictionary (NLD), a machine aided indexing system used online at the National Aeronautics and Space Administration's Center for Aerospace Information (CASI). This system comprises a text processor that is based on the computational, non-syntactic analysis of input text, and an extensive 'knowledge base' that serves to recognize and translate text-extracted concepts. The structure and function of the various NLD system components are described in detail. Methods used for the development of the knowledge base are discussed. Particular attention is given to a statistically based text analysis program that provides the knowledge base developer with a list of concept-specific phrases extracted from large textual corpora. Production and quality benefits resulting from the integration of machine aided indexing at CASI are discussed, along with a number of secondary applications of NLD-derived systems, including online spell checking and machine aided lexicography.

  5. Modeling Geomagnetic Variations using a Machine Learning Framework

    NASA Astrophysics Data System (ADS)

    Cheung, C. M. M.; Handmer, C.; Kosar, B.; Gerules, G.; Poduval, B.; Mackintosh, G.; Munoz-Jaramillo, A.; Bobra, M.; Hernandez, T.; McGranaghan, R. M.

    2017-12-01

    We present a framework for data-driven modeling of Heliophysics time series data. The Solar Terrestrial Interaction Neural net Generator (STING) is an open source python module built on top of state-of-the-art statistical learning frameworks (traditional machine learning methods as well as deep learning). To showcase the capability of STING, we deploy it for the problem of predicting the temporal variation of geomagnetic fields. The data used includes solar wind measurements from the OMNI database and geomagnetic field data taken by magnetometers at US Geological Survey observatories. We examine the predictive capability of different machine learning techniques (recurrent neural networks, support vector machines) for a range of forecasting times (minutes to 12 hours). STING is designed to be extensible to other types of data. We show how STING can be used on large sets of data from different sensors/observatories and adapted to tackle other problems in Heliophysics.

  6. Machine Learning in the Big Data Era: Are We There Yet?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sukumar, Sreenivas Rangan

    In this paper, we discuss the machine learning challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are machine learning algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across domains of national security and healthcare to suggest our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data-retrieval); (ii) the science of data challenge - the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge - the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.

  7. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    PubMed

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.

  8. Prediction of outcome in internet-delivered cognitive behaviour therapy for paediatric obsessive-compulsive disorder: A machine learning approach.

    PubMed

    Lenhard, Fabian; Sauer, Sebastian; Andersson, Erik; Månsson, Kristoffer Nt; Mataix-Cols, David; Rück, Christian; Serlachius, Eva

    2018-03-01

    There are no consistent predictors of treatment outcome in paediatric obsessive-compulsive disorder (OCD). One reason for this might be the use of suboptimal statistical methodology. Machine learning is an approach to efficiently analyse complex data. Machine learning has been widely used within other fields, but has rarely been tested in the prediction of paediatric mental health treatment outcomes. To test four different machine learning methods in the prediction of treatment response in a sample of paediatric OCD patients who had received Internet-delivered cognitive behaviour therapy (ICBT). Participants were 61 adolescents (12-17 years) who enrolled in a randomized controlled trial and received ICBT. All clinical baseline variables were used to predict strictly defined treatment response status three months after ICBT. Four machine learning algorithms were implemented. For comparison, we also employed a traditional logistic regression approach. Multivariate logistic regression could not detect any significant predictors. In contrast, all four machine learning algorithms performed well in the prediction of treatment response, with 75 to 83% accuracy. The results suggest that machine learning algorithms can successfully be applied to predict paediatric OCD treatment outcome. Validation studies and studies in other disorders are warranted. Copyright © 2017 John Wiley & Sons, Ltd.

  9. On the Safety of Machine Learning: Cyber-Physical Systems, Decision Sciences, and Data Products.

    PubMed

    Varshney, Kush R; Alemzadeh, Homa

    2017-09-01

    Machine learning algorithms increasingly influence our decisions and interact with us in all parts of our daily lives. Therefore, just as we consider the safety of power plants, highways, and a variety of other engineered socio-technical systems, we must also take into account the safety of systems involving machine learning. Heretofore, the definition of safety has not been formalized in a machine learning context. In this article, we do so by defining machine learning safety in terms of risk, epistemic uncertainty, and the harm incurred by unwanted outcomes. We then use this definition to examine safety in all sorts of applications in cyber-physical systems, decision sciences, and data products. We find that the foundational principle of modern statistical machine learning, empirical risk minimization, is not always a sufficient objective. We discuss how four different categories of strategies for achieving safety in engineering, including inherently safe design, safety reserves, safe fail, and procedural safeguards can be mapped to a machine learning context. We then discuss example techniques that can be adopted in each category, such as considering interpretability and causality of predictive models, objective functions beyond expected prediction accuracy, human involvement for labeling difficult or rare examples, and user experience design of software and open data.

  10. Are we there yet?

    PubMed

    Cristianini, Nello

    2010-05-01

    Statistical approaches to Artificial Intelligence are behind most success stories of the field in the past decade. The idea of generating non-trivial behaviour by analysing vast amounts of data has enabled recommendation systems, search engines, spam filters, optical character recognition, machine translation and speech recognition, among other things. As we celebrate the spectacular achievements of this line of research, we need to assess its full potential and its limitations. What are the next steps to take towards machine intelligence? 2010 Elsevier Ltd. All rights reserved.

  11. Machine Learning Methods for Production Cases Analysis

    NASA Astrophysics Data System (ADS)

    Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.

    2018-03-01

    An approach to the analysis of events occurring during the production process is proposed. The described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing of the applied models. The k-Nearest Neighbors and Random Forest methods were used to illustrate and analyze the proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall and accuracy.

  12. Machine learning properties of binary wurtzite superlattices

    DOE PAGES

    Pilania, G.; Liu, X. -Y.

    2018-01-12

    The burgeoning paradigm of high-throughput computations and materials informatics brings new opportunities in terms of targeted materials design and discovery. The discovery process can be significantly accelerated and streamlined if one can learn effectively from available knowledge and past data to predict materials properties efficiently. Indeed, a very active area in materials science research is to develop machine learning based methods that can deliver automated and cross-validated predictive models using either already available materials data or new data generated in a targeted manner. In the present paper, we show that fast and accurate predictions of a wide range of properties of binary wurtzite superlattices, formed by a diverse set of chemistries, can be made by employing state-of-the-art statistical learning methods trained on quantum mechanical computations in combination with a judiciously chosen numerical representation to encode materials’ similarity. These surrogate learning models then allow for efficient screening of vast chemical spaces by providing instant predictions of the targeted properties. Moreover, the models can be systematically improved in an adaptive manner, incorporate properties computed at different levels of fidelities and are naturally amenable to inverse materials design strategies. Finally, while the learning approach to make predictions for a wide range of properties (including structural, elastic and electronic properties) is demonstrated here for a specific example set containing more than 1200 binary wurtzite superlattices, the adopted framework is equally applicable to other classes of materials as well.

  13. Machine learning properties of binary wurtzite superlattices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pilania, G.; Liu, X. -Y.

    The burgeoning paradigm of high-throughput computations and materials informatics brings new opportunities in terms of targeted materials design and discovery. The discovery process can be significantly accelerated and streamlined if one can learn effectively from available knowledge and past data to predict materials properties efficiently. Indeed, a very active area in materials science research is to develop machine learning based methods that can deliver automated and cross-validated predictive models using either already available materials data or new data generated in a targeted manner. In the present paper, we show that fast and accurate predictions of a wide range of properties of binary wurtzite superlattices, formed by a diverse set of chemistries, can be made by employing state-of-the-art statistical learning methods trained on quantum mechanical computations in combination with a judiciously chosen numerical representation to encode materials’ similarity. These surrogate learning models then allow for efficient screening of vast chemical spaces by providing instant predictions of the targeted properties. Moreover, the models can be systematically improved in an adaptive manner, incorporate properties computed at different levels of fidelities and are naturally amenable to inverse materials design strategies. Finally, while the learning approach to make predictions for a wide range of properties (including structural, elastic and electronic properties) is demonstrated here for a specific example set containing more than 1200 binary wurtzite superlattices, the adopted framework is equally applicable to other classes of materials as well.

  14. TU-FG-201-05: Varian MPC as a Statistical Process Control Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carver, A; Rowbottom, C

    Purpose: Quality assurance in radiotherapy requires the measurement of various machine parameters to ensure they remain within permitted values over time. In Truebeam release 2.0 the Machine Performance Check (MPC) was released, allowing beam output and machine axis movements to be assessed in a single test. We aim to evaluate the Varian Machine Performance Check (MPC) as a tool for Statistical Process Control (SPC). Methods: Varian’s MPC tool was used on three Truebeam and one EDGE linac for a period of approximately one year. MPC was commissioned against independent systems. After this period the data were reviewed to determine whether or not the MPC was useful as a process control tool. Individual tests were analysed using Shewhart control plots, with Matlab used for the analysis. Principal component analysis was used to determine whether a multivariate model was of any benefit in analysing the data. Results: Control charts were found to be useful to detect beam output changes, worn T-nuts and jaw calibration issues. Upper and lower control limits were defined at the 95% level. Multivariate SPC was performed using Principal Component Analysis. We found little evidence of clustering beyond that which might be naively expected, such as beam uniformity and beam output. Whilst this makes multivariate analysis of little use, it suggests that each test is giving independent information. Conclusion: The variety of independent parameters tested in MPC makes it a sensitive tool for routine machine QA. We have determined that using control charts in our QA programme would rapidly detect changes in machine performance. The use of control charts allows large quantities of tests to be performed on all linacs without visual inspection of all results. The use of control limits alerts users when data are inconsistent with previous measurements before they become out of specification. A. Carver has received a speaker’s honorarium from Varian.
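
    A minimal sketch of the control-chart mechanics on a synthetic daily output series: an individuals chart with limits derived from the mean moving range. Note that the classic 3-sigma limits are used here, whereas the abstract reports limits defined at the 95% level.

        import numpy as np

        # Synthetic daily MPC beam-output readings (% change from baseline);
        # real use would parse MPC exports in place of `readings`.
        readings = np.array([0.1, 0.0, -0.1, 0.2, 0.1, 0.0, 0.3, 0.1, -0.2, 1.1])

        mr = np.abs(np.diff(readings)).mean()   # mean moving range
        center = readings.mean()
        sigma = mr / 1.128                      # d2 constant for subgroup size 2
        ucl, lcl = center + 3 * sigma, center - 3 * sigma

        for day, value in enumerate(readings, 1):
            flag = " <-- out of control" if not (lcl <= value <= ucl) else ""
            print(f"day {day:2d}: {value:+.2f}{flag}")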

  15. Prevalence and associated factors of work related musculoskeletal disorders among commercial milling machine operators in South-Eastern Nigerian markets.

    PubMed

    Ojukwu, Chidiebele Petronilla; Anyanwu, Godson Emeka; Nwabueze, Augustine Chijindu; Anekwu, Emelie Morris; Chukwu, Sylvester Caesar

    2017-01-01

    Milling machine operators perform physically demanding tasks that can lead to work-related musculoskeletal disorders (WRMSDs), but literature on WRMSDs among milling machine operators is scarce. Knowledge of the prevalence and risk factors of WRMSDs can be an appropriate base for planning and implementing ergonomics intervention programs in the workplace. This study aimed to determine the prevalence, pattern and associated factors of WRMSDs among commercial milling machine operators in Enugu, Nigeria. This cross-sectional survey involved 148 commercial milling machine operators (74 hand-operated milling machine operators (HOMMO) and 74 electrically-operated milling machine operators (EOMMO)), within the age range of 18-65 years, who were conveniently selected from four markets in Enugu, Nigeria. A standard Nordic questionnaire was used to assess the prevalence of WRMSDs among the participants. Data were summarized using descriptive statistics. There was a significant difference (p = 0.001) in the prevalence of WRMSDs between HOMMOs (77%) and EOMMOs (50%). All body parts were affected in both groups, with the shoulders (85.1%) and lower back (46%) having the highest prevalence. Working in awkward and static postures, working with injury, poor workplace design, repetition of tasks, vibratory working equipment, reduced rest, high job demand and heavy lifting were significantly associated with the prevalence of WRMSDs. WRMSDs are prevalent among commercial milling machine operators, with higher occurrence in HOMMOs. Ergonomic interventions, including the re-design of milling machines and appropriate work posture education of machine operators, are recommended in the milling industry.

  16. Nutritional value of foods sold in vending machines in a UK University: Formative, cross-sectional research to inform an environmental intervention.

    PubMed

    Park, Hanla; Papadaki, Angeliki

    2016-01-01

    Vending machine use has been associated with low dietary quality among children but there is limited evidence on its role in food habits of University students. We aimed to examine the nutritional value of foods sold in vending machines in a UK University and conduct formative research to investigate differences in food intake and body weight by vending machine use among 137 University students. The nutrient content of snacks and beverages available at nine campus vending machines was assessed by direct observation in May 2014. Participants (mean age 22.5 years; 54% males) subsequently completed a self-administered questionnaire to assess vending machine behaviours and food intake. Self-reported weight and height were collected. Vending machine snacks were generally high in sugar, fat and saturated fat, whereas most beverages were high in sugar. Seventy three participants (53.3%) used vending machines more than once per week and 82.2% (n 60) of vending machine users used them to snack between meals. Vending machine accessibility was positively correlated with vending machine use (r = 0.209, P = 0.015). Vending machine users, compared to non-users, reported a significantly higher weekly consumption of savoury snacks (5.2 vs. 2.8, P = 0.014), fruit juice (6.5 vs. 4.3, P = 0.035), soft drinks (5.1 vs. 1.9, P = 0.006), meat products (8.3 vs. 5.6, P = 0.029) and microwave meals (2.0 vs. 1.3, P = 0.020). No between-group differences were found in body weight. Most foods available from vending machines in this UK University were of low nutritional quality. In this sample of University students, vending machine users displayed several unfavourable dietary behaviours, compared to non-users. Findings can be used to inform the development of an environmental intervention that will focus on vending machines to improve dietary behaviours in University students in the UK. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning.

    PubMed

    Formisano, Elia; De Martino, Federico; Valente, Giancarlo

    2008-09-01

    Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow the detection of subtle, non-strictly localized effects that may remain invisible to conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are suited for the classification and regression of fMRI patterns, respectively. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.
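
    As a rough illustration of the classification ("brain reading") setting described above, the following sketch trains a linear SVM to predict a two-class stimulus label from multi-voxel activity patterns. It is a minimal sketch on synthetic data: the trials-by-voxels layout, class structure, and scikit-learn usage are illustrative assumptions, not details taken from the paper.

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_trials, n_voxels = 80, 500
      X = rng.normal(size=(n_trials, n_voxels))   # synthetic activity patterns
      y = rng.integers(0, 2, size=n_trials)       # two perceptual/cognitive states
      X[y == 1, :20] += 0.8                       # weak signal spread over many voxels

      clf = SVC(kernel="linear", C=1.0)           # linear SVM, a common choice in fMRI
      scores = cross_val_score(clf, X, y, cv=5)   # predict labels of unseen trials
      print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")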

  18. Inverse Problems in Geodynamics Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R. N.

    2018-01-01

    During the past few decades, numerical studies have been widely employed to explore the style of circulation and mixing in the mantle of Earth and other planets. However, these numerical models depend on many properties from mineral physics, geochemistry, and petrology that are not well constrained. Machine learning, as a computational statistics-related technique and a subfield of artificial intelligence, has rapidly emerged in many fields of science and engineering. We focus here on the application of supervised machine learning (SML) algorithms to predictions of mantle flow processes. Specifically, we emphasize estimating mantle properties by employing machine learning techniques to solve an inverse problem. Using snapshots of numerical convection models as training samples, we enable machine learning models to determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at mid-mantle depths. Employing support vector machine algorithms, we show that SML techniques can successfully predict the magnitude of mantle density anomalies and can also be used to characterize mantle flow patterns. The technique can be extended to more complex geodynamic problems in mantle dynamics by employing deep learning algorithms to put constraints on properties such as viscosity, elastic parameters, and the nature of thermal and chemical anomalies.

  19. Phenolic cutter for machining foam insulation

    NASA Technical Reports Server (NTRS)

    Blair, T. A.; Miller, A. C.; Price, B. W.; Stiles, W. S.

    1970-01-01

    Pre-pregged fiber glass is an efficient abrasive for machining polystyrene and polyurethane foams. It bonds easily to any cutter base made of aluminum, steel, or phenolic, is inexpensive, and is readily available.

  20. Tensile strength of laser welded cobalt-chromium alloy with and without an argon atmosphere.

    PubMed

    Tartari, Anna; Clark, Robert K F; Juszczyk, Andrzej S; Radford, David R

    2010-06-01

    The tensile strength and depth of weld of two cobalt-chromium alloys before and after laser welding, with and without an argon gas atmosphere, were investigated. Using the two cobalt-chromium alloys, rod-shaped specimens (5 cm x 1.5 mm) were cast. Specimens were sand-blasted, sectioned and welded with a pulsed Nd:YAG laser welding machine and tested in tension using an Instron universal testing machine. A statistically significant difference in tensile strength was observed between the two alloys. The tensile strength of specimens following laser welding was significantly less than that of the unwelded controls. Scanning electron microscopy showed that the microstructure of the cast alloy was altered in the region of the weld. No statistically significant difference was found between specimens welded with or without an argon atmosphere.

  1. Statistical Mechanics of Coherent Ising Machine — The Case of Ferromagnetic and Finite-Loading Hopfield Models —

    NASA Astrophysics Data System (ADS)

    Aonishi, Toru; Mimura, Kazushi; Utsunomiya, Shoko; Okada, Masato; Yamamoto, Yoshihisa

    2017-10-01

    The coherent Ising machine (CIM) has attracted attention as one of the most effective Ising computing architectures for solving large scale optimization problems because of its scalability and high-speed computational ability. However, it is difficult to implement the Ising computation in the CIM because the theories and techniques of classical thermodynamic equilibrium Ising spin systems cannot be directly applied to the CIM. This means we have to adapt these theories and techniques to the CIM. Here we focus on a ferromagnetic model and a finite loading Hopfield model, which are canonical models sharing a common mathematical structure with almost all other Ising models. We derive macroscopic equations to capture nonequilibrium phase transitions in these models. The statistical mechanical methods developed here constitute a basis for constructing evaluation methods for other Ising computation models.

  2. Machine learning classifier using abnormal brain network topological metrics in major depressive disorder.

    PubMed

    Guo, Hao; Cao, Xiaohua; Liu, Zhifen; Li, Haifang; Chen, Junjie; Zhang, Kerang

    2012-12-05

    Resting state functional brain networks have been widely studied in brain disease research. However, it is currently unclear whether abnormal resting state functional brain network metrics can be used with machine learning for the classification of brain diseases. Resting state functional brain networks were constructed for 28 healthy controls and 38 major depressive disorder patients by thresholding partial correlation matrices of 90 regions. Three nodal metrics were calculated using graph theory-based approaches. Nonparametric permutation tests were then used for group comparisons of topological metrics, which served as classification features in six different algorithms. We used statistical significance as the threshold for selecting features and measured the accuracies of the six classifiers with different numbers of features. A sensitivity analysis method was used to evaluate the importance of different features. The results indicated that some regions exhibited significantly abnormal nodal centralities, including the limbic system, basal ganglia, medial temporal, and prefrontal regions. The support vector machine with radial basis kernel function and the neural network algorithm exhibited the highest average accuracies (79.27% and 78.22%, respectively) with 28 features (P<0.05). Correlation analysis between feature importance and the statistical significance of the metrics was investigated, and the results revealed a strong positive correlation between them. Overall, the current study demonstrated that major depressive disorder is associated with abnormal functional brain network topological metrics, and that statistically significant nodal metrics can be successfully used for feature selection in classification algorithms.
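
    The pipeline this abstract describes (group comparison of nodal metrics, significance-thresholded feature selection, then an RBF-kernel SVM) can be sketched as follows on synthetic data. A t-test stands in for the paper's nonparametric permutation tests, and the sample sizes and effect sizes are invented for illustration.

      import numpy as np
      from scipy.stats import ttest_ind
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      n_controls, n_patients, n_metrics = 28, 38, 90    # 90 regional nodal metrics
      X = rng.normal(size=(n_controls + n_patients, n_metrics))
      y = np.r_[np.zeros(n_controls), np.ones(n_patients)].astype(int)
      X[y == 1, :10] += 0.9                             # a few abnormal nodal centralities

      # group comparison per metric; keep metrics with P < 0.05 as features
      _, p = ttest_ind(X[y == 0], X[y == 1], axis=0)
      X_sel = X[:, p < 0.05]

      clf = SVC(kernel="rbf", gamma="scale")
      print("features kept:", X_sel.shape[1])
      print("CV accuracy:", cross_val_score(clf, X_sel, y, cv=5).mean())

    Note that selecting features on the full dataset before cross-validation, as done here for brevity, optimistically biases the accuracy estimate; a stricter sketch would nest the selection inside each fold.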

  3. A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback.

    PubMed

    Rahman, Md Mahmudur; Bhattacharya, Prabir; Desai, Bipin C

    2007-01-01

    A content-based image retrieval (CBIR) framework for a diverse collection of medical images of different imaging modalities, anatomic regions with different orientations, and biological systems is proposed. The organization of images in such a database (DB) is well defined with predefined semantic categories; hence, it can be useful for category-specific searching. The proposed framework consists of machine learning methods for image prefiltering, similarity matching using statistical distance measures, and a relevance feedback (RF) scheme. To narrow the semantic gap and increase retrieval efficiency, we investigate both supervised and unsupervised learning techniques to associate low-level global image features (e.g., color, texture, and edge) in the projected PCA-based eigenspace with their high-level semantic and visual categories. Specifically, we explore the use of a probabilistic multiclass support vector machine (SVM) and fuzzy c-means (FCM) clustering for categorization and prefiltering of images to reduce the search space. A category-specific statistical similarity matching is then performed at a finer level on the prefiltered images. To better incorporate perceptual subjectivity, an RF mechanism is also added to update the query parameters dynamically and adjust the proposed matching functions. Experiments are based on a ground-truth DB consisting of 5000 diverse medical images in 20 predefined categories. An analysis of results based on cross-validation (CV) accuracy and precision-recall for image categorization and retrieval is reported. It demonstrates the improvement, effectiveness, and efficiency achieved by the proposed framework.

  4. Retention of veneered stainless steel crowns on replicated typodont primary incisors: an in vitro study.

    PubMed

    Guelmann, Marcio; Gehring, Daren F; Turner, Clara

    2003-01-01

    The purpose of this in vitro study was to determine the effect of crimping and cementation on retention of veneered stainless steel crowns. One hundred twenty crowns, 90 from 3 commercially available brands of veneered stainless steel crowns (Dura Crown, Kinder Krown, and NuSmile Primary Crown) and 30 (plain) Unitek stainless steel crowns were assessed for retention. An orthodontic wire was soldered perpendicular to the incisal edge of the crowns; the crowns were fitted to acrylic replicas of ideal crown preparations and were divided equally into 3 test groups: group 1--crowns were crimped only (no cement used); group 2--crowns were cemented only; and group 3--crowns were crimped and cemented to the acrylic replicas. An Instron machine recorded the amount of force necessary to dislodge the crowns and the results were statistically analyzed using 2-way ANOVA and the Tukey honestly significant difference (HSD) test. Group 3 was statistically more retentive than groups 1 and 2. Group 2 was statistically more retentive than group 1 (P < .001). In group 1, Unitek crowns were statistically more retentive than the veneered crowns (P < .05). In group 2, NuSmile crowns showed statistically lower retention values than all other crowns (P < .05). In group 3, Kinder Krown crowns showed statistically better retention rates than all other brands (P < .05). Significantly higher retention values were obtained for all brands tested when crimping and cement were combined. The crowns with veneer facings were significantly more retentive than the nonveneered ones when cement and crimping were combined.

  5. Investigation of machinability characteristics on EN47 steel for cutting force and tool wear using optimization technique

    NASA Astrophysics Data System (ADS)

    M, Vasu; Shivananda Nayaka, H.

    2018-06-01

    In this experimental work, a dry turning process carried out on EN47 spring steel with a coated tungsten carbide tool insert of 0.8 mm nose radius was optimized using a statistical technique. Experiments were conducted at three different cutting speeds (625, 796 and 1250 rpm) with three different feed rates (0.046, 0.062 and 0.093 mm/rev) and depths of cut (0.2, 0.3 and 0.4 mm), based on a full factorial design (FFD) with three factors at three levels (3³ = 27 runs). Analysis of variance was used to identify the significant factors for each output response. The results reveal that feed rate is the most significant factor influencing cutting force, followed by depth of cut, with cutting speed having less significance. The optimum machining condition for cutting force was obtained from the statistical technique. Tool wear measurements were performed at the optimum condition of Vc = 796 rpm, ap = 0.2 mm, f = 0.046 mm/rev. The minimum tool wear observed was 0.086 mm after 5 min of machining. Tool wear was analyzed with a confocal microscope; it was observed that tool wear increases with increasing cutting time.
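
    The design-of-experiments step described above can be reproduced in miniature: a 3x3x3 full factorial over the stated speeds, feeds and depths of cut, followed by an analysis of variance. The response model below is fabricated (with coefficients chosen so that feed rate dominates, merely to mirror the reported ranking); only the factor levels come from the abstract.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      from statsmodels.formula.api import ols
      from itertools import product

      rng = np.random.default_rng(2)
      speeds, feeds, depths = [625, 796, 1250], [0.046, 0.062, 0.093], [0.2, 0.3, 0.4]
      rows = []
      for v, f, d in product(speeds, feeds, depths):                # 3^3 = 27 runs
          force = 2000 * f + 200 * d + 0.02 * v + rng.normal(0, 5)  # synthetic cutting force
          rows.append({"speed": v, "feed": f, "depth": d, "force": force})
      df = pd.DataFrame(rows)

      model = ols("force ~ C(speed) + C(feed) + C(depth)", data=df).fit()
      print(sm.stats.anova_lm(model, typ=2))                        # significance of each factor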

  6. Noise induced hearing loss of forest workers in Turkey.

    PubMed

    Tunay, M; Melemez, K

    2008-09-01

    In this study, a total of 114 workers, divided into 3 groups by age and type of work, underwent audiometric analysis. Analysis of variance was applied to the resulting data to determine whether there were statistically significant differences between the hearing loss levels of the workers included in the study. Correlation and regression analyses were applied to determine the relations between hearing loss and the workers' age and time of work. As a result of the variance analysis, statistically significant differences were found at the 500, 2000 and 4000 Hz frequencies. The most pronounced difference was observed among chainsaw machine operators at the 4000 Hz frequency. As a result of the correlation analysis, significant relations were found between time of work and hearing loss at the 0.01 significance level and between age and hearing loss at the 0.05 significance level. Forest workers using chainsaws should be informed of the risk, they should wear protective equipment, quieter chainsaws should be used where possible, and workers should undergo audiometric tests when they start work and once a year thereafter.

  7. A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation

    PubMed Central

    2016-01-01

    Chinese and Vietnamese are both isolating languages; that is, words are not delimited by spaces. In machine translation, word segmentation is often done first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, it is a matter for consideration whether words should be segmented when translating between two languages in which spaces are not used between words, such as Chinese and Vietnamese. Since Chinese-Vietnamese is a low-resource language pair, the sparse-data problem is evident in translation systems for this pair, which makes the segmentation decision all the more important. In this paper, we propose a new method for translating Chinese to Vietnamese based on a combination of the advantages of character-level and word-level translation. In addition, a hybrid approach that combines statistics and rules is used at the word level, while purely statistical translation is used at the character level. The experimental results showed that our method improved machine translation performance over that of character-level or word-level translation alone. PMID:27446207

  8. Fault diagnosis of automobile hydraulic brake system using statistical features and support vector machines

    NASA Astrophysics Data System (ADS)

    Jegadeeshwaran, R.; Sugumaran, V.

    2015-02-01

    Hydraulic brakes in automobiles are important components for the safety of passengers; therefore, the brakes are a good subject for condition monitoring. The condition of the brake components can be monitored by using their vibration characteristics. On-line condition monitoring using a machine learning approach is proposed in this paper as a possible solution to such problems. The vibration signals for both good and faulty conditions of the brakes were acquired from a hydraulic brake test setup with the help of a piezoelectric transducer and a data acquisition system. Descriptive statistical features were extracted from the acquired vibration signals and feature selection was carried out using the C4.5 decision tree algorithm. There is no specific method to find the right number of features required for classification for a given problem; hence an extensive study is needed to find the optimum number of features. The effect of the number of features was therefore also studied, using the decision tree as well as Support Vector Machines (SVM). The selected features were classified using the C-SVM and Nu-SVM with different kernel functions. The results are discussed and the conclusion of the study is presented.

  9. Secure Autonomous Automated Scheduling (SAAS). Rev. 1.1

    NASA Technical Reports Server (NTRS)

    Walke, Jon G.; Dikeman, Larry; Sage, Stephen P.; Miller, Eric M.

    2010-01-01

    This report describes network-centric operations, where a virtual mission operations center autonomously receives sensor triggers, and schedules space and ground assets using Internet-based technologies and service-oriented architectures. For proof-of-concept purposes, sensor triggers are received from the United States Geological Survey (USGS) to determine targets for space-based sensors. The Surrey Satellite Technology Limited (SSTL) Disaster Monitoring Constellation satellite, the UK-DMC, is used as the space-based sensor. The UK-DMC's availability is determined via machine-to-machine communications using SSTL's mission planning system. Access to/from the UK-DMC for tasking and sensor data is via SSTL's and Universal Space Network's (USN) ground assets. The availability and scheduling of USN's assets can also be performed autonomously via machine-to-machine communications. All communication, both on the ground and between ground and space, uses open Internet standards.

  10. Enhanced networked server management with random remote backups

    NASA Astrophysics Data System (ADS)

    Kim, Song-Kyoo

    2003-08-01

    In this paper, the model is focused on available server management in network environments. The (remote) backup servers are hooked up by VPN (Virtual Private Network) and replace broken main servers immediately. A virtual private network (VPN) is a way to use a public network infrastructure to hook up long-distance servers within a single network infrastructure. The servers can be represented as "machines", and the system then deals with unreliable main machines and random auxiliary spare (remote backup) machines. When the system performs mandatory routine maintenance, auxiliary machines are used for backups during idle periods. Unlike other existing models, the availability of auxiliary machines changes at each activation in this enhanced model. Analytically tractable results are obtained by using several mathematical techniques, and the results are demonstrated in the framework of optimized networked server allocation problems.

  11. Human Rights Texts: Converting Human Rights Primary Source Documents into Data.

    PubMed

    Fariss, Christopher J; Linder, Fridolin J; Jones, Zachary M; Crabtree, Charles D; Biek, Megan A; Ross, Ana-Sophia M; Kaur, Taranamol; Tsai, Michael

    2015-01-01

    We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.

  12. Human Rights Texts: Converting Human Rights Primary Source Documents into Data

    PubMed Central

    Fariss, Christopher J.; Linder, Fridolin J.; Jones, Zachary M.; Crabtree, Charles D.; Biek, Megan A.; Ross, Ana-Sophia M.; Kaur, Taranamol; Tsai, Michael

    2015-01-01

    We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability. PMID:26418817

  13. An Evaluation of Online Machine Translation of Arabic into English News Headlines: Implications on Students' Learning Purposes

    ERIC Educational Resources Information Center

    Kadhim, Kais A.; Habeeb, Luwaytha S.; Sapar, Ahmad Arifin; Hussin, Zaharah; Abdullah, Muhammad Ridhuan Tony Lim

    2013-01-01

    Nowadays, online Machine Translation (MT) is used widely with translation software, such as Google and Babylon, being easily available and downloadable. This study aims to test the translation quality of these two machine systems in translating Arabic news headlines into English. 40 Arabic news headlines were selected from three online sources,…

  14. Advancing Research in Second Language Writing through Computational Tools and Machine Learning Techniques: A Research Agenda

    ERIC Educational Resources Information Center

    Crossley, Scott A.

    2013-01-01

    This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…

  15. Electronic vending machines for dispensing rapid HIV self-testing kits: a case study.

    PubMed

    Young, Sean D; Klausner, Jeffrey; Fynn, Risa; Bolan, Robert

    2014-02-01

    This short report evaluates the feasibility of using electronic vending machines for dispensing oral, fluid, rapid HIV self-testing kits in Los Angeles County. Feasibility criteria that needed to be addressed were defined as: (1) ability to find a manufacturer who would allow dispensing of HIV testing kits and could fit them to the dimensions of a vending machine, (2) ability to identify and address potential initial obstacles and trade-offs in choosing a machine location, and (3) ability to gain community approval for implementing this approach in a community setting. To address these issues, we contracted with a vending machine company that could supply a customized, Internet-enabled machine that could dispense HIV kits, and partnered with a local health center available to host the machine onsite and provide counseling to participants, if needed. Vending machines appear to be feasible technologies that can be used to distribute HIV testing kits.

  16. Electronic vending machines for dispensing rapid HIV self-testing kits: A case study

    PubMed Central

    Young, Sean D.; Klausner, Jeffrey; Fynn, Risa; Bolan, Robert

    2014-01-01

    This short report evaluates the feasibility of using electronic vending machines for dispensing oral, fluid, rapid HIV self-testing kits in Los Angeles County. Feasibility criteria that needed to be addressed were defined as: 1) ability to find a manufacturer who would allow dispensing of HIV testing kits and could fit them to the dimensions of a vending machine, 2) ability to identify and address potential initial obstacles and trade-offs in choosing a machine location, and 3) ability to gain community approval for implementing this approach in a community setting. To address these issues, we contracted with a vending machine company that could supply a customized, Internet-enabled machine that could dispense HIV kits, and partnered with a local health center available to host the machine onsite and provide counseling to participants, if needed. Vending machines appear to be feasible technologies that can be used to distribute HIV testing kits. PMID:23777528

  17. Support vector machine in machine condition monitoring and fault diagnosis

    NASA Astrophysics Data System (ADS)

    Widodo, Achmad; Yang, Bo-Suk

    2007-08-01

    Recently, the issue of machine condition monitoring and fault diagnosis as a part of maintenance systems has become global due to the potential advantages to be gained from reduced maintenance costs, improved productivity and increased machine availability. This paper presents a survey of machine condition monitoring and fault diagnosis using the support vector machine (SVM). It attempts to summarize and review the recent research and developments of SVM in machine condition monitoring and diagnosis. Numerous methods have been developed based on intelligent systems such as artificial neural networks, fuzzy expert systems, case-based reasoning, random forests, etc. However, the use of SVM for machine condition monitoring and fault diagnosis is still rare. SVM has excellent generalization performance, so it can produce high classification accuracy for machine condition monitoring and diagnosis. Up to 2006, the use of SVM in machine condition monitoring and fault diagnosis tended to develop towards expertise-oriented and problem-oriented domains. Finally, the ability to continually evolve and obtain novel ideas for machine condition monitoring and fault diagnosis using SVM remains future work.

  18. Ensemble Methods

    NASA Astrophysics Data System (ADS)

    Re, Matteo; Valentini, Giorgio

    2012-03-01

    Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall "ensemble" decision is rooted in our culture at least from the classical age of ancient Greece, and it was formalized during the Enlightenment with the Condorcet Jury Theorem [45], which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority-vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of "votes" (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community in ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been proposed to explain the characteristics and the successful application of ensembles to different application domains. For instance, Allwein, Schapire, and Singer interpreted the improved generalization capabilities of ensembles of learning machines in the framework of large margin classifiers [4,177], Kleinberg in the context of stochastic discrimination theory [112], and Breiman and Friedman in the light of the bias-variance analysis borrowed from classical statistics [21,70]. Empirical studies showed that in both classification and regression problems, ensembles improve on single learning machines, and moreover large experimental studies have compared the effectiveness of different ensemble methods on benchmark data sets [10,11,49,188]. The interest in this research area is motivated also by the availability of very fast computers and networks of workstations at a relatively low cost that allow the implementation and the experimentation of complex ensemble methods using off-the-shelf computer platforms. However, as explained in Section 26.2, there are deeper reasons to use ensembles of learning machines, motivated by the intrinsic characteristics of the ensemble methods. The main aim of this chapter is to introduce ensemble methods and to provide an overview and a bibliography of the main areas of research, without pretending to be exhaustive or to explain the detailed characteristics of each ensemble method. The paper is organized as follows. In the next section, the main theoretical and practical reasons for combining multiple learners are introduced. Section 26.3 depicts the main taxonomies of ensemble methods proposed in the literature. In Sections 26.4 and 26.5, we present an overview of the main supervised ensemble methods reported in the literature, adopting a simple taxonomy originally proposed in Ref. [201]. Applications of ensemble methods are only marginally considered, but a specific section on some relevant applications of ensemble methods in astronomy and astrophysics has been added (Section 26.6). The conclusion (Section 26.7) ends this paper and lists some issues not covered in this work.
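
    A minimal sketch of the majority-vote ensemble mentioned above, using scikit-learn's VotingClassifier on synthetic data; the particular base learners are an arbitrary choice for illustration.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import VotingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=300, random_state=0)
      members = [("lr", LogisticRegression(max_iter=1000)),
                 ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
                 ("nb", GaussianNB())]
      # the class that receives the majority of the members' votes wins
      ensemble = VotingClassifier(members, voting="hard")

      for name, m in members + [("ensemble", ensemble)]:
          print(name, cross_val_score(m, X, y, cv=5).mean().round(3))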

  19. Simulation-driven machine learning: Bearing fault classification

    NASA Astrophysics Data System (ADS)

    Sobie, Cameron; Freitas, Carina; Nicolai, Mike

    2018-01-01

    Increasing the accuracy of mechanical fault detection has the potential to improve system safety and economic performance by minimizing scheduled maintenance and the probability of unexpected system failure. Advances in computational performance have enabled the application of machine learning algorithms across numerous applications including condition monitoring and failure detection. Past applications of machine learning to physical failure have relied explicitly on historical data, which limits the feasibility of this approach to in-service components with extended service histories. Furthermore, recorded failure data is often only valid for the specific circumstances and components for which it was collected. This work directly addresses these challenges for roller bearings with race faults by generating training data using information gained from high resolution simulations of roller bearing dynamics, which is used to train machine learning algorithms that are then validated against four experimental datasets. Several different machine learning methodologies are compared, starting from well-established statistical feature-based methods to convolutional neural networks, and a novel application of dynamic time warping (DTW) to bearing fault classification is proposed as a robust, parameter-free method for race fault detection.
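
    Since the abstract proposes dynamic time warping as a parameter-free classifier, the sketch below shows the classic DTW recurrence used for nearest-template classification of a toy vibration signal. The signals and the impulsive "fault" signature are invented stand-ins, not the paper's simulated bearing data.

      import numpy as np

      def dtw_distance(a, b):
          """Classic O(len(a) * len(b)) dynamic time warping distance."""
          n, m = len(a), len(b)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = abs(a[i - 1] - b[j - 1])
                  D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
          return D[n, m]

      t = np.linspace(0, 1, 200)
      healthy = np.sin(2 * np.pi * 5 * t)                # smooth running signature
      faulty = healthy + np.sin(2 * np.pi * 40 * t) * (np.sin(2 * np.pi * 5 * t) > 0.9)
      query = faulty + 0.1 * np.random.default_rng(3).normal(size=t.size)

      label = min([("healthy", healthy), ("fault", faulty)],
                  key=lambda ref: dtw_distance(query, ref[1]))[0]
      print("nearest template:", label)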

  20. New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases

    NASA Astrophysics Data System (ADS)

    Brescia, Massimo

    2012-11-01

    Data mining, or Knowledge Discovery in Databases (KDD), while being the main methodology to extract the scientific information contained in Massive Data Sets (MDS), needs to tackle crucial problems, since it has to orchestrate complex challenges posed by transparent access to different computing environments, scalability of algorithms, and reusability of resources. To achieve a leap forward for the progress of e-science in the data avalanche era, the community needs to implement an infrastructure capable of performing data access, processing and mining in a distributed but integrated context. The increasing complexity of modern technologies has led to a huge production of data, whose warehouse management and the related need to optimize analysis and mining procedures have changed how modern science is conducted. Classical data exploration, based on local user-owned data storage and limited computing infrastructures, is no longer efficient in the case of MDS that are spread worldwide over inhomogeneous data centres and require teraflop processing power. In this context, modern experimental and observational science requires a good understanding of computer science, network infrastructures, Data Mining, etc., i.e., of all those techniques which fall into the domain of the so-called e-science (recently assessed also by the Fourth Paradigm of Science). Such understanding is almost completely absent in the older generations of scientists, and this is reflected in the inadequacy of most academic and research programs. A paradigm shift is needed: statistical pattern recognition, object-oriented programming, distributed computing and parallel programming need to become an essential part of the scientific background. A possible practical solution is to provide the research community with easy-to-understand, easy-to-use tools, based on Web 2.0 technologies and Machine Learning methodology: tools where almost all the complexity is hidden from the final user, but which are still flexible and able to produce efficient and reliable scientific results. All these considerations are described in detail in the chapter. Moreover, examples of modern applications offering to a wide variety of e-science communities a large spectrum of computational facilities to exploit the wealth of available massive data sets and powerful machine learning and statistical algorithms are also introduced.

  1. Machine learning to analyze images of shocked materials for precise and accurate measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dresselhaus-Cooper, Leora; Howard, Marylesa; Hock, Margaret C.

    A supervised machine learning algorithm, called locally adaptive discriminant analysis (LADA), has been developed to locate boundaries between identifiable image features that have varying intensities. LADA is an adaptation of image segmentation, which includes techniques that find the positions of image features (classes) using statistical intensity distributions for each class in the image. In order to place a pixel in the proper class, LADA considers the intensity at that pixel and the distribution of intensities in local (nearby) pixels. This paper presents the use of LADA to provide, with statistical uncertainties, the positions and shapes of features within ultrafast images of shock waves. We demonstrate the ability to locate image features including crystals, density changes associated with shock waves, and material jetting caused by shock waves. This algorithm can analyze images that exhibit a wide range of physical phenomena because it does not rely on comparison to a model. LADA enables analysis of images from shock physics with statistical rigor independent of underlying models or simulations.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is converted to its vector representation and the SVM generates a single statistical score that is then used to classify its presence or absence in the sample.

  3. A survey of machine readable data bases

    NASA Technical Reports Server (NTRS)

    Matlock, P.

    1981-01-01

    Forty-two of the machine readable data bases available to the technologist and researcher in the natural sciences and engineering are described and compared with the data bases and data base services offered by NASA.

  4. Health Promotion and Healthier Products Increase Vending Purchases: A Randomized Factorial Trial.

    PubMed

    Hua, Sophia V; Kimmel, Lisa; Van Emmenes, Michael; Taherian, Rafi; Remer, Geraldine; Millman, Adam; Ickovics, Jeannette R

    2017-07-01

    The current food environment has a high prevalence of nutrient-sparse foods and beverages, most starkly seen in vending machine offerings. There are currently few studies that explore different interventions that might lead to healthier vending machine purchases. Our objective was to examine how healthier product availability, price reductions, and/or promotional signs affect sales and revenue of snack and beverage vending machines. A 2×2×2 factorial randomized controlled trial was conducted among students, staff, and employees on a university campus. All co-located snack and beverage vending machines (n=56, 28 snack and 28 beverage) were randomized into one of eight conditions: availability of healthier products and/or a 25% price reduction for healthier items and/or promotional signs on machines. Aggregate sales and revenue data for the 5-month study period (February to June 2015) were compared with data from the same months 1 year prior; the outcome measure was the change in units sold and revenue between the two periods. Linear regression models (main effects and interaction effects) and t test analyses were performed in July 2015. The interaction between healthier product guidelines and promotional signs in snack vending machines documented increased revenue (P<0.05). Beverage machines randomized to meet healthier product guidelines documented increased units sold (P<0.05) with no revenue change. Price reductions alone had no effect, nor were there any effects for the three-way interaction of the factors. Examining top-selling products for all vending machines combined, pre- to postintervention, we found an overall shift to healthier purchasing. When healthier vending snacks are available, promotional signs are also important to ensure consumers purchase those items in greater amounts. Mitigating potential loss in profits is essential for sustainability of a healthier food environment.

  5. Statistical quality control for volumetric modulated arc therapy (VMAT) delivery by using the machine's log data

    NASA Astrophysics Data System (ADS)

    Cheong, Kwang-Ho; Lee, Me-Yeon; Kang, Sei-Kwon; Yoon, Jai-Woong; Park, Soah; Hwang, Taejin; Kim, Haeyoung; Kim, Kyoung Ju; Han, Tae Jin; Bae, Hoonsik

    2015-07-01

    The aim of this study is to set up statistical quality control for monitoring volumetric modulated arc therapy (VMAT) delivery error by using the machine's log data. Eclipse and a Clinac iX linac with the RapidArc system (Varian Medical Systems, Palo Alto, USA) were used for delivery of the VMAT plans. During the delivery of the RapidArc fields, the machine records the delivered monitor units (MUs) and the gantry angle's position accuracy, and the standard deviations of the MU (σMU: dosimetric error) and the gantry angle (σGA: geometric error) are displayed on the console monitor after completion of the RapidArc delivery. In the present study, the log data were first analyzed to confirm their validity and usability; then, statistical process control (SPC) was applied to monitor the σMU and the σGA in a timely manner for all RapidArc fields: a total of 195 arc fields for 99 patients. The MU and the GA were determined twice for all fields, that is, first during the patient-specific plan QA and then again during the first treatment. The σMU and the σGA time series were quite stable irrespective of the treatment site; however, the σGA strongly depended on the gantry's rotation speed. The σGA of the RapidArc delivery for stereotactic body radiation therapy (SBRT) was smaller than that for typical VMAT. Therefore, SPC was applied to SBRT cases and general cases separately. Moreover, the accuracy of the potentiometer of the gantry rotation is important because the σGA can change dramatically depending on its condition. By applying SPC to the σMU and σGA, we could monitor the delivery error efficiently. However, the upper and lower limits of SPC need to be determined carefully with full knowledge of the machine and log data.
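
    As a sketch of the kind of control charting involved, the snippet below computes Shewhart individuals-chart limits (mean plus or minus 2.66 times the mean moving range) for a series of per-field σMU values and flags out-of-control fields. The data are synthetic and the specific chart rule is an assumption; the paper itself cautions that the limits must be chosen with knowledge of the machine and log data.

      import numpy as np

      def individuals_control_limits(x):
          """Shewhart I-chart limits: mean +/- 2.66 * mean moving range."""
          mr = np.abs(np.diff(x))            # moving range of successive fields
          center = x.mean()
          half_width = 2.66 * mr.mean()      # 2.66 = 3 / d2, with d2 = 1.128 for n = 2
          return center - half_width, center, center + half_width

      rng = np.random.default_rng(4)
      sigma_mu = rng.normal(0.05, 0.005, size=195)   # made-up per-field sigma_MU values
      lcl, cl, ucl = individuals_control_limits(sigma_mu)
      flagged = np.flatnonzero((sigma_mu < lcl) | (sigma_mu > ucl))
      print(f"CL = {cl:.4f}, limits = ({lcl:.4f}, {ucl:.4f}), flagged fields: {flagged}")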

  6. Uranium hydrogeochemical and stream sediment reconnaissance of the Philip Smith Mountains NTMS quadrangle, Alaska

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1981-09-01

    Results of a hydrogeochemical and stream sediment reconnaissance of the Philip Smith Mountains NTMS quadrangle, Alaska are presented. In addition to this abbreviated data release, more complete data are available to the public in machine-readable form. In this data release are location data, field analyses, and laboratory analyses of several different sample media. For the sake of brevity, many field site observations have not been included in this volume. These data are, however, available on the magnetic tape. Appendices A and B describe the sample media and summarize the analytical results for each medium. The data were subsetted by one of the Los Alamos National Laboratory (LANL) sorting programs into groups of stream sediment and lake sediment samples. For each group which contains a sufficient number of observations, statistical tables, tables of raw data, and 1:1000000 scale maps of pertinent elements have been included in this report.

  7. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
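
    To make the base-learner and loss-function structure concrete, here is a sketch of one classical algorithm of this family, component-wise L2 boosting, which performs the automated variable selection mentioned above by repeatedly refitting simple least-squares base-learners to the residuals; the data and tuning values are illustrative.

      import numpy as np

      rng = np.random.default_rng(5)
      n, p = 200, 20
      X = rng.normal(size=(n, p))
      y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

      beta = np.zeros(p)
      nu, n_steps = 0.1, 250                    # step length and number of iterations
      for _ in range(n_steps):
          r = y - X @ beta                      # negative gradient of the L2 loss
          coefs = (X * r[:, None]).sum(0) / (X ** 2).sum(0)   # univariate LS fits
          sse = ((r[:, None] - X * coefs) ** 2).sum(0)
          j = sse.argmin()                      # the best-fitting base-learner ...
          beta[j] += nu * coefs[j]              # ... is the only one updated

      print("selected variables:", np.flatnonzero(np.abs(beta) > 0.05))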

  8. [Hygienic assessment of student's nutrition through vending machines (fast food)].

    PubMed

    Karelin, A O; Pavlova, D V; Babalyan, A V

    2015-01-01

    The article presents the results of a study of student nutrition through vending machines (fast food), taking into account the consumer priorities of medical university students and the features and possible consequences of vending machine use. The object of the study was the assortment of products sold through vending machines at the First Saint-Petersburg Medical University. Net calories, content of proteins, fats and carbohydrates, glycemic index, and glycemic load were determined for each product. Information about vending machine use was obtained from second- and fourth-year students of the medical and dental faculties by means of a standardized interview questionnaire. Most of the products sold through the vending machines were found to have a high energy value, mainly due to refined carbohydrates, and were characterized by medium to high glycemic load and low protein content. Most of the students (87.3%) buy products from the vending machines, mainly because of a lack of time to visit the canteen and buffets. Only 4.2% of students were satisfied with the assortment of the vending machines. More than 50% of students reported gastrointestinal complaints. A statistically significant relationship was found between time of study at the university and gastrointestinal morbidity, as well as the number of students needing a medical diet. Students who need a medical diet use fast food significantly more often than those who do not (46.6% vs 37.7%).

  9. A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.

    PubMed

    Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo

    2018-03-15

    RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible to the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes execution of RAId easy but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing retrieval versus the proportion of false discoveries. The results viewer displays the analysis results and allows users to download them. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html.

  10. Applied learning-based color tone mapping for face recognition in video surveillance system

    NASA Astrophysics Data System (ADS)

    Yew, Chuu Tian; Suandi, Shahrel Azmin

    2012-04-01

    In this paper, we present an applied learning-based color tone mapping technique for video surveillance systems. This technique can be applied to both color and grayscale surveillance images. The basic idea is to learn the color or intensity statistics from a training dataset of photorealistic images of the candidates appearing in the surveillance images, and to remap the color or intensity of the input image so that its color or intensity statistics match those in the training dataset. It is well known that differences among commercial surveillance camera models and the signal processing chipsets used by different manufacturers cause the color and intensity of the images to differ from one another, thus creating additional challenges for face recognition in video surveillance systems. Using Multi-Class Support Vector Machines as the classifier on a publicly available video surveillance camera database, namely the SCface database, this approach is validated and compared to the results of using a holistic approach on grayscale images. The results show that this technique is suitable for improving the color or intensity quality of video surveillance images for face recognition.
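
    One simple instance of the statistics-remapping idea described above is to shift and scale each channel so that its mean and standard deviation match values learned from a training set of photorealistic images. The sketch below uses made-up reference statistics and a random stand-in frame; it illustrates the principle, not the paper's exact mapping.

      import numpy as np

      def match_channel_stats(src, ref_mean, ref_std):
          """Remap one channel so its mean/std match the learned statistics."""
          out = (src - src.mean()) / (src.std() + 1e-8) * ref_std + ref_mean
          return np.clip(out, 0, 255)

      rng = np.random.default_rng(6)
      frame = rng.normal(90, 20, size=(64, 64, 3))   # dull, dark surveillance frame
      ref_stats = [(128.0, 45.0)] * 3                # per-channel stats learned offline

      remapped = np.stack([match_channel_stats(frame[..., c], m, s)
                           for c, (m, s) in enumerate(ref_stats)], axis=-1)
      print(remapped[..., 0].mean().round(1), remapped[..., 0].std().round(1))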

  11. Finite element computation on nearest neighbor connected machines

    NASA Technical Reports Server (NTRS)

    Mcaulay, A. D.

    1984-01-01

    Research aimed at faster, more cost-effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest-neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64-bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single-variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
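
    The report's cyclic-reduction variant is not reproduced here, but the sketch below shows plain red-black successive overrelaxation on a 2D grid: each color's update reads only nearest-neighbor values, which is exactly the communication pattern that suits nearest-neighbor connected processor arrays. The grid size and relaxation factor are arbitrary.

      import numpy as np

      def red_black_sor(u, f, h, omega=1.8, sweeps=500):
          """Red-black SOR for the 2D Poisson equation laplace(u) = f."""
          for _ in range(sweeps):
              for color in (0, 1):               # all points of one color update together
                  for i in range(1, u.shape[0] - 1):
                      for j in range(1, u.shape[1] - 1):
                          if (i + j) % 2 != color:
                              continue
                          gs = 0.25 * (u[i - 1, j] + u[i + 1, j] +
                                       u[i, j - 1] + u[i, j + 1] - h * h * f[i, j])
                          u[i, j] += omega * (gs - u[i, j])
          return u

      n = 20
      u = np.zeros((n, n)); u[0, :] = 1.0        # fixed (Dirichlet) boundary values
      u = red_black_sor(u, np.zeros((n, n)), h=1.0 / (n - 1))
      print("interior max:", u[1:-1, 1:-1].max().round(3))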

  12. Virtual Mission Operations of Remote Sensors With Rapid Access To and From Space

    NASA Technical Reports Server (NTRS)

    Ivancic, William D.; Stewart, Dave; Walke, Jon; Dikeman, Larry; Sage, Steven; Miller, Eric; Northam, James; Jackson, Chris; Taylor, John; Lynch, Scott

    2010-01-01

    This paper describes network-centric operations, where a virtual mission operations center autonomously receives sensor triggers, and schedules space and ground assets using Internet-based technologies and service-oriented architectures. For proof-of-concept purposes, sensor triggers are received from the United States Geological Survey (USGS) to determine targets for space-based sensors. The Surrey Satellite Technology Limited (SSTL) Disaster Monitoring Constellation satellite, the United Kingdom Disaster Monitoring Constellation (UK-DMC), is used as the space-based sensor. The UK-DMC's availability is determined via machine-to-machine communications using SSTL's mission planning system. Access to/from the UK-DMC for tasking and sensor data is via SSTL's and Universal Space Network's (USN) ground assets. The availability and scheduling of USN's assets can also be performed autonomously via machine-to-machine communications. All communication, both on the ground and between ground and space, uses open Internet standards.

  13. The Availability of Competitive Foods and Beverages to Middle School Students in Appalachian Virginia Before Implementation of the 2014 Smart Snacks in School Standards.

    PubMed

    Mann, Georgianna; Kraak, Vivica; Serrano, Elena

    2015-09-17

    The study objective was to examine the nutritional quality of competitive foods and beverages (foods and beverages from vending machines and à la carte foods) available to rural middle school students, before implementation of the US Department of Agriculture's Smart Snacks in School standards in July 2014. In spring 2014, we audited vending machines and à la carte cafeteria foods and beverages in 8 rural Appalachian middle schools in Virginia. Few schools had vending machines. Few à la carte and vending machine foods met Smart Snacks in School standards (36.5%); however, most beverages did (78.2%). The major challenges to meeting standards were fat and sodium content of foods. Most competitive foods (62.2%) did not meet new standards, and rural schools with limited resources will likely require assistance to fully comply.

  14. Statistical properties of two sine waves in Gaussian noise.

    NASA Technical Reports Server (NTRS)

    Esposito, R.; Wilson, L. R.

    1973-01-01

    A detailed study is presented of some statistical properties of a stochastic process that consists of the sum of two sine waves of unknown relative phase and a normal process. Since none of the statistics investigated seem to yield a closed-form expression, all the derivations are cast in a form that is particularly suitable for machine computation. Specifically, results are presented for the probability density function (pdf) of the envelope and of the instantaneous value, the moments of these distributions, and the corresponding cumulative distribution function (cdf).
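
    Because the derivations are cast for machine computation, a quick Monte Carlo check is a natural companion. The sketch below estimates the envelope statistics of two phasors with unknown (uniform) relative phase in complex Gaussian noise, which is the baseband view of the process described above; the amplitudes and noise level are arbitrary choices.

      import numpy as np

      rng = np.random.default_rng(7)
      A1, A2, sigma = 1.0, 0.7, 0.5     # sine-wave amplitudes, per-quadrature noise std
      N = 200_000

      theta = rng.uniform(0, 2 * np.pi, N)                     # unknown relative phase
      noise = rng.normal(0, sigma, N) + 1j * rng.normal(0, sigma, N)
      envelope = np.abs(A1 + A2 * np.exp(1j * theta) + noise)  # envelope of the sum

      hist, edges = np.histogram(envelope, bins=60, density=True)  # empirical pdf
      print(f"mean = {envelope.mean():.3f}, variance = {envelope.var():.3f}, "
            f"mode near {edges[hist.argmax()]:.2f}")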

  15. The Southampton-York Natural Scenes (SYNS) dataset: Statistics of surface attitude

    PubMed Central

    Adams, Wendy J.; Elder, James H.; Graf, Erich W.; Leyland, Julian; Lugtigheid, Arthur J.; Muryy, Alexander

    2016-01-01

    Recovering 3D scenes from 2D images is an under-constrained task; optimal estimation depends upon knowledge of the underlying scene statistics. Here we introduce the Southampton-York Natural Scenes dataset (SYNS: https://syns.soton.ac.uk), which provides comprehensive scene statistics useful for understanding biological vision and for improving machine vision systems. In order to capture the diversity of environments that humans encounter, scenes were surveyed at random locations within 25 indoor and outdoor categories. Each survey includes (i) spherical LiDAR range data, (ii) high-dynamic-range spherical imagery, and (iii) a panorama of stereo image pairs. We envisage many uses for the dataset and present one example: an analysis of surface attitude statistics, conditioned on scene category and viewing elevation. Surface normals were estimated using a novel adaptive scale selection algorithm. Across categories, surface attitude below the horizon is dominated by the ground plane (0° tilt). Near the horizon, probability density is elevated at 90°/270° tilt due to vertical surfaces (trees, walls). Above the horizon, probability density is elevated near 0° slant due to overhead structure such as ceilings and leaf canopies. These structural regularities represent potentially useful prior assumptions for human and machine observers, and may predict human biases in perceived surface attitude. PMID:27782103

  16. Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction.

    PubMed

    Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel

    2010-01-15

    With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that, due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data-driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps, as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.

  17. Supervised learning methods for pathological arterial pulse wave differentiation: A SVM and neural networks approach.

    PubMed

    Paiva, Joana S; Cardoso, João; Pereira, Tânia

    2018-01-01

    The main goal of this study was to develop an automatic method, based on supervised learning, able to distinguish healthy from pathologic arterial pulse waves (APW), and both of those from noisy waveforms (non-relevant segments of the signal), using data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed of signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39 pulse features: morphologic features, time-domain statistics, cross-correlation features, and wavelet features. The multiclass Support Vector Machine Recursive Feature Elimination (SVM-RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significantly better performance for this problem, with an average accuracy of 0.9917±0.0024 and an F-measure of 0.9925±0.0019, in comparison with ANN, which reached values of 0.9847±0.0032 and 0.9852±0.0031 for accuracy and F-measure, respectively. A significant difference was observed between the performances obtained with the SVM classifier using different numbers of features from the original set. The comparison between SVM and ANN allowed us to reassert the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate these three important signal outcomes (healthy, pathologic, and noise) and to reduce the bias associated with clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.
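
    A minimal sketch of the SVM-RFE selection step, assuming a 213×39 feature matrix like the one described; the data below are synthetic and scikit-learn stands in for the authors' implementation.

        import numpy as np
        from sklearn.feature_selection import RFE
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)
        X = rng.random((213, 39))      # 213 subjects x 39 pulse features
        y = rng.integers(0, 3, 213)    # 0 = healthy, 1 = pathologic, 2 = noise

        # RFE with a linear SVM removes the weakest feature at each step.
        selector = RFE(SVC(kernel="linear"), n_features_to_select=10, step=1)
        selector.fit(X, y)
        print(np.flatnonzero(selector.support_))  # indices of the retained features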

  18. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, S. George

    1994-01-01

    We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of a package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications, and has produced real, published results.

  19. Molecular classification of liver cirrhosis in a rat model by proteomics and bioinformatics.

    PubMed

    Xu, Xiu-Qin; Leow, Chon K; Lu, Xin; Zhang, Xuegong; Liu, Jun S; Wong, Wing-Hung; Asperger, Arndt; Deininger, Sören; Eastwood Leung, Hon-Chiu

    2004-10-01

    Liver cirrhosis is a worldwide health problem. Reliable, noninvasive methods for early detection of liver cirrhosis are not available. Using a three-step approach, we classified sera from rats with liver cirrhosis following different treatment insults. The approach consisted of: (i) protein profiling using surface-enhanced laser desorption/ionization (SELDI) technology; (ii) selection of a statistically significant serum biomarker set using machine learning algorithms; and (iii) identification of selected serum biomarkers by peptide sequencing. We generated serum protein profiles from three groups of rats: (i) normal (n=8), (ii) thioacetamide-induced liver cirrhosis (n=22), and (iii) bile duct ligation-induced liver fibrosis (n=5) using a weak cation exchanger surface. Profiling data were further analyzed by a recursive support vector machine algorithm to select a panel of statistically significant biomarkers for class prediction. Sensitivity and specificity of classification using the selected protein marker set were higher than 92%. A consistently down-regulated 3495 Da protein in cirrhosis samples was one of the selected significant biomarkers. This 3495 Da protein was purified on-chip and trypsin digested. Further structural characterization of this biomarker candidate was done by using cross-platform matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) peptide mass fingerprinting (PMF) and matrix-assisted laser desorption/ionization time of flight/time of flight (MALDI-TOF/TOF) tandem mass spectrometry (MS/MS). Combined data from PMF and MS/MS spectra of two tryptic peptides suggested that this 3495 Da protein shared homology with a histidine-rich glycoprotein. These results demonstrated a novel approach to the discovery of new biomarkers for early detection of liver cirrhosis and classification of liver diseases.

  20. Bridge Health Monitoring Using a Machine Learning Strategy

    DOT National Transportation Integrated Search

    2017-01-01

    The goal of this project was to cast the SHM problem within a statistical pattern recognition framework. Techniques borrowed from speaker recognition, particularly speaker verification, were used as this discipline deals with problems very similar to...

  1. Investigating output and energy variations and their relationship to delivery QA results using Statistical Process Control for helical tomotherapy.

    PubMed

    Binny, Diana; Mezzenga, Emilio; Lancaster, Craig M; Trapp, Jamie V; Kairn, Tanya; Crowe, Scott B

    2017-06-01

    The aims of this study were to investigate machine beam parameters using the TomoTherapy quality assurance (TQA) tool, establish a correlation to patient delivery quality assurance results, and evaluate the relationship between energy variations detected using different TQA modules. TQA daily measurement results from two treatment machines for periods of up to 4 years were acquired. Analyses of beam quality and of helical and static output variations were made. Variations from the planned dose were also analysed using the Statistical Process Control (SPC) technique, and their relationship to output trends was studied. Energy variations appeared to be one of the contributing factors to the delivery output dose variations seen in the analysis. Ion chamber measurements were reliable indicators of energy and output variations and were linear with patient dose verifications. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
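
    For readers unfamiliar with SPC, a minimal individuals-chart sketch is given below; it assumes daily output measurements as a 1-D array and is not the authors' analysis.

        import numpy as np

        def control_limits(x):
            """Individuals chart: mean +/- 3 sigma, sigma from the average moving range."""
            x = np.asarray(x, dtype=float)
            sigma = np.abs(np.diff(x)).mean() / 1.128   # d2 constant for subgroups of 2
            return x.mean() - 3 * sigma, x.mean() + 3 * sigma

        daily_output = np.array([1.00, 1.01, 0.99, 1.02, 0.97, 1.03, 1.00])  # made-up data
        lcl, ucl = control_limits(daily_output)
        print((daily_output < lcl) | (daily_output > ucl))  # flags out-of-control days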

  2. Multi-fidelity machine learning models for accurate bandgap predictions of solids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pilania, Ghanshyam; Gubernatis, James E.; Lookman, Turab

    Here, we present a multi-fidelity co-kriging statistical learning framework that combines variable-fidelity quantum mechanical calculations of bandgaps to generate a machine-learned model that enables low-cost accurate predictions of the bandgaps at the highest fidelity level. Additionally, the adopted Gaussian process regression formulation allows us to predict the underlying uncertainties as a measure of our confidence in the predictions. Using a set of 600 elpasolite compounds as an example dataset, and using semi-local and hybrid exchange correlation functionals within density functional theory as two levels of fidelity, we demonstrate the excellent learning performance of the method against actual high-fidelity quantum mechanical calculations of the bandgaps. The presented statistical learning method is not restricted to bandgaps or electronic structure methods, and extends the utility of high-throughput property predictions in a significant way.

  3. Multi-fidelity machine learning models for accurate bandgap predictions of solids

    DOE PAGES

    Pilania, Ghanshyam; Gubernatis, James E.; Lookman, Turab

    2016-12-28

    Here, we present a multi-fidelity co-kriging statistical learning framework that combines variable-fidelity quantum mechanical calculations of bandgaps to generate a machine-learned model that enables low-cost accurate predictions of the bandgaps at the highest fidelity level. Additionally, the adopted Gaussian process regression formulation allows us to predict the underlying uncertainties as a measure of our confidence in the predictions. Using a set of 600 elpasolite compounds as an example dataset, and using semi-local and hybrid exchange correlation functionals within density functional theory as two levels of fidelity, we demonstrate the excellent learning performance of the method against actual high-fidelity quantum mechanical calculations of the bandgaps. The presented statistical learning method is not restricted to bandgaps or electronic structure methods, and extends the utility of high-throughput property predictions in a significant way.
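
    The sketch below is not the paper's co-kriging model, but a common two-level shortcut with the same flavour: fit a Gaussian process to low-fidelity bandgaps, then a second one to the high-minus-low residuals. All data here are synthetic.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.default_rng(2)
        X = rng.random((600, 8))                   # descriptor vectors (illustrative)
        y_low = X.sum(axis=1) + 0.1 * rng.standard_normal(600)   # cheap functional
        y_high = y_low + 0.3 * np.sin(6 * X[:, 0])                # expensive functional

        gp_low = GaussianProcessRegressor(RBF()).fit(X, y_low)
        # only 100 compounds have the expensive calculation in this toy setup
        gp_delta = GaussianProcessRegressor(RBF()).fit(X[:100], (y_high - y_low)[:100])

        mu, std = gp_delta.predict(X, return_std=True)
        y_pred = gp_low.predict(X) + mu            # high-fidelity estimate
        print(std.mean())                          # the GP's own uncertainty measure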

  4. Saving all the bits

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1990-01-01

    The scientific tradition of saving all the data from experiments for independent validation and for further investigation is under profound challenge by modern satellite data collectors and by supercomputers. The volume of data is beyond our capacity to store, transmit, and comprehend. A promising line of study is discovery machines that study the data at the collection site and transmit statistical summaries of the patterns observed. Examples of discovery machines are the Autoclass system and the genetic memory system of NASA-Ames, and the proposal for knowbots by Kahn and Cerf.

  5. Time for paradigmatic substitution in psychology. What are the alternatives?

    PubMed

    Kolstad, Arnulf

    2010-03-01

    This article focuses on the "machine paradigm" in psychology and its consequences for (mis)understanding of human beings. It discusses causes of the mainstream epistemology in Western societies, referring to philosophical traditions, the prestige of some natural sciences and mathematical statistics. It emphasizes how the higher psychological functions develop dialectically from a biological basis and how the brain due to its plasticity changes with mental and physical activity. This makes a causal machine paradigm unfit to describe and explain human psychology and human development. Some concepts for an alternative paradigm are suggested.

  6. Epidermis area detection for immunofluorescence microscopy

    NASA Astrophysics Data System (ADS)

    Dovganich, Andrey; Krylov, Andrey; Nasonov, Andrey; Makhneva, Natalia

    2018-04-01

    We propose a novel image segmentation method for immunofluorescence microscopy images of skin tissue for the diagnosis of various skin diseases. The segmentation is based on machine learning algorithms. The feature vector comprises three groups of features: statistical features, Laws' texture energy measures, and local binary patterns. The images are preprocessed to improve learning. Several machine learning algorithms were evaluated, and the best results were obtained with the random forest algorithm. We use the proposed method to detect the epidermis region as part of a pemphigus diagnosis system.
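
    A toy version of two of the three feature groups (simple statistics and local binary patterns; Laws' measures omitted), assuming small greyscale patches, is sketched below with scikit-image and scikit-learn.

        import numpy as np
        from skimage.feature import local_binary_pattern
        from sklearn.ensemble import RandomForestClassifier

        def patch_features(patch, P=8, R=1.0):
            lbp = local_binary_pattern(patch, P, R, method="uniform")
            hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
            return np.concatenate([[patch.mean(), patch.std()], hist])

        rng = np.random.default_rng(3)
        patches = rng.integers(0, 256, (200, 32, 32)).astype(np.uint8)  # stand-ins
        labels = rng.integers(0, 2, 200)           # 1 = epidermis, 0 = background
        X = np.array([patch_features(p) for p in patches])
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)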

  7. Developing Human-Machine Interfaces to Support Appropriate Trust and Reliance on Automated Combat Identification Systems (Developpement d’Interfaces Homme-Machine Pour Appuyer la Confiance dans les Systemes Automatises d’Identification au Combat)

    DTIC Science & Technology

    2008-03-31

    on automation; the ‘response bias’ approach. This new approach is based on Signal Detection Theory (SDT) (Macmillan & Creelman, 1991; Wickens...SDT), response bias will vary with the expectation of the target probability, whereas their sensitivity will stay constant (Macmillan & Creelman...measures, C has the simplest statistical properties (Macmillan & Creelman, 1991, p. 273), and it was also the measure used in Dzindolet et al.’s study

  8. The upgraded Large Plasma Device, a machine for studying frontier basic plasma physics.

    PubMed

    Gekelman, W; Pribyl, P; Lucky, Z; Drandell, M; Leneman, D; Maggs, J; Vincena, S; Van Compernolle, B; Tripathi, S K P; Morales, G; Carter, T A; Wang, Y; DeHaas, T

    2016-02-01

    In 1991 a manuscript describing an instrument for studying magnetized plasmas was published in this journal. The Large Plasma Device (LAPD) was upgraded in 2001 and has become a national user facility for the study of basic plasma physics. The upgrade, as well as the diagnostics introduced since then, has significantly changed the capabilities of the device. All references to the machine still quote the original RSI paper, which is no longer appropriate. In this work, the properties of the updated LAPD are presented. The strategy of the machine construction, the available diagnostics, the parameters available for experiments, as well as illustrations of several experiments are presented here.

  9. Development and validation of a machine learning algorithm and hybrid system to predict the need for life-saving interventions in trauma patients.

    PubMed

    Liu, Nehemiah T; Holcomb, John B; Wade, Charles E; Batchinsky, Andriy I; Cancio, Leopoldo C; Darrah, Mark I; Salinas, José

    2014-02-01

    Accurate and effective diagnosis of actual injury severity can be problematic in trauma patients. Inherent physiologic compensatory mechanisms may prevent accurate diagnosis and mask true severity in many circumstances. The objective of this project was the development and validation of a multiparameter machine learning algorithm and system capable of predicting the need for life-saving interventions (LSIs) in trauma patients. Statistics based on means, slopes, and maxima of various vital sign measurements corresponding to 79 trauma patient records generated over 110,000 feature sets, which were used to develop, train, and implement the system. Comparisons among several machine learning models proved that a multilayer perceptron would best implement the algorithm in a hybrid system consisting of a machine learning component and basic detection rules. Additionally, 295,994 feature sets from 82 h of trauma patient data showed that the system can obtain 89.8% accuracy within 5 min of recorded LSIs. Use of machine learning technologies combined with basic detection rules provides a potential approach for accurately assessing the need for LSIs in trauma patients. The performance of this system demonstrates that machine learning technology can be implemented in a real-time fashion and potentially used in a critical care environment.
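
    A conceptual sketch of the hybrid idea follows: a multilayer perceptron scores vital-sign feature sets and a basic detection rule can override it. The features, threshold, and data are invented for illustration, not taken from the study.

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(4)
        X = rng.random((5000, 6))      # e.g. means/slopes/maxima of HR, BP, SpO2
        y = rng.integers(0, 2, 5000)   # 1 = life-saving intervention needed

        mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

        def predict_lsi(features, systolic_bp):
            if systolic_bp < 70:       # hypothetical basic detection rule wins outright
                return 1
            return int(mlp.predict(features.reshape(1, -1))[0])

        print(predict_lsi(X[0], systolic_bp=115))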

  10. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges

    PubMed Central

    Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.

    2017-01-01

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black-box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predict mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868

  11. Accuracy of tracking forest machines with GPS

    Treesearch

    M.W. Veal; S.E. Taylor; T.P. McDonald; D.K. McLemore; M.R. Dunn

    2001-01-01

    This paper describes the results of a study that measured the accuracy of using GPS to track movement of forest machines. Two different commercially available GPS receivers (Trimble ProXR and GeoExplorer II) were used to track...

  12. ENERGY STAR Certified Vending Machines

    EPA Pesticide Factsheets

    Certified models meet all ENERGY STAR requirements as listed in the Version 3.0 ENERGY STAR Program Requirements for Refrigerated Beverage Vending Machines that are effective as of March 1, 2013. A detailed listing of key efficiency criteria is available at

  13. Effect of cutting parameters on surface finish and machinability of graphite reinforced Al-8011 matrix composite

    NASA Astrophysics Data System (ADS)

    Anil, K. C.; Vikas, M. G.; Shanmukha Teja, B.; Sreenivas Rao, K. V.

    2017-04-01

    Many materials, such as alloys and composites, find their applications on the basis of machinability, cost, and availability. In the present work, graphite (Grp) reinforced Aluminium 8011 was synthesized by the conventional stir casting process, and the surface finish and machinability of the prepared composite were examined using a lathe tool dynamometer attached to a BANKA lathe, varying the machining parameters (spindle speed, depth of cut, and feed rate) over three levels. The roughness average (Ra) of the machined surfaces was also measured using a surface roughness tester (Mitutoyo SJ201). The studies make clear that the mechanical properties of the composite increase with the addition of Grp, and that the cutting forces decrease with reinforcement percentage, thus increasing the machinability of the composite and improving the surface finish.

  14. Statistical analysis and machine learning algorithms for optical biopsy

    NASA Astrophysics Data System (ADS)

    Wu, Binlin; Liu, Cheng-hui; Boydston-White, Susie; Beckman, Hugh; Sriramoju, Vidyasagar; Sordillo, Laura; Zhang, Chunyuan; Zhang, Lin; Shi, Lingyan; Smith, Jason; Bailin, Jacob; Alfano, Robert R.

    2018-02-01

    Analyzing spectral or imaging data collected with various optical biopsy methods is often difficult due to the complexity of the underlying biology. Robust methods that can use spectral or imaging data to detect the characteristic spectral or spatial signatures of different tissue types are challenging to develop but highly desired. In this study, we used various machine learning algorithms to analyze a spectral dataset acquired from normal and cancerous human skin tissue samples using resonance Raman spectroscopy with 532 nm excitation. The algorithms, including principal component analysis, nonnegative matrix factorization, and an autoencoder artificial neural network, are used to reduce the dimension of the dataset and detect features. A support vector machine with a linear kernel is used to classify the normal tissue and cancerous tissue samples. The efficacies of the methods are compared.
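
    A minimal sketch of one such analysis chain (PCA for dimension reduction, then a linear-kernel SVM), with synthetic spectra standing in for the Raman data:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.decomposition import PCA
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(5)
        spectra = rng.random((80, 1024))   # 80 samples x 1024 wavenumber bins
        labels = rng.integers(0, 2, 80)    # 0 = normal, 1 = cancerous

        model = make_pipeline(PCA(n_components=10), SVC(kernel="linear"))
        print(cross_val_score(model, spectra, labels, cv=5).mean())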

  15. Means and method of balancing multi-cylinder reciprocating machines

    DOEpatents

    Corey, John A.; Walsh, Michael M.

    1985-01-01

    A virtual balancing axis arrangement is described for multi-cylinder reciprocating piston machines for effectively balancing out imbalanced forces and minimizing residual imbalance moments acting on the crankshaft of such machines, without requiring the use of additional parallel-arrayed balancing shafts or complex and expensive gear arrangements. The novel virtual balancing axis arrangement can be designed into multi-cylinder reciprocating piston and crankshaft machines to substantially reduce vibrations induced during operation, with only a minimal number of additional component parts. Some of the required component parts may be available from parts already required for the operation of auxiliary equipment, such as the oil and water pumps used in certain types of reciprocating piston and crankshaft machines, so that, by appropriate location and dimensioning in accordance with the teachings of the invention, the virtual balancing axis arrangement can be built into the machine at little or no additional cost.

  16. A system framework of inter-enterprise machining quality control based on fractal theory

    NASA Astrophysics Data System (ADS)

    Zhao, Liping; Qin, Yongtao; Yao, Yiyong; Yan, Peng

    2014-03-01

    In order to meet the quality control requirements of dynamic and complicated product machining processes among enterprises, a system framework of inter-enterprise machining quality control based on fractal theory was proposed. In this system framework, the fractal-specific characteristic of the inter-enterprise machining quality control function was analysed, and the model of inter-enterprise machining quality control was constructed from the nature of fractal structures. Furthermore, the goal-driven strategy of inter-enterprise quality control and the dynamic organisation strategy of inter-enterprise quality improvement were constructed through characteristic analysis of this model. In addition, the architecture of inter-enterprise machining quality control based on fractal was established by means of Web services. Finally, a case study is presented. The results showed that the proposed method is effective and can provide guidance for quality control and support for product reliability in inter-enterprise machining processes.

  17. The Impacts of Industrial Robots

    DTIC Science & Technology

    1981-11-01

    plastics, and strain gauges are used to measure very small forces at a number of points on the robot’s "end effector". Except for the simplest on-off...devices, tactile sensors are not yet found on commercially available robots. Forces are sensed by using strain gauges or piezoelectric sensors to...tools: deburring, drilling, grinding, milling, routing machines ii. plastic materials forming and injection machines iii. metal die casting machines iv

  18. Dual scan CT image recovery from truncated projections

    NASA Astrophysics Data System (ADS)

    Sarkar, Shubhabrata; Wahi, Pankaj; Munshi, Prabhat

    2017-12-01

    There are computerized tomography (CT) scanners available commercially for imaging small objects and they are often categorized as mini-CT X-ray machines. One major limitation of these machines is their inability to scan large objects with good image quality because of the truncation of projection data. An algorithm is proposed in this work which enables such machines to scan large objects while maintaining the quality of the recovered image.

  19. A hybrid approach to select features and classify diseases based on medical data

    NASA Astrophysics Data System (ADS)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases based on three medical datasets: the Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology, called k-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses k-means clustering with the ANOVA statistic to preprocess the data and select significant features, and Support Vector Machines in the classification process. To compare and evaluate performance, we chose three classification algorithms (decision tree, Naïve Bayes, and Support Vector Machine) and applied the medical datasets directly to these algorithms. Our methodology gave much better classification accuracy (98% on the Arrhythmia dataset, 92% on the Breast cancer dataset, and 88% on the Hepatitis dataset) compared to using the medical data directly with decision tree, Naïve Bayes, and Support Vector Machines. The ROC curve and precision achieved with K-ANOVA-SVM were also better than those of the other algorithms.
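
    A hedged reconstruction of the K-ANOVA-SVM idea in scikit-learn terms is sketched below: ANOVA F-scores select features and an SVM classifies, with the k-means step reduced to appending cluster labels as an extra feature. The published method's details may differ; data here are synthetic.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(6)
        X = rng.random((300, 50))          # stand-in for, e.g., the Arrhythmia data
        y = rng.integers(0, 2, 300)

        clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
        X_aug = np.column_stack([X, clusters])

        model = make_pipeline(SelectKBest(f_classif, k=15), SVC())
        print(cross_val_score(model, X_aug, y, cv=5).mean())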

  20. Fast machine-learning online optimization of ultra-cold-atom experiments.

    PubMed

    Wigley, P B; Everitt, P J; van den Hengel, A; Bastian, J W; Sooriyabandara, M A; McDonald, G D; Hardman, K S; Quinlivan, C D; Manju, P; Kuhn, C C N; Petersen, I R; Luiten, A N; Hope, J J; Robins, N P; Hush, M R

    2016-05-16

    We apply an online optimization process based on machine learning to the production of Bose-Einstein condensates (BEC). BEC is typically created with an exponential evaporation ramp that is optimal for ergodic dynamics with two-body s-wave interactions and no other loss rates, but likely sub-optimal for real experiments. Through repeated machine-controlled scientific experimentation and observations our 'learner' discovers an optimal evaporation ramp for BEC production. In contrast to previous work, our learner uses a Gaussian process to develop a statistical model of the relationship between the parameters it controls and the quality of the BEC produced. We demonstrate that the Gaussian process machine learner is able to discover a ramp that produces high quality BECs in 10 times fewer iterations than a previously used online optimization technique. Furthermore, we show the internal model developed can be used to determine which parameters are essential in BEC creation and which are unimportant, providing insight into the optimization process of the system.

  1. Fast machine-learning online optimization of ultra-cold-atom experiments

    PubMed Central

    Wigley, P. B.; Everitt, P. J.; van den Hengel, A.; Bastian, J. W.; Sooriyabandara, M. A.; McDonald, G. D.; Hardman, K. S.; Quinlivan, C. D.; Manju, P.; Kuhn, C. C. N.; Petersen, I. R.; Luiten, A. N.; Hope, J. J.; Robins, N. P.; Hush, M. R.

    2016-01-01

    We apply an online optimization process based on machine learning to the production of Bose-Einstein condensates (BEC). BEC is typically created with an exponential evaporation ramp that is optimal for ergodic dynamics with two-body s-wave interactions and no other loss rates, but likely sub-optimal for real experiments. Through repeated machine-controlled scientific experimentation and observations our ‘learner’ discovers an optimal evaporation ramp for BEC production. In contrast to previous work, our learner uses a Gaussian process to develop a statistical model of the relationship between the parameters it controls and the quality of the BEC produced. We demonstrate that the Gaussian process machine learner is able to discover a ramp that produces high quality BECs in 10 times fewer iterations than a previously used online optimization technique. Furthermore, we show the internal model developed can be used to determine which parameters are essential in BEC creation and which are unimportant, providing insight into the optimization process of the system. PMID:27180805
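
    A toy version of such a Gaussian-process online optimizer is sketched below; the 'experiment' is a made-up quality function and the acquisition rule is a simple upper confidence bound, not the authors' exact learner.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        def bec_quality(p):                 # stand-in for running the real experiment
            return -np.sum((p - 0.3) ** 2)

        rng = np.random.default_rng(7)
        X = rng.random((5, 2))              # initial random ramp parameters
        y = np.array([bec_quality(p) for p in X])

        for _ in range(20):
            gp = GaussianProcessRegressor(RBF()).fit(X, y)
            cand = rng.random((256, 2))
            mu, std = gp.predict(cand, return_std=True)
            nxt = cand[np.argmax(mu + 2.0 * std)]   # explore/exploit trade-off
            X, y = np.vstack([X, nxt]), np.append(y, bec_quality(nxt))

        print(X[np.argmax(y)])              # best ramp parameters found so far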

  2. Current Developments in Machine Learning Techniques in Biological Data Mining.

    PubMed

    Dumancas, Gerard G; Adrianto, Indra; Bello, Ghalib; Dozmorov, Mikhail

    2017-01-01

    This supplement is intended to focus on the use of machine learning techniques to generate meaningful information from biological data. This supplement under Bioinformatics and Biology Insights aims to provide scientists and researchers working in this rapidly evolving field with online, open-access articles authored by leading international experts in the field. Advances in the field of biology have generated massive opportunities for the implementation of modern computational and statistical techniques. Machine learning methods in particular, a subfield of computer science, have evolved into an indispensable tool applied to a wide spectrum of bioinformatics applications. Thus, machine learning is broadly used to investigate the underlying mechanisms leading to a specific disease, as well as in the biomarker discovery process. With growth in this specific area of science comes the need for access to up-to-date, high-quality scholarly articles that will leverage the knowledge of scientists and researchers in the various applications of machine learning techniques to mining biological data.

  3. Efficient Embedded Decoding of Neural Network Language Models in a Machine Translation System.

    PubMed

    Zamora-Martinez, Francisco; Castro-Bleda, Maria Jose

    2018-02-22

    Neural Network Language Models (NNLMs) are a successful approach to Natural Language Processing tasks, such as Machine Translation. We introduce in this work a Statistical Machine Translation (SMT) system which fully integrates NNLMs in the decoding stage, breaking the traditional approach based on n-best list rescoring. The neural net models (both language models (LMs) and translation models) are fully coupled in the decoding stage, allowing them to more strongly influence translation quality. Computational issues were solved by using a novel idea based on memorization and smoothing of the softmax constants to avoid their computation, which introduces a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and n-gram-based systems, and showing that the integrated approach seems more promising for n-gram-based systems, even with non-full-quality NNLMs.
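
    The memorization trick can be caricatured as caching the softmax log-normalization constant per context, so that repeated decoder queries avoid summing over the whole output vocabulary. The sketch below shows only this caching idea, not the paper's smoothing scheme, and uses toy sizes.

        import numpy as np

        rng = np.random.default_rng(8)
        W = rng.standard_normal((50_000, 128))   # NNLM output layer (toy sizes)
        _logZ = {}                               # context -> log normalization constant

        def log_prob(word_id, context_key, hidden):
            if context_key not in _logZ:         # pay the full-vocabulary sum once
                _logZ[context_key] = np.logaddexp.reduce(W @ hidden)
            return float(W[word_id] @ hidden - _logZ[context_key])

        h = rng.standard_normal(128)
        print(log_prob(42, ("the", "cat"), h))   # later queries for this context are cheap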

  4. PMLB: a large benchmark suite for machine learning evaluation and comparison.

    PubMed

    Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

    2017-01-01

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
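
    Assuming the accompanying `pmlb` Python package is installed, fetching one of the curated benchmarks and scoring a classifier on it takes only a few lines:

        from pmlb import fetch_data
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        # 'mushroom' is one of the PMLB classification benchmark datasets.
        X, y = fetch_data('mushroom', return_X_y=True)
        print(cross_val_score(RandomForestClassifier(), X, y, cv=5).mean())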

  5. Machine learning and medicine: book review and commentary.

    PubMed

    Koprowski, Robert; Foster, Kenneth R

    2018-02-01

    This article is a review of the book "Master machine learning algorithms: discover how they work and implement them from scratch" (ISBN: not available, 37 USD, 163 pages) by Jason Brownlee, published by the author, edition v1.10, http://MachineLearningMastery.com . An accompanying commentary discusses some of the issues involved in using machine learning and data mining techniques to develop predictive models for diagnosis or prognosis of disease, and calls attention to additional requirements for developing diagnostic and prognostic algorithms that are generally useful in medicine. An appendix provides examples that illustrate potential problems with machine learning that are not addressed in the reviewed book.

  6. Modalities, Relations, and Learning

    NASA Astrophysics Data System (ADS)

    Müller, Martin Eric

    While the popularity of statistical, probabilistic, and exhaustive machine learning techniques continues to increase, relational and logic approaches remain a niche area in research. While the former approaches focus on predictive accuracy, the latter prove indispensable in knowledge discovery.

  7. Adhesive retention of experimental fiber-reinforced composite, orthodontic acrylic resin, and aliphatic urethane acrylate to silicone elastomer for maxillofacial prostheses.

    PubMed

    Kosor, Begüm Yerci; Artunç, Celal; Şahan, Heval

    2015-07-01

    A key factor of an implant-retained facial prosthesis is the success of the bonding between the substructure and the silicone elastomer. Little has been reported on the bonding of fiber-reinforced composite (FRC) to silicone elastomers. Experimental FRC could be a solution for facial prostheses supported by light-activated aliphatic urethane acrylate, orthodontic acrylic resin, or commercially available FRCs. The purpose of this study was to evaluate the bonding of the experimental FRC, orthodontic acrylic resin, and light-activated aliphatic urethane acrylate to a commercially available high-temperature-vulcanizing silicone elastomer. Shear and 180-degree peel bond strengths of 3 different substructures (experimental FRC, orthodontic acrylic resin, light-activated aliphatic urethane acrylate) (n=15) to a high-temperature-vulcanizing maxillofacial silicone elastomer (M511) with a primer (G611) were assessed after 200 hours of accelerated artificial light-aging. The specimens were tested in a universal testing machine at a cross-head speed of 10 mm/min. Data were collected and statistically analyzed by 1-way ANOVA, followed by the Bonferroni correction and the Dunnett post hoc test (α=.05). Modes of failure were visually determined and categorized as adhesive, cohesive, or mixed and were statistically analyzed with the chi-squared goodness-of-fit test (α=.05). When the mean shear bond strength values were evaluated statistically, no difference was found among the experimental FRC, aliphatic urethane acrylate, and orthodontic acrylic resin subgroups (P>.05). The mean peel bond strengths of the experimental FRC and aliphatic urethane acrylate were not statistically different (P>.05). The mean peel bond strength of the orthodontic acrylic resin subgroup was statistically lower (P<.05). Shear test failure types were statistically different (P<.05), whereas 180-degree peel test failure types were not (P>.05). Shear forces predominantly produced cohesive failure (64.4%), whereas peel forces predominantly produced adhesive failure (93.3%). The mean shear bond strengths of the experimental FRC and aliphatic urethane acrylate groups were not statistically different (P>.05). The mean 180-degree peel strength of the orthodontic acrylic resin group was lower (P<.05). Copyright © 2015 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.

  8. Situational Analysis of Essential Surgical Care Management in Iran Using the WHO Tool

    PubMed Central

    Kalhor, Rohollah; Keshavarz Mohamadi, Nastaran; Khalesi, Nader; Jafari, Mehdi

    2016-01-01

    Background: Surgery is an essential component of health care, yet it has usually been overlooked in public health across the world. Objectives: This study aimed to perform a situational analysis of essential surgical care management at district hospitals in Iran. Materials and Methods: This research was a descriptive and cross-sectional study performed at 42 first-referral district hospitals of Iran in 2013. The World Health Organization (WHO) Tool for the situational analysis of emergency and essential care was used for data collection in four domains of facilities and equipment, human resources, surgical interventions, and infrastructure. Data analysis was conducted using simple descriptive statistical methods. Results: In this study, 100% of the studied hospitals had oxygen cylinders, running water, electricity, anesthesia machines, emergency departments, archives of medical records, and X-ray machines. In 100% of the surveyed hospitals, specialists in surgery, anesthesia, and obstetrics and gynecology were available as full-time staff. Life-saving procedures were performed in the majority of the hospitals. Among urgent procedures, neonatal surgeries were conducted in 14.3% of the hospitals. Regarding non-urgent procedures, acute burn management was conducted in 38.1% of the hospitals. Also, a few other procedures such as cricothyrotomy and foreign body removal were performed in 85.7% of the hospitals. Conclusions: The results indicated that suitable facilities and equipment, human resources, and infrastructure were available in the district hospitals in Iran. These findings showed that there is potential for the district hospitals to provide care in a wider spectrum. PMID:27437121

  9. Sequential compression devices in postoperative urologic patients: an observational trial and survey study on the influence of patient and hospital factors on compliance.

    PubMed

    Ritsema, David F; Watson, Jennifer M; Stiteler, Amanda P; Nguyen, Mike M

    2013-04-11

    Sequential compression devices (SCDs) are commonly used for thromboprophylaxis in postoperative patients, but compliance is often poor. We investigated causes of noncompliance, examining both hospital- and patient-related factors. 100 patients undergoing inpatient urologic surgery were enrolled. All patients had SCD sleeves placed preoperatively. Postoperative observations determined SCD compliance and reasons for non-compliance. Patient demographics, length of stay, inpatient unit type, and surgery type were recorded. At discharge, a patient survey gauged knowledge and attitudes regarding SCDs and bother with SCDs. Statistical analysis was performed to correlate SCD compliance with patient demographics; patient knowledge and attitudes regarding SCDs; and patient self-reported bother with SCDs. Observed overall compliance was 78.6%. The most commonly observed reasons for non-compliance were SCD machines not being initially available on the ward (71% of non-compliant observations on post-operative day 1) and SCD use not being restarted promptly after return to bed (50% of non-compliant observations for the entire hospital stay). Mean self-reported bother scores related to SCDs were low, ranging from 1 to 3 out of 10 for all 12 categories of bother assessed. Patient demographics, knowledge, attitudes, and bother with SCD devices were not significantly associated with non-compliance. Patient self-reported bother with SCD devices was low. Hospital factors, including SCD machine availability and timely restarting of devices by nursing staff when a patient returns to bed, played a greater role in SCD non-compliance than patient factors. Identifying and addressing hospital-related causes of poor SCD compliance may improve postoperative urologic patient safety.

  10. [Greatness and tribulations of Zeiss and Leitz, two famous German optic companies III. Zeiss Ikon and elimination of Emanuel Goldberg].

    PubMed

    Gilgenkrantz, Simone

    2011-05-01

    Gathering archival documents to trace the history of the Zeiss company presents no difficulty: they are abundant… except for a period from 1932 to 1945, systematically ignored, which corresponds to the Nazi period. On the website Zeiss Historica, among the outstanding personalities of the Zeiss company, we note that, for Professor Emanuel Goldberg, the web page « is still under development but an early picture of the professor is available ». But fortunately, Michael Buckland, a professor at the UC Berkeley School of Information, brought the life and work of Emanuel Goldberg to light. Thanks to him, works and innovations that had disappeared from our cultural and scientific heritage return to light after being erased for fifty years. Goldberg published dozens of articles, obtained patents, and developed cameras, microdots, and movie cameras, and he designed what he called a "Statistical Machine", the first electronic document retrieval machine. In France, although this rediscovery was made known to the world of information science, it has not had the impact it deserved in the scientific world. It is therefore time to reconstruct his career and his work, and to analyse the reasons why some attempted to erase his name and memory definitively. © 2011 médecine/sciences - Inserm / SRMS.

  11. Prediction of Chemical Function: Model Development and ...

    EPA Pesticide Factsheets

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (HT) screening-level exposures developed under ExpoCast can be combined with HT screening (HTS) bioactivity data for the risk-based prioritization of chemicals for further evaluation. The functional role (e.g. solvent, plasticizer, fragrance) that a chemical performs can drive both the types of products in which it is found and the concentration at which it is present, thereby impacting exposure potential. However, critical chemical use information (including functional role) is lacking for the majority of commercial chemicals for which exposure estimates are needed. A suite of machine-learning-based models for classifying chemicals in terms of their likely functional roles in products, based on structure, was developed. This effort required collection, curation, and harmonization of publicly available data sources of chemical functional use information from government and industry bodies. Physicochemical and structure descriptor data were generated for chemicals with function data. Machine-learning classifier models for function were then built in a cross-validated manner from the descriptor/function data using the method of random forests. The models were applied to: 1) predict chemi

  12. Smart Interpretation - Application of Machine Learning in Geological Interpretation of AEM Data

    NASA Astrophysics Data System (ADS)

    Bach, T.; Gulbrandsen, M. L.; Jacobsen, R.; Pallesen, T. M.; Jørgensen, F.; Høyer, A. S.; Hansen, T. M.

    2015-12-01

    When using airborne geophysical measurements in, e.g., groundwater mapping, an overwhelming amount of data is collected. Increasingly large survey areas, denser data collection, and limited resources combine into a growing problem: building geological models that use all the available data in a manner consistent with the geologist's knowledge of the survey area. In the ERGO project, funded by The Danish National Advanced Technology Foundation, we address this problem by developing new, usable tools that enable the geologist to utilize her geological knowledge directly in the interpretation of the AEM data, and thereby handle the large amount of data. In the project we have developed the mathematical basis for capturing geological expertise in a statistical model. Based on this, we have implemented new algorithms that have been operationalized and embedded in user-friendly software. In this software, the machine learning algorithm, Smart Interpretation, enables the geologist to use the system as an assistant in the geological modelling process. As the software 'learns' the geology from the geologist, the system suggests new modelling features in the data. In this presentation we demonstrate the application of the results from the ERGO project, including the proposed modelling workflow, on a variety of data examples.

  13. Automated validation of patient safety clinical incident classification: macro analysis.

    PubMed

    Gupta, Jaiprakash; Patrick, Jon

    2013-01-01

    Patient safety is the buzz word in healthcare. The Incident Information Management System (IIMS) is electronic software that stores clinical mishap narratives in places where patients are treated. It is estimated that in one state alone over one million electronic text documents are available in IIMS. In this paper we investigate the data density available in the fields entered to notify an incident and the validity of the built-in classification used by clinicians to categorise the incidents. Waikato Environment for Knowledge Analysis (WEKA) software was used to test the classes. Four statistical classifiers based on the J48, Naïve Bayes (NB), Naïve Bayes Multinomial (NBM), and Support Vector Machine with radial basis function (SVM_RBF) algorithms were used to validate the classes. The data pool was 10,000 clinical incidents drawn from 7 hospitals in one state in Australia. In the first part of the study, 1000 clinical incidents were selected to determine the type and number of fields worth investigating, and in the second part another 5448 clinical incidents were randomly selected to validate 13 clinical incident types. Results show that 74.6% of the cells were empty and only 23 fields had content over 70% of the time. The percentage of correctly classified classes across the four algorithms ranged from 42% to 49% using the categorical dataset, from 65% to 77% using the free-text dataset, and from 72% to 79% using both datasets. The Kappa statistic ranged from 0.36 to 0.40 for categorical data, from 0.61 to 0.74 for free-text, and from 0.67 to 0.77 for both datasets. Similar increases in performance across the three experiments were noted for true positive rate, precision, F-measure, and area under curve (AUC) of receiver operating characteristics (ROC) scores. The study demonstrates that only 14 of 73 fields in IIMS have data usable for machine learning experiments. Irrespective of the algorithm used, performance was better when both datasets were used. The classifier NBM showed the best performance. We think the classifier can be improved further by reclassifying the most confused classes, and there is scope to apply text mining tools to patient safety classifications.
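
    The study used WEKA; an equivalent free-text experiment expressed in scikit-learn terms (bag-of-words features with multinomial naive Bayes, scored with accuracy and Cohen's kappa) might look like the placeholder sketch below.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.metrics import accuracy_score, cohen_kappa_score
        from sklearn.model_selection import train_test_split

        texts = ["patient fell near bed", "wrong dose administered"] * 50   # placeholders
        labels = ["fall", "medication"] * 50

        Xtr, Xte, ytr, yte = train_test_split(texts, labels, random_state=0)
        vec = CountVectorizer()
        clf = MultinomialNB().fit(vec.fit_transform(Xtr), ytr)
        pred = clf.predict(vec.transform(Xte))
        print(accuracy_score(yte, pred), cohen_kappa_score(yte, pred))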

  14. Impact of the HEALTHY Study on Vending Machine Offerings in Middle Schools.

    PubMed

    Hartstein, Jill; Cullen, Karen W; Virus, Amy; El Ghormli, Laure; Volpe, Stella L; Staten, Myrlene A; Bridgman, Jessica C; Stadler, Diane D; Gillis, Bonnie; McCormick, Sarah B; Mobley, Connie C

    2011-01-01

    The purpose of this study is to report the impact of the three-year middle school-based HEALTHY study on intervention school vending machine offerings. There were two goals for the vending machines: serve only dessert/snack foods with 200 kilocalories or less per single serving package, and eliminate 100% fruit juice and beverages with added sugar. Six schools in each of seven cities (Houston, TX, San Antonio, TX, Irvine, CA, Portland, OR, Pittsburgh, PA, Philadelphia, PA, and Chapel Hill, NC) were randomized into intervention (n=21 schools) or control (n=21 schools) groups, with three intervention and three control schools per city. All items in vending machine slots were tallied twice in the fall of 2006 for baseline data and twice at the end of the study, in 2009. The percentage of total slots for each food/beverage category was calculated and compared between intervention and control schools at the end of study, using the Pearson chi-square test statistic. At baseline, 15 intervention and 15 control schools had beverage and/or snack vending machines, compared with 11 intervention and 11 control schools at the end of the study. At the end of study, all of the intervention schools with beverage vending machines, but only one out of the nine control schools, met the beverage goal. The snack goal was met by all of the intervention schools and only one of the four control schools with snack vending machines. The HEALTHY study's vending machine beverage and snack goals were successfully achieved in intervention schools, reducing access to less healthy food items outside the school meals program. Although the effect of these changes on student diet, energy balance and growth is unknown, these results suggest that healthier options for snacks can successfully be offered in school vending machines.

  15. Machine learning in cardiovascular medicine: are we there yet?

    PubMed

    Shameer, Khader; Johnson, Kipp W; Glicksberg, Benjamin S; Dudley, Joel T; Sengupta, Partho P

    2018-01-19

    Artificial intelligence (AI) broadly refers to analytical algorithms that iteratively learn from data, allowing computers to find hidden insights without being explicitly programmed where to look. These include a family of operations encompassing several terms like machine learning, cognitive learning, deep learning and reinforcement learning-based methods that can be used to integrate and interpret complex biomedical and healthcare data in scenarios where traditional statistical methods may not be able to perform. In this review article, we discuss the basics of machine learning algorithms and what potential data sources exist; evaluate the need for machine learning; and examine the potential limitations and challenges of implementing machine learning in the context of cardiovascular medicine. The most promising avenues for AI in medicine are the development of automated risk prediction algorithms which can be used to guide clinical care; use of unsupervised learning techniques to more precisely phenotype complex disease; and the implementation of reinforcement learning algorithms to intelligently augment healthcare providers. The utility of a machine learning-based predictive model will depend on factors including data heterogeneity, data depth, data breadth, nature of modelling task, choice of machine learning and feature selection algorithms, and orthogonal evidence. A critical understanding of the strength and limitations of various methods and tasks amenable to machine learning is vital. By leveraging the growing corpus of big data in medicine, we detail pathways by which machine learning may facilitate optimal development of patient-specific models for improving diagnoses, intervention and outcome in cardiovascular medicine. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  16. Machine learning for the New York City power grid.

    PubMed

    Rudin, Cynthia; Waltz, David; Anderson, Roger N; Boulanger, Albert; Salleb-Aouissi, Ansaf; Chow, Maggie; Dutta, Haimonti; Gross, Philip N; Huang, Bert; Ierome, Steve; Isaac, Delfina F; Kressner, Arthur; Passonneau, Rebecca J; Radeva, Axinia; Wu, Leon

    2012-02-01

    Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint, terminator, and transformer rankings, 3) feeder Mean Time Between Failure (MTBF) estimates, and 4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time; incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF); and includes an evaluation of results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City’s electrical grid.

  17. Mapping the spatial distribution of Aedes aegypti and Aedes albopictus.

    PubMed

    Ding, Fangyu; Fu, Jingying; Jiang, Dong; Hao, Mengmeng; Lin, Gang

    2018-02-01

    Mosquito-borne infectious diseases, such as Rift Valley fever, Dengue, Chikungunya and Zika, have caused mass human death, with their transnational expansion fueled by economic globalization. Simulating the distribution of the disease vectors is of great importance in formulating public health planning and disease control strategies. In the present study, we simulated the global distribution of Aedes aegypti and Aedes albopictus at a 5×5 km spatial resolution with high-dimensional multidisciplinary datasets and machine learning methods. Three relatively popular and robust machine learning models, including support vector machine (SVM), gradient boosting machine (GBM), and random forest (RF), were used. During the fine-tuning process based on training datasets of A. aegypti and A. albopictus, RF models achieved the highest performance with an area under the curve (AUC) of 0.973 and 0.974, respectively, followed by GBM (AUC of 0.971 and 0.972, respectively) and SVM (AUC of 0.963 and 0.964, respectively) models. The simulation difference between RF and GBM models was not statistically significant (p>0.05) based on the validation datasets, whereas statistically significant differences (p<0.05) were observed for RF and GBM simulations compared with SVM simulations. From the simulated maps derived from RF models, we observed that the distribution of A. albopictus was wider than that of A. aegypti along a latitudinal gradient. The discriminatory power of each factor in simulating the global distribution of the two species was also analyzed. Our results provided fundamental information for further study on disease transmission simulation and risk assessment. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Impact of machining on the flexural fatigue strength of glass and polycrystalline CAD/CAM ceramics.

    PubMed

    Fraga, Sara; Amaral, Marina; Bottino, Marco Antônio; Valandro, Luiz Felipe; Kleverlaan, Cornelis Johannes; May, Liliana Gressler

    2017-11-01

    To assess the effect of machining on the flexural fatigue strength and on the surface roughness of different computer-aided design, computer-aided manufacturing (CAD/CAM) ceramics by comparing specimens that were machined with specimens that were polished after machining. Disc-shaped specimens of yttria-stabilized polycrystalline tetragonal zirconia (Y-TZP), leucite-, and lithium disilicate-based glass ceramics were prepared by CAD/CAM machining, and divided into two groups: machining (M) and machining followed by polishing (MP). The surface roughness was measured and the flexural fatigue strength was evaluated by the step-test method (n=20). The initial load and the load increment for each ceramic material were based on a monotonic test (n=5). A maximum of 10,000 cycles was applied in each load step, at 1.4Hz. Weibull probability statistics was used for the analysis of the flexural fatigue strength, and the Mann-Whitney test (α=5%) to compare roughness between the M and MP conditions. Machining resulted in lower values of characteristic flexural fatigue strength than machining followed by polishing. The greatest reduction in flexural fatigue strength from MP to M was observed for Y-TZP (40%; M=536.48MPa; MP=894.50MPa), followed by lithium disilicate (33%; M=187.71MPa; MP=278.93MPa) and leucite (29%; M=72.61MPa; MP=102.55MPa). Significantly higher values of roughness (Ra) were observed for M compared to MP (leucite: M=1.59μm and MP=0.08μm; lithium disilicate: M=1.84μm and MP=0.13μm; Y-TZP: M=1.79μm and MP=0.18μm). Machining negatively affected the flexural fatigue strength of CAD/CAM ceramics, indicating that machining of partially or fully sintered ceramics is deleterious to fatigue strength. Copyright © 2017 The Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.
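
    For readers unfamiliar with the statistics used here, a minimal SciPy sketch follows: a two-parameter Weibull fit (location fixed at zero) whose scale parameter is the characteristic strength, plus a Mann-Whitney comparison of roughness. The numbers are illustrative, not the study's raw data.

        import numpy as np
        from scipy import stats

        strengths = np.array([480, 512, 535, 560, 590, 610, 545, 570])  # MPa, made up
        shape, loc, scale = stats.weibull_min.fit(strengths, floc=0)
        print(f"Weibull modulus m = {shape:.1f}, characteristic strength = {scale:.0f} MPa")

        ra_machined = np.array([1.59, 1.62, 1.55])   # Ra values, illustrative
        ra_polished = np.array([0.08, 0.09, 0.07])
        print(stats.mannwhitneyu(ra_machined, ra_polished))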

  19. Interaction with Machine Improvisation

    NASA Astrophysics Data System (ADS)

    Assayag, Gerard; Bloch, George; Cont, Arshia; Dubnov, Shlomo

    We describe two multi-agent architectures for improvisation-oriented musician-machine interaction systems that learn in real time from human performers. The improvisation kernel is based on sequence modeling and statistical learning. We present two frameworks of interaction with this kernel. In the first, the stylistic interaction is guided by a human operator in front of an interactive computer environment. In the second framework, the stylistic interaction is delegated to machine intelligence, so that knowledge propagation and decision-making are handled by the computer alone. The first framework involves a hybrid architecture using two popular composition/performance environments, Max and OpenMusic, which are put to work and communicate together, each one handling the process at a different time/memory scale. The second framework shares the same representational schemes with the first but uses an Active Learning architecture based on collaborative, competitive and memory-based learning to handle stylistic interactions. Both systems are capable of processing real-time audio/video as well as MIDI. After discussing the general cognitive background of improvisation practices, the statistical modelling tools and the concurrent agent architecture are presented. Then, an Active Learning scheme is described and considered in terms of using different improvisation regimes for improvisation planning. Finally, we provide more details about the different system implementations and describe several performances with the system.
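
    As a toy illustration of the sequence-modeling idea, the following sketches a first-order Markov model over MIDI pitches; the actual improvisation kernel uses much richer incremental models, so treat this only as a minimal stand-in.

```python
import random
from collections import defaultdict

def train(notes):
    """Collect first-order transition counts from a performed note sequence."""
    transitions = defaultdict(list)
    for a, b in zip(notes, notes[1:]):
        transitions[a].append(b)
    return transitions

def improvise(transitions, start, length=16):
    """Random-walk the learned transitions to generate a new sequence."""
    out = [start]
    for _ in range(length - 1):
        choices = transitions.get(out[-1])
        if not choices:
            break
        out.append(random.choice(choices))
    return out

performance = [60, 62, 64, 62, 60, 64, 65, 67, 65, 64, 62, 60]  # MIDI pitches
print(improvise(train(performance), start=60))
```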

  20. Spectroscopic Diagnosis of Arsenic Contamination in Agricultural Soils

    PubMed Central

    Shi, Tiezhu; Liu, Huizeng; Chen, Yiyun; Fei, Teng; Wang, Junjie; Wu, Guofeng

    2017-01-01

    This study investigated the abilities of pre-processing, feature selection and machine-learning methods for the spectroscopic diagnosis of soil arsenic contamination. The spectral data were pre-processed using Savitzky-Golay smoothing, first and second derivatives, multiplicative scatter correction, standard normal variate, and mean centering. Principal component analysis (PCA) and the RELIEF algorithm were used to extract spectral features. Machine-learning methods, including random forests (RF), artificial neural networks (ANN), and radial basis function- and linear function-based support vector machines (RBF- and LF-SVM), were employed for establishing diagnosis models. Model performance was evaluated and compared using overall accuracies (OAs). The statistical significance of the difference between models was evaluated using McNemar’s test (Z value). The results showed that the OAs varied with the different combinations of pre-processing, feature selection, and classification methods. Feature selection methods could improve modeling efficiency and diagnosis accuracy, and RELIEF often outperformed PCA. The optimal models established by RF (OA = 86%), ANN (OA = 89%), RBF-SVM (OA = 89%) and LF-SVM (OA = 87%) showed no statistically significant difference in diagnosis accuracy (Z < 1.96, p > 0.05). These results indicated that it is feasible to diagnose soil arsenic contamination using reflectance spectroscopy, and that an appropriate combination of multivariate methods is important for improving diagnosis accuracy. PMID:28471412
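
    A minimal sketch of a spectral pre-processing chain of the kind listed above (Savitzky-Golay smoothing, standard normal variate, then PCA features), assuming SciPy and scikit-learn; the spectra are random placeholders rather than soil reflectance data.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
spectra = rng.random((50, 200))  # 50 samples x 200 wavelengths (placeholders)

# Savitzky-Golay smoothing along the wavelength axis.
smoothed = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)
# Standard normal variate: per-spectrum centering and scaling.
snv = (smoothed - smoothed.mean(axis=1, keepdims=True)) / smoothed.std(axis=1, keepdims=True)
# PCA scores as spectral features for a downstream classifier.
features = PCA(n_components=10).fit_transform(snv)
print(features.shape)
```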

  1. Generation and Validation of Spatial Distribution of Hourly Wind Speed Time-Series using Machine Learning

    NASA Astrophysics Data System (ADS)

    Veronesi, F.; Grassi, S.

    2016-09-01

    Wind resource assessment is a key aspect of wind farm planning, since it allows estimation of long-term electricity production. Moreover, wind speed time-series at high resolution are helpful for estimating the temporal variability of electricity generation and are indispensable for designing stand-alone systems, which are affected by mismatches between supply and demand. In this work, we present a new generalized statistical methodology to generate the spatial distribution of wind speed time-series, using Switzerland as a case study. This research is based upon a machine learning model and demonstrates that statistical wind resource assessment can successfully be used for estimating wind speed time-series. The method obtains reliable wind speed estimates and propagates all the sources of uncertainty (from the measurements to the mapping process) efficiently, i.e., minimizing computational time and load. This allows not only accurate estimation but also the creation of precise confidence intervals to map the stochasticity of the wind resource at a particular site. The validation shows that machine learning can minimize the bias of the hourly wind speed estimates. Moreover, for each mapped location this method delivers not only the mean wind speed but also its confidence interval, which are crucial data for planners.
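
    A minimal sketch of the general idea of pairing a machine-learning estimate with a per-location confidence interval, here using the spread across individual random-forest trees in scikit-learn; features and targets are synthetic, and the paper's actual methodology is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.random((500, 5))                        # synthetic covariates per location
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, 500)   # synthetic wind speed (m/s)

rf = RandomForestRegressor(n_estimators=300, random_state=1).fit(X, y)

x_new = rng.random((1, 5))                      # one unmapped location
per_tree = np.array([tree.predict(x_new)[0] for tree in rf.estimators_])
lo, hi = np.percentile(per_tree, [2.5, 97.5])   # empirical 95% interval
print(f"mean = {per_tree.mean():.2f} m/s, 95% interval = [{lo:.2f}, {hi:.2f}]")
```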

  2. Applications of machine learning in cancer prediction and prognosis.

    PubMed

    Cruz, Joseph A; Wishart, David S

    2007-02-11

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to "learn" from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently, machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting, as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies, it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

  3. Patterns of radiotherapy infrastructure in Japan and in other countries with well-developed radiotherapy infrastructures.

    PubMed

    Nakamura, Katsumasa; Konishi, Kenta; Komatsu, Tetsuya; Sasaki, Tomonari; Shikama, Naoto

    2018-05-01

    In high-income countries, the number of radiotherapy machines per population has reached a sufficient level. However, the patterns of radiotherapy infrastructure in high-income countries are not well known. Among 29 high-income countries with a gross national income of $25,000 or more per capita, we selected 23 countries whose total number of newly diagnosed cancer patients in 2012 was reported in the Organisation for Economic Co-operation and Development Health Statistics 2017. The numbers of radiotherapy centers and teletherapy machines in each of these 23 countries were collected using the Directory of Radiotherapy Centres database. The number of cancer patients per teletherapy machine ranged from 452.35 to 1398.22 (median 711.66), a three-fold variation, whereas the number of cancer patients per radiotherapy center varied even more widely, from 826.16 to 5159.86 (median 2259.83), a six-fold variation. The average number of teletherapy machines per radiotherapy center also ranged widely, from 1.24 to 8.29 (median 3.11), a seven-fold variation. The number of teletherapy machines in each country was almost proportional to the number of cancer patients, and the number of teletherapy machines per radiotherapy center was inversely related to the number of radiotherapy centers per cancer patient. The number of teletherapy machines per radiotherapy center in Japan was 1.24, the most fragmented among the high-income countries. The percentage of large radiotherapy centers having three or more teletherapy machines was also the smallest in Japan among the 23 high-income countries. Optimization of the radiotherapy infrastructure in Japan should be carefully considered.

  4. Applications of Machine Learning in Cancer Prediction and Prognosis

    PubMed Central

    Cruz, Joseph A.; Wishart, David S.

    2006-01-01

    Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently, machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting, as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies, it is clear that machine learning methods can be used to substantially (15–25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression. PMID:19458758

  5. Joint FAM/Line Management Assessment Report on LLNL Machine Guarding Safety Program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Armstrong, J. J.

    2016-07-19

    The LLNL Safety Program for Machine Guarding is implemented to comply with requirements in the ES&H Manual Document 11.2, "Hazards-General and Miscellaneous," Section 13 Machine Guarding (Rev 18, issued Dec. 15, 2015). The primary goal of this LLNL Safety Program is to ensure that LLNL operations involving machine guarding are managed so that workers, equipment and government property are adequately protected. This means that all such operations are planned and approved using the Integrated Safety Management System to provide the most cost effective and safest means available to support the LLNL mission.

  6. Development of a sterilizing in-place application for a production machine using Vaporized Hydrogen Peroxide.

    PubMed

    Mau, T; Hartmann, V; Burmeister, J; Langguth, P; Häusler, H

    2004-01-01

    The use of steam in sterilization processes is limited by the presence of heat-sensitive components inside the machines to be sterilized. Alternative low-temperature sterilization methods need to be found and their suitability evaluated. Vaporized Hydrogen Peroxide (VHP) technology was adapted for a production machine consisting of highly sensitive pressure sensors and thermo-labile air tube systems. This kind of "cold" surface sterilization, known from barrier isolator technology, is based on the controlled release of hydrogen peroxide vapour into sealed enclosures. A mobile VHP generator was used to generate the hydrogen peroxide vapour. The unit was combined with the air conduction system of the production machine. Terminal vacuum pumps were installed to distribute the gas within the production machine and to eliminate it afterwards. In order to control the sterilization process, different physical process monitors were incorporated. The validation of the process was based on biological indicators (Geobacillus stearothermophilus). The Limited Spearman-Karber method (LSKM) was used to statistically evaluate the sterilization process. The results show that it is possible to sterilize surfaces in a complex tube system with gaseous hydrogen peroxide; a total microbial reduction of 6 log units was reached.

  7. Perspectives on Machine Learning for Classification of Schizotypy Using fMRI Data.

    PubMed

    Madsen, Kristoffer H; Krohne, Laerke G; Cai, Xin-Lu; Wang, Yi; Chan, Raymond C K

    2018-03-15

    Functional magnetic resonance imaging (fMRI) is capable of estimating functional activation and connectivity in the human brain, and lately there has been increased interest in using these functional modalities combined with machine learning to identify psychiatric traits. While these methods bear great potential for early diagnosis and a better understanding of disease processes, there is a wide range of processing choices and pitfalls that may severely hamper interpretation and generalization performance unless carefully considered. In this perspective article, we aim to motivate the use of machine learning in schizotypy research. To this end, we describe common data processing steps while commenting on best practices and procedures. First, we introduce the important role of schizotypy to motivate the importance of reliable classification, and summarize the existing machine learning literature on schizotypy. Then, we describe procedures for the extraction of features from fMRI data, including statistical parametric mapping, parcellation, complex network analysis, and decomposition methods, as well as classification, with a special focus on support vector classification and deep learning. We provide more detailed descriptions and software as supplementary material. Finally, we present current challenges in machine learning for the classification of schizotypy and comment on future trends and perspectives.
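
    As one concrete example of the pipeline steps named above (parcellation-style connectivity features followed by support vector classification), here is a hedged sketch using synthetic ROI time series and scikit-learn; it is illustrative only, not the authors' recommended pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_subjects, n_rois, n_timepoints = 60, 30, 120
labels = rng.integers(0, 2, n_subjects)  # e.g. high vs low schizotypy (random here)

features = []
for _ in range(n_subjects):
    ts = rng.normal(size=(n_timepoints, n_rois))         # ROI time series
    corr = np.corrcoef(ts.T)                             # connectivity matrix
    features.append(corr[np.triu_indices(n_rois, k=1)])  # vectorize upper triangle
X = np.array(features)

# With random labels this hovers near chance accuracy, as it should.
print(cross_val_score(SVC(kernel="linear"), X, labels, cv=5).mean())
```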

  8. Bayesian Kernel Methods for Non-Gaussian Distributions: Binary and Multi-class Classification Problems

    DTIC Science & Technology

    2013-05-28

    those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms. When one class occurs...incremental support vector machine algorithm for online learning when fewer than 50 data points are available. (a) Papers published in peer-reviewed journals...learning environments, where data processing occurs one observation at a time and the classification algorithm improves over time with new

  9. The Evaluation of Efficiency of the Use of Machine Working Time in the Industrial Company - Case Study

    NASA Astrophysics Data System (ADS)

    Kardas, Edyta; Brožova, Silvie; Pustějovská, Pavlína; Jursová, Simona

    2017-12-01

    This paper presents an evaluation of the efficiency of machine use in a selected production company. The OEE (Overall Equipment Effectiveness) method was used for the analysis. The selected company produces tapered roller bearings. The effectiveness analysis covered 17 automatic grinding lines working in the roller-grinding department. The low efficiency of the machines was driven by problems with the availability of machines and devices. The causes of machine downtime on these lines were also analyzed, and three main causes were identified: no kanban card, diamonding, and no operator. Ways to improve the use of these machines were suggested. The analysis takes into account actual results from the production process and covers a period of one calendar year.
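
    For reference, the OEE figure used in such analyses decomposes into availability, performance and quality; the sketch below shows the standard calculation with illustrative shift numbers, not the values from the studied grinding lines.

```python
# OEE = Availability x Performance x Quality, with illustrative numbers.
planned_time_min = 480.0   # one shift
downtime_min = 96.0        # e.g. no kanban card, diamonding, no operator
ideal_cycle_s = 12.0       # ideal seconds per unit
units_produced = 1500
units_good = 1440

availability = (planned_time_min - downtime_min) / planned_time_min
performance = (ideal_cycle_s * units_produced) / ((planned_time_min - downtime_min) * 60)
quality = units_good / units_produced
oee = availability * performance * quality
print(f"A = {availability:.1%}, P = {performance:.1%}, Q = {quality:.1%}, OEE = {oee:.1%}")
```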

  10. Evaluation of I-FIT results and machine variability using MnRoad test track mixtures.

    DOT National Transportation Integrated Search

    2017-06-01

    The Illinois Flexibility Index Test (I-FIT) was developed to distinguish between different mixtures in terms of cracking potential. Several machines were manufactured and are currently available to perform the I-FIT. This report presents the result...

  11. Shedding Light on Synergistic Chemical Genetic Connections with Machine Learning.

    PubMed

    Ekins, Sean; Siqueira-Neto, Jair Lage

    2015-12-23

    Machine learning can be used to predict compounds that act synergistically, and this could greatly expand the universe of potential treatments currently hidden in dark chemical matter. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. The effect of social demographic factors, snack consumption and vending machine use on oral health of children living in London.

    PubMed

    Maliderou, M; Reeves, S; Noble, C

    2006-10-07

    To investigate the effect of socio-economic status, sugar and snack consumption, and vending machine use on the prevalence and severity of caries (DMF) in children. An observational study was carried out in a dental practice in inner-city London. Sixty children were asked to complete a questionnaire and a three-day food and drink diary. After a dental examination, the number of decayed (D), missing (M) or filled (F) teeth provided a DMF score. ANOVA and Pearson's correlations were used to analyse the data statistically. Children from social groups I and II consumed significantly less (P < 0.05) sugar, confectionery and crisps, and used vending machines less often, than children from other social groups. Children from social groups I, II and III had significantly lower DMF scores. The average DMF for social group I children was 0.5 +/- 0.6, whilst group IV children had the greatest incidence, with a DMF of 4.6 +/- 0.8. Significant correlations were identified between DMF and sugar, confectionery and crisp consumption and vending machine use, and a negative correlation was found between DMF and vegetable consumption. Socio-economic status and access to vending machines were found to have a significant effect on sugar intake, food choices, and dental health. The removal of vending machines from schools, or at least the installation of 'healthy' vending machines, is recommended. Cost-effective health promotion programmes that account for social groups and snacking habits are required.

  13. An Electronic Cigarette Vaping Machine for the Characterization of Aerosol Delivery and Composition.

    PubMed

    Havel, Christopher M; Benowitz, Neal L; Jacob, Peyton; St Helen, Gideon

    2017-10-01

    Characterization of aerosols generated by electronic cigarettes (e-cigarettes) is one method used to evaluate the safety of e-cigarettes. While some researchers have modified smoking machines for e-cigarette aerosol generation, these machines are either not readily available, not automated for e-cigarette testing, or not adequately described. The objective of this study was to build an e-cigarette vaping machine that can be used to test, under standard conditions, e-liquid aerosolization and nicotine and toxicant delivery. The vaping machine was assembled from commercially available parts, including a puff controller, vacuum pump, power supply, switch to control current flow to the atomizer, three-way valve to direct air flow to the atomizer, and three gas dispersion tubes for aerosol trapping. To validate and illustrate its use, the variation in aerosol generation was assessed within and between KangerTech Mini ProTank 3 clearomizers, and the effects of voltage on aerosolization and toxic aldehyde generation were assessed. When using one ProTank 3 clearomizer with different e-liquid flavors, the coefficient of variation (CV) of the aerosol generated ranged between 11.5% and 19.3%. The variation in aerosol generated between ProTank 3 clearomizers with different e-liquid flavors and voltage settings ranged between 8.3% and 16.3% CV. Aerosol generation increased linearly from 3 to 6 V across e-liquids and clearomizer brands. Acetaldehyde, acrolein, and formaldehyde generation increased markedly at voltages at or above 5 V. The vaping machine that we describe reproducibly aerosolizes e-liquids from e-cigarette atomizers under controlled conditions and is useful for testing nicotine and toxicant delivery. This study describes an electronic cigarette vaping machine that was assembled from commercially available parts. The vaping machine can be replicated by researchers and used under standard conditions to generate e-cigarette aerosols and characterize nicotine and toxicant delivery. © The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
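
    The reproducibility metric quoted here is the ordinary coefficient of variation; a minimal sketch with invented aerosol masses follows.

```python
import numpy as np

aerosol_mg = np.array([38.2, 41.5, 36.9, 40.1, 39.4])  # hypothetical masses per puff block
cv = aerosol_mg.std(ddof=1) / aerosol_mg.mean() * 100  # sample CV, in percent
print(f"CV = {cv:.1f}%")
```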

  14. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    PubMed Central

    Dipnall, Joanna F.

    2016-01-01

    Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression to identify key biomarkers associated with depression in the National Health and Nutrition Examination Survey (2009–2010). Depression was measured using the Patient Health Questionnaire-9, and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and between total bilirubin and current smokers (p < 0.001). Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571

  15. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression to identify key biomarkers associated with depression in the National Health and Nutrition Examination Survey (2009-2010). Depression was measured using the Patient Health Questionnaire-9, and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and between total bilirubin and current smokers (p < 0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
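
    A compressed sketch of the three-step idea (impute, screen with a boosted model, then fit a conventional logistic regression on the survivors), assuming scikit-learn; the data are synthetic, and the study's survey-weighted multiple imputation across 20 datasets is not reproduced.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X_full = rng.normal(size=(400, 67))  # 67 candidate biomarkers
y = (X_full[:, 0] + 0.5 * X_full[:, 1] + rng.normal(0, 1, 400) > 0).astype(int)
X = X_full.copy()
X[rng.random(X.shape) < 0.1] = np.nan  # simulate missingness

X_imp = IterativeImputer(random_state=3).fit_transform(X)       # step 1: impute
gbm = GradientBoostingClassifier(random_state=3).fit(X_imp, y)  # step 2: screen
top = np.argsort(gbm.feature_importances_)[::-1][:3]            # top candidates
final = LogisticRegression().fit(X_imp[:, top], y)              # step 3: classical model
print("selected columns:", top, "coefficients:", final.coef_.round(2))
```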

  16. Wind utilization in remote regions: An economic study. [for comparison with diesel engines

    NASA Technical Reports Server (NTRS)

    Vansant, J. H.

    1973-01-01

    A wind-driven generator was considered as a supplement to a diesel group, for the purpose of economizing fuel when wind power is available. A specific location on Hudson Bay, Povognituk, was selected. Technical and economic data available for a wind machine of 10-kilowatt nominal capacity, together with available wind data for that region, were used for the study. After subtracting the yearly wind machine costs from the savings in fuel costs, a net savings of $1400 per year is realized. These values are approximate, but are thought to be highly conservative.

  17. ANN based Performance Evaluation of BDI for Condition Monitoring of Induction Motor Bearings

    NASA Astrophysics Data System (ADS)

    Patel, Raj Kumar; Giri, V. K.

    2017-06-01

    Bearings are among the most critical parts of rotating machines, and most failures arise from defective bearings. Bearing failure leads to machine failure and unpredicted productivity losses. Therefore, bearing fault detection and prognosis is an integral part of preventive maintenance procedures. In this paper, vibration signals for four conditions of a deep groove ball bearing, normal (N), inner race defect (IRD), ball defect (BD) and outer race defect (ORD), were acquired from a customized bearing test rig, under four different conditions and three different fault sizes. Two approaches were adopted for statistical feature extraction from the vibration signal: in the first approach, the raw signal is used for statistical feature extraction, and in the second, statistical features are extracted based on a bearing damage index (BDI). The proposed BDI technique uses wavelet packet node energy coefficient analysis. Both feature sets are used as inputs to an ANN classifier to evaluate its performance. A comparison of ANN performance is made based on raw vibration data and data chosen using the BDI. The ANN performance was found to be considerably higher when BDI-based signals were used as inputs to the classifier.
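
    As a sketch of the wavelet packet node-energy features underlying the BDI, assuming the PyWavelets package is available; the vibration signal is synthetic, and the BDI formula itself is not reproduced here.

```python
import numpy as np
import pywt

fs = 12_000                                   # sampling rate (Hz)
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 160 * t) + 0.3 * rng.normal(size=t.size)  # fake vibration

wp = pywt.WaveletPacket(signal, wavelet="db4", maxlevel=3)
nodes = wp.get_level(3, "natural")            # 8 terminal nodes at level 3
energies = np.array([np.sum(node.data ** 2) for node in nodes])
features = energies / energies.sum()          # normalized node-energy feature vector
print(features.round(3))
```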

  18. Machine Translation: The Alternative for the 21st Century?

    ERIC Educational Resources Information Center

    Cribb, V. Michael

    2000-01-01

    Outlines a scenario for the future of Teaching English as a Second or Other Language that has seldom, if ever, been considered in academic discussion: that advances in, and the availability of, quality machine translation could mitigate the need for English language learning. (Author/VWL)

  19. Big data integration for regional hydrostratigraphic mapping

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.

    2013-12-01

    Numerical models provide a way to evaluate groundwater systems, but determining the hydrostratigraphic units (HSUs) used in devising these models remains subjective, nonunique, and uncertain. A novel geophysical-hydrogeologic data integration scheme is proposed to constrain the estimation of continuous HSUs. First, machine-learning and multivariate statistical techniques are used to simultaneously integrate borehole hydrogeologic (lithology, hydraulic conductivity, aqueous field parameters, dissolved constituents) and geophysical (gamma, spontaneous potential, and resistivity) measurements. Second, airborne electromagnetic measurements are numerically inverted to obtain subsurface resistivity structure at randomly selected locations. Third, the machine-learning algorithm is trained using the borehole hydrostratigraphic units and inverted airborne resistivity profiles. The trained algorithm is then used to estimate HSUs at independent resistivity profile locations. We demonstrate the efficacy of the proposed approach by mapping the hydrostratigraphy of a heterogeneous surficial aquifer in northwestern Nebraska.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yale, S H

    A survey was conducted of x-ray facilities in 2000 dental offices under actual operating conditions. Each of 10 dental schools in the United States collected data on 200 local dental offices to permit geographic analysis of the status of radiation hygiene in the offices. The data provided records of the roentgen (r) output of each machine, the relative r dose to the patient, and the dose to the operator. In addition, specific information relating to both operator and machine was collected and evaluated. Some dentists were found to be operating under unsafe conditions, but the average dentist covered in the survey was statistically safe. On the basis of the survey, it was concluded that the problem of radiation hazards in dentistry will be resolved when all dental x-ray machines are properly filtered and collimated and high-speed dental x-ray film is used. (P.C.H.)

  1. A new Nawaz-Enscore-Ham-based heuristic for permutation flow-shop problems with bicriteria of makespan and machine idle time

    NASA Astrophysics Data System (ADS)

    Liu, Weibo; Jin, Yan; Price, Mark

    2016-10-01

    A new heuristic based on the Nawaz-Enscore-Ham algorithm is proposed in this article for solving the permutation flow-shop scheduling problem. A new priority rule is proposed that accounts for the average, mean absolute deviation, skewness and kurtosis of processing times, in order to fully describe the shape of their distribution. A new tie-breaking rule is also introduced to achieve effective job insertion with the objective of minimizing both makespan and machine idle time. Statistical tests demonstrate better solution quality for the proposed algorithm compared with existing benchmark heuristics.
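
    For context, the classic NEH heuristic that this work builds on sorts jobs by decreasing total processing time and inserts each job at the makespan-minimizing position of the partial sequence; a minimal makespan-only sketch follows (the paper's new priority and tie-breaking rules are not reproduced).

```python
def makespan(seq, p):
    """Makespan of job sequence seq; p[j][m] = time of job j on machine m."""
    completion = [0.0] * len(p[0])
    for j in seq:
        completion[0] += p[j][0]
        for m in range(1, len(completion)):
            completion[m] = max(completion[m], completion[m - 1]) + p[j][m]
    return completion[-1]

def neh(p):
    jobs = sorted(range(len(p)), key=lambda j: -sum(p[j]))  # total time, descending
    seq = [jobs[0]]
    for j in jobs[1:]:
        # Insert j at the position that minimizes the partial makespan.
        seq = min((seq[:k] + [j] + seq[k:] for k in range(len(seq) + 1)),
                  key=lambda s: makespan(s, p))
    return seq

p = [[5, 9, 8], [9, 3, 10], [9, 4, 5], [4, 8, 8]]  # 4 jobs x 3 machines (arbitrary)
order = neh(p)
print(order, makespan(order, p))
```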

  2. Optimisation of GaN LEDs and the reduction of efficiency droop using active machine learning

    DOE PAGES

    Rouet-Leduc, Bertrand; Barros, Kipton Marcos; Lookman, Turab; ...

    2016-04-26

    A fundamental challenge in the design of LEDs is to maximise electro-luminescence efficiency at high current densities. We simulate GaN-based LED structures that delay the onset of efficiency droop by spreading carrier concentrations evenly across the active region. Statistical analysis and machine learning effectively guide the selection of the next LED structure to be examined based upon its expected efficiency as well as model uncertainty. This active learning strategy rapidly constructs a model that predicts Poisson-Schrödinger simulations of devices, and that simultaneously produces structures with higher simulated efficiencies.
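
    A toy sketch of an active-learning loop in this spirit: a Gaussian process surrogate plus an acquisition that balances expected efficiency and model uncertainty, with a simple analytic function standing in for the Poisson-Schrödinger simulator.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def simulate_efficiency(x):
    """Toy stand-in for the device simulator; peak efficiency near x = 0.6."""
    return np.exp(-((x - 0.6) ** 2) / 0.05)

candidates = np.linspace(0, 1, 200).reshape(-1, 1)  # one design parameter
X = candidates[[10, 100, 190]]                      # a few initial designs
y = simulate_efficiency(X).ravel()

gp = GaussianProcessRegressor()
for _ in range(10):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(mu + 1.96 * sigma)]  # expected value + uncertainty
    X = np.vstack([X, nxt])
    y = np.append(y, simulate_efficiency(nxt))
print(f"best design found: x = {X[np.argmax(y)][0]:.3f}, efficiency = {y.max():.3f}")
```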

  3. Buying a Laser - Tips and Pearls

    PubMed Central

    Aurangabadkar, Sanjeev J; Mysore, Venkataram; Ahmed, E Suhail

    2014-01-01

    Lasers and aesthetic procedures have transformed dermatology practice. They have aided in the treatment of hitherto untreatable conditions and allowed better financial remuneration to the physician. The availability of a variety of laser devices of different makes, specifications and pricing has led to confusion and dilemma in the mind of the buying physician. There are presently no guidelines available for buying a laser. Since the purchase of a laser involves a large investment, careful consideration of laser specifications, training, costing, warranty, availability of spares, and reliability of service are important prerequisites. This article describes the various factors that need to be considered and also attempts to lay down criteria to be assessed while buying a laser system, which will be useful to physicians before investing in a laser machine. Practice points: Meticulous planning of the type of machine, specifications, financial aspects, maintenance and warranties is important. It is wise to sign a contract or agreement between the buyer and seller before purchase of a laser which covers key aspects of installation, after-sales service and maintenance of the machine. Adequate training is essential; understanding laser physics and laser-tissue interaction goes a long way in getting the best out of the machine. The credibility of the dealer and company should be ascertained in order to be assured of after-sales service. Buying used machines and sharing equipment to offset high initial investments are good options, but even more care is required to ensure proper functioning and maintenance. PMID:25136218

  4. Buying a laser - tips and pearls.

    PubMed

    Aurangabadkar, Sanjeev J; Mysore, Venkataram; Ahmed, E Suhail

    2014-04-01

    Lasers and aesthetic procedures have transformed dermatology practice. They have aided in the treatment of hitherto untreatable conditions and allowed better financial remuneration to the physician. The availability of a variety of laser devices of different makes, specifications and pricing has led to confusion and dilemma in the mind of the buying physician. There are presently no guidelines available for buying a laser. Since the purchase of a laser involves a large investment, careful consideration of laser specifications, training, costing, warranty, availability of spares, and reliability of service are important prerequisites. This article describes the various factors that need to be considered and also attempts to lay down criteria to be assessed while buying a laser system, which will be useful to physicians before investing in a laser machine. Meticulous planning of the type of machine, specifications, financial aspects, maintenance and warranties is important. It is wise to sign a contract or agreement between the buyer and seller before purchase of a laser which covers key aspects of installation, after-sales service and maintenance of the machine. Adequate training is essential; understanding laser physics and laser-tissue interaction goes a long way in getting the best out of the machine. The credibility of the dealer and company should be ascertained in order to be assured of after-sales service. Buying used machines and sharing equipment to offset high initial investments are good options, but even more care is required to ensure proper functioning and maintenance.

  5. A Nano-Thin Film-Based Prototype QCM Sensor Array for Monitoring Human Breath and Respiratory Patterns.

    PubMed

    Selyanchyn, Roman; Wakamatsu, Shunichi; Hayashi, Kenshi; Lee, Seung-Woo

    2015-07-31

    A quartz crystal microbalance (QCM) sensor array was developed for multi-purpose human respiration assessment. The sensor system was designed to provide feedback on human respiration. Thorough optimization of the measurement conditions (air flow, temperature in the QCM chamber, frequency measurement rate, and electrode position relative to the gas flow) was performed. As shown, acquisition of respiratory parameters (rate and respiratory pattern) could be achieved even with a single electrode. The prototype system contains eight available QCM channels that can potentially be used for selective responses to certain breath chemicals. At present, the prototype machine is ready for the assessment of respiratory function in larger populations in order to gain statistical validation. To the best of our knowledge, the developed prototype is the only respiratory assessment system based on surface-modified QCM sensors.

  6. The Highly Adaptive Lasso Estimator

    PubMed Central

    Benkeser, David; van der Laan, Mark

    2017-01-01

    Estimation of regression functions is a common goal of statistical learning. We propose a novel nonparametric regression estimator that, in contrast to many existing methods, does not rely on local smoothness assumptions, nor is it constructed using local smoothing techniques. Instead, our estimator respects global smoothness constraints by virtue of falling in a class of right-continuous functions with left-hand limits whose variation norm is bounded by a constant. Using empirical process theory, we establish a fast minimal rate of convergence of our proposed estimator and illustrate how such an estimator can be constructed using standard software. In simulations, we show that the finite-sample performance of our estimator is competitive with other popular machine learning techniques across a variety of data-generating mechanisms. We also illustrate competitive performance in real data examples using several publicly available data sets. PMID:29094111

  7. Structure and Randomness of Continuous-Time, Discrete-Event Processes

    NASA Astrophysics Data System (ADS)

    Marzen, Sarah E.; Crutchfield, James P.

    2017-10-01

    Loosely speaking, the Shannon entropy rate gauges a stochastic process' intrinsic randomness, while the statistical complexity gives the cost of predicting the process. We calculate, for the first time, the entropy rate and statistical complexity of stochastic processes generated by finite unifilar hidden semi-Markov models, which are memoryful, state-dependent versions of renewal processes. Calculating these quantities requires introducing novel mathematical objects (ε-machines of hidden semi-Markov processes) and new information-theoretic methods for stochastic processes.
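
    For orientation, the standard discrete-time definitions of the two quantities are shown below, where $\mathcal{S}$ denotes the set of causal states of the process' ε-machine; the paper's contribution is extending these to continuous-time semi-Markov processes.

```latex
% Standard discrete-time definitions of entropy rate and statistical complexity.
h_\mu = \lim_{L \to \infty} \frac{H[X_1, \dots, X_L]}{L},
\qquad
C_\mu = H[\mathcal{S}] = -\sum_{\sigma \in \mathcal{S}} \Pr(\sigma)\, \log_2 \Pr(\sigma).
```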

  8. Description and texts for the auxiliary programs for processing video information on the YeS computer. Part 3: Test program 2

    NASA Technical Reports Server (NTRS)

    Borisenko, V. I., G.g.; Stetsenko, Z. A.

    1980-01-01

    The functions are described, and the operating instructions, the block diagram, and the proposed versions for modifying the program to obtain the statistical characteristics of multi-channel video information are given. The program implements certain man-machine methods for investigating video information. It permits representation of the material and its statistical characteristics in a form which is convenient for the user.

  9. Early experiences in developing and managing the neuroscience gateway.

    PubMed

    Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T

    2015-02-01

    The last few decades have seen the emergence of computational neuroscience as a mature field, where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and the associated cyberinfrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are also implemented for parallel computers and are suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge, because of issues with acquiring computer time on machines located at national supercomputer centers, with these machines' complex user interfaces, and with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers, and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares early experiences in bringing up this gateway, and describes the software architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research using high performance computing at the back end. We also look at the parallel scaling of some publicly available neuronal models and analyze recent usage data of the neuroscience gateway.

  10. Early experiences in developing and managing the neuroscience gateway

    PubMed Central

    Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T.

    2015-01-01

    The last few decades have seen the emergence of computational neuroscience as a mature field, where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and the associated cyberinfrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are also implemented for parallel computers and are suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge, because of issues with acquiring computer time on machines located at national supercomputer centers, with these machines' complex user interfaces, and with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers, and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares early experiences in bringing up this gateway, and describes the software architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research using high performance computing at the back end. We also look at the parallel scaling of some publicly available neuronal models and analyze recent usage data of the neuroscience gateway. PMID:26523124

  11. Evaluating Statistical Process Control (SPC) techniques and computing the uncertainty of force calibrations

    NASA Technical Reports Server (NTRS)

    Navard, Sharon E.

    1989-01-01

    In recent years there has been a push within NASA to use statistical techniques to improve the quality of production. Two areas where statistics are used are in establishing product and process quality control of flight hardware and in evaluating the uncertainty of instrument calibration. The Flight Systems Quality Engineering branch is responsible for developing and assuring the quality of all flight hardware; the statistical process control methods employed are reviewed and evaluated. The Measurement Standards and Calibration Laboratory performs the calibration of all instruments used on-site at JSC, as well as those used by all off-site contractors. These calibrations must be performed in such a way as to be traceable to national standards maintained by the National Institute of Standards and Technology, and they must meet a four-to-one ratio of instrument specification to calibrating-standard uncertainty. In some instances this ratio is not met, and in these cases it is desirable to compute the exact uncertainty of the calibration and determine ways of reducing it. A particular example where this problem is encountered is a machine which performs automatic force calibrations. The process of force calibration using the United Force Machine is described in detail. The sources of error are identified and quantified where possible, and suggestions for improvement are made.
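
    To illustrate the four-to-one requirement, the sketch below combines uncertainty components by root-sum-square and checks the resulting ratio; the component values are illustrative only, not those of the United Force Machine.

```python
import math

# Illustrative uncertainty components as fractions of reading (not real values).
components = {"standard": 0.010, "repeatability": 0.006, "resolution": 0.003}
u_combined = math.sqrt(sum(u ** 2 for u in components.values()))  # root-sum-square

instrument_spec = 0.05  # instrument tolerance, same units
tur = instrument_spec / u_combined
print(f"combined uncertainty = {u_combined:.4f}, ratio = {tur:.1f}:1")
print("meets 4:1" if tur >= 4 else "ratio not met: compute exact uncertainty")
```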

  12. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    PubMed

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2017-06-14

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression. While useful and robust, these statistical methods are limited to a small number of predictors, which operate in the same way on everyone and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for the development of risk prediction models. Typically presented as black-box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis and are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider predicting mortality after a diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through the different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods, including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.

  13. Do capuchin monkeys (Cebus apella) diagnose causal relations in the absence of a direct reward?

    PubMed

    Edwards, Brian J; Rottman, Benjamin M; Shankar, Maya; Betzler, Riana; Chituc, Vladimir; Rodriguez, Ricardo; Silva, Liara; Wibecan, Leah; Widness, Jane; Santos, Laurie R

    2014-01-01

    We adapted a method from developmental psychology to explore whether capuchin monkeys (Cebus apella) would place objects on a "blicket detector" machine to diagnose causal relations in the absence of a direct reward. Across five experiments, monkeys could place different objects on the machine and obtain evidence about the objects' causal properties based on whether each object "activated" the machine. In Experiments 1-3, monkeys received both audiovisual cues and a food reward whenever the machine activated. In these experiments, monkeys spontaneously placed objects on the machine and succeeded at discriminating various patterns of statistical evidence. In Experiments 4 and 5, we modified the procedure so that in the learning trials, monkeys received the audiovisual cues when the machine activated, but did not receive a food reward. In these experiments, monkeys failed to test novel objects in the absence of an immediate food reward, even when doing so could provide critical information about how to obtain a reward in future test trials in which the food reward delivery device was reattached. The present studies suggest that the gap between human and animal causal cognition may be in part a gap of motivation. Specifically, we propose that monkey causal learning is motivated by the desire to obtain a direct reward, and that unlike humans, monkeys do not engage in learning for learning's sake.

  14. Do Capuchin Monkeys (Cebus apella) Diagnose Causal Relations in the Absence of a Direct Reward?

    PubMed Central

    Edwards, Brian J.; Rottman, Benjamin M.; Shankar, Maya; Betzler, Riana; Chituc, Vladimir; Rodriguez, Ricardo; Silva, Liara; Wibecan, Leah; Widness, Jane; Santos, Laurie R.

    2014-01-01

    We adapted a method from developmental psychology [1] to explore whether capuchin monkeys (Cebus apella) would place objects on a “blicket detector” machine to diagnose causal relations in the absence of a direct reward. Across five experiments, monkeys could place different objects on the machine and obtain evidence about the objects’ causal properties based on whether each object “activated” the machine. In Experiments 1–3, monkeys received both audiovisual cues and a food reward whenever the machine activated. In these experiments, monkeys spontaneously placed objects on the machine and succeeded at discriminating various patterns of statistical evidence. In Experiments 4 and 5, we modified the procedure so that in the learning trials, monkeys received the audiovisual cues when the machine activated, but did not receive a food reward. In these experiments, monkeys failed to test novel objects in the absence of an immediate food reward, even when doing so could provide critical information about how to obtain a reward in future test trials in which the food reward delivery device was reattached. The present studies suggest that the gap between human and animal causal cognition may be in part a gap of motivation. Specifically, we propose that monkey causal learning is motivated by the desire to obtain a direct reward, and that unlike humans, monkeys do not engage in learning for learning’s sake. PMID:24586347

  15. Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques.

    PubMed

    Wang, Guanjin; Lam, Kin-Man; Deng, Zhaohong; Choi, Kup-Sze

    2015-08-01

    Bladder cancer is a common genitourinary malignancy. For muscle-invasive bladder cancer, surgical removal of the bladder, i.e. radical cystectomy, is in general the definitive treatment, which unfortunately carries significant morbidity and mortality. Accurate prediction of the mortality of radical cystectomy is therefore needed. Statistical methods have conventionally been used for this purpose, despite the complex interactions of high-dimensional medical data. Machine learning has emerged as a promising technique for handling high-dimensional data, with increasing application in clinical decision support, e.g. cancer prediction and prognosis. Its ability to reveal hidden nonlinear interactions and interpretable rules between dependent and independent variables makes it well suited for constructing models with good generalization performance. In this paper, seven machine learning methods are utilized to predict the 5-year mortality of radical cystectomy, including the back-propagation neural network (BPN), radial basis function network (RBFN), extreme learning machine (ELM), regularized ELM (RELM), support vector machine (SVM), naive Bayes (NB) classifier and k-nearest neighbour (KNN), on a clinicopathological dataset of 117 patients from the urology unit of a hospital in Hong Kong. The experimental results indicate that RELM achieved the highest average prediction accuracy of 0.8 at a fast learning speed. The research findings demonstrate the potential of applying machine learning techniques to support clinical decision making. Copyright © 2015 Elsevier Ltd. All rights reserved.
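
    A minimal sketch of a regularized extreme learning machine of the kind that performed best here: a random, untrained hidden layer followed by a ridge-regularized linear readout solved in closed form. The data are synthetic placeholders, not the clinicopathological dataset.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(117, 12))  # placeholder clinicopathological features
y = (X[:, 0] + rng.normal(0, 1, 117) > 0).astype(float)  # placeholder mortality label

n_hidden, lam = 50, 1.0
W = rng.normal(size=(X.shape[1], n_hidden))  # random, untrained input weights
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                       # hidden-layer activations

# Ridge-regularized least-squares readout, solved in closed form.
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

pred = (H @ beta > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```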

  16. Adaptive hidden Markov model-based online learning framework for bearing faulty detection and performance degradation monitoring

    NASA Astrophysics Data System (ADS)

    Yu, Jianbo

    2017-01-01

    This study proposes an adaptive-learning-based method for machine fault detection and health degradation monitoring. The kernel of the proposed method is an "evolving" model that uses an unsupervised online learning scheme, in which an adaptive hidden Markov model (AHMM) is used to learn online the dynamic health changes of machines over their full life. A statistical index is developed for recognizing new health states in the machines; these new health states are then described online by adding new hidden states to the AHMM. Furthermore, health degradation in machines is quantified online by an AHMM-based health index (HI) that measures the similarity between two density distributions describing the historic and current health states, respectively. When necessary, the proposed method characterizes the distinct operating modes of the machine and can learn online both abrupt and gradual health changes. Our method overcomes some drawbacks of HIs based on fixed monitoring models constructed in the offline phase (e.g., relatively low comprehensibility and applicability). Results from its application in a bearing life test reveal that the proposed method is effective in the online detection and adaptive assessment of machine health degradation. This study provides a useful guide for developing a condition-based maintenance (CBM) system that uses an online learning method without considerable human intervention.

  17. Label-free sensor for automatic identification of erythrocytes using digital in-line holographic microscopy and machine learning.

    PubMed

    Go, Taesik; Byeon, Hyeokjun; Lee, Sang Joon

    2018-04-30

    The cell types of erythrocytes should be identified because they are closely related to functionality and viability. Conventional methods for classifying erythrocytes are time-consuming and labor-intensive. Therefore, an automatic and accurate erythrocyte classification system is indispensable in the healthcare and biomedical fields. In this study, we proposed a new label-free sensor for automatic identification of erythrocyte cell types using digital in-line holographic microscopy (DIHM) combined with machine learning algorithms. A total of 12 features, including information on intensity distributions, morphological descriptors, and optical focusing characteristics, is quantitatively obtained from numerically reconstructed holographic images. All individual features for discocytes, echinocytes, and spherocytes are statistically different. To improve the performance of cell type identification, we adopted several machine learning algorithms, such as the decision tree model, support vector machine, linear discriminant classification, and k-nearest neighbor classification. With the aid of these machine learning algorithms, the extracted features are effectively utilized to distinguish erythrocytes. Among the four tested algorithms, the decision tree model exhibits the best identification performance for the training sets (n = 440, 98.18%) and test sets (n = 190, 97.37%). This proposed methodology, which smartly combines DIHM and machine learning, would be helpful for sensing abnormal erythrocytes and for computer-aided diagnosis of hematological diseases in the clinic. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. On Docking, Scoring and Assessing Protein-DNA Complexes in a Rigid-Body Framework

    PubMed Central

    Parisien, Marc; Freed, Karl F.; Sosnick, Tobin R.

    2012-01-01

    We consider the identification of interacting protein-nucleic acid partners using the rigid-body docking method FTdock, which is systematic and exhaustive in its exploration of docking conformations. The accuracy of rigid-body docking methods is tested using known protein-DNA complexes for which both the docked and undocked structures are available. Additional tests with large decoy sets probe the efficacy of two published statistically derived scoring functions that contain a huge number of parameters. In contrast, we demonstrate that state-of-the-art machine learning techniques can enormously reduce the number of parameters required, thereby identifying the relevant docking features using a minuscule fraction of the number of parameters in the prior works. The present machine learning study considers a 300-dimensional vector (dependent on only 15 parameters), termed the Chemical Context Profile (CCP), where each dimension reflects a specific type of protein amino acid-nucleic acid base interaction. The CCP is designed to capture the chemical complementarity of the interface and is well suited for machine learning techniques. Our objective function is the Chemical Context Discrepancy (CCD), which is defined as the angle between the native system's CCP vector and the decoy's vector and which serves as a substitute for the more commonly used root mean squared deviation (RMSD). We demonstrate that the CCP provides a useful scoring function when certain dimensions are properly weighted. Finally, we explore how the amino acids on a protein's surface can help guide DNA binding, first through long-range interactions and then through direct contacts, according to specific preferences for either the major or minor groove of the DNA. PMID:22393431
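
    The CCD itself is simply the angle between two CCP vectors; a minimal sketch with random placeholder vectors follows.

```python
import numpy as np

rng = np.random.default_rng(5)
ccp_native = rng.random(300)                    # placeholder native CCP vector
ccp_decoy = ccp_native + 0.2 * rng.random(300)  # a near-native decoy

cos = ccp_native @ ccp_decoy / (np.linalg.norm(ccp_native) * np.linalg.norm(ccp_decoy))
ccd_degrees = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
print(f"CCD = {ccd_degrees:.2f} degrees")
```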

  19. A Machine Learning Approach to Predicted Bathymetry

    NASA Astrophysics Data System (ADS)

    Wood, W. T.; Elmore, P. A.; Petry, F.

    2017-12-01

    Recent and on-going efforts have shown how machine learning (ML) techniques, incorporating more, and more disparate, data than can be interpreted manually, can predict seafloor properties, with uncertainty, where they have not been measured directly. We examine here a ML approach to predicted bathymetry. Our approach employs a paradigm of global bathymetry as an integral component of global geology. From a marine geology and geophysics perspective, the bathymetry is the thickness of one layer in an ensemble of layers that inter-relate to varying extents vertically and geospatially. The nature of the multidimensional relationships in these layers between bathymetry, gravity, magnetic field, age, and many other global measures is typically geospatially dependent and non-linear. The advantage of using ML is that these relationships need not be stated explicitly, nor do they need to be approximated with a transfer function; the machine learns them via the data. Fundamentally, ML operates by brute-force searching for multidimensional correlations between desired, but sparsely known, data values (in this case water depth) and a multitude of (geologic) predictors. Predictors include quantities known extensively, such as remotely sensed measurements (e.g., gravity and magnetics), distance from spreading ridge, trench, etc., and spatial statistics based on these quantities. Estimating bathymetry from an approximate transfer function is inherently model-limited as well as data-limited, because complex relationships are explicitly ruled out. ML is a purely data-driven approach, so only the extent and quality of the available observations limit prediction accuracy. This allows for a system in which new data, of a wide variety of types, can be quickly and easily assimilated into updated bathymetry predictions with quantitative posterior uncertainties.
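
    A minimal sketch of the data-driven idea, assuming a random forest as the learner: depth is regressed directly on geologic predictors without an explicit transfer function, and the spread across trees gives a crude uncertainty. The predictor names and the synthetic depth relation are illustrative assumptions, not the study's setup.

    ```python
    # Learn depth from geologic predictors; per-tree spread approximates uncertainty.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(2)
    n = 5000
    gravity = rng.normal(size=n)            # free-air gravity anomaly (stand-in)
    magnetics = rng.normal(size=n)          # magnetic anomaly (stand-in)
    ridge_dist = rng.uniform(0, 3000, n)    # distance from spreading ridge, km
    X = np.column_stack([gravity, magnetics, ridge_dist])
    # Synthetic "truth" loosely mimicking depth increasing away from a ridge.
    depth = -2500 - 350 * np.sqrt(ridge_dist / 100) + 200 * gravity + rng.normal(0, 50, n)

    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, depth)
    preds = np.stack([t.predict(X[:5]) for t in model.estimators_])
    print(preds.mean(axis=0))   # predicted depths
    print(preds.std(axis=0))    # crude per-point uncertainty
    ```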

  20. Partitioned learning of deep Boltzmann machines for SNP data.

    PubMed

    Hess, Moritz; Lenz, Stefan; Blätte, Tamara J; Bullinger, Lars; Binder, Harald

    2017-10-15

    Learning the joint distributions of measurements, and in particular identification of an appropriate low-dimensional manifold, has been found to be a powerful ingredient of deep learning approaches. Yet, such approaches have hardly been applied to single nucleotide polymorphism (SNP) data, probably due to the high number of features typically exceeding the number of studied individuals. After a brief overview of how deep Boltzmann machines (DBMs), a deep learning approach, can be adapted to SNP data in principle, we specifically present a way to alleviate the dimensionality problem by partitioned learning. We propose a sparse regression approach to coarsely screen the joint distribution of SNPs, followed by training several DBMs on SNP partitions that were identified by the screening. Aggregate features representing SNP patterns and the corresponding SNPs are extracted from the DBMs by a combination of statistical tests and sparse regression. In simulated case-control data, we show how this can uncover complex SNP patterns and augment results from univariate approaches, while maintaining type 1 error control. Time-to-event endpoints are considered in an application with acute myeloid leukemia patients, where SNP patterns are modeled after a pre-screening based on gene expression data. The proposed approach identified three SNPs that seem to jointly influence survival in a validation dataset. This indicates the added value of jointly investigating SNPs compared to standard univariate analyses and makes partitioned learning of DBMs an interesting complementary approach when analyzing SNP data. A Julia package is provided at 'http://github.com/binderh/BoltzmannMachines.jl'. binderh@imbi.uni-freiburg.de. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
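
    A hedged sketch of the partitioning idea only: an L1-penalized regression coarsely screens SNPs, and each retained partition is then modeled jointly. A single restricted Boltzmann machine per partition stands in here for the paper's deep Boltzmann machines, whose reference implementation is the authors' Julia package; all data and sizes are synthetic.

    ```python
    # Screen SNPs with sparse regression, then fit one generative model per partition.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import BernoulliRBM

    rng = np.random.default_rng(3)
    n, p = 200, 1000                                     # individuals << SNPs, as is typical
    X = rng.integers(0, 2, size=(n, p)).astype(float)    # binarized SNP matrix
    y = rng.integers(0, 2, size=n)                       # case/control status

    screen = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    kept = np.flatnonzero(screen.coef_[0])               # SNPs surviving the coarse screen
    partitions = np.array_split(kept, max(1, len(kept) // 20))
    models = [BernoulliRBM(n_components=5, random_state=0).fit(X[:, part])
              for part in partitions if len(part) > 0]
    print(f"{len(kept)} SNPs kept, {len(models)} partition models trained")
    ```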

  1. Robust crop and weed segmentation under uncontrolled outdoor illumination

    USDA-ARS?s Scientific Manuscript database

    A new machine vision algorithm for weed detection was developed from RGB color images. Processing steps in the detection algorithm included excess green conversion, threshold value computation by statistical analysis, adaptive image segmentation by adjusting the threshold value, median filtering, ...
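
    The excess-green step named above is a standard vegetation index; the sketch below computes it with a fixed threshold as a placeholder for the record's statistically computed, adaptively adjusted one.

    ```python
    # Excess green index (ExG = 2g - r - b on chromatic coordinates) for vegetation masking.
    import numpy as np

    def excess_green_mask(rgb: np.ndarray, threshold: float = 0.1) -> np.ndarray:
        """rgb: HxWx3 float array in [0, 1]. Returns a boolean vegetation mask."""
        total = rgb.sum(axis=2) + 1e-8
        r, g, b = (rgb[..., i] / total for i in range(3))   # chromatic coordinates
        exg = 2.0 * g - r - b                               # excess green index
        return exg > threshold

    image = np.random.default_rng(4).random((120, 160, 3))  # stand-in for a field image
    mask = excess_green_mask(image)
    print(f"vegetation pixels: {mask.mean():.1%}")
    ```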

  2. On the Application of Syntactic Methodologies in Automatic Text Analysis.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1990-01-01

    Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…

  3. Multitasking runtime systems for the Cedar Multiprocessor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guzzi, M.D.

    1986-07-01

    The programming of a MIMD machine is more complex than for SISD and SIMD machines. The multiple computational resources of the machine must be made available to the programming language compiler and to the programmer so that multitasking programs may be written. This thesis will explore the additional complexity of programming a MIMD machine, the Cedar Multiprocessor specifically, and the multitasking runtime system necessary to provide multitasking resources to the user. First, the problem will be well defined: the Cedar machine, its operating system, the programming language, and multitasking concepts will be described. Second, a solution to the problem, called macrotasking, will be proposed. This solution provides multitasking facilities to the programmer at a very coarse level with many visible machine dependencies. Third, an alternate solution, called microtasking, will be proposed. This solution provides multitasking facilities of a much finer grain. This solution does not depend so rigidly on the specific architecture of the machine. Finally, the two solutions will be compared for effectiveness. 12 refs., 16 figs.

  4. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling.

    PubMed

    Mansouri, K; Grulke, C M; Richard, A M; Judson, R S; Williams, A J

    2016-11-01

    The increasing availability of large collections of chemical structures and associated experimental data provides an opportunity to build robust QSAR models for applications in different fields. One common concern is the quality of both the chemical structure information and the associated experimental data. Here we describe the development of an automated KNIME workflow to curate and correct errors in the structure and identity of chemicals using the publicly available PHYSPROP physicochemical properties and environmental fate datasets. The workflow first assembles structure-identity pairs using up to four provided chemical identifiers, including chemical name, CASRN, SMILES, and MolBlock. Problems detected included errors and mismatches in chemical structure formats and identifiers, and various structure validation issues, including hypervalency and stereochemistry descriptions. Subsequently, a machine learning procedure was applied to evaluate the impact of this curation process. The performance of QSAR models built on only the highest-quality subset of the original dataset was compared with that of models built on the larger curated and corrected dataset; the latter showed statistically improved predictive performance. The final workflow was used to curate the full list of PHYSPROP datasets, and is being made publicly available for further usage and integration by the scientific community.
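
    The published pipeline is a KNIME workflow; as a hedged illustration of one structure-validation check it performs, the sketch below uses RDKit to flag SMILES that fail to parse and to canonicalize the rest so duplicate structure-identity pairs can be detected. The records are invented examples.

    ```python
    # Flag unparseable structures and canonicalize valid ones for deduplication.
    from rdkit import Chem

    records = [
        ("50-00-0", "C=O"),     # formaldehyde: valid
        ("64-17-5", "CCO"),     # ethanol: valid
        ("bad-001", "C1CC"),    # unclosed ring: should fail validation
    ]
    for casrn, smiles in records:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            print(f"{casrn}: structure error, needs manual curation")
        else:
            print(f"{casrn}: canonical SMILES {Chem.MolToSmiles(mol)}")
    ```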

  5. Effectiveness of Direct Safety Regulations on Manufacturers and Users of Industrial Machines: Its Implications on Industrial Safety Policies in Republic of Korea.

    PubMed

    Choi, Gi Heung

    2017-03-01

    Despite considerable efforts made in recent years, the industrial accident rate and the fatality rate in the Republic of Korea are much higher than those in most developed countries in Europe and North America. Industrial safety policies and safety regulations are also known to be ineffective and inefficient in some cases. This study focuses on the quantitative evaluation of the effectiveness of direct safety regulations, such as safety certification, self-declaration of conformity, and safety inspection of industrial machines, in the Republic of Korea. Implications for safety policies to restructure the industrial safety system associated with industrial machines are also explored. Analysis of the causes of industrial accidents associated with industrial machines confirms that technical causes need to be resolved to reduce both the frequency and the severity of such accidents. Statistical analysis also confirms that the indirect effects of safety device regulation on users are limited for a variety of reasons. Safety device regulation needs to be shifted to complement safety certification and self-declaration of conformity for more balanced direct regulation of manufacturers and users. An example of cost-benefit analysis on conveyors justifies such a transition. Industrial safety policies and regulations associated with industrial machines must be directed towards eliminating the sources of danger at the stage of danger creation, thereby securing safe industrial machines. Safety inspection further secures the safety of workers at the stage of danger use. The overall balance between such safety regulations is achieved by proper distribution of the industrial machines subject to such regulations and the intensity of each regulation. Rearrangement of the industrial machines subject to safety certification and self-declaration of conformity, to include more movable industrial machines and other industrial machines with a high level of danger, is also suggested.

  6. Power training using pneumatic machines vs. plate-loaded machines to improve muscle power in older adults.

    PubMed

    Balachandran, Anoop T; Gandia, Kristine; Jacobs, Kevin A; Streiner, David L; Eltoukhy, Moataz; Signorile, Joseph F

    2017-11-01

    Power training has been shown to be more effective than conventional resistance training for improving physical function in older adults; however, most trials have used pneumatic machines during training. Considering that the general public typically has access to plate-loaded machines, the effectiveness and safety of power training using plate-loaded machines compared to pneumatic machines is an important consideration. The purpose of this investigation was to compare the effects of high-velocity training using pneumatic machines (Pn) versus standard plate-loaded machines (PL). Independently living older adults, 60 years or older, were randomized into two groups: pneumatic machine (Pn, n=19) and plate-loaded machine (PL, n=17). After 12 weeks of high-velocity training twice per week, groups were analyzed using an intention-to-treat approach. Primary outcomes were lower body power measured using a linear transducer and upper body power measured using a medicine ball throw. Secondary outcomes included lower and upper body muscle strength, the Physical Performance Battery (PPB), the gallon jug test, the timed up-and-go test, and self-reported function using the Patient Reported Outcomes Measurement Information System (PROMIS) and an online video questionnaire. Outcome assessors were blinded to group membership. Lower body power significantly improved in both groups (Pn: 19%, PL: 31%), with no significant difference between the groups (Cohen's d=0.4, 95% CI (-1.1, 0.3)). Upper body power significantly improved only in the PL group, but showed no significant difference between the groups (Pn: 3%, PL: 6%). For balance, there was a significant difference between the groups favoring the Pn group (d=0.7, 95% CI (0.1, 1.4)); however, there were no statistically significant differences between groups for PPB, gallon jug transfer, muscle strength, timed up-and-go, or self-reported function. No serious adverse events were reported in either of the groups. Pneumatic and plate-loaded machines were effective in improving lower body power and physical function in older adults. The results suggest that power training can be safely and effectively performed by older adults using either pneumatic or plate-loaded machines. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Mechanical properties of a new mica-based machinable glass ceramic for CAD/CAM restorations.

    PubMed

    Thompson, J Y; Bayne, S C; Heymann, H O

    1996-12-01

    Machinable ceramics (Vita Mark II and Dicor MGC) exhibit good short-term clinical performance, but long-term in vivo fracture resistance is still being monitored. The relatively low fracture toughness of currently available machinable ceramics restricts their use to conservative inlays and onlays. A new machinable glass ceramic (MGC-F) has been developed (Corning Inc.) with enhanced fluorescence and machinability. The purpose of this study was to characterize and compare key mechanical properties of MGC-F to Dicor MGC-Light, Dicor MGC-Dark, and Vita Mark II glass ceramics. The mean fracture toughness and indented biaxial flexure strength of MGC-F were each significantly greater (p ≤ 0.01) than those of the Dicor MGC-Light, Dicor MGC-Dark, and Vita Mark II ceramic materials. The results of this study indicate the potential for better in vivo fracture resistance of MGC-F compared with existing machinable ceramic materials for CAD/CAM restorations.

  8. Impact of Gastric Acid Induced Surface Changes on Mechanical Behavior and Optical Characteristics of Dental Ceramics.

    PubMed

    Kulkarni, Aditi; Rothrock, James; Thompson, Jeffery

    2018-01-14

    This study tested the impact of exposure to artificial gastric acid combined with toothbrush abrasion on the properties of dental ceramics. Earlier research has indicated that immersion in artificial gastric acid increases the surface roughness of dental ceramics; however, the combined effects of acid immersion and toothbrush abrasion, and the impact of increased surface roughness on mechanical strength and optical properties, have not been studied. Three commercially available ceramics were chosen for this study: feldspathic porcelain, lithium disilicate glass-ceramic, and monolithic zirconium oxide. The specimens (10 × 1 mm discs) were cut, thermally treated as required, and polished. Each material was divided into four groups (n = 8 per group): control (no exposure), acid only, brush only, and acid + brush. The specimens were immersed in artificial gastric acid (50 ml of 0.2% [w/v] sodium chloride in 0.7% [v/v] hydrochloric acid mixed with 0.16 g of pepsin powder, pH = 2) for 2 minutes and rinsed with deionized water for 2 minutes. The procedure was repeated 6 times/day for 9 days, and specimens were stored in deionized water at 37°C. Toothbrush abrasion was performed using an ISO/ADA design brushing machine for 100 cycles/day for 9 days. The acid + brush group received both treatments. Specimens were examined under SEM and an optical microscope for morphological changes. Color and translucency were measured using spectrophotometer CIELAB coordinates (L*, a*, b*). Surface gloss was measured using a gloss meter. Surface roughness was measured using a stylus profilometer. Biaxial flexural strength was measured using a mechanical testing machine. The data were analyzed by one-way ANOVA followed by Tukey's HSD post hoc test (p < 0.05). Statistically significant changes were found in color, gloss, and surface roughness for the porcelain and e.max specimens. No statistically significant changes were found for any properties of the zirconia specimens. The acid treatment affected the surface roughness, color, and gloss of the porcelain and e.max ceramics. The changes in translucency and mechanical strength for all materials were not statistically significant. Zirconia ceramic showed resistance to all treatments. © 2018 by the American College of Prosthodontists.
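
    Color change in studies like this is commonly summarized as the Euclidean distance between CIELAB coordinates measured before and after treatment (the CIE76 ΔE*ab); the coordinate values below are invented for illustration, not taken from this study.

    ```python
    # CIE76 color difference between two CIELAB measurements.
    import math

    def delta_e_cie76(lab1, lab2):
        return math.dist(lab1, lab2)   # sqrt(dL^2 + da^2 + db^2)

    before = (78.2, 1.4, 16.8)   # hypothetical porcelain disc, baseline (L*, a*, b*)
    after = (76.9, 1.9, 18.1)    # hypothetical same disc after acid + brushing
    print(f"dE*ab = {delta_e_cie76(before, after):.2f}")
    ```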

  9. 16 CFR 423.8 - Exemptions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... washing and drycleaning procedures can safely be used on a product: (1) Machine washing in hot water; (2) Machine drying at a high setting; (3) Ironing at a hot setting; (4) Bleaching with all commercially... National Archives and Records Administration (NARA). For information on the availability of this material...

  10. A COMPARATIVE STUDY OF VIDEO TAPE RECORDINGS.

    ERIC Educational Resources Information Center

    WIENS, JACOB H.

    THE COMPARATIVE EFFECTIVENESS OF PRESENTLY AVAILABLE VIDEO TAPE MACHINES IS REPORTED, FOR THE CONVENIENCE OF SCHOOL ADMINISTRATORS PLANNING TO USE SUCH EQUIPMENT IN EDUCATIONAL PROGRAMS. TESTS WERE CONDUCTED AT THE WIENS ELECTRONIC LABORATORIES. MACHINE BRANDS TESTED WERE AMPEX, CONCORD, MACHTRONICS, PRECISION, RCA, SONY, AND WOLLENSAK. A DETAILED…

  11. 16 CFR 423.8 - Exemptions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... washing and drycleaning procedures can safely be used on a product: (1) Machine washing in hot water; (2) Machine drying at a high setting; (3) Ironing at a hot setting; (4) Bleaching with all commercially... National Archives and Records Administration (NARA). For information on the availability of this material...

  12. Surface wind characteristics of some Aleutian Islands. [for selection of windpowered machine sites

    NASA Technical Reports Server (NTRS)

    Wentink, T., Jr.

    1973-01-01

    The wind power potential of Alaska is assessed in order to determine promising windpower sites for the construction of wind machines and for the shipment of wind-derived energy. Analyses of near-surface wind data from promising Aleutian sites accessible by ocean transport indicate probable velocity regimes and also reveal deficiencies in the available data. It is shown that winds suitable for some degree of power generation are available 77 percent of the time in the Aleutians, with peak velocities depending on location.

  13. Anatomical entity mention recognition at literature scale

    PubMed Central

    Pyysalo, Sampo; Ananiadou, Sophia

    2014-01-01

    Motivation: Anatomical entities ranging from subcellular structures to organ systems are central to biomedical science, and mentions of these entities are essential to understanding the scientific literature. Despite extensive efforts to automatically analyze various aspects of biomedical text, there have been only a few studies focusing on anatomical entities, and no dedicated methods for learning to automatically recognize anatomical entity mentions in free-form text have been introduced. Results: We present AnatomyTagger, a machine learning-based system for anatomical entity mention recognition. The system incorporates a broad array of approaches proposed to benefit tagging, including the use of Unified Medical Language System (UMLS)- and Open Biomedical Ontologies (OBO)-based lexical resources, word representations induced from unlabeled text, statistical truecasing, and non-local features. We train and evaluate the system on a newly introduced corpus that substantially extends previously available resources, and apply the resulting tagger to automatically annotate the entire open-access scientific domain literature. The resulting analyses have been applied to extend services provided by the Europe PubMed Central literature database. Availability and implementation: All tools and resources introduced in this work are available from http://nactem.ac.uk/anatomytagger. Contact: sophia.ananiadou@manchester.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:24162468

  14. DREAMTools: a Python package for scoring collaborative challenges

    PubMed Central

    Cokelaer, Thomas; Bansal, Mukesh; Bare, Christopher; Bilal, Erhan; Bot, Brian M.; Chaibub Neto, Elias; Eduati, Federica; de la Fuente, Alberto; Gönen, Mehmet; Hill, Steven M.; Hoff, Bruce; Karr, Jonathan R.; Küffner, Robert; Menden, Michael P.; Meyer, Pablo; Norel, Raquel; Pratap, Abhishek; Prill, Robert J.; Weirauch, Matthew T.; Costello, James C.; Stolovitzky, Gustavo; Saez-Rodriguez, Julio

    2016-01-01

    DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data. Computational methods are evaluated using an automated scoring metric, scores are posted to a public leaderboard, and methods are published to facilitate community discussions on how to build improved methods. By engaging participants from a wide range of science and engineering backgrounds, DREAM challenges can comparatively evaluate a wide range of statistical, machine learning, and biophysical methods. Here, we describe DREAMTools, a Python package for evaluating DREAM challenge scoring metrics. DREAMTools provides a command line interface that enables researchers to test new methods on past challenges, as well as a framework for scoring new challenges. As of March 2016, DREAMTools includes more than 80% of completed DREAM challenges. DREAMTools complements the data, metadata, and software tools available at the DREAM website http://dreamchallenges.org and on the Synapse platform at https://www.synapse.org. Availability: DREAMTools is a Python package. Releases and documentation are available at http://pypi.python.org/pypi/dreamtools. The source code is available at http://github.com/dreamtools/dreamtools. PMID:27134723

  15. Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient?

    PubMed

    Skoraczyński, G; Dittwald, P; Miasojedow, B; Szymkuć, S; Gajewska, E P; Grzybowski, B A; Gambin, A

    2017-06-15

    As machine learning/artificial intelligence algorithms are defeating chess masters and, most recently, GO champions, there is interest - and hope - that they will prove equally useful in assisting chemists in predicting the outcomes of organic reactions. This paper demonstrates, however, that the applicability of machine learning to problems of chemical reactivity over diverse types of chemistries remains limited - in particular, with the currently available chemical descriptors, fundamental mathematical theorems impose upper bounds on the accuracy with which reaction yields and times can be predicted. Improving the performance of machine-learning methods calls for the development of fundamentally new chemical descriptors.

  16. Machine Learning Techniques in Clinical Vision Sciences.

    PubMed

    Caixinha, Miguel; Nunes, Sandrina

    2017-01-01

    This review presents and discusses the contribution of machine learning techniques to diagnosis and disease monitoring in the context of clinical vision science. Many ocular diseases leading to blindness can be halted or delayed when detected and treated at their earliest stages. With the recent developments in diagnostic devices, imaging, and genomics, new sources of data for early disease detection and patient management are now available. Machine learning techniques emerged in the biomedical sciences as clinical decision-support techniques to improve the sensitivity and specificity of disease detection and monitoring, adding objectivity to the clinical decision-making process. This manuscript presents a review of multimodal ocular disease diagnosis and monitoring based on machine learning approaches. In the first section, the technical issues related to the different machine learning approaches are presented. Machine learning techniques are used to automatically recognize complex patterns in a given dataset. These techniques allow creating homogeneous groups (unsupervised learning), or creating a classifier predicting group membership of new cases (supervised learning), when a group label is available for each case. To ensure good performance of machine learning techniques on a given dataset, all possible sources of bias should be removed or minimized. For that, the representativeness of the input dataset for the true population should be confirmed, the noise should be removed, the missing data should be treated, and the data dimensionality (i.e., the number of parameters/features and the number of cases in the dataset) should be adjusted. The application of machine learning techniques to ocular disease diagnosis and monitoring is presented and discussed in the second section of this manuscript. To show the clinical benefits of machine learning in clinical vision sciences, several examples are presented in glaucoma, age-related macular degeneration, and diabetic retinopathy, these ocular pathologies being the major causes of irreversible visual impairment.

  17. Artificial Intelligence Methods: Choice of algorithms, their complexity, and appropriateness within the context of hydrology and water resources. (Invited)

    NASA Astrophysics Data System (ADS)

    Bastidas, L. A.; Pande, S.

    2009-12-01

    Pattern analysis deals with the automatic detection of patterns in data, and there is a variety of algorithms available for the purpose. These algorithms are commonly called Artificial Intelligence (AI) or data-driven algorithms; they have lately been applied to a variety of problems in hydrology and are becoming extremely popular. When confronting such a range of algorithms, the question of which one is the “best” arises. Some algorithms may be preferred because of lower computational complexity; others take into account prior knowledge of the form and the amount of the data; others are chosen based on a version of the Occam’s razor principle that a simpler classifier performs better. Popper has argued, however, that Occam’s razor is without operational value because there is no clear measure or criterion for simplicity. Examples of measures that can be used for this purpose are the so-called algorithmic complexity, also known as Kolmogorov complexity or Kolmogorov (algorithmic) entropy; the Bayesian information criterion; and the Vapnik-Chervonenkis dimension. On the other hand, the No Free Lunch Theorem states that there is no best general algorithm, and that specific algorithms are superior only for specific problems. It should also be noted that the appropriate algorithm and the appropriate complexity are constrained by the finiteness of the available data and the uncertainties associated with it. Thus, there is a compromise between the complexity of the algorithm, the data properties, and the robustness of the predictions. We discuss the above topics; briefly review the historical development of applications, with particular emphasis on statistical learning theory (SLT), also known as machine learning (ML), of which support vector machines and relevance vector machines are the most commonly known algorithms; present some applications of such algorithms for distributed hydrologic modeling; and introduce an example of how the complexity measure can be applied to appropriate model choice in hydrologic modeling intended for use in studies of water resources, water resources management, and their direct relation to extreme conditions or natural hazards.
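
    As one concrete instance of the complexity measures listed above, the Bayesian information criterion trades goodness of fit against parameter count; the standard textbook form is restated here for reference.

    ```latex
    % BIC for a model with k free parameters fit to n samples,
    % where \hat{L} is the maximized likelihood; lower BIC is preferred.
    \mathrm{BIC} = k \ln n - 2 \ln \hat{L}
    ```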

  18. Effects of retrofit emission controls and work practices on perchloroethylene exposures in small dry-cleaning shops.

    PubMed

    Ewers, Lynda M; Ruder, Avima M; Petersen, Martin R; Earnest, G Scott; Goldenhar, Linda M

    2002-02-01

    The effectiveness of commercially available interventions for reducing workers' perchloroethylene exposures in three small dry-cleaning shops was evaluated. Depending upon machine configuration, the intervention consisted of the addition of either a refrigerated condenser or a closed-loop carbon adsorber to the existing dry-cleaning machine. These relatively inexpensive (less than $5000) engineering controls were designed to reduce perchloroethylene emissions when dry-cleaning machine doors were opened for loading or unloading. Effectiveness of the interventions was judged by comparing pre- and postintervention perchloroethylene exposures using three types of measurements in each shop: (1) full-shift, personal breathing zone, air monitoring, (2) next-morning, end-exhaled worker breath concentrations of perchloroethylene, and (3) differences in the end-exhaled breath perchloroethylene concentrations before and after opening the dry-cleaning machine door. In general, measurements supported the hypothesis that machine operators' exposures to perchloroethylene can be reduced. However, work practices, especially maintenance practices, influenced exposures more than was originally anticipated. Only owners of dry-cleaning machines in good repair, with few leaks, should consider retrofitting them, and only after consultation with their machine's manufacturer. If machines are in poor condition, a new machine or alternative technology should be considered. Shop owners and employees should never circumvent safety features on dry-cleaning machines.

  19. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling

    PubMed Central

    Cuperlovic-Culf, Miroslava

    2018-01-01

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies. PMID:29324649

  20. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling.

    PubMed

    Cuperlovic-Culf, Miroslava

    2018-01-11

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies.

  1. Robotic inspection of fiber reinforced composites using phased array UT

    NASA Astrophysics Data System (ADS)

    Stetson, Jeffrey T.; De Odorico, Walter

    2014-02-01

    Ultrasound is the current NDE method of choice to inspect large fiber reinforced airframe structures. Over the last 15 years Cartesian based scanning machines using conventional ultrasound techniques have been employed by all airframe OEMs and their top tier suppliers to perform these inspections. Technical advances in both computing power and commercially available, multi-axis robots now facilitate a new generation of scanning machines. These machines use multiple end effector tools taking full advantage of phased array ultrasound technologies yielding substantial improvements in inspection quality and productivity. This paper outlines the general architecture for these new robotic scanning systems as well as details the variety of ultrasonic techniques available for use with them including advances such as wide area phased array scanning and sound field adaptation for non-flat, non-parallel surfaces.

  2. Critical Technology Assessment of Five Axis Simultaneous Control Machine Tools

    DTIC Science & Technology

    2009-07-01

    assessment, BIS specifically examined: • The application of Export Control Classification Numbers (ECCN) 2B001.b.2 and 2B001.c.2 controls and related...availability of certain five axis simultaneous control mills, mill/turns, and machining centers controlled by ECCN 2B001.b.2 (but not grinders controlled by ECCN 2B001.c.2) exists to China and Taiwan, which both have an indigenous capability to produce five axis simultaneous control machine tools with

  3. Machine translation project alternatives analysis

    NASA Technical Reports Server (NTRS)

    Bajis, Catherine J.; Bedford, Denise A. D.

    1993-01-01

    The Machine Translation Project consists of several components, two of which, the Project Plan and the Requirements Analysis, have already been delivered. The Project Plan details the overall rationale, objectives and time-table for the project as a whole. The Requirements Analysis compares a number of available machine translation systems, their capabilities, possible configurations, and costs. The Alternatives Analysis has resulted in a number of conclusions and recommendations to the NASA STI program concerning the acquisition of specific MT systems and related hardware and software.

  4. Cognitive learning: a machine learning approach for automatic process characterization from design

    NASA Astrophysics Data System (ADS)

    Foucher, J.; Baderot, J.; Martinez, S.; Dervilllé, A.; Bernard, G.

    2018-03-01

    Cutting edge innovation requires accurate and fast process control to obtain fast learning rates and industry adoption. The tools currently available for such tasks are mainly manual and user dependent. We present in this paper cognitive learning, a new machine learning-based technique that facilitates and speeds up complex characterization by using the design as input, providing fast training and detection times. We will focus on the machine learning framework that enables object detection, defect traceability, and automatic measurement tools.

  5. 41 CFR 101-26.509-1 - Requisitioning tabulating machine cards available from Federal Supply Schedule contracts.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 41 Public Contracts and Property Management 2 2011-07-01 2007-07-01 true Requisitioning tabulating... Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT... electrical and mechanical contact tabulating machines, including aperture cards and copy cards. Federal...

  6. 41 CFR 101-26.509-1 - Requisitioning tabulating machine cards available from Federal Supply Schedule contracts.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 41 Public Contracts and Property Management 2 2010-07-01 2010-07-01 true Requisitioning tabulating... Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT... electrical and mechanical contact tabulating machines, including aperture cards and copy cards. Federal...

  7. Slot Machine Preferences of Pathological and Recreational Gamblers Are Verbally Constructed

    ERIC Educational Resources Information Center

    Dixon, Mark R.; Bihler, Holly L.; Nastally, Becky L.

    2011-01-01

    The current study attempted to alter preferences for concurrently available slot machines of equal payout through the development of equivalence classes and subsequent transfers of functions. Participants rated stimuli consisting of words thought to be associated with having a gambling problem (e.g., "desperation" and "debt"), words associated…

  8. Using ZWDOS to Communicate in Chinese on PC.

    ERIC Educational Resources Information Center

    Xie, Tianwei

    1995-01-01

    Describes the availability, installation, and use of the ZhonWen Disk Operating System (ZWDOS) to display, print, and transmit Chinese characters on conventional International Business Machines (IBM) personal computers and IBM-compatible machines. Also discussed is the use of ZWDOS to compose electronic mail messages, read newsgroups, and access…

  9. Can machine learning complement traditional medical device surveillance? A case study of dual-chamber implantable cardioverter-defibrillators.

    PubMed

    Ross, Joseph S; Bates, Jonathan; Parzynski, Craig S; Akar, Joseph G; Curtis, Jeptha P; Desai, Nihar R; Freeman, James V; Gamble, Ginger M; Kuntz, Richard; Li, Shu-Xia; Marinac-Dabic, Danica; Masoudi, Frederick A; Normand, Sharon-Lise T; Ranasinghe, Isuru; Shaw, Richard E; Krumholz, Harlan M

    2017-01-01

    Machine learning methods may complement traditional analytic methods for medical device surveillance. Using data from the National Cardiovascular Data Registry for implantable cardioverter-defibrillators (ICDs) linked to Medicare administrative claims for longitudinal follow-up, we applied three statistical approaches to safety-signal detection for commonly used dual-chamber ICDs, based on two propensity score (PS) models: one specified by subject-matter experts (PS-SME), and the other by machine learning-based selection (PS-ML). The first approach used PS-SME and cumulative incidence (time-to-event), the second approach used PS-SME and cumulative risk (Data Extraction and Longitudinal Trend Analysis [DELTA]), and the third approach used PS-ML and cumulative risk (embedded feature selection). Safety-signal surveillance was conducted for eleven dual-chamber ICD models implanted at least 2,000 times over 3 years. Between 2006 and 2010, there were 71,948 Medicare fee-for-service beneficiaries who received dual-chamber ICDs. Cumulative device-specific unadjusted 3-year event rates varied for three surveyed safety signals: death from any cause, 12.8%-20.9%; nonfatal ICD-related adverse events, 19.3%-26.3%; and death from any cause or nonfatal ICD-related adverse event, 27.1%-37.6%. Agreement on safety signals detected/not detected between the time-to-event and DELTA approaches was 90.9% (360 of 396, κ = 0.068), between the time-to-event and embedded feature-selection approaches was 91.7% (363 of 396, κ = -0.028), and between the DELTA and embedded feature-selection approaches was 88.1% (349 of 396, κ = -0.042). Three statistical approaches, including one machine learning method, identified important safety signals, but without exact agreement. Ensemble methods may be needed to detect all safety signals for further evaluation during medical device surveillance.
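
    The agreement figures above combine percentage agreement with Cohen's kappa over 396 device-signal pairs; the sketch below reproduces that computation on random placeholder detection vectors.

    ```python
    # Percentage agreement and Cohen's kappa between two surveillance approaches.
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(5)
    detected_a = rng.integers(0, 2, size=396)   # approach 1: signal detected?
    detected_b = rng.integers(0, 2, size=396)   # approach 2: signal detected?
    agreement = (detected_a == detected_b).mean()
    kappa = cohen_kappa_score(detected_a, detected_b)
    print(f"agreement = {agreement:.1%}, kappa = {kappa:.3f}")
    ```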

  10. Two Body Wear of Newly Introduced Nanocomposite Teeth and Cross-Linked Four-Layered Acrylic Teeth: a Comparative In Vitro Study.

    PubMed

    Ilangkumaran, R; Srinivasan, J; Baburajan, K; Balaji, N

    2014-12-01

    Wear of complete denture teeth results in compromised denture esthetics and function. To counteract this problem, artificial teeth with increased wear resistance, such as nanocomposite teeth, have been introduced in the market. The purpose of this study was to compare the amount of wear between nanocomposite teeth and acrylic teeth. Fifteen specimens were chosen from each group, namely the nanocomposite teeth (SR-PHONARES) and the acrylic teeth (ACRY PLUS). Only maxillary premolars were chosen for testing, and the samples were customized according to the specifications of the pin-on-disc machine. The pin-on-disc machine is a two-body tribometer that quantifies the amount of wear under a specific load and time. Test samples were mounted onto the receptacle of the pin-on-disc machine and tested under a load of 0.3 kg for 1,000 cycles of rotation against 600-grit emery paper. The amount of wear was read from the digital display of the pin-on-disc machine. Statistical analysis showed that the amount of wear was greater in the four-layered acrylic teeth. The p value obtained was 0.002 (<0.005), which implies that the difference in wear between nanocomposite teeth and acrylic teeth is statistically significant. Although the nanocomposite teeth showed less wear than the four-layered acrylic teeth, the difference is small and of limited clinical significance, while the cost of the nanocomposite teeth is four times that of the acrylic teeth. Further clinical studies must be performed to confirm these results.

  11. Va-Room: Motorcycle Safety.

    ERIC Educational Resources Information Center

    Keller, Rosanne

    One of a series of instructional materials produced by the Literacy Council of Alaska, this booklet provides information about motorcycle safety. Using a simplified vocabulary and shorter sentences, it offers statistics concerning motorcycle accidents; information on how to choose the proper machine; basic information about the operation of the…

  12. Consensus in the Wasserstein Metric Space of Probability Measures

    DTIC Science & Technology

    2015-07-01

    this direction, potential applications/uses for the Wasserstein barycentre (itself) have been considered previously in a number of fields...one is interested in more general empirical input measures. Applications in machine learning and Bayesian statistics have also made use of the Wasserstein...

  13. Managing a Special Library. Parts I and II.

    ERIC Educational Resources Information Center

    Labovitz, Judy; Swanigan, Meryl

    1985-01-01

    Various concepts from "In Search of Excellence" are described in context of authors' personal styles. Discussions address thinking in terms of options, using statistics, learning value of corporate politics, a bias toward action, productivity through people, the "lean-machine" concept, staying close to client, entrepreneurship…

  14. Microscopes and computers combined for analysis of chromosomes

    NASA Technical Reports Server (NTRS)

    Butler, J. W.; Butler, M. K.; Stroud, A. N.

    1969-01-01

    Scanning machine CHLOE, developed for photographic use, is combined with a digital computer to obtain quantitative and statistically significant data on chromosome shapes, distribution, density, and pairing. CHLOE permits data about a chromosome complement to be acquired twice as fast as by manual pairing.

  15. Assessing Creative Problem-Solving with Automated Text Grading

    ERIC Educational Resources Information Center

    Wang, Hao-Chuan; Chang, Chun-Yen; Li, Tsai-Yen

    2008-01-01

    The work aims to improve the assessment of creative problem-solving in science education by employing language technologies and computational-statistical machine learning methods to grade students' natural language responses automatically. To evaluate constructs like creative problem-solving with validity, open-ended questions that elicit…

  16. Associations between state-level soda taxes and adolescent body mass index.

    PubMed

    Powell, Lisa M; Chriqui, Jamie; Chaloupka, Frank J

    2009-09-01

    Soft drink consumption has been linked with higher energy intake, obesity, and poorer health. Fiscal pricing policies such as soda taxes may lower soda consumption and, in turn, reduce weight among U.S. adolescents. This study used multivariate linear regression analyses to examine the associations between state-level grocery store and vending machine soda taxes and adolescent body mass index (BMI). We used repeated cross-sections of individual-level data on adolescents drawn from the Monitoring the Future surveys, combined with state-level tax data and local area contextual measures, for the years 1997 through 2006. The results showed no statistically significant associations between state-level soda taxes and adolescent BMI. Only an economically weak, though statistically significant, effect was found between vending machine soda tax rates and BMI among teens at risk for overweight. Current state-level tax rates are not found to be significantly associated with adolescent weight outcomes. It is likely that taxes would need to be raised substantially to detect significant associations between taxes and adolescent weight.
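
    A hedged sketch of the kind of multivariate regression described above; the variable names and values are synthetic stand-ins for survey records merged with state-level tax rates and contextual controls, not the study's data.

    ```python
    # Regress BMI on a tax rate plus a contextual control, with robust errors.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 2000
    tax_rate = rng.uniform(0, 7, n)           # state soda tax, percent (stand-in)
    income = rng.normal(50, 15, n)            # local contextual control (stand-in)
    bmi = 21 + 0.0 * tax_rate + 0.01 * income + rng.normal(0, 3, n)

    X = sm.add_constant(np.column_stack([tax_rate, income]))
    fit = sm.OLS(bmi, X).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
    print(fit.params)
    print(fit.pvalues)
    ```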

  17. Learning disordered topological phases by statistical recovery of symmetry

    NASA Astrophysics Data System (ADS)

    Yoshioka, Nobuyuki; Akagi, Yutaka; Katsura, Hosho

    2018-05-01

    We apply an artificial neural network in a supervised manner to map out the quantum phase diagram of disordered topological superconductors in class DIII. Given disorder that keeps the discrete symmetries of the ensemble as a whole, the translational symmetry that is broken in each individual quasiparticle distribution is recovered statistically by taking an ensemble average. Using this, we classify the phases with an artificial neural network that learned the quasiparticle distribution in the clean limit, and show that the result is fully consistent with calculations by the transfer matrix method or the noncommutative geometry approach. If all three phases, namely the Z2, trivial, and thermal metal phases, appear in the clean limit, the machine can classify them with high confidence over the entire phase diagram. If only the former two phases are present, we find that the machine remains confused in a certain region, leading us to conclude that it has detected an unknown phase, which is eventually identified as the thermal metal phase.

  18. Comparative study of coated and uncoated tool inserts with dry machining of EN47 steel using Taguchi L9 optimization technique

    NASA Astrophysics Data System (ADS)

    Vasu, M.; Shivananda, Nayaka H.

    2018-04-01

    EN47 steel samples are machined on a self-centered lathe using chemical vapor deposition (CVD) coated TiCN/Al2O3/TiN and uncoated tungsten carbide tool inserts with a nose radius of 0.8 mm, and the results are compared and optimized using statistical tools. The input (cutting) parameters considered in this work are feed rate (f), cutting speed (Vc), and depth of cut (ap); the optimization criteria are based on the Taguchi L9 orthogonal array. The ANOVA method is adopted to evaluate the statistical significance and the percentage contribution of each model. Multiple response characteristics, namely cutting force (Fz), tool tip temperature (T), and surface roughness (Ra), are evaluated. The results show that the coated tool insert (TiCN/Al2O3/TiN) performs 1.27 and 1.29 times better than the uncoated tool insert in terms of tool tip temperature and surface roughness, respectively. A slight increase in cutting force was observed for the coated tools.
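
    Taguchi analyses of responses like force, temperature, and roughness typically rank runs by a "smaller-the-better" signal-to-noise ratio; the sketch below shows that standard formula on invented replicate readings, not the study's measurements.

    ```python
    # Taguchi S/N ratio (dB) for responses where lower values are better.
    import numpy as np

    def sn_smaller_is_better(y: np.ndarray) -> float:
        return -10.0 * np.log10(np.mean(np.square(y)))

    runs = {  # hypothetical L9 runs -> replicated surface roughness readings (um)
        "run1": np.array([0.82, 0.79, 0.85]),
        "run2": np.array([1.10, 1.05, 1.12]),
    }
    for name, y in runs.items():
        print(f"{name}: S/N = {sn_smaller_is_better(y):.2f} dB")
    ```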

  19. Data Science in the Research Domain Criteria Era: Relevance of Machine Learning to the Study of Stress Pathology, Recovery, and Resilience

    PubMed Central

    Galatzer-Levy, Isaac R.; Ruggles, Kelly; Chen, Zhe

    2017-01-01

    Diverse environmental and biological systems interact to influence individual differences in response to environmental stress. Understanding the nature of these complex relationships can enhance the development of methods to: (1) identify risk, (2) classify individuals as healthy or ill, (3) understand mechanisms of change, and (4) develop effective treatments. The Research Domain Criteria (RDoC) initiative provides a theoretical framework to understand health and illness as the product of multiple inter-related systems, but does not provide a framework to characterize or statistically evaluate such complex relationships. Characterizing and statistically evaluating models that integrate multiple levels (e.g., synapses, genes, environmental factors) as they relate to outcomes that are free from prior diagnostic benchmarks represents a challenge requiring new computational tools that are capable of capturing complex relationships and identifying clinically relevant populations. In the current review, we summarize machine learning methods that can achieve these goals. PMID:29527592

  20. Statistical analysis on the signals monitoring multiphase flow patterns in pipeline-riser system

    NASA Astrophysics Data System (ADS)

    Ye, Jing; Guo, Liejin

    2013-07-01

    Signals monitoring petroleum transmission pipelines in the offshore oil industry usually contain abundant information about the multiphase flow that is relevant to flow assurance, which includes the avoidance of the most undesirable flow patterns. Therefore, extracting reliable features from these signals for analysis is an alternative way to examine potential risks to the oil platform. This paper focuses on characterizing multiphase flow patterns in the pipeline-riser system that often appears in the offshore oil industry, and on finding an objective criterion to describe the transition between flow patterns. Statistical analysis of the pressure signal at the riser top is proposed, instead of the usual prediction methods based on inlet and outlet flow conditions, which cannot easily be determined in most situations. In addition, a machine learning method (least squares support vector machine) is applied to automatically classify the different flow patterns. The experimental results from a small-scale loop show that the proposed method is effective for analyzing multiphase flow patterns.
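
    A sketch of the feature-then-classify idea, assuming summary statistics of a riser-top pressure window feed a classifier. An ordinary SVM stands in here for the paper's least squares support vector machine; the signals, regimes, and labels are synthetic.

    ```python
    # Classify flow regimes from statistical features of pressure-signal windows.
    import numpy as np
    from scipy.stats import kurtosis, skew
    from sklearn.svm import SVC

    rng = np.random.default_rng(7)

    def features(window: np.ndarray) -> list:
        return [window.mean(), window.std(), skew(window), kurtosis(window)]

    # Two fake regimes: smooth flow vs. severe slugging (large pressure swings).
    smooth = [features(rng.normal(2.0, 0.05, 512)) for _ in range(40)]
    slugging = [features(2.0 + np.sin(np.linspace(0, 20, 512)) + rng.normal(0, 0.1, 512))
                for _ in range(40)]
    X = np.array(smooth + slugging)
    y = np.array([0] * 40 + [1] * 40)
    clf = SVC(kernel="rbf").fit(X, y)
    print(f"training accuracy: {clf.score(X, y):.2f}")
    ```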

  1. Machine Learning Based Multi-Physical-Model Blending for Enhancing Renewable Energy Forecast -- Improvement via Situation Dependent Error Correction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Siyuan; Hwang, Youngdeok; Khabibrakhmanov, Ildar

    With increasing penetration of solar and wind energy into the total energy supply mix, the pressing need for accurate energy forecasting has become well recognized. Here we report the development of a machine learning-based model blending approach for statistically combining multiple meteorological models to improve the accuracy of solar/wind power forecasts. Importantly, we demonstrate that, in addition to the parameters to be predicted (such as solar irradiance and power), including additional atmospheric state parameters that collectively define weather situations as machine learning input provides further enhanced accuracy for the blended result. Functional analysis of variance shows that the error of an individual model depends substantially on the weather situation. The machine learning approach effectively reduces such situation-dependent error and thus produces more accurate results compared to conventional multi-model ensemble approaches based on simplistic equally or unequally weighted model averaging. Validation results over an extended period of time show over 30% improvement in solar irradiance/power forecast accuracy compared to forecasts based on the best individual model.
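
    A minimal sketch of situation-dependent blending under assumed synthetic data: a learner maps individual model forecasts plus a weather-situation variable to the observed value, rather than averaging the models with fixed weights.

    ```python
    # Blend two model forecasts using a weather-situation feature.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(8)
    n = 3000
    model_a = rng.uniform(0, 1000, n)          # NWP model A irradiance forecast (stand-in)
    model_b = model_a + rng.normal(0, 80, n)   # NWP model B forecast (stand-in)
    cloudy = rng.integers(0, 2, n)             # weather-situation indicator
    # Model A is better in clear skies, model B under clouds: situation-dependent error.
    truth = np.where(cloudy == 1, model_b, model_a) + rng.normal(0, 20, n)

    X = np.column_stack([model_a, model_b, cloudy])
    blend = GradientBoostingRegressor(random_state=0).fit(X, truth)
    fixed_avg_err = np.abs((model_a + model_b) / 2 - truth).mean()
    blend_err = np.abs(blend.predict(X) - truth).mean()
    print(f"equal-weight MAE {fixed_avg_err:.1f} vs blended MAE {blend_err:.1f}")
    ```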

  2. Certification of highly complex safety-related systems.

    PubMed

    Reinert, D; Schaefer, M

    1999-01-01

    The BIA now has 15 years of experience with the certification of complex electronic systems for safety-related applications in the machinery sector. Using the example of machining centres, this presentation shows the systematic procedure for verifying and validating control systems that use Application Specific Integrated Circuits (ASICs) and microcomputers for safety functions. One section describes the control structure of machining centres with control systems using "integrated safety": a diverse redundant architecture combined with cross-monitoring and forced dynamization is explained. In the main section, the steps of the systematic certification procedure are explained, showing some results of the certification of drilling machines. Specification reviews, design reviews with test case specification, statistical analysis, and walk-throughs are the analytical measures in the testing process. Systematic tests based on the test case specification, electromagnetic interference (EMI) and environmental testing, and site acceptance tests on the machines are the testing measures for validation. A complex software-driven system is always undergoing modification. Most of the changes are not safety-relevant, but this has to be proven. A systematic procedure for certifying software modifications is presented in the last section of the paper.

  3. Influence of export control policy on the competitiveness of machine tool producing organizations

    NASA Astrophysics Data System (ADS)

    Ahrstrom, Jeffrey D.

    The possible influence of export control policies on producers of export-controlled machine tools is examined in this quantitative study. International market competitiveness theories hold that market-controlling policies such as export control regulations may influence an organization's ability to compete (Burris, 2010). Differences in the domestic application of export control policy on machine tool exports may impose throttling effects on the competitiveness of participating firms (Freedenberg, 2010). Commodity shipments from Japan, Germany, and the United States to the Russian market will be examined using descriptive statistics; gravity modeling of these specific markets provides a foundation for comparison to actual shipment data; and industry participant responses to a user-developed survey will provide additional data for analysis using a Kruskal-Wallis one-way analysis of variance. There is scarce academic research data on the topic of export control effects within the machine tool industry. Research results may be of interest to industry leadership in market participation decisions, advocacy arguments, and strategic planning. Industry advocates and export policy decision makers could find the data of interest in supporting positions for or against modifications of export control policies.

  4. Evaluating data distribution and drift vulnerabilities of machine learning algorithms in secure and adversarial environments

    NASA Astrophysics Data System (ADS)

    Nelson, Kevin; Corbin, George; Blowers, Misty

    2014-05-01

    Machine learning is continuing to gain popularity due to its ability to solve problems that are difficult to model using conventional computer programming logic. Much of the current and past work has focused on algorithm development, data processing, and optimization. Lately, a subset of research has emerged that explores issues related to security. This research is gaining traction as systems employing these methods are being applied to both secure and adversarial environments. One of machine learning's biggest benefits, its data-driven versus logic-driven approach, is also a weakness if the data on which the models rely are corrupted. Adversaries could maliciously influence systems that address drift and data distribution changes using re-training and online learning. Our work is focused on exploring the resilience of various machine learning algorithms to these data-driven attacks. In this paper, we present our initial findings, using Monte Carlo simulations and statistical analysis to explore the maximal achievable shift to a classification model, as well as the required amount of control over the data.
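
    A Monte Carlo sketch of one such data-driven attack, under assumed choices of data, attack model, and classifier: flip an increasing fraction of training labels and measure the accuracy of the re-trained classifier against clean labels.

    ```python
    # Label-flipping poisoning: accuracy degradation vs. attacker control.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    rng = np.random.default_rng(9)

    for flip_frac in (0.0, 0.1, 0.2, 0.4):
        accs = []
        for _ in range(20):                  # Monte Carlo trials
            y_poisoned = y.copy()
            idx = rng.choice(len(y), size=int(flip_frac * len(y)), replace=False)
            y_poisoned[idx] = 1 - y_poisoned[idx]
            clf = LogisticRegression(max_iter=1000).fit(X, y_poisoned)
            accs.append(clf.score(X, y))     # evaluate against clean labels
        print(f"flip {flip_frac:.0%}: clean accuracy {np.mean(accs):.3f}")
    ```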

  5. Mechanistic models versus machine learning, a fight worth fighting for the biological community?

    PubMed

    Baker, Ruth E; Peña, Jose-Maria; Jayamohan, Jayaratnam; Jérusalem, Antoine

    2018-05-01

    Ninety per cent of the world's data have been generated in the last 5 years (Machine learning: the power and promise of computers that learn by example, Report no. DES4702, issued April 2017, Royal Society). A small fraction of these data is collected with the aim of validating specific hypotheses. These studies are led by the development of mechanistic models focused on the causality of input-output relationships. However, the vast majority is aimed at supporting statistical or correlation studies that bypass the need for causality and focus exclusively on prediction. Along these lines, there has been a vast increase in the use of machine learning models, in particular in the biomedical and clinical sciences, to try to keep pace with the rate of data generation. Recent successes now beg the question of whether mechanistic models are still relevant in this area. Put differently, why should we try to understand the mechanisms of disease progression when we can use machine learning tools to predict disease outcome directly? © 2018 The Author(s).

  6. Tool geometry and damage mechanisms influencing CNC turning efficiency of Ti6Al4V

    NASA Astrophysics Data System (ADS)

    Suresh, Sangeeth; Hamid, Darulihsan Abdul; Yazid, M. Z. A.; Nasuha, Nurdiyanah; Ain, Siti Nurul

    2017-12-01

    Ti6Al4V, or Grade 5 titanium alloy, is widely used in the aerospace, medical, automotive, and fabrication industries due to its distinctive combination of mechanical and physical properties. Yet Ti6Al4V has always been difficult to machine, ironically because of the same mix of properties. Machining Ti6Al4V has resulted in short cutting-tool life, which has led to objectionable surface integrity and rapid failure of the machined parts. However, the proven functional relevance of this material has prompted extensive research into the optimization of machining parameters and cutting-tool characteristics. Cutting-tool geometry plays a vital role in ensuring dimensional and geometric accuracy in machined parts. In this study, an experimental investigation is carried out to optimize the nose radius and relief angles of the cutting tools and their interaction with different levels of machining parameters. The low elastic modulus and thermal conductivity of Ti6Al4V contribute to rapid tool damage, and the impact of these properties on tool-tip damage is studied. An experimental design approach is utilized in the CNC turning of Ti6Al4V to statistically analyze and propose optimum levels of input parameters that lengthen tool life and enhance the surface characteristics of the machined parts. A greater tool nose radius with a straight flank, combined with low feed rates, resulted in desirable surface integrity. The presence of a relief angle proved to aggravate tool damage and dimensional instability in the CNC turning of Ti6Al4V.

  7. Machine Learning Approach for Classifying Multiple Sclerosis Courses by Combining Clinical Data with Lesion Loads and Magnetic Resonance Metabolic Features.

    PubMed

    Ion-Mărgineanu, Adrian; Kocevar, Gabriel; Stamile, Claudio; Sima, Diana M; Durand-Dubief, Françoise; Van Huffel, Sabine; Sappey-Marinier, Dominique

    2017-01-01

    Purpose: The purpose of this study is to classify multiple sclerosis (MS) patients into the four clinical forms defined by the McDonald criteria, using machine learning algorithms trained on clinical data combined with lesion loads and magnetic resonance metabolic features. Materials and Methods: Eighty-seven MS patients [12 Clinically Isolated Syndrome (CIS), 30 Relapsing-Remitting (RR), 17 Primary Progressive (PP), and 28 Secondary Progressive (SP)] and 18 healthy controls were included in this study. Longitudinal data available for each MS patient included clinical measures (e.g., age, disease duration, Expanded Disability Status Scale), conventional magnetic resonance imaging, and spectroscopic imaging. We extract N-acetyl-aspartate (NAA), Choline (Cho), and Creatine (Cre) concentrations, and we compute three features for each spectroscopic grid by averaging metabolite ratios (NAA/Cho, NAA/Cre, Cho/Cre) over good-quality voxels. We build linear mixed-effects models to test for statistically significant differences between MS forms. We test nine binary classification tasks on clinical data, lesion loads, and metabolic features, using a leave-one-patient-out cross-validation method based on 100 random patient-based bootstrap selections. We compute F1-scores and BAR values after tuning Linear Discriminant Analysis (LDA), Support Vector Machines with a Gaussian kernel (SVM-rbf), and Random Forests. Results: Statistically significant differences were found between the disease starting points of each MS form using four different response variables: lesion load, NAA/Cre, NAA/Cho, and Cho/Cre ratios. Training SVM-rbf on clinical data and lesion loads yields F1-scores of 71-72% for CIS vs. RR and CIS vs. RR+SP, respectively. For RR vs. PP we obtained good classification results (maximum F1-score of 85%) after training LDA on clinical and metabolic features, while for RR vs. SP we obtained slightly higher classification results (maximum F1-score of 87%) after training LDA and SVM-rbf on clinical data, lesion loads, and metabolic features. Conclusions: Our results suggest that metabolic features are better at differentiating between relapsing-remitting and primary progressive forms, while lesion loads are better at differentiating between relapsing-remitting and secondary progressive forms. Combining clinical data with magnetic resonance lesion loads and metabolic features can therefore improve the discrimination between relapsing-remitting and progressive forms.
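
    A minimal sketch of a leave-one-patient-out evaluation of an RBF-kernel SVM, in the spirit of the pipeline described; the features, labels, and scikit-learn implementation are stand-ins, not the study's code:

```python
# Leave-one-patient-out evaluation of an RBF-kernel SVM on synthetic
# stand-ins for clinical + lesion-load features (one sample per patient).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 4))          # e.g., age, duration, EDSS, lesion load
y = rng.integers(0, 2, size=n)       # 0 = RR, 1 = SP (toy binary task)
patients = np.arange(n)              # one sample per patient here

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

preds = np.empty(n, dtype=int)
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=patients):
    model.fit(X[train_idx], y[train_idx])
    preds[test_idx] = model.predict(X[test_idx])

print(f"F1 = {f1_score(y, preds):.2f}")
```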

  8. Advanced Machine Learning Emulators of Radiative Transfer Models

    NASA Astrophysics Data System (ADS)

    Camps-Valls, G.; Verrelst, J.; Martino, L.; Vicent, J.

    2017-12-01

    Physically-based model inversion methodologies rely on physical laws and established cause-effect relationships. A plethora of remote sensing applications rely on the physical inversion of a Radiative Transfer Model (RTM), which leads to physically meaningful bio-geo-physical parameter estimates. The process is, however, computationally expensive and needs expert knowledge for the selection of the RTM, its parametrization, the look-up table generation, and its inversion. Mimicking complex codes with statistical nonlinear machine learning algorithms has recently become the natural alternative. Emulators are statistical constructs able to approximate the RTM at a fraction of the computational cost, while providing an estimation of uncertainty and estimations of the gradient or finite integral forms. We review the field and recent advances in the emulation of RTMs with machine learning models. We posit Gaussian processes (GPs) as the proper framework to tackle the problem. Furthermore, we introduce an automatic methodology to construct emulators for costly RTMs. The Automatic Gaussian Process Emulator (AGAPE) methodology combines the interpolation capabilities of GPs with the accurate design of an acquisition function that favours sampling in low-density regions and flatness of the interpolation function. We illustrate the capabilities of our emulators on toy examples, on the leaf- and canopy-level PROSPECT and PROSAIL RTMs, and in the construction of an optimal look-up table for atmospheric correction based on MODTRAN5.
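
    To make the emulation idea concrete, the following toy sketch fits a Gaussian process to a few runs of a stand-in "expensive" model and sequentially adds points where the GP is most uncertain. This simplified acquisition ignores the density and flatness terms of the actual AGAPE criterion:

```python
# A toy emulator sketch, not the AGAPE implementation: a Gaussian process
# approximates an "expensive" model, and new sample points are chosen
# where the GP's predictive standard deviation is largest.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_model(x):                 # stand-in for an RTM run
    return np.sin(3 * x) + 0.5 * x

X_train = np.array([[0.0], [0.7], [2.0]])
y_train = expensive_model(X_train).ravel()
X_grid = np.linspace(0, 3, 300).reshape(-1, 1)

for _ in range(5):                      # sequential design loop
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  normalize_y=True).fit(X_train, y_train)
    mean, std = gp.predict(X_grid, return_std=True)
    x_new = X_grid[np.argmax(std)]      # most uncertain location
    X_train = np.vstack([X_train, x_new])
    y_train = np.append(y_train, expensive_model(x_new)[0])

print(f"final design size: {len(X_train)} points")
```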

  9. Using Perturbed Physics Ensembles and Machine Learning to Select Parameters for Reducing Regional Biases in a Global Climate Model

    NASA Astrophysics Data System (ADS)

    Li, S.; Rupp, D. E.; Hawkins, L.; Mote, P.; McNeall, D. J.; Sarah, S.; Wallom, D.; Betts, R. A.

    2017-12-01

    This study investigates the potential to reduce known summer hot/dry biases over the Pacific Northwest in the UK Met Office's atmospheric model (HadAM3P) by simultaneously varying multiple model parameters. The bias-reduction process proceeds in a series of steps: 1) generation of a perturbed physics ensemble (PPE) through the volunteer computing network weather@home; 2) using machine learning to train "cheap" and fast statistical emulators of the climate model, to rule out regions of parameter space that lead to model variants that do not satisfy observational constraints, where the observational constraints (e.g., top-of-atmosphere energy flux, magnitude of the annual temperature cycle, summer/winter temperature and precipitation) are introduced sequentially; 3) designing a new PPE by "pre-filtering" using the emulator results. Steps 1) through 3) are repeated until results are considered satisfactory (three times in our case). The process includes a sensitivity analysis to find the dominant parameters for various model output metrics, which reduces the number of parameters to be perturbed with each new PPE. Relative to observational uncertainty, we achieve regional improvements without introducing large biases in other parts of the globe. Our results illustrate the potential of using machine learning to train cheap and fast statistical emulators of a climate model, in combination with PPEs, for systematic model improvement.
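
    A schematic sketch of the emulator-based pre-filtering in steps 2)-3); the parameters, output metric, and constraint values are synthetic placeholders, not HadAM3P quantities:

```python
# An emulator trained on a perturbed-physics ensemble is used to
# pre-filter candidate parameter sets against an observational constraint
# before any expensive model runs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Stage 1: existing PPE -- parameter vectors and a model output metric
# (e.g., top-of-atmosphere energy flux); values here are synthetic.
params = rng.uniform(0, 1, size=(300, 5))
toa_flux = params @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(0, 0.1, 300)

# Stage 2: cheap statistical emulator of the model's output metric
emulator = RandomForestRegressor(n_estimators=200, random_state=0)
emulator.fit(params, toa_flux)

# Stage 3: pre-filter a large candidate set for the next PPE
candidates = rng.uniform(0, 1, size=(10_000, 5))
predicted = emulator.predict(candidates)
observed, tolerance = 1.5, 0.2          # hypothetical constraint
keep = candidates[np.abs(predicted - observed) < tolerance]
print(f"{len(keep)} of {len(candidates)} candidates pass the constraint")
```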

  10. Forecasting Solar Flares Using Magnetogram-based Predictors and Machine Learning

    NASA Astrophysics Data System (ADS)

    Florios, Kostas; Kontogiannis, Ioannis; Park, Sung-Hong; Guerra, Jordan A.; Benvenuto, Federico; Bloomfield, D. Shaun; Georgoulis, Manolis K.

    2018-02-01

    We propose a forecasting approach for solar flares based on data from Solar Cycle 24, taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) mission. In particular, we use the Space-weather HMI Active Region Patches (SHARP) product, which provides cut-out magnetograms of solar active regions (ARs) in near-real time (NRT), taken over a five-year interval (2012-2016). Our approach utilizes a set of thirteen predictors, not included in the SHARP metadata, extracted from line-of-sight and vector photospheric magnetograms. We exploit several machine learning (ML) and conventional statistics techniques to predict flares of peak magnitude >M1 and >C1 within a 24 h forecast window. The ML methods used are multi-layer perceptrons (MLP), support vector machines (SVM), and random forests (RF). We conclude that random forests could be the prediction technique of choice for our sample, with the second-best method being multi-layer perceptrons, subject to an entropy objective function. A Monte Carlo simulation showed that the best-performing method gives accuracy ACC=0.93(0.00), true skill statistic TSS=0.74(0.02), and Heidke skill score HSS=0.49(0.01) for >M1 flare prediction with a probability threshold of 15%, and ACC=0.84(0.00), TSS=0.60(0.01), and HSS=0.59(0.01) for >C1 flare prediction with a probability threshold of 35%.
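
    The two skill scores quoted are standard functions of the 2x2 contingency table of forecast versus observed flares. A small sketch with a hypothetical tally (not the paper's numbers):

```python
# Skill-score computation as standardly defined (not the authors' code):
# TSS = POD - POFD, and the Heidke skill score, from a 2x2 contingency
# table with hits a, false alarms b, misses c, correct negatives d.
def tss_hss(hits, false_alarms, misses, correct_negatives):
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    tss = a / (a + c) - b / (b + d)
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return tss, hss

# Hypothetical 24 h >M1 forecast tally
tss, hss = tss_hss(hits=40, false_alarms=25, misses=10, correct_negatives=900)
print(f"TSS = {tss:.2f}, HSS = {hss:.2f}")
```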

  11. Calibrating random forests for probability estimation.

    PubMed

    Dankowski, Theresa; Ziegler, Andreas

    2016-09-30

    Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method was proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated into logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method; it also performed better on the stroke data. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
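
    For orientation, Elkan's updating method reduces to the well-known prior-correction formula (Elkan, 2001). The sketch below applies it with hypothetical base rates; it is not the authors' logistic-regression re-calibration:

```python
# Adjust a predicted probability p, from a model trained at base rate b0,
# to a setting with a different outcome prevalence b. Applicable to any
# "probability machine" under Elkan's assumptions.
def elkan_update(p, base_rate_old, base_rate_new):
    b0, b = base_rate_old, base_rate_new
    return b * (p - p * b0) / (b0 - p * b0 + p * b - b * b0)

# Model trained where 20% of patients had the outcome; the new center
# sees 35%. A raw prediction of 0.30 is revised upward accordingly.
print(f"{elkan_update(0.30, 0.20, 0.35):.3f}")   # ~0.48
```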

  12. Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine.

    PubMed

    Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W

    2006-03-01

    Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.
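
    As a loose illustration of motif-based classification only: scikit-learn has no Relevance Vector Machine, so a sparse (L1-penalized) logistic regression stands in for the idea of selecting a small set of discriminative promoter motifs. The sequences, motif vocabulary, and labels below are invented:

```python
# Toy motif-presence classification: which "motifs" discriminate
# up-regulated from down-regulated promoters.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

promoters = ["ACGTGGC ACGT CACGTG", "TTTTGGC ACGT GATAAG",
             "CACGTG GGCCC ACGTGGC", "GATAAG TTTTGGC GGCCC"]
up_regulated = np.array([1, 0, 1, 0])      # toy labels: glucose up/down

vec = CountVectorizer(lowercase=False)     # motif-count features
X = vec.fit_transform(promoters)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, up_regulated)

# Motifs with non-zero weight act as the discriminative features
for motif, w in zip(vec.get_feature_names_out(), clf.coef_[0]):
    if abs(w) > 1e-6:
        print(f"{motif}: {w:+.2f}")
```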

  13. A data-driven multi-model methodology with deep feature selection for short-term wind forecasting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias

    With growing wind penetration into power systems worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first-layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by the first-layer models and to generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. The developed multi-model wind forecasting methodology is compared to several benchmarks and evaluated on 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation network. Numerical results show that, compared to single-algorithm models, the developed multi-model framework with the deep feature selection procedure improves forecasting accuracy by up to 30%.
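
    A simplified sketch of the two-layer idea using scikit-learn's stacking; the first-layer learners, blender, and data are assumptions, not the paper's models:

```python
# Two-layer ensemble: several first-layer learners produce individual
# forecasts, and a second-layer "blender" combines them.
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 6))          # e.g., lagged wind-speed features
y = X[:, 0] + 0.5 * np.sin(X[:, 1]) + rng.normal(0, 0.1, 500)

first_layer = [("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
               ("svr", SVR()),
               ("knn", KNeighborsRegressor())]
blender = Ridge()                      # second-layer blending algorithm

model = StackingRegressor(estimators=first_layer, final_estimator=blender)
model.fit(X[:400], y[:400])
print(f"R^2 on held-out hours: {model.score(X[400:], y[400:]):.2f}")
```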

  14. The SED Machine: a dedicated transient IFU spectrograph

    NASA Astrophysics Data System (ADS)

    Ben-Ami, Sagi; Konidaris, Nick; Quimby, Robert; Davis, Jack T.; Ngeow, Chow Choong; Ritter, Andreas; Rudy, Alexander

    2012-09-01

    The Spectral Energy Distribution (SED) Machine is an Integral Field Unit (IFU) spectrograph designed specifically to classify transients. It comprises two subsystems. A lenslet-based IFU with a 26" × 26" Field of View (FoV) and ˜0.75" spaxels feeds a constant-resolution (R˜100) triple prism; the dispersed rays are then imaged onto an off-the-shelf CCD detector. The second subsystem, the Rainbow Camera (RC), is a 4-band seeing-limited imager with a 12.5' × 12.5' FoV around the IFU that will allow real-time spectrophotometric calibrations with ˜5% accuracy. Data from both subsystems will be processed in real time using a dedicated reduction pipeline. The SED Machine will be mounted on the Palomar 60-inch robotic telescope (P60), covers a wavelength range of 370-920 nm at high throughput, and will classify transients from on-going and future surveys at a high rate. This will provide good statistics for common types of transients and a better ability to discover and study rare and exotic ones. We present the science cases, optical design, and data reduction strategy of the SED Machine. The SED Machine is currently being constructed at the California Institute of Technology and will be commissioned in the spring of 2013.

  15. Automatic vetting of planet candidates from ground based surveys: Machine learning with NGTS

    NASA Astrophysics Data System (ADS)

    Armstrong, David J.; Günther, Maximilian N.; McCormac, James; Smith, Alexis M. S.; Bayliss, Daniel; Bouchy, François; Burleigh, Matthew R.; Casewell, Sarah; Eigmüller, Philipp; Gillen, Edward; Goad, Michael R.; Hodgkin, Simon T.; Jenkins, James S.; Louden, Tom; Metrailler, Lionel; Pollacco, Don; Poppenhaeger, Katja; Queloz, Didier; Raynard, Liam; Rauer, Heike; Udry, Stéphane; Walker, Simon R.; Watson, Christopher A.; West, Richard G.; Wheatley, Peter J.

    2018-05-01

    State-of-the-art exoplanet transit surveys are producing ever-increasing quantities of data. Making the best use of this resource, whether in detecting interesting planetary systems or in determining accurate planetary population statistics, requires new automated methods. Here we describe a machine learning algorithm that forms an integral part of the pipeline for the NGTS transit survey, demonstrating the efficacy of machine learning in selecting planetary candidates from multi-night ground-based survey data. Our method uses a combination of random forests and self-organising maps to rank planetary candidates, achieving an AUC score of 97.6% in ranking 12368 injected planets against 27496 false positives in the NGTS data. We build on past examples by using injected transit signals to form a training set, a necessary development for applying similar methods to upcoming surveys. We also make the autovet code used to implement the algorithm publicly accessible. autovet is designed to perform machine-learned vetting of planetary candidates and can utilise a variety of methods. The apparent robustness of machine learning techniques, whether on space-based or the qualitatively different ground-based data, highlights their importance to future surveys such as TESS and PLATO and the need to better understand their advantages and pitfalls in an exoplanetary context.
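
    A schematic sketch of the ranking step, training a random forest on synthetic stand-ins for injected transits and false positives and scoring it by AUC; this is not the autovet pipeline itself:

```python
# Rank candidates by predicted planet probability and measure AUC
# against known injected-signal labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
# Synthetic stand-ins for per-candidate features (depth, duration, SNR, ...)
X_planets = rng.normal(loc=0.5, size=(2000, 8))   # injected transits
X_fps = rng.normal(loc=0.0, size=(4000, 8))       # false positives
X = np.vstack([X_planets, X_fps])
y = np.r_[np.ones(2000), np.zeros(4000)]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

scores = rf.predict_proba(X_te)[:, 1]             # ranking statistic
print(f"AUC = {roc_auc_score(y_te, scores):.3f}")
```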

  16. Study of the atmospheric conditions affecting infrared astronomical measurements at White Mountain, California

    NASA Technical Reports Server (NTRS)

    Field, G. B.

    1974-01-01

    Measurements are described of atmospheric conditions affecting astronomical observations at White Mountain, California. Measurements were made more than 1400 times, spaced over more than 170 days, at the Summit Laboratory and on a small number of days at the Barcroft Laboratory. The recorded quantities were ten-micron sky noise and precipitable water vapor, plus wet- and dry-bulb temperatures, wind speed and direction, brightness of the sky near the sun, fisheye-lens photographs of the sky, descriptions of cloud cover and other observable parameters, color photographs of air pollution, astronomical seeing, and occasional determinations of the visible-light brightness of the night sky. Measurements of some of these parameters have been made for over twenty years at the Barcroft and Crooked Creek Laboratories, and statistical analyses were made of them. These results and their interpretations are given. The bulk of the collected data are statistically analyzed, and the disposition of the detailed data is described. Most of the data are available in machine-readable form. A detailed discussion of the techniques proposed for operation at White Mountain is given, showing how to cope with the mountain and climatic problems.

  17. Empirical study of alginate impression materials by customized proportioning system

    PubMed Central

    2016-01-01

    PURPOSE Alginate mixers available in the market do not have an automatic proportioning unit. In this study, an automatic proportioning unit for the alginate mixer, together with controller software, was designed and produced. With this device, the proportioning operation could dispense alginate impression materials by weight. MATERIALS AND METHODS The coefficient of variation in the tested groups was compared with manual proportioning. Compression, tension, and tear tests were conducted to determine the mechanical properties of the alginate impression materials. The experimental data were statistically analyzed using one-way ANOVA and the Tukey test at the 0.05 level of significance. RESULTS No statistically significant differences in modulus of elasticity (P>0.3), tensile/compressive strength (P>0.3), resilience (P>0.2), strain at failure (P>0.4), or tear energy (P>0.7) of the alginate impression materials were seen. However, a decrease in the standard deviation of the tested groups was observed when the customized machine was used. To verify the efficiency of the system, powder and powder/water mixes were weighed, and a significant decrease in variability was observed. CONCLUSION It was possible to obtain more mechanically stable alginate impression materials by using the custom-made proportioning unit. PMID:27826387
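
    A minimal sketch of the stated analysis, one-way ANOVA followed by a Tukey test at the 0.05 level, using SciPy and statsmodels with placeholder tear-energy values:

```python
# One-way ANOVA plus Tukey HSD across three proportioning methods.
# The measurement values below are invented placeholders.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

manual  = np.array([1.10, 1.32, 0.95, 1.20, 1.41])
machine = np.array([1.15, 1.18, 1.12, 1.20, 1.16])
custom  = np.array([1.17, 1.19, 1.16, 1.18, 1.17])  # customized proportioning

F, p = f_oneway(manual, machine, custom)
print(f"one-way ANOVA: F = {F:.2f}, p = {p:.3f}")

values = np.concatenate([manual, machine, custom])
groups = ["manual"] * 5 + ["machine"] * 5 + ["custom"] * 5
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```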

  18. Machines that go "ping" may improve balance but may not improve mobility or reduce risk of falls: a systematic review.

    PubMed

    Dennett, Amy M; Taylor, Nicholas F

    2015-01-01

    To determine the effectiveness of computer-based electronic devices that provide feedback in improving mobility and balance and reducing falls. Randomized controlled trials were searched from the earliest available date to August 2013. Standardized mean differences were used to complete meta-analyses, with statistical heterogeneity described using the I-squared statistic. The GRADE approach was used to summarize the level of evidence for each completed meta-analysis. Risk of bias for individual trials was assessed with the Physiotherapy Evidence Database (PEDro) scale. Thirty trials were included. There was high-quality evidence that computerized devices can improve dynamic balance in people with a neurological condition compared with no therapy. There was low-to-moderate-quality evidence that computerized devices have no significant effect on mobility, falls efficacy, or falls risk in community-dwelling older adults and people with a neurological condition compared with physiotherapy. There is high-quality evidence that computerized devices that provide feedback may be useful in improving balance in people with neurological conditions compared with no therapy, but there is a lack of evidence supporting more meaningful changes in mobility and falls risk.
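
    For reference, the I-squared statistic mentioned is conventionally computed from Cochran's Q over the pooled trials (per Higgins & Thompson). A small sketch with invented effect sizes, not the review's data:

```python
# Fixed-effect pooling of standardized mean differences and the
# I-squared heterogeneity statistic: I^2 = max(0, (Q - df) / Q) * 100.
import numpy as np

smd = np.array([0.42, 0.31, 0.55, 0.10, 0.48])   # per-trial SMDs (invented)
var = np.array([0.02, 0.03, 0.04, 0.02, 0.05])   # their variances (invented)

w = 1 / var                                      # inverse-variance weights
pooled = np.sum(w * smd) / np.sum(w)
Q = np.sum(w * (smd - pooled) ** 2)              # Cochran's Q
df = len(smd) - 1
I2 = max(0.0, (Q - df) / Q) * 100
print(f"pooled SMD = {pooled:.2f}, Q = {Q:.2f}, I^2 = {I2:.0f}%")
```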

  19. Mining Twitter Data to Improve Detection of Schizophrenia

    PubMed Central

    McManus, Kimberly; Mallory, Emily K.; Goldfeder, Rachel L.; Haynes, Winston A.; Tatum, Jonathan D.

    2015-01-01

    Individuals who suffer from schizophrenia comprise 1 percent of the United States population and are four times more likely to die of suicide than the general US population. Identification of at-risk individuals with schizophrenia is challenging when they do not seek treatment. Microblogging platforms allow users to share their thoughts and emotions with the world in short snippets of text. In this work, we leveraged the large corpus of Twitter posts and machine-learning methodologies to detect individuals with schizophrenia. Using features from tweets such as emoticon use, posting time of day, and dictionary terms, we trained, built, and validated several machine learning models. Our support vector machine model achieved the best performance, with 92% precision and 71% recall on the held-out test set. Additionally, we built a web application that dynamically displays summary statistics between cohorts. This enables outreach to undiagnosed individuals, improved physician diagnoses, and destigmatization of schizophrenia. PMID:26306253
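
    An illustrative sketch of the classification step only: tweets mapped to TF-IDF features and a linear SVM scored by precision and recall. The text, labels, and feature choice are invented, not the study's features:

```python
# Toy tweet classification with a linear SVM, reporting precision/recall
# on a held-out split. All text and labels below are fabricated examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

tweets = ["cant sleep again thoughts racing", "great day at the beach",
          "they are watching me through the tv", "coffee then gym",
          "the voices wont stop tonight", "new episode was awesome"] * 50
labels = [1, 0, 1, 0, 1, 0] * 50       # 1 = self-identified cohort (toy)

X = TfidfVectorizer().fit_transform(tweets)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

clf = LinearSVC().fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"precision = {precision_score(y_te, pred):.2f}, "
      f"recall = {recall_score(y_te, pred):.2f}")
```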

  20. Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning

    NASA Astrophysics Data System (ADS)

    Rouet-Leduc, B.; Hulbert, C.; Ren, C. X.; Bolton, D. C.; Marone, C.; Johnson, P. A.

    2017-12-01

    Fault friction controls nearly all aspects of fault rupture, yet it can only be measured in the laboratory. Here we describe laboratory experiments in which acoustic emissions are recorded from the fault. We find that by applying a machine learning approach known as extreme gradient boosting trees to the continuous acoustic signal, the fault friction can be directly inferred, showing that instantaneous characteristics of the acoustic signal are a fingerprint of the frictional state. This machine-learning-based inference leads to a simple law that links the acoustic signal to the frictional state and holds for every stress cycle the laboratory fault goes through. The approach uses no measured parameter other than instantaneous statistics of the acoustic signal. This finding may have importance for inferring frictional characteristics from seismic waves in the Earth, where fault friction cannot be measured.
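
    A conceptual sketch of the inference: instantaneous statistics of a synthetic "acoustic" signal inside moving windows serve as features for a gradient-boosted regression of friction, with scikit-learn's booster standing in for the XGBoost trees named above:

```python
# Regress friction from window statistics of a synthetic acoustic signal
# whose variance is constructed to track the frictional state.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
t = np.linspace(0, 20, 4000)
friction = 0.6 + 0.1 * np.sin(0.8 * t)                 # slow stress cycling
acoustic = rng.normal(0, 0.2 + friction, size=t.size)  # amplitude tracks friction

win = 100
feats, targets = [], []
for i in range(0, t.size - win, win):
    seg = acoustic[i:i + win]
    feats.append([seg.var(), np.abs(seg).mean(),       # instantaneous statistics
                  np.percentile(seg, 95), seg.max() - seg.min()])
    targets.append(friction[i + win - 1])
feats, targets = np.array(feats), np.array(targets)

split = len(feats) // 2
gbr = GradientBoostingRegressor().fit(feats[:split], targets[:split])
print(f"R^2 on later cycles: {gbr.score(feats[split:], targets[split:]):.2f}")
```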
