Sample records for machine statistical analysis

  1. Information integration and diagnosis analysis of equipment status and production quality for machining process

    NASA Astrophysics Data System (ADS)

    Zan, Tao; Wang, Min; Hu, Jianzhong

    2010-12-01

    Machining status monitoring by multiple sensors can acquire and analyze machining process information to implement abnormality diagnosis and fault warning. Statistical quality control is normally used to distinguish abnormal fluctuations from normal ones through statistical methods. In this paper, the advantages and disadvantages of the two methods are compared, and the necessity and feasibility of their integration and fusion are introduced. An approach is then proposed that integrates multi-sensor status monitoring and statistical process control on the basis of artificial intelligence, internet, and database techniques. Using virtual instrument techniques, the authors developed a machining quality assurance system, MoniSysOnline, which has been used to monitor the grinding process. By analyzing the quality data and the acoustic emission (AE) signal of the wheel-dressing process, the cause of machining quality fluctuation was identified. The experimental results indicate that the approach is suitable for monitoring and analyzing the machining process.

  2. Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

    PubMed Central

    Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter D. R.; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A.; Waljee, Akbar K.

    2015-01-01

    Background: Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology that may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods: We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis, and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results: After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95% CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95% CI 0.60-0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95% CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion: Machine learning algorithms improve the accuracy of risk stratification for patients with cirrhosis and can be used to accurately identify patients at high risk of developing HCC. PMID:24169273
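
    The net reclassification improvement statistic used above can be computed directly from two models' predicted risks. Below is a minimal sketch on synthetic data, assuming a single 10% risk threshold; the function name, threshold, and data are illustrative, not taken from the study.

```python
# Hypothetical sketch of the two-category net reclassification improvement
# (NRI) statistic for comparing two risk models, on synthetic data.
import numpy as np

def nri(risk_old, risk_new, events, threshold=0.1):
    """Reward upward reclassification of events and downward
    reclassification of non-events across a risk threshold."""
    up = (risk_new >= threshold) & (risk_old < threshold)
    down = (risk_new < threshold) & (risk_old >= threshold)
    ev, nev = events == 1, events == 0
    return (up[ev].mean() - down[ev].mean()) + (down[nev].mean() - up[nev].mean())

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.1, 1000)                     # outcome: 10% event rate
old = np.clip(0.1 + 0.1 * rng.normal(size=1000) + 0.05 * y, 0, 1)
new = np.clip(0.1 + 0.1 * rng.normal(size=1000) + 0.10 * y, 0, 1)
print(f"NRI = {nri(old, new, y):.3f}")             # > 0 favors the new model
```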

  3. Machine learning for neuroimaging with scikit-learn.

    PubMed

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
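
    As a flavor of the supervised decoding setting described, here is a minimal scikit-learn sketch that relates simulated "voxel" patterns to two experimental conditions with a cross-validated linear SVM; the data shapes and signal strength are invented for illustration.

```python
# Minimal decoding sketch: random feature vectors stand in for voxel patterns.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(80, 5000))      # 80 scans x 5000 voxels
y = rng.integers(0, 2, size=80)      # two experimental conditions
X[y == 1, :50] += 0.5                # weak signal in 50 voxels

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```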

  4. Machine learning for neuroimaging with scikit-learn

    PubMed Central

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain. PMID:24600388

  5. Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

    NASA Astrophysics Data System (ADS)

    Huang, Haiping

    2017-05-01

    Revealing hidden features in unlabeled data is called unsupervised feature learning, which plays an important role in pretraining deep neural networks. Here we provide a statistical mechanics analysis of unsupervised learning in a restricted Boltzmann machine with binary synapses. A message-passing equation to infer the hidden feature is derived, and variants of this equation are analyzed. A statistical analysis by replica theory describes the thermodynamic properties of the model. Our analysis confirms an entropy crisis preceding the non-convergence of the message-passing equation, suggesting a discontinuous phase transition as a key characteristic of the restricted Boltzmann machine. A continuous phase transition is also observed, depending on the strength of the feature embedded in the data. The mean-field result under the replica-symmetric assumption agrees with that obtained by running message-passing algorithms on single instances of finite size. Interestingly, in an approximate Hopfield model, the entropy crisis is absent, and a continuous phase transition is observed instead. We also develop an iterative equation to infer the hyperparameter (temperature) hidden in the data, which in physics corresponds to iteratively imposing the Nishimori condition. Our study provides insights into the thermodynamic properties of restricted Boltzmann machine learning, and an important theoretical basis for building simplified deep networks.
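
    For readers unfamiliar with the model, the sketch below samples a small restricted Boltzmann machine with binary (+1/-1) synapses by block Gibbs sampling; it illustrates the object being analyzed, not the paper's message-passing or replica calculations, and all sizes and the temperature are arbitrary choices.

```python
# Block Gibbs sampling in a toy RBM with binary (+1/-1) synapses.
import numpy as np

rng = np.random.default_rng(1)
N, M, beta = 100, 20, 1.0                      # visible units, hidden units, 1/T
W = rng.choice([-1.0, 1.0], size=(N, M))       # binary synapses

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v = rng.choice([-1.0, 1.0], size=N)
for _ in range(200):                           # alternating (block) Gibbs sweeps
    # for +/-1 units, P(unit = +1 | field) = sigmoid(2 * beta * field / sqrt(N))
    h = np.where(rng.random(M) < sigmoid(2 * beta * v @ W / np.sqrt(N)), 1.0, -1.0)
    v = np.where(rng.random(N) < sigmoid(2 * beta * W @ h / np.sqrt(N)), 1.0, -1.0)

print("sample magnetization:", v.mean())
```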

  6. Performance Analysis of Millimeter-Wave Multi-hop Machine-to-Machine Networks Based on Hop Distance Statistics

    PubMed Central

    Jung, Haejoon; Lee, In-Ho

    2018-01-01

    As an intrinsic part of the Internet of Things (IoT) ecosystem, machine-to-machine (M2M) communications are expected to provide ubiquitous connectivity between machines. Millimeter-wave (mmWave) communication is another promising technology for future communication systems, alleviating the pressure of scarce spectrum resources. For this reason, in this paper we consider multi-hop M2M communications, where a machine-type communication (MTC) device with limited transmit power relays to help other devices using mmWave. Specifically, we focus on hop distance statistics and their impact on system performance in multi-hop wireless networks (MWNs) with directional antenna arrays in mmWave for M2M communications. Unlike in microwave systems, in mmWave communications the wireless channel suffers from blockage by obstacles that heavily attenuate line-of-sight signals, which may result in limited per-hop progress in MWNs. We consider two routing strategies aimed at different types of applications and derive the probability distributions of their hop distances. Moreover, we provide their baseline statistics assuming a blockage-free scenario to quantify the impact of blockages. Based on the hop distance analysis, we propose a method to estimate the end-to-end performance (e.g., outage probability, hop count, and transmit energy) of mmWave MWNs, which provides important insights into mmWave MWN design without time-consuming and repetitive end-to-end simulation. PMID:29329248

  7. Performance Analysis of Millimeter-Wave Multi-hop Machine-to-Machine Networks Based on Hop Distance Statistics.

    PubMed

    Jung, Haejoon; Lee, In-Ho

    2018-01-12

    As an intrinsic part of the Internet of Things (IoT) ecosystem, machine-to-machine (M2M) communications are expected to provide ubiquitous connectivity between machines. Millimeter-wave (mmWave) communication is another promising technology for future communication systems, alleviating the pressure of scarce spectrum resources. For this reason, in this paper we consider multi-hop M2M communications, where a machine-type communication (MTC) device with limited transmit power relays to help other devices using mmWave. Specifically, we focus on hop distance statistics and their impact on system performance in multi-hop wireless networks (MWNs) with directional antenna arrays in mmWave for M2M communications. Unlike in microwave systems, in mmWave communications the wireless channel suffers from blockage by obstacles that heavily attenuate line-of-sight signals, which may result in limited per-hop progress in MWNs. We consider two routing strategies aimed at different types of applications and derive the probability distributions of their hop distances. Moreover, we provide their baseline statistics assuming a blockage-free scenario to quantify the impact of blockages. Based on the hop distance analysis, we propose a method to estimate the end-to-end performance (e.g., outage probability, hop count, and transmit energy) of mmWave MWNs, which provides important insights into mmWave MWN design without time-consuming and repetitive end-to-end simulation.
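
    The per-hop progress analysis can be mimicked with a toy Monte Carlo experiment, assuming the common exponential line-of-sight model P_LOS(d) = exp(-beta*d) and a farthest-LOS-relay routing rule; the parameter values below are illustrative assumptions, not the paper's.

```python
# Toy Monte Carlo of per-hop progress under blockage in a mmWave MWN.
import numpy as np

rng = np.random.default_rng(7)
beta, n_trials = 1 / 50.0, 100_000                       # blockage density ~ 1/50 m^-1
candidates = rng.uniform(0, 200, size=(n_trials, 30))    # candidate relay distances (m)

los = rng.random(candidates.shape) < np.exp(-beta * candidates)
far = np.where(los, candidates, 0.0).max(axis=1)         # farthest LOS relay per trial
outage = (far == 0).mean()                               # no LOS relay at all

print(f"mean hop distance: {far[far > 0].mean():.1f} m, outage prob: {outage:.3f}")
```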

  8. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    PubMed

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
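
    The study's data are not public, so the sketch below only re-creates the flavor of the comparison with scikit-learn: a decision tree versus logistic regression on a synthetic stand-in for curriculum scores and pass/fail outcomes.

```python
# Classifier comparison sketch on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)   # stand-in for curriculum scores / pass-fail
for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("decision tree", DecisionTreeClassifier(max_depth=3, random_state=0))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f} cross-validated accuracy")
```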

  9. Objective research of auscultation signals in Traditional Chinese Medicine based on wavelet packet energy and support vector machine.

    PubMed

    Yan, Jianjun; Shen, Xiaojing; Wang, Yiqin; Li, Fufeng; Xia, Chunming; Guo, Rui; Chen, Chunfeng; Shen, Qingwei

    2010-01-01

    This study utilises the Wavelet Packet Transform (WPT) and the Support Vector Machine (SVM) algorithm to perform objective analysis and quantitative research on auscultation in Traditional Chinese Medicine (TCM) diagnosis. First, Wavelet Packet Decomposition (WPD) at level 6 was employed to split the auscultation signals into finer frequency bands. Then, statistical analysis was performed on the Wavelet Packet Energy (WPE) features extracted from the WPD coefficients. Furthermore, pattern recognition with SVM was used to distinguish the statistical feature values of the mixed subjects' sample groups. Finally, the experimental results showed that the classification accuracies were at a high level.
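
    A minimal sketch of the described pipeline, assuming PyWavelets and scikit-learn are available: level-6 wavelet packet decomposition, normalized wavelet packet energy per frequency band, then SVM classification. The signals here are synthetic.

```python
# Wavelet packet energy features + SVM classification on synthetic signals.
import numpy as np
import pywt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def wpe_features(signal, wavelet="db4", level=6):
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    energies = np.array([np.sum(node.data ** 2)
                         for node in wp.get_level(level, order="freq")])
    return energies / energies.sum()          # normalized energy per frequency band

rng = np.random.default_rng(3)
signals = rng.normal(size=(60, 1024))
labels = np.repeat([0, 1], 30)
signals[labels == 1] += 0.5 * np.sin(np.linspace(0, 60 * np.pi, 1024))  # class difference

X = np.vstack([wpe_features(s) for s in signals])
print("accuracy:", cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean())
```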

  10. A MOOC on Approaches to Machine Translation

    ERIC Educational Resources Information Center

    Costa-jussà, Marta R.; Formiga, Lluís; Torrillas, Oriol; Petit, Jordi; Fonollosa, José A. R.

    2015-01-01

    This paper describes the design, development, and analysis of a MOOC entitled "Approaches to Machine Translation: Rule-based, statistical and hybrid", and provides lessons learned and conclusions to be taken into account in the future. The course was developed within the Canvas platform, used by recognized European universities. It…

  11. Statistical Machine Learning for Structured and High Dimensional Data

    DTIC Science & Technology

    2014-09-17

    AFRL-OSR-VA-TR-2014-0234. Statistical Machine Learning for Structured and High Dimensional Data. Larry Wasserman, Carnegie Mellon University. Final report, Dec 2009 - Aug 2014. Research in the area of resource-constrained statistical estimation. Subject terms: machine learning, high-dimensional statistics.

  12. Statistical Learning Analysis in Neuroscience: Aiming for Transparency

    PubMed Central

    Hanke, Michael; Halchenko, Yaroslav O.; Haxby, James V.; Pollmann, Stefan

    2009-01-01

    Encouraged by a rise of reciprocal interest between the machine learning and neuroscience communities, several recent studies have demonstrated the explanatory power of statistical learning techniques for the analysis of neural data. In order to facilitate a wider adoption of these methods, neuroscientific research needs to ensure a maximum of transparency to allow for comprehensive evaluation of the employed procedures. We argue that such transparency requires “neuroscience-aware” technology for the performance of multivariate pattern analyses of neural data that can be documented in a comprehensive, yet comprehensible way. Recently, we introduced PyMVPA, a specialized Python framework for machine learning based data analysis that addresses this demand. Here, we review its features and applicability to various neural data modalities. PMID:20582270

  13. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning high-dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging-based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier's decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is far less conservative than weight-based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging-based classification. PMID:26210913

  14. Noise induced hearing loss of forest workers in Turkey.

    PubMed

    Tunay, M; Melemez, K

    2008-09-01

    In this study, a total of 114 workers in 3 different groups in terms of age and work underwent audiometric analysis. Analysis of variance was applied to the evaluation data to determine whether there was a statistically significant difference between the hearing loss levels of the workers included in the study. Correlation and regression analyses were applied to determine the relations between hearing loss and the workers' age and time of work. The variance analysis found statistically significant differences at the 500, 2000 and 4000 Hz frequencies. The most pronounced difference was observed among chainsaw machine operators at the 4000 Hz frequency. The correlation analysis found significant relations between time of work and hearing loss at the 0.01 significance level and between age and hearing loss at the 0.05 significance level. Forest workers using chainsaw machines should be informed, they should wear or use protective materials, less noisy chainsaw machines should be used if possible, and workers should undergo audiometric tests when they start work and once a year.
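
    The tests named in this abstract are standard; a short SciPy sketch on made-up hearing-threshold data shows the one-way ANOVA across three worker groups and the correlation of hearing loss with time of work.

```python
# One-way ANOVA and correlation on invented hearing-threshold data (dB at 4000 Hz).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
operators = rng.normal(35, 8, 40)     # chainsaw machine operators
loaders = rng.normal(28, 8, 40)
office = rng.normal(22, 8, 34)

f, p = stats.f_oneway(operators, loaders, office)     # one-way ANOVA
print(f"ANOVA: F={f:.2f}, p={p:.4f}")

years = rng.uniform(1, 25, 40)
loss = 20 + 0.8 * years + rng.normal(0, 5, 40)
r, p = stats.pearsonr(years, loss)                    # time-of-work vs hearing loss
print(f"correlation: r={r:.2f}, p={p:.4f}")
```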

  15. Anomaly detection for machine learning redshifts applied to SDSS galaxies

    NASA Astrophysics Data System (ADS)

    Hoyle, Ben; Rau, Markus Michael; Paech, Kerstin; Bonnett, Christopher; Seitz, Stella; Weller, Jochen

    2015-10-01

    We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million `clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 `anomalous' galaxies with spectroscopic redshift measurements which are flagged as unreliable. We contaminate the clean base galaxy sample with galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and on the preprocessed `anomaly-removed' sample and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement on all measured statistics of up to 80 per cent when training on the anomaly-removed sample as compared with training on the contaminated sample for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample.
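
    The anomaly-removal step maps directly onto scikit-learn's EllipticEnvelope; the sketch below applies it to synthetic "photometric" features with an assumed contamination fraction, keeping only inliers for training.

```python
# Anomaly removal with the Elliptical Envelope technique on synthetic features.
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(12)
clean = rng.normal(0, 1, size=(2000, 5))            # 5 "photometric" features
anomalous = rng.normal(0, 4, size=(60, 5))          # poorly measured galaxies
X = np.vstack([clean, anomalous])

detector = EllipticEnvelope(contamination=0.03, random_state=0)
keep = detector.fit_predict(X) == 1                 # +1 inlier, -1 outlier
print(f"kept {keep.sum()} of {len(X)} objects for training")
```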

  16. Travelogue--a newcomer encounters statistics and the computer.

    PubMed

    Bruce, Peter

    2011-11-01

    Computer-intensive methods have revolutionized statistics, giving rise to new areas of analysis and expertise in predictive analytics, image processing, pattern recognition, machine learning, genomic analysis, and more. Interest naturally centers on the new capabilities the computer allows the analyst to bring to the table. This article, instead, focuses on the account of how computer-based resampling methods, with their relative simplicity and transparency, enticed one individual, untutored in statistics or mathematics, on a long journey into learning statistics, then teaching it, then starting an education institution.

  17. MSUSTAT.

    ERIC Educational Resources Information Center

    Mauriello, David

    1984-01-01

    Reviews an interactive statistical analysis package (designed to run on 8- and 16-bit machines that utilize CP/M 80 and MS-DOS operating systems), considering its features and uses, documentation, operation, and performance. The package consists of 40 general purpose statistical procedures derived from the classic textbook "Statistical…

  18. Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

    PubMed Central

    Belo, David; Gamboa, Hugo

    2017-01-01

    The paper presents an accuracy analysis of machine learning approaches applied to cardiac activity data. The study evaluates the possibility of diagnosing arterial hypertension by means of short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from arterial hypertension of degree II-III. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with radial basis, decision trees, and naive Bayes classifier. Moreover, different methods of feature extraction were analyzed: statistical, spectral, wavelet, and multifractal. In all, 53 features were investigated. The results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of searching for a noncorrelated feature set achieved better results than a feature set based on the principal components. PMID:28831239

  19. Analysis of Machine Learning Techniques for Heart Failure Readmissions.

    PubMed

    Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M

    2016-11-01

    The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
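
    The trial data are not public, so this hedged sketch reproduces only the shape of the comparison: C statistics (ROC AUC) for a random forest versus logistic regression on synthetic, imbalanced data.

```python
# C statistic (ROC AUC) comparison: random forest vs. logistic regression.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1200, n_features=30, n_informative=10,
                           weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

for name, clf in [("logistic regression", LogisticRegression(max_iter=2000)),
                  ("random forest", RandomForestClassifier(n_estimators=300,
                                                           random_state=1))]:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: C statistic = {auc:.3f}")
```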

  20. AstroML: Python-powered Machine Learning for Astronomy

    NASA Astrophysics Data System (ADS)

    Vander Plas, Jake; Connolly, A. J.; Ivezic, Z.

    2014-01-01

    As astronomical data sets grow in size and complexity, automated machine learning and data mining methods are becoming an increasingly fundamental component of research in the field. The astroML project (http://astroML.org) provides a common repository for practical examples of the data mining and machine learning tools used and developed by astronomical researchers, written in Python. The astroML module contains a host of general-purpose data analysis and machine learning routines, loaders for openly-available astronomical datasets, and fast implementations of specific computational methods often used in astronomy and astrophysics. The associated website features hundreds of examples of these routines being used for analysis of real astronomical datasets, while the associated textbook provides a curriculum resource for graduate-level courses focusing on practical statistics, machine learning, and data mining approaches within Astronomical research. This poster will highlight several of the more powerful and unique examples of analysis performed with astroML, all of which can be reproduced in their entirety on any computer with the proper packages installed.

  1. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

    DTIC Science & Technology

    2015-09-12

    AFRL-AFOSR-VA-TR-2015-0278. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models. Katya Scheinberg. Final report, grant FA9550-11-1-0239. Subject terms: optimization, derivative-free optimization, statistical machine learning.

  2. Machine learning patterns for neuroimaging-genetic studies in the cloud.

    PubMed

    Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand

    2014-01-01

    Brain imaging is a natural intermediate phenotype for understanding the link between genetic information and behavior or risk factors for brain pathologies. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we designed a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a 2-week deployment on hundreds of virtual machines.

  3. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning.

    PubMed

    Formisano, Elia; De Martino, Federico; Valente, Giancarlo

    2008-09-01

    Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow detecting subtle, non-strictly localized effects that may remain invisible to the conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are respectively suited for the classification and regression of fMRI patterns. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.

  4. Principle of maximum entropy for reliability analysis in the design of machine components

    NASA Astrophysics Data System (ADS)

    Zhang, Yimin

    2018-03-01

    We studied the reliability of machine components with parameters that follow an arbitrary statistical distribution using the principle of maximum entropy (PME). We used PME to select the statistical distribution that best fits the available information. We also established a probability density function (PDF) and a failure probability model for the parameters of mechanical components using the concept of entropy and the PME. We obtained the first four moments of the state function for reliability analysis and design. Furthermore, we attained an estimate of the PDF with the fewest human bias factors using the PME. This function was used to calculate the reliability of the machine components, including a connecting rod, a vehicle half-shaft, a front axle, a rear axle housing, and a leaf spring, which have parameters that typically follow a non-normal distribution. Simulations were conducted for comparison. This study provides a design methodology for the reliability of mechanical components for practical engineering projects.
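
    The paper derives the first four moments of the state function analytically; the sketch below checks the same quantities by Monte Carlo for a simple strength-minus-stress state function, with arbitrarily chosen non-normal distributions standing in for real component parameters.

```python
# Monte Carlo estimate of the first four moments and failure probability
# for a state function g = strength - stress; distributions are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
strength = rng.lognormal(mean=6.0, sigma=0.08, size=1_000_000)   # non-normal
stress = rng.gumbel(loc=330, scale=18, size=1_000_000)           # non-normal
g = strength - stress                                            # state function

moments = [g.mean(), g.std(), stats.skew(g), stats.kurtosis(g)]
print("mean, std, skewness, kurtosis:", np.round(moments, 3))
print("failure probability P(g < 0):", (g < 0).mean())
```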

  5. On the Application of Syntactic Methodologies in Automatic Text Analysis.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1990-01-01

    Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…

  6. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science).

    PubMed

    Zeng, Irene Sui Lan; Lumley, Thomas

    2018-01-01

    Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science, which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than features, and Bayesian approaches when there is prior knowledge to be integrated, are also included in the commentary. For completeness of the review, a table of currently available software and packages from 23 publications for omics is summarized in the appendix.
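
    The penalization-for-sparsity point can be made concrete with a lasso fit where features far outnumber observations, as in many omics data sets; the sketch below uses scikit-learn's LassoCV on synthetic data with five true signals.

```python
# Lasso with cross-validated penalty when p >> n (synthetic omics-like data).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p = 80, 2000                          # many fewer samples than features
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2, -1.5, 1, 0.8, -0.6]       # only 5 true signals
y = X @ beta + rng.normal(0, 0.5, n)

model = LassoCV(cv=5).fit(X, y)
print("nonzero coefficients:", np.sum(model.coef_ != 0))
print("true signals recovered:", np.flatnonzero(model.coef_[:5]).size)
```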

  7. Advances in Machine Learning and Data Mining for Astronomy

    NASA Astrophysics Data System (ADS)

    Way, Michael J.; Scargle, Jeffrey D.; Ali, Kamal M.; Srivastava, Ashok N.

    2012-03-01

    Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

  8. Machine Learning Methods for Production Cases Analysis

    NASA Astrophysics Data System (ADS)

    Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.

    2018-03-01

    An approach to the analysis of events occurring during the production process is proposed. The described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing the applied models. k-Nearest Neighbors and Random Forest methods were used to illustrate and analyze the proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall, and accuracy.

  9. Study of the Effect of Lubricant Emulsion Percentage and Tool Material on Surface Roughness in Machining of EN-AC 48000 Alloy

    NASA Astrophysics Data System (ADS)

    Soltani, E.; Shahali, H.; Zarepour, H.

    2011-01-01

    In this paper, the effect of machining parameters, namely lubricant emulsion percentage and tool material, on surface roughness has been studied in the machining process of EN-AC 48000 aluminum alloy. EN-AC 48000 aluminum alloy is an important alloy in industry. Machining of this alloy is of vital importance due to built-up edge and tool wear. An L9 Taguchi standard orthogonal array has been applied as the experimental design to investigate the effect of the factors and their interaction. Nine machining tests have been carried out with three random replications, resulting in 27 experiments. Three types of cutting tools, including coated carbide (CD1810), uncoated carbide (H10), and polycrystalline diamond (CD10), have been used in this research. The emulsion percentage of the lubricant was selected at three levels: 3%, 5% and 10%. Statistical analysis has been employed to study the effect of the factors and their interactions using the ANOVA method. Moreover, the optimal factor levels have been obtained through signal-to-noise ratio (S/N) analysis. Also, a regression model has been provided to predict the surface roughness. Finally, the results of the confirmation tests have been presented to verify the adequacy of the predictive model. In this research, surface quality was improved by 9% using lubricant and statistical optimization methods.
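
    The signal-to-noise analysis referred to is the Taguchi smaller-the-better ratio; a short sketch computes it for invented surface-roughness replications under two hypothetical factor settings.

```python
# Taguchi smaller-the-better S/N ratio for surface roughness (invented Ra values).
import numpy as np

def sn_smaller_is_better(y):
    """S/N = -10 * log10(mean(y^2)); higher is better for roughness."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(y ** 2))

# three replications of Ra (um) for two hypothetical factor settings
ra_run1 = [0.82, 0.88, 0.85]   # e.g., PCD tool, 10% emulsion
ra_run2 = [1.25, 1.31, 1.22]   # e.g., uncoated carbide, 3% emulsion
for name, ra in [("run 1", ra_run1), ("run 2", ra_run2)]:
    print(f"{name}: S/N = {sn_smaller_is_better(ra):.2f} dB")
```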

  10. Data-driven advice for applying machine learning to bioinformatics problems

    PubMed Central

    Olson, Randal S.; La Cava, William; Mustahsan, Zairah; Varik, Akshay; Moore, Jason H.

    2017-01-01

    As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems. PMID:29218881
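
    The "algorithm tuning" whose effect the authors quantify is, in scikit-learn terms, a cross-validated hyperparameter search; the grid and dataset below are illustrative choices.

```python
# Hyperparameter tuning via cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
grid = {"n_estimators": [50, 200], "learning_rate": [0.05, 0.1], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print("best params:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```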

  11. Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

    NASA Astrophysics Data System (ADS)

    Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

    2010-10-01

    This study presents a new method for detecting cutting tool wear based on measured cutting force signals. A statistics-based method, the Integrated Kurtosis-based Algorithm for Z-Filter technique (I-kaz), was used to develop a regression model and a 3D graphic presentation of the I-kaz 3D coefficient during the machining process. The machining tests were carried out on a Colchester Master Tornado T4 CNC turning machine in dry cutting conditions. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from the machining operation were analyzed, each with its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear land (VB) was determined. A regression model was developed from this relationship, and its results show that the I-kaz 3D coefficient value decreases as tool wear increases. The result is then used for real-time tool wear monitoring.

  12. New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

    Treesearch

    L.R. Iverson; A.M. Prasad; A. Liaw

    2004-01-01

    More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...

  13. A Comparative Study of "Google Translate" Translations: An Error Analysis of English-to-Persian and Persian-to-English Translations

    ERIC Educational Resources Information Center

    Ghasemi, Hadis; Hashemian, Mahmood

    2016-01-01

    Both lack of time and the need to translate texts for numerous reasons brought about an increase in studying machine translation with a history spanning over 65 years. During the last decades, Google Translate, as a statistical machine translation (SMT), was in the center of attention for supporting 90 languages. Although there are many studies on…

  14. AstroML: "better, faster, cheaper" towards state-of-the-art data mining and machine learning

    NASA Astrophysics Data System (ADS)

    Ivezic, Zeljko; Connolly, Andrew J.; Vanderplas, Jacob

    2015-01-01

    We present AstroML, a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, matplotlib, and astropy, and distributed under an open license. AstroML contains a growing library of statistical and machine learning routines for analyzing astronomical data in Python, loaders for several open astronomical datasets (such as SDSS and other recent major surveys), and a large suite of examples of analyzing and visualizing astronomical datasets. AstroML is especially suitable for introducing undergraduate students to numerical research projects and for graduate students to rapidly undertake cutting-edge research. The long-term goal of astroML is to provide a community repository for fast Python implementations of common tools and routines used for statistical data analysis in astronomy and astrophysics (see http://www.astroml.org).

  15. Improved analyses using function datasets and statistical modeling

    Treesearch

    John S. Hogland; Nathaniel M. Anderson

    2014-01-01

    Raster modeling is an integral component of spatial analysis. However, conventional raster modeling techniques can require a substantial amount of processing time and storage space and have limited statistical functionality and machine learning algorithms. To address this issue, we developed a new modeling framework using C# and ArcObjects and integrated that framework...

  16. TU-FG-201-05: Varian MPC as a Statistical Process Control Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carver, A; Rowbottom, C

    Purpose: Quality assurance in radiotherapy requires the measurement of various machine parameters to ensure they remain within permitted values over time. In Truebeam release 2.0 the Machine Performance Check (MPC) was released, allowing beam output and machine axis movements to be assessed in a single test. We aim to evaluate the Varian Machine Performance Check (MPC) as a tool for Statistical Process Control (SPC). Methods: Varian's MPC tool was used on three Truebeam and one EDGE linac for a period of approximately one year. MPC was commissioned against independent systems. After this period the data were reviewed to determine whether or not the MPC was useful as a process control tool. Individual tests were analysed using Shewhart control plots, with Matlab used for the analysis. Principal component analysis was used to determine whether a multivariate model was of any benefit in analysing the data. Results: Control charts were found to be useful to detect beam output changes, worn T-nuts, and jaw calibration issues. Upper and lower control limits were defined at the 95% level. Multivariate SPC was performed using Principal Component Analysis. We found little evidence of clustering beyond that which might be naively expected, such as beam uniformity and beam output. While this makes multivariate analysis of little use, it suggests that each test gives independent information. Conclusion: The variety of independent parameters tested in MPC makes it a sensitive tool for routine machine QA. We have determined that using control charts in our QA programme would rapidly detect changes in machine performance. The use of control charts allows large quantities of tests to be performed on all linacs without visual inspection of all results. The use of control limits alerts users when data are inconsistent with previous measurements before they become out of specification. A. Carver has received a speaker's honorarium from Varian.
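
    A minimal individuals-chart sketch of the kind described, with 95% control limits (about +/-1.96 sigma) estimated from a baseline period and applied to simulated daily beam-output readings; none of the numbers are Varian data.

```python
# Shewhart-style individuals chart with 95% control limits on simulated data.
import numpy as np

rng = np.random.default_rng(4)
output = rng.normal(100.0, 0.3, 90)     # daily beam output (% of baseline)
output[60:] += 0.8                      # simulated drift after day 60

baseline = output[:30]                  # establish limits from early data
center, sigma = baseline.mean(), baseline.std(ddof=1)
ucl, lcl = center + 1.96 * sigma, center - 1.96 * sigma

flagged = np.flatnonzero((output > ucl) | (output < lcl))
print(f"limits: [{lcl:.2f}, {ucl:.2f}] -> out-of-control days: {flagged}")
```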

  17. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

    PubMed Central

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-01-01

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0. PMID:27892471
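
    A schematic two-step screen in the spirit of COMBI (the published implementation ships with the GWASpi toolbox 2.0; this sketch is not it): rank SNPs by linear-SVM weight magnitude, then run chi-squared association tests with a Bonferroni correction only on the retained subset. Genotypes and the causal SNP are simulated.

```python
# Two-step SNP screen: SVM-based selection, then corrected hypothesis tests.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.svm import LinearSVC

rng = np.random.default_rng(8)
n, p, k = 400, 1000, 30                                  # samples, SNPs, SNPs kept
X = rng.integers(0, 3, size=(n, p)).astype(float)        # genotypes 0/1/2
y = (X[:, 7] + rng.normal(0, 1.5, n) > 1.5).astype(int)  # SNP 7 is causal

w = np.abs(LinearSVC(C=0.1, max_iter=5000).fit(X, y).coef_.ravel())
candidates = np.argsort(w)[-k:]                          # step 1: SVM screening

alpha = 0.05 / k                                         # step 2: Bonferroni-corrected tests
for j in sorted(candidates):
    table = np.array([[np.sum((X[:, j] == g) & (y == c)) for g in (0, 1, 2)]
                      for c in (0, 1)]) + 1               # +1 avoids empty cells
    p_val = chi2_contingency(table)[1]
    if p_val < alpha:
        print(f"SNP {j}: significant (p = {p_val:.2e})")
```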

  18. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies.

    PubMed

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-11-28

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.

  19. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

    NASA Astrophysics Data System (ADS)

    Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert

    2016-11-01

    The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.

  20. Statistical downscaling of GCM simulations to streamflow using relevance vector machine

    NASA Astrophysics Data System (ADS)

    Ghosh, Subimal; Mujumdar, P. P.

    2008-01-01

    General circulation models (GCMs), the climate models often used in assessing the impact of climate change, operate on a coarse scale, and thus the simulation results obtained from GCMs are not particularly useful for hydrology at the comparatively smaller scale of a river basin. This article presents a statistical downscaling methodology based on sparse Bayesian learning and the Relevance Vector Machine (RVM) to model streamflow at the river basin scale for the monsoon period (June, July, August, September) using GCM-simulated climatic variables. NCEP/NCAR reanalysis data have been used to train the model to establish a statistical relationship between streamflow and climatic variables. The relationship thus obtained is used to project future streamflow from GCM simulations. The statistical methodology involves principal component analysis, fuzzy clustering, and RVM. Different kernel functions are used for comparison purposes. The model is applied to the Mahanadi river basin in India. The results obtained using RVM are compared with those of the state-of-the-art Support Vector Machine (SVM) to present the advantages of RVMs over SVMs. A decreasing trend is observed for the monsoon streamflow of the Mahanadi, owing to high surface warming in the future, under the CCSR/NIES GCM and B2 scenario.
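
    scikit-learn offers no RVM, so the sketch below substitutes ARDRegression, a related sparse Bayesian linear model, after PCA of hypothetical GCM-style predictor fields; it mirrors the PCA-plus-sparse-Bayesian-learning pipeline in outline only.

```python
# PCA + sparse Bayesian regression as a stand-in for the RVM downscaling pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ARDRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
climate = rng.normal(size=(480, 300))     # 40 years x 12 months, 300 grid cells
flow = climate[:, :3] @ np.array([50.0, -30.0, 20.0]) + rng.normal(0, 10, 480)

model = make_pipeline(PCA(n_components=10), ARDRegression())
model.fit(climate[:360], flow[:360])                # train on the earlier period
print("holdout R^2:", round(model.score(climate[360:], flow[360:]), 3))
```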

  1. Statistical Analysis of NAS Parallel Benchmarks and LINPACK Results

    NASA Technical Reports Server (NTRS)

    Meuer, Hans-Werner; Simon, Horst D.; Strohmeier, Erich; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    In the last three years extensive performance data have been reported for parallel machines, based both on the NAS Parallel Benchmarks and on LINPACK. In this study we have used the reported benchmark results and performed a number of statistical experiments using factor, cluster, and regression analyses. In addition to the performance results of LINPACK and the eight NAS parallel benchmarks, we have also included the peak performance of each machine and the LINPACK n and n(sub 1/2) values. Some of the results and observations can be summarized as follows: 1) All benchmarks are strongly correlated with peak performance. 2) LINPACK and EP each have a unique signature. 3) The remaining NPB can be grouped into three groups: (CG and IS), (LU and SP), and (MG, FT, and BT). Hence three (or four with EP) benchmarks are sufficient to characterize the overall NPB performance. Our poster presentation will follow a standard poster format, and will present the data of our statistical analysis in detail.
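
    The grouping of benchmarks by signature can be illustrated by hierarchically clustering benchmarks on the correlation of their performance across machines; the numbers below are fabricated stand-ins for the reported results.

```python
# Hierarchical clustering of benchmarks by cross-machine performance correlation.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(11)
base = rng.lognormal(0, 1, size=(20, 1))             # 20 machines' overall "speed"
perf = base * rng.uniform(0.5, 2.0, size=(20, 9))    # 9 benchmarks per machine
perf += rng.normal(0, 0.1, perf.shape)

corr = np.corrcoef(perf.T)                           # benchmark-by-benchmark correlation
dist = squareform(1 - corr, checks=False)            # distance = 1 - correlation
Z = linkage(dist, method="average")
print("cluster labels:", fcluster(Z, t=3, criterion="maxclust"))
```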

  2. Application of statistical machine translation to public health information: a feasibility study.

    PubMed

    Kirchhoff, Katrin; Turner, Anne M; Axelrod, Amittai; Saavedra, Francisco

    2011-01-01

    Accurate, understandable public health information is important for ensuring the health of the nation. The large portion of the US population with Limited English Proficiency is best served by translations of public-health information into other languages. However, a large number of health departments and primary care clinics face significant barriers to fulfilling federal mandates to provide multilingual materials to Limited English Proficiency individuals. This article presents a pilot study on the feasibility of using freely available statistical machine translation technology to translate health promotion materials. The authors gathered health-promotion materials in English from local and national public-health websites. Spanish versions were created by translating the documents using a freely available machine-translation website. Translations were rated for adequacy and fluency, analyzed for errors, manually corrected by a human posteditor, and compared with exclusively manual translations. Machine translation plus postediting took 15-53 min per document, compared to the reported days or even weeks for the standard translation process. A blind comparison of machine-assisted and human translations of six documents revealed overall equivalency between machine-translated and manually translated materials. The analysis of translation errors indicated that the most important errors were word-sense errors. The results indicate that machine translation plus postediting may be an effective method of producing multilingual health materials with equivalent quality but lower cost compared to manual translations.

  3. Application of statistical machine translation to public health information: a feasibility study

    PubMed Central

    Kirchhoff, Katrin; Turner, Anne M; Axelrod, Amittai; Saavedra, Francisco

    2011-01-01

    Objective: Accurate, understandable public health information is important for ensuring the health of the nation. The large portion of the US population with Limited English Proficiency is best served by translations of public-health information into other languages. However, a large number of health departments and primary care clinics face significant barriers to fulfilling federal mandates to provide multilingual materials to Limited English Proficiency individuals. This article presents a pilot study on the feasibility of using freely available statistical machine translation technology to translate health promotion materials. Design: The authors gathered health-promotion materials in English from local and national public-health websites. Spanish versions were created by translating the documents using a freely available machine-translation website. Translations were rated for adequacy and fluency, analyzed for errors, manually corrected by a human posteditor, and compared with exclusively manual translations. Results: Machine translation plus postediting took 15–53 min per document, compared to the reported days or even weeks for the standard translation process. A blind comparison of machine-assisted and human translations of six documents revealed overall equivalency between machine-translated and manually translated materials. The analysis of translation errors indicated that the most important errors were word-sense errors. Conclusion: The results indicate that machine translation plus postediting may be an effective method of producing multilingual health materials with equivalent quality but lower cost compared to manual translations. PMID:21498805

  4. Risk estimation using probability machines.

    PubMed

    Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D

    2014-03-01

    Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. The models we propose make no assumptions about the data structure, and capture the patterns in the data by specifying only the predictors involved and not any particular model structure. Thus they do not run the same risks of model mis-specification and resultant estimation biases as a logistic model. This methodology, which we call a "risk machine", shares the properties of the statistical machine from which it is derived.
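
    A minimal "probability machine" sketch: a random forest used as a nonparametric estimator of P(y=1|x), compared with a logistic model when the data-generating model really is logistic, echoing the simulation setting described; all data here are simulated for illustration.

```python
# Random forest as a probability machine vs. a (correct) logistic model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
X = rng.normal(size=(5000, 3))
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 1.2 * X[:, 1])))   # logistic truth
y = rng.binomial(1, p_true)

rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=25,
                            random_state=0).fit(X, y)
lr = LogisticRegression().fit(X, y)
for name, p_hat in [("forest", rf.predict_proba(X)[:, 1]),
                    ("logistic", lr.predict_proba(X)[:, 1])]:
    print(f"{name}: mean |p_hat - p_true| = {np.abs(p_hat - p_true).mean():.3f}")
```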

  5. NASA's online machine aided indexing system

    NASA Technical Reports Server (NTRS)

    Silvester, June P.; Genuardi, Michael T.; Klingbiel, Paul H.

    1993-01-01

    This report describes the NASA Lexical Dictionary (NLD), a machine aided indexing system used online at the National Aeronautics and Space Administration's Center for Aerospace Information (CASI). The system comprises a text processor based on the computational, non-syntactic analysis of input text, and an extensive 'knowledge base' that serves to recognize and translate text-extracted concepts. The structure and function of the various NLD system components are described in detail. Methods used for the development of the knowledge base are discussed. Particular attention is given to a statistically based text analysis program that provides the knowledge base developer with a list of concept-specific phrases extracted from large textual corpora. Production and quality benefits resulting from the integration of machine aided indexing at CASI are discussed, along with a number of secondary applications of NLD-derived systems, including online spell checking and machine aided lexicography.

  6. Pre-use anesthesia machine check; certified anesthesia technician based quality improvement audit.

    PubMed

    Al Suhaibani, Mazen; Al Malki, Assaf; Al Dosary, Saad; Al Barmawi, Hanan; Pogoku, Mahdhav

    2014-01-01

    This study concerns quality assurance of providing a work-ready anesthesia machine in multiple operating theatres of a modern tertiary medical center in Riyadh. The aim of the study is to maintain a high-quality environment for workers and patients in surgical operating rooms. A technician-based audit used key performance indicators to assure inspection and pass testing of machine worthiness for use daily and between cases and, in case of unexpected failure, to provide quick replacement with another ready-to-use anesthetic machine. The anesthetic machines in all operating rooms were inspected daily and passed as ready by technicians, then verified by a consultant or assistant consultant anesthesiologist. The daily records of each machine were collected and reviewed by the quality improvement committee for descriptive analysis, reporting the degree of staff compliance with daily inspection as "met" items, machines replaced during use, and overall compliance. Descriptive statistics were compiled using Microsoft Excel 2003 tables and graphs of sums and percentages of the items studied in this audit. The audit found a high compliance percentage and a low rate of machine replacement, indicating that unexpected machine failures were met with a quick machine switch. The authors conclude that following the regular inspection and self-check routines recommended by the manufacturers can help avert the hazard of anesthesia machine failure during an operation. Furthermore, the ability to replace an anesthesia machine quickly when unexpectedly required contributes to highly assured operative utilization of the man-machine interface in modern surgical operating rooms.

  7. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

    PubMed Central

    Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay

    2004-01-01

    Background Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in these gene sets. Results We have created a web-based tool for data analysis and visualization of gene sets called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure for navigating the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user-friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online. Conclusion GOTree Machine has broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
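
    The enrichment statistic behind tools of this kind is typically a hypergeometric (one-sided Fisher) test; the sketch below shows that calculation under invented counts, without claiming it is GOTM's exact implementation.

    ```python
    from scipy.stats import hypergeom

    M = 15000   # genes in the reference (all genes on the array)
    K = 300     # reference genes annotated to the GO category
    N = 120     # genes in the input (e.g., co-regulated) set
    k = 12      # input genes annotated to the category

    # P(X >= k): probability of seeing at least k category genes by chance
    p_value = hypergeom.sf(k - 1, M, K, N)
    print(f"enrichment p-value: {p_value:.2e}")
    ```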

  8. Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

    PubMed

    Held, Elizabeth; Cape, Joshua; Tintle, Nathan

    2016-01-01

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
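
    A minimal sketch of the comparison described above, assuming simulated genotype-like data rather than the Genetic Analysis Workshop 19 dataset.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(500, 50)).astype(float)  # 0/1/2 genotype codes
    y = (X[:, 0] + X[:, 1] + rng.normal(0, 2, 500)) > 2   # two causal variants

    models = {
        "linear SVM": SVC(kernel="linear", C=1.0),
        "radial SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
        "logistic":   LogisticRegression(max_iter=1000),
    }
    for name, model in models.items():
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name:12s} mean AUC = {auc.mean():.3f}")
    ```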

  9. Statistical analysis and machine learning algorithms for optical biopsy

    NASA Astrophysics Data System (ADS)

    Wu, Binlin; Liu, Cheng-hui; Boydston-White, Susie; Beckman, Hugh; Sriramoju, Vidyasagar; Sordillo, Laura; Zhang, Chunyuan; Zhang, Lin; Shi, Lingyan; Smith, Jason; Bailin, Jacob; Alfano, Robert R.

    2018-02-01

    Analyzing spectral or imaging data collected with various optical biopsy methods is often difficult due to the complexity of the underlying biology. Robust methods that can utilize the spectral or imaging data and detect the characteristic spectral or spatial signatures of different tissue types are challenging to develop but highly desired. In this study, we used various machine learning algorithms to analyze a spectral dataset acquired from normal and cancerous human skin tissue samples using resonance Raman spectroscopy with 532 nm excitation. Algorithms including principal component analysis, nonnegative matrix factorization, and an autoencoder artificial neural network are used to reduce the dimension of the dataset and detect features. A support vector machine with a linear kernel is used to classify the normal and cancerous tissue samples. The efficacies of the methods are compared.

  10. Risk estimation using probability machines

    PubMed Central

    2014-01-01

    Background Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. Results We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. Conclusions The models we propose make no assumptions about the data structure, and capture the patterns in the data by specifying only the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification, and the resultant estimation biases, as a logistic model. This methodology, which we call a “risk machine”, shares the properties of the statistical machine from which it is derived. PMID:24581306

  11. Machine learning to analyze images of shocked materials for precise and accurate measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dresselhaus-Cooper, Leora; Howard, Marylesa; Hock, Margaret C.

    A supervised machine learning algorithm, called locally adaptive discriminant analysis (LADA), has been developed to locate boundaries between identifiable image features that have varying intensities. LADA is an adaptation of image segmentation, which includes techniques that find the positions of image features (classes) using statistical intensity distributions for each class in the image. In order to place a pixel in the proper class, LADA considers the intensity at that pixel and the distribution of intensities in local (nearby) pixels. This paper presents the use of LADA to provide, with statistical uncertainties, the positions and shapes of features within ultrafast images of shock waves. We demonstrate the ability to locate image features including crystals, density changes associated with shock waves, and material jetting caused by shock waves. This algorithm can analyze images that exhibit a wide range of physical phenomena because it does not rely on comparison to a model. LADA enables analysis of images from shock physics with statistical rigor independent of underlying models or simulations.

  12. Investigation of machinability characteristics on EN47 steel for cutting force and tool wear using optimization technique

    NASA Astrophysics Data System (ADS)

    M, Vasu; Shivananda Nayaka, H.

    2018-06-01

    In this experimental work, a dry turning process carried out on EN47 spring steel with a coated tungsten carbide insert of 0.8 mm nose radius was optimized using statistical techniques. Experiments were conducted at three cutting speeds (625, 796 and 1250 rpm), three feed rates (0.046, 0.062 and 0.093 mm/rev) and three depths of cut (0.2, 0.3 and 0.4 mm), following a 3³ full factorial design (FFD) of three factors at three levels. Analysis of variance was used to identify the significant factors for each output response. The results reveal that feed rate is the most significant factor influencing cutting force, followed by depth of cut, with cutting speed having less significance. The optimum machining condition for cutting force was obtained from the statistical technique. Tool wear measurements were performed at the optimum condition of Vc = 796 rpm, ap = 0.2 mm, f = 0.046 mm/rev. The minimum tool wear observed was 0.086 mm after 5 min of machining. Tool wear was analyzed with a confocal microscope; it was observed that tool wear increases with increasing cutting time.
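
    A sketch of the ANOVA step for a 3³ full factorial design like the one above; the factor levels mirror the abstract, but the force values come from a toy response surface with noise, not the measured data.

    ```python
    import itertools
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(2)
    speeds = [625, 796, 1250]        # cutting speed, rpm
    feeds = [0.046, 0.062, 0.093]    # feed rate, mm/rev
    depths = [0.2, 0.3, 0.4]         # depth of cut, mm

    rows = [{"speed": s, "feed": f, "doc": d,
             "force": 900 * f + 300 * d + 0.05 * s + rng.normal(0, 5)}
            for s, f, d in itertools.product(speeds, feeds, depths)]
    df = pd.DataFrame(rows)

    model = ols("force ~ C(speed) + C(feed) + C(doc)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # F-test per factor
    ```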

  13. A computational visual saliency model based on statistics and machine learning.

    PubMed

    Lin, Ru-Je; Lin, Wei-Song

    2014-08-01

    Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.

  14. Discomfort analysis in computerized numeric control machine operations.

    PubMed

    Muthukumar, Krishnamoorthy; Sankaranarayanasamy, Krishnasamy; Ganguli, Anindya Kumar

    2012-06-01

    The introduction of computerized numeric control (CNC) technology in manufacturing industries has revolutionized the production process, but there are some health and safety problems associated with these machines. The present study aimed to investigate the extent of postural discomfort in CNC machine operators, and the relationship of this discomfort to the display and control panel height, with a view to validating the anthropometric recommendations for the location of the display and control panel in CNC machines. The postural discomforts associated with CNC machines were studied in 122 male operators using Corlett and Bishop's body part discomfort mapping, subject information, and discomfort level at various time intervals from the start to the end of a shift. This information was collected using a questionnaire. Statistical analysis was carried out using ANOVA. Neck discomfort due to the positioning of the machine displays, and shoulder and arm discomfort due to the positioning of controls, were identified as common health issues in the operators of these machines. The study revealed that 45.9% of machine operators reported discomfort in the lower back, 41.8% in the neck, 22.1% in the upper back, 53.3% in the shoulder and arm, and 21.3% in the leg. Discomfort increased as the day progressed and was highest at the end of a shift; subject age had no effect on the operators' tendency to experience discomfort.

  15. Discomfort Analysis in Computerized Numeric Control Machine Operations

    PubMed Central

    Sankaranarayanasamy, Krishnasamy; Ganguli, Anindya Kumar

    2012-01-01

    Objectives The introduction of computerized numeric control (CNC) technology in manufacturing industries has revolutionized the production process, but there are some health and safety problems associated with these machines. The present study aimed to investigate the extent of postural discomfort in CNC machine operators, and the relationship of this discomfort to the display and control panel height, with a view to validating the anthropometric recommendations for the location of the display and control panel in CNC machines. Methods The postural discomforts associated with CNC machines were studied in 122 male operators using Corlett and Bishop's body part discomfort mapping, subject information, and discomfort level at various time intervals from the start to the end of a shift. This information was collected using a questionnaire. Statistical analysis was carried out using ANOVA. Results Neck discomfort due to the positioning of the machine displays, and shoulder and arm discomfort due to the positioning of controls, were identified as common health issues in the operators of these machines. The study revealed that 45.9% of machine operators reported discomfort in the lower back, 41.8% in the neck, 22.1% in the upper back, 53.3% in the shoulder and arm, and 21.3% in the leg. Conclusion Discomfort increased as the day progressed and was highest at the end of a shift; subject age had no effect on the operators' tendency to experience discomfort. PMID:22993720

  16. Dependency graph for code analysis on emerging architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shashkov, Mikhail Jurievich; Lipnikov, Konstantin

    A directed acyclic dependency graph (DAG) is becoming the standard for modern multi-physics codes. The ideal DAG is the true block-scheme of a multi-physics code. Therefore, it is a convenient object for in-situ analysis of the cost of computations and of algorithmic bottlenecks related to statistically frequent data motion and the dynamical machine state.
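
    A minimal sketch of one such analysis under stated assumptions: topologically order an invented task DAG and estimate its critical-path cost using Python's standard-library graphlib. The task names and costs are illustrative, not from any actual multi-physics code.

    ```python
    from graphlib import TopologicalSorter   # Python 3.9+

    deps = {                       # task -> set of prerequisite tasks
        "hydro":     set(),
        "radiation": {"hydro"},
        "chemistry": {"hydro"},
        "coupling":  {"radiation", "chemistry"},
    }
    cost = {"hydro": 5.0, "radiation": 3.0, "chemistry": 2.0, "coupling": 1.0}

    finish = {}
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps[task]), default=0.0)
        finish[task] = start + cost[task]   # earliest finish time per task

    print("critical-path time:", max(finish.values()))   # bottleneck estimate
    ```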

  17. Scaling up to address data science challenges

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendelberger, Joanne R.

    Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. Here, this article will explore various challenges in Data Science and will highlight statistical approaches that can facilitate analysis of large-scale data, including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.

  18. Scaling up to address data science challenges

    DOE PAGES

    Wendelberger, Joanne R.

    2017-04-27

    Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. Here, this article will explore various challenges in Data Science and will highlight statistical approaches that can facilitate analysis of large-scale data, including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.
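
    One concrete data-reduction technique of the kind surveyed above is reservoir sampling, sketched below; it keeps a uniform random sample of k records from a stream too large to hold in memory. The stream and sample size are illustrative.

    ```python
    import random

    def reservoir_sample(stream, k, seed=0):
        rng = random.Random(seed)
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)          # fill the reservoir first
            else:
                j = rng.randint(0, i)           # replace with decreasing probability
                if j < k:
                    reservoir[j] = item
        return reservoir

    print(reservoir_sample(range(10**6), k=5))
    ```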

  19. Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study

    ERIC Educational Resources Information Center

    Cer, Daniel

    2011-01-01

    The goal of this dissertation is to determine the best way to train a statistical machine translation system. I first develop a state-of-the-art machine translation system called Phrasal and then use it to examine a wide variety of potential learning algorithms and optimization criteria and arrive at two very surprising results. First, despite the…

  20. Machine learning classifier using abnormal brain network topological metrics in major depressive disorder.

    PubMed

    Guo, Hao; Cao, Xiaohua; Liu, Zhifen; Li, Haifang; Chen, Junjie; Zhang, Kerang

    2012-12-05

    Resting state functional brain networks have been widely studied in brain disease research. However, it is currently unclear whether abnormal resting state functional brain network metrics can be used with machine learning for the classification of brain diseases. Resting state functional brain networks were constructed for 28 healthy controls and 38 major depressive disorder patients by thresholding partial correlation matrices of 90 regions. Three nodal metrics were calculated using graph theory-based approaches. Nonparametric permutation tests were then used for group comparisons of topological metrics, which were then used as classification features in six different algorithms. We used statistical significance as the threshold for selecting features and measured the accuracies of the six classifiers with different numbers of features. A sensitivity analysis method was used to evaluate the importance of different features. The results indicated that some regions exhibited significantly abnormal nodal centralities, including the limbic system, basal ganglia, medial temporal, and prefrontal regions. The support vector machine with radial basis kernel function and the neural network algorithm exhibited the highest average accuracies (79.27% and 78.22%, respectively) with 28 features (P<0.05). The correlation between feature importance and the statistical significance of metrics was investigated, and the results revealed a strong positive correlation between them. Overall, the current study demonstrated that major depressive disorder is associated with abnormal functional brain network topological metrics, and that statistically significant nodal metrics can be successfully used for feature selection in classification algorithms.

  1. External validation of ADO, DOSE, COTE and CODEX at predicting death in primary care patients with COPD using standard and machine learning approaches.

    PubMed

    Morales, Daniel R; Flynn, Rob; Zhang, Jianguo; Trucco, Emmanuel; Quint, Jennifer K; Zutis, Kris

    2018-05-01

    Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large-scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender, and the discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years was measured using logistic regression and a support vector machine (SVM) learning method of analysis. The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person-years for men and women respectively. Complete data were available for 54879 patients to predict 1-year mortality. ADO performed best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 ADO + COTE; c-statistic 0.727 DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. In primary care, ADO appears superior at predicting death in COPD. Performance of ADO and DOSE improved when combined with COTE comorbidities, suggesting better models may be generated with additional data, facilitated using novel approaches. Copyright © 2018. Published by Elsevier Ltd.
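
    A sketch of the discrimination measure used throughout this study: the c-statistic is the area under the ROC curve of predicted risks against observed outcomes. The data here are simulated, not CPRD records.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(5000, 4))    # stand-ins for age, dyspnoea, FEV1, exacerbations
    y = rng.random(5000) < 1.0 / (1.0 + np.exp(-(X[:, 0] - 1.5)))   # 1-year death

    model = LogisticRegression(max_iter=1000).fit(X, y)
    risk = model.predict_proba(X)[:, 1]
    print(f"c-statistic: {roc_auc_score(y, risk):.3f}")   # area under the ROC curve
    ```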

  2. A Machine Learning Approach to Automated Gait Analysis for the Noldus Catwalk System.

    PubMed

    Frohlich, Holger; Claes, Kasper; De Wolf, Catherine; Van Damme, Xavier; Michel, Anne

    2018-05-01

    Gait analysis of animal disease models can provide valuable insights into in vivo compound effects and thus help in preclinical drug development. The purpose of this paper is to establish a computational gait analysis approach for the Noldus Catwalk system, in which footprints are automatically captured and stored. We present a, to our knowledge, first machine learning based approach for the Catwalk system, which comprises a step decomposition, definition and extraction of meaningful features, multivariate step sequence alignment, feature selection, and training of different classifiers (gradient boosting machine, random forest, and elastic net). Using animal-wise leave-one-out cross-validation, we demonstrate that with our method we can reliably separate movement patterns of a putative Parkinson's disease animal model and several control groups. Furthermore, we show that we can predict the time point after, and the type of, different brain lesions, and can even forecast the brain region where the intervention was applied. We provide an in-depth analysis of the features involved in our classifiers via statistical techniques for model interpretation. A machine learning method for automated analysis of data from the Noldus Catwalk system was established. Our work shows the ability of machine learning to discriminate pharmacologically relevant animal groups based on their walking behavior in a multivariate manner. Further interesting aspects of the approach include the ability to learn from past experiments, to improve as more data arrive, and to make predictions for single animals in future studies.
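
    A sketch of animal-wise leave-one-out cross-validation as named above, via scikit-learn's LeaveOneGroupOut; the features, group labels, and injected signal are invented for illustration.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    rng = np.random.default_rng(4)
    X = rng.normal(size=(120, 10))           # gait features, one row per step sequence
    animal = np.repeat(np.arange(12), 10)    # 12 animals, 10 sequences each
    y = animal % 2 == 0                      # hypothetical lesion vs control label
    X[y] += 0.8                              # inject a weak group difference

    # Every sequence from one animal is held out together, so the classifier
    # never sees the test animal during training.
    scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                             groups=animal, cv=LeaveOneGroupOut())
    print(f"mean held-out accuracy: {scores.mean():.2f}")
    ```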

  3. Robust crop and weed segmentation under uncontrolled outdoor illumination

    USDA-ARS?s Scientific Manuscript database

    A new machine vision system for weed detection was developed using RGB color model images. Processes included in the detection algorithm were excessive green conversion, threshold value computation by statistical analysis, adaptive image segmentation by adjusting the threshold value, median filter, ...
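
    A minimal sketch of an excess-green segmentation step of the kind listed above; the mean-plus-k-standard-deviations threshold rule is an assumption standing in for the abstract's statistical threshold computation.

    ```python
    import numpy as np

    def segment_vegetation(rgb, k=1.0):
        """rgb: float array in [0, 1] of shape (H, W, 3). Returns a boolean mask."""
        total = rgb.sum(axis=2) + 1e-8
        r, g, b = (rgb[..., i] / total for i in range(3))   # chromatic coordinates
        exg = 2.0 * g - r - b                               # excess green index
        threshold = exg.mean() + k * exg.std()              # adaptive threshold (assumed rule)
        return exg > threshold

    mask = segment_vegetation(np.random.rand(64, 64, 3))
    print(mask.mean())   # fraction of pixels classified as vegetation
    ```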

  4. Crux: Rapid Open Source Protein Tandem Mass Spectrometry Analysis

    PubMed Central

    2015-01-01

    Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data. PMID:25182276

  5. Evaluation of machinability and flexural strength of a novel dental machinable glass-ceramic.

    PubMed

    Qin, Feng; Zheng, Shucan; Luo, Zufeng; Li, Yong; Guo, Ling; Zhao, Yunfeng; Fu, Qiang

    2009-10-01

    To evaluate the machinability and flexural strength of a novel dental machinable glass-ceramic (named PMC), and to compare its machinability with that of Vita Mark II and human enamel. The raw batch materials were selected and mixed. Four groups of the novel glass-ceramic were formed at different nucleation temperatures and assigned to Group 1, Group 2, Group 3 and Group 4. The machinability of the four groups of novel glass-ceramics, Vita Mark II ceramic and freshly extracted human premolars was compared by means of drilling depth measurements. A three-point bending test was used to measure the flexural strength of the novel glass-ceramics. The crystalline phases of the group with the best machinability were identified by X-ray diffraction. In terms of drilling depth, Group 2 of the novel glass-ceramics proved to have the largest drilling depth. There was no statistical difference among Group 1, Group 4 and the natural teeth. The drilling depth of Vita Mark II was statistically less than that of Group 1, Group 4 and the natural teeth. Group 3 had the least drilling depth. With respect to flexural strength, Group 2 exhibited the maximum flexural strength; Group 1 was statistically weaker than Group 2; there was no statistical difference between Group 3 and Group 4, which were the weakest materials. XRD of the Group 2 ceramic showed that a new type of dental machinable glass-ceramic containing calcium mica had been developed by the present study; it was named PMC. PMC is promising for application as a dental machinable ceramic due to its good machinability and relatively high strength.

  6. Financial Statistics. Higher Education General Information Survey (HEGIS) [machine-readable data file].

    ERIC Educational Resources Information Center

    Center for Education Statistics (ED/OERI), Washington, DC.

    The Financial Statistics machine-readable data file (MRDF) is a subfile of the larger Higher Education General Information Survey (HEGIS). It contains basic financial statistics for over 3,000 institutions of higher education in the United States and its territories. The data are arranged sequentially by institution, with institutional…

  7. Deep Learning Accurately Predicts Estrogen Receptor Status in Breast Cancer Metabolomics Data.

    PubMed

    Alakwaa, Fadhl M; Chaudhary, Kumardeep; Garmire, Lana X

    2018-01-05

    Metabolomics holds promise as a new technology to diagnose highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, a class of increasingly popular machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor positive (ER+) and 67 estrogen receptor negative (ER-), to test the accuracies of feed-forward networks, a deep learning (DL) framework, as well as six widely used machine learning models, namely random forest (RF), support vector machines (SVM), recursive partitioning and regression trees (RPART), linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), and generalized boosted models (GBM). The DL framework has the highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value <0.05) that cannot be discovered by the other machine learning methods. Among them, the protein digestion and absorption and ATP-binding cassette (ABC) transporter pathways are also confirmed in an integrated analysis between metabolomics and gene expression data in these samples. In summary, the deep learning method shows advantages for metabolomics-based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of feed-forward-network-based deep learning methods in the metabolomics research community for classification.
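
    A small feed-forward-network sketch in the spirit of the study; the authors used a dedicated deep learning framework, so scikit-learn's MLPClassifier stands in here, and the metabolite matrix and injected signal are simulated, not the cohort data.

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(5)
    X = rng.normal(size=(271, 162))         # 271 tissues x simulated metabolites
    y = np.r_[np.ones(204), np.zeros(67)]   # ER+ / ER- labels
    X[y == 1, :5] += 0.8                    # weak injected class signal

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    scaler = StandardScaler().fit(X_tr)     # fit scaling on training data only

    net = MLPClassifier(hidden_layer_sizes=(64, 16), max_iter=2000, random_state=0)
    net.fit(scaler.transform(X_tr), y_tr)
    auc = roc_auc_score(y_te, net.predict_proba(scaler.transform(X_te))[:, 1])
    print(f"test AUC: {auc:.2f}")
    ```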

  8. Effects of cutting parameters and machining environments on surface roughness in hard turning using design of experiment

    NASA Astrophysics Data System (ADS)

    Mia, Mozammel; Bashir, Mahmood Al; Dhar, Nikhil Ranjan

    2016-07-01

    Hard turning is gradually replacing the time-consuming conventional turning process, which is typically followed by grinding, by producing surface quality comparable to grinding. The hard-turned surface roughness depends on the cutting parameters, machining environment and tool insert configuration. In this article, the variation of the surface roughness of the produced surfaces with changes in tool insert configuration, use of coolant and different cutting parameters (cutting speed, feed rate) has been investigated. The investigation was performed by machining AISI 1060 steel, hardened to 56 HRC by heat treatment, using coated carbide inserts under two different machining environments. The depth of cut, fluid pressure and material hardness were kept constant. A Design of Experiments (DOE) was performed to determine the number and combinations of the different cutting parameters. A full factorial analysis was performed to examine the effects of the main factors, as well as the interaction effects of factors, on surface roughness. A statistical analysis of variance (ANOVA) was employed to determine the combined effect of cutting parameters, environment and tool configuration. The results of this analysis reveal that environment has the most significant impact on surface roughness, followed by feed rate and tool configuration respectively.

  9. Experimental Investigation and Optimization of Response Variables in WEDM of Inconel - 718

    NASA Astrophysics Data System (ADS)

    Karidkar, S. S.; Dabade, U. A.

    2016-02-01

    Effective utilisation of Wire Electrical Discharge Machining (WEDM) technology is a challenge for modern manufacturing industries. Day by day, new materials with high strengths and capabilities are being developed to fulfil customers' needs. Inconel - 718 is one such material, extensively used in aerospace applications such as gas turbines, rocket motors, and spacecraft, as well as in nuclear reactors, pumps, etc. This paper deals with the experimental investigation of optimal machining parameters in WEDM for surface roughness, kerf width and dimensional deviation using DoE, namely the Taguchi methodology with an L9 orthogonal array. Keeping the peak current constant at 70 A, the effects of the other process parameters on the above response variables were analysed. The experimental results were statistically analysed using Minitab-16 software. Analysis of Variance (ANOVA) shows pulse-on time to be the most influential parameter, followed by wire tension, whereas spark gap set voltage is observed to be a non-influencing parameter. The multi-objective optimization technique Grey Relational Analysis (GRA) yields optimal machining parameters of pulse-on time 108 machine units, spark gap set voltage 50 V and wire tension 12 gm for the response variables considered in the experimental analysis.
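
    A sketch of the grey relational analysis step, assuming smaller-the-better responses and invented L9 response values rather than the paper's measurements.

    ```python
    import numpy as np

    # rows = L9 runs; columns = surface roughness, kerf width, dimensional deviation
    Y = np.array([[2.1, 0.31, 0.04], [1.8, 0.29, 0.06], [2.4, 0.33, 0.05],
                  [1.6, 0.30, 0.03], [2.0, 0.28, 0.05], [2.2, 0.32, 0.04],
                  [1.9, 0.27, 0.06], [2.3, 0.34, 0.03], [1.7, 0.30, 0.05]])

    # normalize each response as smaller-the-better
    norm = (Y.max(axis=0) - Y) / (Y.max(axis=0) - Y.min(axis=0))
    delta = 1.0 - norm                          # deviation from the ideal (1.0)
    zeta = 0.5                                  # distinguishing coefficient
    coeff = (delta.min() + zeta * delta.max()) / (delta + zeta * delta.max())
    grade = coeff.mean(axis=1)                  # grey relational grade per run
    print("best run:", int(grade.argmax()) + 1)
    ```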

  10. Microscopes and computers combined for analysis of chromosomes

    NASA Technical Reports Server (NTRS)

    Butler, J. W.; Butler, M. K.; Stroud, A. N.

    1969-01-01

    Scanning machine CHLOE, developed for photographic use, is combined with a digital computer to obtain quantitative and statistically significant data on chromosome shapes, distribution, density, and pairing. CHLOE permits data acquisition about a chromosome complement to be obtained two times faster than by manual pairing.

  11. Statistical analysis on the signals monitoring multiphase flow patterns in pipeline-riser system

    NASA Astrophysics Data System (ADS)

    Ye, Jing; Guo, Liejin

    2013-07-01

    The signals monitoring petroleum transmission pipelines in the offshore oil industry usually contain abundant information about the multiphase flow that is relevant to flow assurance, which includes the avoidance of the most undesirable flow patterns. Therefore, extracting reliable features from these signals for analysis is an alternative way to examine potential risks to an oil platform. This paper focuses on characterizing multiphase flow patterns in a pipeline-riser system, a configuration that often appears in the offshore oil industry, and on finding an objective criterion to describe the transition of flow patterns. Statistical analysis of the pressure signal at the riser top is proposed, instead of the usual prediction method based on inlet and outlet flow conditions, which cannot be easily determined in most situations. In addition, a machine learning method (least squares support vector machine) is applied to classify the different flow patterns automatically. The experimental results from a small-scale loop show that the proposed method is effective for analyzing multiphase flow patterns.
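
    A compact least-squares SVM (LS-SVM) classifier sketch, written directly as the linear system of the Suykens formulation; the toy two-class data stand in for riser pressure features, and the kernel and regularization settings are illustrative.

    ```python
    import numpy as np

    def rbf(A, B, sigma=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
        # Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]
        n = len(y)
        Omega = np.outer(y, y) * rbf(X, X, sigma)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:], A[1:, 0] = y, y
        A[1:, 1:] = Omega + np.eye(n) / gamma
        sol = np.linalg.solve(A, np.r_[0.0, np.ones(n)])
        b, alpha = sol[0], sol[1:]
        return lambda Xnew: np.sign(rbf(Xnew, X, sigma) @ (alpha * y) + b)

    rng = np.random.default_rng(6)
    X = np.r_[rng.normal(-1, 0.5, (30, 2)), rng.normal(1, 0.5, (30, 2))]
    y = np.r_[-np.ones(30), np.ones(30)]
    predict = lssvm_fit(X, y)
    print("training accuracy:", (predict(X) == y).mean())
    ```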

  12. Effectiveness of Direct Safety Regulations on Manufacturers and Users of Industrial Machines: Its Implications on Industrial Safety Policies in Republic of Korea.

    PubMed

    Choi, Gi Heung

    2017-03-01

    Despite considerable efforts made in recent years, the industrial accident rate and the fatality rate in the Republic of Korea are much higher than those in most developed countries in Europe and North America. Industrial safety policies and safety regulations are also known to be ineffective and inefficient in some cases. This study focuses on the quantitative evaluation of the effectiveness of direct safety regulations, such as safety certification, self-declaration of conformity, and safety inspection of industrial machines, in the Republic of Korea. Implications for safety policies to restructure the industrial safety system associated with industrial machines are also explored. Analysis of the causes of industrial accidents associated with industrial machines confirms that technical causes need to be resolved to reduce both the frequency and the severity of such accidents. Statistical analysis also confirms that the indirect effects of safety device regulation on users are limited, for a variety of reasons. Safety device regulation needs to be shifted to complement safety certification and self-declaration of conformity, for more balanced direct regulation of manufacturers and users. An example of cost-benefit analysis on conveyors justifies such a transition. Industrial safety policies and regulations associated with industrial machines must be directed towards eliminating the sources of danger at the stage of danger creation, thereby securing safe industrial machines. Safety inspection further secures the safety of workers at the stage of use. The overall balance between such safety regulations is achieved by proper distribution of the industrial machines subject to each regulation and of the intensity of each regulation. Rearrangement of the industrial machines subject to safety certification and self-declaration of conformity, to include more movable industrial machines and other industrial machines with a high level of danger, is also suggested.

  13. Detection of Dendritic Spines Using Wavelet Packet Entropy and Fuzzy Support Vector Machine.

    PubMed

    Wang, Shuihua; Li, Yang; Shao, Ying; Cattani, Carlo; Zhang, Yudong; Du, Sidan

    2017-01-01

    The morphology of dendritic spines is highly correlated with neuron function, so the analysis of spine morphology benefits research on dendritic spines. However, labeling spine types manually for statistical analysis is laborious. In this work, we propose an approach based on the combination of wavelet contour analysis for backbone detection, wavelet packet entropy, and a fuzzy support vector machine for spine classification. The experiments show that this approach is promising. The average detection accuracy achieves 97.3% for "Mushroom", 94.6% for "Stubby", and 97.2% for "Thin". Copyright © Bentham Science Publishers.
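
    A sketch of a wavelet packet entropy feature of the kind named above, using PyWavelets; the wavelet, decomposition level, and test signal are illustrative choices, not the paper's settings.

    ```python
    import numpy as np
    import pywt

    def wavelet_packet_entropy(signal, wavelet="db4", level=3):
        # Decompose the signal, take the normalized energy of each terminal
        # node, and compute a Shannon entropy over those energies.
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
        energies = np.array([np.sum(node.data ** 2)
                             for node in wp.get_level(level, order="natural")])
        p = energies / energies.sum()
        return -np.sum(p * np.log2(p + 1e-12))

    t = np.linspace(0, 1, 1024)
    print(wavelet_packet_entropy(np.sin(2 * np.pi * 50 * t)))
    ```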

  14. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development

    PubMed Central

    Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer

    2015-01-01

    Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like from nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool which can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared on ten different measures; then, the ten best performing algorithms were selected based on principal component and hierarchical cluster analysis results. Besides classification, this application also has the ability to create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885

  15. PredPsych: A toolbox for predictive machine learning-based approach in experimental psychology research.

    PubMed

    Koul, Atesh; Becchio, Cristina; Cavallo, Andrea

    2017-12-12

    Recent years have seen an increased interest in machine learning-based predictive methods for analyzing quantitative behavioral data in experimental psychology. While these methods can achieve relatively greater sensitivity compared to conventional univariate techniques, they still lack an established and accessible implementation. The aim of the current work was to build an open-source R toolbox - "PredPsych" - that could make these methods readily available to all psychologists. PredPsych is a user-friendly R toolbox based on machine-learning predictive algorithms. In this paper, we present the framework of PredPsych via the analysis of a recently published multiple-subject motion capture dataset. In addition, we discuss examples of possible research questions that can be addressed with the machine-learning algorithms implemented in PredPsych and that cannot be easily addressed with univariate statistical analysis. We anticipate that PredPsych will be of use to researchers with limited programming experience, not only in the field of psychology but also in clinical neuroscience, enabling computational assessment of putative bio-behavioral markers for both prognosis and diagnosis.

  16. Influence of export control policy on the competitiveness of machine tool producing organizations

    NASA Astrophysics Data System (ADS)

    Ahrstrom, Jeffrey D.

    The possible influence of export control policies on producers of export-controlled machine tools is examined in this quantitative study. International market competitiveness theories hold that market-controlling policies such as export control regulations may influence an organization's ability to compete (Burris, 2010). Differences in the domestic application of export control policy on machine tool exports may impose throttling effects on the competitiveness of participating firms (Freedenberg, 2010). Commodity shipments from Japan, Germany, and the United States to the Russian market are examined using descriptive statistics; gravity modeling of these specific markets provides a foundation for comparison with actual shipment data; and industry participant responses to a user-developed survey provide additional data for analysis using a Kruskal-Wallis one-way analysis of variance. There is scarce academic research data on the topic of export control effects within the machine tool industry. The research results may be of interest to industry leadership in market participation decisions, advocacy arguments, and strategic planning. Industry advocates and export policy decision makers could find the data of interest in supporting positions for or against modifications of export control policies.

  17. Technical Report: Reference photon dosimetry data for Varian accelerators based on IROC-Houston site visit data.

    PubMed

    Kerns, James R; Followill, David S; Lowenstein, Jessica; Molineu, Andrea; Alvarez, Paola; Taylor, Paige A; Stingo, Francesco C; Kry, Stephen F

    2016-05-01

    Accurate data regarding linear accelerator (Linac) radiation characteristics are important for treatment planning system modeling as well as regular quality assurance of the machine. The Imaging and Radiation Oncology Core-Houston (IROC-H) has measured the dosimetric characteristics of numerous machines through its on-site dosimetry review protocols. Photon data are presented and can be used as a secondary check of acquired values, as a means of verifying the commissioning of a new machine, or in preparation for an IROC-H site visit. Photon data from IROC-H on-site reviews from 2000 to 2014 were compiled and analyzed; specifically, data from approximately 500 Varian machines. Each dataset consisted of point measurements of several dosimetric parameters at various locations in a water phantom to assess the percentage depth dose, jaw output factors, multileaf collimator small field output factors, off-axis factors, and wedge factors. The data were analyzed by energy and parameter, with similarly performing machine models being assimilated into classes. Common statistical metrics are presented for each machine class. Measurement data were compared against other reference data where applicable. The distributions of the parameter data were shown to be robust and to derive from a Student's t distribution. Based on statistical and clinical criteria, all machine models could be classified into two or three classes for each energy, except for 6 MV, for which there were eight classes. Quantitative analysis of the measurements for 6, 10, 15, and 18 MV photon beams is presented for each parameter; supplementary material containing further statistical information has also been made available. IROC-H has collected numerous data on Varian Linacs, and the results of photon measurements from the past 15 years are presented. The data can be used as a comparison check of a physicist's acquired values. Acquired values that are well outside the expected distribution should be verified by the physicist to identify whether the measurements are valid. Comparison of values to this reference data provides a redundant check to help prevent gross dosimetric treatment errors.

  18. Statistical quality control through overall vibration analysis

    NASA Astrophysics Data System (ADS)

    Carnero, M. a. Carmen; González-Palma, Rafael; Almorza, David; Mayorga, Pedro; López-Escobar, Carlos

    2010-05-01

    The present study introduces the concept of statistical quality control in automotive wheel bearing manufacturing processes. Defects in the products under analysis can have a direct influence on passengers' safety and comfort. At present, the use of vibration analysis on machine tools for quality control purposes is not very extensive in manufacturing facilities. Noise and vibration are common quality problems in bearings. These failure modes likely occur under certain operating conditions and do not require high vibration amplitudes but relate to certain vibration frequencies. The vibration frequencies are affected by the type of surface problems (chattering) of ball races that are generated through grinding processes. The purpose of this paper is to identify grinding process variables that affect the quality of bearings by using statistical principles in the field of machine tools. In addition, an evaluation of the quality results of the finished parts under different combinations of process variables is assessed. This paper intends to establish the foundations for predicting the quality of the products through the analysis of self-induced vibrations during the contact between the grinding wheel and the parts. To achieve this goal, the overall self-induced vibration readings under different combinations of process variables are analysed using statistical tools. The analysis of data and design of experiments follows a classical approach, considering all potential interactions between variables. The analysis of data is conducted through analysis of variance (ANOVA) for data sets that meet normality and homoscedasticity criteria. This paper utilizes different statistical tools to support the conclusions, such as the chi-squared, Shapiro-Wilk, symmetry, kurtosis, Cochran, Bartlett, Hartley and Kruskal-Wallis tests. The analysis presented is the starting point for extending the use of predictive techniques (vibration analysis) to quality control. This paper demonstrates the existence of predictive variables (high-frequency vibration displacements) that are sensitive to the process setup and the quality of the products obtained. Based on the results of this overall vibration analysis, a second paper will analyse self-induced vibration spectra in order to define limit vibration bands, controllable every cycle or connected to permanent vibration-monitoring systems able to adjust the sensitive process variables identified by ANOVA once the vibration readings exceed established quality limits.

  19. When Machines Think: Radiology's Next Frontier.

    PubMed

    Dreyer, Keith J; Geis, J Raymond

    2017-12-01

    Artificial intelligence (AI), machine learning, and deep learning are terms now seen frequently, all of which refer to computer algorithms that change as they are exposed to more data. Many of these algorithms are surprisingly good at recognizing objects in images. The combination of large amounts of machine-consumable digital data, increased and cheaper computing power, and increasingly sophisticated statistical models combine to enable machines to find patterns in data in ways that are not only cost-effective but also potentially beyond humans' abilities. Building an AI algorithm can be surprisingly easy. Understanding the associated data structures and statistics, on the other hand, is often difficult and obscure. Converting the algorithm into a sophisticated product that works consistently in broad, general clinical use is complex and incompletely understood. To show how these AI products reduce costs and improve outcomes will require clinical translation and industrial-grade integration into routine workflow. Radiology has the chance to leverage AI to become a center of intelligently aggregated, quantitative, diagnostic information. Centaur radiologists, formed as a synergy of human plus computer, will provide interpretations using data extracted from images by humans and image-analysis computer algorithms, as well as the electronic health record, genomics, and other disparate sources. These interpretations will form the foundation of precision health care, or care customized to an individual patient. © RSNA, 2017.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keller, J; Hardin, M; Giaddui, T

    Purpose: To test whether unified vendor-specified beam conformance for matched machines implies volumetric modulated arc radiotherapy (VMAT) delivery consistency. Methods: Twenty-two identical patient QA plans, eleven 6 MV and eleven 15 MV, were delivered to the Delta4 (ScandiDos, Uppsala, Sweden) on two Varian TrueBeam matched machines. Sixteen patient QA plans, nine 6 MV and seven 10 MV, were delivered to the Delta4 on two Elekta Agility matched machines. The percent dose deviation (%DDev), distance-to-agreement (DTA), and gamma analysis (γ) were collected for all plans, and the differences in measurements were tabulated between matched machines. A paired t-test analysis of the data with an alpha of 0.05 determines statistical significance. Power (P) was calculated to detect a difference of 5%; all data sets except the Elekta %DDev sets were strong, with power above 0.85. Results: The average differences for Varian machines (%DDev, DTA, and γ) are 6.4%, 1.6% and 2.7% for 6 MV, respectively, and 8.0%, 0.6%, and 2.5% for 15 MV. The average differences for matched Elekta machines (%DDev, DTA, and γ) are 10.2%, 0.6% and 0.9% for 6 MV, respectively, and 7.0%, 1.9%, and 2.8% for 10 MV. A paired t-test shows that for Varian the %DDev difference is significant for 6 MV and 15 MV (p-value_6MV = 0.019, P_6MV = 0.96; p-value_15MV = 0.0003, P_15MV = 0.86). Differences in DTA are insignificant for both 6 MV and 15 MV (p-value_6MV = 0.063, P_6MV = 1; p-value_15MV = 0.907, P_15MV = 1). Varian differences in gamma are significant for both energies (p-value_6MV = 0.025, P_6MV = 0.99; p-value_15MV = 0.013, P_15MV = 1). A paired t-test shows that for Elekta the difference in %DDev is significant for 6 MV but not 10 MV (p-value_6MV = 0.00065, P_6MV = 0.68; p-value_10MV = 0.262, P_10MV = 0.39). Differences in DTA are statistically insignificant (p-value_6MV = 0.803, P_6MV = 1; p-value_10MV = 0.269, P_10MV = 1). Elekta differences in gamma are significant for 10 MV only (p-value_6MV = 0.094, P_6MV = 1; p-value_10MV = 0.011, P_10MV = 1). Conclusion: These results show that vendor-specified beam conformance across machines does not ensure equivalent patient-specific QA pass rates. Gamma differences are statistically significant in three of the four comparisons for two pairs of vendor-matched machines.
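
    A sketch of the statistical test used above: a paired t-test on per-plan QA metrics delivered on two matched machines. The eleven values per machine are invented for illustration.

    ```python
    import numpy as np
    from scipy import stats

    # Per-plan gamma pass rates (%) for the same eleven plans on each machine
    machine_a = np.array([97.1, 95.4, 98.2, 96.0, 94.8, 97.5, 96.3, 95.9, 98.0, 96.7, 97.2])
    machine_b = np.array([96.4, 94.1, 97.6, 95.2, 93.9, 96.8, 95.1, 95.0, 97.1, 95.8, 96.3])

    t, p = stats.ttest_rel(machine_a, machine_b)   # paired: same plan on both machines
    print(f"t = {t:.2f}, p = {p:.4f}")             # p < 0.05 -> machines differ
    ```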

  1. Leveraging Code Comments to Improve Software Reliability

    ERIC Educational Resources Information Center

    Tan, Lin

    2009-01-01

    Commenting source code has long been a common practice in software development. This thesis, consisting of three pieces of work, made novel use of the code comments written in natural language to improve software reliability. Our solution combines Natural Language Processing (NLP), Machine Learning, Statistics, and Program Analysis techniques to…

  2. 3D Self-Localisation From Angle of Arrival Measurements

    DTIC Science & Technology

    2009-04-01

    systems can provide precise position information. However, there are situations where GPS is not adequate, such as indoor, underwater, or extraterrestrial environments...

  3. Machine vision system for measuring conifer seedling morphology

    NASA Astrophysics Data System (ADS)

    Rigney, Michael P.; Kranzler, Glenn A.

    1995-01-01

    A PC-based machine vision system providing rapid measurement of bare-root tree seedling morphological features has been designed. The system uses backlighting and a 2048-pixel line- scan camera to acquire images with transverse resolutions as high as 0.05 mm for precise measurement of stem diameter. Individual seedlings are manually loaded on a conveyor belt and inspected by the vision system in less than 0.25 seconds. Designed for quality control and morphological data acquisition by nursery personnel, the system provides a user-friendly, menu-driven graphical interface. The system automatically locates the seedling root collar and measures stem diameter, shoot height, sturdiness ratio, root mass length, projected shoot and root area, shoot-root area ratio, and percent fine roots. Sample statistics are computed for each measured feature. Measurements for each seedling may be stored for later analysis. Feature measurements may be compared with multi-class quality criteria to determine sample quality or to perform multi-class sorting. Statistical summary and classification reports may be printed to facilitate the communication of quality concerns with grading personnel. Tests were conducted at a commercial forest nursery to evaluate measurement precision. Four quality control personnel measured root collar diameter, stem height, and root mass length on each of 200 conifer seedlings. The same seedlings were inspected four times by the machine vision system. Machine stem diameter measurement precision was four times greater than that of manual measurements. Machine and manual measurements had comparable precision for shoot height and root mass length.

  4. A Survey of Statistical Machine Translation

    DTIC Science & Technology

    2007-04-01

    methods are notoriously sensitive to domain differences, however, so the move to informal text is likely to present many interesting challenges...

  5. Reversibility in Quantum Models of Stochastic Processes

    NASA Astrophysics Data System (ADS)

    Gier, David; Crutchfield, James; Mahoney, John; James, Ryan

    Natural phenomena such as time series of neural firing, orientation of layers in crystal stacking and successive measurements in spin-systems are inherently probabilistic. The provably minimal classical models of such stochastic processes are ɛ-machines, which consist of internal states, transition probabilities between states and output values. The topological properties of the ɛ-machine for a given process characterize the structure, memory and patterns of that process. However ɛ-machines are often not ideal because their statistical complexity (Cμ) is demonstrably greater than the excess entropy (E) of the processes they represent. Quantum models (q-machines) of the same processes can do better in that their statistical complexity (Cq) obeys the relation Cμ ≥ Cq ≥ E. q-machines can be constructed to consider longer lengths of strings, resulting in greater compression. With code-words of sufficiently long length, the statistical complexity becomes time-symmetric - a feature apparently novel to this quantum representation. This result has ramifications for compression of classical information in quantum computing and quantum communication technology.
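
    For the classical side of the comparison, the statistical complexity Cμ is the Shannon entropy of the ɛ-machine's stationary state distribution; the sketch below computes it for an illustrative two-state machine, not any specific process from the abstract.

    ```python
    import numpy as np

    # Transition matrix of an illustrative two-state epsilon-machine
    # (rows: current state, columns: next state).
    T = np.array([[0.5, 0.5],
                  [1.0, 0.0]])

    # Stationary distribution: the eigenvector of T^T with eigenvalue 1.
    vals, vecs = np.linalg.eig(T.T)
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    pi /= pi.sum()

    p = pi[pi > 0]
    C_mu = -(p * np.log2(p)).sum()     # entropy of the state distribution
    print(f"C_mu = {C_mu:.3f} bits")   # ~0.918 bits for this machine
    ```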

  6. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (median (Me) = 0.76) and area under the ROC (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most, sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing. PMID:21849043
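
    The comparison protocol described above (several classifiers scored by 5-fold cross-validation, then compared with Friedman's test) can be sketched with scikit-learn and scipy as below. The synthetic data stands in for the 10 neuropsychological test scores, and only three of the ten classifiers are shown.

      import numpy as np
      from scipy.stats import friedmanchisquare
      from sklearn.datasets import make_classification
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      # Stand-in for 10 neuropsychological test scores and a dementia label.
      X, y = make_classification(n_samples=400, n_features=10, random_state=0)

      models = {
          "LDA": LinearDiscriminantAnalysis(),
          "SVM": SVC(gamma="scale"),
          "RF": RandomForestClassifier(n_estimators=200, random_state=0),
      }
      scores = {name: cross_val_score(m, X, y, cv=5) for name, m in models.items()}
      for name, s in scores.items():
          print(f"{name}: median accuracy {np.median(s):.2f}")

      # Friedman's nonparametric test across the fold-wise accuracies.
      stat, p = friedmanchisquare(*scores.values())
      print(f"Friedman chi2={stat:.2f}, p={p:.3f}")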

  7. Food Safety by Using Machine Learning for Automatic Classification of Seeds of the South-American Incanut Plant

    NASA Astrophysics Data System (ADS)

    Lemanzyk, Thomas; Anding, Katharina; Linss, Gerhard; Rodriguez Hernández, Jorge; Theska, René

    2015-02-01

    This paper deals with the classification of seeds and seed components of the South-American Incanut plant and the modification of a machine to handle this task. Initially, the state of the art is illustrated. The research was executed in Germany, with a relevant part in Peru and Ecuador. Theoretical considerations for the solution of an automatic analysis of the Incanut seeds are specified. The optimization of the analysis software and the separation unit of the mechanical hardware is described together with recognition results. In a final step, the practical application of the analysis of the Incanut seeds is trialled and rated on the basis of statistical values.

  8. Urban land use monitoring from computer-implemented processing of airborne multispectral data

    NASA Technical Reports Server (NTRS)

    Todd, W. J.; Mausel, P. W.; Baumgardner, M. F.

    1976-01-01

    Machine processing techniques were applied to multispectral data obtained from airborne scanners at an elevation of 600 meters over central Indianapolis in August, 1972. Computer analysis of these spectral data indicate that roads (two types), roof tops (three types), dense grass (two types), sparse grass (two types), trees, bare soil, and water (two types) can be accurately identified. Using computers, it is possible to determine land uses from analysis of type, size, shape, and spatial associations of earth surface images identified from multispectral data. Land use data developed through machine processing techniques can be programmed to monitor land use changes, simulate land use conditions, and provide impact statistics that are required to analyze stresses placed on spatial systems.

  9. Application of modified profile analysis to function testing of the motion/no-motion issue in an aircraft ground-handling simulation. [statistical analysis procedure for man machine systems flight simulation

    NASA Technical Reports Server (NTRS)

    Parrish, R. V.; Mckissick, B. T.; Steinmetz, G. G.

    1979-01-01

    A recent modification of the methodology of profile analysis, which allows testing for differences between two functions as a whole with a single test, rather than point by point with multiple tests, is discussed. The modification is applied to the examination of the motion/no-motion issue as shown by the lateral deviation curve as a function of engine cut speed in a piloted 737-100 simulator. The results of this application are presented along with those of more conventional statistical test procedures on the same simulator data.

  10. Statistical learning algorithms for identifying contrasting tillage practices with landsat thematic mapper data

    USDA-ARS?s Scientific Manuscript database

    Tillage management practices have a direct impact on water holding capacity, evaporation, carbon sequestration, and water quality. This study examines the feasibility of two statistical learning algorithms, namely the Least Square Support Vector Machine (LSSVM) and the Relevance Vector Machine (RVM), for cla...

  11. Optimisation of GaN LEDs and the reduction of efficiency droop using active machine learning

    DOE PAGES

    Rouet-Leduc, Bertrand; Barros, Kipton Marcos; Lookman, Turab; ...

    2016-04-26

    A fundamental challenge in the design of LEDs is to maximise electro-luminescence efficiency at high current densities. We simulate GaN-based LED structures that delay the onset of efficiency droop by spreading carrier concentrations evenly across the active region. Statistical analysis and machine learning effectively guide the selection of the next LED structure to be examined based upon its expected efficiency as well as model uncertainty. This active learning strategy rapidly constructs a model that predicts Poisson-Schrödinger simulations of devices, and that simultaneously produces structures with higher simulated efficiencies.
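
    The active learning loop described above can be sketched as follows. A Gaussian-process surrogate and an upper-confidence-bound acquisition rule are assumptions made for illustration (the record does not specify the surrogate), and simulated_efficiency is a cheap stand-in for the expensive Poisson-Schrödinger device simulation.

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor

      def simulated_efficiency(x):
          # Stand-in for an expensive Poisson-Schrodinger simulation; x encodes
          # a candidate LED structure as a single design parameter here.
          return np.exp(-(x - 0.7) ** 2 / 0.02) - 0.2 * x

      candidates = np.linspace(0, 1, 200)[:, None]
      X, y = candidates[[0, -1]], simulated_efficiency(candidates[[0, -1]]).ravel()

      for _ in range(10):
          gp = GaussianProcessRegressor().fit(X, y)
          mu, sigma = gp.predict(candidates, return_std=True)
          # Upper-confidence-bound acquisition: expected efficiency plus
          # model uncertainty, as in the "expected efficiency as well as
          # model uncertainty" selection described above.
          nxt = candidates[np.argmax(mu + 1.96 * sigma)]
          X = np.vstack([X, nxt])
          y = np.append(y, simulated_efficiency(nxt))

      print("best structure found:", X[np.argmax(y)], "efficiency:", y.max())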

  12. Machine learning of frustrated classical spin models. I. Principal component analysis

    NASA Astrophysics Data System (ADS)

    Wang, Ce; Zhai, Hui

    2017-10-01

    This work aims at determining whether artificial intelligence can recognize a phase transition without prior human knowledge. If this were successful, it could be applied to, for instance, analyzing data from the quantum simulation of unsolved physical models. Toward this goal, we first need to apply the machine learning algorithm to well-understood models and see whether the outputs are consistent with our prior knowledge, which serves as the benchmark for this approach. In this work, we feed the computer data generated by the classical Monte Carlo simulation for the XY model on frustrated triangular and union jack lattices, which has two order parameters and exhibits two phase transitions. We show that the outputs of the principal component analysis agree very well with our understanding of different orders in different phases, and the temperature dependences of the major components detect the nature and the locations of the phase transitions. Our work offers promise for using machine learning techniques to study sophisticated statistical models, and our results can be further improved by using principal component analysis with kernel tricks and the neural network method.
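
    A hedged sketch of the PCA step: each XY configuration is represented by its (cos θ, sin θ) components and the leading principal components are inspected across temperatures. The von Mises mock data below merely imitates ordered and disordered Monte Carlo samples; it is not the paper's simulation.

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      n_sites = 100

      def mock_configs(n_samples, kappa):
          # Stand-in for Monte Carlo XY configurations: spin angles drawn with
          # concentration kappa (large kappa ~ ordered, kappa near 0 ~ disordered).
          theta = rng.vonmises(0.0, kappa, size=(n_samples, n_sites))
          return np.hstack([np.cos(theta), np.sin(theta)])  # one feature vector per sample

      X = np.vstack([mock_configs(200, 20.0),    # low-temperature samples
                     mock_configs(200, 0.01)])   # high-temperature samples

      pca = PCA(n_components=2).fit(X)
      print("explained variance ratios:", pca.explained_variance_ratio_)
      # Projections of ordered vs. disordered samples separate along the
      # leading components, which is how the phases are distinguished.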

  13. Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine.

    PubMed

    Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W

    2006-03-01

    Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.

  14. Comprehensive machine learning analysis of Hydra behavior reveals a stable basal behavioral repertoire

    PubMed Central

    Taralova, Ekaterina; Dupre, Christophe; Yuste, Rafael

    2018-01-01

    Animal behavior has been studied for centuries, but few efficient methods are available to automatically identify and classify it. Quantitative behavioral studies have been hindered by the subjective and imprecise nature of human observation, and the slow speed of annotating behavioral data. Here, we developed an automatic behavior analysis pipeline for the cnidarian Hydra vulgaris using machine learning. We imaged freely behaving Hydra, extracted motion and shape features from the videos, and constructed a dictionary of visual features to classify pre-defined behaviors. We also identified unannotated behaviors with unsupervised methods. Using this analysis pipeline, we quantified 6 basic behaviors and found surprisingly similar behavior statistics across animals within the same species, regardless of experimental conditions. Our analysis indicates that the fundamental behavioral repertoire of Hydra is stable. This robustness could reflect a homeostatic neural control of "housekeeping" behaviors which could have been already present in the earliest nervous systems. PMID:29589829

  15. BLS Machine-Readable Data and Tabulating Routines.

    ERIC Educational Resources Information Center

    DiFillipo, Tony

    This report describes the machine-readable data and tabulating routines that the Bureau of Labor Statistics (BLS) is prepared to distribute. An introduction discusses the LABSTAT (Labor Statistics) database and the BLS policy on release of unpublished data. Descriptions summarizing data stored in 25 files follow this format: overview, data…

  16. Investigating output and energy variations and their relationship to delivery QA results using Statistical Process Control for helical tomotherapy.

    PubMed

    Binny, Diana; Mezzenga, Emilio; Lancaster, Craig M; Trapp, Jamie V; Kairn, Tanya; Crowe, Scott B

    2017-06-01

    The aims of this study were to investigate machine beam parameters using the TomoTherapy quality assurance (TQA) tool, establish a correlation to patient delivery quality assurance results and to evaluate the relationship between energy variations detected using different TQA modules. TQA daily measurement results from two treatment machines for periods of up to 4 years were acquired. Analyses of beam quality, helical and static output variations were made. Variations from planned dose were also analysed using the Statistical Process Control (SPC) technique and their relationship to output trends was studied. Energy variations appeared to be one of the contributing factors to the delivery output dose variations seen in the analysis. Ion chamber measurements were reliable indicators of energy and output variations and were linear with patient dose verifications.
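
    The SPC technique referred to above is commonly applied to daily QA outputs as an individuals (I-MR) control chart. A minimal sketch follows, using the standard 2.66 moving-range constant and hypothetical daily output readings; the paper's exact chart choice is not stated in the abstract.

      import numpy as np

      def individuals_chart_limits(x):
          """Control limits for an individuals (I-MR) chart:
          centre line +/- 2.66 * mean moving range."""
          x = np.asarray(x, dtype=float)
          mr_bar = np.mean(np.abs(np.diff(x)))
          centre = x.mean()
          return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar

      # Hypothetical daily output readings (% of baseline) from a TQA module.
      daily_output = [100.2, 99.8, 100.5, 99.9, 101.6, 100.1, 98.9, 100.4]
      lcl, cl, ucl = individuals_chart_limits(daily_output)
      flagged = [x for x in daily_output if not lcl <= x <= ucl]
      print(f"LCL={lcl:.2f}, CL={cl:.2f}, UCL={ucl:.2f}, out-of-control={flagged}")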

  17. Waveform classification and statistical analysis of seismic precursors to the July 2008 Vulcanian Eruption of Soufrière Hills Volcano, Montserrat

    NASA Astrophysics Data System (ADS)

    Rodgers, Mel; Smith, Patrick; Pyle, David; Mather, Tamsin

    2016-04-01

    Understanding the transition between quiescence and eruption at dome-forming volcanoes, such as Soufrière Hills Volcano (SHV), Montserrat, is important for monitoring volcanic activity during long-lived eruptions. Statistical analysis of seismic events (e.g. spectral analysis and identification of multiplets via cross-correlation) can be useful for characterising seismicity patterns and can be a powerful tool for analysing temporal changes in behaviour. Waveform classification is crucial for volcano monitoring, but consistent classification, both during real-time analysis and for retrospective analysis of previous volcanic activity, remains a challenge. Automated classification allows consistent re-classification of events. We present a machine learning (random forest) approach to rapidly classify waveforms that requires minimal training data. We analyse the seismic precursors to the July 2008 Vulcanian explosion at SHV and show systematic changes in frequency content and multiplet behaviour that had not previously been recognised. These precursory patterns of seismicity may be interpreted as changes in pressure conditions within the conduit during magma ascent and could be linked to magma flow rates. Frequency analysis of the different waveform classes supports the growing consensus that LP and Hybrid events should be considered end members of a continuum of low-frequency source processes. By using both supervised and unsupervised machine-learning methods we investigate the nature of waveform classification and assess current classification schemes.
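
    The multiplet identification via cross-correlation mentioned above can be sketched as below, assuming equal-length, time-aligned waveforms and an illustrative 0.7 correlation threshold; real pipelines also search over time lags.

      import numpy as np

      def multiplet_groups(events, threshold=0.7):
          """Group events whose normalized cross-correlation exceeds a threshold.
          events: (n, m) array-like of equal-length, pre-aligned waveforms."""
          X = np.asarray(events, dtype=float)
          X = X - X.mean(axis=1, keepdims=True)
          X /= np.linalg.norm(X, axis=1, keepdims=True)
          C = X @ X.T                      # pairwise normalized correlation
          groups, unassigned = [], set(range(len(X)))
          while unassigned:
              seed = unassigned.pop()
              members = {seed} | {j for j in unassigned if C[seed, j] >= threshold}
              unassigned -= members
              groups.append(sorted(members))
          return groups

      # Example with three synthetic, pre-aligned waveforms:
      t = np.linspace(0, 1, 256)
      e1 = np.sin(2 * np.pi * 5 * t)
      e2 = e1 + 0.1 * np.random.default_rng(0).normal(size=t.size)
      e3 = np.sin(2 * np.pi * 12 * t)
      print(multiplet_groups([e1, e2, e3]))    # expected grouping: [[0, 1], [2]]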

  18. On the Stability of Jump-Linear Systems Driven by Finite-State Machines with Markovian Inputs

    NASA Technical Reports Server (NTRS)

    Patilkulkarni, Sudarshan; Herencia-Zapana, Heber; Gray, W. Steven; Gonzalez, Oscar R.

    2004-01-01

    This paper presents two mean-square stability tests for a jump-linear system driven by a finite-state machine with a first-order Markovian input process. The first test is based on conventional Markov jump-linear theory and avoids the use of any higher-order statistics. The second test is developed directly using the higher-order statistics of the machine's output process. The two approaches are illustrated with a simple model for a recoverable computer control system.

  19. Modeling Stochastic Kinetics of Molecular Machines at Multiple Levels: From Molecules to Modules

    PubMed Central

    Chowdhury, Debashish

    2013-01-01

    A molecular machine is either a single macromolecule or a macromolecular complex. In spite of the striking superficial similarities between these natural nanomachines and their man-made macroscopic counterparts, there are crucial differences. Molecular machines in a living cell operate stochastically in an isothermal environment far from thermodynamic equilibrium. In this mini-review we present a catalog of the molecular machines and an inventory of the essential toolbox for theoretically modeling these machines. The tool kits include (1) nonequilibrium statistical-physics techniques for modeling machines and machine-driven processes; and (2) statistical-inference methods for reverse engineering a functional machine from the empirical data. The cell is often likened to a microfactory in which the machineries are organized in modular fashion; each module consists of strongly coupled multiple machines, but different modules interact weakly with each other. This microfactory has its own automated supply chain and delivery system. Buoyed by the success achieved in modeling individual molecular machines, we advocate integration of these models in the near future to develop models of functional modules. A system-level description of the cell from the perspective of molecular machinery (the mechanome) is likely to emerge from further integrations that we envisage here. PMID:23746505

  20. The influence of maintenance quality of hemodialysis machines on hemodialysis efficiency.

    PubMed

    Azar, Ahmad Taher

    2009-01-01

    Several studies suggest that there is a correlation between dose of dialysis and machine maintenance. However, in spite of the current practice, there are conflicting reports regarding the relationship between dose of dialysis or patient outcome, and machine maintenance. In order to evaluate the impact of hemodialysis machine maintenance on dialysis adequacy Kt/V and session performance, data were processed on 134 patients on 3-times-per-week dialysis regimens by dividing the patients into four groups and also dividing the hemodialysis machines into four groups according to their year of installation. The equilibrated dialysis dose eq Kt/V, urea reduction ratio (URR) and the overall equipment effectiveness (OEE) were calculated in each group to show the effect of hemodialysis machine efficiency on overall session performance. The average working time per machine per month was 270 hours. The cumulative number of hours according to the year of installation was: 26,122 hours for machines installed in 1998; 21,596 hours for machines installed in 1999; 8362 hours for those installed in 2003; and 2486 hours for those installed in 2005. The mean time between failures (MTBF) was 1.8, 2.1, 4.2 and 6 months for machines installed in 1999, 1998, 2003 and 2005, respectively. Statistical analysis demonstrated that the dialysis dose eq Kt/V and URR increased as the overall equipment effectiveness (OEE) increased with regular maintenance procedures. Maintenance has become one of the most expedient approaches to guarantee high machine dependability. The efficiency of the dialysis machine is relevant in assuring proper dialysis adequacy.
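
    For reference, the adequacy measures named above can be computed as follows. The single-pool Kt/V uses the second-generation Daugirdas formula; the session numbers are hypothetical, and the paper's exact equilibration formula for eq Kt/V is not given in the abstract.

      import math

      def urr(pre_bun, post_bun):
          """Urea reduction ratio in percent."""
          return 100.0 * (1.0 - post_bun / pre_bun)

      def sp_ktv(pre_bun, post_bun, t_hours, uf_litres, post_weight_kg):
          """Second-generation Daugirdas estimate of single-pool Kt/V."""
          r = post_bun / pre_bun
          return (-math.log(r - 0.008 * t_hours)
                  + (4.0 - 3.5 * r) * uf_litres / post_weight_kg)

      # Hypothetical session: pre-BUN 70, post-BUN 25 mg/dL, 4 h, 2 L removed, 70 kg.
      print(f"URR = {urr(70, 25):.1f}%")              # ~64%
      print(f"spKt/V = {sp_ktv(70, 25, 4, 2, 70):.2f}")   # ~1.20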

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Angers, Crystal Plume; Bottema, Ryan; Buckley, Les

    Purpose: Treatment unit uptime statistics are typically used to monitor radiation equipment performance. The Ottawa Hospital Cancer Centre has introduced the use of Quality Control (QC) test success as a quality indicator for equipment performance and overall health of the equipment QC program. Methods: Implemented in 2012, QATrack+ is used to record and monitor over 1100 routine machine QC tests each month for 20 treatment and imaging units (http://qatrackplus.com/). Using an SQL (structured query language) script, automated queries of the QATrack+ database are used to generate program metrics such as the number of QC tests executed and the percentage of tests passing, at tolerance or at action. These metrics are compared against machine uptime statistics already reported within the program. Results: Program metrics for 2015 show good correlation between pass rate of QC tests and uptime for a given machine. For the nine conventional linacs, the QC test success rate was consistently greater than 97%. The corresponding uptimes for these units are better than 98%. Machines that consistently show higher failure or tolerance rates in the QC tests have lower uptimes. This points to either poor machine performance requiring corrective action or to problems with the QC program. Conclusions: QATrack+ significantly improves the organization of QC data but can also aid in overall equipment management. Complementing machine uptime statistics with QC test metrics provides a more complete picture of overall machine performance and can be used to identify areas of improvement in the machine service and QC programs.
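
    The pass/tolerance/action metrics described above reduce to a simple aggregation. The sketch below uses illustrative (machine, outcome) pairs rather than the actual QATrack+ database schema, which the record does not spell out.

      from collections import Counter

      def qc_metrics(records):
          """Summarize QC test outcomes per machine. `records` is a list of
          (machine, outcome) pairs with outcome in {'ok', 'tolerance', 'action'};
          the field names are illustrative, not the QATrack+ schema."""
          by_machine = {}
          for machine, outcome in records:
              by_machine.setdefault(machine, Counter())[outcome] += 1
          for machine, counts in sorted(by_machine.items()):
              total = sum(counts.values())
              print(f"{machine}: n={total}, "
                    f"pass={100 * counts['ok'] / total:.1f}%, "
                    f"tolerance={100 * counts['tolerance'] / total:.1f}%, "
                    f"action={100 * counts['action'] / total:.1f}%")

      qc_metrics([("linac1", "ok"), ("linac1", "ok"), ("linac1", "tolerance"),
                  ("linac2", "ok"), ("linac2", "action")])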

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yale, S H

    A survey was conducted of x-ray facilities in 2000 dental offices under actual operating conditions. Each of 10 dental schools in the United States collected data on 200 local dental offices to implement geographic analysis of the status of radiation hygiene in the offices. The data provided records of roentgen (r) output of each machine, relative r dose to patient, and dose to operator. In addition, specific information relating to both operator and machine was collected and evaluated. Some dentists were found to be operating under unsafe conditions, but the average dentist covered in the survey was statistically safe. On the basis of the survey, it was concluded that the problem of radiation hazards in dentistry will be resolved when all dental x-ray machines are properly filtered and collimated and high-speed dental x-ray film is used. (P.C.H.)

  3. Detection of Buried Targets via Active Selection of Labeled Data: Application to Sensing Subsurface UXO

    DTIC Science & Technology

    2007-06-01

    images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 2, pp. 99–113, 1991. [15] C. Bouman and M. Shapiro, “A multiscale random... this project was on developing new statistical algorithms for analysis of electromagnetic induction (EMI) and magnetometer data measured at actual

  4. High School and Beyond. 1980 Sophomore Cohort. First Follow-Up (1982). [machine-readable data file].

    ERIC Educational Resources Information Center

    National Center for Education Statistics (ED), Washington, DC.

    The High School and Beyond 1980 Sophomore Cohort First Follow-Up (1982) data file is presented. The First Follow-Up Sophomore Cohort data tape consists of four related data files: (1) the student data file (including data availability flags, weights, questionnaire data, and composite variables); (2) Statistical Analysis System (SAS) control cards…

  5. Secure and Efficient Regression Analysis Using a Hybrid Cryptographic Framework: Development and Evaluation

    PubMed Central

    Jiang, Xiaoqian; Aziz, Md Momin Al; Wang, Shuang; Mohammed, Noman

    2018-01-01

    Background Machine learning is an effective data-driven tool that is being widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data. Objective Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way. Methods Existing secure computation schemes are not suitable for processing the large-scale data that are used in cutting-edge machine learning applications. We designed, developed, and evaluated a hybrid cryptographic framework, which can securely perform regression analysis, a fundamental machine learning algorithm, using somewhat homomorphic encryption and a newly introduced secure hardware component of Intel Software Guard Extensions (Intel SGX) to ensure both privacy and efficiency at the same time. Results Experimental results demonstrate that our proposed method provides a better trade-off in terms of security and efficiency than solely secure hardware-based methods. Moreover, there is no approximation error: computed model parameters are identical to plaintext results. Conclusions To the best of our knowledge, this kind of secure computation model using a hybrid cryptographic framework, which leverages both somewhat homomorphic encryption and Intel SGX, has not been proposed or evaluated to date. Our proposed framework ensures data security and computational efficiency at the same time. PMID:29506966

  6. Secure and Efficient Regression Analysis Using a Hybrid Cryptographic Framework: Development and Evaluation.

    PubMed

    Sadat, Md Nazmus; Jiang, Xiaoqian; Aziz, Md Momin Al; Wang, Shuang; Mohammed, Noman

    2018-03-05

    Machine learning is an effective data-driven tool that is being widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data. Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way. Existing secure computation schemes are not suitable for processing the large-scale data that are used in cutting-edge machine learning applications. We designed, developed, and evaluated a hybrid cryptographic framework, which can securely perform regression analysis, a fundamental machine learning algorithm, using somewhat homomorphic encryption and a newly introduced secure hardware component of Intel Software Guard Extensions (Intel SGX) to ensure both privacy and efficiency at the same time. Experimental results demonstrate that our proposed method provides a better trade-off in terms of security and efficiency than solely secure hardware-based methods. Moreover, there is no approximation error: computed model parameters are identical to plaintext results. To the best of our knowledge, this kind of secure computation model using a hybrid cryptographic framework, which leverages both somewhat homomorphic encryption and Intel SGX, has not been proposed or evaluated to date. Our proposed framework ensures data security and computational efficiency at the same time.

  7. Improved biliary detection and diagnosis through intelligent machine analysis.

    PubMed

    Logeswaran, Rajasvaran

    2012-09-01

    This paper reports on work undertaken to improve automated detection of bile ducts in magnetic resonance cholangiopancreatography (MRCP) images, with the objective of conducting preliminary classification of the images for diagnosis. The proposed I-BDeDIMA (Improved Biliary Detection and Diagnosis through Intelligent Machine Analysis) scheme is a multi-stage framework consisting of successive phases of image normalization, denoising, structure identification, object labeling, feature selection and disease classification. A combination of multiresolution wavelet, dynamic intensity thresholding, segment-based region growing, region elimination, statistical analysis and neural networks is used in this framework to achieve good structure detection and preliminary diagnosis. Tests conducted on over 200 clinical images with known diagnosis have shown promising results of over 90% accuracy. The scheme outperforms related work in the literature, making it a viable framework for computer-aided diagnosis of biliary diseases.

  8. Modeling stochastic kinetics of molecular machines at multiple levels: from molecules to modules.

    PubMed

    Chowdhury, Debashish

    2013-06-04

    A molecular machine is either a single macromolecule or a macromolecular complex. In spite of the striking superficial similarities between these natural nanomachines and their man-made macroscopic counterparts, there are crucial differences. Molecular machines in a living cell operate stochastically in an isothermal environment far from thermodynamic equilibrium. In this mini-review we present a catalog of the molecular machines and an inventory of the essential toolbox for theoretically modeling these machines. The tool kits include (1) nonequilibrium statistical-physics techniques for modeling machines and machine-driven processes; and (2) statistical-inference methods for reverse engineering a functional machine from the empirical data. The cell is often likened to a microfactory in which the machineries are organized in modular fashion; each module consists of strongly coupled multiple machines, but different modules interact weakly with each other. This microfactory has its own automated supply chain and delivery system. Buoyed by the success achieved in modeling individual molecular machines, we advocate integration of these models in the near future to develop models of functional modules. A system-level description of the cell from the perspective of molecular machinery (the mechanome) is likely to emerge from further integrations that we envisage here.

  9. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    PubMed

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  10. Towards application of rule learning to the meta-analysis of clinical data: an example of the metabolic syndrome.

    PubMed

    Wojtusiak, Janusz; Michalski, Ryszard S; Simanivanh, Thipkesone; Baranova, Ancha V

    2009-12-01

    Systematic reviews and meta-analysis of published clinical datasets are an important part of medical research. By combining the results of multiple studies, meta-analysis is able to increase confidence in its conclusions, validate particular study results, and sometimes lead to new findings. Extensive theory has been built on how to aggregate results from multiple studies and arrive at statistically valid conclusions. Surprisingly, very little has been done to adopt advanced machine learning methods to support meta-analysis. In this paper we describe a novel machine learning methodology that is capable of inducing accurate and easy-to-understand attributional rules from aggregated data. Thus, the methodology can be used to support traditional meta-analysis in systematic reviews. Most machine learning applications give primary attention to predictive accuracy of the learned knowledge, and lesser attention to its understandability. Here we employed attributional rules, a special form of rules that are relatively easy to interpret for medical experts who are not necessarily trained in statistics and meta-analysis. The methodology has been implemented and initially tested on a set of publicly available clinical data describing patients with metabolic syndrome (MS). The objective of this application was to determine rules describing combinations of clinical parameters used for metabolic syndrome diagnosis, and to develop rules for predicting whether particular patients are likely to develop secondary complications of MS. The aggregated clinical data were retrieved from 20 separate hospital cohorts that included 12 groups of patients with present liver disease symptoms and 8 control groups of healthy subjects. A total of 152 attributes were used, most of which, however, were measured in different studies. The twenty most common attributes were selected for the rule learning process. By applying the developed rule learning methodology we arrived at several different possible rulesets that can be used to predict three considered complications of MS, namely nonalcoholic fatty liver disease (NAFLD), simple steatosis (SS), and nonalcoholic steatohepatitis (NASH).

  11. The Statistical Basis of Chemical Equilibria.

    ERIC Educational Resources Information Center

    Hauptmann, Siegfried; Menger, Eva

    1978-01-01

    Describes a machine which demonstrates the statistical bases of chemical equilibrium, and in doing so conveys insight into the connections among statistical mechanics, quantum mechanics, Maxwell Boltzmann statistics, statistical thermodynamics, and transition state theory. (GA)

  12. Adding Statistical Machine Translation Adaptation to Computer-Assisted Translation

    DTIC Science & Technology

    2013-09-01

    are automatically searched and used to suggest possible translations; (2) spell-checkers; (3) glossaries; (4) dictionaries; (5) alignment and...matching against TMs to propose translations; spell-checking, glossary, and dictionary look-up; support for multiple file formats; regular expressions...on Telecommunications. Tehran, 2012, 822–826. Bertoldi, N.; Federico, M. Domain Adaptation for Statistical Machine Translation with Monolingual

  13. A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback.

    PubMed

    Rahman, Md Mahmudur; Bhattacharya, Prabir; Desai, Bipin C

    2007-01-01

    A content-based image retrieval (CBIR) framework for diverse collection of medical images of different imaging modalities, anatomic regions with different orientations and biological systems is proposed. Organization of images in such a database (DB) is well defined with predefined semantic categories; hence, it can be useful for category-specific searching. The proposed framework consists of machine learning methods for image prefiltering, similarity matching using statistical distance measures, and a relevance feedback (RF) scheme. To narrow down the semantic gap and increase the retrieval efficiency, we investigate both supervised and unsupervised learning techniques to associate low-level global image features (e.g., color, texture, and edge) in the projected PCA-based eigenspace with their high-level semantic and visual categories. Specifically, we explore the use of a probabilistic multiclass support vector machine (SVM) and fuzzy c-means (FCM) clustering for categorization and prefiltering of images to reduce the search space. A category-specific statistical similarity matching is proposed at a finer level on the prefiltered images. To incorporate a better perception subjectivity, an RF mechanism is also added to update the query parameters dynamically and adjust the proposed matching functions. Experiments are based on a ground-truth DB consisting of 5000 diverse medical images of 20 predefined categories. Analysis of results based on cross-validation (CV) accuracy and precision-recall for image categorization and retrieval is reported. It demonstrates the improvement, effectiveness, and efficiency achieved by the proposed framework.

  14. The effects of multiple repairs on Inconel 718 weld mechanical properties

    NASA Technical Reports Server (NTRS)

    Russell, C. K.; Nunes, A. C., Jr.; Moore, D.

    1991-01-01

    Inconel 718 weldments were repaired 3, 6, 9, and 13 times using the gas tungsten arc welding process. The welded panels were machined into mechanical test specimens, postweld heat treated, and nondestructively tested. Tensile properties and high cycle fatigue life were evaluated and the results compared to unrepaired weld properties. Mechanical property data were analyzed using the statistical methods of difference in means for tensile properties and difference in log means and Weibull analysis for high cycle fatigue properties. Statistical analysis performed on the data did not show a significant decrease in tensile or high cycle fatigue properties due to the repeated repairs. Some degradation was observed in all properties; however, it was minimal.

  15. The Southampton-York Natural Scenes (SYNS) dataset: Statistics of surface attitude

    PubMed Central

    Adams, Wendy J.; Elder, James H.; Graf, Erich W.; Leyland, Julian; Lugtigheid, Arthur J.; Muryy, Alexander

    2016-01-01

    Recovering 3D scenes from 2D images is an under-constrained task; optimal estimation depends upon knowledge of the underlying scene statistics. Here we introduce the Southampton-York Natural Scenes dataset (SYNS: https://syns.soton.ac.uk), which provides comprehensive scene statistics useful for understanding biological vision and for improving machine vision systems. In order to capture the diversity of environments that humans encounter, scenes were surveyed at random locations within 25 indoor and outdoor categories. Each survey includes (i) spherical LiDAR range data, (ii) high-dynamic-range spherical imagery, and (iii) a panorama of stereo image pairs. We envisage many uses for the dataset and present one example: an analysis of surface attitude statistics, conditioned on scene category and viewing elevation. Surface normals were estimated using a novel adaptive scale selection algorithm. Across categories, surface attitude below the horizon is dominated by the ground plane (0° tilt). Near the horizon, probability density is elevated at 90°/270° tilt due to vertical surfaces (trees, walls). Above the horizon, probability density is elevated near 0° slant due to overhead structure such as ceilings and leaf canopies. These structural regularities represent potentially useful prior assumptions for human and machine observers, and may predict human biases in perceived surface attitude. PMID:27782103

  16. Statistical Optimality in Multipartite Ranking and Ordinal Regression.

    PubMed

    Uematsu, Kazuki; Lee, Yoonkyung

    2015-05-01

    Statistical optimality in multipartite ranking is investigated as an extension of bipartite ranking. We consider the optimality of ranking algorithms through minimization of the theoretical risk which combines pairwise ranking errors of ordinal categories with differential ranking costs. The extension shows that for a certain class of convex loss functions including exponential loss, the optimal ranking function can be represented as a ratio of weighted conditional probability of upper categories to lower categories, where the weights are given by the misranking costs. This result also bridges traditional ranking methods such as the proportional odds model in statistics with various ranking algorithms in machine learning. Further, the analysis of multipartite ranking with different costs provides a new perspective on non-smooth list-wise ranking measures such as the discounted cumulative gain and preference learning. We illustrate our findings with a simulation study and real data analysis.

  17. IRB Process Improvements: A Machine Learning Analysis.

    PubMed

    Shoenbill, Kimberly; Song, Yiqiang; Cobb, Nichelle L; Drezner, Marc K; Mendonca, Eneida A

    2017-06-01

    Clinical research involving humans is critically important, but it is a lengthy and expensive process. Most studies require institutional review board (IRB) approval. Our objective is to identify predictors of delays or accelerations in the IRB review process and apply this knowledge to inform process change in an effort to improve IRB efficiency, transparency, consistency and communication. We analyzed timelines of protocol submissions to determine protocol or IRB characteristics associated with different processing times. Our evaluation included single variable analysis to identify significant predictors of IRB processing time and machine learning methods to predict processing times through the IRB review system. Based on initial identified predictors, changes to IRB workflow and staffing procedures were instituted and we repeated our analysis. Our analysis identified several predictors of delays in the IRB review process including type of IRB review to be conducted, whether a protocol falls under Veterans Administration purview and specific staff in charge of a protocol's review. We have identified several predictors of delays in IRB protocol review processing times using statistical and machine learning methods. Application of this knowledge to process improvement efforts in two IRBs has led to increased efficiency in protocol review. The workflow and system enhancements that are being made support our four-part goal of improving IRB efficiency, consistency, transparency, and communication.

  18. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges

    PubMed Central

    Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.

    2017-01-01

    Abstract Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predict mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. PMID:27436868
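
    One concrete instance of the contrast drawn above: a regression baseline that requires imputation versus a tree ensemble that accepts missing laboratory values directly. The data below are synthetic stand-ins for the 13 laboratory markers; model choices are illustrative, not the review's exact pipeline.

      import numpy as np
      from sklearn.ensemble import HistGradientBoostingClassifier
      from sklearn.impute import SimpleImputer
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(1)
      n, p = 1000, 13                          # stand-in for 13 laboratory markers
      X = rng.normal(size=(n, p))
      y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n) > 0.5).astype(int)
      X[rng.random(X.shape) < 0.15] = np.nan   # missing labs, as in real EHR data

      logit = make_pipeline(SimpleImputer(), LogisticRegression(max_iter=1000))
      gbm = HistGradientBoostingClassifier(random_state=0)  # handles NaN natively

      for name, model in [("logistic", logit), ("boosting", gbm)]:
          auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
          print(f"{name}: AUC = {auc:.3f}")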

  19. Performance analysis of cutting graphite-epoxy composite using a 90,000 psi abrasive waterjet

    NASA Astrophysics Data System (ADS)

    Choppali, Aiswarya

    Graphite-epoxy composites are widely used in many aerospace and structural applications because of their properties, which include light weight, high strength-to-weight ratio and great flexibility in design. However, the inherent anisotropy of these composites makes it difficult to machine them using conventional methods. To overcome the major issues that arise with conventional machining, such as fiber pull-out, delamination, heat generation and high tooling costs, an effort is herein made to study abrasive waterjet machining of composites. An abrasive waterjet is used to cut 1" thick graphite-epoxy composites based on baseline data obtained from the cutting of ¼" thick material. The objective of this project is to study the surface roughness of the cut surface, with a focus on demonstrating the benefits of using higher pressures for cutting composites. The effects of the major cutting parameters (jet pressure, traverse speed, abrasive feed rate and cutting head size) are studied at different levels. Statistical analysis of the experimental data provides an understanding of the effect of the process parameters on surface roughness. Additionally, the effect of these parameters on the taper angle of the cut is studied. The data are analyzed to obtain a set of process parameters that optimize the cutting of 1" thick graphite-epoxy composite, and statistical analysis is used to validate the experimental data. Costs involved in the cutting process are investigated in terms of abrasive consumed, to better understand and illustrate the practical benefits of using higher pressures. It is demonstrated that, as pressure increases, ultra-high pressure waterjets produce a better surface quality at a faster traverse rate with lower costs.

  20. Semi-supervised vibration-based classification and condition monitoring of compressors

    NASA Astrophysics Data System (ADS)

    Potočnik, Primož; Govekar, Edvard

    2017-09-01

    Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.
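
    Of the classifiers compared above, the extreme learning machine is the least standard; a minimal sketch follows (random fixed hidden layer, closed-form least-squares output weights). The hyperparameters and the smoke-test data are illustrative, not the paper's configuration.

      import numpy as np

      class ELM:
          """Minimal extreme learning machine: a random, fixed hidden layer with
          output weights solved in closed form by least squares."""
          def __init__(self, n_hidden=100, seed=0):
              self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

          def fit(self, X, y):
              self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
              self.b = self.rng.normal(size=self.n_hidden)
              H = np.tanh(X @ self.W + self.b)
              Y = np.eye(y.max() + 1)[y]            # one-hot targets
              self.beta = np.linalg.pinv(H) @ Y     # least-squares solution
              return self

          def predict(self, X):
              return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)

      # Tiny smoke test on two separable classes of "vibration features".
      rng = np.random.default_rng(1)
      X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
      y = np.repeat([0, 1], 50)
      print((ELM().fit(X, y).predict(X) == y).mean())   # training accuracy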

  1. CMM Data Analysis Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Due to the increase in the use of Coordinate Measuring Machines (CMMs) to measure fine details and complex geometries in manufacturing, many programs have been made to compile and analyze the data. These programs typically require extensive setup to determine the expected results in order to not only track the pass/fail of a dimension, but also to use statistical process control (SPC). These extra steps and setup times have been addressed through the CMM Data Analysis Tool, which only requires the output of the CMM to provide both pass/fail analysis on all parts run to the same inspection program asmore » well as provide graphs which help visualize where the part measures within the allowed tolerances. This provides feedback not only to the customer for approval of a part during development, but also to machining process engineers to identify when any dimension is drifting towards an out of tolerance condition during production. This program can handle hundreds of parts with complex dimensions and will provide an analysis within minutes.« less

  2. Comprehensive machine learning analysis of Hydra behavior reveals a stable basal behavioral repertoire.

    PubMed

    Han, Shuting; Taralova, Ekaterina; Dupre, Christophe; Yuste, Rafael

    2018-03-28

    Animal behavior has been studied for centuries, but few efficient methods are available to automatically identify and classify it. Quantitative behavioral studies have been hindered by the subjective and imprecise nature of human observation, and the slow speed of annotating behavioral data. Here, we developed an automatic behavior analysis pipeline for the cnidarian Hydra vulgaris using machine learning. We imaged freely behaving Hydra , extracted motion and shape features from the videos, and constructed a dictionary of visual features to classify pre-defined behaviors. We also identified unannotated behaviors with unsupervised methods. Using this analysis pipeline, we quantified 6 basic behaviors and found surprisingly similar behavior statistics across animals within the same species, regardless of experimental conditions. Our analysis indicates that the fundamental behavioral repertoire of Hydra is stable. This robustness could reflect a homeostatic neural control of "housekeeping" behaviors which could have been already present in the earliest nervous systems. © 2018, Han et al.

  3. High School and Beyond. 1980 Senior Cohort. First Follow-Up (1982). [machine-readable data file].

    ERIC Educational Resources Information Center

    National Center for Education Statistics (ED), Washington, DC.

    The High School and Beyond 1980 Senior Cohort First Follow-Up (1982) Data File is presented. The First Follow-Up Senior Cohort data tape consists of four related data files: (1) the student data file (including data availability flags, weights, questionnaire data, and composite variables); (2) Statistical Analysis System (SAS) control cards for…

  4. Fault detection, isolation, and diagnosis of self-validating multifunctional sensors.

    PubMed

    Yang, Jing-Li; Chen, Yin-Sheng; Zhang, Li-Li; Sun, Zhen

    2016-06-01

    A novel fault detection, isolation, and diagnosis (FDID) strategy for self-validating multifunctional sensors is presented in this paper. The sparse non-negative matrix factorization-based method can effectively detect faults by using the squared prediction error (SPE) statistic, and the variable contribution plots based on the SPE statistic can help locate and isolate the faulty sensitive units. The complete ensemble empirical mode decomposition is employed to decompose the fault signals into a series of intrinsic mode functions (IMFs) and a residual. The sample entropy (SampEn)-weighted energy values of each IMF and the residual are estimated to represent the characteristics of the fault signals. A multi-class support vector machine is introduced to identify the fault mode, with the purpose of diagnosing the status of the faulty sensitive units. The performance of the proposed strategy is compared with other fault detection strategies, such as principal component analysis and independent component analysis, and fault diagnosis strategies, such as empirical mode decomposition coupled with support vector machine. The proposed strategy is fully evaluated in a real self-validating multifunctional sensors experimental system, and the experimental results demonstrate that the proposed strategy provides an excellent solution to the FDID research topic of self-validating multifunctional sensors.
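
    The SPE-based detection and contribution-plot isolation steps described above can be sketched as follows; PCA stands in for the paper's sparse non-negative matrix factorization model, and the six-variable data are synthetic.

      import numpy as np
      from sklearn.decomposition import PCA

      def spe_and_contributions(X_train, x_new, n_components=3):
          """Squared prediction error of a new sample against a subspace model,
          plus per-variable contributions used to isolate the faulty unit.
          PCA is an illustrative substitute for the paper's sparse NMF."""
          pca = PCA(n_components=n_components).fit(X_train)
          x_hat = pca.inverse_transform(pca.transform(x_new[None, :]))[0]
          residual = x_new - x_hat
          contributions = residual ** 2        # contribution-plot values
          return contributions.sum(), contributions

      rng = np.random.default_rng(0)
      X_train = rng.normal(size=(500, 6))      # healthy multifunctional-sensor data
      x_fault = rng.normal(size=6); x_fault[2] += 5.0   # fault in sensitive unit 2
      spe, contrib = spe_and_contributions(X_train, x_fault)
      print(f"SPE = {spe:.2f}, most suspect unit = {contrib.argmax()}")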

  5. Results of PBX 9501 and PBX 9502 Round-Robin Quasi-Static Tension Tests from JOWOG-9/39 Focused Exchange.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, D. G.

    2002-01-01

    A round-robin study was conducted with the participation of three laboratory facilities: Los Alamos National Laboratory (LANL), BWXT Pantex Plant (PX), and Lawrence Livermore National Laboratory (LLNL). The study involved the machining and quasi-static tension testing of two plastic-bonded high explosive (PBX) composites, PBX 9501 and PBX 9502. Nine tensile specimens for each type of PBX were to be machined at each of the three facilities; 3 of these specimens were to be sent to each of the participating materials testing facilities for tensile testing. The resultant data were analyzed to look for trends associated with specimen machining location and/or trends associated with materials testing location. The analysis provides interesting insights into the variability and statistical nature of mechanical properties testing on PBX composites. Caution is warranted when results are compared/exchanged between testing facilities.

  6. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution

    PubMed Central

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis. PMID:26958271
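
    A common way to realize the ITE modeling argued for above is a T-learner: fit one outcome model per treatment arm and subtract the predicted risks. The sketch below uses synthetic statin/MI-style data and is an illustration, not the authors' exact estimator.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)
      n = 2000
      X = rng.normal(size=(n, 5))              # patient covariates
      t = rng.integers(0, 2, n)                # treatment indicator (synthetic)
      # Outcome (MI) whose treatment effect varies with covariate 0:
      p = 1 / (1 + np.exp(-(X[:, 0] - t * (1 + X[:, 0]))))
      y = (rng.random(n) < p).astype(int)

      # T-learner: one unweighted, nonparametric model per treatment arm.
      m1 = RandomForestClassifier(n_estimators=200).fit(X[t == 1], y[t == 1])
      m0 = RandomForestClassifier(n_estimators=200).fit(X[t == 0], y[t == 0])
      ite = m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1]

      print("ATE estimate:", ite.mean())
      print("ITE range across patients:", ite.min(), "to", ite.max())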

  7. Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution.

    PubMed

    Weiss, Jeremy; Kuusisto, Finn; Boyd, Kendrick; Liu, Jie; Page, David

    2015-01-01

    Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individualized treatment effect (ITE), which has better applicability to new patients. We compare ATE-estimation using randomized and observational analysis methods against ITE-estimation using machine learning, and describe how the ITE theoretically generalizes to new population distributions, whereas the ATE may not. On a synthetic data set of statin use and myocardial infarction (MI), we show that a learned ITE model improves true ITE estimation and outperforms the ATE. We additionally argue that ITE models should be learned with a consistent, nonparametric algorithm from unweighted examples and show experiments in favor of our argument using our synthetic data model and a real data set of D-penicillamine use for primary biliary cirrhosis.

  8. Perspectives on Machine Learning for Classification of Schizotypy Using fMRI Data.

    PubMed

    Madsen, Kristoffer H; Krohne, Laerke G; Cai, Xin-Lu; Wang, Yi; Chan, Raymond C K

    2018-03-15

    Functional magnetic resonance imaging is capable of estimating functional activation and connectivity in the human brain, and lately there has been increased interest in the use of these functional modalities combined with machine learning for identification of psychiatric traits. While these methods bear great potential for early diagnosis and better understanding of disease processes, there is a wide range of processing choices and pitfalls that may severely hamper interpretation and generalization performance unless carefully considered. In this perspective article, we aim to motivate the use of machine learning in schizotypy research. To this end, we describe common data processing steps while commenting on best practices and procedures. First, we introduce the important role of schizotypy to motivate the importance of reliable classification, and summarize existing machine learning literature on schizotypy. Then, we describe procedures for extraction of features based on fMRI data, including statistical parametric mapping, parcellation, complex network analysis, and decomposition methods, as well as classification with a special focus on support vector classification and deep learning. We provide more detailed descriptions and software as supplementary material. Finally, we present current challenges in machine learning for classification of schizotypy and comment on future trends and perspectives.

  9. Evaluating the Security of Machine Learning Algorithms

    DTIC Science & Technology

    2008-05-20

    Two far-reaching trends in computing have grown in significance in recent years. First, statistical machine learning has entered the mainstream as a...computing applications. The growing intersection of these trends compels us to investigate how well machine learning performs under adversarial conditions... machine learning has a structure that we can use to build secure learning systems. This thesis makes three high-level contributions. First, we develop a

  10. Spectroscopic Diagnosis of Arsenic Contamination in Agricultural Soils

    PubMed Central

    Shi, Tiezhu; Liu, Huizeng; Chen, Yiyun; Fei, Teng; Wang, Junjie; Wu, Guofeng

    2017-01-01

    This study investigated the abilities of pre-processing, feature selection and machine-learning methods for the spectroscopic diagnosis of soil arsenic contamination. The spectral data were pre-processed by using Savitzky-Golay smoothing, first and second derivatives, multiplicative scatter correction, standard normal variate, and mean centering. Principal component analysis (PCA) and the RELIEF algorithm were used to extract spectral features. Machine-learning methods, including random forests (RF), artificial neural network (ANN), and radial basis function- and linear function-based support vector machines (RBF- and LF-SVM), were employed for establishing diagnosis models. The model accuracies were evaluated and compared by using overall accuracies (OAs). The statistical significance of the difference between models was evaluated by using McNemar’s test (Z value). The results showed that the OAs varied with the different combinations of pre-processing, feature selection, and classification methods. Feature selection methods could improve the modeling efficiencies and diagnosis accuracies, and RELIEF often outperformed PCA. The optimal models established by RF (OA = 86%), ANN (OA = 89%), RBF- (OA = 89%) and LF-SVM (OA = 87%) had no statistically significant difference in diagnosis accuracies (Z < 1.96, p > 0.05). These results indicated that it was feasible to diagnose soil arsenic contamination using reflectance spectroscopy. The appropriate combination of multivariate methods was important to improve diagnosis accuracies. PMID:28471412
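
    For readers unfamiliar with the pairwise comparison statistic used above, the following minimal sketch computes a continuity-corrected McNemar statistic from two classifiers' predictions on a shared test set. All predictions here are invented for illustration.

        # Minimal sketch of a McNemar comparison between two classifiers.
        import numpy as np
        from scipy.stats import chi2

        def mcnemar(y_true, pred_a, pred_b):
            """Continuity-corrected McNemar statistic (Z) and p-value."""
            a_right = pred_a == y_true
            b_right = pred_b == y_true
            n01 = int(np.sum(a_right & ~b_right))   # A correct, B wrong
            n10 = int(np.sum(~a_right & b_right))   # A wrong, B correct
            stat = (abs(n01 - n10) - 1) ** 2 / max(n01 + n10, 1)
            return np.sqrt(stat), 1.0 - chi2.cdf(stat, df=1)

        y = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
        rf = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])    # e.g. RF predictions
        svm = np.array([1, 0, 1, 1, 0, 1, 1, 1, 1, 1])   # e.g. SVM predictions
        z, p = mcnemar(y, rf, svm)
        print(z < 1.96, p)   # True here: no significant accuracy difference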

  11. ANN based Performance Evaluation of BDI for Condition Monitoring of Induction Motor Bearings

    NASA Astrophysics Data System (ADS)

    Patel, Raj Kumar; Giri, V. K.

    2017-06-01

    Bearings are among the most critical parts of rotating machines, and most failures arise from defective bearings. Bearing failure leads to machine failure and unpredicted productivity loss. Therefore, bearing fault detection and prognosis is an integral part of preventive maintenance procedures. In this paper, vibration signals for four conditions of a deep groove ball bearing (normal (N), inner race defect (IRD), ball defect (BD), and outer race defect (ORD)) were acquired from a customized bearing test rig at three different fault sizes. Two approaches were adopted for statistical feature extraction from the vibration signal. In the first approach, the raw signal is used for statistical feature extraction; in the second, the statistical features are based on a bearing damage index (BDI). The proposed BDI technique uses a wavelet packet node energy coefficient analysis method. Both feature sets are used as inputs to an ANN classifier to evaluate its performance. A comparison of ANN performance is made between raw vibration data and data chosen using the BDI. The ANN performance was found to be appreciably higher when BDI-based features were used as inputs to the classifier.
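
    A minimal sketch of wavelet packet node energies as condition-monitoring features, in the spirit of the BDI approach described above; the wavelet family, decomposition depth, and normalization are assumptions, not the authors' exact settings.

        # Wavelet packet node energies from a vibration signal (illustrative).
        import numpy as np
        import pywt

        def wavelet_packet_energies(signal, wavelet="db4", level=3):
            wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                                    mode="symmetric", maxlevel=level)
            nodes = wp.get_level(level, order="natural")
            energies = np.array([np.sum(np.square(n.data)) for n in nodes])
            return energies / energies.sum()   # relative energy per band

        rng = np.random.default_rng(0)
        vibration = rng.normal(size=2048)      # stand-in for a measured signal
        print(wavelet_packet_energies(vibration))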

  12. Optimisation of a machine learning algorithm in human locomotion using principal component and discriminant function analyses.

    PubMed

    Bisele, Maria; Bencsik, Martin; Lewis, Martin G C; Barnett, Cleveland T

    2017-01-01

    Assessment methods in human locomotion often involve the description of normalised graphical profiles and/or the extraction of discrete variables. Whilst useful, these approaches may not represent the full complexity of gait data. Multivariate statistical methods, such as Principal Component Analysis (PCA) and Discriminant Function Analysis (DFA), have been adopted since they have the potential to overcome these data handling issues. The aim of the current study was to develop and optimise a specific machine learning algorithm for processing human locomotion data. Twenty participants ran at a self-selected speed across a 15m runway in barefoot and shod conditions. Ground reaction forces (BW) and kinematics were measured at 1000 Hz and 100 Hz, respectively from which joint angles (°), joint moments (N.m.kg-1) and joint powers (W.kg-1) for the hip, knee and ankle joints were calculated in all three anatomical planes. Using PCA and DFA, power spectra of the kinematic and kinetic variables were used as a training database for the development of a machine learning algorithm. All possible combinations of 10 out of 20 participants were explored to find the iteration of individuals that would optimise the machine learning algorithm. The results showed that the algorithm was able to successfully predict whether a participant ran shod or barefoot in 93.5% of cases. To the authors' knowledge, this is the first study to optimise the development of a machine learning algorithm.
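
    A minimal sklearn sketch of the PCA-then-discriminant-function pipeline the study describes, with LinearDiscriminantAnalysis standing in for DFA; the feature matrix and labels are synthetic stand-ins, not the study's gait spectra.

        # PCA feature reduction followed by discriminant classification.
        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)
        X = rng.normal(size=(40, 200))   # power-spectrum features per trial
        y = np.repeat([0, 1], 20)        # 0 = shod, 1 = barefoot (labels assumed)

        model = make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis())
        print(cross_val_score(model, X, y, cv=5).mean())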

  13. Optimisation of a machine learning algorithm in human locomotion using principal component and discriminant function analyses

    PubMed Central

    Bisele, Maria; Bencsik, Martin; Lewis, Martin G. C.

    2017-01-01

    Assessment methods in human locomotion often involve the description of normalised graphical profiles and/or the extraction of discrete variables. Whilst useful, these approaches may not represent the full complexity of gait data. Multivariate statistical methods, such as Principal Component Analysis (PCA) and Discriminant Function Analysis (DFA), have been adopted since they have the potential to overcome these data handling issues. The aim of the current study was to develop and optimise a specific machine learning algorithm for processing human locomotion data. Twenty participants ran at a self-selected speed across a 15m runway in barefoot and shod conditions. Ground reaction forces (BW) and kinematics were measured at 1000 Hz and 100 Hz, respectively from which joint angles (°), joint moments (N.m.kg-1) and joint powers (W.kg-1) for the hip, knee and ankle joints were calculated in all three anatomical planes. Using PCA and DFA, power spectra of the kinematic and kinetic variables were used as a training database for the development of a machine learning algorithm. All possible combinations of 10 out of 20 participants were explored to find the iteration of individuals that would optimise the machine learning algorithm. The results showed that the algorithm was able to successfully predict whether a participant ran shod or barefoot in 93.5% of cases. To the authors’ knowledge, this is the first study to optimise the development of a machine learning algorithm. PMID:28886059

  14. Pattern Activity Clustering and Evaluation (PACE)

    NASA Astrophysics Data System (ADS)

    Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna

    2012-06-01

    With the vast amount of network information available on activities of people (e.g., motions, transportation routes, and site visits) there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors that enable grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information theory metrics to characterize behavior. A scenario is presented to demonstrate the concept and metrics that could be useful for layered sensing behavior pattern learning and analysis. We leverage work on group tracking and on learning and clustering approaches, and utilize information-theoretic metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries and sensor management for data extraction, relations discovery, and situation analysis of existing data.

  15. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning.

    PubMed

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-06-17

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults.
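
    sklearn offers no Gaussian-Bernoulli RBM, so the sketch below is only a loose binary analogue of the paper's GDBM: two stacked BernoulliRBMs learn features from scaled statistical features, and logistic regression classifies the fault type. All settings are assumptions.

        # Stacked-RBM feature learning plus a linear classifier (analogue only).
        from sklearn.neural_network import BernoulliRBM
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import MinMaxScaler
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=300, n_features=24, n_classes=3,
                                   n_informative=8, random_state=0)
        deep = make_pipeline(
            MinMaxScaler(),                  # RBMs expect inputs in [0, 1]
            BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20),
            BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20),
            LogisticRegression(max_iter=1000),
        )
        print(cross_val_score(deep, X, y, cv=5).mean())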

  16. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning

    PubMed Central

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-01-01

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults. PMID:27322273

  17. Machine Learning Based Multi-Physical-Model Blending for Enhancing Renewable Energy Forecast -- Improvement via Situation Dependent Error Correction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Siyuan; Hwang, Youngdeok; Khabibrakhmanov, Ildar

    With increasing penetration of solar and wind energy into the total energy supply mix, the pressing need for accurate energy forecasting has become well-recognized. Here we report the development of a machine-learning based model blending approach for statistically combining multiple meteorological models to improve the accuracy of solar/wind power forecasts. Importantly, we demonstrate that, in addition to the parameters to be predicted (such as solar irradiance and power), including additional atmospheric state parameters which collectively define weather situations as machine learning input provides further enhanced accuracy for the blended result. Functional analysis of variance shows that the error of an individual model has substantial dependence on the weather situation. The machine-learning approach effectively reduces such situation-dependent error and thus produces more accurate results than conventional multi-model ensemble approaches based on simplistic equally or unequally weighted model averaging. Validation results over an extended period of time show over 30% improvement in solar irradiance/power forecast accuracy compared to forecasts based on the best individual model.
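
    A hedged sketch of situation-dependent blending: a regressor learns to map several physical-model forecasts plus weather-state features to the observed value, rather than averaging the models with fixed weights. The toy forecasts and feature names are illustrative, not the authors' setup.

        # Learned blending of two model forecasts using a weather feature.
        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(2)
        n = 500
        model_a = rng.normal(500, 100, n)          # irradiance forecast, model A
        model_b = model_a + rng.normal(0, 30, n)   # irradiance forecast, model B
        cloud = rng.uniform(0, 1, n)               # weather-situation feature
        # Truth depends on the situation: model A is better when it is clear.
        truth = np.where(cloud < 0.5, model_a, model_b) + rng.normal(0, 10, n)

        X = np.column_stack([model_a, model_b, cloud])
        blender = GradientBoostingRegressor().fit(X[:400], truth[:400])
        print(blender.score(X[400:], truth[400:]))   # R^2 on held-out samples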

  18. Certification of highly complex safety-related systems.

    PubMed

    Reinert, D; Schaefer, M

    1999-01-01

    The BIA now has 15 years of experience with the certification of complex electronic systems for safety-related applications in the machinery sector. Using the example of machining centres, this presentation will show the systematic procedure for verifying and validating control systems using Application Specific Integrated Circuits (ASICs) and microcomputers for safety functions. One section will describe the control structure of machining centres with control systems using "integrated safety." A diverse redundant architecture combined with cross-monitoring and forced dynamization is explained. In the main section, the steps of the systematic certification procedure are explained, showing some results of the certification of drilling machines. Specification reviews, design reviews with test case specification, statistical analysis, and walk-throughs are the analytical measures in the testing process. Systematic tests based on the test case specification, Electro Magnetic Interference (EMI) and environmental testing, and site acceptance tests on the machines are the testing measures for validation. A complex software-driven system is always undergoing modification. Most of the changes are not safety-relevant, but this has to be proven. A systematic procedure for certifying software modifications is presented in the last section of the paper.

  19. Evaluating data distribution and drift vulnerabilities of machine learning algorithms in secure and adversarial environments

    NASA Astrophysics Data System (ADS)

    Nelson, Kevin; Corbin, George; Blowers, Misty

    2014-05-01

    Machine learning is continuing to gain popularity due to its ability to solve problems that are difficult to model using conventional computer programming logic. Much of the current and past work has focused on algorithm development, data processing, and optimization. Lately, a subset of research has emerged which explores issues related to security. This research is gaining traction as systems employing these methods are being applied to both secure and adversarial environments. One of machine learning's biggest benefits, its data-driven versus logic-driven approach, is also a weakness if the data on which the models rely are corrupted. Adversaries could maliciously influence systems which address drift and data distribution changes using re-training and online learning. Our work is focused on exploring the resilience of various machine learning algorithms to these data-driven attacks. In this paper, we present our initial findings using Monte Carlo simulations, and statistical analysis, to explore the maximal achievable shift to a classification model, as well as the required amount of control over the data.
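
    A Monte Carlo sketch of the kind of data-driven attack explored above: an adversary label-flips a growing fraction of retraining data, and the shift in the learned model is measured over repeated trials. The attack model and shift metric are assumptions chosen for illustration.

        # Label-flipping poisoning and the resulting model shift, via Monte Carlo.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.datasets import make_classification

        X, y = make_classification(n_samples=600, n_features=5, random_state=0)
        clean = LogisticRegression().fit(X, y)

        rng = np.random.default_rng(0)
        for frac in (0.0, 0.1, 0.2, 0.3):
            shifts = []
            for _ in range(20):                      # Monte Carlo repetitions
                y_adv = y.copy()
                idx = rng.choice(len(y), int(frac * len(y)), replace=False)
                y_adv[idx] = 1 - y_adv[idx]          # adversary flips labels
                poisoned = LogisticRegression().fit(X, y_adv)
                shifts.append(np.linalg.norm(poisoned.coef_ - clean.coef_))
            print(frac, np.mean(shifts))             # shift grows with control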

  20. Selected aspects of microelectronics technology and applications: Numerically controlled machine tools. Technology trends series no. 2

    NASA Astrophysics Data System (ADS)

    Sigurdson, J.; Tagerud, J.

    1986-05-01

    A UNIDO publication about machine tools with automatic control discusses the following: (1) numerical control (NC) machine tool perspectives, definition of NC, flexible manufacturing systems, robots and their industrial application, research and development, and sensors; (2) experience in developing a capability in NC machine tools; (3) policy issues; (4) procedures for retrieval of relevant documentation from data bases. Diagrams, statistics, bibliography are included.

  1. Quantitative sensory testing response patterns to capsaicin- and ultraviolet-B–induced local skin hypersensitization in healthy subjects: a machine-learned analysis

    PubMed Central

    Lötsch, Jörn; Geisslinger, Gerd; Heinemann, Sarah; Lerch, Florian; Oertel, Bruno G.; Ultsch, Alfred

    2018-01-01

    The comprehensive assessment of pain-related human phenotypes requires combinations of nociceptive measures that produce complex high-dimensional data, posing challenges to bioinformatic analysis. In this study, we assessed established experimental models of heat hyperalgesia of the skin, consisting of local ultraviolet-B (UV-B) irradiation or capsaicin application, in 82 healthy subjects using a variety of noxious stimuli. We extended the original heat stimulation by applying cold and mechanical stimuli and assessing the hypersensitization effects with a clinically established quantitative sensory testing (QST) battery (German Research Network on Neuropathic Pain). This study provided a 246 × 10-sized data matrix (82 subjects assessed at baseline, following UV-B application, and following capsaicin application) with respect to 10 QST parameters, which we analyzed using machine-learning techniques. We observed statistically significant effects of the hypersensitization treatments in 9 different QST parameters. Supervised machine-learned analysis implemented as random forests followed by ABC analysis pointed to heat pain thresholds as the most relevantly affected QST parameter. However, decision tree analysis indicated that UV-B additionally modulated sensitivity to cold. Unsupervised machine-learning techniques, implemented as emergent self-organizing maps, hinted at subgroups responding to topical application of capsaicin. The distinction among subgroups was based on sensitivity to pressure pain, which could be attributed to sex differences, with women being more sensitive than men. Thus, while UV-B and capsaicin share a major component of heat pain sensitization, they differ in their effects on QST parameter patterns in healthy subjects, suggesting a lack of redundancy between these models. PMID:28700537

  2. Statistical machine translation for biomedical text: are we there yet?

    PubMed

    Wu, Cuijun; Xia, Fei; Deleger, Louise; Solti, Imre

    2011-01-01

    In our paper we addressed the research question: "Has machine translation achieved sufficiently high quality to translate PubMed titles for patients?". We analyzed statistical machine translation output for six foreign language-English translation pairs (bi-directionally). We built a high performing in-house system and evaluated its output for each translation pair on a large scale, both with automated BLEU scores and human judgment. In addition to the in-house system, we also evaluated Google Translate's performance specifically within the biomedical domain. We report high performance for the German-English, French-English and Spanish-English bi-directional translation pairs for both Google Translate and our system.
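
    The automated half of such an evaluation can be reproduced with NLTK's BLEU implementation; the sketch below scores one machine hypothesis against one reference. The sentences are invented, and smoothing is an assumption to avoid zero n-gram counts on short inputs.

        # Corpus-level BLEU between machine output and reference translations.
        from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

        references = [[["the", "effect", "of", "statins", "on", "infarction"]]]
        hypotheses = [["effect", "of", "statins", "on", "infarction"]]
        smooth = SmoothingFunction().method1
        print(corpus_bleu(references, hypotheses, smoothing_function=smooth))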

  3. Time-Frequency Learning Machines for Nonstationarity Detection Using Surrogates

    NASA Astrophysics Data System (ADS)

    Borgnat, Pierre; Flandrin, Patrick; Richard, Cédric; Ferrari, André; Amoud, Hassan; Honeine, Paul

    2012-03-01

    Time-frequency representations provide a powerful tool for nonstationary signal analysis and classification, supporting a wide range of applications [12]. As opposed to conventional Fourier analysis, these techniques reveal the evolution in time of the spectral content of signals. In Ref. [7,38], time-frequency analysis is used to test stationarity of any signal. The proposed method consists of a comparison between global and local time-frequency features. The originality is to make use of a family of stationary surrogate signals for defining the null hypothesis of stationarity and, based upon this information, to derive statistical tests. An open question remains, however, about how to choose relevant time-frequency features. Over the last decade, a number of new pattern recognition methods based on reproducing kernels have been introduced. These learning machines have gained popularity due to their conceptual simplicity and their outstanding performance [30]. Initiated by Vapnik’s support vector machines (SVM) [35], they now offer a wide class of supervised and unsupervised learning algorithms. In Ref. [17-19], the authors have shown how the most effective and innovative learning machines can be tuned to operate in the time-frequency domain. This chapter follows this line of research by taking advantage of learning machines to test and quantify stationarity. Based on one-class SVM, our approach uses the entire time-frequency representation and does not require arbitrary feature extraction. Applied to a set of surrogates, it provides the domain boundary that includes most of these stationarized signals. This allows us to test the stationarity of the signal under investigation. This chapter is organized as follows. In Section 22.2, we introduce the surrogate data method to generate stationarized signals, namely, the null hypothesis of stationarity. The concept of time-frequency learning machines is presented in Section 22.3, and applied to one-class SVM in order to derive a stationarity test in Section 22.4. The relevance of the latter is illustrated by simulation results in Section 22.5.
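
    A heavily simplified sketch of the surrogate-based stationarity test: phase-randomized surrogates realize the stationary null hypothesis, a one-class SVM learns the domain of their time-frequency features, and the original signal is tested against that domain. The two summary features used here are an assumption; the chapter itself works with full time-frequency representations.

        # Surrogate data + one-class SVM as a stationarity test (illustrative).
        import numpy as np
        from scipy.signal import spectrogram
        from sklearn.svm import OneClassSVM

        def phase_randomize(x, rng):
            spec = np.fft.rfft(x)
            phases = rng.uniform(0.0, 2.0 * np.pi, spec.size)
            phases[0] = phases[-1] = 0.0          # keep DC/Nyquist bins real
            return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=x.size)

        def tf_features(x):
            f, t, S = spectrogram(x, fs=1024.0, nperseg=128)
            centroid = (f[:, None] * S).sum(axis=0) / S.sum(axis=0)
            return [centroid.mean(), centroid.std()]   # spectral-content drift

        rng = np.random.default_rng(1)
        time = np.linspace(0.0, 4.0, 4096)
        chirp = np.sin(2.0 * np.pi * (20.0 + 15.0 * time) * time)  # nonstationary

        null = np.array([tf_features(phase_randomize(chirp, rng))
                         for _ in range(50)])
        test = OneClassSVM(nu=0.05, gamma="scale").fit(null)
        print(test.predict([tf_features(chirp)]))   # -1 = outside stationary null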

  4. A Pulsed Thermographic Imaging System for Detection and Identification of Cotton Foreign Matter

    PubMed Central

    Kuzy, Jesse; Li, Changying

    2017-01-01

    Detection of foreign matter in cleaned cotton is instrumental to accurately grading cotton quality, which in turn impacts the marketability of the cotton. Current grading systems return estimates of the amount of foreign matter present, but provide no information about the identity of the contaminants. This paper explores the use of pulsed thermographic analysis to detect and identify cotton foreign matter. The design and implementation of a pulsed thermographic analysis system is described. A sample set of 240 foreign matter and cotton lint samples were collected. Hand-crafted waveform features and frequency-domain features were extracted and analyzed for statistical significance. Classification was performed on these features using linear discriminant analysis and support vector machines. Using waveform features and support vector machine classifiers, detection of cotton foreign matter was performed with 99.17% accuracy. Using frequency-domain features and linear discriminant analysis, identification was performed with 90.00% accuracy. These results demonstrate that pulsed thermographic imaging analysis produces data which is of significant utility for the detection and identification of cotton foreign matter. PMID:28273848

  5. Effect of Thermal and Chemical Treatment on the Microstructural, Mechanical and Machining Performance of W319 Al-Si-Cu Cast Alloy Engine Blocks and Directionally Solidified Machinability Test Blocks

    NASA Astrophysics Data System (ADS)

    Szablewski, Daniel

    The research presented in this work is focused on making a link between casting microstructural, mechanical and machining properties for 319 Al-Si sand cast components. In order to achieve this, a unique Machinability Test Block (MTB) is designed to simulate the Nemak V6 Al-Si engine block solidification behavior. This MTB is then utilized to cast structures with Mg-based in-situ nano-alumina particle master alloy additions, as well as independent in-situ Mg and Sr additions. The Universal Metallurgical Simulator and Analyzer (UMSA) Technology Platform is utilized for characterization of each cast structure at different Secondary Dendrite Arm Spacing (SDAS) levels. The rapid quench method and Jominy testing are used to assess the capability of the nano-alumina master alloy to modify the microstructure at different SDAS levels. Mechanical property assessment of the MTB is done at different SDAS levels on cast structures with the master alloy additions described above. Weibull and Quality Index statistical analysis tools are then utilized to assess the mechanical properties. The MTB is also used to study single pass high speed face milling and bi-metallic cutting operations where the Al-Si hypoeutectic structure is combined with hypereutectoid Al-Si liners and cast iron cylinder liners. These studies are utilized to aid the implementation of Al-Si liners into the Nemak V6 engine block and bi-metallic cutting of the head decks. Machining behavior is also quantified for the investigated microstructures, and the Silicon Modification Level (SiML) is utilized for microstructural analysis as it relates to the machining behavior.

  6. Constructing and validating readability models: the method of integrating multilevel linguistic features with machine learning.

    PubMed

    Sung, Yao-Ting; Chen, Ju-Ling; Cha, Ji-Her; Tseng, Hou-Chiang; Chang, Tao-Hsing; Chang, Kuo-En

    2015-06-01

    Multilevel linguistic features have been proposed for discourse analysis, but there have been few applications of multilevel linguistic features to readability models and also few validations of such models. Most traditional readability formulae are based on generalized linear models (GLMs; e.g., discriminant analysis and multiple regression), but these models have to comply with certain statistical assumptions about data properties and include all of the data in formulae construction without pruning the outliers in advance. The use of such readability formulae tends to produce a low text classification accuracy, while using a support vector machine (SVM) in machine learning can enhance the classification outcome. The present study constructed readability models by integrating multilevel linguistic features with SVM, which is more appropriate for text classification. Taking the Chinese language as an example, this study developed 31 linguistic features as the predicting variables at the word, semantic, syntax, and cohesion levels, with grade levels of texts as the criterion variable. The study compared four types of readability models by integrating unilevel and multilevel linguistic features with GLMs and an SVM. The results indicate that adopting a multilevel approach in readability analysis provides a better representation of the complexities of both texts and the reading comprehension process.

  7. Spectral feature extraction of EEG signals and pattern recognition during mental tasks of 2-D cursor movements for BCI using SVM and ANN.

    PubMed

    Bascil, M Serdar; Tesneli, Ahmet Y; Temurtas, Feyzullah

    2016-09-01

    A brain computer interface (BCI) is a new way of communication between man and machine. It identifies mental task patterns stored in the electroencephalogram (EEG): it extracts brain electrical activities recorded by EEG and transforms them into machine control commands. The main goal of BCI is to make assistive environmental devices, such as computers, available to paralyzed people and to make their lives easier. This study deals with feature extraction and mental task pattern recognition for 2-D cursor control from EEG as an offline analysis approach. The hemispherical power density changes are computed and compared on alpha-beta frequency bands with only mental imagination of cursor movements. First, power spectral density (PSD) features of the EEG signals are extracted and the high-dimensional data are reduced by principal component analysis (PCA) and independent component analysis (ICA), which are statistical algorithms. In the last stage, all features are classified with two types of support vector machine (SVM), linear and least squares (LS-SVM), and three different artificial neural network (ANN) structures, learning vector quantization (LVQ), multilayer neural network (MLNN) and probabilistic neural network (PNN), and mental task patterns are successfully identified via the k-fold cross-validation technique.
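
    A minimal sketch of the described pipeline on synthetic "EEG": Welch power spectral density as features, PCA for dimensionality reduction, then a linear SVM. Channel counts, sampling rate, the injected alpha burst, and labels are illustrative assumptions.

        # PSD feature extraction + PCA + SVM on synthetic single-channel trials.
        import numpy as np
        from scipy.signal import welch
        from sklearn.pipeline import make_pipeline
        from sklearn.decomposition import PCA
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        trials = rng.normal(size=(60, 512))      # 60 single-channel trials
        y = np.repeat([0, 1], 30)                # two imagined cursor movements
        alpha = np.sin(2 * np.pi * 10 * np.arange(512) / 256)  # 10 Hz component
        trials[y == 1] += 0.7 * alpha            # class 1 carries extra alpha power

        _, psd = welch(trials, fs=256, nperseg=256, axis=-1)
        clf = make_pipeline(PCA(n_components=8), SVC(kernel="linear"))
        print(cross_val_score(clf, psd, y, cv=5).mean())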

  8. Machine learning of swimming data via wisdom of crowd and regression analysis.

    PubMed

    Xie, Jiang; Xu, Junfu; Nie, Celine; Nie, Qing

    2017-04-01

    Every performance by a registered USA swimmer in an officially sanctioned meet is recorded in an online database with times dating back to 1980. For the first time, statistical analysis and machine learning methods are systematically applied to 4,022,631 swim records. In this study, we investigate performance features for all strokes as a function of age and gender. The variances in performance of males and females for different ages and strokes were studied, and the correlations of performances for different ages were estimated using the Pearson correlation. Regression analyses show the performance trends for both males and females at different ages and suggest critical ages for peak training. Moreover, we assess twelve popular machine learning methods for predicting or classifying swimmer performance. Each method exhibited different strengths or weaknesses in different cases, indicating that no one method could predict well for all strokes. To address this problem, we propose a new method that combines multiple inference methods to derive a Wisdom of Crowd Classifier (WoCC). Our simulation experiments demonstrate that the WoCC is a consistent method with better overall prediction accuracy. Our study reveals several new age-dependent trends in swimming and provides an accurate method for classifying and predicting swimming times.
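
    The "wisdom of crowd" idea, combining several classifiers so that no single method's weakness dominates, has a minimal analogue in sklearn's VotingClassifier; the member models below are assumptions, not the paper's twelve methods.

        # Soft-voting ensemble as a simple "wisdom of crowd" classifier.
        from sklearn.ensemble import VotingClassifier, RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=400, n_features=8, random_state=0)
        crowd = VotingClassifier(
            estimators=[("lr", LogisticRegression(max_iter=1000)),
                        ("rf", RandomForestClassifier(n_estimators=100)),
                        ("knn", KNeighborsClassifier())],
            voting="soft",               # average predicted class probabilities
        )
        print(cross_val_score(crowd, X, y, cv=5).mean())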

  9. Diagnosis by Volatile Organic Compounds in Exhaled Breath from Lung Cancer Patients Using Support Vector Machine Algorithm

    PubMed Central

    Sakumura, Yuichi; Koyama, Yutaro; Tokutake, Hiroaki; Hida, Toyoaki; Sato, Kazuo; Itoh, Toshio; Akamatsu, Takafumi; Shin, Woosuck

    2017-01-01

    Monitoring exhaled breath is a very attractive, noninvasive screening technique for early diagnosis of diseases, especially lung cancer. However, the technique provides insufficient accuracy because the exhaled air has many crucial volatile organic compounds (VOCs) at very low concentrations (ppb level). We analyzed the breath exhaled by lung cancer patients and healthy subjects (controls) using gas chromatography/mass spectrometry (GC/MS), and performed a subsequent statistical analysis to diagnose lung cancer based on the combination of multiple lung cancer-related VOCs. We detected 68 VOCs as marker species using GC/MS analysis. We reduced the number of VOCs and used support vector machine (SVM) algorithm to classify the samples. We observed that a combination of five VOCs (CHN, methanol, CH3CN, isoprene, 1-propanol) is sufficient for 89.0% screening accuracy, and hence, it can be used for the design and development of a desktop GC-sensor analysis system for lung cancer. PMID:28165388

  10. Diagnosis by Volatile Organic Compounds in Exhaled Breath from Lung Cancer Patients Using Support Vector Machine Algorithm.

    PubMed

    Sakumura, Yuichi; Koyama, Yutaro; Tokutake, Hiroaki; Hida, Toyoaki; Sato, Kazuo; Itoh, Toshio; Akamatsu, Takafumi; Shin, Woosuck

    2017-02-04

    Monitoring exhaled breath is a very attractive, noninvasive screening technique for early diagnosis of diseases, especially lung cancer. However, the technique provides insufficient accuracy because the exhaled air has many crucial volatile organic compounds (VOCs) at very low concentrations (ppb level). We analyzed the breath exhaled by lung cancer patients and healthy subjects (controls) using gas chromatography/mass spectrometry (GC/MS), and performed a subsequent statistical analysis to diagnose lung cancer based on the combination of multiple lung cancer-related VOCs. We detected 68 VOCs as marker species using GC/MS analysis. We reduced the number of VOCs and used support vector machine (SVM) algorithm to classify the samples. We observed that a combination of five VOCs (CHN, methanol, CH₃CN, isoprene, 1-propanol) is sufficient for 89.0% screening accuracy, and hence, it can be used for the design and development of a desktop GC-sensor analysis system for lung cancer.

  11. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    PubMed

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2017-06-14

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider the problem of predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.

  12. Clinical use and misuse of automated semen analysis.

    PubMed

    Sherins, R J

    1991-01-01

    During the past six years, there has been an explosion of technology which allows automated machine-vision for sperm analysis. CASA clearly provides an opportunity for objective, systematic assessment of sperm motion. But there are many caveats in using this type of equipment. CASA requires a disciplined and standardized approach to semen collection, specimen preparation, machine settings, calibration and avoidance of sampling bias. Potential sources of error can be minimized. Unfortunately, the rapid commercialization of this technology preceded detailed statistical analysis of such data to allow equally rapid comparisons of data between different CASA machines and among different laboratories. Thus, it is now imperative that we standardize use of this technology and obtain more detailed biological insights into sperm motion parameters in semen and after capacitation before we empirically employ CASA for studies of fertility prediction. In the basic science arena, CASA technology will likely evolve to provide new algorithms for accurate sperm motion analysis and give us an opportunity to address the biophysics of sperm movement. In the clinical arena, CASA instruments provide the opportunity to share and compare sperm motion data among laboratories by virtue of its objectivity, assuming standardized conditions of utilization. Identification of men with specific sperm motion disorders is certain, but the biological relevance of motility dysfunction to actual fertilization remains uncertain and surely the subject for further study.

  13. Machine Learning–Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis

    PubMed Central

    Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng

    2014-01-01

    Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA) and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used a ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing–based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress–related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes. PMID:24520154

  14. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

    Architectures of planetary systems are observable snapshots in time that can indicate the formation and dynamic evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, then Keplerian motion is defined by P^2 = a^3, where P is the orbital period in units of years and a is the semi-major axis of the orbit in units of Astronomical Units (AU). Keplerian motion holds on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems based on stellar age, to determine whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods, such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of closed-form probability distributions are learned in order to detect the outliers. Linear models use regression-analysis-based techniques for detecting outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we use unsupervised learning algorithms with only the proximity-based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is whether the ratio of planetary mass to stellar mass is less than 0.001. In this work, we present our statistical analysis of the outliers thus detected.
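
    A worked check of the quoted Keplerian relation and the paper's mass-ratio validity criterion; in solar units the relation is P^2 = a^3 / M_star (P in years, a in AU). The catalogue values below are illustrative.

        # Kepler's third law residuals and the mass-ratio validity check.
        import numpy as np

        def kepler_residual(period_yr, a_au, mstar_msun=1.0):
            """log10 deviation from P^2 = a^3 / M_star; ~0 if Keplerian."""
            return np.log10(period_yr**2 * mstar_msun / a_au**3)

        print(kepler_residual(1.0, 1.0))      # Earth: 0.0, exactly Keplerian
        print(kepler_residual(11.86, 5.20))   # Jupiter: residual ~0
        print(kepler_residual(10.0, 1.0))     # misfit entry: residual 2.0

        # Validity check on flagged outliers: planet-to-star mass ratio < 0.001
        # (1 Jupiter mass is roughly 0.000955 solar masses).
        print(1.0 * 0.000955 / 1.0 < 0.001)   # a Jupiter analogue passes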

  15. ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Antcheva, I.; /CERN; Ballintijn, M.

    2009-01-01

    ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece in these analysis tools is the histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.

  16. Machine Learning-Augmented Propensity Score-Adjusted Multilevel Mixed Effects Panel Analysis of Hands-On Cooking and Nutrition Education versus Traditional Curriculum for Medical Students as Preventive Cardiology: Multisite Cohort Study of 3,248 Trainees over 5 Years

    PubMed Central

    Dart, Lyn; Vanbeber, Anne; Smith-Barbaro, Peggy; Costilla, Vanessa; Samuel, Charlotte; Terregino, Carol A.; Abali, Emine Ercikan; Dollinger, Beth; Baumgartner, Nicole; Kramer, Nicholas; Seelochan, Alex; Taher, Sabira; Deutchman, Mark; Evans, Meredith; Ellis, Robert B.; Oyola, Sonia; Maker-Clark, Geeta; Budnick, Isadore; Tran, David; DeValle, Nicole; Shepard, Rachel; Chow, Erika; Petrin, Christine; Razavi, Alexander; McGowan, Casey; Grant, Austin; Bird, Mackenzie; Carry, Connor; McGowan, Glynis; McCullough, Colleen; Berman, Casey M.; Dotson, Kerri; Sarris, Leah; Harlan, Timothy S.; Co-investigators, on behalf of the CHOP

    2018-01-01

    Background Cardiovascular disease (CVD) annually claims more lives and costs more dollars than any other disease globally amid widening health disparities, despite the known significant reductions in this burden by low cost dietary changes. The world's first medical school-based teaching kitchen therefore launched CHOP-Medical Students as the largest known multisite cohort study of hands-on cooking and nutrition education versus traditional curriculum for medical students. Methods This analysis provides a novel integration of artificial intelligence-based machine learning (ML) with causal inference statistics. 43 ML automated algorithms were tested, with the top performer compared to triply robust propensity score-adjusted multilevel mixed effects regression panel analysis of longitudinal data. Inverse-variance weighted fixed effects meta-analysis pooled the individual estimates for competencies. Results 3,248 unique medical trainees met study criteria from 20 medical schools nationally from August 1, 2012, to June 26, 2017, generating 4,026 completed validated surveys. ML analysis produced similar results to the causal inference statistics based on root mean squared error and accuracy. Hands-on cooking and nutrition education compared to traditional medical school curriculum significantly improved student competencies (OR 2.14, 95% CI 2.00–2.28, p < 0.001) and MedDiet adherence (OR 1.40, 95% CI 1.07–1.84, p = 0.015), while reducing trainees' soft drink consumption (OR 0.56, 95% CI 0.37–0.85, p = 0.007). Overall improved competencies were demonstrated from the initial study site through the scale-up of the intervention to 10 sites nationally (p < 0.001). Discussion This study provides the first machine learning-augmented causal inference analysis of a multisite cohort showing hands-on cooking and nutrition education for medical trainees improves their competencies counseling patients on nutrition, while improving students' own diets. This study suggests that the public health and medical sectors can unite population health management and precision medicine for a sustainable model of next-generation health systems providing effective, equitable, accessible care beginning with reversing the CVD epidemic. PMID:29850526

  17. Machine Learning-Augmented Propensity Score-Adjusted Multilevel Mixed Effects Panel Analysis of Hands-On Cooking and Nutrition Education versus Traditional Curriculum for Medical Students as Preventive Cardiology: Multisite Cohort Study of 3,248 Trainees over 5 Years.

    PubMed

    Monlezun, Dominique J; Dart, Lyn; Vanbeber, Anne; Smith-Barbaro, Peggy; Costilla, Vanessa; Samuel, Charlotte; Terregino, Carol A; Abali, Emine Ercikan; Dollinger, Beth; Baumgartner, Nicole; Kramer, Nicholas; Seelochan, Alex; Taher, Sabira; Deutchman, Mark; Evans, Meredith; Ellis, Robert B; Oyola, Sonia; Maker-Clark, Geeta; Dreibelbis, Tomi; Budnick, Isadore; Tran, David; DeValle, Nicole; Shepard, Rachel; Chow, Erika; Petrin, Christine; Razavi, Alexander; McGowan, Casey; Grant, Austin; Bird, Mackenzie; Carry, Connor; McGowan, Glynis; McCullough, Colleen; Berman, Casey M; Dotson, Kerri; Niu, Tianhua; Sarris, Leah; Harlan, Timothy S; Co-Investigators, On Behalf Of The Chop

    2018-01-01

    Cardiovascular disease (CVD) annually claims more lives and costs more dollars than any other disease globally amid widening health disparities, despite the known significant reductions in this burden by low cost dietary changes. The world's first medical school-based teaching kitchen therefore launched CHOP-Medical Students as the largest known multisite cohort study of hands-on cooking and nutrition education versus traditional curriculum for medical students. This analysis provides a novel integration of artificial intelligence-based machine learning (ML) with causal inference statistics. 43 ML automated algorithms were tested, with the top performer compared to triply robust propensity score-adjusted multilevel mixed effects regression panel analysis of longitudinal data. Inverse-variance weighted fixed effects meta-analysis pooled the individual estimates for competencies. 3,248 unique medical trainees met study criteria from 20 medical schools nationally from August 1, 2012, to June 26, 2017, generating 4,026 completed validated surveys. ML analysis produced similar results to the causal inference statistics based on root mean squared error and accuracy. Hands-on cooking and nutrition education compared to traditional medical school curriculum significantly improved student competencies (OR 2.14, 95% CI 2.00-2.28, p < 0.001) and MedDiet adherence (OR 1.40, 95% CI 1.07-1.84, p = 0.015), while reducing trainees' soft drink consumption (OR 0.56, 95% CI 0.37-0.85, p = 0.007). Overall improved competencies were demonstrated from the initial study site through the scale-up of the intervention to 10 sites nationally (p < 0.001). This study provides the first machine learning-augmented causal inference analysis of a multisite cohort showing hands-on cooking and nutrition education for medical trainees improves their competencies counseling patients on nutrition, while improving students' own diets. This study suggests that the public health and medical sectors can unite population health management and precision medicine for a sustainable model of next-generation health systems providing effective, equitable, accessible care beginning with reversing the CVD epidemic.
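
    A worked example of the inverse-variance weighted fixed-effects pooling named in these two records; the per-estimate numbers are invented for illustration, not the study's values.

        # Fixed-effects meta-analytic pooling of log odds ratios.
        import numpy as np

        log_or = np.array([0.76, 0.64, 0.83])   # individual log odds ratios
        se = np.array([0.10, 0.15, 0.12])       # their standard errors

        w = 1.0 / se**2                         # inverse-variance weights
        pooled = np.sum(w * log_or) / np.sum(w)
        pooled_se = np.sqrt(1.0 / np.sum(w))
        print(np.exp(pooled),                      # pooled odds ratio
              np.exp(pooled - 1.96 * pooled_se),   # 95% CI lower bound
              np.exp(pooled + 1.96 * pooled_se))   # 95% CI upper bound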

  18. Analysis of miRNA expression profile based on SVM algorithm

    NASA Astrophysics Data System (ADS)

    Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian

    2018-05-01

    Based on a miRNA expression profile data set, a new data mining algorithm, tSVM-KNN (t statistic with support vector machine - k nearest neighbor), is proposed. The idea of the algorithm is as follows: first, feature selection of the data set is carried out by a unified measurement method; second, the SVM-KNN algorithm, which combines a support vector machine (SVM) with a k-nearest neighbor (KNN) classifier, is used as the classifier. Simulation results show that the SVM-KNN algorithm has better classification ability than SVM or KNN alone. In terms of both the number of miRNA "tags" required and recognition accuracy, the tSVM-KNN algorithm needs only 5 miRNAs to obtain 96.08% classification accuracy; compared with similar algorithms, tSVM-KNN has obvious advantages.
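
    A hedged sketch of the tSVM-KNN idea as described: rank features by a two-sample t statistic, keep the top few, then combine SVM and KNN, here by letting KNN resolve samples that fall near the SVM decision boundary. The margin threshold, k, and the hybrid rule are assumptions about how the combination might work, not the authors' exact algorithm.

        # t-statistic feature selection followed by an SVM/KNN hybrid.
        import numpy as np
        from scipy.stats import ttest_ind
        from sklearn.svm import SVC
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.datasets import make_classification

        X, y = make_classification(n_samples=200, n_features=50,
                                   n_informative=5, random_state=0)
        t, _ = ttest_ind(X[y == 0], X[y == 1], axis=0)
        top = np.argsort(-np.abs(t))[:5]       # 5 highest-|t| features ("tags")
        Xs = X[:, top]

        svm = SVC(kernel="linear").fit(Xs, y)
        knn = KNeighborsClassifier(n_neighbors=5).fit(Xs, y)

        margin = svm.decision_function(Xs)
        # Confident SVM calls stand; borderline samples fall back to KNN.
        pred = np.where(np.abs(margin) > 0.5, svm.predict(Xs), knn.predict(Xs))
        print((pred == y).mean())   # training-set accuracy, for illustration only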

  19. Issues on machine learning for prediction of classes among molecular sequences of plants and animals

    NASA Astrophysics Data System (ADS)

    Stehlik, Milan; Pant, Bhasker; Pant, Kumud; Pardasani, K. R.

    2012-09-01

    Nowadays major laboratories of the world are turning towards in-silico experimentation due to its ease, reproducibility and accuracy. The ethical issues that arise in wet lab experimentation are also minimal in in-silico experimentation. But before we turn fully towards dry lab simulations, it is necessary to understand the discrepancies and bottlenecks involved in dry lab experimentation. Before reporting any result obtained using dry lab simulations, it is necessary to perform an in-depth statistical analysis of the data. Keeping this in mind, we present a collaborative effort to correlate the findings and results of various machine learning algorithms and to check underlying regressions and mutual dependencies, so as to develop an optimal classifier and predictors.

  20. Enhancing Research in Networking & System Security, and Forensics, in Puerto Rico

    DTIC Science & Technology

    2015-03-03

    The researcher's work revolves around using Cognitive Systems, which are machines that can think, listen and see in order to help the disabled ...Subsequence. The implementation is being conducted using the R language because of its statistical and analysis abilities. Because it works using a command line...Technology. 14-AUG-13. Eduardo Melendez. FROM RANDOM EMBEDDING TECHNIQUES TO ENTROPY USING IMAGEPOINT ADJACENT SHADE VALUES, 12th Annual

  1. Preliminary Evaluation of an Aviation Safety Thesaurus' Utility for Enhancing Automated Processing of Incident Reports

    NASA Technical Reports Server (NTRS)

    Barrientos, Francesca; Castle, Joseph; McIntosh, Dawn; Srivastava, Ashok

    2007-01-01

    This document presents a preliminary evaluation of the utility of the FAA Safety Analytics Thesaurus (SAT) in enhancing automated document processing applications under development at NASA Ames Research Center (ARC). Current development efforts at ARC are described, including overviews of the statistical machine learning techniques that have been investigated. An analysis of opportunities for applying thesaurus knowledge to improving algorithm performance is then presented.

  2. In-Depth Characterization and Validation of Human Urine Metabolomes Reveal Novel Metabolic Signatures of Lower Urinary Tract Symptoms

    NASA Astrophysics Data System (ADS)

    Hao, Ling; Greer, Tyler; Page, David; Shi, Yatao; Vezina, Chad M.; Macoska, Jill A.; Marker, Paul C.; Bjorling, Dale E.; Bushman, Wade; Ricke, William A.; Li, Lingjun

    2016-08-01

    Lower urinary tract symptoms (LUTS) are a range of irritative or obstructive symptoms that commonly afflict the aging population. The diagnosis is mostly based on patient-reported symptoms, and current medication often fails to completely eliminate these symptoms. There is a pressing need for objective non-invasive approaches to measure symptoms and understand disease mechanisms. We developed an in-depth workflow combining urine metabolomics analysis and machine learning bioinformatics to characterize metabolic alterations and support objective diagnosis of LUTS. Machine learning feature selection and statistical tests were combined to identify candidate biomarkers, which were statistically validated with leave-one-patient-out cross-validation and absolutely quantified by selected reaction monitoring assay. Receiver operating characteristic analysis showed highly accurate prediction power of candidate biomarkers to stratify patients into diseased or non-diseased categories. The key metabolites and pathways may be correlated with smooth muscle tone changes, increased collagen content, and inflammation, which have been identified as potential contributors to urinary dysfunction in humans and rodents. Periurethral tissue staining revealed a significant increase in collagen content and tissue stiffness in men with LUTS. Together, our study provides the first characterization and validation of LUTS urinary metabolites and pathways to support the future development of a urine-based diagnostic test for LUTS.
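
    A minimal sketch of the validation scheme described above: leave-one-patient-out cross-validation of a biomarker classifier, scored by the area under the ROC curve. The features, labels, and logistic model are synthetic stand-ins.

        # Leave-one-patient-out cross-validated ROC AUC.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(2)
        X = rng.normal(size=(60, 4))      # 4 candidate metabolite features
        y = np.repeat([0, 1], 30)         # control vs. LUTS (invented labels)
        X[y == 1, 0] += 1.0               # make one feature weakly diagnostic
        patients = np.arange(60)          # one sample per patient here

        proba = cross_val_predict(LogisticRegression(), X, y, groups=patients,
                                  cv=LeaveOneGroupOut(), method="predict_proba")
        print(roc_auc_score(y, proba[:, 1]))   # held-out discrimination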

  3. Geospatial and machine learning techniques for wicked social science problems: analysis of crash severity on a regional highway corridor

    NASA Astrophysics Data System (ADS)

    Effati, Meysam; Thill, Jean-Claude; Shabani, Shahin

    2015-04-01

    The contention of this paper is that many social science research problems are too "wicked" to be suitably studied using conventional statistical and regression-based methods of data analysis. This paper argues that an integrated geospatial approach based on methods of machine learning is well suited to this purpose. Recognizing the intrinsic wickedness of traffic safety issues, such approach is used to unravel the complexity of traffic crash severity on highway corridors as an example of such problems. The support vector machine (SVM) and coactive neuro-fuzzy inference system (CANFIS) algorithms are tested as inferential engines to predict crash severity and uncover spatial and non-spatial factors that systematically relate to crash severity, while a sensitivity analysis is conducted to determine the relative influence of crash severity factors. Different specifications of the two methods are implemented, trained, and evaluated against crash events recorded over a 4-year period on a regional highway corridor in Northern Iran. Overall, the SVM model outperforms CANFIS by a notable margin. The combined use of spatial analysis and artificial intelligence is effective at identifying leading factors of crash severity, while explicitly accounting for spatial dependence and spatial heterogeneity effects. Thanks to the demonstrated effectiveness of a sensitivity analysis, this approach produces comprehensive results that are consistent with existing traffic safety theories and supports the prioritization of effective safety measures that are geographically targeted and behaviorally sound on regional highway corridors.

  4. A Cyber-Attack Detection Model Based on Multivariate Analyses

    NASA Astrophysics Data System (ADS)

    Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

    In the present paper, we propose a novel cyber-attack detection model based on applying two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and the cluster analysis method. We quantify the observed qualitative audit event sequence via the quantification method IV, and group similar audit event sequences based on the cluster analysis. It is shown in simulation experiments that our model can improve cyber-attack detection accuracy in some realistic cases where both normal and attack activities are intermingled.

  5. Using Perturbed Physics Ensembles and Machine Learning to Select Parameters for Reducing Regional Biases in a Global Climate Model

    NASA Astrophysics Data System (ADS)

    Li, S.; Rupp, D. E.; Hawkins, L.; Mote, P.; McNeall, D. J.; Sarah, S.; Wallom, D.; Betts, R. A.

    2017-12-01

    This study investigates the potential to reduce known summer hot/dry biases over the Pacific Northwest in the UK Met Office's atmospheric model (HadAM3P) by simultaneously varying multiple model parameters. The bias-reduction process proceeds through a series of steps: 1) generation of a perturbed physics ensemble (PPE) through the volunteer computing network weather@home; 2) using machine learning to train "cheap" and fast statistical emulators of the climate model, in order to rule out regions of parameter space that lead to model variants failing observational constraints, where the constraints (e.g., top-of-atmosphere energy flux, magnitude of the annual temperature cycle, summer/winter temperature and precipitation) are introduced sequentially; 3) designing a new PPE by "pre-filtering" with the emulator results. Steps 1) through 3) are repeated until results are considered satisfactory (3 times in our case). The process includes a sensitivity analysis to find dominant parameters for various model output metrics, which reduces the number of parameters to be perturbed with each new PPE. Relative to observational uncertainty, we achieve regional improvements without introducing large biases in other parts of the globe. Our results illustrate the potential of using machine learning to train cheap and fast statistical emulators of a climate model, in combination with PPEs, for systematic model improvement.
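
    A minimal sketch of step 2, the emulator-based pre-filtering: train a cheap statistical emulator on PPE results, then use it to rule out implausible parameter-space regions. The random forest emulator, the parameter count, and the constraint values are illustrative assumptions, not details of HadAM3P or weather@home; scikit-learn is assumed.

    # Train a cheap emulator on PPE members, then pre-filter a dense sample of
    # parameter space against an observational constraint -- synthetic example.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    params = rng.uniform(0, 1, size=(500, 4))        # 500 PPE members, 4 parameters
    toa_flux = params @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(0, 0.1, 500)

    emulator = RandomForestRegressor(n_estimators=200).fit(params, toa_flux)

    candidates = rng.uniform(0, 1, size=(100000, 4)) # dense sample of parameter space
    pred = emulator.predict(candidates)
    keep = candidates[np.abs(pred - 0.8) < 0.2]      # retain plausible members only
    print(f"{len(keep)} of {len(candidates)} candidates pass the constraint")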

  6. Machine Learning in the Presence of an Adversary: Attacking and Defending the SpamBayes Spam Filter

    DTIC Science & Technology

    2008-05-20

    Machine learning techniques are often used for decision making in security critical applications such as intrusion detection and spam filtering...filter. The defenses shown in this thesis are able to work against the attacks developed against SpamBayes and are sufficiently generic to be easily extended into other statistical machine learning algorithms.

  7. Testing meta tagger

    DTIC Science & Technology

    2017-12-21

    rank, and computer vision. Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on...Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.[1] Arthur Samuel...an American pioneer in the field of computer gaming and artificial intelligence, coined the term "Machine Learning" in 1959 while at IBM[2]. Evolved

  8. Machine learning Z2 quantum spin liquids with quasiparticle statistics

    NASA Astrophysics Data System (ADS)

    Zhang, Yi; Melko, Roger G.; Kim, Eun-Ah

    2017-12-01

    After decades of progress and effort, obtaining a phase diagram for a strongly correlated topological system still remains a challenge. Although in principle one could turn to Wilson loops and long-range entanglement, evaluating these nonlocal observables at many points in phase space can be prohibitively costly. With growing excitement over topological quantum computation comes the need for an efficient approach for obtaining topological phase diagrams. Here we turn to machine learning using quantum loop topography (QLT), a notion we have recently introduced. Specifically, we propose a construction of QLT that is sensitive to quasiparticle statistics. We then use mutual statistics between the spinons and visons to detect a Z2 quantum spin liquid in a multiparameter phase space. We successfully obtain the quantum phase boundary between the topological and trivial phases using a simple feed-forward neural network. Furthermore, we demonstrate advantages of our approach for the evaluation of phase diagrams relating to speed and storage. Such statistics-based machine learning of topological phases opens new efficient routes to studying topological phase diagrams in strongly correlated systems.

  9. Impact of machining on the flexural fatigue strength of glass and polycrystalline CAD/CAM ceramics.

    PubMed

    Fraga, Sara; Amaral, Marina; Bottino, Marco Antônio; Valandro, Luiz Felipe; Kleverlaan, Cornelis Johannes; May, Liliana Gressler

    2017-11-01

    To assess the effect of machining on the flexural fatigue strength and surface roughness of different computer-aided design/computer-aided manufacturing (CAD/CAM) ceramics by comparing specimens that were only machined with specimens polished after machining. Disc-shaped specimens of yttria-stabilized polycrystalline tetragonal zirconia (Y-TZP) and of leucite- and lithium disilicate-based glass ceramics were prepared by CAD/CAM machining and divided into two groups: machining (M) and machining followed by polishing (MP). The surface roughness was measured, and the flexural fatigue strength was evaluated by the step-test method (n=20). The initial load and the load increment for each ceramic material were based on a monotonic test (n=5). A maximum of 10,000 cycles was applied in each load step, at 1.4Hz. Weibull probability statistics were used for the analysis of the flexural fatigue strength, and the Mann-Whitney test (α=5%) was used to compare roughness between the M and MP conditions. Machining resulted in lower characteristic flexural fatigue strength than machining followed by polishing. The greatest reduction in flexural fatigue strength from MP to M was observed for Y-TZP (40%; M=536.48MPa; MP=894.50MPa), followed by lithium disilicate (33%; M=187.71MPa; MP=278.93MPa) and leucite (29%; M=72.61MPa; MP=102.55MPa). Significantly higher roughness values (Ra) were observed for M compared to MP (leucite: M=1.59μm and MP=0.08μm; lithium disilicate: M=1.84μm and MP=0.13μm; Y-TZP: M=1.79μm and MP=0.18μm). Machining negatively affected the flexural fatigue strength of CAD/CAM ceramics, indicating that machining of partially or fully sintered ceramics is deleterious to fatigue strength.
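
    For reference, a fit of the two-parameter Weibull distribution to fatigue-strength data of the kind analyzed above can be sketched as follows; the strength values are invented, and SciPy is assumed.

    # Fitting a two-parameter Weibull distribution to fatigue-strength data,
    # as used in Weibull analyses of ceramic strength (values are illustrative).
    import numpy as np
    from scipy.stats import weibull_min

    strengths = np.array([480, 510, 525, 540, 555, 560, 575, 590, 610, 640.0])  # MPa
    shape, loc, scale = weibull_min.fit(strengths, floc=0)  # fix location at zero
    print(f"Weibull modulus m = {shape:.2f}, characteristic strength = {scale:.1f} MPa")
    # Probability of failure at a given stress level:
    print("P(failure at 500 MPa) =", weibull_min.cdf(500, shape, loc=0, scale=scale))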

  10. Statistical complex fatigue data for SAE 4340 steel and its use in design by reliability

    NASA Technical Reports Server (NTRS)

    Kececioglu, D.; Smith, J. L.

    1970-01-01

    A brief description of the complex fatigue machines used in the test program is presented. The data generated from these machines are given and discussed. Two methods of obtaining strength distributions from the data are also discussed. Then follows a discussion of the construction of statistical fatigue diagrams and their use in designing by reliability. Finally, some of the problems encountered in the test equipment and a corrective modification are presented.

  11. Surface Integrity of Inconel 718 by Ball Burnishing

    NASA Astrophysics Data System (ADS)

    Sequera, A.; Fu, C. H.; Guo, Y. B.; Wei, X. T.

    2014-09-01

    Inconel 718 has wide applications in manufacturing mechanical components such as turbine blades, turbocharger rotors, and nuclear reactors. Since these components are subject to harsh environments involving high temperature, pressure, and corrosion, it is critical to improve their functionality to prevent catastrophic failure due to fatigue or corrosion. Ball burnishing, a low-plastic-deformation process, is a promising technique for enhancing surface integrity to increase component fatigue and corrosion resistance in service. This study focuses on the surface integrity of burnished Inconel 718. The effects of burnishing ball size and pressure on surface integrity factors such as surface topography, roughness, and hardness are investigated. The burnished surfaces are smoother than the as-machined ones. Surface hardness after burnishing is higher than that of the as-machined surfaces, but becomes stable above a certain burnishing pressure; this increase in hardness is confirmed by statistical analysis. There exists an optimal process space of ball size and burnishing pressure for surface finish.

  12. On the effect of subliminal priming on subjective perception of images: a machine learning approach.

    PubMed

    Kumar, Parmod; Mahmood, Faisal; Mohan, Dhanya Menoth; Wong, Ken; Agrawal, Abhishek; Elgendi, Mohamed; Shukla, Rohit; Dauwels, Justin; Chan, Alice H D

    2014-01-01

    The research presented in this article investigates the influence of subliminal prime words on people's judgment about images, through electroencephalograms (EEGs). In this cross-domain priming paradigm, the participants are asked to rate how much they like the stimulus images, on a 7-point Likert scale, after being subliminally exposed to masked lexical prime words, with EEG recorded simultaneously. Statistical analysis tools are used to analyze the effect of priming on behavior, and machine learning techniques to infer the primes from EEGs. The experiment reveals strong effects of subliminal priming on the participants' explicit rating of images. The subjective judgment affected by the priming produces visible changes in event-related potentials (ERPs); the results show larger ERP amplitudes for the negative primes compared with positive and neutral primes. In addition, support vector machine (SVM) based classifiers are proposed to infer the prime types from the average ERPs, which yields a classification rate of 70%.

  13. The effect of the use of a TNF-alpha inhibitor in hypothermic machine perfusion on kidney function after transplantation.

    PubMed

    Diuwe, Piotr; Domagala, Piotr; Durlik, Magdalena; Trzebicki, Janusz; Chmura, Andrzej; Kwiatkowski, Artur

    2017-08-01

    One of the most important problems in transplantation medicine is ischemia/reperfusion injury of the organs to be transplanted. The aim of the present study was to assess the effect of the tumor necrosis factor-alpha (TNF-alpha) inhibitor etanercept, added during hypothermic machine perfusion, on renal allograft function and organ perfusion. No statistically significant differences were found in the impact of the applied intervention on kidney machine perfusion, during which the average flow and vascular resistance were evaluated. There were no statistically significant differences in the occurrence of delayed graft function (DGF). Fewer events of functional DGF and acute rejection episodes were observed in patients who received a kidney from the etanercept-treated Group A than in patients who received a kidney from the control Group B; however, the difference was not statistically significant. In summary, no effect of treatment with etanercept, an inhibitor of TNF-alpha, during hypothermic machine perfusion on renal allograft survival or perfusion was detected in this study. However, treatment of the isolated organ may be important for the future of transplantation medicine.

  14. Improving Non-Destructive Concrete Strength Tests Using Support Vector Machines

    PubMed Central

    Shih, Yi-Fan; Wang, Yu-Ren; Lin, Kuo-Liang; Chen, Chin-Wen

    2015-01-01

    Non-destructive testing (NDT) methods are important alternatives when destructive tests are not feasible for examining in situ concrete properties without damaging the structure. The rebound hammer test and the ultrasonic pulse velocity test are two popular NDT methods for examining the properties of concrete. The rebound of the hammer depends on the hardness of the test specimen, and the ultrasonic pulse travelling speed is related to the density, uniformity, and homogeneity of the specimen. Both of these methods have been adopted to estimate concrete compressive strength. Statistical analysis has been implemented to establish the relationship between hammer rebound values/ultrasonic pulse velocities and concrete compressive strength; however, the estimated results can be unreliable. As a result, this research proposes an artificial intelligence model using support vector machines (SVMs) for the estimation. Data from 95 cylinder concrete samples were collected to develop and validate the model. The results show that combined NDT methods (also known as the SonReb method) yield better estimations than single NDT methods. The results also show that the SVM model is more accurate than the statistical regression model. PMID:28793627
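
    A sketch of the combined-NDT estimation idea: support vector regression mapping rebound number and ultrasonic pulse velocity to compressive strength. The data are synthetic and the hyperparameters illustrative; the paper's 95 cylinder samples are not reproduced, and scikit-learn is assumed.

    # SVR estimation of concrete compressive strength from combined NDT readings
    # (rebound number + ultrasonic pulse velocity), mirroring the SonReb idea.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    rebound = rng.uniform(20, 50, 95)
    upv = rng.uniform(3.5, 4.8, 95)                              # km/s
    strength = 0.9 * rebound + 12 * upv + rng.normal(0, 2, 95)   # MPa, synthetic

    X = np.column_stack([rebound, upv])
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100, epsilon=1.0))
    r2 = cross_val_score(model, X, strength, cv=5, scoring="r2")
    print("Mean cross-validated R^2:", r2.mean())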

  15. Optimization of hole generation in Ti/CFRP stacks

    NASA Astrophysics Data System (ADS)

    Ivanov, Y. N.; Pashkov, A. E.; Chashhin, N. S.

    2018-03-01

    The article aims to describe methods for improving surface quality and hole accuracy in Ti/CFRP stacks by optimizing cutting methods and drill geometry. The research is based on the fundamentals of machine building, probability theory, mathematical statistics, and theories of experiment planning and manufacturing process optimization. Statistical processing of the experiment data was carried out by means of Statistica 6 and Microsoft Excel 2010. Surface geometry in Ti stacks was analyzed using a Taylor Hobson Form Talysurf i200 Series Profilometer, and in CFRP stacks using a Bruker ContourGT-Kl Optical Microscope. Hole shapes and sizes were analyzed using a Carl Zeiss CONTURA G2 measuring machine, and temperatures in cutting zones were recorded with a FLIR SC7000 Series Infrared Camera. Models of multivariate analysis of variance were developed; they show the effects of drilling modes on the surface quality and accuracy of holes in Ti/CFRP stacks. The task of multicriteria drilling process optimization was solved, and optimal cutting technologies that improve performance were developed. Methods for assessing the effects of thermal tool and material expansion on the accuracy of holes in Ti/CFRP/Ti stacks were also developed.

  16. Bias-Free Chemically Diverse Test Sets from Machine Learning.

    PubMed

    Swann, Ellen T; Fernandez, Michael; Coote, Michelle L; Barnard, Amanda S

    2017-08-14

    Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist's intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques like archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply this same method to find a diverse subset of organic molecules to demonstrate how the methods can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.

  17. Information-Theoretic Performance Analysis of Sensor Networks via Markov Modeling of Time Series Data.

    PubMed

    Li, Yue; Jha, Devesh K; Ray, Asok; Wettergren, Thomas A

    2018-06-01

    This paper presents information-theoretic performance analysis of passive sensor networks for detection of moving targets. The proposed method falls largely under the category of data-level information fusion in sensor networks. To this end, a measure of information contribution for sensors is formulated in a symbolic dynamics framework. The network information state is approximately represented as the largest principal component of the time series collected across the network. To quantify each sensor's contribution for generation of the information content, Markov machine models as well as x-Markov (pronounced as cross-Markov) machine models, conditioned on the network information state, are constructed; the difference between the conditional entropies of these machines is then treated as an approximate measure of information contribution by the respective sensors. The x-Markov models represent the conditional temporal statistics given the network information state. The proposed method has been validated on experimental data collected from a local area network of passive sensors for target detection, where the statistical characteristics of environmental disturbances are similar to those of the target signal in the sense of time scale and texture. A distinctive feature of the proposed algorithm is that the network decisions are independent of the behavior and identity of the individual sensors, which is desirable from computational perspectives. Results are presented to demonstrate the proposed method's efficacy to correctly identify the presence of a target with very low false-alarm rates. The performance of the underlying algorithm is compared with that of a recent data-driven, feature-level information fusion algorithm. It is shown that the proposed algorithm outperforms the other algorithm.
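
    The entropy-difference measure can be approximated roughly as below: estimate the conditional entropy of a sensor's symbol sequence given its own past (Markov machine) and given the network information state (x-Markov machine), and take the difference. This is a loose sketch of the idea on synthetic symbol sequences, not the paper's exact symbolic-dynamics construction.

    # Approximate information contribution of a sensor as the difference between
    # Markov and cross-Markov conditional entropies (synthetic illustration).
    import numpy as np

    def conditional_entropy(x, cond, n_symbols):
        """Empirical H(x | cond) in bits, from joint symbol counts."""
        joint = np.zeros((n_symbols, n_symbols))
        for a, b in zip(cond, x):
            joint[a, b] += 1
        joint /= joint.sum()
        p_cond = joint.sum(axis=1, keepdims=True)   # marginal of the condition
        with np.errstate(divide="ignore", invalid="ignore"):
            return -np.nansum(joint * np.log2(joint / p_cond))

    rng = np.random.default_rng(3)
    n = 5000
    state = rng.integers(0, 4, n)                   # symbolized network information state
    sensor = (state + rng.integers(0, 2, n)) % 4    # sensor partly driven by the state

    h_markov = conditional_entropy(sensor[1:], sensor[:-1], 4)   # H(x_t | x_{t-1})
    h_xmarkov = conditional_entropy(sensor, state, 4)            # H(x_t | s_t)
    print("approximate information contribution:", h_markov - h_xmarkov, "bits")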

  18. Statistical Capability Study of a Helical Grinding Machine Producing Screw Rotors

    NASA Astrophysics Data System (ADS)

    Holmes, C. S.; Headley, M.; Hart, P. W.

    2017-08-01

    Screw compressors depend for their efficiency and reliability on the accuracy of the rotors, and therefore on the machinery used in their production. The machinery has evolved over more than half a century in response to customer demands for production accuracy, efficiency, and flexibility, and is now at a high level on all three criteria. Production equipment and processes must be capable of maintaining accuracy over a production run, and this must be assessed statistically under strictly controlled conditions. This paper gives numerical data from such a study of an innovative machine tool and shows that it is possible to meet the demanding statistical capability requirements.

  19. Operational planning using Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS)

    NASA Astrophysics Data System (ADS)

    O'Connor, Alison; Kirtman, Benjamin; Harrison, Scott; Gorman, Joe

    2016-05-01

    The US Navy faces several limitations when planning operations with regard to forecasting environmental conditions. Currently, mission analysis and planning tools rely heavily on short-term (less than a week) forecasts or long-term statistical climate products. However, newly available data in the form of weather forecast ensembles provide dynamical and statistical extended-range predictions that can be more accurate if ensemble members are combined correctly. Charles River Analytics is designing the Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS), which performs data fusion over extended-range multi-model ensembles, such as the North American Multi-Model Ensemble (NMME), to produce a unified forecast for several weeks to several seasons into the future. We evaluated thirty years of forecasts using machine learning to select predictions for an all-encompassing and superior forecast that can be used to inform the Navy's decision-planning process.

  20. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    PubMed Central

    Dipnall, Joanna F.

    2016-01-01

    Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin. PMID:26848571

  1. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
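
    A compact sketch of the three-step hybrid described above: chained-equations-style imputation, boosted regression to shortlist predictors, then traditional logistic regression on the shortlist. The survey weights, the 20 imputation sets, and the real biomarker panel are omitted; the data are synthetic, and scikit-learn plus statsmodels are assumed.

    # Hybrid variable selection: impute -> boost -> logistic regression (sketch).
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.ensemble import GradientBoostingClassifier
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    X_full = rng.normal(size=(800, 67))                 # 67 candidate biomarkers
    logit = 1.2 * X_full[:, 0] - 0.8 * X_full[:, 1]
    y = (rng.random(800) < 1 / (1 + np.exp(-logit))).astype(int)  # depression flag
    X = X_full.copy()
    X[rng.random(X.shape) < 0.05] = np.nan              # inject 5% missingness

    # Step 1: single chained-equations-style imputation (the study used 20 sets).
    X_imp = IterativeImputer(random_state=0).fit_transform(X)
    # Step 2: boosted regression ranks candidate biomarkers.
    gbm = GradientBoostingClassifier().fit(X_imp, y)
    top = np.argsort(gbm.feature_importances_)[::-1][:3]
    print("shortlisted biomarker columns:", top)
    # Step 3: traditional logistic regression on the shortlist, reported as ORs.
    res = sm.Logit(y, sm.add_constant(X_imp[:, top])).fit(disp=0)
    print("odds ratios:", np.exp(res.params[1:]))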

  2. Tablet Velocity Measurement and Prediction in the Pharmaceutical Film Coating Process.

    PubMed

    Suzuki, Yasuhiro; Yokohama, Chihiro; Minami, Hidemi; Terada, Katsuhide

    2016-01-01

    The purpose of this study was to measure tablet velocity in pan coating machines during the film coating process in order to understand the impact of batch size (laboratory to commercial scale), coating machine type (DRIACOATER, HICOATER® and AQUA COATER®) and manufacturing conditions on tablet velocity. We used a high-speed camera and particle image velocimetry to measure tablet velocity in the coating pans. It was observed that increasing batch size resulted in increased tablet velocity at the same rotation number because of differences in circumferential rotation speed. We also observed that an increase in the filling ratio of tablets resulted in increased tablet velocity for all coating machines. Statistical analysis of these measured values was used to build a predictive equation for tablet velocity, employing the filling ratio and rotation speed as parameters. The correlation coefficients between predicted and experimental values were more than 0.959 for each machine. Using the predictive equation to determine tablet velocities, the manufacturing conditions of previous products were reviewed, and it was found that the tablet velocities at commercial scale, where tablet chipping and breakage problems had occurred, were higher than those at pilot or laboratory scale.
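
    The predictive-equation step can be illustrated with an ordinary least-squares fit of velocity against filling ratio and rotation speed; all numbers below are invented, not the paper's measurements.

    # Least-squares fit of tablet velocity vs. filling ratio and rotation speed.
    import numpy as np

    fill_ratio = np.array([0.10, 0.15, 0.20, 0.10, 0.15, 0.20, 0.10, 0.15, 0.20])
    rot_speed = np.array([ 5,    5,    5,   10,   10,   10,   15,   15,   15  ])  # rpm
    velocity  = np.array([0.21, 0.26, 0.30, 0.42, 0.50, 0.58, 0.63, 0.74, 0.86])  # m/s

    A = np.column_stack([np.ones_like(fill_ratio), fill_ratio, rot_speed])
    coef, *_ = np.linalg.lstsq(A, velocity, rcond=None)
    pred = A @ coef
    r = np.corrcoef(pred, velocity)[0, 1]
    print(f"v = {coef[0]:.3f} + {coef[1]:.3f}*fill + {coef[2]:.3f}*rpm, r = {r:.3f}")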

  3. Using statistical and machine learning to help institutions detect suspicious access to electronic health records.

    PubMed

    Boxwala, Aziz A; Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs.

  4. Using statistical and machine learning to help institutions detect suspicious access to electronic health records

    PubMed Central

    Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    Objective To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. Methods From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. Results The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. Limitations The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. Conclusion The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs. PMID:21672912
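
    A minimal sketch of the modeling step common to both records above: logistic regression and an SVM evaluated with 10-fold cross-validation on labeled access events and compared by AUC. The feature matrix stands in for the 26 engineered features, and the labels are synthetic; scikit-learn is assumed.

    # LR and SVM on labeled EHR access events, compared by cross-validated AUC.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    X = rng.normal(size=(1291, 26))       # one row per labeled access event
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 1291) > 0).astype(int)
    # y = 1 marks an event labeled suspicious by a privacy officer (synthetic here).

    for name, model in [
        ("LR", LogisticRegression(max_iter=1000)),
        ("SVM", make_pipeline(StandardScaler(), SVC())),
    ]:
        auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
        print(name, "mean AUC:", auc.mean())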

  5. Can machine learning complement traditional medical device surveillance? A case study of dual-chamber implantable cardioverter-defibrillators.

    PubMed

    Ross, Joseph S; Bates, Jonathan; Parzynski, Craig S; Akar, Joseph G; Curtis, Jeptha P; Desai, Nihar R; Freeman, James V; Gamble, Ginger M; Kuntz, Richard; Li, Shu-Xia; Marinac-Dabic, Danica; Masoudi, Frederick A; Normand, Sharon-Lise T; Ranasinghe, Isuru; Shaw, Richard E; Krumholz, Harlan M

    2017-01-01

    Machine learning methods may complement traditional analytic methods for medical device surveillance. Using data from the National Cardiovascular Data Registry for implantable cardioverter-defibrillators (ICDs) linked to Medicare administrative claims for longitudinal follow-up, we applied three statistical approaches to safety-signal detection for commonly used dual-chamber ICDs, using two propensity score (PS) models: one specified by subject-matter experts (PS-SME), and the other by machine learning-based selection (PS-ML). The first approach used PS-SME and cumulative incidence (time-to-event), the second approach used PS-SME and cumulative risk (Data Extraction and Longitudinal Trend Analysis [DELTA]), and the third approach used PS-ML and cumulative risk (embedded feature selection). Safety-signal surveillance was conducted for eleven dual-chamber ICD models implanted at least 2,000 times over 3 years. Between 2006 and 2010, there were 71,948 Medicare fee-for-service beneficiaries who received dual-chamber ICDs. Cumulative device-specific unadjusted 3-year event rates varied for three surveyed safety signals: death from any cause, 12.8%-20.9%; nonfatal ICD-related adverse events, 19.3%-26.3%; and death from any cause or nonfatal ICD-related adverse event, 27.1%-37.6%. Agreement among safety signals detected/not detected between the time-to-event and DELTA approaches was 90.9% (360 of 396, κ = 0.068), between the time-to-event and embedded feature-selection approaches was 91.7% (363 of 396, κ = -0.028), and between the DELTA and embedded feature-selection approaches was 88.1% (349 of 396, κ = -0.042). Three statistical approaches, including one machine learning method, identified important safety signals, but without exact agreement. Ensemble methods may be needed to detect all safety signals for further evaluation during medical device surveillance.

  6. Two Body Wear of Newly Introduced Nanocomposite Teeth and Cross Linked Four Layered Acrylic Teeth: a Comparitive In Vitro Study.

    PubMed

    Ilangkumaran, R; Srinivasan, J; Baburajan, K; Balaji, N

    2014-12-01

    Wear of complete denture teeth results in compromised denture esthetics and function. To counteract this problem, artificial teeth with increased wear resistance, such as nanocomposite teeth, have been introduced to the market. The purpose of this study was to compare the amount of wear between nanocomposite teeth and acrylic teeth. Fifteen specimens were chosen from each group, namely the nanocomposite teeth (SR-PHONARES) and the acrylic teeth (ACRY PLUS). Only the maxillary premolar was chosen for testing, and the samples were customized according to the specifications of the pin-on-disc machine. The pin-on-disc machine is a two-body tribometer which quantifies the amount of wear under a specific load and time. Test samples were mounted onto the receptacle of the pin-on-disc machine and tested under a load of 0.3 kg for 1,000 cycles of rotation against 600-grit emery paper. The amount of wear is displayed as a digital reading obtained from the pin-on-disc machine. Statistical analysis found that the amount of wear was greater in the four-layered acrylic teeth. The p value obtained was 0.002 (<0.005), implying that the difference in wear between nanocomposite teeth and acrylic teeth is statistically significant. Although the nanocomposite teeth show less wear than the four-layered acrylic teeth, the difference is small and of limited clinical significance, while the cost of the nanocomposite teeth is four times that of the acrylic teeth. Further clinical studies must be performed to confirm our results.

  7. Machine learning to predict the occurrence of bisphosphonate-related osteonecrosis of the jaw associated with dental extraction: A preliminary report.

    PubMed

    Kim, Dong Wook; Kim, Hwiyoung; Nam, Woong; Kim, Hyung Jun; Cha, In-Ho

    2018-04-23

    The aim of this study was to build and validate five types of machine learning models that can predict the occurrence of bisphosphonate-related osteonecrosis of the jaw (BRONJ) associated with dental extraction in patients taking bisphosphonates for the management of osteoporosis. A retrospective review of the medical records was conducted to obtain cases and controls for the study. A total of 125 patients, consisting of 41 cases and 84 controls, were selected. Five machine learning prediction algorithms were implemented: a multivariable logistic regression model, decision tree, support vector machine, artificial neural network, and random forest. The outputs of these models were compared with each other and also with conventional methods, such as serum CTX level. The area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. The performance of the machine learning models was significantly superior to conventional statistical methods and single predictors. The random forest model yielded the best performance (AUC = 0.973), followed by the artificial neural network (AUC = 0.915), support vector machine (AUC = 0.882), logistic regression (AUC = 0.844), decision tree (AUC = 0.821), drug holiday alone (AUC = 0.810), and CTX level alone (AUC = 0.630). Machine learning methods showed superior performance in predicting BRONJ associated with dental extraction compared to conventional statistical methods using drug holiday and serum CTX level. Machine learning can thus be applied in a wide range of clinical studies.

  8. Enhancing predictive accuracy and reproducibility in clinical evaluation research: Commentary on the special section of the Journal of Evaluation in Clinical Practice.

    PubMed

    Bryant, Fred B

    2016-12-01

    This paper introduces a special section of the current issue of the Journal of Evaluation in Clinical Practice that includes a set of 6 empirical articles showcasing a versatile, new machine-learning statistical method, known as optimal data (or discriminant) analysis (ODA), specifically designed to produce statistical models that maximize predictive accuracy. As this set of papers clearly illustrates, ODA offers numerous important advantages over traditional statistical methods-advantages that enhance the validity and reproducibility of statistical conclusions in empirical research. This issue of the journal also includes a review of a recently published book that provides a comprehensive introduction to the logic, theory, and application of ODA in empirical research. It is argued that researchers have much to gain by using ODA to analyze their data. © 2016 John Wiley & Sons, Ltd.

  9. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

  10. A PDF-based classification of gait cadence patterns in patients with amyotrophic lateral sclerosis.

    PubMed

    Wu, Yunfeng; Ng, Sin Chun

    2010-01-01

    Amyotrophic lateral sclerosis (ALS) is a type of neurological disease due to the degeneration of motor neurons. During the course of such a progressive disease, it becomes difficult for ALS patients to regulate normal locomotion, and gait stability becomes perturbed. This paper presents a pilot statistical study of gait cadence (stride interval) in ALS. The probability density functions (PDFs) of the stride interval were first estimated with the nonparametric Parzen-window method. We computed the mean of the left-foot stride interval and the modified Kullback-Leibler divergence (MKLD) from the estimated PDFs. The analysis results suggested that both of these statistical parameters were significantly altered in ALS, and that the least-squares support vector machine (LS-SVM) may effectively distinguish the stride patterns of ALS patients from those of healthy controls, with an accuracy of 82.8% and an area of 0.87 under the receiver operating characteristic curve.
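
    A rough sketch of the PDF-based features: Parzen-window (Gaussian kernel) density estimates of stride intervals for each group, and a symmetrized Kullback-Leibler divergence between them. The stride data are synthetic, and the paper's exact modified-KLD definition may differ from the symmetrization used here; SciPy is assumed.

    # Parzen-window PDF estimates and a symmetrized KL divergence (sketch).
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(6)
    control = rng.normal(1.05, 0.04, 300)       # stride intervals (s), healthy
    als = rng.normal(1.15, 0.09, 300)           # stride intervals (s), ALS

    grid = np.linspace(0.8, 1.6, 400)
    dx = grid[1] - grid[0]
    p = gaussian_kde(control)(grid); p /= p.sum() * dx   # Parzen-window PDFs
    q = gaussian_kde(als)(grid);     q /= q.sum() * dx

    eps = 1e-12
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps))) * dx
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps))) * dx
    print("symmetrized KL divergence:", 0.5 * (kl_pq + kl_qp))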

  11. Automated Cognitive Health Assessment From Smart Home-Based Behavior Data.

    PubMed

    Dawadi, Prafulla Nath; Cook, Diane Joyce; Schmitter-Edgecombe, Maureen

    2016-07-01

    Smart home technologies offer potential benefits for assisting clinicians by automating health monitoring and well-being assessment. In this paper, we examine the actual benefits of smart home-based analysis by monitoring daily behavior in the home and predicting clinical scores of the residents. To accomplish this goal, we propose a clinical assessment using activity behavior (CAAB) approach to model a smart home resident's daily behavior and predict the corresponding clinical scores. CAAB uses statistical features that describe characteristics of a resident's daily activity performance to train machine learning algorithms that predict the clinical scores. We evaluate the performance of CAAB utilizing smart home sensor data collected from 18 smart homes over two years. We obtain a statistically significant correlation (r = 0.72) between CAAB-predicted and clinician-provided cognitive scores and a statistically significant correlation (r = 0.45) between CAAB-predicted and clinician-provided mobility scores. These prediction results suggest that it is feasible to predict clinical scores using smart home sensor data and learning-based data analysis.

  12. The Statistical Package for the Social Sciences (SPSS) as an adjunct to pharmacokinetic analysis.

    PubMed

    Mather, L E; Austin, K L

    1983-01-01

    Computer techniques for numerical analysis are well known to pharmacokineticists. Powerful techniques for data file management have been developed by social scientists but have, in general, been ignored by pharmacokineticists because of their apparent lack of ability to interface with pharmacokinetic programs. Extensive use has been made of the Statistical Package for the Social Sciences (SPSS) for its data handling capabilities, but at the same time, techniques have been developed within SPSS to interface with pharmacokinetic programs of the users' choice and to carry out a variety of user-defined pharmacokinetic tasks within SPSS commands, apart from the expected variety of statistical tasks. Because it is based on a ubiquitous package, this methodology has all of the benefits of excellent documentation, interchangeability between different types and sizes of machines and true portability of techniques and data files. An example is given of the total management of a pharmacokinetic study previously reported in the literature by the authors.

  13. Optimization of classification and regression analysis of four monoclonal antibodies from Raman spectra using collaborative machine learning approach.

    PubMed

    Le, Laetitia Minh Maï; Kégl, Balázs; Gramfort, Alexandre; Marini, Camille; Nguyen, David; Cherti, Mehdi; Tfaili, Sana; Tfayli, Ali; Baillet-Guffroy, Arlette; Prognon, Patrice; Chaminade, Pierre; Caudron, Eric

    2018-07-01

    The use of monoclonal antibodies (mAbs) constitutes one of the most important strategies to treat patients suffering from cancers such as hematological malignancies and solid tumors. These antibodies are prescribed by the physician and prepared by hospital pharmacists. Analytical control enables the quality of the preparations to be ensured. The aim of this study was to explore the development of a rapid analytical method for quality control. The method used four mAbs (Infliximab, Bevacizumab, Rituximab and Ramucirumab) at various concentrations and was based on recording Raman data and coupling them to a traditional chemometric and machine learning approach for data analysis. Compared to a conventional linear approach, prediction errors are reduced with a data-driven approach using statistical machine learning methods, in which preprocessing and predictive models are jointly optimized. An additional original aspect of the work involved submitting the problem to a collaborative data challenge platform called Rapid Analytics and Model Prototyping (RAMP), which made it possible to draw on solutions from about 300 data scientists working collaboratively. Using machine learning, the prediction of the four mAbs samples was considerably improved. The best predictive model showed a combined error of 2.4% versus 14.6% for the linear approach. The concentration and classification errors were 5.8% and 0.7%, respectively; only three of the 429 test-set spectra were misclassified. This large improvement obtained with machine learning techniques was uniform for all molecules but maximal for Bevacizumab, with an 88.3% reduction in combined errors (2.1% versus 17.9%).

  14. A comparative study on performance of CBN inserts when turning steel under dry and wet conditions

    NASA Astrophysics Data System (ADS)

    Abdullah Bagaber, Salem; Razlan Yusoff, Ahmad

    2017-10-01

    Cutting fluids are among the most unsustainable components of machining processes, negatively impacting the environment and requiring additional energy. Owing to its high strength and corrosion resistance, the machinability of stainless steel has attracted considerable interest. This study aims to evaluate the performance of cubic boron nitride (CBN) inserts in terms of machining parameters that include power consumption and surface roughness. Due to the high cost per cutting edge of CBN, its performance is of particular importance for hard finish turning. The present work also provides a comparative study of power consumption and surface roughness under dry and flood conditions. Turning of stainless steel 316 was performed. A response surface methodology based on a Box-Behnken design (BBD) was utilized for statistical analysis. The optimum process parameters were determined using an overall performance index, and dry and wet stainless-steel cutting were compared in terms of minimum energy and surface roughness. The results show that stainless steel can be machined under dry conditions with an 18.57% improvement in power consumption and acceptable quality compared to wet cutting. CBN tools for dry cutting of stainless steel can thus be used to reduce environmental impact, with no cutting fluid and less energy required, which benefits machining productivity and profit.

  15. Implementing Machine Learning in Radiology Practice and Research.

    PubMed

    Kohli, Marc; Prevedello, Luciano M; Filice, Ross W; Geis, J Raymond

    2017-04-01

    The purposes of this article are to describe concepts that radiologists should understand to evaluate machine learning projects, including common algorithms, supervised as opposed to unsupervised techniques, statistical pitfalls, and data considerations for training and evaluation, and to briefly describe ethical dilemmas and legal risk. Machine learning includes a broad class of computer programs that improve with experience. The complexity of creating, training, and monitoring machine learning indicates that the success of the algorithms will require radiologist involvement for years to come, leading to engagement rather than replacement.

  16. Quantitative sensory testing response patterns to capsaicin- and ultraviolet-B-induced local skin hypersensitization in healthy subjects: a machine-learned analysis.

    PubMed

    Lötsch, Jörn; Geisslinger, Gerd; Heinemann, Sarah; Lerch, Florian; Oertel, Bruno G; Ultsch, Alfred

    2017-08-16

    The comprehensive assessment of pain-related human phenotypes requires combinations of nociceptive measures that produce complex high-dimensional data, posing challenges to bioinformatic analysis. In this study, we assessed established experimental models of heat hyperalgesia of the skin, consisting of local ultraviolet-B (UV-B) irradiation or capsaicin application, in 82 healthy subjects using a variety of noxious stimuli. We extended the original heat stimulation by applying cold and mechanical stimuli and assessing the hypersensitization effects with a clinically established quantitative sensory testing (QST) battery (German Research Network on Neuropathic Pain). This study provided a 246 × 10-sized data matrix (82 subjects assessed at baseline, following UV-B application, and following capsaicin application) with respect to 10 QST parameters, which we analyzed using machine-learning techniques. We observed statistically significant effects of the hypersensitization treatments in 9 different QST parameters. Supervised machine-learned analysis implemented as random forests followed by ABC analysis pointed to heat pain thresholds as the most relevantly affected QST parameter. However, decision tree analysis indicated that UV-B additionally modulated sensitivity to cold. Unsupervised machine-learning techniques, implemented as emergent self-organizing maps, hinted at subgroups responding to topical application of capsaicin. The distinction among subgroups was based on sensitivity to pressure pain, which could be attributed to sex differences, with women being more sensitive than men. Thus, while UV-B and capsaicin share a major component of heat pain sensitization, they differ in their effects on QST parameter patterns in healthy subjects, suggesting a lack of redundancy between these models.

  17. Noninvasive prostate cancer screening based on serum surface-enhanced Raman spectroscopy and support vector machine

    NASA Astrophysics Data System (ADS)

    Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao

    2014-09-01

    This study aims to present a noninvasive prostate cancer screening method using serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques on peripheral blood samples. SERS measurements are performed using serum samples from 93 prostate cancer patients and 68 healthy volunteers with silver nanoparticles. Three types of kernel functions, including linear, polynomial, and Gaussian radial basis function (RBF), are employed to build SVM diagnostic models for classifying the measured SERS spectra. For a comparable evaluation of the performance of the SVM classification models, the standard multivariate statistical analysis method of principal component analysis (PCA) is also applied to classify the same datasets. The results show that the RBF-kernel SVM diagnostic model acquires a diagnostic accuracy of 98.1%, superior to the 91.3% obtained with PCA methods. The receiver operating characteristic curves of the diagnostic models further confirm these results. This study demonstrates that label-free serum SERS analysis combined with an SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.
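
    A sketch of the comparison reported above: an RBF-kernel SVM versus a PCA-based classifier on spectra. The spectra are synthetic, and PCA followed by linear discriminant analysis is used as a stand-in for the paper's PCA classification method; scikit-learn is assumed.

    # RBF-SVM vs. PCA-based classification of (synthetic) serum spectra.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(7)
    n_cancer, n_healthy, n_bins = 93, 68, 600
    base = np.exp(-((np.arange(n_bins) - 300) / 80.0) ** 2)        # shared spectral shape
    X = np.vstack([
        base + 0.05 * rng.normal(size=(n_cancer, n_bins)) + 0.02,  # cancer sera
        base + 0.05 * rng.normal(size=(n_healthy, n_bins)),        # healthy sera
    ])
    y = np.array([1] * n_cancer + [0] * n_healthy)

    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    pca_lda = make_pipeline(StandardScaler(), PCA(n_components=10),
                            LinearDiscriminantAnalysis())
    print("RBF-SVM accuracy:", cross_val_score(svm, X, y, cv=5).mean())
    print("PCA+LDA accuracy:", cross_val_score(pca_lda, X, y, cv=5).mean())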

  18. Heart Rate Variability Dynamics for the Prognosis of Cardiovascular Risk

    PubMed Central

    Ramirez-Villegas, Juan F.; Lam-Espinosa, Eric; Ramirez-Moreno, David F.; Calvo-Echeverry, Paulo C.; Agredo-Rodriguez, Wilfredo

    2011-01-01

    Statistical, spectral, multi-resolution and non-linear methods were applied to heart rate variability (HRV) series linked with classification schemes for the prognosis of cardiovascular risk. A total of 90 HRV records were analyzed: 45 from healthy subjects and 45 from cardiovascular risk patients. A total of 52 features from all the analysis methods were evaluated using the standard two-sample Kolmogorov-Smirnov test (KS-test). The results of this statistical procedure provided input to multi-layer perceptron (MLP) neural networks, radial basis function (RBF) neural networks and support vector machines (SVM) for data classification. These schemes showed high performance with both training and test sets and many combinations of features (with a maximum accuracy of 96.67%). Additionally, breathing frequency emerged as a relevant feature in the HRV analysis. PMID:21386966

  19. Solution of a tridiagonal system of equations on the finite element machine

    NASA Technical Reports Server (NTRS)

    Bostic, S. W.

    1984-01-01

    Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.
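
    For context, the serial baseline against which such parallel tridiagonal schemes are measured is the classic Thomas algorithm, sketched below; the 4x4 example system is illustrative, and the parallel Accelerated Parallel Gauss and Buneman methods are not reproduced here.

    # Serial Thomas algorithm for a tridiagonal system (illustrative baseline).
    import numpy as np

    def thomas(a, b, c, d):
        """Solve a tridiagonal system: a = sub-, b = main, c = super-diagonal."""
        n = len(b)
        cp, dp = np.empty(n), np.empty(n)
        cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
        for i in range(1, n):                    # forward elimination sweep
            m = b[i] - a[i] * cp[i - 1]
            cp[i] = c[i] / m if i < n - 1 else 0.0
            dp[i] = (d[i] - a[i] * dp[i - 1]) / m
        x = np.empty(n)
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):           # back substitution
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x

    # 4x4 example: diagonally dominant system
    a = np.array([0.0, -1, -1, -1]); b = np.array([4.0, 4, 4, 4])
    c = np.array([-1.0, -1, -1, 0]); d = np.array([5.0, 5, 5, 5])
    print(thomas(a, b, c, d))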

  20. MicroCT Analysis of Micro-Nano Titanium Implant Surface on the Osseointegration.

    PubMed

    Ban, Jaesam; Kang, Seongsoo; Kim, Jihyun; Lee, Kwangmin; Hyunpil, Lim; Vang, Mongsook; Yang, Hongso; Oh, Gyejeong; Kim, Hyunseung; Hwang, Gabwoon; Jung, Yongho; Lee, Kyungku; Park, Sangwon; Yunl, Kwidug

    2015-01-01

    This study investigated the effects of micro-nano titanium implant surfaces on osseointegration. A total of 36 screw-shaped implants were used. The implant surfaces were classified into 3 groups (n = 12): machined surface (M group), nano surface, with nanotube formation on the machined surface (MA group), and nano-micro surface, with nanotube formation on the RBM surface (RA group). Anodic oxidation was performed at 20 V for 10 min with 1 M H3PO4 and 1.5 wt% HF solutions. The implants were installed in the humerus of 6 beagles. After 4 and 12 weeks, morphometric analysis with micro CT (SkyScan 1172, SKYSCAN, Antwerpen, Belgium) was done. The data were statistically analyzed with two-way ANOVA. Bone mineral density and bone volume increased significantly over time. The RA group showed the highest bone mineral density and bone volume at 4 and 12 weeks, and the differences were significant. These results indicate that the nano-micro titanium implant surface promotes faster and more mature osseointegration.

  1. Protecting Externally Supplied Software in Small Computers.

    DTIC Science & Technology

    1980-09-01

    small computer systems incorporating these security features requires careful analysis of a number of options in making tradeoffs among performance...alarming statistic is not representative of the market as a whole or that it is not indicative of the fate of sales of such software in the future. In ... market as well. Although the size of this market (in numbers of machines) may not approach that of personal computers, small business computers may

  2. Geometric, Statistical, and Topological Modeling of Intrinsic Data Manifolds: Application to 3D Shapes

    DTIC Science & Technology

    2009-01-01

    representation to a simple curve in 3D by using the Whitney embedding theorem. In a very ludic way, we propose to combine phases one and two to...elimination principle which takes advantage of the designed parametrization. To further refine discrimination among objects, we introduce a post...packing numbers and design of principal curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3):281-297, 2000. [68] M. H. Yang, Face

  3. Experimental validation of a distribution theory based analysis of the effect of manufacturing tolerances on permanent magnet synchronous machines

    NASA Astrophysics Data System (ADS)

    Boscaino, V.; Cipriani, G.; Di Dio, V.; Corpora, M.; Curto, D.; Franzitta, V.; Trapanese, M.

    2017-05-01

    An experimental study on the effect of permanent magnet tolerances on the performance of a Tubular Linear Ferrite Motor is presented in this paper. The performance measures investigated are the cogging force, the end-effect cogging force, and the generated thrust. It is demonstrated that: 1) the statistical variability of the magnets introduces harmonics in the spectrum of the cogging force; 2) the value of the end-effect cogging force is directly linked to the values of the remanence field of the external magnets placed on the slider; 3) the generated thrust and its statistical distribution depend on the remanence field of the magnets placed on the translator.

  4. Evaluation of different time domain peak models using extreme learning machine-based peak detection for EEG signal.

    PubMed

    Adam, Asrul; Ibrahim, Zuwairie; Mokhtar, Norrima; Shapiai, Mohd Ibrahim; Cumming, Paul; Mubin, Marizan

    2016-01-01

    Various peak models have been introduced to detect and analyze peaks in the time domain analysis of electroencephalogram (EEG) signals. In general, a peak model in the time domain analysis consists of a set of signal parameters, such as amplitude, width, and slope. Models including those proposed by Dumpala, Acir, Liu, and Dingle are routinely used to detect peaks in EEG signals acquired in clinical studies of epilepsy or eye blink. The optimal peak model is the one with the most reliable peak detection performance in a particular application, and a fair measure of the performance of different models requires a common and unbiased platform. In this study, we evaluate the performance of the four different peak models using the extreme learning machine (ELM)-based peak detection algorithm. We found that the Dingle model gave the best performance, with 72% accuracy in the analysis of real EEG data. Statistical analysis confirmed that the Dingle model afforded significantly better mean testing accuracy than the Acir and Liu models, which were in the range of 37-52%, while showing no significant difference compared to the Dumpala model.
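
    A minimal extreme learning machine of the kind used here, with a random untrained hidden layer and least-squares output weights, can be sketched as follows. The three features per candidate peak (amplitude, width, slope) and the labeling rule are illustrative assumptions, not the study's EEG data.

    # Minimal ELM: random hidden layer + least-squares output weights (sketch).
    import numpy as np

    class ELM:
        def __init__(self, n_hidden=50, seed=0):
            self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

        def fit(self, X, y):
            self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
            self.b = self.rng.normal(size=self.n_hidden)
            H = np.tanh(X @ self.W + self.b)       # random, untrained hidden layer
            self.beta = np.linalg.pinv(H) @ y      # output weights by least squares
            return self

        def predict(self, X):
            return np.tanh(X @ self.W + self.b) @ self.beta

    rng = np.random.default_rng(8)
    X = rng.normal(size=(200, 3))                  # amplitude, width, slope per peak
    y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(float)  # 1 = true peak (synthetic rule)
    model = ELM().fit(X, y)
    print("training accuracy:", ((model.predict(X) > 0.5) == y).mean())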

  5. The effects of time delay in man-machine control systems: Implications for design of flight simulator Visual-Display-Delay compensation

    NASA Technical Reports Server (NTRS)

    Crane, D. F.

    1984-01-01

    When human operators are performing precision tracking tasks, their dynamic response can often be modeled by quasilinear describing functions. That fact permits analysis of the effects of delay in certain man-machine control systems using linear control system analysis techniques. The analysis indicates that a reduction in system stability is the immediate effect of additional control system delay, and that system characteristics moderate or exaggerate the importance of the delay. A selection of data (simulator and flight test) consistent with the analysis is reviewed. Flight simulator visual-display delay compensation, designed to restore pilot-aircraft system stability, was evaluated in several studies which are reviewed here. The studies range from single-axis tracking-task experiments (with sufficient subjects and trials to establish the statistical significance of the results) to a brief evaluation of compensation of a computer-generated imagery (CGI) visual display system in a full six-degree-of-freedom simulation. The compensation was effective: improvements in pilot performance and workload or aircraft handling qualities ratings (HQR) were observed. Results from the recent aircraft handling qualities research literature, which support the compensation design approach, are also reviewed.

  6. The dynamic analysis of drum roll lathe for machining of rollers

    NASA Astrophysics Data System (ADS)

    Qiao, Zheng; Wu, Dongxu; Wang, Bo; Li, Guo; Wang, Huiming; Ding, Fei

    2014-08-01

    An ultra-precision machine tool for machining of rollers has been designed and assembled. Because the dynamic characteristics of the machine tool have an obvious impact on the quality of microstructures on the roller surface, this paper analyzes the dynamic characteristics of the existing machine tool, including the influence of fixing a large-scale, slender roller in the machine. First, a finite element model of the machine tool is built and simplified, and on that basis a finite element modal analysis is carried out to obtain the natural frequencies and mode shapes of the first four modes of the machine tool. From these modal analysis results, the weak-stiffness subsystems of the machine tool can be further improved and a reasonable bandwidth for its control system can be designed. Finally, considering the shock imparted to the feeding system and cutting tool by frequent fast positioning of the Z axis, a transient analysis is conducted in ANSYS. Based on the results of the transient analysis, the vibration behavior of key components of the machine tool and its impact on the cutting process are explored.

  7. Prediction and analysis of beta-turns in proteins by support vector machine.

    PubMed

    Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao

    2003-01-01

    The tight turn has long been recognized as one of the three important features of proteins, after the alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both structural and functional points of view, and more than 90% of tight turns are beta-turns. Analysis and prediction of beta-turns in particular, and tight turns in general, are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper, we introduce a support vector machine (SVM) approach to the prediction and analysis of beta-turns. We have investigated two aspects of applying SVMs to the prediction and analysis of beta-turns. First, we developed a new SVM method, called BTSVM, which predicts the beta-turns of a protein from its sequence. The prediction results on a dataset of 426 non-homologous protein chains, using a sevenfold cross-validation technique, showed that our method is superior to previous methods. Second, we analyzed how amino acid positions support (or prevent) the formation of beta-turns based on the "multivariable" classification model of a linear SVM. This model is more general than those of previous statistical methods, and our analysis results are more comprehensive and easier to use than previously published results.

  8. Trends of Occupational Fatalities Involving Machines, United States, 1992–2010

    PubMed Central

    Marsh, Suzanne M.; Fosbroke, David E.

    2016-01-01

    Background This paper describes trends in occupational machine-related fatalities from 1992 to 2010. We examine temporal patterns by worker demographics, machine types (e.g., stationary, mobile), and industries. Methods We analyzed fatalities from Census of Fatal Occupational Injuries data provided by the Bureau of Labor Statistics to the National Institute for Occupational Safety and Health. We used injury source to identify machine-related incidents and Poisson regression to assess trends over the 19-year period. Results There was an average annual decrease of 2.8% in overall machine-related fatality rates from 1992 through 2010. Mobile machine-related fatality rates decreased an average of 2.6% annually and stationary machine-related rates decreased an average of 3.5% annually. Groups that continued to be at high risk included older workers; the self-employed; and workers in agriculture/forestry/fishing, construction, and mining. Conclusion Addressing the dangers posed by tractors, excavators, and other mobile machines needs to continue, and high-risk worker groups should receive targeted information on machine safety. PMID:26358658

  9. STATISTICAL EVALUATION OF CONFOCAL MICROSCOPY IMAGES

    EPA Science Inventory

    Abstract

    In this study the CV is defined as the Mean/SD of the population of beads or pixels. Flow cytometry uses the CV of beads to determine if the machine is aligned correctly and performing properly. This CV concept to determine machine performance has been adapted to...
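
    As a worked note on the metric itself: the coefficient of variation is conventionally computed as SD/Mean (the abstract states Mean/SD, which is the reciprocal). A minimal sketch, with illustrative bead intensities rather than real instrument data:

      import numpy as np

      def coefficient_of_variation(values):
          """Conventional CV: sample standard deviation divided by the mean.
          (The abstract above defines CV as Mean/SD, the reciprocal.)"""
          values = np.asarray(values, dtype=float)
          return values.std(ddof=1) / values.mean()

      beads = [1020, 998, 1011, 1005, 989, 1017]   # hypothetical bead intensities
      cv = coefficient_of_variation(beads)
      print(f"CV = {cv:.4f} ({cv * 100:.2f}%)")    # a low CV suggests proper alignment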

  10. Machine learning techniques applied to the determination of road suitability for the transportation of dangerous substances.

    PubMed

    Matías, J M; Taboada, J; Ordóñez, C; Nieto, P G

    2007-08-17

    This article describes a methodology to model the degree of remedial action required to make short stretches of a roadway suitable for dangerous goods transport (DGT), particularly pollutant substances, using different variables associated with the characteristics of each segment. Thirty-one factors determining the impact of an accident on a particular stretch of road were identified and subdivided into two major groups: accident probability factors and accident severity factors. Given the number of factors determining the state of a particular road segment, the only viable statistical methods for implementing the model were machine learning techniques, such as multilayer perceptron networks (MLPs), classification trees (CARTs) and support vector machines (SVMs). The results produced by these techniques on a test sample were more favourable than those produced by traditional discriminant analysis, irrespective of whether dimensionality reduction techniques were applied. The best results were obtained using SVMs specifically adapted to ordinal data. This technique takes advantage of the ordinal information contained in the data without penalising the computational load. Furthermore, the technique permits the estimation of the utility function that is latent in expert knowledge.
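
    The ordinal SVM idea exploited by the authors can be approximated with a standard decomposition: train K-1 binary classifiers, where the k-th answers "is the remediation level greater than k?", and sum their votes. The sketch below uses scikit-learn and synthetic data; it illustrates the decomposition, not the paper's exact adaptation.

      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 5))        # stand-ins for road-segment factors
      y = np.digitize(X[:, 0] + 0.5 * X[:, 1], [-1.0, 0.0, 1.0])  # levels 0..3

      classifiers = []
      for k in range(3):                   # K-1 binary subproblems for K=4 levels
          clf = SVC(kernel="rbf").fit(X, (y > k).astype(int))
          classifiers.append(clf)

      def predict_ordinal(X_new):
          """Sum of 'greater than k' votes recovers an ordinal level 0..3."""
          return sum(clf.predict(X_new) for clf in classifiers)

      print(predict_ordinal(X[:10]))
      print(y[:10])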

  11. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), aimed at preventing delayed diagnosis and misdiagnosis of patients, using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods applied to the diagnosis of PD include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by a Tabu search algorithm as classifier and Haar wavelets as projection filter, is used for relevant feature selection and ranking. The highest accuracy, obtained by linear logistic regression and sparse multinomial logistic regression, is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All experiments are conducted at 95% and 99% confidence levels and the results are established with corrected t-tests. This work shows a high degree of advancement in the software reliability and quality of the computer-aided diagnosis system and experimentally demonstrates best results with supporting statistical inference.

  12. Broiler weight estimation based on machine vision and artificial neural network.

    PubMed

    Amraei, S; Abdanan Mehdizadeh, S; Salari, S

    2017-04-01

    1. Machine vision and artificial neural network (ANN) procedures were used to estimate the live body weight of broiler chickens in 30 1-d-old broiler chickens reared for 42 d. 2. Imaging was performed twice daily. To localise chickens within the pen, an ellipse-fitting algorithm was used, and the chickens' heads and tails were removed using the Chan-Vese method. 3. The correlations between body weight and 6 extracted physical features indicated that there were strong correlations between body weight and 5 features: area, perimeter, convex area, and major and minor axis length. 5. According to statistical analysis, there was no significant difference between morning and afternoon data over the 42 d. 6. In an attempt to improve the accuracy of live weight approximation, different ANN techniques, including Bayesian regulation, Levenberg-Marquardt, scaled conjugate gradient and gradient descent, were used. Bayesian regulation, with an R² value of 0.98, was the best network for prediction of broiler weight. 7. The accuracy of the machine vision technique was examined and most errors were less than 50 g.
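
    The pipeline described, segment the bird, extract geometric features, regress weight, can be mocked up with scikit-learn. The feature table below is synthetic and purely hypothetical; MLPRegressor with L2 regularization serves as a rough analogue of the Bayesian-regularized network named in the abstract.

      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(1)
      n = 120
      area = rng.uniform(50, 400, n)                    # synthetic silhouette areas
      perimeter = 4.5 * np.sqrt(area) + rng.normal(0, 2, n)
      weight = 200 + 8.0 * area + rng.normal(0, 30, n)  # synthetic weights in g

      X = np.column_stack([area, perimeter])
      model = make_pipeline(
          StandardScaler(),
          MLPRegressor(hidden_layer_sizes=(10,), alpha=1e-2,
                       max_iter=5000, random_state=0))
      model.fit(X, weight)
      print("max abs error (g):", np.abs(model.predict(X) - weight).max().round(1))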

  13. In vitro wear of new indirect resin composites.

    PubMed

    Jain, V; Platt, J A; Moore, B K; Borges, G A

    2009-01-01

    This in vitro study evaluated the toothbrush abrasion wear, three-body Alabama wear and two-body pin-on-disc wear of four commercial indirect resin composites. Enamel shades of Radica (R), Sculpture Plus (S), Belleglass-NG (B) and Gradia Indirect (G) were used. For measuring wear due to toothbrush abrasion, six specimens of each group were fabricated, then brushed in a toothbrush abrasion machine for 20,000 cycles. Material loss was determined by weighing and conversion to volume loss. Three-body wear was measured on six samples of each group using an Alabama-type wear testing machine for 400,000 cycles. Wear depth was measured with a contact profilometer. For two-body wear, five disc specimens were prepared and tested in a two-body wear-testing machine against hydroxyapatite sliders for 25,000 cycles. Data were analyzed with one-way analysis of variance (ANOVA) and the Tukey test (alpha=0.05). Wear was highest in Sculpture Plus by all three methods tested, and the lowest wear was observed in Belleglass-NG. No statistically significant difference in wear was noted for Radica.

  14. The Cognitive Visualization System with the Dynamic Projection of Multidimensional Data

    NASA Astrophysics Data System (ADS)

    Gorohov, V.; Vitkovskiy, V.

    2008-08-01

    The phenomenon of cognitive machine graphics consists in generating on the screen special graphic representations that create visual images in the brain of a human operator. These images appear aesthetically attractive and thus stimulate the operator's visual imagination, which is closely related to the intuitive mechanisms of thinking. The essence of the cognitive effect is that the operator perceives the moving projection as a pseudo-three-dimensional object characterizing multidimensional data in multidimensional space. After a thorough qualitative study of the visual aspects of the multidimensional data with the aid of the enumerated algorithms, it becomes possible, using standard machine-graphics algorithms, to paint the individual objects or groups of objects of interest to the user. One can then return to the dynamic rotation of the data in order to check the user's intuitive ideas about clusters and connections in the multidimensional data. The methods of cognitive machine graphics can be developed further in combination with other information technologies, first of all with packages for digital image processing and multidimensional statistical analysis.

  15. Improvement of MRR and surface roughness during electrical discharge machining (EDM) using aluminum oxide powder mixed dielectric fluid

    NASA Astrophysics Data System (ADS)

    Khan, A. A.; Mohiuddin, A. K. M.; Latif, M. A. A.

    2018-01-01

    This paper discusses the effect of aluminium oxide (Al2O3) addition to the dielectric fluid during electrical discharge machining (EDM). Aluminium oxide was added to the dielectric used in the EDM process to improve its performance when machining stainless steel AISI 304, with copper used as the electrode. The effect of an Al2O3 concentration of 0.3 mg/L in the dielectric fluid was compared with EDM without any addition of Al2O3. The surface quality of the stainless steel and the material removal rate were investigated. Design of experiments (DOE) was used for the experimental plan. Statistical analysis was done using ANOVA and an appropriate model was then selected. The experimental results show that with aluminium oxide dispersed in the dielectric fluid, surface roughness was improved while the material removal rate (MRR) increased to some extent. These results indicate the improvement of EDM performance using aluminium oxide in the dielectric fluid. It was also found that with an increase in pulse on-time, both MRR and surface roughness increase sharply.

  16. Signal detection using support vector machines in the presence of ultrasonic speckle

    NASA Astrophysics Data System (ADS)

    Kotropoulos, Constantine L.; Pitas, Ioannis

    2002-04-01

    Support vector machines constitute a general learning algorithm based on the guaranteed risk bounds of statistical learning theory. They have found numerous applications, such as the classification of brain PET images, optical character recognition, object detection, face verification and text categorization. In this paper we propose the use of support vector machines to segment lesions in ultrasound images and we assess their lesion detection ability thoroughly. We demonstrate that trained support vector machines with a radial basis function kernel satisfactorily segment (unseen) ultrasound B-mode images as well as clinical ultrasonic images.

  17. Coupling Matched Molecular Pairs with Machine Learning for Virtual Compound Optimization.

    PubMed

    Turk, Samo; Merget, Benjamin; Rippmann, Friedrich; Fulle, Simone

    2017-12-26

    Matched molecular pair (MMP) analyses are widely used in compound optimization projects to gain insights into structure-activity relationships (SAR). The analysis is traditionally done via statistical methods but can also be employed together with machine learning (ML) approaches to extrapolate to novel compounds. The MMP/ML method introduced here combines a fragment-based MMP implementation with different machine learning methods to obtain automated SAR decomposition and prediction. To test the prediction capabilities and model transferability, two different compound optimization scenarios were designed: (1) "new fragments", which occurs when exploring new fragments for a defined compound series, and (2) "new static core and transformations", which resembles, for instance, the identification of a new compound series. Very good results were achieved by all employed machine learning methods, especially for the new fragments case, but overall deep neural network models performed best, allowing reliable predictions also for the new static core and transformations scenario, where comprehensive SAR knowledge of the compound series is missing. Furthermore, we show that models trained on all available data have higher generalizability than models trained on focused series and can extend beyond the chemical space covered in the training data. Thus, coupling MMP with deep neural networks provides a promising approach for making high-quality predictions on various data sets and in different compound optimization scenarios.

  18. Detection of Periodic Leg Movements by Machine Learning Methods Using Polysomnographic Parameters Other Than Leg Electromyography

    PubMed Central

    Umut, İlhan; Çentik, Güven

    2016-01-01

    The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. It also increases the risk of problems during the recording process and increases the storage volume. In this study, we detect periodic leg movement (PLM) in sleep using channels other than leg electromyography (EMG), by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with a PLM disorder diagnosis were examined retrospectively. Novel software was developed for the analysis of the PSG records. The software utilizes machine learning algorithms, statistical methods, and DSP methods. In order to classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of the classification results showed that while the K-nearest neighbour algorithm had the highest average classification rate (91.87%) and the lowest average classification error (RMSE = 0.2850), the multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error (RMSE = 0.3705). The results showed that PLM can be classified with high accuracy (91.87%) without a leg EMG record being present. PMID:27213008

  20. Machine Learning in Medicine

    PubMed Central

    Deo, Rahul C.

    2015-01-01

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668

  1. Rainfall Prediction of Indian Peninsula: Comparison of Time Series Based Approach and Predictor Based Approach using Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Dash, Y.; Mishra, S. K.; Panigrahi, B. K.

    2017-12-01

    Prediction of the northeast/post-monsoon rainfall which occurs during October, November and December (OND) over the Indian peninsula is a challenging task due to the dynamic nature of the uncertain, chaotic climate. It is imperative to elucidate this issue by examining the performance of different machine learning (ML) approaches. The prime objective of this research is to compare a) statistical prediction using historical rainfall observations and global atmosphere-ocean predictors like sea surface temperature (SST) and sea level pressure (SLP) and b) empirical prediction based on a time series analysis of past rainfall data without using any other predictors. Initially, ML techniques were applied to SST and SLP data (1948-2014) obtained from the NCEP/NCAR reanalysis monthly means provided by the NOAA ESRL PSD. Later, this study investigated the applicability of ML methods using the OND rainfall time series for 1948-2014 and forecasted up to 2018. The predicted values of the aforementioned methods were verified using observed time series data collected from the Indian Institute of Tropical Meteorology, and the results revealed good performance of the ML algorithms with minimal error scores. Thus, both statistical and empirical methods are found to be useful for long-range climatic projections.
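
    The two forecasting set-ups being compared, (a) regression on climate predictors such as SST/SLP and (b) autoregression on the rainfall series itself, can be contrasted in a few lines. Synthetic series stand in for the NCEP/NCAR fields and the OND rainfall record; ridge regression is one reasonable stand-in for the unspecified ML methods.

      import numpy as np
      from sklearn.linear_model import Ridge

      rng = np.random.default_rng(42)
      years = 67                                    # 1948-2014
      sst = rng.normal(size=years)                  # stand-in SST index
      slp = rng.normal(size=years)                  # stand-in SLP index
      rain = 0.6 * sst - 0.3 * slp + rng.normal(0, 0.5, years)

      # (a) predictor-based: rainfall regressed on concurrent SST/SLP indices.
      Xa = np.column_stack([sst, slp])
      ra = Ridge().fit(Xa[:-10], rain[:-10])
      print("predictor-based R^2:", round(ra.score(Xa[-10:], rain[-10:]), 3))

      # (b) empirical: rainfall regressed on its own three previous values.
      lag = 3
      Xb = np.column_stack([rain[i:years - lag + i] for i in range(lag)])
      rb = Ridge().fit(Xb[:-10], rain[lag:][:-10])
      print("autoregressive R^2:", round(rb.score(Xb[-10:], rain[lag:][-10:]), 3))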

  2. Using machine learning to model dose-response relationships.

    PubMed

    Linden, Ariel; Yarnold, Paul R; Nallamothu, Brahmajee K

    2016-12-01

    Establishing the relationship between various doses of an exposure and a response variable is integral to many studies in health care. Linear parametric models, widely used for estimating dose-response relationships, have several limitations. This paper employs the optimal discriminant analysis (ODA) machine-learning algorithm to determine the degree to which exposure dose can be distinguished based on the distribution of the response variable. By framing the dose-response relationship as a classification problem, machine learning can provide the same functionality as conventional models, but can additionally make individual-level predictions, which may be helpful in practical applications like establishing responsiveness to prescribed drug regimens. Using data from a study measuring the responses of blood flow in the forearm to the intra-arterial administration of isoproterenol (separately for 9 black and 13 white men, and pooled), we compare the results estimated from a generalized estimating equations (GEE) model with those estimated using ODA. Generalized estimating equations and ODA both identified many statistically significant dose-response relationships, separately by race and for pooled data. Post hoc comparisons between doses indicated ODA (based on exact P values) was consistently more conservative than GEE (based on estimated P values). Compared with ODA, GEE produced twice as many instances of paradoxical confounding (findings from analysis of pooled data that are inconsistent with findings from analyses stratified by race). Given its unique advantages and greater analytic flexibility, maximum-accuracy machine-learning methods like ODA should be considered as the primary analytic approach in dose-response applications. © 2016 John Wiley & Sons, Ltd.
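
    For a two-dose case, ODA amounts to searching for the response cutpoint that maximizes classification accuracy, which makes the contrast with parametric modelling concrete. A brute-force sketch on illustrative data (not the study's blood-flow measurements):

      import numpy as np

      def oda_cutpoint(response, label):
          """Two-class optimal discriminant analysis by exhaustive search:
          the threshold on the response that maximizes accuracy."""
          best_acc, best_cut = 0.0, None
          for cut in np.unique(response):
              pred = (response >= cut).astype(int)
              acc = max((pred == label).mean(), ((1 - pred) == label).mean())
              if acc > best_acc:
                  best_acc, best_cut = acc, cut
          return best_cut, best_acc

      rng = np.random.default_rng(3)
      flow = np.concatenate([rng.normal(2.0, 0.8, 20),   # response at low dose
                             rng.normal(3.5, 0.8, 20)])  # response at high dose
      label = np.array([0] * 20 + [1] * 20)
      print(oda_cutpoint(flow, label))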

  3. Fatigue criterion to system design, life and reliability

    NASA Technical Reports Server (NTRS)

    Zaretsky, E. V.

    1985-01-01

    A generalized methodology for structural life prediction, design, and reliability based upon a fatigue criterion is advanced. The life prediction methodology is based in part on the work of W. Weibull and of G. Lundberg and A. Palmgren. The approach combines the computed lives of the elemental stress volumes of a complex machine element to predict system life. The results of coupon fatigue testing can be incorporated into the analysis, allowing life and component or structural renewal rates to be predicted with reasonable statistical certainty.
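
    The core of the Weibull/Lundberg-Palmgren combination referenced here is that component lives L_i sharing a common Weibull slope e combine into a system life as L_sys = (sum_i L_i^(-e))^(-1/e). A one-function sketch with illustrative component lives:

      def system_life(component_lives, weibull_slope):
          """Weibull/Lundberg-Palmgren rule: combine component L10 lives
          into a system L10 life, L_sys = (sum Li^-e)^(-1/e)."""
          e = weibull_slope
          return sum(L ** -e for L in component_lives) ** (-1.0 / e)

      # Illustrative: three elements with L10 lives in hours, slope e = 1.1.
      print(system_life([8000.0, 12000.0, 20000.0], 1.1))  # below min(lives), as expected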

  4. Secondary electrospray ionization-mass spectrometry and a novel statistical bioinformatic approach identifies a cancer-related profile in exhaled breath of breast cancer patients: a pilot study.

    PubMed

    Martinez-Lozano Sinues, Pablo; Landoni, Elena; Miceli, Rosalba; Dibari, Vincenza F; Dugo, Matteo; Agresti, Roberto; Tagliabue, Elda; Cristoni, Simone; Orlandi, Rosaria

    2015-09-21

    Breath analysis represents a new frontier in medical diagnosis and a powerful tool for cancer biomarker discovery due to the recent development of analytical platforms for the detection and identification of human exhaled volatile compounds. Statistical and bioinformatic tools may represent an effective complement to the technical and instrumental enhancements needed to fully exploit clinical applications of breath analysis. Our exploratory study in a cohort of 14 breast cancer patients and 11 healthy volunteers used secondary electrospray ionization-mass spectrometry (SESI-MS) to detect a cancer-related volatile profile. SESI-MS full-scan spectra were acquired in a range of 40-350 mass-to-charge ratio (m/z), converted to matrix data and analyzed using a procedure integrating data pre-processing for quality control, and a two-step class prediction based on machine-learning techniques, including a robust feature selection, and a classifier development with internal validation. MS spectra from exhaled breath showed an individual-specific breath profile and high reciprocal homogeneity among samples, with strong agreement among technical replicates, suggesting a robust responsiveness of SESI-MS. Supervised analysis of breath data identified a support vector machine (SVM) model including 8 features corresponding to m/z 106, 126, 147, 78, 148, 52, 128, 315 and able to discriminate the exhaled breath of breast cancer patients from that of healthy individuals, with sensitivity and specificity above 0.9. Our data highlight the significance of SESI-MS as an analytical technique for clinical studies of breath analysis and provide evidence that our noninvasive strategy detects volatile signatures that may support existing technologies to diagnose breast cancer.

  5. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, S. George

    1994-01-01

    We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology in order to classify the detected objects objectively and uniformly, and to facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of STATPROG, a package for multivariate statistical analysis of small and moderate-size data sets. The package was tested extensively on a number of real scientific applications and has produced real, published results.

  6. GAFFE: a gaze-attentive fixation finding engine.

    PubMed

    Rajashekar, U; van der Linde, I; Bovik, A C; Cormack, L K

    2008-04-01

    The ability to automatically detect visually interesting regions in images has many practical applications, especially in the design of active machine vision and automatic visual surveillance systems. Analysis of the statistics of image features at observers' gaze can provide insights into the mechanisms of fixation selection in humans. Using a foveated analysis framework, we studied the statistics of four low-level local image features: luminance, contrast, and bandpass outputs of both luminance and contrast, and discovered that image patches around human fixations had, on average, higher values of each of these features than image patches selected at random. Contrast-bandpass showed the greatest difference between human and random fixations, followed by luminance-bandpass, RMS contrast, and luminance. Using these measurements, we present a new algorithm that selects image regions as likely candidates for fixation. These regions are shown to correlate well with fixations recorded from human observers.
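
    The foveated-statistics comparison reduces to sampling image patches at fixated and at random locations, computing a feature such as mean luminance or RMS contrast per patch, and comparing the averages. A sketch with a synthetic image and hypothetical gaze coordinates:

      import numpy as np

      def patch_stats(img, points, half=8):
          """Mean luminance and RMS contrast of square patches centred on points."""
          out = []
          for r, c in points:
              p = img[r - half:r + half, c - half:c + half].astype(float)
              m = p.mean()
              out.append((m, p.std() / m if m > 0 else 0.0))  # (luminance, RMS contrast)
          return np.array(out)

      rng = np.random.default_rng(7)
      img = rng.integers(0, 256, size=(256, 256))     # stand-in for a natural image
      fixations = [(100, 120), (60, 200), (180, 90)]  # hypothetical gaze samples
      randoms = [tuple(rng.integers(8, 248, size=2)) for _ in range(3)]
      print("fixated mean/RMS:", patch_stats(img, fixations).mean(axis=0))
      print("random  mean/RMS:", patch_stats(img, randoms).mean(axis=0))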

  7. Geographically Sourcing Cocaine's Origin - Delineation of the Nineteen Major Coca Growing Regions in South America.

    PubMed

    Mallette, Jennifer R; Casale, John F; Jordan, James; Morello, David R; Beyer, Paul M

    2016-03-23

    Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses (²H and ¹⁸O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions.

  9. Pathogenesis-based treatments in primary Sjogren's syndrome using artificial intelligence and advanced machine learning techniques: a systematic literature review.

    PubMed

    Foulquier, Nathan; Redou, Pascal; Le Gal, Christophe; Rouvière, Bénédicte; Pers, Jacques-Olivier; Saraux, Alain

    2018-05-17

    Big data analysis has become a common way to extract information from complex and large datasets in most scientific domains. This approach is now used to study large cohorts of patients in medicine. This work is a review of publications that have used artificial intelligence and advanced machine learning techniques to study pathogenesis-based treatments in primary Sjögren's syndrome (pSS). A systematic literature review retrieved all articles reporting on the use of advanced statistical analysis applied to the study of systemic autoimmune diseases (SADs) over the last decade. An automatic bibliography screening method was developed to perform this task. The program, called BIBOT, was designed to fetch and analyze articles from the PubMed database using a list of keywords and natural language processing approaches. The evolution of trends in statistical approaches, cohort sizes and numbers of publications over this period was also computed in the process. In all, 44077 abstracts were screened and 1017 publications were analyzed. The mean number of selected articles was 101.0 (S.D. 19.16) per year, but it increased significantly over time (from 74 articles in 2008 to 138 in 2017). Among them, only 12 focused on pSS, and none emphasized pathogenesis-based treatments. To conclude, medicine is progressively entering the era of big data analysis and artificial intelligence, but these approaches are not yet used to describe pSS-specific pathogenesis-based treatment. Nevertheless, large multicentre studies are investigating this aspect with advanced algorithmic tools on large cohorts of SAD patients.

  10. On Statistical Analysis of Neuroimages with Imperfect Registration

    PubMed Central

    Kim, Won Hwa; Ravi, Sathya N.; Johnson, Sterling C.; Okonkwo, Ozioma C.; Singh, Vikas

    2016-01-01

    A variety of studies in neuroscience/neuroimaging seek to perform statistical inference on acquired brain image scans for diagnosis as well as for understanding the pathological manifestations of disease. To do so, an important first step is to register (or co-register) all of the image data into a common coordinate system. This permits meaningful comparison of the intensities at each voxel across groups (e.g., diseased versus healthy) to evaluate the effects of the disease and/or to use machine learning algorithms in a subsequent step. But errors in the underlying registration make this problematic: they either decrease the statistical power or make the follow-up inference tasks less effective/accurate. In this paper, we derive a novel algorithm which offers immunity to local errors in the underlying deformation field obtained from registration procedures. By deriving a deformation-invariant representation of the image, the downstream analysis can be made more robust, as if one had access to a (hypothetical) far superior registration procedure. Our algorithm is based on recent work on the scattering transform. Using this as a starting point, we show how results from harmonic analysis (especially non-Euclidean wavelets) yield strategies for designing deformation- and additive-noise-invariant representations of large 3D brain image volumes. We present a set of results on synthetic and real brain images where we achieve robust statistical analysis even in the presence of substantial deformation errors; here, standard analysis procedures significantly under-perform and fail to identify the true signal. PMID:27042168

  11. Biomechanical analysis of tension band fixation for olecranon fracture treatment.

    PubMed

    Kozin, S H; Berglund, L J; Cooney, W P; Morrey, B F; An, K N

    1996-01-01

    This study assessed the strength of various tension band fixation methods with wire and cable applied to simulated olecranon fractures, in order to compare the stability and potential failure or complications of the two materials. Transverse olecranon fractures were simulated by osteotomy. Each fracture was anatomically reduced, and various tension band fixation techniques were applied with monofilament wire or multifilament cable. Load-displacement curves were obtained with a material testing machine and statistical significance was determined by analysis of variance. Two loading modes were tested: loading on the posterior surface of the olecranon to simulate triceps pull, and loading on the anterior olecranon tip to recreate a potential compressive load on the fragment during resisted flexion. All fixation methods were more resistant to posterior loading than to an anterior load. Individual comparative analysis for the various loading conditions concluded that tension band fixation is more resilient to the tensile forces exerted by the triceps than to compressive forces on the anterior olecranon tip. Neither wire passage anterior to the K-wires nor the multifilament cable provided a statistically significant increase in stability.

  12. Comparison of wear behaviour and mechanical properties of as-cast Al6082 and Al6082-T6 using statistical analysis

    NASA Astrophysics Data System (ADS)

    Rani Rana, Sandhya; Pattnaik, A. B.; Patnaik, S. C.

    2018-03-01

    In the present work the wear behaviour and mechanical properties of as-cast Al6082 and Al6082-T6 were compared and analyzed using statistical analysis. The as-cast Al6082 alloy was solutionized at 550 °C, quenched and artificially aged at 170 °C for 8 h. Metallographic examination and XRD analysis revealed the presence of the intermetallic compound Al6Mn. The hardness of heat-treated Al6082 was found to be higher than that of the as-cast sample. Wear tests were carried out using a pin-on-disc wear testing machine according to a Taguchi L9 orthogonal array. Experiments were conducted under normal loads of 10-30 N, sliding speeds of 1-3 m/s and sliding distances of 400, 800 and 1200 m. Sliding speed was found to be the dominant factor for wear in both the as-cast and the aged Al6082 alloy. Wear rate increases with sliding distance up to 800 m and decreases thereafter.
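
    A Taguchi L9 analysis of this kind reduces to: run the nine prescribed factor combinations, convert each response to a smaller-the-better signal-to-noise ratio, S/N = -10 log10(y^2) for a single replicate, and compare the spread of level-averaged S/N values to rank factor influence. A sketch with made-up wear responses:

      import numpy as np

      # Standard L9 orthogonal array: 9 runs x 3 factors at 3 levels (0, 1, 2).
      L9 = np.array([[0, 0, 0], [0, 1, 1], [0, 2, 2],
                     [1, 0, 1], [1, 1, 2], [1, 2, 0],
                     [2, 0, 2], [2, 1, 0], [2, 2, 1]])
      factors = ["load", "speed", "distance"]
      wear = np.array([2.1, 3.4, 4.0, 2.8, 4.4, 2.5, 3.9, 2.9, 3.6])  # made-up

      sn = -10 * np.log10(wear ** 2)   # smaller-the-better S/N ratio

      for j, name in enumerate(factors):
          means = [sn[L9[:, j] == lvl].mean() for lvl in range(3)]
          delta = max(means) - min(means)          # larger delta -> more influence
          print(f"{name:9s} S/N by level {np.round(means, 2)}  delta {delta:.2f}")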

  13. The same analysis approach: Practical protection against the pitfalls of novel neuroimaging analysis methods.

    PubMed

    Görgen, Kai; Hebart, Martin N; Allefeld, Carsten; Haynes, John-Dylan

    2017-12-27

    Standard neuroimaging data analysis based on traditional principles of experimental design, modelling, and statistical inference is increasingly complemented by novel analysis methods, driven e.g. by machine learning. While these novel approaches provide new insights into neuroimaging data, they often have unexpected properties, generating a growing literature on possible pitfalls. We propose to meet this challenge by adopting a habit of systematic testing of experimental design, analysis procedures, and statistical inference. Specifically, we suggest applying the analysis method used for the experimental data also to aspects of the experimental design, simulated confounds, simulated null data, and control data. We stress the importance of keeping the analysis method the same in the main and test analyses, because only in this way can possible confounds and unexpected properties be reliably detected and avoided. We describe and discuss this Same Analysis Approach in detail and demonstrate it in two worked examples using multivariate decoding. With these examples, we reveal two sources of error: a mismatch between counterbalancing (crossover designs) and cross-validation, which leads to systematic below-chance accuracies, and linear decoding of a nonlinear effect, a difference in variance. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides.

    PubMed

    Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd

    2013-01-17

    Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, such as Alzheimer's disease, and the number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids which transform the structure when exposed. A few hundred such peptides have been found experimentally, but experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. The 3D profile is a physicochemical method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. We generated a new dataset of hexapeptides using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was used for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were the alternating decision tree and the multilayer perceptron. Both methods obtained an area under the ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency. Our new dataset proved representative enough to allow simple statistical methods to test amyloidogenicity based only on the six-letter sequences. Statistical machine learning methods such as the alternating decision tree and the multilayer perceptron can replace the energy-based classifier, with the advantage of a very significantly reduced computational time and simplicity of analysis. Additionally, a decision tree provides a set of easily interpretable rules.
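
    The machine-learning side of this comparison reduces to encoding each hexapeptide as a 6 x 20 one-hot vector and fitting a tree or perceptron to the binary labels. The sketch below uses a toy labelled set (the two positive sequences are well-known amyloidogenic hexapeptides, but the set is illustrative, not drawn from ZipperDB):

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      AA = "ACDEFGHIKLMNPQRSTVWY"

      def encode(hexapeptide):
          """One-hot encode a 6-residue sequence into a 120-dimensional vector."""
          v = np.zeros(6 * len(AA))
          for i, res in enumerate(hexapeptide):
              v[i * len(AA) + AA.index(res)] = 1.0
          return v

      train = [("VQIVYK", 1), ("NNQQNY", 1), ("SSTSAA", 0), ("GPGGGA", 0)]
      X = np.array([encode(seq) for seq, _ in train])
      y = np.array([lab for _, lab in train])

      clf = DecisionTreeClassifier(random_state=0).fit(X, y)
      print(clf.predict([encode("VQIVYK"), encode("GPGGGA")]))  # expect [1 0]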

  15. Characterization of Machine Variability and Progressive Heat Treatment in Selective Laser Melting of Inconel 718

    NASA Technical Reports Server (NTRS)

    Prater, T.; Tilson, W.; Jones, Z.

    2015-01-01

    The absence of an economy of scale in spaceflight hardware makes additive manufacturing an immensely attractive option for propulsion components. As additive manufacturing techniques are increasingly adopted by government and industry to produce propulsion hardware in human-rated systems, significant development efforts are needed to establish these methods as reliable alternatives to conventional subtractive manufacturing. One of the critical challenges facing powder bed fusion techniques in this application is variability between the machines used to perform builds. Even with robust process controls in place, it is possible for two machines operating at identical parameters with equivalent base materials to produce specimens with slightly different material properties. The machine variability study presented here evaluates 60 specimens of identical geometry built using the same parameters: 30 samples were produced on machine 1 (M1) and the other 30 on machine 2 (M2). Each 30-sample set was further subdivided into three subsets (with 10 specimens in each) to assess the effect of progressive heat treatment on machine variability. The three categories of post-processing were: stress relief; stress relief followed by hot isostatic press (HIP); and stress relief followed by HIP followed by heat treatment per AMS 5664. Each specimen (a round, smooth tensile specimen) was mechanically tested per ASTM E8. Two formal statistical techniques, hypothesis testing for equivalency of means and one-way analysis of variance (ANOVA), were applied to characterize the impact of machine variability and heat treatment on five material properties: tensile stress, yield stress, modulus of elasticity, fracture elongation, and reduction of area. This work represents the type of development effort that is critical as NASA, academia, and the industrial base work collaboratively to establish a path to certification for additively manufactured parts. For future flight programs, NASA and its commercial partners will procure parts from vendors using a diverse range of machines, so it is essential that the AM community develop a sound understanding of the degree to which machine variability impacts material properties.
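
    The two formal tests named, a comparison of means across machines and a one-way ANOVA across subsets, are routine with SciPy; the sketch below runs both on synthetic yield-stress values that merely stand in for the measured data.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(5)
      m1 = rng.normal(1100, 25, 30)   # machine 1 yield stress, MPa (synthetic)
      m2 = rng.normal(1095, 25, 30)   # machine 2 yield stress, MPa (synthetic)

      # Machine-to-machine comparison (Welch's t-test, unequal variances).
      t, p = stats.ttest_ind(m1, m2, equal_var=False)
      print(f"machine effect: t = {t:.2f}, p = {p:.3f}")

      # One-way ANOVA across three heat-treatment subsets of machine 1.
      sr, hip, aged = m1[:10], m1[10:20] + 40, m1[20:] + 80   # shifted subsets
      f, p = stats.f_oneway(sr, hip, aged)
      print(f"heat-treatment effect: F = {f:.2f}, p = {p:.4f}")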

  16. Advanced Telecommunications Technologies in Rural Communities: Factors Affecting Use.

    ERIC Educational Resources Information Center

    Leistritz, F. Larry; Allen, John C.; Johnson, Bruce B.; Olsen, Duane; Sell, Randy

    1997-01-01

    A survey of 2,000 rural residents in 6 states (36% response) found that 56% used answering machines, 48% fax machines, 46% personal computers, 27% cell phones, and 25% modems. Higher use was associated with higher income and education. Distance from the nearest metropolitan statistical area increased use. A large majority believed…

  17. Office Machines Used in Business Today.

    ERIC Educational Resources Information Center

    Cook, Fred S.; Maliche, Eleanor

    Interviews of 239 businesses in the Bay City Standard Metropolitan Statistical Area of Michigan provided information on (1) the type and number of machines used in business, (2) the training demanded by employers for personnel using this office equipment, (3) the extent of on-the-job training given by employers, (4) the implications for vocational…

  18. Specification of a new de-stoner machine: evaluation of machining effects on olive paste's rheology and olive oil yield and quality.

    PubMed

    Romaniello, Roberto; Leone, Alessandro; Tamborrino, Antonia

    2017-01-01

    An industrial prototype of a partial de-stoner machine was specified, built and implemented in an industrial olive oil extraction plant. The partial de-stoner machine was compared to the traditional mechanical crusher to assess its quantitative and qualitative performance. The extraction efficiency of the olive oil extraction plant, olive oil quality, sensory evaluation and rheological aspects were investigated. The results indicate that by using the partial de-stoner machine the extraction plant did not show statistical differences with respect to the traditional mechanical crushing. Moreover, the partial de-stoner machine allowed recovery of 60% of olive pits and the oils obtained were characterised by more marked green fruitiness, flavour and aroma than the oils produced using the traditional processing systems. The partial de-stoner machine removes the limitations of the traditional total de-stoner machine, opening new frontiers for the recovery of pits to be used as biomass. Moreover, the partial de-stoner machine permitted a significant reduction in the viscosity of the olive paste. © 2016 Society of Chemical Industry.

  19. Knowledge-based machine indexing from natural language text: Knowledge base design, development, and maintenance

    NASA Technical Reports Server (NTRS)

    Genuardi, Michael T.

    1993-01-01

    One strategy for machine-aided indexing (MAI) is to provide a concept-level analysis of the textual elements of documents or document abstracts. In such systems, natural-language phrases are analyzed in order to identify and classify concepts related to a particular subject domain. The overall performance of these MAI systems is largely dependent on the quality and comprehensiveness of their knowledge bases. These knowledge bases function to (1) define the relations between a controlled indexing vocabulary and natural language expressions; (2) provide a simple mechanism for disambiguation and the determination of relevancy; and (3) allow the extension of concept-hierarchical structure to all elements of the knowledge file. After a brief description of the NASA Machine-Aided Indexing system, concerns related to the development and maintenance of MAI knowledge bases are discussed. Particular emphasis is given to statistically-based text analysis tools designed to aid the knowledge base developer. One such tool, the Knowledge Base Building (KBB) program, presents the domain expert with a well-filtered list of synonyms and conceptually-related phrases for each thesaurus concept. Another tool, the Knowledge Base Maintenance (KBM) program, functions to identify areas of the knowledge base affected by changes in the conceptual domain (for example, the addition of a new thesaurus term). An alternate use of the KBM as an aid in thesaurus construction is also discussed.

  20. As above, so below? Towards understanding inverse models in BCI

    NASA Astrophysics Data System (ADS)

    Lindgren, Jussi T.

    2018-02-01

    Objective. In brain-computer interfaces (BCI), measurements of the user’s brain activity are classified into commands for the computer. With EEG-based BCIs, the origins of the classified phenomena are often considered to be spatially localized in the cortical volume and mixed in the EEG. We investigate if more accurate BCIs can be obtained by reconstructing the source activities in the volume. Approach. We contrast the physiology-driven source reconstruction with data-driven representations obtained by statistical machine learning. We explain these approaches in a common linear dictionary framework and review the different ways to obtain the dictionary parameters. We consider the effect of source reconstruction on some major difficulties in BCI classification, namely information loss, feature selection and nonstationarity of the EEG. Main results. Our analysis suggests that the approaches differ mainly in their parameter estimation. Physiological source reconstruction may thus be expected to improve BCI accuracy if machine learning is not used or where it produces less optimal parameters. We argue that the considered difficulties of surface EEG classification can remain in the reconstructed volume and that data-driven techniques are still necessary. Finally, we provide some suggestions for comparing approaches. Significance. The present work illustrates the relationships between source reconstruction and machine learning-based approaches for EEG data representation. The provided analysis and discussion should help in understanding, applying, comparing and improving such techniques in the future.

  1. Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

    PubMed

    Hussain, Lal

    2018-06-01

    Epilepsy is a neurological disorder produced by abnormal excitability of neurons in the brain. Brain activity can be monitored through the electroencephalogram (EEG) of patients suffering from seizures in order to detect epileptic seizures, and EEG-based detection requires effective feature extraction strategies. In this research, we extracted features using several strategies based on time- and frequency-domain characteristics, nonlinear measures, wavelet-based entropy and a few statistical features, and undertook a deeper analysis using robust machine learning classifiers with advanced parameter optimization. The support vector machine (SVM) kernels were evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbours (KNN) we varied the distance metric, neighbour weights and number of neighbours. Similarly, for the decision trees we tuned the parameters based on maximum splits and split criteria, and the ensemble classifiers were evaluated based on different ensemble methods and learning rates. For training/testing, tenfold cross-validation was employed and performance was evaluated in the form of TPR, NPR, PPV, accuracy and AUC. The SVM with linear kernel and KNN with city-block distance metric gave the overall highest accuracy of 99.5%, higher than that obtained using the default parameters for these classifiers. Moreover, the highest separation (AUC = 0.9991, 0.9990) was obtained at different kernel scales using SVM. Additionally, KNN with inverse squared distance weight gave higher performance at different numbers of neighbours. Finally, in distinguishing postictal heart rate oscillations from epileptic ictal subjects, the highest performance of 100% was obtained using different machine learning classifiers.
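
    The parameter sweep described, kernels and box constraint for the SVM, distance metric, weights and neighbour count for KNN, scored by tenfold cross-validation, maps directly onto scikit-learn's GridSearchCV. A sketch on synthetic EEG-like feature vectors:

      import numpy as np
      from sklearn.model_selection import GridSearchCV
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC

      rng = np.random.default_rng(11)
      X = rng.normal(size=(200, 12))            # stand-in EEG feature vectors
      y = (X[:, 0] + X[:, 3] > 0).astype(int)   # synthetic seizure/non-seizure labels

      svm = GridSearchCV(SVC(),
                         {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]},
                         cv=10, scoring="accuracy").fit(X, y)
      knn = GridSearchCV(KNeighborsClassifier(),
                         {"metric": ["cityblock", "euclidean"],
                          "n_neighbors": [1, 5, 11],
                          "weights": ["uniform", "distance"]},
                         cv=10, scoring="accuracy").fit(X, y)

      print("best SVM:", svm.best_params_, round(svm.best_score_, 3))
      print("best KNN:", knn.best_params_, round(knn.best_score_, 3))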

  2. Controlling corrosion rate of Magnesium alloy using powder mixed electrical discharge machining

    NASA Astrophysics Data System (ADS)

    Razak, M. A.; Rani, A. M. A.; Saad, N. M.; Littlefair, G.; Aliyu, A. A.

    2018-04-01

    Biomedical implants can be divided into permanent and temporary types. The duration for which a temporary implant is applied differs between children and adults because of their different bone healing rates. Magnesium and its alloys are suitable for biodegradable implant applications. Nevertheless, it is difficult to control the degradation rate of a magnesium alloy to suit both children and adults. Powder mixed electrical discharge machining (PM-EDM), a modified EDM process, has a high capability to improve EDM process efficiency and machined surface quality. The objective of this paper is to establish a formula for controlling the degradation rate of magnesium alloy using the PM-EDM method. The hypothesis is that different corrosion rates of the machined surface can be obtained with different combinations of PM-EDM operation inputs. PM-EDM experiments were conducted using an open-loop PM-EDM system and in-vitro corrosion tests were carried out on the machined surface of each specimen. Four operation inputs were investigated in this study: zinc powder concentration, peak current, pulse on-time and pulse off-time. The results indicate that zinc powder concentration significantly affects the response, with 2 g/L of zinc powder concentration yielding the lowest corrosion rate. The high localized temperature at the cutting zone during spark erosion causes some of the zinc particles to be deposited on the machined surface, improving its surface characteristics. The suspended zinc particles in the dielectric fluid also improve the sparking efficiency and the uniformity of spark distribution. From the statistical analysis, a formula was developed to control the corrosion rate of magnesium alloy within the range from 0.000183 mm/year to 0.001528 mm/year.
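
    A control formula of the kind described is typically a least-squares fit of corrosion rate against the four operation inputs, which can then be inverted to choose machine settings for a target rate. The sketch below fits such a surrogate on fabricated illustrative settings (not the paper's measurements):

      import numpy as np

      # Columns: zinc concentration (g/L), peak current (A),
      # pulse on-time (us), pulse off-time (us). Values are illustrative only.
      X = np.array([[1, 5, 10, 2], [2, 5, 20, 4], [3, 10, 10, 4],
                    [1, 10, 20, 2], [2, 15, 10, 2], [3, 15, 20, 4]], dtype=float)
      rate = np.array([0.0012, 0.0004, 0.0009,
                       0.0014, 0.0007, 0.0011])   # corrosion rate, mm/year

      # Fit rate ~ b0 + b1*conc + b2*current + b3*t_on + b4*t_off.
      A = np.column_stack([np.ones(len(X)), X])
      coef, *_ = np.linalg.lstsq(A, rate, rcond=None)
      print("intercept and coefficients:", np.round(coef, 6))

      setting = np.array([1.0, 2.0, 10.0, 15.0, 3.0])   # leading 1 = intercept
      print("predicted corrosion rate:", float(setting @ coef))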

  3. Analysis of motion during the breast clamping phase of mammography

    PubMed Central

    McEntee, Mark F; Mercer, Claire; Kelly, Judith; Millington, Sara; Hogg, Peter

    2016-01-01

    Objective: To measure paddle motion during the clamping phase of a breast phantom for a range of machine/paddle combinations. Methods: A deformable breast phantom was used to simulate a female breast. 12 mammography machines from three manufacturers with 22 flexible and 20 fixed paddles were evaluated. Vertical motion at the paddle was measured using two calibrated linear potentiometers. For each paddle, the motion in millimetres was recorded every 0.5 s for 40 s, while the phantom was compressed with 80 N. Independent t-tests were used to determine differences in paddle motion between flexible and fixed, small and large, GE Senographe Essential (General Electric Medical Systems, Milwaukee, WI) and Hologic Selenia Dimensions paddles (Hologic, Bedford, MA). Paddle tilt in the medial–lateral plane for each machine/paddle combination was calculated. Results: All machine/paddle combinations demonstrated the highest levels of motion during the first 10 s of the clamping phase. The least motion was 0.17 ± 0.05 mm/10 s (n = 20) and the most motion was 0.51 ± 0.15 mm/10 s (n = 80). There was a statistically significant difference in paddle motion between fixed and flexible paddles (p < 0.001) and between GE Senographe Essential and Hologic Selenia Dimensions paddles (p < 0.001). Paddle tilt in the medial–lateral plane was independent of time and varied from 0.04 ° to 0.69 °. Conclusion: All machine/paddle combinations exhibited motion and tilting, and the extent varied with machine and paddle sizes and types. Advances in knowledge: This research suggests that image blurring will likely be clinically insignificant 4 s or more after the clamping phase commences. PMID:26739577
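
    The paddle-motion comparison rests on independent t-tests; a minimal sketch with SciPy follows, using made-up motion values in place of the potentiometer measurements.

      # Sketch of the independent t-test used to compare paddle motion between
      # fixed and flexible paddles; the motion values are invented placeholders.
      import numpy as np
      from scipy import stats

      fixed_motion = np.array([0.18, 0.20, 0.15, 0.22, 0.17])      # mm per 10 s
      flexible_motion = np.array([0.45, 0.52, 0.48, 0.55, 0.50])   # mm per 10 s

      t_stat, p_value = stats.ttest_ind(fixed_motion, flexible_motion)
      print(f"t = {t_stat:.2f}, p = {p_value:.4f}")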

  4. Can machine learning complement traditional medical device surveillance? A case study of dual-chamber implantable cardioverter–defibrillators

    PubMed Central

    Ross, Joseph S; Bates, Jonathan; Parzynski, Craig S; Akar, Joseph G; Curtis, Jeptha P; Desai, Nihar R; Freeman, James V; Gamble, Ginger M; Kuntz, Richard; Li, Shu-Xia; Marinac-Dabic, Danica; Masoudi, Frederick A; Normand, Sharon-Lise T; Ranasinghe, Isuru; Shaw, Richard E; Krumholz, Harlan M

    2017-01-01

    Background Machine learning methods may complement traditional analytic methods for medical device surveillance. Methods and results Using data from the National Cardiovascular Data Registry for implantable cardioverter–defibrillators (ICDs) linked to Medicare administrative claims for longitudinal follow-up, we applied three statistical approaches to safety-signal detection for commonly used dual-chamber ICDs, based on two propensity score (PS) models: one specified by subject-matter experts (PS-SME) and the other by machine-learning-based selection (PS-ML). The first approach used PS-SME and cumulative incidence (time-to-event), the second approach used PS-SME and cumulative risk (Data Extraction and Longitudinal Trend Analysis [DELTA]), and the third approach used PS-ML and cumulative risk (embedded feature selection). Safety-signal surveillance was conducted for eleven dual-chamber ICD models implanted at least 2,000 times over 3 years. Between 2006 and 2010, there were 71,948 Medicare fee-for-service beneficiaries who received dual-chamber ICDs. Cumulative device-specific unadjusted 3-year event rates varied for three surveyed safety signals: death from any cause, 12.8%–20.9%; nonfatal ICD-related adverse events, 19.3%–26.3%; and death from any cause or nonfatal ICD-related adverse event, 27.1%–37.6%. Agreement among safety signals detected/not detected between the time-to-event and DELTA approaches was 90.9% (360 of 396, k=0.068), between the time-to-event and embedded feature-selection approaches was 91.7% (363 of 396, k=−0.028), and between the DELTA and embedded feature-selection approaches was 88.1% (349 of 396, k=−0.042). Conclusion Three statistical approaches, including one machine learning method, identified important safety signals, but without exact agreement. Ensemble methods may be needed to detect all safety signals for further evaluation during medical device surveillance. PMID:28860874
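
    The agreement figures above pair percent agreement with Cohen's kappa over the grid of device/signal combinations; a sketch of that computation with scikit-learn, using random detection flags as placeholders for the real surveillance outputs.

      # Sketch of the agreement computation between two signal-detection
      # approaches: percent agreement and Cohen's kappa over 396 device/signal
      # combinations. The detection flags below are random placeholders.
      import numpy as np
      from sklearn.metrics import cohen_kappa_score

      rng = np.random.default_rng(0)
      time_to_event = rng.integers(0, 2, size=396)   # 1 = safety signal detected
      delta = rng.integers(0, 2, size=396)

      agreement = np.mean(time_to_event == delta)
      kappa = cohen_kappa_score(time_to_event, delta)
      print(f"agreement = {agreement:.1%}, kappa = {kappa:.3f}")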

  5. Spectral methods in machine learning and new strategies for very large datasets

    PubMed Central

    Belabbas, Mohamed-Ali; Wolfe, Patrick J.

    2009-01-01

    Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decomposition is too costly, because its complexity scales as the cube of either the number of training examples or their dimensionality. Motivated by such applications, we present here 2 new algorithms for the approximation of positive-semidefinite kernels, together with error bounds that improve on results in the literature. We approach this problem by seeking to determine, in an efficient manner, the most informative subset of our data relative to the kernel approximation task at hand. This leads to two new strategies based on the Nyström method that are directly applicable to massive datasets. The first of these—based on sampling—leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach—based on sorting—provides for the selection of a partition in a deterministic way. We detail their numerical implementation and provide simulation results for a variety of representative problems in statistical data analysis, each of which demonstrates the improved performance of our approach relative to existing methods. PMID:19129490
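
    A minimal sketch of the basic Nyström approximation that the paper's sampling- and sorting-based strategies build on; this is the generic method with uniformly sampled landmarks, not the paper's specific algorithms.

      # Nyström sketch: approximate an RBF kernel matrix K using m landmark
      # points, avoiding the cubic cost of a full eigendecomposition.
      import numpy as np
      from sklearn.metrics.pairwise import rbf_kernel

      rng = np.random.default_rng(0)
      X = rng.standard_normal((1000, 10))
      m = 100
      idx = rng.choice(len(X), size=m, replace=False)

      K_mm = rbf_kernel(X[idx], X[idx])      # m x m block on the landmarks
      K_nm = rbf_kernel(X, X[idx])           # n x m cross block
      K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

      K_exact = rbf_kernel(X, X)             # feasible here only because n is small
      err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
      print(f"relative Frobenius error: {err:.4f}")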

  6. Comparison Analysis of Recognition Algorithms of Forest-Cover Objects on Hyperspectral Air-Borne and Space-Borne Images

    NASA Astrophysics Data System (ADS)

    Kozoderov, V. V.; Kondranin, T. V.; Dmitriev, E. V.

    2017-12-01

    The basic model for the recognition of natural and anthropogenic objects using their spectral and textural features is described in the problem of hyperspectral air-borne and space-borne imagery processing. The model is based on improvements of the Bayesian classifier, a computational procedure for statistical decision making in machine-learning methods of pattern recognition. The principal component method is implemented to decompose the hyperspectral measurements on the basis of empirical orthogonal functions. Examples are shown of the application of various modifications of the Bayesian classifier and the Support Vector Machine method. Examples are provided of comparing these classifiers with a metrical classifier that operates by finding the minimal Euclidean distance between different points and sets in the multidimensional feature space. A comparison is also carried out with the "K-weighted neighbors" method, which is close to the nonparametric Bayesian classifier.

  7. Machine learning, medical diagnosis, and biomedical engineering research - commentary.

    PubMed

    Foster, Kenneth R; Koprowski, Robert; Skufca, Joseph D

    2014-07-05

    A large number of papers are appearing in the biomedical engineering literature that describe the use of machine learning techniques to develop classifiers for detection or diagnosis of disease. However, the usefulness of this approach in developing clinically validated diagnostic techniques so far has been limited and the methods are prone to overfitting and other problems which may not be immediately apparent to the investigators. This commentary is intended to help sensitize investigators as well as readers and reviewers of papers to some potential pitfalls in the development of classifiers, and suggests steps that researchers can take to help avoid these problems. Building classifiers should be viewed not simply as an add-on statistical analysis, but as part and parcel of the experimental process. Validation of classifiers for diagnostic applications should be considered as part of a much larger process of establishing the clinical validity of the diagnostic technique.

  8. Sensor fusion III: 3-D perception and recognition; Proceedings of the Meeting, Boston, MA, Nov. 5-8, 1990

    NASA Technical Reports Server (NTRS)

    Schenker, Paul S. (Editor)

    1991-01-01

    The volume on data fusion from multiple sources discusses fusing multiple views, temporal analysis and 3D motion interpretation, sensor fusion and eye-to-hand coordination, and integration in human shape perception. Attention is given to surface reconstruction, statistical methods in sensor fusion, fusing sensor data with environmental knowledge, computational models for sensor fusion, and evaluation and selection of sensor fusion techniques. Topics addressed include the structure of a scene from two and three projections, optical flow techniques for moving target detection, tactical sensor-based exploration in a robotic environment, and the fusion of human and machine skills for remote robotic operations. Also discussed are K-nearest-neighbor concepts for sensor fusion, surface reconstruction with discontinuities, a sensor-knowledge-command fusion paradigm for man-machine systems, coordinating sensing and local navigation, and terrain map matching using multisensing techniques for applications to autonomous vehicle navigation.

  9. Workshop on Algorithms for Time-Series Analysis

    NASA Astrophysics Data System (ADS)

    Protopapas, Pavlos

    2012-04-01

    This Workshop covered the four major subjects listed below in two 90-minute sessions. Each talk or tutorial allowed questions, and concluded with a discussion. Classification: Automatic classification using machine-learning methods is becoming a standard in surveys that generate large datasets. Ashish Mahabal (Caltech) reviewed various methods, and presented examples of several applications. Time-Series Modelling: Suzanne Aigrain (Oxford University) discussed autoregressive models and multivariate approaches such as Gaussian Processes. Meta-classification/mixture of expert models: Karim Pichara (Pontificia Universidad Católica, Chile) described the substantial promise which machine-learning classification methods are now showing in automatic classification, and discussed how the various methods can be combined together. Event Detection: Pavlos Protopapas (Harvard) addressed methods of fast identification of events with low signal-to-noise ratios, enlarging on the characterization and statistical issues of low signal-to-noise ratios and rare events.

  10. Texture classification of lung computed tomography images

    NASA Astrophysics Data System (ADS)

    Pheng, Hang See; Shamsuddin, Siti M.

    2013-03-01

    Current development of algorithms in computer-aided diagnosis (CAD) schemes is growing rapidly to assist radiologists in medical image interpretation. Texture analysis of computed tomography (CT) scans is one of the important preliminary stages in computerized detection and classification systems for lung cancer. Among the different types of image feature analysis, Haralick texture features with a variety of statistical measures have been used widely in image texture description. The extraction of texture feature values is essential input to a CAD system, especially in the classification of normal and abnormal tissue on cross-sectional CT images. This paper aims to compare experimental results using texture extraction and different machine learning methods in the classification of normal and abnormal tissues in lung CT images. The machine learning methods involved in this assessment are Artificial Immune Recognition System (AIRS), Naive Bayes, Decision Tree (J48) and Backpropagation Neural Network. AIRS was found to provide high accuracy (99.2%) and sensitivity (98.0%) in the assessment. For experiment and testing purposes, publicly available datasets in the Reference Image Database to Evaluate Therapy Response (RIDER) are used as study cases.
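
    A sketch of Haralick-style texture extraction from a gray-level co-occurrence matrix, assuming a recent scikit-image (where the functions are named graycomatrix/graycoprops); the patch is random noise standing in for a lung CT region.

      # Gray-level co-occurrence (Haralick-style) texture features from a patch.
      import numpy as np
      from skimage.feature import graycomatrix, graycoprops

      rng = np.random.default_rng(0)
      patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in ROI

      glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                          levels=256, symmetric=True, normed=True)
      features = {prop: graycoprops(glcm, prop).mean()
                  for prop in ("contrast", "homogeneity", "energy", "correlation")}
      print(features)  # these values would feed a classifier such as Naive Bayes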

  11. Intellicount: High-Throughput Quantification of Fluorescent Synaptic Protein Puncta by Machine Learning

    PubMed Central

    Fantuzzo, J. A.; Mirabella, V. R.; Zahn, J. D.

    2017-01-01

    Abstract Synapse formation analyses can be performed by imaging and quantifying fluorescent signals of synaptic markers. Traditionally, these analyses are done using simple or multiple thresholding and segmentation approaches or by labor-intensive manual analysis by a human observer. Here, we describe Intellicount, a high-throughput, fully-automated synapse quantification program which applies a novel machine learning (ML)-based image processing algorithm to systematically improve region of interest (ROI) identification over simple thresholding techniques. Through processing large datasets from both human and mouse neurons, we demonstrate that this approach allows image processing to proceed independently of carefully set thresholds, thus reducing the need for human intervention. As a result, this method can efficiently and accurately process large image datasets with minimal interaction by the experimenter, making it less prone to bias and less liable to human error. Furthermore, Intellicount is integrated into an intuitive graphical user interface (GUI) that provides a set of valuable features, including automated and multifunctional figure generation, routine statistical analyses, and the ability to run full datasets through nested folders, greatly expediting the data analysis process. PMID:29218324

  12. Machine learning approaches to analysing textual injury surveillance data: a systematic review.

    PubMed

    Vallmuur, Kirsten

    2015-06-01

    To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Systematic review. The electronic databases searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods, used to either predict injury categories or extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models, and integration of content and technical knowledge were discussed. The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see continued growth and advancement in knowledge of text mining in the injury field. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Machine Learning in Medicine.

    PubMed

    Deo, Rahul C

    2015-11-17

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. © 2015 American Heart Association, Inc.

  14. Application of machine learning and expert systems to Statistical Process Control (SPC) chart interpretation

    NASA Technical Reports Server (NTRS)

    Shewhart, Mark

    1991-01-01

    Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
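
    A sketch of the core control-chart logic that such a system automates: estimate 3-sigma limits from in-control history and flag points outside them. The measurements are simulated, and a real interpreter would also apply run and trend rules.

      # Compute control limits from an in-control baseline and flag
      # out-of-control points on new data.
      import numpy as np

      rng = np.random.default_rng(0)
      baseline = rng.normal(loc=10.0, scale=0.5, size=100)   # in-control history
      center = baseline.mean()
      ucl = center + 3 * baseline.std(ddof=1)                # upper control limit
      lcl = center - 3 * baseline.std(ddof=1)                # lower control limit

      new_points = np.array([10.1, 9.8, 11.9, 10.3, 8.0])
      for i, x in enumerate(new_points):
          if x > ucl or x < lcl:
              print(f"point {i}: {x:.2f} out of control (limits {lcl:.2f}-{ucl:.2f})")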

  15. Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach.

    PubMed

    Borghetti, Brett J; Giametta, Joseph J; Rusnock, Christina F

    2017-02-01

    We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.

  16. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  17. Analysis in Motion Initiative – Human Machine Intelligence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blaha, Leslie

    As computers and machines become more pervasive in our everyday lives, we are looking for ways for humans and machines to work more intelligently together. How can we help machines understand their users so the team can do smarter things together? The Analysis in Motion Initiative is advancing the science of human machine intelligence — creating human-machine teams that work better together to make correct, useful, and timely interpretations of data.

  18. 3D Visualization of Machine Learning Algorithms with Astronomical Data

    NASA Astrophysics Data System (ADS)

    Kent, Brian R.

    2016-01-01

    We present innovative machine learning (ML) methods using unsupervised clustering with minimum spanning trees (MSTs) to study 3D astronomical catalogs. Utilizing Python code to build trees based on galaxy catalogs, we can render the results with the visualization suite Blender to produce interactive 360 degree panoramic videos. The catalogs and their ML results can be explored in a 3D space using mobile devices, tablets or desktop browsers. We compare the statistics of the MST results to a number of machine learning methods relating to optimization and efficiency.
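
    A minimal sketch of building a minimum spanning tree over a small 3D catalog with SciPy; the positions are random stand-ins for a galaxy catalog, and the Blender rendering step is omitted.

      # Build an MST over 3D points and summarize its edge lengths.
      import numpy as np
      from scipy.sparse.csgraph import minimum_spanning_tree
      from scipy.spatial.distance import pdist, squareform

      rng = np.random.default_rng(0)
      positions = rng.uniform(0, 100, size=(50, 3))   # 50 points in 3D

      dist = squareform(pdist(positions))             # dense pairwise distances
      mst = minimum_spanning_tree(dist)               # sparse (n x n) tree

      edge_lengths = mst.data
      print(f"{len(edge_lengths)} edges, mean length {edge_lengths.mean():.2f}")
      # Edge-length statistics (mean, spread) are the kind of quantities an
      # MST-based clustering comparison would examine.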

  19. Machine Learning Prediction of the Energy Gap of Graphene Nanoflakes Using Topological Autocorrelation Vectors.

    PubMed

    Fernandez, Michael; Abreu, Jose I; Shi, Hongqing; Barnard, Amanda S

    2016-11-14

    The possibility of band gap engineering in graphene opens countless new opportunities for application in nanoelectronics. In this work, the energy gaps of 622 computationally optimized graphene nanoflakes were mapped to topological autocorrelation vectors using machine learning techniques. Machine learning modeling revealed that the most relevant correlations appear at topological distances in the range of 1 to 42 with prediction accuracy higher than 80%. The data-driven model can statistically discriminate between graphene nanoflakes with different energy gaps on the basis of their molecular topology.
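
    A sketch of a topological autocorrelation vector: for each topological distance d, sum the products of a per-vertex property over all vertex pairs separated by d. The 5-vertex path graph and property values below are toy inputs, not nanoflake data.

      # Topological (Moreau-Broto-style) autocorrelation over a small graph.
      import numpy as np
      from scipy.sparse.csgraph import shortest_path

      # Adjacency matrix of a 5-vertex path graph (stand-in for a nanoflake)
      A = np.zeros((5, 5))
      for i in range(4):
          A[i, i + 1] = A[i + 1, i] = 1
      prop = np.array([1.0, 0.8, 1.2, 0.9, 1.1])      # per-vertex property

      D = shortest_path(A, unweighted=True)           # topological distances
      max_d = int(D.max())
      autocorr = [sum(prop[i] * prop[j]
                      for i in range(5) for j in range(i + 1, 5)
                      if D[i, j] == d)
                  for d in range(1, max_d + 1)]
      print(autocorr)   # one entry per topological distance: the ML input vector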

  20. What subject matter questions motivate the use of machine learning approaches compared to statistical models for probability prediction?

    PubMed

    Binder, Harald

    2014-07-01

    This is a discussion of the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Laser-treated stainless steel mini-screw implants: 3D surface roughness, bone-implant contact, and fracture resistance analysis

    PubMed Central

    Kang, He-Kyong; Chu, Tien-Min; Dechow, Paul; Stewart, Kelton; Kyung, Hee-Moon

    2016-01-01

    Summary Background/Objectives: This study investigated the biomechanical properties and bone-implant interface response of machined and laser surface-treated stainless steel (SS) mini-screw implants (MSIs). Material and Methods: Forty-eight SS MSIs, 1.3 mm in diameter and 6 mm long, were divided into two groups. The control (machined surface) group received no surface treatment; the laser-treated group received Nd-YAG laser surface treatment. Half of each group was used for examining surface roughness (Sa and Sq), surface texture, and fracture resistance. The remaining MSIs were placed in the maxillae of six skeletally mature male beagle dogs in a randomized split-mouth design. A pair with the same surface treatment was placed on the same side and immediately loaded with 200 g nickel–titanium coil springs for 8 weeks. After euthanasia, the bone-implant contact (BIC) for each MSI was calculated using micro-computed tomography. An analysis of variance model and two-sample t tests were used for statistical analysis with a significance level of P < 0.05. Results: The mean values of Sa and Sq were significantly higher in the laser-treated group than in the machined group (P < 0.05). There were no significant differences in fracture resistance or BIC between the two groups. Limitations: This was an animal study. Conclusions/Implications: Laser treatment increased surface roughness without compromising fracture resistance. Despite increasing surface roughness, laser treatment did not improve BIC. Overall, it appears that medical-grade SS has the potential to be substituted for titanium alloy in MSIs. PMID:25908868

  2. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets.

    PubMed

    Shuryak, Igor

    2017-01-01

    The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected "signal"; (5) using several machine learning methods to test the "signal's" sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation.
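
    A sketch of technique (1) above: append pure-noise predictors and use their random-forest importances as a benchmark for the real predictors. The data are simulated; in practice the predictors would be radiation levels and soil covariates.

      # Benchmark real predictors against synthetic noise variables via
      # random-forest importances.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      n = 200
      radiation = rng.exponential(1.0, n)
      ph = rng.normal(6.5, 0.5, n)
      y = 10 - 2.0 * radiation + rng.normal(0, 1, n)   # simulated abundance

      noise = rng.standard_normal((n, 3))              # synthetic noise variables
      X = np.column_stack([radiation, ph, noise])
      names = ["radiation", "pH", "noise1", "noise2", "noise3"]

      rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
      for name, imp in sorted(zip(names, rf.feature_importances_),
                              key=lambda t: -t[1]):
          print(f"{name}: {imp:.3f}")   # real predictors should outrank the noise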

  3. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets

    PubMed Central

    Shuryak, Igor

    2017-01-01

    The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected “signal”; (5) using several machine learning methods to test the “signal’s” sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). We show that the proposed techniques were advantageous compared with the methodology used in the original publications where the data sets were presented. Specifically, our approach identified a negative effect of radioactive contamination in data set I, and suggested that in data set II stable chromium could have been a stronger limiting factor for bacterial abundance than the radionuclides 137Cs and 99Tc. This new information, which was extracted from these data sets using the proposed techniques, can potentially enhance the design of radioactive waste bioremediation. PMID:28068401

  4. Reliability analysis of component of affination centrifugal 1 machine by using reliability engineering

    NASA Astrophysics Data System (ADS)

    Sembiring, N.; Ginting, E.; Darnello, T.

    2017-12-01

    In a company that produces refined sugar, the production floor has not reached the required availability level for critical machines because they often suffer damage (breakdowns). This results in sudden losses of production time and production opportunities. This problem can be addressed with the reliability engineering method, in which a statistical approach to historical failure data is used to identify the pattern of the failure distribution. The method provides the reliability, failure rate, and availability level of a machine over the scheduled maintenance interval. Distribution tests on the time-between-failures (MTTF) data show that the flexible hose component follows a lognormal distribution, while the teflon cone lifting component follows a Weibull distribution. Distribution tests on the repair-time (MTTR) data show that the flexible hose component follows an exponential distribution, while the teflon cone lifting component follows a Weibull distribution. For the flexible hose component, a replacement schedule of every 720 hours yields a reliability of 0.2451 and an availability of 0.9960, while for the critical teflon cone lifting component a replacement schedule of every 1944 hours yields a reliability of 0.4083 and an availability of 0.9927.
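
    A sketch of the reliability quantities involved, assuming a Weibull time-to-failure model; the shape/scale and repair-time values are illustrative assumptions, not the study's fitted parameters.

      # Weibull reliability and steady-state availability, with assumed values.
      import numpy as np
      from math import gamma

      beta, eta = 1.5, 2500.0           # Weibull shape and scale in hours (assumed)

      def weibull_reliability(t):
          # Probability that the component survives beyond t hours
          return np.exp(-(t / eta) ** beta)

      mttf = eta * gamma(1 + 1 / beta)  # mean time to failure of a Weibull
      mttr = 8.0                        # mean time to repair in hours (assumed)
      availability = mttf / (mttf + mttr)

      print(f"R(1944 h) = {weibull_reliability(1944.0):.4f}")
      print(f"MTTF = {mttf:.0f} h, availability = {availability:.4f}")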

  5. Phenotyping: Using Machine Learning for Improved Pairwise Genotype Classification Based on Root Traits

    PubMed Central

    Zhao, Jiangsan; Bodner, Gernot; Rewald, Boris

    2016-01-01

    Phenotyping local crop cultivars is becoming more and more important, as they are an important genetic source for breeding – especially in regard to inherent root system architectures. Machine learning algorithms are promising tools to assist in the analysis of complex data sets; novel approaches are needed to apply them to root phenotyping data of mature plants. A greenhouse experiment was conducted in large, sand-filled columns to differentiate 16 European Pisum sativum cultivars based on 36 manually derived root traits. By combining random forest and support vector machine models, machine learning algorithms were successfully used for unbiased identification of the most distinguishing root traits and subsequent pairwise cultivar differentiation. Up to 86% of pea cultivar pairs could be distinguished based on the top five important root traits (Timp5); the composition of Timp5 differed widely between cultivar pairs. Selecting the top important root traits (Timp) provided a significantly improved classification compared to using all available traits or randomly selected trait sets. The most frequent Timp of mature pea cultivars was the total surface area of lateral roots originating from tap root segments at 0–5 cm depth. The high classification rate implies that culturing did not lead to a major loss of variability in root system architecture in the studied pea cultivars. Our results illustrate the potential of machine learning approaches for unbiased (root) trait selection and cultivar classification based on rather small, complex phenotypic data sets derived from pot experiments. Powerful statistical approaches are essential to make use of the increasing amount of (root) phenotyping information, integrating the complex trait sets describing crop cultivars. PMID:27999587

  6. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    PubMed Central

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the ML methods are dominated by the statistical ones across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of the statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones, and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods, which can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784

  7. Recent advances in environmental data mining

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhail

    2016-04-01

    Due to the large amount and complexity of data available nowadays in geo- and environmental sciences, we face the need to develop and incorporate more robust and efficient methods for their analysis, modelling and visualization. An important part of these developments deals with the elaboration and application of a contemporary and coherent methodology following the process from data collection to the justification and communication of the results. Recent fundamental progress in machine learning (ML) can considerably contribute to the development of the emerging field of environmental data science. The present research highlights and investigates the different issues that can occur when dealing with environmental data mining using cutting-edge machine learning algorithms. In particular, the main attention is paid to the description of the self-consistent methodology and two efficient algorithms, Random Forest (RF, Breiman, 2001) and Extreme Learning Machines (ELM, Huang et al., 2006), which recently gained great popularity. Despite the fact that they are based on two different concepts, i.e. decision trees vs artificial neural networks, they both produce promising results for complex, high-dimensional and non-linear data modelling. In addition, the study discusses several important issues of data-driven modelling, including feature selection and uncertainties. The approach considered is accompanied by simulated and real data case studies from renewable resources assessment and natural hazards tasks. In conclusion, the current challenges and future developments in statistical environmental data learning are discussed. References: Breiman, L., 2001. Random Forests. Machine Learning 45 (1), 5-32. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1-3), 489-501. Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data. EPFL Press, Lausanne, Switzerland, p. 392. Leuenberger, M., Kanevski, M., 2015. Extreme Learning Machines for spatial environmental data. Computers and Geosciences 85, 64-73.

  8. Evaluation of a New Spraying Machine for Barrier Treatment and Penetration of Bifenthrin on Vegetation Against Mosquitoes

    DTIC Science & Technology

    2015-03-01

    one at the University of Florida Veterinary Entomology Laboratory (UF-VEL). Leaf samples for both laboratories were collected together. All samples...Mulla's formula (Mulla et al. 1971): % reduction = 100 − (C1/T1 × T2/C2) × 100. The C1 variable was the mean number of mosquitoes from the control site...statistical analysis was performed using JMP 11.1 software (SAS Institute Inc., Cary, NC). Treatment mortality was corrected with Abbott's formula

  9. Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes?

    PubMed

    Drier, Yotam; Domany, Eytan

    2011-03-14

    The fact that there is very little if any overlap between the genes of different prognostic signatures for early-discovery breast cancer is well documented. The reasons for this apparent discrepancy have been explained by the limits of simple machine-learning identification and ranking techniques, and the biological relevance and meaning of the prognostic gene lists were questioned. Subsequently, proponents of the prognostic gene lists claimed that different lists do capture similar underlying biological processes and pathways. The present study places under scrutiny the validity of this claim for two important gene lists that are at the focus of current large-scale validation efforts. We performed careful enrichment analysis, controlling the effects of multiple testing in a manner which takes into account the nested dependent structure of gene ontologies. In contradiction to several previous publications, we find that the only biological process or pathway for which statistically significant concordance can be claimed is cell proliferation, a process whose relevance and prognostic value was well known long before gene expression profiling. The claims reported by others, of wider concordance between the biological processes captured by the two prognostic signatures studied, were found either to lack statistical rigor or to be based on addressing some other question.

  10. Integrating Statistical Machine Learning in a Semantic Sensor Web for Proactive Monitoring and Control.

    PubMed

    Adeleke, Jude Adekunle; Moodley, Deshendran; Rens, Gavin; Adewumi, Aderemi Oluyinka

    2017-04-09

    Proactive monitoring and control of our natural and built environments is important in various application scenarios. Semantic Sensor Web technologies have been well researched and used for environmental monitoring applications to expose sensor data for analysis in order to provide responsive actions in situations of interest. While these applications provide quick response to situations, to minimize their unwanted effects, research efforts are still necessary to provide techniques that can anticipate the future to support proactive control, such that unwanted situations can be averted altogether. This study integrates a statistical machine learning based predictive model in a Semantic Sensor Web using stream reasoning. The approach is evaluated in an indoor air quality monitoring case study. A sliding window approach that employs the Multilayer Perceptron model to predict short term PM2.5 pollution situations is integrated into the proactive monitoring and control framework. Results show that the proposed approach can effectively predict short term PM2.5 pollution situations: precision of up to 0.86 and sensitivity of up to 0.85 is achieved over half hour prediction horizons, making it possible for the system to warn occupants or even to autonomously avert the predicted pollution situations within the context of Semantic Sensor Web.
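
    A sketch of the sliding-window idea with scikit-learn's MLPClassifier: the last w sensor readings are used to predict whether the next PM2.5 value exceeds a pollution threshold. The series, window length and threshold are illustrative assumptions, not the study's configuration.

      # Sliding-window multilayer perceptron for short-term pollution prediction.
      import numpy as np
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(0)
      pm25 = 20 + 10 * np.sin(np.arange(600) / 20) + rng.normal(0, 2, 600)

      w, threshold = 6, 25.0
      X = np.array([pm25[i:i + w] for i in range(len(pm25) - w)])
      y = (pm25[w:] > threshold).astype(int)    # 1 = pollution situation ahead

      split = 500
      clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                          random_state=0).fit(X[:split], y[:split])
      print("held-out accuracy:", clf.score(X[split:], y[split:]))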

  11. Integrating Statistical Machine Learning in a Semantic Sensor Web for Proactive Monitoring and Control

    PubMed Central

    Adeleke, Jude Adekunle; Moodley, Deshendran; Rens, Gavin; Adewumi, Aderemi Oluyinka

    2017-01-01

    Proactive monitoring and control of our natural and built environments is important in various application scenarios. Semantic Sensor Web technologies have been well researched and used for environmental monitoring applications to expose sensor data for analysis in order to provide responsive actions in situations of interest. While these applications provide quick response to situations, to minimize their unwanted effects, research efforts are still necessary to provide techniques that can anticipate the future to support proactive control, such that unwanted situations can be averted altogether. This study integrates a statistical machine learning based predictive model in a Semantic Sensor Web using stream reasoning. The approach is evaluated in an indoor air quality monitoring case study. A sliding window approach that employs the Multilayer Perceptron model to predict short term PM2.5 pollution situations is integrated into the proactive monitoring and control framework. Results show that the proposed approach can effectively predict short term PM2.5 pollution situations: precision of up to 0.86 and sensitivity of up to 0.85 is achieved over half hour prediction horizons, making it possible for the system to warn occupants or even to autonomously avert the predicted pollution situations within the context of Semantic Sensor Web. PMID:28397776

  12. A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

    DTIC Science & Technology

    2017-09-01

    This dissertation explores the efficacy of statistical post-processing methods downstream of dynamical model components, using a hierarchical multivariate Bayesian approach. Keywords: Bayesian hierarchical modeling, Markov chain Monte Carlo methods, Metropolis algorithm, machine learning, atmospheric prediction.

  13. Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

    PubMed

    Ramón, M; Martínez-Pastor, F

    2018-04-23

    Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former methods have allowed exploration of subpopulation patterns in many species, whereas the latter offer further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems yield valuable information on sperm samples when clustering analyses are applied, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.

  14. Analysis and design of asymmetrical reluctance machine

    NASA Astrophysics Data System (ADS)

    Harianto, Cahya A.

    Over the past few decades the induction machine has been chosen for many applications due to its structural simplicity and low manufacturing cost. However, modest torque density and control challenges have motivated researchers to find alternative machines. The permanent magnet synchronous machine has been viewed as one of the alternatives because it features higher torque density for a given loss than the induction machine. However, the assembly and permanent magnet material costs, along with safety under fault conditions, have been concerns for this class of machine. An alternative machine type, namely the asymmetrical reluctance machine, is proposed in this work. Since the proposed machine is of the reluctance machine type, it possesses desirable features, such as the near absence of rotor losses, low assembly cost, low no-load rotational losses, modest torque ripple, and rather benign fault conditions. Through theoretical analysis performed herein, it is shown that this machine has a higher torque density for a given loss than typical reluctance machines, although not as high as permanent magnet machines. Thus, the asymmetrical reluctance machine is a viable and advantageous alternative where the use of permanent magnet machines is undesirable.

  15. On-line Machine Learning and Event Detection in Petascale Data Streams

    NASA Astrophysics Data System (ADS)

    Thompson, David R.; Wagstaff, K. L.

    2012-01-01

    Traditional statistical data mining involves off-line analysis in which all data are available and equally accessible. However, petascale datasets have challenged this premise since it is often impossible to store, let alone analyze, the relevant observations. This has led the machine learning community to investigate adaptive processing chains where data mining is a continuous process. Here pattern recognition permits triage and followup decisions at multiple stages of a processing pipeline. Such techniques can also benefit new astronomical instruments such as the Large Synoptic Survey Telescope (LSST) and Square Kilometre Array (SKA) that will generate petascale data volumes. We summarize some machine learning perspectives on real time data mining, with representative cases of astronomical applications and event detection in high volume datastreams. The first is a "supervised classification" approach currently used for transient event detection at the Very Long Baseline Array (VLBA). It injects known signals of interest - faint single-pulse anomalies - and tunes system parameters to recover these events. This permits meaningful event detection for diverse instrument configurations and observing conditions whose noise cannot be well-characterized in advance. Second, "semi-supervised novelty detection" finds novel events based on statistical deviations from previous patterns. It detects outlier signals of interest while considering known examples of false alarm interference. Applied to data from the Parkes pulsar survey, the approach identifies anomalous "peryton" phenomena that do not match previous event models. Finally, we consider online light curve classification that can trigger adaptive followup measurements of candidate events. Classifier performance analyses suggest optimal survey strategies, and permit principled followup decisions from incomplete data. These examples trace a broad range of algorithm possibilities available for online astronomical data mining. This talk describes research performed at the Jet Propulsion Laboratory, California Institute of Technology. Copyright 2012, All Rights Reserved. U.S. Government support acknowledged.

  16. Comparison of four statistical and machine learning methods for crash severity prediction.

    PubMed

    Iranitalab, Amirfarrokh; Khattak, Aemal

    2017-11-01

    Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods, including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; development of a crash-costs-based approach for comparing crash severity prediction methods; and investigation of the effects of data clustering methods, comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset, and the correct prediction rates for each crash severity level, the overall correct prediction rate, and a proposed crash-costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed that NNC had the best prediction performance overall and in more severe crashes. RF and SVM had the next best performance, and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. The overall correct prediction rate gave almost the exact opposite results compared to the proposed approach, showing that neglecting crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
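
    One way to realize a crash-costs-based accuracy measure is to weight correct predictions by the economic cost of the true severity class, so that errors on severe crashes count more; a sketch under assumed cost figures (the paper's actual measure may differ in detail).

      # Cost-weighted accuracy versus plain accuracy, with placeholder costs.
      import numpy as np

      costs = {0: 1_000, 1: 100_000, 2: 1_000_000}  # PDO, injury, fatal (assumed)

      y_true = np.array([0, 0, 1, 2, 1, 0, 2, 1])
      y_pred = np.array([0, 1, 1, 2, 0, 0, 1, 1])

      w = np.array([costs[c] for c in y_true])
      cost_weighted_acc = np.sum(w * (y_true == y_pred)) / np.sum(w)
      print(f"plain accuracy: {np.mean(y_true == y_pred):.2f}")
      print(f"cost-weighted accuracy: {cost_weighted_acc:.2f}")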

  17. Detecting Abnormal Word Utterances in Children With Autism Spectrum Disorders: Machine-Learning-Based Voice Analysis Versus Speech Therapists.

    PubMed

    Nakai, Yasushi; Takiguchi, Tetsuya; Matsui, Gakuyo; Yamaoka, Noriko; Takada, Satoshi

    2017-10-01

    Abnormal prosody is often evident in the voice intonations of individuals with autism spectrum disorders. We compared a machine-learning-based voice analysis with human hearing judgments made by 10 speech therapists for classifying children with autism spectrum disorders (n = 30) and typical development (n = 51). Using stimuli limited to single-word utterances, machine-learning-based voice analysis was superior to speech therapist judgments. There was a significantly higher true-positive than false-negative rate for machine-learning-based voice analysis but not for speech therapists. Results are discussed in terms of some artificiality of clinician judgments based on single-word utterances, and the objectivity machine-learning-based voice analysis adds to judging abnormal prosody.

  18. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling

    PubMed Central

    Cuperlovic-Culf, Miroslava

    2018-01-01

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies. PMID:29324649

  20. A glossary for big data in population and public health: discussion and commentary on terminology and research methods.

    PubMed

    Fuller, Daniel; Buote, Richard; Stanley, Kevin

    2017-11-01

    The volume and velocity of data are growing rapidly and big data analytics are being applied to these data in many fields. Population and public health researchers may be unfamiliar with the terminology and statistical methods used in big data. This creates a barrier to the application of big data analytics. The purpose of this glossary is to define terms used in big data and big data analytics and to contextualise these terms. We define the five Vs of big data and provide definitions and distinctions for data mining, machine learning and deep learning, among other terms. We provide key distinctions between big data and statistical analysis methods applied to big data. We contextualise the glossary by providing examples where big data analysis methods have been applied to population and public health research problems and provide brief guidance on how to learn big data analysis methods. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  1. A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.

    PubMed

    Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo

    2018-03-15

    RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible to the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes executions of RAId easy but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays the analysis results and allows users to download them. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html.

  2. Continuous EEG signal analysis for asynchronous BCI application.

    PubMed

    Hsu, Wei-Yen

    2011-08-01

    In this study, we propose a two-stage recognition system for continuous analysis of electroencephalogram (EEG) signals. An independent component analysis (ICA) and correlation coefficient are used to automatically eliminate the electrooculography (EOG) artifacts. Based on the continuous wavelet transform (CWT) and Student's two-sample t-statistics, active segment selection then detects the location of the active segment in the time-frequency domain. Next, multiresolution fractal feature vectors (MFFVs) are extracted with the proposed modified fractal dimension from wavelet data. Finally, the support vector machine (SVM) is adopted for the robust classification of MFFVs. The EEG signals are continuously analyzed in 1-s segments, with the window moving forward every 0.5 s to simulate asynchronous BCI operation in the two-stage recognition architecture. Each segment is first recognized as a lifting event or not in stage one, and segments recognized as lifting are then classified as left or right finger lifting in stage two. Several statistical analyses are used to evaluate the performance of the proposed system. The results indicate that it is a promising system for asynchronous BCI applications.
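
    A minimal sketch of the two-stage sliding-window scheme under stated assumptions (the sampling rate, feature extractor, and the two SVMs below are invented stand-ins, not the authors' implementation):

```python
# Invented stand-ins throughout: FS, extract_mffv and the two SVMs are
# illustrative placeholders for the paper's pipeline.
import numpy as np
from sklearn.svm import SVC

FS = 256                        # assumed sampling rate (Hz)
WIN, STEP = FS, FS // 2         # 1-s window, 0.5-s step

def extract_mffv(segment):
    # Placeholder for the multiresolution fractal feature vector.
    return np.array([segment.mean(), segment.std(),
                     np.abs(np.diff(segment)).mean()])

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))                    # toy training features
stage1 = SVC().fit(X_train, rng.integers(0, 2, 40))   # lifted vs. not lifted
stage2 = SVC().fit(X_train, rng.integers(0, 2, 40))   # left vs. right

def classify_stream(eeg):
    """Slide a 1-s window forward every 0.5 s; stage 2 runs only when
    stage 1 flags the segment as a lifting event."""
    out = []
    for start in range(0, len(eeg) - WIN + 1, STEP):
        feats = extract_mffv(eeg[start:start + WIN]).reshape(1, -1)
        if stage1.predict(feats)[0] == 1:
            out.append(("lift", int(stage2.predict(feats)[0])))
        else:
            out.append(("idle", None))
    return out

print(classify_stream(rng.normal(size=4 * FS))[:4])
```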

  3. A Developmental Approach to Machine Learning?

    PubMed Central

    Smith, Linda B.; Slone, Lauren K.

    2017-01-01

    Visual learning depends on both the algorithms and the training material. This essay considers the natural statistics of infant- and toddler-egocentric vision. These natural training sets for human visual object recognition are very different from the training data fed into machine vision systems. Rather than equal experiences with all kinds of things, toddlers experience extremely skewed distributions with many repeated occurrences of a very few things. And though highly variable when considered as a whole, individual views of things are experienced in a specific order – with slow, smooth visual changes moment-to-moment, and developmentally ordered transitions in scene content. We propose that the skewed, ordered, biased visual experiences of infants and toddlers are the training data that allow human learners to develop a way to recognize everything, both the pervasively present entities and the rarely encountered ones. The joint consideration of real-world statistics for learning by researchers of human and machine learning seems likely to bring advances in both disciplines. PMID:29259573

  4. Feature recognition and detection for ancient architecture based on machine vision

    NASA Astrophysics Data System (ADS)

    Zou, Zheng; Wang, Niannian; Zhao, Peng; Zhao, Xuefeng

    2018-03-01

    Ancient architecture has very high historical and artistic value. Ancient buildings contain a wide variety of textures and decorative paintings that carry a great deal of historical meaning, so the study and cataloguing of these compositional and decorative features play an important role in subsequent research. Until recently, however, the cataloguing of those components was done mainly by hand, which consumes a great deal of labor and time and is inefficient. At present, with the strong support of big data and GPU-accelerated training, machine vision with deep learning at its core has developed rapidly and is widely used in many fields. This paper proposes an approach to recognizing and detecting the textures, decorations and other features of ancient buildings based on machine vision. First, a large number of surface-texture images of ancient building components is manually classified to form a sample set. Then, a convolutional neural network is trained on the samples to obtain a classification detector. Finally, its precision is verified.
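
    As a hedged illustration of the train-a-CNN-classifier step (the architecture, input size, and class count below are assumptions for the sketch, not the paper's network):

```python
# A minimal CNN texture classifier in PyTorch; synthetic tensors stand in
# for the manually classified sample set of component images.
import torch
import torch.nn as nn

class TextureNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, n_classes)  # for 64x64 inputs

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = TextureNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 64, 64)        # stand-in for texture images
labels = torch.randint(0, 10, (8,))       # stand-in for manual labels
for _ in range(3):                        # a few illustrative steps
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```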

  5. Geographically Sourcing Cocaine’s Origin – Delineation of the Nineteen Major Coca Growing Regions in South America

    PubMed Central

    Mallette, Jennifer R.; Casale, John F.; Jordan, James; Morello, David R.; Beyer, Paul M.

    2016-01-01

    Previously, geo-sourcing to five major coca growing regions within South America was accomplished. However, the expansion of coca cultivation throughout South America made sub-regional origin determinations increasingly difficult. The former methodology was recently enhanced with additional stable isotope analyses (2H and 18O) to fully characterize cocaine due to the varying environmental conditions in which the coca was grown. An improved data analysis method was implemented with the combination of machine learning and multivariate statistical analysis methods to provide further partitioning between growing regions. Here, we show how the combination of trace cocaine alkaloids, stable isotopes, and multivariate statistical analyses can be used to classify illicit cocaine as originating from one of 19 growing regions within South America. The data obtained through this approach can be used to describe current coca cultivation and production trends, highlight trafficking routes, as well as identify new coca growing regions. PMID:27006288

  6. Machine learning: Trends, perspectives, and prospects.

    PubMed

    Jordan, M I; Mitchell, T M

    2015-07-17

    Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.

  7. A New Mathematical Framework for Design Under Uncertainty

    DTIC Science & Technology

    2016-05-05

    blending multiple information sources via auto-regressive stochastic modeling. A computationally efficient machine learning framework is developed based on...sion and machine learning approaches; see Fig. 1. This will lead to a comprehensive description of system performance with less uncertainty than in the...Bayesian optimization of super-cavitating hydrofoils: the goal of this study is to demonstrate the capabilities of statistical learning and

  8. Machine Learning Predictions of a Multiresolution Climate Model Ensemble

    NASA Astrophysics Data System (ADS)

    Anderson, Gemma J.; Lucas, Donald D.

    2018-05-01

    Statistical models of high-resolution climate models are useful for many purposes, including sensitivity and uncertainty analyses, but building them can be computationally prohibitive. We generated a unique multiresolution perturbed parameter ensemble of a global climate model. We use a novel application of a machine learning technique known as random forests to train a statistical model on the ensemble to make high-resolution model predictions of two important quantities: global mean top-of-atmosphere energy flux and precipitation. The random forests leverage cheaper low-resolution simulations, greatly reducing the number of high-resolution simulations required to train the statistical model. We demonstrate that high-resolution predictions of these quantities can be obtained by training on an ensemble that includes only a small number of high-resolution simulations. We also find that global annually averaged precipitation is more sensitive to resolution changes than to any of the model parameters considered.
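
    A hedged sketch of the emulation idea under stated assumptions (synthetic parameters and outputs; the resolution flag and feature layout are invented for illustration, not the study's setup):

```python
# Train a random forest mostly on cheap low-resolution runs plus a few
# high-resolution runs, then query it at high resolution.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n_low, n_high, n_params = 200, 10, 5

# Perturbed-parameter ensembles: model parameters plus a resolution flag.
X_low = np.column_stack([rng.uniform(size=(n_low, n_params)), np.zeros(n_low)])
X_high = np.column_stack([rng.uniform(size=(n_high, n_params)), np.ones(n_high)])
y_low = X_low[:, :n_params].sum(axis=1) + rng.normal(0, 0.05, n_low)   # toy TOA flux
y_high = X_high[:, :n_params].sum(axis=1) * 1.1 + rng.normal(0, 0.05, n_high)

rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(np.vstack([X_low, X_high]), np.concatenate([y_low, y_high]))

# Predict the high-resolution response for new parameter settings.
X_new = np.column_stack([rng.uniform(size=(3, n_params)), np.ones(3)])
print(rf.predict(X_new))
```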

  9. Machine rates for selected forest harvesting machines

    Treesearch

    R.W. Brinker; J. Kinard; Robert Rummer; B. Lanford

    2002-01-01

    Very little new literature has been published on the subject of machine rates and machine cost analysis since 1989 when the Alabama Agricultural Experiment Station Circular 296, Machine Rates for Selected Forest Harvesting Machines, was originally published. Many machines discussed in the original publication have undergone substantial changes in various aspects, not...

  10. Nowcasting Cloud Fields for U.S. Air Force Special Operations

    DTIC Science & Technology

    2017-03-01

    application of Bayes’ Rule offers many advantages over Kernel Density Estimation (KDE) and other commonly used statistical post-processing methods...reflectance and probability of cloud. A statistical post-processing technique is applied using Bayesian estimation to train the system from a set of past...Keywords: nowcasting, low cloud forecasting, cloud reflectance, ISR, Bayesian estimation, statistical post-processing, machine learning

  11. Statistical sex determination from craniometrics: Comparison of linear discriminant analysis, logistic regression, and support vector machines.

    PubMed

    Santos, Frédéric; Guyomarc'h, Pierre; Bruzek, Jaroslav

    2014-12-01

    Accuracy of identification tools in forensic anthropology relies primarily upon the variation inherent in the data upon which they are built. Sex determination methods based on craniometrics are widely used and known to be sensitive to several factors (e.g. sample distribution, population, age, secular trends, measurement technique, etc.). The goal of this study is to discuss the potential variations linked to the statistical treatment of the data. Traditional craniometrics of four samples extracted from documented osteological collections (from Portugal, France, the U.S.A., and Thailand) were used to test three different classification methods: linear discriminant analysis (LDA), logistic regression (LR), and support vector machines (SVM). The Portuguese sample was set as a training model on which the other samples were applied in order to assess the validity and reliability of the different models. The tests were performed using different parameters: some included the selection of the best predictors; some included a strict decision threshold (sex assessed only if the related posterior probability was high, including the notion of an indeterminate result); and some used an unbalanced sex ratio. Results indicated that LR tends to perform slightly better than the other techniques and offers a better selection of predictors. Also, the use of a decision threshold (i.e. p>0.95) is essential to ensure an acceptable reliability of sex determination methods based on craniometrics. Although the Portuguese, French, and American samples share a similar sexual dimorphism, application of Western models to the Thai sample (which displayed a lower degree of dimorphism) was unsuccessful. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
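
    A hedged sketch of the comparison on synthetic data (not the documented collections): the three classifiers with a 0.95 posterior threshold that abstains on low-confidence cases:

```python
# Compare LDA, LR, and SVM, assessing sex only when the posterior is decisive.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(1, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)       # toy labels: 0 = female, 1 = male

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),
}
for name, m in models.items():
    m.fit(X, y)
    post = m.predict_proba(X[:5])
    # Abstain ("indeterminate") unless the posterior exceeds 0.95.
    calls = [int(p.argmax()) if p.max() > 0.95 else "indeterminate" for p in post]
    print(name, calls)
```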

  12. The Efficacy of Machine Learning Programs for Navy Manpower Analysis

    DTIC Science & Technology

    1993-03-01

    This thesis investigated the efficacy of two machine learning programs for Navy manpower analysis. Two machine learning programs, AIM and IXL, were...to generate models from the two commercial machine learning programs. Using a held-out subset of the data the capabilities of the three models were...partial effects. The author recommended further investigation of AIM's capabilities, and testing in an operational environment. Keywords: machine learning, AIM, IXL.

  13. Development of 300 mesh Soy Bean Crusher for Tofu Material Processing

    NASA Astrophysics Data System (ADS)

    Lee, E. S.; Pratama, P. S.; Supeno, D.; Jeong, S. W.; Byun, J. Y.; Woo, J. H.; Park, C. S.; Choi, W. S.

    2018-03-01

    A machine such as a bean crusher is subjected to varying loads and vibration. This vibration produces deformations that adversely affect the machine's performance. This paper presents a vibration analysis of a bean crusher machine using ANSYS. The effect of vibration on the structure was studied using finite element analysis in order to ensure safety. This research helps the machine designer create a better product at lower cost and with faster development time. First, a CAD model was prepared using Inventor. Second, the analysis was carried out using ANSYS 15, comprising a modal analysis and a random vibration analysis of the structure. The analysis shows that the proposed design exhibits minimal deformation when vibration is applied under normal conditions.

  14. Analysis of physiological signals for recognition of boredom, pain, and surprise emotions.

    PubMed

    Jang, Eun-Hye; Park, Byoung-Jun; Park, Mi-Sook; Kim, Sang-Hyeob; Sohn, Jin-Hun

    2015-06-18

    The aim of the study was to examine the differences among boredom, pain, and surprise, and to propose approaches for emotion recognition based on physiological signals. The three emotions are induced through the presentation of emotional stimuli, and electrocardiography (ECG), electrodermal activity (EDA), skin temperature (SKT), and photoplethysmography (PPG) are measured as physiological signals to collect a dataset from 217 participants experiencing the emotions. Twenty-seven physiological features are extracted from the signals to classify the three emotions. Discriminant function analysis (DFA), a statistical method, and five machine learning algorithms (linear discriminant analysis (LDA), classification and regression trees (CART), self-organizing map (SOM), the Naïve Bayes algorithm, and support vector machine (SVM)) are used for classifying the emotions. The results show that the difference in physiological responses among the emotions is significant in heart rate (HR), skin conductance level (SCL), skin conductance response (SCR), mean skin temperature (meanSKT), blood volume pulse (BVP), and pulse transit time (PTT), and the highest recognition accuracy of 84.7% is obtained using DFA. This study demonstrates the differences among boredom, pain, and surprise and identifies the best emotion recognizer for the classification of the three emotions using physiological signals.

  15. SCENERY: a web application for (causal) network reconstruction from cytometry data

    PubMed Central

    Papoutsoglou, Georgios; Athineou, Giorgos; Lagani, Vincenzo; Xanthopoulos, Iordanis; Schmidt, Angelika; Éliás, Szabolcs; Tegnér, Jesper

    2017-01-01

    Flow and mass cytometry technologies can probe proteins as biological markers in thousands of individual cells simultaneously, providing unprecedented opportunities for reconstructing networks of protein interactions through machine learning algorithms. The network reconstruction (NR) problem has been well studied by the machine learning community. However, the potential of available methods remains largely unknown to the cytometry community, mainly due to their intrinsic complexity and the lack of comprehensive, powerful and easy-to-use NR software implementations specific to cytometry data. To bridge this gap, we present the Single CEll NEtwork Reconstruction sYstem (SCENERY), a web server featuring several standard and advanced cytometry data analysis methods coupled with NR algorithms in a user-friendly, on-line environment. In SCENERY, users may upload their data and set their own study design. The server offers several data analysis options categorized into three classes of methods: data (pre)processing, statistical analysis and NR. The server also provides interactive visualization and download of results as ready-to-publish images or multimedia reports. Its core is modular and based on the widely-used and robust R platform, allowing power users to extend its functionalities by submitting their own NR methods. SCENERY is available at scenery.csd.uoc.gr or http://mensxmachina.org/en/software/. PMID:28525568

  16. Bayesian analysis of energy and count rate data for detection of low count rate radioactive sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klumpp, John

    We propose a radiation detection system which generates its own discrete sampling distribution based on past measurements of background. The advantage to this approach is that it can take into account variations in background with respect to time, location, energy spectra, detector-specific characteristics (i.e. different efficiencies at different count rates and energies), etc. This would therefore be a 'machine learning' approach, in which the algorithm updates and improves its characterization of background over time. The system would have a 'learning mode,' in which it measures and analyzes background count rates, and a 'detection mode,' in which it compares measurements from an unknown source against its unique background distribution. By characterizing and accounting for variations in the background, general purpose radiation detectors can be improved with little or no increase in cost. The statistical and computational techniques to perform this kind of analysis have already been developed. The necessary signal analysis can be accomplished using existing Bayesian algorithms which account for multiple channels, multiple detectors, and multiple time intervals. Furthermore, Bayesian machine-learning techniques have already been developed which, with trivial modifications, can generate appropriate decision thresholds based on the comparison of new measurements against a nonparametric sampling distribution.
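
    The learning-mode/detection-mode loop can be sketched as below; this simplification replaces the record's Bayesian multi-channel analysis with a plain empirical-quantile threshold, and all numbers are invented:

```python
# Learning mode: accumulate background counts into an empirical
# (nonparametric) distribution. Detection mode: flag measurements that
# exceed a high quantile of that distribution.
import numpy as np

rng = np.random.default_rng(7)
background = rng.poisson(lam=12.0, size=10_000)   # logged background counts

def detection_threshold(samples, false_alarm=1e-3):
    """Quantile-based threshold from the empirical background distribution."""
    return np.quantile(samples, 1.0 - false_alarm)

thr = detection_threshold(background)
new_measurement = 29
print("alarm" if new_measurement > thr else "background", f"(threshold={thr:.1f})")
```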

  17. High Throughput Determination of Mercury in Tobacco and Mainstream Smoke from Little Cigars

    PubMed Central

    Fresquez, Mark R.; Gonzalez-Jimenez, Nathalie; Gray, Naudia; Watson, Clifford H.; Pappas, R. Steven

    2015-01-01

    A method was developed that utilizes a platinum trap for mercury from mainstream tobacco smoke which represents an improvement over traditional approaches that require impingers and long sample preparation procedures. In this approach, the trapped mercury is directly released for analysis by heating the trap in a direct mercury analyzer. The method was applied to the analysis of mercury in the mainstream smoke of little cigars. The mercury levels in little cigar smoke obtained under Health Canada Intense smoking machine conditions ranged from 7.1 × 10−3 mg/m3 to 1.2 × 10−2 mg/m3. These air mercury levels exceed the chronic inhalation Minimal Risk Level corrected for intermittent exposure to metallic mercury (e.g., 1 or 2 hours per day, 5 days per week) determined by the Agency for Toxic Substances and Disease Registry. Multivariate statistical analysis was used to assess associations between mercury levels and little cigar physical design properties. Filter ventilation was identified as the principal physical parameter influencing mercury concentrations in mainstream little cigar smoke generated under ISO machine smoking conditions. With filter ventilation blocked under Health Canada Intense smoking conditions, mercury concentrations in tobacco and puff number (smoke volume) were the primary physical parameters that influenced mainstream smoke mercury concentrations. PMID:26051388

  18. Successful classification of cocaine dependence using brain imaging: a generalizable machine learning approach.

    PubMed

    Mete, Mutlu; Sakoglu, Unal; Spence, Jeffrey S; Devous, Michael D; Harris, Thomas S; Adinoff, Bryon

    2016-10-06

    Neuroimaging studies have yielded significant advances in the understanding of neural processes relevant to the development and persistence of addiction. However, these advances have not been explored extensively for diagnostic accuracy in human subjects. The aim of this study was to develop a statistical approach, using a machine learning framework, to correctly classify brain images of cocaine-dependent participants and healthy controls. In this study, a framework suitable for educing potential brain regions that differed between the two groups was developed and implemented. Single Photon Emission Computerized Tomography (SPECT) images obtained during rest or a saline infusion in three cohorts of 2-4 week abstinent cocaine-dependent participants (n = 93) and healthy controls (n = 69) were used to develop a classification model. An information theoretic-based feature selection algorithm was first conducted to reduce the number of voxels. A density-based clustering algorithm was then used to form spatially connected voxel clouds in three-dimensional space. A statistical classifier, the Support Vector Machine (SVM), was then used for participant classification. Statistically insignificant voxels of spatially connected brain regions were removed iteratively and classification accuracy was reported through the iterations. The voxel-based analysis identified 1,500 spatially connected voxels in 30 distinct clusters after a grid search in SVM parameters. Participants were successfully classified with 0.88 and 0.89 F-measure accuracies in 10-fold cross validation (10xCV) and leave-one-out (LOO) approaches, respectively. Sensitivity and specificity were 0.90 and 0.89 for LOO; 0.83 and 0.83 for 10xCV. Many of the 30 selected clusters are highly relevant to the addictive process, including regions relevant to cognitive control, default mode network related self-referential thought, behavioral inhibition, and contextual memories. Relative hyperactivity and hypoactivity of regional cerebral blood flow in brain regions in cocaine-dependent participants are presented with corresponding levels of significance. The SVM-based approach successfully classified cocaine-dependent and healthy control participants using voxels selected with information theoretic-based and statistical methods from participants' SPECT data. The regions found in this study align with brain regions reported in the literature. These findings support the future use of brain imaging with an SVM-based classifier in the diagnosis of substance use disorders and in furthering an understanding of their underlying pathology.
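
    A minimal sketch of the classification core (information-theoretic feature selection followed by an SVM under leave-one-out cross-validation); the density-based voxel clustering step is omitted and the data are synthetic:

```python
# Mutual-information voxel selection + RBF SVM, scored with LOO CV.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 500))            # 60 subjects x 500 voxel features
y = rng.integers(0, 2, 60)                # cocaine-dependent vs. control

clf = make_pipeline(SelectKBest(mutual_info_classif, k=50), SVC(kernel="rbf"))
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"LOO accuracy: {acc:.2f}")
```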

  19. Machine learning approach for automated screening of malaria parasite using light microscopic images.

    PubMed

    Das, Dev Kumar; Ghosh, Madhumala; Pal, Mallika; Maiti, Asok K; Chakraborty, Chandan

    2013-02-01

    The aim of this paper is to address the development of computer-assisted malaria parasite characterization and classification using a machine learning approach based on light microscopic images of peripheral blood smears. In doing this, microscopic image acquisition from stained slides, illumination correction and noise reduction, erythrocyte segmentation, feature extraction, feature selection and finally classification of different stages of malaria (Plasmodium vivax and Plasmodium falciparum) have been investigated. The erythrocytes are segmented using marker-controlled watershed transformation, and subsequently a total of ninety-six features describing the shape, size and texture of erythrocytes are extracted with respect to parasitemia-infected versus non-infected cells. Ninety-four features are found to be statistically significant in discriminating the six classes. Here a feature selection-cum-classification scheme has been devised by combining the F-statistic with statistical learning techniques, i.e., Bayesian learning and the support vector machine (SVM), in order to provide higher classification accuracy using the best set of discriminating features. Results show that the Bayesian approach provides the highest accuracy, i.e., 84%, for malaria classification by selecting the 19 most significant features, while SVM provides the highest accuracy, i.e., 83.5%, with the 9 most significant features. Finally, the performance of these two classifiers under the feature selection framework has been compared for malaria parasite classification. Copyright © 2012 Elsevier Ltd. All rights reserved.
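
    A hedged sketch of the feature-selection-cum-classification scheme, with F-statistic ranking and the two classifiers compared on synthetic stand-ins for the erythrocyte features (the 19- and 9-feature settings mirror the abstract):

```python
# Rank features by F-statistic, then compare a Bayesian learner and an SVM.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 96))            # stand-in for the 96 features
y = rng.integers(0, 6, 300)               # six malaria stage classes

for name, clf, k in [("Bayes", GaussianNB(), 19), ("SVM", SVC(), 9)]:
    pipe = make_pipeline(SelectKBest(f_classif, k=k), clf)
    print(name, cross_val_score(pipe, X, y, cv=5).mean())
```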

  20. Multivariate Statistical Analysis of Orthogonal Mass Spectral Data for the Identification of Chemical Attribution Signatures of 3-Methylfentanyl

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, B. P.; Valdez, C. A.; DeHope, A. J.

    Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process greatly relies on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3-methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis, the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.
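
    A minimal PLS-DA sketch under stated assumptions (synthetic data; PLS regression on one-hot route labels with the predicted class taken as the largest response; `sparse_output` requires scikit-learn 1.2 or later):

```python
# PLS-DA: PLS regression against one-hot class labels, argmax for the call.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(11)
X = rng.normal(size=(120, 240))           # 240 monitored compounds/elements
routes = rng.integers(0, 6, 120)          # six synthesis methods

Y = OneHotEncoder(sparse_output=False).fit_transform(routes.reshape(-1, 1))
Xs = StandardScaler().fit_transform(X)
pls = PLSRegression(n_components=10).fit(Xs, Y)

pred = pls.predict(Xs).argmax(axis=1)
print("training accuracy:", (pred == routes).mean())
```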

  1. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
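
    As a hedged illustration of a regularized logistic-regression text classifier of this kind (TF-IDF features; the incident texts and labels are invented, not MAUDE records):

```python
# Pipeline: TF-IDF vectorization + L2-regularized logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "software froze during infusion pump update",
    "battery door cracked on handheld monitor",
    "interface displayed wrong patient record",
    "tubing kinked causing occlusion alarm",
]
labels = [1, 0, 1, 0]                     # 1 = HIT incident, 0 = other

clf = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(C=1.0),            # C controls regularization strength
)
clf.fit(texts, labels)
print(clf.predict(["screen showed another patient's data"]))
```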

  2. Machine learning modelling for predicting soil liquefaction susceptibility

    NASA Astrophysics Data System (ADS)

    Samui, P.; Sitharam, T. G.

    2011-01-01

    This study describes two machine learning techniques applied to predict the liquefaction susceptibility of soil based on standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first technique is an Artificial Neural Network (ANN) based on multi-layer perceptrons (MLP) trained with the Levenberg-Marquardt backpropagation algorithm. The second is the Support Vector Machine (SVM), a classification technique firmly grounded in statistical learning theory. ANN and SVM models have been developed to predict liquefaction susceptibility from the corrected SPT blow count [(N1)60] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models to require only two parameters, (N1)60 and peak ground acceleration (amax/g), for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
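
    A sketch of the two models on the simplified two-feature input, with synthetic data in place of the Chi-Chi SPT records; note that scikit-learn's MLP does not offer Levenberg-Marquardt training, so it stands in for the paper's ANN:

```python
# ANN (MLP) vs. SVM on (N1)60 and CSR with a toy liquefaction boundary.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(9)
n160 = rng.uniform(2, 40, 200)            # corrected SPT blow count
csr = rng.uniform(0.05, 0.6, 200)         # cyclic stress ratio
X = np.column_stack([n160, csr])
y = (csr > 0.012 * n160 + 0.15).astype(int)   # invented boundary

ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)
svm = SVC().fit(X, y)
print("ANN:", ann.score(X, y), "SVM:", svm.score(X, y))
```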

  3. Effects of promotional materials on vending sales of low-fat items in teachers' lounges.

    PubMed

    Fiske, Amy; Cullen, Karen Weber

    2004-01-01

    This study examined the impact of an environmental intervention, in the form of promotional materials and increased availability of low-fat items, on vending machine sales. Ten vending machines were selected and randomly assigned to a control condition or one of two experimental conditions. Vending machines in the two intervention conditions received three additional low-fat selections. Low-fat items were promoted at two levels: labels (intervention I) and labels plus signs (intervention II). The number of individual items sold and the total revenue generated were recorded weekly for each machine for 4 weeks. Use of promotional materials resulted in a small, but not significant, increase in the number of low-fat items sold, and machine sales were not significantly impacted by the change in product selection. The results of this study, although not statistically significant, suggest that environmental change may be a realistic means of positively influencing consumer behavior.

  4. A Critical Review for Developing Accurate and Dynamic Predictive Models Using Machine Learning Methods in Medicine and Health Care.

    PubMed

    Alanazi, Hamdan O; Abdullah, Abdul Hanan; Qureshi, Kashif Naseer

    2017-04-01

    Recently, Artificial Intelligence (AI) has been used widely in the medicine and health care sector. Within AI, classification and prediction by machine learning is a major field. Today, the study of existing predictive models based on machine learning methods is extremely active. Doctors need accurate predictions of the outcomes of their patients' diseases. In addition, for accurate predictions, timing is another significant factor that influences treatment decisions. In this paper, existing predictive models in medicine and health care are critically reviewed. Furthermore, the most prominent machine learning methods are explained, and the confusion between statistical approaches and machine learning is clarified. A review of the related literature reveals that the predictions of existing predictive models differ even when the same dataset is used. Therefore, existing predictive models are essential, and current methods must be improved.

  5. Landsat real-time processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, E.L.

    A novel method for performing real-time acquisition and processing of Landsat/EROS data covers all aspects including radiometric and geometric corrections of multispectral scanner or return-beam vidicon inputs, image enhancement, statistical analysis, feature extraction, and classification. Radiometric transformations include bias/gain adjustment, noise suppression, calibration, scan angle compensation, and illumination compensation, including topography and atmospheric effects. Correction or compensation for geometric distortion includes sensor-related distortions, such as centering, skew, size, scan nonlinearity, radial symmetry, and tangential symmetry. Also included are object image-related distortions such as aspect angle (altitude), scale distortion (altitude), terrain relief, and earth curvature. Ephemeral corrections are also applied to compensate for satellite forward movement, earth rotation, altitude variations, satellite vibration, and mirror scan velocity. Image enhancement includes high-pass, low-pass, and Laplacian mask filtering and data restoration for intermittent losses. Resource classification is provided by statistical analysis including histograms, correlational analysis, matrix manipulations, and determination of spectral responses. Feature extraction includes spatial frequency analysis, which is used in parallel discriminant functions in each array processor for rapid determination. The technique uses integrated parallel array processors that decimate the tasks concurrently under supervision of a control processor. The operator-machine interface is optimized for programming ease and graphics image windowing.

  6. Automated estimation of image quality for coronary computed tomographic angiography using machine learning.

    PubMed

    Nakanishi, Rine; Sankaran, Sethuraman; Grady, Leo; Malpeso, Jenifer; Yousfi, Razik; Osawa, Kazuhiro; Ceponiene, Indre; Nazarat, Negin; Rahmani, Sina; Kissel, Kendall; Jayawardena, Eranthi; Dailing, Christopher; Zarins, Christopher; Koo, Bon-Kwon; Min, James K; Taylor, Charles A; Budoff, Matthew J

    2018-03-23

    Our goal was to evaluate the efficacy of a fully automated method for assessing the image quality (IQ) of coronary computed tomography angiography (CCTA). The machine learning method was trained using 75 CCTA studies by mapping features (noise, contrast, misregistration scores, and un-interpretability index) to an IQ score based on manual ground truth data. The automated method was validated on a set of 50 CCTA studies and subsequently tested on a new set of 172 CCTA studies against visual IQ scores on a 5-point Likert scale. The area under the curve in the validation set was 0.96. In the 172 CCTA studies, our method yielded a Cohen's kappa statistic for the agreement between automated and visual IQ assessment of 0.67 (p < 0.01). In the group where good to excellent (n = 163), fair (n = 6), and poor visual IQ scores (n = 3) were graded, 155, 5, and 2 of the patients received an automated IQ score > 50 %, respectively. Fully automated assessment of the IQ of CCTA data sets by machine learning was reproducible and provided similar results compared with visual analysis within the limits of inter-operator variability. • The proposed method enables automated and reproducible image quality assessment. • Machine learning and visual assessments yielded comparable estimates of image quality. • Automated assessment potentially allows for more standardised image quality. • Image quality assessment enables standardization of clinical trial results across different datasets.

  7. Effects of the sliding rehabilitation machine on balance and gait in chronic stroke patients - a controlled clinical trial.

    PubMed

    Byun, Seung-Deuk; Jung, Tae-Du; Kim, Chul-Hyun; Lee, Yang-Soo

    2011-05-01

    To investigate the effects of a sliding rehabilitation machine on balance and gait in chronic stroke patients. A non-randomized crossover design. Inpatient rehabilitation in a general hospital. Thirty patients with chronic stroke who had medium or high falling risk as determined by the Berg Balance Scale. Participants were divided into two groups and underwent four weeks of training. Group A (n = 15) underwent training with the sliding rehabilitation machine for two weeks with concurrent conventional training, followed by conventional training only for another two weeks. Group B (n = 15) underwent the same training in reverse order. The effect of the experimental period was defined as the sum of changes during training with sliding rehabilitation machine in each group, and the effect of the control period was defined as those during the conventional training only in each group. Functional Ambulation Category, Berg Balance Scale, Six-Minute Walk Test, Timed Up and Go Test, Korean Modified Barthel Index, Modified Ashworth Scale and Manual Muscle Test. Statistically significant improvements were observed in all parameters except Modified Ashworth Scale in the experimental period, but only in Six-Minute Walk Test (P < 0.01) in the control period. There were also statistically significant differences in the degree of change in all parameters in the experimental period as compared to the control period. The sliding rehabilitation machine may be a useful tool for the improvement of balance and gait abilities in chronic stroke patients.

  8. Incipient fault detection study for advanced spacecraft systems

    NASA Technical Reports Server (NTRS)

    Milner, G. Martin; Black, Michael C.; Hovenga, J. Mike; Mcclure, Paul F.

    1986-01-01

    A feasibility study to investigate the application of vibration monitoring to the rotating machinery of planned NASA advanced spacecraft components is described. Factors investigated include: (1) special problems associated with small, high RPM machines; (2) application across multiple component types; (3) microgravity; (4) multiple fault types; (5) eight different analysis techniques including signature analysis, high frequency demodulation, cepstrum, clustering, amplitude analysis, and pattern recognition are compared; and (6) small sample statistical analysis is used to compare performance by computation of probability of detection and false alarm for an ensemble of repeated baseline and faulted tests. Both detection and classification performance are quantified. Vibration monitoring is shown to be an effective means of detecting the most important problem types for small, high RPM fans and pumps typical of those planned for the advanced spacecraft. A preliminary monitoring system design and implementation plan is presented.

  9. Spectral analysis of variable-length coded digital signals

    NASA Astrophysics Data System (ADS)

    Cariolaro, G. L.; Pierobon, G. L.; Pupolin, S. G.

    1982-05-01

    A spectral analysis is conducted for a variable-length codeword sequence produced by an encoder driven by a stationary memoryless source. A finite-state sequential machine is considered as a model of the line encoder, and the spectral analysis of the encoded message is performed under the assumption that the sourceword sequence is composed of independent identically distributed words. Closed-form expressions for both the continuous and discrete parts of the spectral density are derived in terms of the encoder law and sourceword statistics. The discrete (line) part exhibits jumps at integer multiples of 1/(λ0 T), where λ0 is the greatest common divisor of the possible codeword lengths and T is the symbol period. The derivation of the continuous part can be conveniently factorized, and the theory is applied to the spectral analysis of BnZS and HDBn codes.
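
    As a hedged sketch of this decomposition (notation assumed from standard treatments of line-coded signal spectra, not taken from the paper):

```latex
% Spectral density split into continuous and line parts; the Dirac lines
% sit at integer multiples of 1/(lambda_0 T). Notation is assumed.
S(f) \;=\; S_c(f) \;+\; \sum_{k=-\infty}^{\infty} c_k\,
\delta\!\left(f - \frac{k}{\lambda_0 T}\right)
```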

  10. The assisted prediction modelling frame with hybridisation and ensemble for business risk forecasting and an implementation

    NASA Astrophysics Data System (ADS)

    Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie

    2015-08-01

    The business failure of numerous companies results in financial crises. The high social costs associated with such crises have led people to search for effective tools for business risk prediction, among which the support vector machine is very effective. Several modelling means, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested for forecasting business risk with support vector machine. However, the existing literature seldom focuses on a general modelling frame for business risk prediction, and seldom investigates performance differences among different modelling means. We reviewed research on forecasting business risk with support vector machine, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally investigated the use of principal components analysis, support vector machine, random sampling, and group decision under the general frame in forecasting business risk. Under the APMF-WHAE frame with support vector machine as the base predictive model, four specific predictive models were produced: a pure support vector machine, a hybrid support vector machine involving principal components analysis, a support vector machine ensemble based on random sampling and group decision, and an ensemble of hybrid support vector machines using group decision to integrate various hybrid support vector machines built on variables produced by principal components analysis and samples drawn by random sampling. The experimental results indicate that the hybrid support vector machine and the ensemble of hybrid support vector machines produced better performance than the pure support vector machine and the support vector machine ensemble.
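
    A hedged sketch of one corner of the APMF-WHAE frame: hybrid PCA + SVM base models, random sampling, and group decision approximated by scikit-learn's bagging vote (all settings are illustrative):

```python
# Ensemble of hybrid (PCA + SVM) models, each fit on a random sample,
# combined by voting.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 30))            # toy financial ratios
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 400) > 0).astype(int)

hybrid = make_pipeline(PCA(n_components=10), SVC())
ensemble = BaggingClassifier(hybrid, n_estimators=15, max_samples=0.8)
ensemble.fit(X, y)
print("ensemble accuracy:", ensemble.score(X, y))
```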

  11. Lightweight and Statistical Techniques for Petascale Debugging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Barton

    2014-06-30

    This project investigated novel techniques for debugging scientific applications on petascale architectures. In particular, we developed lightweight tools that narrow the problem space when bugs are encountered. We also developed techniques that either limit the number of tasks and the code regions to which a developer must apply a traditional debugger or that apply statistical techniques to provide direct suggestions of the location and type of error. We extend previous work on the Stack Trace Analysis Tool (STAT), that has already demonstrated scalability to over one hundred thousand MPI tasks. We also extended statistical techniques developed to isolate programming errors in widely used sequential or threaded applications in the Cooperative Bug Isolation (CBI) project to large scale parallel applications. Overall, our research substantially improved productivity on petascale platforms through a tool set for debugging that complements existing commercial tools. Previously, Office Of Science application developers relied either on primitive manual debugging techniques based on printf or they use tools, such as TotalView, that do not scale beyond a few thousand processors. However, bugs often arise at scale and substantial effort and computation cycles are wasted in either reproducing the problem in a smaller run that can be analyzed with the traditional tools or in repeated runs at scale that use the primitive techniques. New techniques that work at scale and automate the process of identifying the root cause of errors were needed. These techniques significantly reduced the time spent debugging petascale applications, thus leading to a greater overall amount of time for application scientists to pursue the scientific objectives for which the systems are purchased. We developed a new paradigm for debugging at scale: techniques that reduced the debugging scenario to a scale suitable for traditional debuggers, e.g., by narrowing the search for the root-cause analysis to a small set of nodes or by identifying equivalence classes of nodes and sampling our debug targets from them. We implemented these techniques as lightweight tools that efficiently work on the full scale of the target machine. We explored four lightweight debugging refinements: generic classification parameters, such as stack traces, application-specific classification parameters, such as global variables, statistical data acquisition techniques and machine learning based approaches to perform root cause analysis. Work done under this project can be divided into two categories, new algorithms and techniques for scalable debugging, and foundation infrastructure work on our MRNet multicast-reduction framework for scalability, and Dyninst binary analysis and instrumentation toolkits.

  12. Optimization of Coolant Technique Conditions for Machining A319 Aluminium Alloy Using Response Surface Method (RSM)

    NASA Astrophysics Data System (ADS)

    Zainal Ariffin, S.; Razlan, A.; Ali, M. Mohd; Efendee, A. M.; Rahman, M. M.

    2018-03-01

    Background/Objectives: The paper discusses the optimum cutting parameters and coolant technique conditions (1.0 mm nozzle orifice, wet, and dry) for optimizing surface roughness, temperature, and tool wear in the machining process, based on the selected setting parameters. The selected cutting parameters for this study were cutting speed, feed rate, depth of cut, and coolant technique condition. Methods/Statistical Analysis: Experiments were conducted and investigated based on Design of Experiments (DOE) with the Response Surface Method. The research on the aggressive machining of aluminium alloy A319 for automotive applications is an effort to understand the machining concept, which is widely used in a variety of manufacturing industries, especially the automotive industry. Findings: The results show that surface roughness, temperature, and tool wear increase during machining under the 1.0 mm nozzle orifice condition, and that this technique can also help minimize built-up edge on the A319. The exploration of surface roughness and productivity and the optimization of cutting speed in the technical and commercial aspects of manufacturing A319 automotive components are discussed for further work. Applications/Improvements: The results are also beneficial in minimizing costs and improving the productivity of manufacturing firms. According to the mathematical model and equations generated by CCD-based RSM, experiments were performed, and a coolant technique condition using the nozzle size that reduces tool wear, surface roughness, and temperature was obtained. The results have been analyzed and optimization has been carried out for selecting cutting parameters, showing that the effectiveness and efficiency of the system can be identified, which helps to solve potential problems.
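
    A minimal sketch of the RSM step under stated assumptions (invented factor ranges and synthetic design points): fit a second-order response surface for surface roughness over cutting speed, feed rate, and depth of cut:

```python
# Second-order (quadratic) response-surface fit, as produced from a
# central composite design; data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform([100, 0.05, 0.2], [300, 0.25, 1.0], size=(20, 3))  # v, f, d
roughness = (0.4 + 0.002 * X[:, 0] + 8 * X[:, 1] ** 2 + 0.3 * X[:, 2]
             + rng.normal(0, 0.05, 20))

rsm = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
rsm.fit(X, roughness)
print("R^2 on design points:", rsm.score(X, roughness))
```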

  13. An automated ranking platform for machine learning regression models for meat spoilage prediction using multi-spectral imaging and metabolic profiling.

    PubMed

    Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady

    2017-09-01

    Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imaging and biomimetic sensors have been gaining popularity as rapid and efficient methods for assessing food quality, safety and authentication, and as a sensible alternative to expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated by such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithm before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work we present "MeatReg", a web-based application able to automate the procedure of identifying the best machine learning method for comparing data from several analytical techniques, to predict the counts of microorganisms responsible for meat spoilage regardless of the packaging system applied. In particular, seven regression methods were applied: ordinary least squares regression, stepwise linear regression, partial least squares regression, principal component regression, support vector regression, random forest and k-nearest neighbours. "MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with an electronic nose, HPLC, FT-IR, GC-MS and a multispectral imaging instrument. Populations of total viable count, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta were predicted. As a result, recommendations were obtained on which analytical platforms are suitable for predicting each type of bacteria and which machine learning methods to use in each case. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.
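
    A hedged sketch of the automated ranking idea, cross-validating several of the listed regressors on one synthetic feature matrix and sorting them by score (stepwise regression and PCR are omitted for brevity):

```python
# Cross-validate a dictionary of regressors and print them best-first.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(6)
X = rng.normal(size=(90, 60))             # e.g. FT-IR-like features
y = X[:, :4].sum(axis=1) + rng.normal(0, 0.3, 90)   # toy log bacterial counts

models = {
    "OLS": LinearRegression(),
    "PLS": PLSRegression(n_components=5),
    "SVR": SVR(),
    "RF": RandomForestRegressor(n_estimators=100),
    "kNN": KNeighborsRegressor(),
}
ranking = sorted(
    ((cross_val_score(m, X, y, cv=5).mean(), name) for name, m in models.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: {score:.3f}")
```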

  14. Machine Learning Approaches for Predicting Radiation Therapy Outcomes: A Clinician's Perspective

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kang, John; Schwartz, Russell; Flickinger, John

    Radiation oncology has always been deeply rooted in modeling, from the early days of isoeffect curves to the contemporary Quantitative Analysis of Normal Tissue Effects in the Clinic (QUANTEC) initiative. In recent years, medical modeling for both prognostic and therapeutic purposes has exploded thanks to increasing availability of electronic data and genomics. One promising direction that medical modeling is moving toward is adopting the same machine learning methods used by companies such as Google and Facebook to combat disease. Broadly defined, machine learning is a branch of computer science that deals with making predictions from complex data through statistical models. These methods serve to uncover patterns in data and are actively used in areas such as speech recognition, handwriting recognition, face recognition, “spam” filtering (junk email), and targeted advertising. Although multiple radiation oncology research groups have shown the value of applied machine learning (ML), clinical adoption has been slow due to the high barrier to understanding these complex models by clinicians. Here, we present a review of the use of ML to predict radiation therapy outcomes from the clinician's point of view with the hope that it lowers the “barrier to entry” for those without formal training in ML. We begin by describing 7 principles that one should consider when evaluating (or creating) an ML model in radiation oncology. We next introduce 3 popular ML methods—logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN)—and critique 3 seminal papers in the context of these principles. Although current studies are in exploratory stages, the overall methodology has progressively matured, and the field is ready for larger-scale further investigation.

  15. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear, and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Since tuning the parameters involved in the definition of a rejection model is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and is therefore a good candidate for integration in data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.
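
    A minimal sketch of a posterior-threshold rejection rule of the kind discussed (illustrative threshold and synthetic data; the paper's evolutionary parameter tuning is not shown):

```python
# Multi-class classifier that abstains when the top posterior is not
# decisive enough, flagging the sample as a possible novel class.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(i, 1.0, (50, 20)) for i in range(3)])
y = np.repeat([0, 1, 2], 50)              # three known classes

clf = LogisticRegression(max_iter=1000).fit(X, y)

def predict_with_reject(model, X_new, threshold=0.9):
    post = model.predict_proba(X_new)
    calls = post.argmax(axis=1).astype(object)
    calls[post.max(axis=1) < threshold] = "rejected"
    return calls

# Ambiguous samples between classes 0 and 1 tend to be rejected.
print(predict_with_reject(clf, rng.normal(0.5, 1.0, (5, 20))))
```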

  16. Multivariate Statistical Analysis Software Technologies for Astrophysical Research Involving Large Data Bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, S. G.

    1994-01-01

    We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complex database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of the SKICAT system, and of some of the scientific results achieved to date. We also developed a user-friendly package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications and has produced real, published results.

  17. Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Sarukkai, Sekhar R.; Mehra, Pankaj; Lum, Henry, Jr. (Technical Monitor)

    1994-01-01

    This paper presents a methodology for debugging the performance of message-passing programs on both tightly coupled and loosely coupled distributed-memory machines. The AIMS (Automated Instrumentation and Monitoring System) toolkit, a suite of software tools for measurement and analysis of performance, is introduced and its application illustrated using several benchmark programs drawn from the field of computational fluid dynamics. AIMS includes (i) Xinstrument, a powerful source-code instrumentor that supports both Fortran77 and C as well as a number of different message-passing libraries, including Intel's NX, Thinking Machines' CMMD, and PVM; (ii) Monitor, a library of timestamping and trace-collection routines that run on supercomputers (such as Intel's iPSC/860, Delta, and Paragon, and Thinking Machines' CM5) as well as on networks of workstations (including Convex Cluster and SparcStations connected by a LAN); (iii) Visualization Kernel, a trace-animation facility that supports source-code clickback, simultaneous visualization of computation and communication patterns, as well as analysis of data movements; (iv) Statistics Kernel, an advanced profiling facility that associates a variety of performance data with various syntactic components of a parallel program; (v) Index Kernel, a diagnostic tool that helps pinpoint performance bottlenecks through the use of abstract indices; (vi) Modeling Kernel, a facility for automated modeling of message-passing programs that supports both simulation-based and analytical approaches to performance prediction and scalability analysis; (vii) Intrusion Compensator, a utility for recovering true performance from observed performance by removing the overheads of monitoring and their effects on the communication pattern of the program; and (viii) Compatibility Tools, which convert AIMS-generated traces into formats used by other performance-visualization tools, such as ParaGraph, Pablo, and certain AVS/Explorer modules.

  18. A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data.

    PubMed

    Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat

    2016-12-22

    The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies represents a paradigm shift: scientists can now concurrently investigate the complex biology of a heterogeneous population of cells, one cell at a time. To date, however, there has been no well-suited computational methodology for the analysis of such an intricate deluge of data, in particular techniques that aid the identification of transcriptomic differences between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (support vector machine (SVM) and random forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. These genes also possessed higher discriminative power (enhanced prediction accuracy) compared with commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks in which novel interactions could be further validated in wet-lab experiments and serve as useful candidate targets for the treatment of neuronal developmental diseases. The reported approach is able to identify transcripts, with reported neuronal involvement, that optimally differentiate neocortical cells from neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles, such as the study of cancer progression and treatment within a highly heterogeneous tumour.
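
    A minimal sketch of SVM-based recursive feature elimination using scikit-learn; the synthetic matrix stands in for a real cells-by-genes expression matrix, and the 38-feature target simply mirrors the number reported above.

        from sklearn.datasets import make_classification
        from sklearn.feature_selection import RFE
        from sklearn.svm import LinearSVC

        # Synthetic stand-in for a cells-by-genes expression matrix.
        X, y = make_classification(n_samples=200, n_features=500, n_informative=40,
                                   random_state=0)
        svm = LinearSVC(C=1.0, max_iter=5000)                        # linear SVM exposes coef_
        rfe = RFE(estimator=svm, n_features_to_select=38, step=0.1)  # drop 10% per round
        rfe.fit(X, y)
        selected = rfe.support_.nonzero()[0]                         # indices of retained "genes"
        print(len(selected), "features selected")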

  19. Machinery Bearing Fault Diagnosis Using Variational Mode Decomposition and Support Vector Machine as a Classifier

    NASA Astrophysics Data System (ADS)

    Rama Krishna, K.; Ramachandran, K. I.

    2018-02-01

    Crack propagation is a major cause of failure in rotating machines. It adversely affects productivity, safety, and machining quality. Hence, accurately detecting the severity of a crack is imperative for the predictive maintenance of such machines. Fault diagnosis is an established approach to identifying faults by observing the non-linear behaviour of vibration signals at various operating conditions. In this work, we find the classification efficiencies for both the original and the reconstructed vibration signals. The reconstructed signals are obtained using Variational Mode Decomposition (VMD), by splitting the original signal into three intrinsic mode functional components and framing them accordingly. Obtaining the classification efficiencies involves three phases: feature extraction, feature selection, and feature classification. In the feature extraction phase, statistical features are computed individually from the original and the reconstructed signals. In the feature selection phase, a few statistical parameters are selected and then classified using the SVM classifier. The obtained results show the best parameters and the appropriate kernel of the SVM classifier for detecting faults in bearings. We conclude that the combined VMD and SVM process gives better results than SVM on the raw signals, owing to the denoising and filtering of the raw vibration signals.
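
    As a hedged sketch of the feature extraction and classification phases (the VMD step itself is omitted; in practice each signal would first be decomposed into mode functions), the following computes a few standard statistical features from synthetic vibration windows and scores an SVM by cross-validation.

        import numpy as np
        from scipy import stats
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        def features(window):
            """RMS, kurtosis, skewness, and crest factor of one signal window."""
            rms = np.sqrt(np.mean(window ** 2))
            return [rms, stats.kurtosis(window), stats.skew(window),
                    np.max(np.abs(window)) / rms]

        rng = np.random.default_rng(0)
        healthy = [rng.normal(size=1024) for _ in range(50)]
        faulty = [rng.normal(size=1024) + np.sin(0.3 * np.arange(1024))
                  for _ in range(50)]                  # fault adds a periodic component
        X = np.array([features(w) for w in healthy + faulty])
        y = np.array([0] * 50 + [1] * 50)

        print("CV accuracy:", cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())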

  20. Occupational Accidents with Agricultural Machinery in Austria.

    PubMed

    Kogler, Robert; Quendler, Elisabeth; Boxberger, Josef

    2016-01-01

    The number of recognized accidents with fatalities during agricultural and forestry work remains very high in Austria, despite better technology and coordinated prevention and training. The accident scenarios in which people are injured vary widely across farms. The common causes of accidents in agriculture and forestry are loss of control of a machine, means of transport or handling equipment, hand-held tool, or object or animal, followed by slipping, stumbling and falling of persons, and breakage, bursting, splitting, slipping, fall, or collapse of a material agent. In the literature, a number of studies of general (machine- and animal-related) and specific (machine-related) agricultural and forestry accident situations can be found that refer to different databases. Using data from the Austrian Workers' Compensation Board (AUVA) on occupational accidents with different agricultural machinery over the period 2008-2010 in Austria, the main characteristics of the accident, the victim, and the employer, as well as variables on causes and circumstances, were statistically analyzed by frequency and by contexts of parameters, employing the chi-square test and odds ratios. The aim of the study was to determine the information content and quality of the European Statistics on Accidents at Work (ESAW) variables in order to evaluate safety gaps and risks as well as the accidental man-machine interaction.
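
    For readers unfamiliar with the two statistics named above, the snippet below runs a chi-square test and computes an odds ratio on a hypothetical 2x2 table; the counts are invented purely for illustration.

        import numpy as np
        from scipy.stats import chi2_contingency

        # Rows: machinery category; columns: severe vs. non-severe injury (invented counts).
        table = np.array([[30, 170],
                          [12, 188]])

        chi2, p, dof, expected = chi2_contingency(table)
        odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
        print(f"chi2={chi2:.2f}, p={p:.4f}, OR={odds_ratio:.2f}")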

  1. Machine Learning and Network Analysis of Molecular Dynamics Trajectories Reveal Two Chains of Red/Ox-specific Residue Interactions in Human Protein Disulfide Isomerase.

    PubMed

    Karamzadeh, Razieh; Karimi-Jafari, Mohammad Hossein; Sharifi-Zarchi, Ali; Chitsaz, Hamidreza; Salekdeh, Ghasem Hosseini; Moosavi-Movahedi, Ali Akbar

    2017-06-16

    The human protein disulfide isomerase (hPDI) is an essential four-domain multifunctional enzyme. As a result of disulfide shuffling in its terminal domains, hPDI exists in two oxidation states with different conformational preferences, which are important for substrate binding and functional activities. Here, we address the redox-dependent conformational dynamics of hPDI through molecular dynamics (MD) simulations. Collective domain motions are identified by principal component analysis of the MD trajectories, and redox-dependent opening-closing structure variations are highlighted on projected free energy landscapes. Important structural features that exhibit considerable differences in the dynamics of the two redox states are then extracted by statistical machine learning methods. Mapping the structural variations to time series of residue interaction networks also provides a holistic representation of the dynamical redox differences. Emphasizing persistent, long-lasting interactions, we propose an approach that compiles these time-series networks into a single dynamic residue interaction network (DRIN). Differential comparison of the DRIN in the oxidized and reduced states reveals chains of residue interactions that represent potential allosteric paths between the catalytic and ligand binding sites of hPDI.
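
    A minimal sketch of the first analysis step described above, principal component analysis of trajectory coordinates to expose collective motions; the random array stands in for a real (n_frames, 3*n_atoms) coordinate matrix exported from an MD package.

        import numpy as np
        from sklearn.decomposition import PCA

        n_frames, n_atoms = 1000, 120
        coords = np.random.default_rng(1).normal(size=(n_frames, 3 * n_atoms))
        coords -= coords.mean(axis=0)            # analyze fluctuations about the mean structure

        pca = PCA(n_components=2)
        projection = pca.fit_transform(coords)   # each frame projected onto the top two PCs
        print("explained variance ratios:", pca.explained_variance_ratio_)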

  2. Automated texture-based identification of ovarian cancer in confocal microendoscope images

    NASA Astrophysics Data System (ADS)

    Srivastava, Saurabh; Rodriguez, Jeffrey J.; Rouse, Andrew R.; Brewer, Molly A.; Gmitro, Arthur F.

    2005-03-01

    The fluorescence confocal microendoscope provides high-resolution, in-vivo imaging of cellular pathology during optical biopsy. There are indications that the examination of human ovaries with this instrument has diagnostic implications for the early detection of ovarian cancer. The purpose of this study was to develop a computer-aided system to facilitate the identification of ovarian cancer from digital images captured with the confocal microendoscope system. To achieve this goal, we modeled the cellular-level structure present in these images as texture and extracted features based on first-order statistics, spatial gray-level dependence matrices, and spatial-frequency content. Selection of the best features for classification was performed using traditional feature selection techniques including stepwise discriminant analysis, forward sequential search, a non-parametric method, principal component analysis, and a heuristic technique that combines the results of these methods. The best set of features selected was used for classification, and performance of various machine classifiers was compared by analyzing the areas under their receiver operating characteristic curves. The results show that it is possible to automatically identify patients with ovarian cancer based on texture features extracted from confocal microendoscope images and that the machine performance is superior to that of the human observer.
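
    A brief sketch of one of the texture-feature families mentioned above, gray-level co-occurrence (spatial gray-level dependence) statistics; it assumes scikit-image 0.19 or later, where the functions are named graycomatrix and graycoprops, and uses a random image in place of a microendoscope frame.

        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        img = (np.random.default_rng(2).random((64, 64)) * 255).astype(np.uint8)
        glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                            levels=256, symmetric=True, normed=True)
        for prop in ("contrast", "homogeneity", "energy", "correlation"):
            print(prop, graycoprops(glcm, prop).ravel())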

  3. An Analysis of the Billing and Bookkeeping Machine Operator Occupation.

    ERIC Educational Resources Information Center

    Six, Joseph E., Jr.

    The general purpose of the occupational analysis is to provide workable, basic information dealing with the many and varied duties performed in the billing and bookkeeping machine operating occupation. The analysis was written in general terms due to the diversity in bookkeeping machines on the market, increasing number and variation of the tasks…

  4. Histogram of gradient and binarized statistical image features of wavelet subband-based palmprint features extraction

    NASA Astrophysics Data System (ADS)

    Attallah, Bilal; Serir, Amina; Chahir, Youssef; Boudjelal, Abdelwahhab

    2017-11-01

    Palmprint recognition systems depend on feature extraction. A feature extraction method exploiting higher discrimination information was developed to characterize palmprint images. In this method, two individual feature extraction techniques are applied to a discrete wavelet transform of a palmprint image, and their outputs are fused. The two techniques used in the fusion are the histogram of gradient and the binarized statistical image features. The fused features are then evaluated using an extreme learning machine classifier, together with a feature selection step based on principal component analysis. Three palmprint databases, the Hong Kong Polytechnic University (PolyU) Multispectral Palmprint Database, the Hong Kong PolyU Palmprint Database II, and the Delhi Touchless (IIDT) Palmprint Database, are used in this study. The study shows that our method effectively identifies and verifies palmprints and outperforms other methods based on feature extraction.

  5. Ergonomics for enhancing detection of machine abnormalities.

    PubMed

    Illankoon, Prasanna; Abeysekera, John; Singh, Sarbjeet

    2016-10-17

    Detecting abnormal machine conditions is of great importance in an autonomous maintenance environment. Ergonomic aspects can be invaluable when detection of machine abnormalities using human senses is examined. This research outlines the ergonomic issues involved in detecting machine abnormalities and suggests how ergonomics can improve such detection. A Cognitive Task Analysis was performed in a plant in Sri Lanka where Total Productive Maintenance is being implemented, to identify the sensory types used to detect machine abnormalities and the relevant ergonomic characteristics. As the outcome of this research, a methodology comprising an Ergonomic Gap Analysis Matrix for machine abnormality detection is presented.

  6. Research in image management and access

    NASA Technical Reports Server (NTRS)

    Vondran, Raymond F.; Barron, Billy J.

    1993-01-01

    Presently, the problem of overall library system design has been compounded by the accretion of both function and structure to a basic framework of requirements. While more device power has led to increased functionality, opportunities for reducing system complexity at the user interface level have not always been pursued with equal zeal. The purpose of this book is therefore to set forth and examine these opportunities within the general framework of human factors research in man-machine interfaces. Human factors may be viewed as a series of trade-off decisions among four polarized objectives: machine resources, user specifications, functionality, and user requirements. In the past, a limiting factor was the availability of systems; however, in the last two years, over one hundred libraries supported by many different software configurations have been added to the Internet. This document includes a statistical analysis of human responses to five Internet library systems by key features, the development of an ideal online catalog system, and ideal online catalog systems for libraries and information centers.

  7. Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in SCT.

    PubMed

    Shouval, R; Bondi, O; Mishan, H; Shimoni, A; Unger, R; Nagler, A

    2014-03-01

    Data collected from hematopoietic SCT (HSCT) centers are becoming more abundant and complex owing to the formation of organized registries and the incorporation of biological data. Typically, conventional statistical methods are used for the development of outcome prediction models and risk scores. However, these analyses carry inherent properties that limit their ability to cope with large data sets with multiple variables and samples. Machine learning (ML), a field stemming from artificial intelligence, is part of a wider approach for data analysis termed data mining (DM). It enables prediction in complex data scenarios familiar to practitioners and researchers. Technological and commercial applications are all around us and are gradually entering clinical research. In the following review, we expose hematologists and stem cell transplanters to the concepts, clinical applications, strengths, and limitations of such methods and discuss current research in HSCT. The aim of this review is to encourage utilization of ML and DM techniques in the field of HSCT, including prediction of transplantation outcome and donor selection.

  8. Optical biopsy using fluorescence spectroscopy for prostate cancer diagnosis

    NASA Astrophysics Data System (ADS)

    Wu, Binlin; Gao, Xin; Smith, Jason; Bailin, Jacob

    2017-02-01

    Native fluorescence spectra are acquired from fresh normal and cancerous human prostate tissues. The fluorescence data are analyzed using a multivariate analysis algorithm, non-negative matrix factorization (NMF). The non-negative spectral components are retrieved and attributed to native fluorophores in tissue such as collagen, reduced nicotinamide adenine dinucleotide (NADH), and flavin adenine dinucleotide (FAD). The retrieved weights of the components, e.g., NADH and FAD, are used to estimate the relative concentrations of the native fluorophores and the redox ratio. A machine learning algorithm, the support vector machine (SVM), is used for classification to distinguish normal and cancerous tissue samples based on either the relative concentrations of NADH and FAD or the redox ratio alone. The classification performance is reported using statistical measures such as sensitivity, specificity, and accuracy, along with the area under the receiver operating characteristic (ROC) curve. Leave-one-out cross validation is used to evaluate the predictive performance of the SVM classifier and avoid bias due to overfitting.
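
    A hedged sketch of this pipeline, with synthetic spectra in place of the measured fluorescence data: NMF retrieves non-negative component weights, which then feed an SVM evaluated by leave-one-out cross validation.

        import numpy as np
        from sklearn.decomposition import NMF
        from sklearn.model_selection import LeaveOneOut, cross_val_score
        from sklearn.svm import SVC

        rng = np.random.default_rng(3)
        spectra = rng.random((60, 200))          # 60 samples x 200 wavelengths (toy data)
        labels = rng.integers(0, 2, size=60)     # 0 = normal, 1 = cancerous (toy labels)

        nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
        weights = nmf.fit_transform(spectra)     # per-sample component weights

        acc = cross_val_score(SVC(kernel="linear"), weights, labels,
                              cv=LeaveOneOut()).mean()
        print(f"leave-one-out accuracy: {acc:.2f}")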

  9. Parallel and Scalable Clustering and Classification for Big Data in Geosciences

    NASA Astrophysics Data System (ADS)

    Riedel, M.

    2015-12-01

    Machine learning, data mining, and statistical computing are common techniques for performing analysis in the earth sciences. This contribution focuses on two concrete and widely used data analytics methods suitable for analysing 'big data' in geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which enables the identification of outliers and interesting anomalies. A new open-source parallel and scalable DBSCAN implementation is discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus on the support vector machine (SVM) algorithm, one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation is discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
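
    A minimal single-node DBSCAN illustration (scikit-learn here, not the parallel implementation the contribution describes), showing how the algorithm labels outliers as noise.

        import numpy as np
        from sklearn.cluster import DBSCAN
        from sklearn.datasets import make_blobs

        X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.6, random_state=0)
        X = np.vstack([X, np.random.default_rng(0).uniform(-10, 10, size=(20, 2))])  # add outliers

        labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)   # -1 denotes noise
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        print("clusters:", n_clusters, "| noise points:", int((labels == -1).sum()))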

  10. Rapid prediction of chemical metabolism by human UDP-glucuronosyltransferase isoforms using quantum chemical descriptors derived with the electronegativity equalization method.

    PubMed

    Sorich, Michael J; McKinnon, Ross A; Miners, John O; Winkler, David A; Smith, Paul A

    2004-10-07

    This study aimed to evaluate in silico models based on quantum chemical (QC) descriptors derived using the electronegativity equalization method (EEM) and to assess the use of QC properties to predict chemical metabolism by human UDP-glucuronosyltransferase (UGT) isoforms. Various EEM-derived QC molecular descriptors were calculated for known UGT substrates and nonsubstrates. Classification models were developed using support vector machine and partial least squares discriminant analysis. In general, the most predictive models were generated with the support vector machine. Combining QC and 2D descriptors (from previous work) using a consensus approach resulted in a statistically significant improvement in predictivity (to 84%) over both the QC and 2D models and the other methods of combining the descriptors. EEM-derived QC descriptors were shown to be both highly predictive and computationally efficient. It is likely that EEM-derived QC properties will be generally useful for predicting ADMET and physicochemical properties during drug discovery.

  11. Probiotic-enriched foods and dietary supplement containing SYNBIO positively affects bowel habits in healthy adults: an assessment using standard statistical analysis and Support Vector Machines.

    PubMed

    Silvi, Stefania; Verdenelli, M Cristina; Cecchini, Cinzia; Coman, M Magdalena; Bernabei, M Simonetta; Rosati, Jessica; De Leone, Renato; Orpianesi, Carla; Cresci, Alberto

    2014-12-01

    A randomised, double-blind, placebo-controlled, parallel-group study assessed how daily consumption of the probiotic combination SYNBIO®, administered in probiotic-enriched foods or in a dietary supplement, affected bowel habits in healthy adults. Primary and secondary outcomes gave the overall assessment of bowel well-being; a Psychological General Well-Being Index compiled by participants estimated health-related quality of life, and gastrointestinal tolerance was determined with the Gastrointestinal Symptom Rating Scale. Support Vector Machine models for classification problems were used to validate the total outcomes on bowel well-being. SYNBIO® consumption improved the bowel habits of volunteers consuming the probiotic foods or capsules, while the same effects were not registered in the control groups. The recovery of probiotic bacteria from the faeces of a cohort of 100 subjects in each supplemented group showed the persistence of the strains in the gastrointestinal tract.

  12. Psychological vulnerability and problem gambling: an application of Durand Jacobs' general theory of addictions to electronic gaming machine playing in Australia.

    PubMed

    McCormick, Jessica; Delfabbro, Paul; Denson, Linley A

    2012-12-01

    The aim of this study was to conduct an empirical investigation of the validity of Jacobs' (in J Gambl Behav 2:15-31, 1986) general theory of addictions in relation to gambling problems associated with electronic gaming machines (EGM). Regular EGM gamblers (n = 190) completed a series of standardised measures relating to psychological and physiological vulnerability, substance use, dissociative experiences, early childhood trauma and abuse, and problem gambling (the Problem Gambling Severity Index). Statistical analysis using structural equation modelling revealed clear relationships between childhood trauma and life stressors, psychological vulnerability, dissociative-like experiences, and problem gambling. These findings confirm and extend a previous model validated by Gupta and Derevensky (in J Gambl Stud 14:17-49, 1998) using an adolescent population. The significance of these findings is discussed for existing pathway models of problem gambling, for Jacobs' theory, and for clinicians engaged in assessment and intervention.

  13. Machine learning-based assessment tool for imbalance and vestibular dysfunction with virtual reality rehabilitation system.

    PubMed

    Yeh, Shih-Ching; Huang, Ming-Chun; Wang, Pa-Chun; Fang, Te-Yung; Su, Mu-Chun; Tsai, Po-Yi; Rizzo, Albert

    2014-10-01

    Dizziness is a major consequence of imbalance and vestibular dysfunction. Compared to surgery and drug treatments, balance training is non-invasive and more desirable. However, training exercises are usually tedious, and existing assessment tools are insufficient for rapidly diagnosing a patient's severity. An interactive virtual reality (VR) game-based rehabilitation program that adopted Cawthorne-Cooksey exercises, together with a sensor-based measuring system, was introduced. To verify the therapeutic effect, a clinical experiment with 48 patients and 36 normal subjects was conducted. Quantified balance indices were measured and analyzed by statistical tools and a Support Vector Machine (SVM) classifier. In terms of balance indices, patients who completed the training process improved, and the difference between normal subjects and patients is obvious. Further analysis with the SVM classifier shows that recognizing the differences between patients and normal subjects is feasible, and these results can be used to evaluate patient severity and enable rapid assessment.

  14. Face recognition using an enhanced independent component analysis approach.

    PubMed

    Kwak, Keun-Chang; Pedrycz, Witold

    2007-03-01

    This paper is concerned with an enhanced independent component analysis (ICA) and its application to face recognition. Typically, face representations obtained by ICA involve unsupervised learning and high-order statistics. In this paper, we develop an enhancement of generic ICA by augmenting the method with Fisher linear discriminant analysis (LDA); hence its abbreviation, FICA. The FICA is systematically developed and presented along with its underlying architecture. A comparative analysis explores four distance metrics, as well as classification with support vector machines (SVMs). We demonstrate that the FICA approach leads to the formation of well-separated classes in a low-dimensional subspace and is endowed with a great deal of insensitivity to large variations in illumination and facial expression. Comprehensive experiments were completed on the facial-recognition technology (FERET) face database; a comparative analysis demonstrates that FICA achieves improved classification rates when compared with conventional approaches such as eigenface, fisherface, and ICA itself.

  15. Analysis of machining accuracy during free form surface milling simulation for different milling strategies

    NASA Astrophysics Data System (ADS)

    Matras, A.; Kowalczyk, R.

    2014-11-01

    This work presents an analysis of machining accuracy after free-form surface milling simulations (based on machining EN AW-7075 alloys) for different machining strategies (Level Z, Radial, Square, Circular). The milling simulations were performed using Esprit CAD/CAM software. The accuracy of the obtained allowance is defined as the difference between the theoretical surface of the workpiece (the surface designed in CAD software) and the machined surface after a milling simulation. The difference between the two surfaces describes the roughness that results from the mapping of the tool shape onto the machined surface. The accuracy of the remaining allowance directly indicates the surface quality after finish machining. The described CAD/CAM methodology can shorten the design time of the machining process for free-form surface milling on a 5-axis CNC milling machine, since it avoids having to machine the part physically in order to measure the machining accuracy for the selected strategies and cutting data.

  16. Feature and Statistical Model Development in Structural Health Monitoring

    NASA Astrophysics Data System (ADS)

    Kim, Inho

    All structures suffer wear and tear caused by impact, excessive load, fatigue, corrosion, etc., in addition to inherent defects from their manufacturing processes and their exposure to various environmental effects. These structural degradations are often imperceptible, but they can severely affect the structural performance of a component, thereby severely decreasing its service life. Although previous studies of Structural Health Monitoring (SHM) have produced extensive knowledge on parts of the SHM process, such as operational evaluation, data processing, and feature extraction, few studies have addressed it from the perspective of the final, systematic step: statistical model development. The first part of this dissertation reviews ultrasonic guided-wave-based structural health monitoring problems in terms of the characteristics of inverse scattering problems, such as ill-posedness and nonlinearity. The distinctive features and the selection of the analysis domain are investigated by analytically deriving the conditions for uniqueness of the solution to the ill-posed problem, and are validated experimentally. Based on the distinctive features, a novel wave packet tracing (WPT) method for damage localization and size quantification is presented. This method involves creating time-space representations of the guided Lamb waves (GLWs), collected at a series of locations with a spatially dense distribution along paths at pre-selected angles with respect to the direction normal to the direction of wave propagation. The fringe patterns due to wave dispersion, which depends on the phase velocity, are selected as the primary features that carry information regarding wave propagation and scattering. The following part of this dissertation presents a novel damage-localization framework using a fully automated process. In order to construct the statistical model for autonomous damage localization, deep-learning techniques such as the restricted Boltzmann machine and the deep belief network are trained and utilized to interpret nonlinear far-field wave patterns. Next, a novel bridge scour estimation approach that combines the advantages of empirical and data-driven models is developed. Two field datasets from the literature are used, and a Support Vector Machine (SVM), a machine-learning algorithm, is used to fuse the field data samples and classify the data with physical phenomena. The Fast Non-dominated Sorting Genetic Algorithm (NSGA-II) is evaluated on the model performance objective functions to search for Pareto-optimal fronts.

  17. Splendidly blended: a machine learning set up for CDU control

    NASA Astrophysics Data System (ADS)

    Utzny, Clemens

    2017-06-01

    While the concepts of machine learning and artificial intelligence continue to grow in importance in internet-related applications, their use is still in its infancy when it comes to process control within the semiconductor industry. The branch of mask manufacturing in particular presents a challenge to machine learning, since the business process intrinsically induces pronounced product variability against a background of small plate numbers. In this paper we present the architectural setup of a machine learning algorithm that successfully deals with the demands and pitfalls of mask manufacturing. A detailed motivation of this basic setup is given, followed by an analysis of its statistical properties. The machine learning setup for mask manufacturing involves two learning steps: an initial step identifies and classifies the basic global CD patterns of a process, and its results form the basis for the extraction of an optimized training set via balanced sampling. A second learning step uses this training set to obtain the local as well as global CD relationships induced by the manufacturing process. Using two production-motivated examples we show how this approach is flexible and powerful enough to deal with the exacting demands of mask manufacturing. In one example we show how dedicated covariates can be used in conjunction with increased spatial resolution of the CD map model to deal with pathological CD effects at the mask boundary. The other example shows how the model setup enables strategies for dealing with tool-specific CD signature differences. In this case the balanced sampling enables a process control scheme that allows usage of the full tool park within the specified tight tolerance budget. Overall, this paper shows that the current rapid development of machine learning algorithms can be successfully exploited within the context of semiconductor manufacturing.

  18. Bioinformatics in proteomics: application, terminology, and pitfalls.

    PubMed

    Wiemer, Jan C; Prokudin, Alexander

    2004-01-01

    Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages machine learning approaches, such as artificial neural networks, decision trees, and clustering algorithms, and is ideally suited for handling huge amounts of data. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating overly complex single decision trees. Finally, we discuss the pros and cons of the two different ways of using decision trees.
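
    The overfitting pitfall can be made concrete with a small, hedged experiment: on noisy labels, an unconstrained single decision tree tends to memorize noise, while a tree ensemble generalizes better (the data here are synthetic).

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                                   flip_y=0.2, random_state=0)  # 20% label noise
        deep_tree = DecisionTreeClassifier(random_state=0)      # grows until leaves are pure
        forest = RandomForestClassifier(n_estimators=200, random_state=0)

        print("single deep tree:", cross_val_score(deep_tree, X, y, cv=5).mean())
        print("tree ensemble:   ", cross_val_score(forest, X, y, cv=5).mean())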

  19. Prediction of Muscle Performance During Dynamic Repetitive Exercise

    NASA Technical Reports Server (NTRS)

    Byerly, D. L.; Byerly, K. A.; Sognier, M. A.; Squires, W. G.

    2002-01-01

    A method for predicting human muscle performance was developed. Eight test subjects performed a repetitive dynamic exercise to failure using a Lordex spinal machine. Electromyography (EMG) data was collected from the erector spinae. Evaluation of the EMG data using a 5th order Autoregressive (AR) model and statistical regression analysis revealed that an AR parameter, the mean average magnitude of AR poles, can predict performance to failure as early as the second repetition of the exercise. Potential applications to the space program include evaluating on-orbit countermeasure effectiveness, maximizing post-flight recovery, and future real-time monitoring capability during Extravehicular Activity.
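
    A hedged sketch of the AR-pole computation described above: fit a 5th-order autoregressive model (statsmodels) to a signal and take the mean magnitude of the AR poles; the synthetic noise below merely stands in for a real EMG record.

        import numpy as np
        from statsmodels.tsa.ar_model import AutoReg

        rng = np.random.default_rng(4)
        signal = rng.normal(size=2000)           # stand-in for one EMG recording

        model = AutoReg(signal, lags=5).fit()    # 5th-order AR fit (constant + 5 lags)
        ar_coeffs = model.params[1:]             # drop the intercept term
        # Poles are the roots of z^5 - a1*z^4 - ... - a5 = 0.
        poles = np.roots(np.concatenate(([1.0], -ar_coeffs)))
        print("mean pole magnitude:", np.abs(poles).mean())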

  20. feets: feATURE eXTRACTOR for tIME sERIES

    NASA Astrophysics Data System (ADS)

    Cabral, Juan; Sanchez, Bruno; Ramos, Felipe; Gurovich, Sebastián; Granitto, Pablo; VanderPlas, Jake

    2018-06-01

    feets characterizes and analyzes light-curves from astronomical photometric databases for modelling, classification, data cleaning, outlier detection and data analysis. It uses machine learning algorithms to determine the numerical descriptors that characterize and distinguish the different variability classes of light-curves; these range from basic statistical measures such as the mean or standard deviation to complex time-series characteristics such as the autocorrelation function. The library is not restricted to the astronomical field and could also be applied to any kind of time series. This project is a derivative work of FATS (ascl:1711.017).

  1. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    NASA Astrophysics Data System (ADS)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach combining Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for the spatial prediction of landslides. This model combines the RSS method, an efficient ensemble technique, with CART, a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi-square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of the model, ten important landslide-affecting factors related to geomorphology, geology, and geo-environment were considered, namely slope angle, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) was the best compared with the other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
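
    A hedged approximation of the RSSCART idea using scikit-learn (version 1.2 or later is assumed for the estimator parameter name): a bagging ensemble of CART trees where each tree sees a random subset of the features, which is the essence of the Random Subspace method; the data are synthetic stand-ins for the conditioning factors.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                                   random_state=0)  # stand-in for landslide factors

        rss_cart = BaggingClassifier(
            estimator=DecisionTreeClassifier(),     # CART base classifier
            n_estimators=100,
            bootstrap=False,                        # keep every training sample...
            max_features=0.5,                       # ...but give each tree a random feature subset
            random_state=0)
        print("ROC AUC:", cross_val_score(rss_cart, X, y, cv=5, scoring="roc_auc").mean())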

  2. Predicting Refractive Surgery Outcome: Machine Learning Approach With Big Data.

    PubMed

    Achiron, Asaf; Gur, Zvi; Aviv, Uri; Hilely, Assaf; Mimouni, Michael; Karmona, Lily; Rokach, Lior; Kaiserman, Igor

    2017-09-01

    To develop a decision forest for prediction of laser refractive surgery outcome. Data from consecutive cases of patients who underwent LASIK or photorefractive surgeries during a 12-year period in a single center were assembled into a single dataset. Training and testing of machine-learning classifiers were performed with a statistical classifier algorithm. The decision forest was created from feature vectors extracted from 17,592 cases and 38 clinical parameters for each patient. A 10-fold cross-validation procedure was applied to estimate the predictive value of the decision forest when applied to new patients. Analysis included patients younger than 40 years who were not treated for monovision. Efficacy of 0.7 or greater and 0.8 or greater was achieved in 16,198 (92.0%) and 14,945 (84.9%) eyes, respectively. Efficacy of less than 0.4 and less than 0.5 was achieved in 322 (1.8%) and 506 (2.9%) eyes, respectively. Patients in the low efficacy group (< 0.4) had statistically significant differences compared with the high efficacy group (≥ 0.8), yet were clinically similar (mean differences between groups of 0.7 years in age, 0.43 mm in pupil size, 0.11 D in cylinder, 0.22 logMAR in preoperative CDVA, 0.11 mm in optical zone size, 1.03 D in actual sphere treatment, and 0.64 D in actual cylinder treatment). The preoperative subjective CDVA had the highest gain (most important to the model). Correlation analysis revealed significantly decreased efficacy with increased age (r = -0.67, P < .001), central corneal thickness (r = -0.40, P < .001), mean keratometry (r = -0.33, P < .001), and preoperative CDVA (r = -0.47, P < .001). Efficacy increased with pupil size (r = 0.20, P < .001). This model could support clinical decision making and may lead to better individual risk assessment. Expanding the role of machine learning in analyzing big data from refractive surgeries may be of interest. [J Refract Surg. 2017;33(9):592-597.].

  3. Crabbing System for an Electron-Ion Collider

    NASA Astrophysics Data System (ADS)

    Castilla, Alejandro

    As high energy and nuclear physicists continue to push the boundaries of knowledge using colliders, there is an imperative need not only to increase the colliding beams' energies, but also to improve the accuracy of the experiments and to collect a large quantity of events with good statistical sensitivity. To achieve the latter, it is necessary to collect more data by increasing the rate at which these processes are produced and detected in the machine. This rate of events depends directly on the machine's luminosity. The luminosity itself is proportional to the frequency at which the beams are delivered and the number of particles in each beam, and inversely proportional to the cross-sectional size of the colliding beams. There are several approaches other than increasing the luminosity that can be considered to increase the event statistics in a collider, such as running the experiments for a longer time. However, this also raises operating expenses, while increasing the frequency at which the beams are delivered implies substantial physical changes along the accelerator and the detectors. It is therefore preferable to increase the beam intensities and reduce the beams' cross-sectional areas to achieve higher luminosities. Where the goal is to push the limits, sometimes even beyond the machine's design parameters, one must develop a detailed high luminosity scheme. Any high luminosity scheme on a modern collider considers, in one of its versions, the use of crab cavities to correct the geometric reduction of the luminosity due to the beams' crossing angle. In this dissertation, we present the design and testing of a proof-of-principle compact superconducting crab cavity, at 750 MHz, for the future electron-ion collider currently under design at Jefferson Lab. In addition to the design and validation of the cavity prototype, we present an analysis of the first-order beam dynamics and the integration of the crabbing systems into the interaction region. Following this, we propose the concept of twin crab cavities to allow machines with variable beam transverse coupling in the interaction region to have full crabbing in only the desired plane. Finally, we present recommendations for extending this work to other frequencies.

  4. Surface roughness analysis after laser assisted machining of hard to cut materials

    NASA Astrophysics Data System (ADS)

    Przestacki, D.; Jankowiak, M.

    2014-03-01

    Metal matrix composites and Si3N4 ceramics are very attractive materials for various industrial applications due to their extremely high hardness and abrasive wear resistance. However, because of these features they are problematic for the conventional turning process. Machining on a classic lathe still requires special polycrystalline diamond (PCD) or cubic boron nitride (CBN) cutting inserts, which are very expensive. This paper presents an experimental surface roughness analysis of laser assisted machining (LAM) for two types of hard-to-cut materials. In LAM, the surface of the workpiece is heated directly by a laser beam in order to facilitate the decohesion of the material. The surface analysis concentrates on the influence of laser assisted machining on the surface quality of the silicon nitride ceramic Si3N4 and a metal matrix composite (MMC). The effect of laser assisted machining was compared to conventional machining, and the influence of the machining parameters on surface roughness parameters was also investigated. The 3D surface topographies were measured using an optical surface profiler, and the power spectral density (PSD) of the roughness profiles was analyzed.

  5. Human factors model concerning the man-machine interface of mining crewstations

    NASA Technical Reports Server (NTRS)

    Rider, James P.; Unger, Richard L.

    1989-01-01

    The U.S. Bureau of Mines is developing a computer model to analyze the human factors aspects of mining machine operator compartments. The model will be used as a research tool and as a design aid. It will have the capability to perform the following: simulated anthropometric or reach assessment, visibility analysis, illumination analysis, structural analysis of the protective canopy, operator fatigue analysis, and computation of an ingress-egress rating. The model will make extensive use of graphics to simplify data input and output. Two-dimensional orthographic projections of the machine and its operator compartment are digitized, and the data are rebuilt into a three-dimensional representation of the mining machine. Anthropometric data from either an individual or any size population may be used. The model is intended for use by equipment manufacturers and mining companies during initial design work on new machines. In addition to its use in machine design, the model should prove helpful as an accident investigation tool and for determining the effects of machine modifications made in the field on the critical areas of visibility and control reachability.

  6. Applications of Support Vector Machines In Chemo And Bioinformatics

    NASA Astrophysics Data System (ADS)

    Jayaraman, V. K.; Sundararajan, V.

    2010-10-01

    Conventional linear and nonlinear tools for classification, regression, and data-driven modeling are being replaced at a rapid pace by newer techniques and tools based on artificial intelligence and machine learning. While linear techniques are not applicable to inherently nonlinear problems, the newer methods serve as attractive alternatives for solving real-life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network-based classification algorithms formulated from statistical learning theory and the structural risk minimization principle. SVM regression closely follows the classification methodology. In this work, recent applications of SVM in chemoinformatics and bioinformatics are described with suitable illustrative examples.

  7. Design features and results from fatigue reliability research machines.

    NASA Technical Reports Server (NTRS)

    Lalli, V. R.; Kececioglu, D.; Mcconnell, J. B.

    1971-01-01

    The design, fabrication, development, operation, calibration and results from reversed bending combined with steady torque fatigue research machines are presented. Fifteen-centimeter long, notched, SAE 4340 steel specimens are subjected to various combinations of these stresses and cycled to failure. Failure occurs when the crack in the notch passes through the specimen automatically shutting down the test machine. These cycles-to-failure data are statistically analyzed to develop a probabilistic S-N diagram. These diagrams have many uses; a rotating component design example given in the literature shows that minimum size and weight for a specified number of cycles and reliability can be calculated using these diagrams.

  8. Progress with modeling activity landscapes in drug discovery.

    PubMed

    Vogt, Martin

    2018-04-19

    Activity landscapes (ALs) are representations and models of compound data sets annotated with a target-specific activity. In contrast to quantitative structure-activity relationship (QSAR) models, ALs aim at characterizing structure-activity relationships (SARs) on a large-scale level encompassing all active compounds for specific targets. The popularity of AL modeling has grown substantially with the public availability of large activity-annotated compound data sets. AL modeling crucially depends on molecular representations and similarity metrics used to assess structural similarity. Areas covered: The concepts of AL modeling are introduced and its basis in quantitatively assessing molecular similarity is discussed. The different types of AL modeling approaches are introduced. AL designs can broadly be divided into three categories: compound-pair based, dimensionality reduction, and network approaches. Recent developments for each of these categories are discussed focusing on the application of mathematical, statistical, and machine learning tools for AL modeling. AL modeling using chemical space networks is covered in more detail. Expert opinion: AL modeling has remained a largely descriptive approach for the analysis of SARs. Beyond mere visualization, the application of analytical tools from statistics, machine learning and network theory has aided in the sophistication of AL designs and provides a step forward in transforming ALs from descriptive to predictive tools. To this end, optimizing representations that encode activity relevant features of molecules might prove to be a crucial step.

  9. Linear Discriminant Analysis Achieves High Classification Accuracy for the BOLD fMRI Response to Naturalistic Movie Stimuli

    PubMed Central

    Mandelkow, Hendrik; de Zwart, Jacco A.; Duyn, Jeff H.

    2016-01-01

    Naturalistic stimuli like movies evoke complex perceptual processes, which are of great interest in the study of human cognition by functional MRI (fMRI). However, conventional fMRI analysis based on statistical parametric mapping (SPM) and the general linear model (GLM) is hampered by a lack of accurate parametric models of the BOLD response to complex stimuli. In this situation, statistical machine-learning methods, a.k.a. multivariate pattern analysis (MVPA), have received growing attention for their ability to generate stimulus response models in a data-driven fashion. However, machine-learning methods typically require large amounts of training data as well as computational resources. In the past, this has largely limited their application to fMRI experiments involving small sets of stimulus categories and small regions of interest in the brain. By contrast, the present study compares several classification algorithms known as Nearest Neighbor (NN), Gaussian Naïve Bayes (GNB), and (regularized) Linear Discriminant Analysis (LDA) in terms of their classification accuracy in discriminating the global fMRI response patterns evoked by a large number of naturalistic visual stimuli presented as a movie. Results show that LDA regularized by principal component analysis (PCA) achieved high classification accuracies, above 90% on average for single fMRI volumes acquired 2 s apart during a 300 s movie (chance level 0.7% = 2 s/300 s). The largest source of classification errors was autocorrelation in the BOLD signal, compounded by the similarity of consecutive stimuli. All classifiers performed best when given input features from a large region of interest comprising around 25% of the voxels that responded significantly to the visual stimulus. Consistent with this, the most informative principal components represented widespread distributions of co-activated brain regions that were similar between subjects and may represent functional networks. In light of these results, the combination of naturalistic movie stimuli and classification analysis in fMRI experiments may prove to be a sensitive tool for the assessment of changes in natural cognitive processes under experimental manipulation. PMID:27065832
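
    A compact sketch of the best-performing configuration reported above, LDA regularized by PCA, written as a scikit-learn pipeline over synthetic stand-ins for fMRI volumes and stimulus classes.

        from sklearn.datasets import make_classification
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline

        X, y = make_classification(n_samples=300, n_features=2000, n_informative=50,
                                   n_classes=5, n_clusters_per_class=1, random_state=0)

        clf = make_pipeline(PCA(n_components=100), LinearDiscriminantAnalysis())
        print("accuracy:", cross_val_score(clf, X, y, cv=5).mean())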

  10. Applying Sparse Machine Learning Methods to Twitter: Analysis of the 2012 Change in Pap Smear Guidelines. A Sequential Mixed-Methods Study.

    PubMed

    Lyles, Courtney Rees; Godbehere, Andrew; Le, Gem; El Ghaoui, Laurent; Sarkar, Urmimala

    2016-06-10

    It is difficult to synthesize the vast amount of textual data available from social media websites. Capturing real-world discussions via social media could provide insights into individuals' opinions and decision-making processes. We conducted a sequential mixed-methods study to determine the utility of sparse machine learning techniques in summarizing Twitter dialogues. We chose a narrowly defined topic for this approach: cervical cancer discussions over a 6-month period surrounding a change in Pap smear screening guidelines. We applied statistical methodologies known as sparse machine learning algorithms to summarize Twitter messages about cervical cancer before and after the 2012 change in Pap smear screening guidelines by the US Preventive Services Task Force (USPSTF). All messages containing the search terms "cervical cancer," "Pap smear," and "Pap test" were analyzed during: (1) January 1-March 13, 2012, and (2) March 14-June 30, 2012. Topic modeling was used to discern the most common topics in each time period and to determine the singular value criterion for each topic. The top 10 relevant topics were then qualitatively coded to determine how well the clustering method grouped distinct ideas, and how the discussion differed before vs. after the change in guidelines. This machine learning method was effective in grouping the relevant discussion topics about cervical cancer during the respective time periods (~20% overall irrelevant content in both periods). Qualitative analysis determined that a significant portion of the top discussion topics in the second time period directly reflected the USPSTF guideline change (e.g., "New Screening Guidelines for Cervical Cancer"), and many topics in both periods addressed basic screening promotion and education (e.g., "It is Cervical Cancer Awareness Month! Click the link to see where you can receive a free or low cost Pap test."). We demonstrated that machine learning tools can be useful in cervical cancer prevention and screening discussions on Twitter: the method showed that significant information about cervical cancer screening is publicly available on social media sites. Moreover, we observed a direct impact of the guideline change within the Twitter messages.
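
    As a hedged sketch of topic modeling over short messages (NMF on TF-IDF is one common sparse formulation; the paper's exact algorithm may differ), the toy example below extracts two topics from four invented tweets.

        from sklearn.decomposition import NMF
        from sklearn.feature_extraction.text import TfidfVectorizer

        tweets = ["new screening guidelines for cervical cancer announced",
                  "free pap test this cervical cancer awareness month",
                  "pap smear every 3 years under the new guidelines",
                  "get your pap test at a free or low cost clinic"]

        tfidf = TfidfVectorizer(stop_words="english")
        X = tfidf.fit_transform(tweets)

        nmf = NMF(n_components=2, random_state=0)
        weights = nmf.fit_transform(X)               # tweet-by-topic weights
        terms = tfidf.get_feature_names_out()
        for k, topic in enumerate(nmf.components_):
            top = topic.argsort()[-4:][::-1]         # four strongest terms per topic
            print(f"topic {k}:", ", ".join(terms[i] for i in top))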

  11. Improved Hierarchical Optimization-Based Classification of Hyperspectral Images Using Shape Analysis

    NASA Technical Reports Server (NTRS)

    Tarabalka, Yuliya; Tilton, James C.

    2012-01-01

    A new spectral-spatial method for the classification of hyperspectral images is proposed. The HSegClas method is based on the integration of probabilistic classification and shape analysis within a hierarchical step-wise optimization algorithm. First, probabilistic support vector machine classification is applied. Then, at each iteration, the two neighboring regions with the smallest Dissimilarity Criterion (DC) are merged, and the classification probabilities are recomputed. An important contribution of this work consists in estimating the DC between regions as a function of statistical, classification, and geometrical (area and rectangularity) features. Experimental results are presented for a 102-band ROSIS image of the center of Pavia, Italy. The developed approach yields more accurate classification results when compared to previously proposed methods.

  12. Complete scanpaths analysis toolbox.

    PubMed

    Augustyniak, Piotr; Mikrut, Zbigniew

    2006-01-01

    This paper presents a complete open software environment for the control, data processing, and assessment of visual experiments. Visual experiments are widely used in research on the physiology of human perception, and the results are applicable to visual information-based man-machine interfacing, human-emulated automatic visual systems, and scanpath-based learning of perceptual habits. The toolbox is designed for the Matlab platform and supports an infra-red reflection-based eyetracker in calibration and scanpath analysis modes. Toolbox procedures are organized in three layers: the lower one communicates with the eyetracker output file, the middle one detects scanpath events on a physiological background, and the upper one consists of experiment schedule scripts, statistics, and summaries. Several examples of visual experiments carried out with the presented toolbox complete the paper.

  13. Machine learning applications in genetics and genomics.

    PubMed

    Libbrecht, Maxwell W; Noble, William Stafford

    2015-06-01

    The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.

  14. [Application of chemometrics in composition-activity relationship research of traditional Chinese medicine].

    PubMed

    Han, Sheng-Nan

    2014-07-01

    Chemometrics is a new branch of chemistry that is widely applied in many fields of analytical chemistry. Chemometrics uses theories and methods from mathematics, statistics, computer science, and other related disciplines to optimize the chemical measurement process and to maximize the chemical and other information extracted from measurement data on material systems. In recent years, traditional Chinese medicine has attracted widespread attention. In research on traditional Chinese medicine, a key problem has been how to interpret the relationship between the various chemical components and the medicine's efficacy, which seriously restricts the modernization of Chinese medicine. Because chemometrics brings multivariate analysis methods into chemical research, it has become an effective research tool in composition-activity relationship research on Chinese medicine. This article reviews applications of chemometrics methods in composition-activity relationship research in recent years. The applications of multivariate statistical analysis methods (such as regression analysis, correlation analysis, and principal component analysis) and artificial neural networks (such as the back-propagation neural network, the radial basis function neural network, and the support vector machine) are summarized, including their fundamental principles, research content, and advantages and disadvantages. Finally, the main existing problems and prospects for future research are discussed.

  15. Evaluation of Cepstrum Algorithm with Impact Seeded Fault Data of Helicopter Oil Cooler Fan Bearings and Machine Fault Simulator Data

    DTIC Science & Technology

    2013-02-01

    of a bearing must be put into practice. There are many potential methods, the most traditional being the use of statistical time-domain features...accelerate degradation to test multiple bearings to gain statistical relevance and extrapolate results to scale for field conditions. Temperature...as time statistics, frequency estimation to improve the fault frequency detection. For future investigations, one can further explore the

  16. Improving Statistical Machine Translation Through N-best List Re-ranking and Optimization

    DTIC Science & Technology

    2014-03-27

    of Master of Science in Cyber Operations Jordan S. Keefer, B.S.C.S. Second Lieutenant, USAF March 2014 DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC...Atlantic Trade Organization NIST National Institute of Standards and Technology NL natural language NSF National Science Foundation...the machine translation problem. In 1964 the Director of the National Science Foundation (NSF), Dr. Leland Haworth, commissioned a research team to

  17. Motion Simulation Analysis of Rail Weld CNC Fine Milling Machine

    NASA Astrophysics Data System (ADS)

    Mao, Huajie; Shu, Min; Li, Chao; Zhang, Baojun

    The CNC fine milling machine is new advanced equipment for the precision machining of rail welds, offering high precision, high efficiency, low environmental pollution and other technical advantages. The motion performance of this machine directly affects its machining accuracy and stability, making it an important design consideration. Based on the design drawings, this article presents the 3D modeling of a CNC fine milling machine for welds in 60 kg/m rail using Solidworks. The geometry was then imported into Adams for motion simulation analysis. The displacement, velocity, angular velocity and other kinematic parameter curves of the main components were obtained in post-processing; these provide a scientific basis for the design and development of this machine.

  18. The Evaluation of Efficiency of the Use of Machine Working Time in the Industrial Company - Case Study

    NASA Astrophysics Data System (ADS)

    Kardas, Edyta; Brožova, Silvie; Pustějovská, Pavlína; Jursová, Simona

    2017-12-01

    This paper presents an evaluation of the efficiency of machine use in a selected production company. The OEE (Overall Equipment Effectiveness) method was used for the analysis. The selected company produces tapered roller bearings, and the effectiveness analysis covered 17 automatic grinding lines working in the roller-grinding department. The low efficiency of the machines was driven by problems with the availability of machines and devices. The causes of machine downtime on these lines were also analyzed, and three main causes were identified: no kanban card, diamonding, and no operator. Ways to improve the use of these machines are suggested. The analysis is based on actual results from the production process and covers a period of one calendar year.
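
    For reference, the OEE figure at the heart of such an analysis is the product of three ratios: availability, performance and quality. A minimal worked sketch in Python, with purely illustrative numbers rather than the values measured on the grinding lines:

    ```python
    def oee(planned_time, downtime, ideal_cycle_time, total_count, good_count):
        """Overall Equipment Effectiveness = Availability x Performance x Quality."""
        run_time = planned_time - downtime
        availability = run_time / planned_time                      # uptime share
        performance = (ideal_cycle_time * total_count) / run_time   # speed share
        quality = good_count / total_count                          # yield share
        return availability * performance * quality

    # Illustrative shift: 480 min planned, 60 min down (e.g., no kanban card,
    # diamonding, no operator), 0.2 min ideal cycle, 1800 parts, 1750 good.
    print(round(oee(480, 60, 0.2, 1800, 1750), 3))   # ~0.729
    ```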

  19. Assessing a Novel Method to Reduce Anesthesia Machine Contamination: A Prospective, Observational Trial.

    PubMed

    Biddle, Chuck J; George-Gay, Beverly; Prasanna, Praveen; Hill, Emily M; Davis, Thomas C; Verhulst, Brad

    2018-01-01

    Anesthesia machines are known reservoirs of bacterial species, potentially contributing to healthcare-associated infections (HAIs). An inexpensive, disposable, nonpermeable, transparent anesthesia machine wrap (AMW) may reduce microbial contamination of the anesthesia machine. This study quantified the density and diversity of bacterial species found on anesthesia machines after terminal cleaning and between cases during actual anesthesia care to assess the impact of the AMW. We hypothesized a reduced bioburden with the use of the AMW. In a prospective, experimental research design, the AMW was used in 11 surgical cases (intervention group) and not used in 11 control surgical cases. Cases were consecutively assigned to general surgical operating rooms. Seven frequently touched and difficult-to-disinfect "hot spots" were cultured on each machine before and after each case. The density and diversity of cultured colony-forming units (CFUs) on covered and uncovered machines were compared using the Wilcoxon signed-rank test and Student's t-tests. There was a statistically significant reduction in CFU density and diversity when the AMW was employed. The protective effect of the AMW during regular anesthetic care provides a reliable and low-cost method to minimize the transmission of pathogens across patients and potentially reduces HAIs.

  20. Design and Analysis of Linear Fault-Tolerant Permanent-Magnet Vernier Machines

    PubMed Central

    Xu, Liang; Liu, Guohai; Du, Yi; Liu, Hu

    2014-01-01

    This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both the PMs and the windings of the proposed machine are on the short mover, while the long stator is manufactured from iron only. Hence, the proposed machine is very suitable for long-stroke system applications. The key feature of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed machine offers an improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, unlike existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density, because neither fault-tolerant teeth nor flux barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis. PMID:24982959

  1. Design and analysis of linear fault-tolerant permanent-magnet vernier machines.

    PubMed

    Xu, Liang; Ji, Jinghua; Liu, Guohai; Du, Yi; Liu, Hu

    2014-01-01

    This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both the PMs and the windings of the proposed machine are on the short mover, while the long stator is manufactured from iron only. Hence, the proposed machine is very suitable for long-stroke system applications. The key feature of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed machine offers an improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, unlike existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density, because neither fault-tolerant teeth nor flux barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis.

  2. Analysis of NREL Cold-Drink Vending Machines for Energy Savings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deru, M.; Torcellini, P.; Bottom, K.

    As part of Sustainable NREL, an initiative to improve the overall energy and environmental performance of the lab, NREL staff decided to control how its vending machines used energy. The cold-drink vending machines across the lab were analyzed for potential energy-savings opportunities. This report presents the monitoring and analysis of two energy conservation measures applied to the cold-drink vending machines at NREL.

  3. Reliability Centred Maintenance (RCM) Analysis of Laser Machine in Filling Lithos at PT X

    NASA Astrophysics Data System (ADS)

    Suryono, M. A. E.; Rosyidi, C. N.

    2018-03-01

    PT X operates automated machines for sixteen hours per day; the machines must therefore be maintained to preserve their availability. The aim of this research is to determine maintenance tasks matched to the causes of component failure using Reliability Centred Maintenance (RCM) and to determine the optimal inspection frequency for the machines in the filling lithos process. In this research, RCM is used as an analysis tool to identify the critical component and to find optimal inspection frequencies that maximize machine reliability. From the analysis, we found that the critical machine in the filling lithos process is the laser machine in Line 2. We then determined the causes of the machine's failures. The lastube component has the highest Risk Priority Number (RPN) among components including the power supply, lens, chiller, laser siren, encoder, conveyor, and mirror galvo. Most of the components have operational consequences, and the others have hidden-failure and safety consequences. Time-directed life-renewal tasks, failure-finding tasks, and servicing tasks can be used to address these consequences. The data analysis shows that preventive-maintenance inspection of the laser machine must be performed once a month to lower downtime.

  4. A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation.

    PubMed

    Tran, Phuoc; Dinh, Dien; Nguyen, Hien T

    2016-01-01

    Chinese and Vietnamese are both isolating languages in which words are not delimited by spaces. In machine translation, word segmentation is usually performed first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, whether words should be segmented is an open question when translating between two languages that do not use spaces between words, such as Chinese and Vietnamese. Because Chinese-Vietnamese is a low-resource language pair, the sparse-data problem is evident in translation systems for this pair, which makes the segmentation decision all the more important. In this paper, we propose a new method for translating Chinese to Vietnamese that combines the advantages of character-level and word-level translation. A hybrid approach combining statistics and rules is used at the word level, while purely statistical translation is used at the character level. The experimental results show that our method improved machine translation performance over both character-level and word-level translation alone.

  5. Effect of overglazed and polished surface finishes on the compressive fracture strength of machinable ceramic materials.

    PubMed

    Asai, Tetsuya; Kazama, Ryunosuke; Fukushima, Masayoshi; Okiji, Takashi

    2010-11-01

    Controversy prevails over the effect of overglazing on the fracture strength of ceramic materials. Therefore, the effects of different surface finishes on the compressive fracture strength of machinable ceramic materials were investigated in this study. Plates prepared from four commercial brands of ceramic materials were either surface-polished or overglazed (n=10 per ceramic material for each surface finish), and bonded to flat surfaces of human dentin using a resin cement. Loads at failure were determined and statistically analyzed using two-way ANOVA and Bonferroni test. Although no statistical differences in load value were detected between polished and overglazed groups (p>0.05), the fracture load of Vita Mark II was significantly lower than those of ProCAD and IPS Empress CAD, whereas that of IPS e.max CAD was significantly higher than the latter two ceramic materials (p<0.05). It was concluded that overglazed and polished surfaces produced similar compressive fracture strengths irrespective of the machinable ceramic material tested, and that fracture strength was material-dependent.

  6. Applications of machine learning and data mining methods to detect associations of rare and common variants with complex traits.

    PubMed

    Lu, Ake Tzu-Hui; Austin, Erin; Bonner, Ashley; Huang, Hsin-Hsiung; Cantor, Rita M

    2014-09-01

    Machine learning methods (MLMs), designed to develop models using high-dimensional predictors, have been used to analyze genome-wide genetic and genomic data to predict risks for complex traits. We summarize the results from six contributions to our Genetic Analysis Workshop 18 working group; these investigators applied MLMs and data mining to analyses of rare and common genetic variants measured in pedigrees. To develop risk profiles, group members analyzed blood pressure traits along with single-nucleotide polymorphisms and rare variant genotypes derived from sequence and imputation analyses in large Mexican American pedigrees. Supervised MLMs included penalized regression with varying penalties, support vector machines, and permanental classification. Unsupervised MLMs included sparse principal components analysis and sparse graphical models. Entropy-based components analyses were also used to mine these data. None of the investigators fully capitalized on the genetic information provided by the complete pedigrees. Their approaches either corrected for the nonindependence of the individuals within the pedigrees or analyzed only those who were independent. Some methods allowed for covariate adjustment, whereas others did not. We evaluated these methods using a variety of metrics. Four contributors conducted primary analyses on the real data, and the other two research groups used the simulated data with and without knowledge of the underlying simulation model. One group used the answers to the simulated data to assess power and type I errors. Although the MLMs applied were substantially different, each research group concluded that MLMs have advantages over standard statistical approaches with these high-dimensional data. © 2014 WILEY PERIODICALS, INC.

  7. Reducing lumber thickness variation using real-time statistical process control

    Treesearch

    Thomas M. Young; Brian H. Bond; Jan Wiedenbeck

    2002-01-01

    A technology feasibility study for reducing lumber thickness variation was conducted from April 2001 until March 2002 at two sawmills located in the southern U.S. A real-time statistical process control (SPC) system was developed that featured Wonderware human-machine interface (HMI) technology with distributed real-time control charts for all sawing centers and...

  8. [Discrimination of types of polyacrylamide based on near infrared spectroscopy coupled with least square support vector machine].

    PubMed

    Zhang, Hong-Guang; Yang, Qin-Min; Lu, Jian-Gang

    2014-04-01

    In this paper, a novel discrimination methodology based on near-infrared spectroscopy and the least-squares support vector machine is proposed for the rapid and nondestructive discrimination of different types of polyacrylamide. The diffuse reflectance spectra of samples of non-ionic, anionic and cationic polyacrylamide were measured. Principal component analysis was then applied to reduce the dimension of the spectral data and extract the principal components. The first three principal components were used for cluster analysis of the three types of polyacrylamide, and these components also served as inputs to the least-squares support vector machine model. The model parameters and the number of principal components used as inputs were optimized through cross-validation based on grid search. Sixty samples of each type of polyacrylamide were collected, giving 180 samples in total. Of these, 135 samples (45 of each type) were randomly split into a training set to build the calibration model, and the remaining 45 samples were used as a test set to evaluate the performance of the developed model. In addition, 5 cationic and 5 anionic polyacrylamide samples adulterated with different proportions of non-ionic polyacrylamide were prepared to show the feasibility of the proposed method for discriminating adulterated samples. The prediction-error threshold for each type of polyacrylamide was determined by an F-test based on the cross-validation prediction errors of the corresponding training samples. The discrimination accuracy of the built model was 100% on the test set, and all 10 adulterated samples were correctly identified as adulterated. The overall results demonstrate that the proposed method can rapidly and nondestructively discriminate the different types of polyacrylamide as well as adulterated polyacrylamide samples, offering a new approach to discriminating polyacrylamide types.
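
    The modeling pipeline described here (PCA for dimension reduction, then a support vector machine tuned by grid-search cross-validation) can be sketched with scikit-learn. Note that scikit-learn's SVC is a standard SVM standing in for the paper's least-squares SVM, and the spectra, grid values and component counts below are placeholders:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # X: (n_samples, n_wavelengths) diffuse-reflectance spectra, y: type labels.
    X, y = np.random.rand(180, 700), np.repeat([0, 1, 2], 60)  # placeholder data

    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA()),                  # spectral dimension reduction
        ("svm", SVC(kernel="rbf")),      # standard SVM standing in for LS-SVM
    ])
    grid = GridSearchCV(
        pipe,
        {"pca__n_components": [3, 5, 10],
         "svm__C": [1, 10, 100],
         "svm__gamma": ["scale", 0.01, 0.1]},
        cv=5,                            # cross-validation over the grid
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)
    ```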

  9. Predicting dire outcomes of patients with community acquired pneumonia.

    PubMed

    Cooper, Gregory F; Abraham, Vijoy; Aliferis, Constantin F; Aronis, John M; Buchanan, Bruce G; Caruana, Richard; Fine, Michael J; Janosky, Janine E; Livingston, Gary; Mitchell, Tom; Monti, Stefano; Spirtes, Peter

    2005-10-01

    Community-acquired pneumonia (CAP) is an important clinical condition with regard to patient mortality, patient morbidity, and healthcare resource utilization. The assessment of the likely clinical course of a CAP patient can significantly influence decision making about whether to treat the patient as an inpatient or as an outpatient. That decision can in turn influence resource utilization, as well as patient well being. Predicting dire outcomes, such as mortality or severe clinical complications, is a particularly important component in assessing the clinical course of patients. We used a training set of 1601 CAP patient cases to construct 11 statistical and machine-learning models that predict dire outcomes. We evaluated the resulting models on 686 additional CAP-patient cases. The primary goal was not to compare these learning algorithms as a study end point; rather, it was to develop the best model possible to predict dire outcomes. A special version of an artificial neural network (NN) model predicted dire outcomes the best. Using the 686 test cases, we estimated the expected healthcare quality and cost impact of applying the NN model in practice. The particular, quantitative results of this analysis are based on a number of assumptions that we make explicit; they will require further study and validation. Nonetheless, the general implication of the analysis seems robust, namely, that even small improvements in predictive performance for prevalent and costly diseases, such as CAP, are likely to result in significant improvements in the quality and efficiency of healthcare delivery. Therefore, seeking models with the highest possible level of predictive performance is important. Consequently, seeking ever better machine-learning and statistical modeling methods is of great practical significance.

  10. Multivariate Statistical Analysis of Cigarette Design Feature Influence on ISO TNCO Yields.

    PubMed

    Agnew-Heard, Kimberly A; Lancaster, Vicki A; Bravo, Roberto; Watson, Clifford; Walters, Matthew J; Holman, Matthew R

    2016-06-20

    The aim of this study is to explore how differences in cigarette physical design parameters influence tar, nicotine, and carbon monoxide (TNCO) yields in mainstream smoke (MSS) under the International Organization for Standardization (ISO) smoking regimen. Standardized smoking methods were used to evaluate 50 U.S. domestic brand cigarettes and a reference cigarette representing a range of TNCO yields in MSS collected from linear smoking machines using a nonintense smoking regimen. Multivariate statistical methods were used to form clusters of cigarettes based on their ISO TNCO yields and then to explore the relationship between the ISO-generated TNCO yields and the nine cigarette physical design parameters between and within each cluster simultaneously. The ISO-generated TNCO yields in MSS are 1.1-17.0 mg tar/cigarette, 0.1-2.2 mg nicotine/cigarette, and 1.6-17.3 mg CO/cigarette. Cluster analysis divided the 51 cigarettes into five discrete clusters based on their ISO TNCO yields. No one physical parameter dominated across all clusters. Predicting ISO machine-generated TNCO yields from these nine physical design parameters is complex due to the correlations among and between the design parameters and the TNCO yields. From these analyses, it is estimated that approximately 20% of the variability in the ISO-generated TNCO yields comes from other parameters (e.g., filter material, filter type, inclusion of expanded or reconstituted tobacco, and tobacco blend composition, along with differences in tobacco leaf origin, stalk positions and added ingredients). A future article will examine the influence of these physical design parameters on TNCO yields under the Canadian Intense (CI) smoking regimen. Together, these papers will provide a more robust picture of the design features that contribute to TNCO exposure across the range of real-world smoking patterns.

  11. Predicting radiotherapy outcomes using statistical learning techniques

    NASA Astrophysics Data System (ADS)

    El Naqa, Issam; Bradley, Jeffrey D.; Lindsay, Patricia E.; Hope, Andrew J.; Deasy, Joseph O.

    2009-09-01

    Radiotherapy outcomes are determined by complex interactions among treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture the potential complexity of heterogeneous variable interactions and the lack of applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data. In this work, several types of linear and nonlinear kernels for generating interaction terms and approximating the treatment-response function are evaluated. Institutional datasets for esophagitis, pneumonitis and xerostomia endpoints were used, and an independent RTOG dataset served for generalizability validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal component analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis, with over-fitting controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in the prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data, in contrast with the other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods to discover important nonlinear interactions among model variables, and that such models have the capacity to predict on unseen data. Part of this work was first presented at the Seventh International Conference on Machine Learning and Applications, San Diego, CA, USA, 11-13 December 2008.
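
    The leave-one-out evaluation of a nonlinear kernel machine reported here can be reproduced in outline with scikit-learn; the data, kernel and hyperparameters below are illustrative stand-ins, not the study's modified SVM:

    ```python
    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.svm import SVC

    # X: prognostic variables (dose-volume metrics, clinical factors),
    # y: binary risk group; placeholder data for illustration only.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(60, 8)), rng.integers(0, 2, size=60)

    model = SVC(kernel="rbf", C=10, gamma="scale")  # nonlinear kernel machine
    scores = cross_val_score(model, X, y, cv=LeaveOneOut())
    print("leave-one-out accuracy:", scores.mean())
    ```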

  12. An algorithm for the classification of mRNA patterns in eosinophilic esophagitis: Integration of machine learning.

    PubMed

    Sallis, Benjamin F; Erkert, Lena; Moñino-Romero, Sherezade; Acar, Utkucan; Wu, Rina; Konnikova, Liza; Lexmond, Willem S; Hamilton, Matthew J; Dunn, W Augustine; Szepfalusi, Zsolt; Vanderhoof, Jon A; Snapper, Scott B; Turner, Jerrold R; Goldsmith, Jeffrey D; Spencer, Lisa A; Nurko, Samuel; Fiebiger, Edda

    2018-04-01

    Diagnostic evaluation of eosinophilic esophagitis (EoE) remains difficult, particularly the assessment of the patient's allergic status. This study sought to establish an automated medical algorithm to assist in the evaluation of EoE. Machine learning techniques were used to establish a diagnostic probability score for EoE, p(EoE), based on esophageal mRNA transcript patterns from biopsies of patients with EoE, gastroesophageal reflux disease and controls. Dimensionality reduction in the training set established weighted factors, which were confirmed by immunohistochemistry. Following weighted factor analysis, p(EoE) was determined by random forest classification. Accuracy was tested in an external test set, and predictive power was assessed with equivocal patients. Esophageal IgE production was quantified with epsilon germ line (IGHE) transcripts and correlated with serum IgE and the Th2-type mRNA profile to establish an IGHE score for tissue allergy. In the primary analysis, a 3-class statistical model generated a p(EoE) score based on common characteristics of the inflammatory EoE profile. A p(EoE) ≥ 25 successfully identified EoE with high accuracy (sensitivity: 90.9%, specificity: 93.2%, area under the curve: 0.985) and improved diagnosis of equivocal cases by 84.6%. The p(EoE) changed in response to therapy. A secondary analysis loop in EoE patients defined an IGHE score of ≥37.5 for a patient subpopulation with increased esophageal allergic inflammation. The development of intelligent data analysis from a machine learning perspective provides exciting opportunities to improve diagnostic precision and improve patient care in EoE. The p(EoE) and the IGHE score are steps toward the development of decision trees to define EoE subpopulations and, consequently, will facilitate individualized therapy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
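
    The core of the p(EoE) construction, a classifier that converts a transcript profile into a 0-100 diagnostic probability score with a decision cutoff, can be sketched as follows. The random forest, the 0-100 scaling and the cutoff of 25 mirror the abstract, but the data and mapping details are assumptions:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # X: weighted mRNA transcript factors, y in {EoE, GERD, control};
    # placeholder data, since the real transcript panel is not public here.
    rng = np.random.default_rng(1)
    X, y = rng.normal(size=(120, 10)), rng.choice(["EoE", "GERD", "control"], 120)

    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    eoe_col = list(rf.classes_).index("EoE")
    p_eoe = 100 * rf.predict_proba(X[:5])[:, eoe_col]   # score on a 0-100 scale
    flagged = p_eoe >= 25                               # illustrative cutoff
    print(np.round(p_eoe, 1), flagged)
    ```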

  13. Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations.

    PubMed

    Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W

    2016-01-01

    A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data-large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources-all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.
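
    The cohort-rebalancing step (i) can be illustrated by random oversampling of minority classes to the majority-class size, one simple rebalancing scheme; the paper does not specify its exact procedure, so this sketch is only an assumed instance:

    ```python
    import numpy as np
    from sklearn.utils import resample

    def rebalance(X, y):
        """Randomly oversample minority classes to the majority-class size."""
        classes, counts = np.unique(y, return_counts=True)
        n_max = counts.max()
        Xs, ys = [], []
        for c in classes:
            Xc = X[y == c]
            # sample with replacement up to the largest class size
            Xr = resample(Xc, replace=True, n_samples=n_max, random_state=0)
            Xs.append(Xr)
            ys.append(np.full(n_max, c))
        return np.vstack(Xs), np.concatenate(ys)

    # Example: y with a 9:1 imbalance becomes 1:1 after rebalancing.
    X = np.arange(40).reshape(20, 2)
    y = np.array([0] * 18 + [1] * 2)
    Xb, yb = rebalance(X, y)
    print(np.unique(yb, return_counts=True))   # both classes now have 18
    ```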

  14. Detection of mental stress due to oral academic examination via ultra-short-term HRV analysis.

    PubMed

    Castaldo, R; Xu, W; Melillo, P; Pecchia, L; Santamaria, L; James, C

    2016-08-01

    Mental stress may cause cognitive dysfunction, cardiovascular disorders and depression. Mental stress detection via short-term Heart Rate Variability (HRV) analysis has been widely explored in recent years, whereas ultra-short-term (less than 5 minutes) HRV analysis has not. This study aims to detect mental stress using linear and nonlinear HRV features extracted from 3-minute ECG excerpts recorded from 42 university students during an oral examination (stress) and at rest after a vacation. HRV features were extracted and analyzed according to the literature using validated software tools, and statistical and data-mining analyses were then performed on the extracted features. The best-performing machine learning method was the C4.5 tree algorithm, which discriminated between stress and rest with sensitivity, specificity and accuracy rates of 78%, 80% and 79%, respectively.
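
    A close analogue of this analysis is a decision tree with entropy-based splits trained on ultra-short-term HRV features. scikit-learn's CART with the entropy criterion is a stand-in for C4.5 (related but not identical), and the feature set and data below are placeholders:

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # X: HRV features from 3-min excerpts (e.g., mean RR, SDNN, RMSSD, LF/HF,
    # nonlinear indices); y: 1 = oral exam (stress), 0 = rest. Placeholder data.
    rng = np.random.default_rng(2)
    X, y = rng.normal(size=(84, 6)), rng.integers(0, 2, size=84)

    # CART with entropy splitting, a close stand-in for C4.5 (not identical).
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
    print("cv accuracy:", cross_val_score(tree, X, y, cv=10).mean())
    ```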

  15. Discrimination of soft tissues using laser-induced breakdown spectroscopy in combination with k nearest neighbors (kNN) and support vector machine (SVM) classifiers

    NASA Astrophysics Data System (ADS)

    Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying

    2018-06-01

    In this paper, the discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues were manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between the highly similar ham, loin and tenderloin muscle tissues, were performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. The performance of the discrimination models, including accuracy, sensitivity and specificity, was evaluated using 10-fold cross-validation, and the classification models were optimized to achieve the best discrimination performance. Fat, skin and muscle tissues could be discriminated with near-perfect performance by both kNN and SVM classifiers, with accuracy over 99.83%, sensitivity over 0.995 and specificity over 0.998. The highly similar ham, loin and tenderloin muscle tissues could also be discriminated with acceptable performance; the best results were achieved with an SVM classifier using a Gaussian kernel function, with accuracy of 76.84%, sensitivity over 0.742 and specificity over 0.869. The results show that the LIBS technique assisted by multivariate statistical methods could be a powerful tool for the online discrimination of soft tissues, even tissues of high similarity such as muscles from different parts of the animal body. The technique could also be used to discriminate tissues with minor clinical changes, and thus may advance the diagnosis of early lesions and abnormalities.
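
    The reported per-class sensitivity and specificity can be obtained from a cross-validated confusion matrix by treating each tissue class one-vs-rest. A minimal sketch, with placeholder PCA scores instead of real LIBS spectra:

    ```python
    import numpy as np
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import cross_val_predict
    from sklearn.neighbors import KNeighborsClassifier

    # X: PCA scores of LIBS spectra, y: 0 = fat, 1 = skin, 2 = muscle (placeholder).
    rng = np.random.default_rng(3)
    X, y = rng.normal(size=(150, 3)), rng.integers(0, 3, size=150)

    pred = cross_val_predict(KNeighborsClassifier(5), X, y, cv=10)
    cm = confusion_matrix(y, pred)
    for k in range(cm.shape[0]):                 # one-vs-rest per class
        tp = cm[k, k]
        fn = cm[k].sum() - tp
        fp = cm[:, k].sum() - tp
        tn = cm.sum() - tp - fn - fp
        print(k, "sensitivity", tp / (tp + fn), "specificity", tn / (tn + fp))
    ```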

  16. Study on the Optimization and Process Modeling of the Rotary Ultrasonic Machining of Zerodur Glass-Ceramic

    NASA Astrophysics Data System (ADS)

    Pitts, James Daniel

    Rotary ultrasonic machining (RUM), a hybrid process combining ultrasonic machining and diamond grinding, was created to increase material removal rates in the fabrication of hard and brittle workpieces. The objective of this research was to experimentally derive empirical equations for predicting multiple machined-surface roughness parameters of helically pocketed, rotary ultrasonic machined Zerodur glass-ceramic workpieces by means of a systematic statistical experimental approach. A Taguchi parametric screening design of experiments was employed to determine the RUM process parameters with the largest effect on mean surface roughness. Next, empirical equations for seven common surface-quality metrics were developed via Box-Behnken response surface experimental trials. Validation trials showed varying levels of agreement between predicted and experimental surface roughness. The reductions in cutting force and tool wear associated with RUM, reported by previous researchers, were experimentally verified to extend to the helical pocketing of Zerodur glass-ceramic.

  17. Function library programming to support B89 evaluation of Sheffield Apollo RS50 DCC (Direct Computer Control) CMM (Coordinate Measuring Machine)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frank, R.N.

    1990-02-28

    The Inspection Shop at Lawrence Livermore Lab recently purchased a Sheffield Apollo RS50 Direct Computer Control Coordinate Measuring Machine. The performance of the machine was specified to conform to the B89 standard, which relies heavily upon using the measuring machine in its intended manner to verify its accuracy (rather than parametric tests). Although it would be possible to use the interactive measurement system to perform these tasks, a more thorough and efficient job can be done by creating Function Library programs for certain tasks which integrate the Hewlett-Packard Basic 5.0 language and calls to proprietary analysis and machine control routines. This combination provides efficient use of the measuring machine with a minimum of keyboard input, plus an analysis of the data with respect to the B89 standard rather than a CMM analysis that would require subsequent interpretation. This paper discusses some characteristics of the Sheffield machine control and analysis software and my use of the H-P Basic language to create automated measurement programs to support the B89 performance evaluation of the CMM.

  18. An online sleep apnea detection method based on recurrence quantification analysis.

    PubMed

    Nguyen, Hoa Dinh; Wilkins, Brek A; Cheng, Qi; Benjamin, Bruce Allen

    2014-07-01

    This paper introduces an online sleep apnea detection method based on heart rate complexity as measured by recurrence quantification analysis (RQA) statistics of heart rate variability (HRV) data. RQA statistics can capture the nonlinear dynamics of a complex cardiorespiratory system during obstructive sleep apnea. In order to obtain a more robust measurement of the nonstationarity of the cardiorespiratory system, we use several fixed-amount-of-neighbors thresholds for the recurrence plot calculation. We integrate a feature selection algorithm based on conditional mutual information to select the most informative RQA features for classification and hence speed up the real-time classification process without degrading system performance. Two types of binary classifiers, a support vector machine and a neural network, are used to differentiate apnea from normal sleep, and a soft decision fusion rule is developed to combine the results of these classifiers to improve the classification performance of the whole system. Experimental results show that our proposed method achieves better classification results than the previous recurrence-analysis-based approach. We also show that our method is flexible and a strong candidate for a truly efficient sleep apnea detection system.
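
    The fixed-amount-of-neighbors recurrence plot underlying the RQA features admits a compact implementation: embed the RR series, then keep exactly k nearest neighbors per column as recurrences. The embedding parameters and k below are assumptions, not the paper's settings:

    ```python
    import numpy as np

    def recurrence_plot_fan(rr, dim=3, tau=1, k=5):
        """Recurrence plot with a fixed amount of neighbors (FAN) threshold.

        rr  : RR-interval series (HRV data)
        dim : embedding dimension, tau : embedding delay (assumed values)
        k   : each column keeps exactly k nearest neighbors as recurrences
        """
        n = len(rr) - (dim - 1) * tau
        emb = np.column_stack([rr[i * tau:i * tau + n] for i in range(dim)])
        d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)              # exclude self-matches
        rp = np.zeros_like(d, dtype=bool)
        idx = np.argsort(d, axis=0)[:k]          # k nearest per column
        rp[idx, np.arange(n)] = True
        return rp

    # Recurrence rate, the simplest RQA statistic:
    rr = np.random.default_rng(4).normal(0.8, 0.05, 300)
    rp = recurrence_plot_fan(rr)
    print("recurrence rate:", rp.mean())
    ```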

  19. Application of the Teager-Kaiser energy operator in bearing fault diagnosis.

    PubMed

    Henríquez Rodríguez, Patricia; Alonso, Jesús B; Ferrer, Miguel A; Travieso, Carlos M

    2013-03-01

    Condition monitoring of rotating machines is important in the prevention of failures. As most machine malfunctions are related to bearing failures, several bearing diagnosis techniques have been developed. Some of them characterize the bearing vibration signal with statistical measures, and others extract the bearing fault characteristic frequency from the AM component of the vibration signal. In this paper, we propose to transform the vibration signal to the Teager-Kaiser domain and characterize it with statistical and energy-based measures. A bearing database with normal and faulty bearings is used, and the diagnosis is performed with two classifiers: a neural network classifier and an LS-SVM classifier. Experiments show that the Teager-domain features outperform those based on the temporal or AM signal. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
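
    The discrete Teager-Kaiser energy operator used for this domain transform is psi[x](n) = x(n)^2 - x(n-1)x(n+1). A minimal sketch of the transform followed by generic statistical features (the paper's exact feature set is not reproduced here):

    ```python
    import numpy as np

    def teager_kaiser(x):
        """Discrete Teager-Kaiser energy operator:
        psi[x](n) = x(n)**2 - x(n-1) * x(n+1)."""
        x = np.asarray(x, dtype=float)
        return x[1:-1] ** 2 - x[:-2] * x[2:]

    # Statistical features over the Teager-domain signal; the feature names
    # here are generic choices, not the authors' exact set.
    rng = np.random.default_rng(0)
    vib = np.sin(2 * np.pi * 0.05 * np.arange(1000)) + 0.1 * rng.normal(size=1000)
    tk = teager_kaiser(vib)                       # stand-in vibration signal
    features = {"mean": tk.mean(), "std": tk.std(),
                "kurtosis": ((tk - tk.mean()) ** 4).mean() / tk.var() ** 2}
    print(features)
    ```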

  20. Stroke dynamics and frequency of 3 phacoemulsification machines.

    PubMed

    Tognetto, Daniele; Cecchini, Paolo; Leon, Pia; Di Nicola, Marta; Ravalico, Giuseppe

    2012-02-01

    To measure the working frequency and the stroke dynamics of the phaco tip of 3 phacoemulsification machines. University Eye Clinic of Trieste, Italy. Experimental study. A video wet fixture was assembled to measure the working frequency using a micro camera and a micropulsed strobe-light system. A different video wet fixture was created to measure tip displacement as vectorial movement at different phaco powers using a microscopic video apparatus. The working frequency of the Infiniti Ozil machine was 43.0 kHz in longitudinal mode and 31.6 kHz in torsional mode. The frequency of the Whitestar Signature machine was 29.0 kHz in longitudinal mode and 38.0 kHz with the Ellips FX handpiece. The Stellaris machine had a frequency of 28.8 kHz. The longitudinal stroke of the 3 machines at different phaco powers was statistically significantly different. The Stellaris machine had the highest stroke extent (139 μm). The lateral movement of the Infiniti Ozil and Whitestar Signature machines differed significantly. No movement on the y-axis was observed for the Infiniti Ozil machine in torsional mode. The elliptical path of the Ellips FX handpiece had different x and y components at different phaco powers. The 3 phaco machines performed differently in terms of working frequency and stroke dynamics. The knowledge of the peculiar lateral and elliptical path strokes of Infiniti and Whitestar Signature machines may allow the surgeon to fully use these features for lens removal. Copyright © 2012 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.

  1. Optical Measurements Of Diamond-Turned Surfaces

    NASA Astrophysics Data System (ADS)

    Politch, Jacob

    1989-07-01

    We describe here a system for measuring diamond-turned surfaces with very high accuracy. The system is based on heterodyne interferometry and measures surface height variations with an accuracy of 4 Å at a spatial resolution of 1 micrometer. From the measured data we have calculated the statistical properties of the surface, enabling us to identify the spatial frequencies caused by the vibrations of the diamond-turning machine and of the measuring machine, as well as the frequency of the grid.

  2. Standardized data collection to build prediction models in oncology: a prototype for rectal cancer.

    PubMed

    Meldolesi, Elisa; van Soest, Johan; Damiani, Andrea; Dekker, Andre; Alitto, Anna Rita; Campitelli, Maura; Dinapoli, Nicola; Gatta, Roberto; Gambacorta, Maria Antonietta; Lanzotti, Vito; Lambin, Philippe; Valentini, Vincenzo

    2016-01-01

    The advances in diagnostic and treatment technology are responsible for a remarkable transformation of internal medicine, establishing the new idea of personalized medicine. Inter- and intra-patient tumor heterogeneity, together with the complexity of clinical outcomes and treatment toxicity, justify the effort to develop predictive models for decision support systems. However, the number of variables to evaluate, coming from multiple disciplines (oncology, computer science, bioinformatics, statistics, genomics and imaging, among others), can be very large, making traditional statistical analysis difficult to exploit. Automated data-mining processes and machine learning approaches can be a solution for organizing this massive amount of data and unraveling important interactions. The purpose of this paper is to describe a strategy for collecting and analyzing data properly for decision support and to introduce the concept of an 'umbrella protocol' within the framework of 'rapid learning healthcare'.

  3. Inferring action structure and causal relationships in continuous sequences of human action.

    PubMed

    Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare

    2015-02-01

    In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.

  4. Fracture load and failure analysis of zirconia single crowns veneered with pressed and layered ceramics after chewing simulation.

    PubMed

    Stawarczyk, Bogna; Ozcan, Mutlu; Roos, Malgorzata; Trottmann, Albert; Hämmerle, Christoph H F

    2011-01-01

    This study determined the fracture load of zirconia crowns veneered with four overpressed and four layered ceramics after chewing simulation. The veneered zirconia crowns were cemented and subjected to chewing cycling. Subsequently, the specimens were loaded at an angle of 45° in a universal testing machine to determine the fracture load. One-way ANOVA followed by a post-hoc Scheffé test, t-tests and Weibull statistics were performed. Overpressed crowns showed significantly lower fracture loads (543-577 N) than layered ones (805-1067 N). No statistical difference in fracture load was found within the overpressed group. Within the layered groups, LV (1067 N) showed significantly higher results than LC (805 N); the mean values of all other groups were not significantly different. Single zirconia crowns veneered with overpressed ceramics exhibited lower fracture loads than layered ones after chewing simulation.

  5. Physics of Electronic Materials

    NASA Astrophysics Data System (ADS)

    Rammer, Jørgen

    2017-03-01

    1. Quantum mechanics; 2. Quantum tunneling; 3. Standard metal model; 4. Standard conductor model; 5. Electric circuit theory; 6. Quantum wells; 7. Particle in a periodic potential; 8. Bloch currents; 9. Crystalline solids; 10. Semiconductor doping; 11. Transistors; 12. Heterostructures; 13. Mesoscopic physics; 14. Arithmetic, logic and machines; Appendix A. Principles of quantum mechanics; Appendix B. Dirac's delta function; Appendix C. Fourier analysis; Appendix D. Classical mechanics; Appendix E. Wave function properties; Appendix F. Transfer matrix properties; Appendix G. Momentum; Appendix H. Confined particles; Appendix I. Spin and quantum statistics; Appendix J. Statistical mechanics; Appendix K. The Fermi-Dirac distribution; Appendix L. Thermal current fluctuations; Appendix M. Gaussian wave packets; Appendix N. Wave packet dynamics; Appendix O. Screening by symmetry method; Appendix P. Commutation and common eigenfunctions; Appendix Q. Interband coupling; Appendix R. Common crystal structures; Appendix S. Effective mass approximation; Appendix T. Integral doubling formula; Bibliography; Index.

  6. Experimental statistical signature of many-body quantum interference

    NASA Astrophysics Data System (ADS)

    Giordani, Taira; Flamini, Fulvio; Pompili, Matteo; Viggianiello, Niko; Spagnolo, Nicolò; Crespi, Andrea; Osellame, Roberto; Wiebe, Nathan; Walschaers, Mattia; Buchleitner, Andreas; Sciarrino, Fabio

    2018-03-01

    Multi-particle interference is an essential ingredient for fundamental quantum mechanics phenomena and for quantum information processing to provide a computational advantage, as recently emphasized by boson sampling experiments. Hence, developing a reliable and efficient technique to witness its presence is pivotal in achieving the practical implementation of quantum technologies. Here, we experimentally identify genuine many-body quantum interference via a recent efficient protocol, which exploits statistical signatures at the output of a multimode quantum device. We successfully apply the test to validate three-photon experiments in an integrated photonic circuit, providing an extensive analysis on the resources required to perform it. Moreover, drawing upon established techniques of machine learning, we show how such tools help to identify the—a priori unknown—optimal features to witness these signatures. Our results provide evidence on the efficacy and feasibility of the method, paving the way for its adoption in large-scale implementations.

  7. Optimizing Integrated Terminal Airspace Operations Under Uncertainty

    NASA Technical Reports Server (NTRS)

    Bosson, Christabelle; Xue, Min; Zelinski, Shannon

    2014-01-01

    In the terminal airspace, integrated departures and arrivals have the potential to increase operations efficiency. Recent research has developed genetic-algorithm-based schedulers for integrated arrival and departure operations under uncertainty. This paper presents an alternate method using a machine job-shop scheduling formulation to model the integrated airspace operations. A multistage stochastic programming approach is chosen to formulate the problem, and candidate solutions are obtained by solving sample average approximation problems with finite sample size. Because approximate solutions are computed, the proposed algorithm incorporates the computation of statistical bounds to estimate the optimality of the candidate solutions. A proof-of-concept study is conducted on a baseline implementation of a simple problem considering a fleet mix of 14 aircraft operating in a model of the Los Angeles terminal airspace. A more thorough statistical analysis is also performed to evaluate the impact of the number of scenarios considered in the sampled problem. To handle extensive sampling computations, a multithreading technique is introduced.

  8. Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the Three Gorges Reservoir area, China

    NASA Astrophysics Data System (ADS)

    Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Li, Yuanyao; Catani, Filippo; Pourghasemi, Hamid Reza

    2018-03-01

    Landslides are a common natural hazard responsible for extensive damage and losses in mountainous areas. In this study, Longju in the Three Gorges Reservoir area in China was taken as a case study for landslide susceptibility assessment in order to develop effective risk prevention and mitigation strategies. To begin, 202 landslides were identified, including 95 colluvial landslides and 107 rockfalls. Twelve landslide causal-factor maps were prepared initially, and the relationship between these factors and each landslide type was analyzed using the information value model. The unimportant factors were then eliminated using the information gain ratio technique. The landslide locations were randomly divided into two groups: 70% for training and 30% for verification. Two machine learning models, the support vector machine (SVM) and artificial neural network (ANN), and a multivariate statistical model, logistic regression (LR), were applied to landslide susceptibility modeling (LSM) for each type. The LSM index maps, obtained by combining the assessment results for the two landslide types, were classified into five levels. The performance of the LSMs was evaluated using the receiver operating characteristic curve and the Friedman test. Results show that eliminating noise-generating factors and modeling each landslide type separately significantly increased prediction accuracy. The machine learning models outperformed the multivariate statistical model, and the SVM model was found to be ideal for the case study area.
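
    The factor-elimination step relies on the information gain ratio, i.e. the information gain of a discretized causal factor with respect to the landslide labels, normalized by the factor's own entropy. A minimal sketch (the elimination threshold would be an additional assumption, not stated in the abstract):

    ```python
    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def gain_ratio(feature, target):
        """Information gain ratio of a discretized causal-factor map."""
        values, counts = np.unique(feature, return_counts=True)
        w = counts / counts.sum()
        h_cond = sum(wi * entropy(target[feature == v])
                     for wi, v in zip(w, values))
        gain = entropy(target) - h_cond
        split_info = entropy(feature)            # penalizes many-valued factors
        return gain / split_info if split_info > 0 else 0.0

    # Toy check: a factor correlated with the labels scores higher than noise.
    rng = np.random.default_rng(5)
    y = rng.integers(0, 2, 500)
    informative = np.where(rng.random(500) < 0.8, y, 1 - y)   # 80% agreement
    noise = rng.integers(0, 4, 500)
    print(gain_ratio(informative, y), gain_ratio(noise, y))
    ```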

  9. Automated assessment of cognitive health using smart home technologies.

    PubMed

    Dawadi, Prafulla N; Cook, Diane J; Schmitter-Edgecombe, Maureen; Parsey, Carolyn

    2013-01-01

    The goal of this work is to develop intelligent systems to monitor the wellbeing of individuals in their home environments. This paper introduces a machine learning-based method to automatically predict activity quality in smart homes and to assess cognitive health based on activity quality. It describes an automated framework to extract, from smart home sensor data, a set of features reflecting how well an individual completes an activity, which can be input to machine learning algorithms. Outputs from learning algorithms, including principal component analysis, support vector machine, and logistic regression, are used to quantify activity quality for a complex set of smart home activities and to predict the cognitive health of participants. Smart home activity data were gathered from volunteer participants (n=263) who performed a complex set of activities in our smart home testbed. We compared our automated activity quality and cognitive health predictions with direct observation scores and health assessments obtained from neuropsychologists. With all samples included, we obtained a statistically significant correlation (r=0.54) between direct observation scores and predicted activity quality. Similarly, using a support vector machine classifier, we obtained reasonable classification accuracy (area under the ROC curve=0.80, g-mean=0.73) in classifying participants into two cognitive classes, dementia and cognitively healthy. The results suggest that it is possible to automatically quantify the task quality of smart home activities and perform a limited assessment of the cognitive health of individuals if smart home activities are properly chosen and learning algorithms are appropriately trained.

  10. Brittleness index of machinable dental materials and its relation to the marginal chipping factor.

    PubMed

    Tsitrou, Effrosyni A; Northeast, Simon E; van Noort, Richard

    2007-12-01

    The machinability of a material can be measured by calculating its brittleness index (BI). It is possible that materials with different BIs produce restorations with varied marginal integrity, and the degree of marginal chipping of a milled restoration can be estimated by calculating the marginal chipping factor (CF). The aim of this study was to investigate a possible correlation between the BI of machinable dental materials and the CF of the final restorations. The CEREC system was used to mill a wide range of materials used with that system, namely Paradigm MZ100 (3M/ESPE), Vita Mark II (VITA), ProCAD (Ivoclar-Vivadent) and IPS e.max CAD (Ivoclar-Vivadent). A Vickers hardness tester was used for the calculation of BI, while the CF was estimated as the percentage of marginal chipping of crowns prepared with bevelled marginal angulations. The results showed that Paradigm MZ100 had the lowest BI and CF, while IPS e.max CAD demonstrated the highest BI and CF; Vita Mark II and ProCAD had similar BI and CF and lay between these materials. Statistical analysis showed a perfect positive correlation between BI and CF for all the materials. The BI and CF could both be regarded as indicators of a material's machinability. Within the limitations of this study, it was shown that as the BI increases so does the potential for marginal chipping, indicating that the BI of a material can be used as a predictor of the CF.
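
    One widely used form of the brittleness index is the hardness-to-fracture-toughness ratio, BI = H/K_Ic, which is readily computed from Vickers hardness data; the abstract does not state which form this study used, so the sketch and numbers below are purely illustrative:

    ```python
    def brittleness_index(hardness_gpa, toughness_mpa_sqrt_m):
        """BI = H / K_Ic (Lawn-Marshall form). With H in GPa and K_Ic in
        MPa*m**0.5 the result is in um**-0.5, since
        1 GPa / (1 MPa*m**0.5) = 1000 / m**0.5 = 1 / um**0.5."""
        return hardness_gpa / toughness_mpa_sqrt_m

    # Illustrative values only (not the measurements from this study):
    print(brittleness_index(6.2, 1.5))   # ~4.1 um**-0.5
    ```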

  11. Automated Assessment of Cognitive Health Using Smart Home Technologies

    PubMed Central

    Dawadi, Prafulla N.; Cook, Diane J.; Schmitter-Edgecombe, Maureen; Parsey, Carolyn

    2014-01-01

    BACKGROUND The goal of this work is to develop intelligent systems to monitor the well-being of individuals in their home environments. OBJECTIVE This paper introduces a machine learning-based method to automatically predict activity quality in smart homes and to assess cognitive health based on activity quality. METHODS This paper describes an automated framework to extract, from smart home sensor data, a set of features reflecting how well an individual completes an activity, which can be input to machine learning algorithms. Outputs from learning algorithms, including principal component analysis, support vector machine, and logistic regression, are used to quantify activity quality for a complex set of smart home activities and to predict the cognitive health of participants. RESULTS Smart home activity data were gathered from volunteer participants (n=263) who performed a complex set of activities in our smart home testbed. We compared our automated activity quality and cognitive health predictions with direct observation scores and health assessments obtained from neuropsychologists. With all samples included, we obtained a statistically significant correlation (r=0.54) between direct observation scores and predicted activity quality. Similarly, using a support vector machine classifier, we obtained reasonable classification accuracy (area under the ROC curve = 0.80, g-mean = 0.73) in classifying participants into two cognitive classes, dementia and cognitively healthy. CONCLUSIONS The results suggest that it is possible to automatically quantify the task quality of smart home activities and perform a limited assessment of the cognitive health of individuals if smart home activities are properly chosen and learning algorithms are appropriately trained. PMID:23949177

  12. Automated Clinical Assessment from Smart home-based Behavior Data

    PubMed Central

    Dawadi, Prafulla Nath; Cook, Diane Joyce; Schmitter-Edgecombe, Maureen

    2016-01-01

    Smart home technologies offer potential benefits for assisting clinicians by automating health monitoring and well-being assessment. In this paper, we examine the actual benefits of smart home-based analysis by monitoring daily behaviour in the home and predicting standard clinical assessment scores of the residents. To accomplish this goal, we propose a Clinical Assessment using Activity Behavior (CAAB) approach to model a smart home resident’s daily behavior and predict the corresponding standard clinical assessment scores. CAAB uses statistical features that describe characteristics of a resident’s daily activity performance to train machine learning algorithms that predict the clinical assessment scores. We evaluate the performance of CAAB utilizing smart home sensor data collected from 18 smart homes over two years using prediction and classification-based experiments. In the prediction-based experiments, we obtain a statistically significant correlation (r = 0.72) between CAAB-predicted and clinician-provided cognitive assessment scores and a statistically significant correlation (r = 0.45) between CAAB-predicted and clinician-provided mobility scores. Similarly, for the classification-based experiments, we find CAAB has a classification accuracy of 72% while classifying cognitive assessment scores and 76% while classifying mobility scores. These prediction and classification results suggest that it is feasible to predict standard clinical scores using smart home sensor data and learning-based data analysis. PMID:26292348

  13. mvp - an open-source preprocessor for cleaning duplicate records and missing values in mass spectrometry data.

    PubMed

    Lee, Geunho; Lee, Hyun Beom; Jung, Byung Hwa; Nam, Hojung

    2017-07-01

    Mass spectrometry (MS) data are used to analyze biological phenomena based on chemical species. However, these data often contain unexpected duplicate records and missing values due to technical or biological factors. These 'dirty data' problems increase the difficulty of performing MS analyses because they lead to performance degradation when statistical or machine-learning tests are applied to the data. Thus, we have developed missing values preprocessor (mvp), an open-source software package for preprocessing data that might include duplicate records and missing values. mvp uses the property of MS data in which identical chemical species present the same or similar values for key identifiers, such as the mass-to-charge ratio and intensity signal, and forms cliques via graph theory to process dirty data. We evaluated the validity of the mvp process via quantitative and qualitative analyses and compared the results from a statistical test that analyzed the original and mvp-applied data. This analysis showed that using mvp reduces problems associated with duplicate records and missing values. We also examined the effects of using unprocessed data in statistical tests and examined the improved statistical test results obtained with data preprocessed using mvp.
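
    The abstract does not spell out mvp's exact matching rules, so the sketch below only illustrates the clique idea it describes: records whose key identifiers agree within a tolerance are linked, and each maximal clique is merged as one candidate species. The tolerances MZ_TOL and INT_TOL are invented for the example, not mvp defaults.

        import networkx as nx
        import numpy as np

        # Toy records: (mass-to-charge ratio, intensity signal).
        records = [(100.001, 5.2e4), (100.002, 5.1e4), (150.500, 1.0e3),
                   (100.000, 5.3e4), (150.501, 0.9e3)]
        MZ_TOL, INT_TOL = 0.01, 0.1  # absolute m/z and relative intensity tolerance

        G = nx.Graph()
        G.add_nodes_from(range(len(records)))
        for i in range(len(records)):
            for j in range(i + 1, len(records)):
                (mz1, it1), (mz2, it2) = records[i], records[j]
                if abs(mz1 - mz2) <= MZ_TOL and abs(it1 - it2) / max(it1, it2) <= INT_TOL:
                    G.add_edge(i, j)

        # Treat each maximal clique as one chemical species; merge by averaging.
        for clique in nx.find_cliques(G):
            merged = np.mean([records[k] for k in clique], axis=0)
            print(sorted(clique), "->", merged.round(3))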

  14. SCENERY: a web application for (causal) network reconstruction from cytometry data.

    PubMed

    Papoutsoglou, Georgios; Athineou, Giorgos; Lagani, Vincenzo; Xanthopoulos, Iordanis; Schmidt, Angelika; Éliás, Szabolcs; Tegnér, Jesper; Tsamardinos, Ioannis

    2017-07-03

    Flow and mass cytometry technologies can probe proteins as biological markers in thousands of individual cells simultaneously, providing unprecedented opportunities for reconstructing networks of protein interactions through machine learning algorithms. The network reconstruction (NR) problem has been well-studied by the machine learning community. However, the potential of available methods remains largely unknown to the cytometry community, mainly due to their intrinsic complexity and the lack of comprehensive, powerful and easy-to-use NR software implementations specific to cytometry data. To bridge this gap, we present the Single CEll NEtwork Reconstruction sYstem (SCENERY), a web server featuring several standard and advanced cytometry data analysis methods coupled with NR algorithms in a user-friendly, on-line environment. In SCENERY, users may upload their data and set their own study design. The server offers several data analysis options categorized into three classes of methods: data (pre)processing, statistical analysis and NR. The server also provides interactive visualization and download of results as ready-to-publish images or multimedia reports. Its core is modular and based on the widely-used and robust R platform, allowing power users to extend its functionalities by submitting their own NR methods. SCENERY is available at scenery.csd.uoc.gr or http://mensxmachina.org/en/software/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Influence of Manufacturing Methods of Implant-Supported Crowns on External and Internal Marginal Fit: A Micro-CT Analysis.

    PubMed

    Moris, Izabela C M; Monteiro, Silas Borges; Martins, Raíssa; Ribeiro, Ricardo Faria; Gomes, Erica A

    2018-01-01

    To evaluate the influence of different manufacturing methods of single implant-supported metallic crowns on the internal and external marginal fit through computed microtomography. Forty external hexagon implants were divided into 4 groups (n = 8), according to the manufacturing method: GC, conventional casting; GI, induction casting; GP, plasma casting; and GCAD, CAD/CAM machining. The crowns were attached to the implants with an insertion torque of 30 N·cm. The external (vertical and horizontal) marginal fit and internal fit were assessed through computed microtomography. Internal and external marginal fit data (μm) were submitted to a one-way ANOVA and Tukey's test (α = .05). Qualitative evaluation of the images was conducted using micro-CT. The statistical analysis revealed no significant difference between the groups for vertical misfit (P = 0.721). There was no significant difference (P > 0.05) for the internal and horizontal marginal misfit in the groups GC, GI, and GP, but a difference was found for the group GCAD (P ≤ 0.05). Qualitative analysis revealed that most of the samples of the cast groups exhibited crown underextension, while the group GCAD showed overextension. The manufacturing method of the crowns influenced the accuracy of marginal fit between the prosthesis and implant. The best results were found for the crowns fabricated through CAD/CAM machining.
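
    The analysis named above (one-way ANOVA followed by Tukey's test at α = .05) can be sketched with scipy and statsmodels; the misfit values below are hypothetical stand-ins for the study's measurements.

        import numpy as np
        from scipy.stats import f_oneway
        from statsmodels.stats.multicomp import pairwise_tukeyhsd

        rng = np.random.default_rng(0)
        # Hypothetical vertical misfit values (μm) for the four groups (n = 8 each).
        groups = {"GC": rng.normal(60, 10, 8), "GI": rng.normal(62, 10, 8),
                  "GP": rng.normal(61, 10, 8), "GCAD": rng.normal(40, 10, 8)}

        F, p = f_oneway(*groups.values())            # one-way ANOVA across groups
        print(f"one-way ANOVA: F={F:.2f}, p={p:.3f}")

        values = np.concatenate(list(groups.values()))
        labels = np.repeat(list(groups.keys()), 8)
        print(pairwise_tukeyhsd(values, labels, alpha=0.05))  # pairwise Tukey HSD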

  16. Research on the EDM Technology for Micro-holes at Complex Spatial Locations

    NASA Astrophysics Data System (ADS)

    Y Liu, J.; Guo, J. M.; Sun, D. J.; Cai, Y. H.; Ding, L. T.; Jiang, H.

    2017-12-01

    To meet the demands of machining micro-holes at complex spatial locations, several key technical problems are addressed, including development of the micro-Electrical Discharge Machining (micro-EDM) power supply system, design of the host structure, and the machining process technology. Through development of the low-voltage power supply circuit, high-voltage circuit, micro and precision machining circuit, and clearance detection system, a narrow-pulse, high-frequency six-axis EDM power supply system is developed to meet the demands of micro-hole discharge machining. By combining CAD structure design, CAE simulation analysis, modal testing, ODS (Operational Deflection Shapes) testing, and theoretical analysis, the host construction and key axes of the machine tool are optimized to meet the positioning demands of the micro-holes. A dedicated deionized water filtration system was developed to keep the machining process stable. The machining equipment and process technology developed in this paper were verified by developing a micro-hole processing flow and testing it on the real machine tool. The final test results show that the efficient micro-EDM pulse power supply system, machine tool host system, deionized filtration system, and processing method developed in this paper meet the demands of machining micro-holes at complex spatial locations.

  17. Impact of Machine Virtualization on Timing Precision for Performance-critical Tasks

    NASA Astrophysics Data System (ADS)

    Karpov, Kirill; Fedotova, Irina; Siemens, Eduard

    2017-07-01

    In this paper we present a measurement study to characterize the impact of hardware virtualization on basic software timing, as well as on precise sleep operations of an operating system. We investigated how timer hardware is shared among heavily CPU-, I/O- and Network-bound tasks on a virtual machine as well as on the host machine. VMware ESXi and QEMU/KVM have been chosen as commonly used examples of hypervisor- and host-based models. Based on statistical parameters of the retrieved distributions, our results provide a very good estimation of timing behavior, which is essential for real-time and performance-critical applications such as image processing or real-time control.
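
    A minimal sketch of the kind of sleep-precision measurement described above, using only the Python standard library; running it on the host and inside a guest lets one compare the overshoot distributions (the 1 ms request size is an arbitrary choice, not taken from the paper).

        import time
        import statistics

        def sleep_error_us(requested_s, n=1000):
            """Measure how much an OS sleep overshoots the requested duration."""
            errors = []
            for _ in range(n):
                t0 = time.perf_counter_ns()
                time.sleep(requested_s)
                elapsed_us = (time.perf_counter_ns() - t0) / 1e3
                errors.append(elapsed_us - requested_s * 1e6)
            return errors

        errs = sleep_error_us(0.001)   # request 1 ms sleeps
        print(f"mean overshoot: {statistics.mean(errs):.1f} us, "
              f"stdev: {statistics.stdev(errs):.1f} us, max: {max(errs):.1f} us")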

  18. Supervised Machine Learning for Regionalization of Environmental Data: Distribution of Uranium in Groundwater in Ukraine

    NASA Astrophysics Data System (ADS)

    Govorov, Michael; Gienko, Gennady; Putrenko, Viktor

    2018-05-01

    In this paper, several supervised machine learning algorithms were explored to define homogeneous regions of concentration of uranium in surface waters in Ukraine using multiple environmental parameters. The previous study focused on finding the primary environmental parameters related to uranium in ground waters using several methods of spatial statistics and unsupervised classification. At this step, we refined the regionalization using Artificial Neural Network (ANN) techniques including the Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Convolutional Neural Network (CNN). The study focuses on building local ANN models, which may significantly improve the prediction results of machine learning algorithms by taking into consideration non-stationarity and autocorrelation in spatial data.

  19. Design of a Modular E-Core Flux Concentrating Axial Flux Machine: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Husain, Tausif; Sozer, Yilmaz; Husain, Iqbal

    2015-08-24

    In this paper a novel E-Core axial flux machine is proposed. The machine has a double-stator, single-rotor configuration with flux-concentrating ferrite magnets and pole windings across each leg of an E-Core stator. E-Core stators with the proposed flux-concentrating rotor arrangement result in better magnet utilization and higher torque density. The machine also has a modular structure facilitating simpler construction. This paper presents a single-phase and a three-phase version of the E-Core machine. Case studies for a 1.1-kW, 400-rpm machine for both the single-phase and three-phase axial flux machines are presented. The results are verified through 3D finite element analysis.

  20. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment

    PubMed Central

    2011-01-01

    Background Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. Conclusions AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of highly accurate QSAR models fulfilling regulatory requirements. PMID:21798025

  1. AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment.

    PubMed

    Stålring, Jonna C; Carlsson, Lars A; Almeida, Pedro; Boyer, Scott

    2011-07-28

    Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of highly accurate QSAR models fulfilling regulatory requirements.
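
    AZOrange's own API is not reproduced here; as a generic stand-in for the automated hyper-parameter selection both records describe, a scikit-learn grid search over a random forest on a synthetic descriptor matrix might look as follows.

        from sklearn.datasets import make_regression
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import GridSearchCV

        # Synthetic stand-in for a QSAR matrix (compounds x descriptors).
        X, y = make_regression(n_samples=500, n_features=50, noise=5.0,
                               random_state=0)

        param_grid = {"n_estimators": [100, 300], "max_features": ["sqrt", 0.3]}
        search = GridSearchCV(RandomForestRegressor(random_state=0),
                              param_grid, cv=5, scoring="r2")
        search.fit(X, y)   # exhaustive search with 5-fold cross-validation
        print(search.best_params_, round(search.best_score_, 3))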

  2. Analysis of Flatness Deviations for Austenitic Stainless Steel Workpieces after Efficient Surface Machining

    NASA Astrophysics Data System (ADS)

    Nadolny, K.; Kapłonek, W.

    2014-08-01

    The following work is an analysis of flatness deviations of a workpiece made of X2CrNiMo17-12-2 austenitic stainless steel. The workpiece surface was shaped using efficient machining techniques (milling, grinding, and smoothing). After the machining was completed, all surfaces underwent stylus measurements in order to obtain surface flatness and roughness parameters. For this purpose the stylus profilometer Hommel-Tester T8000 by Hommelwerke with HommelMap software was used. The research results are presented in the form of 2D surface maps, 3D surface topographies with extracted single profiles, Abbott-Firestone curves, and graphical studies of the Sk parameters. The results of these experimental tests demonstrated a possible correlation between flatness and roughness parameters, and enabled an analysis of changes in these parameters from shaping and rough grinding to finish machining. The main novelty of this paper is a comprehensive analysis of measurement results obtained during a three-step machining process of austenitic stainless steel. Simultaneous analysis of the individual machining steps (milling, grinding, and smoothing) enabled a complementary assessment of the process of shaping the workpiece surface macro- and micro-geometry, giving special consideration to minimizing the flatness deviations.
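
    The Abbott-Firestone (material ratio) curve mentioned above is straightforward to compute from a raw height profile; a small numpy sketch on a synthetic Gaussian profile (a stand-in, not measurement data):

        import numpy as np

        rng = np.random.default_rng(1)
        profile = rng.normal(0.0, 0.8, 5000)   # synthetic height profile (μm)

        # Abbott-Firestone curve: for each height c, the material ratio is the
        # fraction of the profile lying at or above c.
        heights = np.sort(profile)[::-1]                     # descending heights
        material_ratio = np.arange(1, heights.size + 1) / heights.size * 100.0

        # Example query: height at 50 % material (bearing) ratio.
        c50 = heights[np.searchsorted(material_ratio, 50.0)]
        print(f"height at 50% material ratio: {c50:.3f} um")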

  3. Exploring the complementarity of THz pulse imaging and DCE-MRIs: Toward a unified multi-channel classification and a deep learning framework.

    PubMed

    Yin, X-X; Zhang, Y; Cao, J; Wu, J-L; Hadjiloucas, S

    2016-12-01

    We provide a comprehensive account of recent advances in biomedical image analysis and classification from two complementary imaging modalities: terahertz (THz) pulse imaging and dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). The work aims to highlight underlying commonalities in both data structures so that a common multi-channel data fusion framework can be developed. Signal pre-processing in both datasets is discussed briefly, taking into consideration advances in multi-resolution analysis and model based fractional order calculus system identification. Developments in statistical signal processing using principal component and independent component analysis are also considered. These algorithms have been developed independently by the THz-pulse imaging and DCE-MRI communities, and there is scope to place them in a common multi-channel framework to provide better software standardization at the pre-processing de-noising stage. A comprehensive discussion of feature selection strategies is also provided and the importance of preserving textural information is highlighted. Feature extraction and classification methods taking into consideration recent advances in support vector machine (SVM) and extreme learning machine (ELM) classifiers and their complex extensions are presented. An outlook on Clifford algebra classifiers and deep learning techniques suitable to both types of datasets is also provided. The work points toward the direction of developing a new unified multi-channel signal processing framework for biomedical image analysis that will explore synergies from both sensing modalities for inferring disease proliferation. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. Analysis of x-ray tomography data of an extruded low density styrenic foam: an image analysis study

    NASA Astrophysics Data System (ADS)

    Lin, Jui-Ching; Heeschen, William

    2016-10-01

    Extruded styrenic foams are low density foams that are widely used for thermal insulation. It is difficult to precisely characterize the structure of the cells in low density foams by traditional cross-section viewing due to the frailty of the walls of the cells. X-ray computed tomography (CT) is a non-destructive, three dimensional structure characterization technique that has great potential for structure characterization of styrenic foams. Unfortunately the intrinsic artifacts of the data and the artifacts generated during image reconstruction are often comparable in size and shape to the thin walls of the foam, making robust and reliable analysis of cell sizes challenging. We explored three different image processing methods to clean up artifacts in the reconstructed images, thus allowing quantitative three dimensional determination of cell size in a low density styrenic foam. The three approaches - intensity-based, intensity variance-based, and machine learning-based - were compared, and the machine learning image feature classification method was shown to be the best. Individual cells were segmented within the images after the images were cleaned up using the three different methods, and the cell sizes were measured and compared. Although the collected data, even with the image analysis methods combined, did not yield enough measurements for good cell-size statistics, the problem can be resolved by measuring multiple samples or increasing the imaging field of view.

  5. Calculation of design load for the MOD-5A 7.3 MW wind turbine system

    NASA Technical Reports Server (NTRS)

    Mirandy, L.; Strain, J. C.

    1995-01-01

    Design loads are presented for the General Electric MOD-5A wind turbine. The MOD-5A system consists of a 400 ft. diameter, upwind, two-bladed, teetered rotor connected to a 7.3 MW variable-speed generator. Fatigue loads are specified in the form of histograms for the 30 year life of the machine, while limit (or maximum) loads have been derived from transient dynamic analysis at critical operating conditions. Load prediction was accomplished using state-of-the-art aeroelastic analyses developed at General Electric. Features of the primary predictive tool, the Transient Rotor Analysis Code (TRAC), are described in the paper. Key to the load predictions are the following wind models: (1) yearly mean wind distribution; (2) mean wind variations during operation; (3) number of start/shutdown cycles; (4) spatially large gusts; and (5) spatially small gusts (local turbulence). The methods used to develop statistical distributions from load calculations represent an extension of procedures used in past wind programs and are believed to be a significant contribution to Wind Turbine Generator analysis. Test/theory correlations are presented to demonstrate the code's load predictive capability and to support the wind models used in the analysis. In addition, MOD-5A loads are compared with those of existing machines. The MOD-5A design was performed by the General Electric Company, Advanced Energy Program Department, under Contract DEN3-153 with NASA Lewis Research Center and sponsored by the Department of Energy.

  6. Stratiform/convective rain delineation for TRMM microwave imager

    NASA Astrophysics Data System (ADS)

    Islam, Tanvir; Srivastava, Prashant K.; Dai, Qiang; Gupta, Manika; Wan Jaafar, Wan Zurina

    2015-10-01

    This article investigates the potential for using machine learning algorithms to delineate stratiform/convective (S/C) rain regimes for a passive microwave imager, taking calibrated brightness temperatures as the only spectral inputs. The algorithms have been implemented for the Tropical Rainfall Measuring Mission (TRMM) microwave imager (TMI), and calibrated as well as validated taking the Precipitation Radar (PR) S/C information as the target class variables. Two different algorithms are explored for the delineation. The first is the metaheuristic adaptive boosting algorithm, which includes the real, gentle, and modest versions of AdaBoost. The second is classical linear discriminant analysis, which includes the Fisher's and penalized versions. Furthermore, prior to the development of the delineation algorithms, a feature selection analysis was conducted for a total of 85 features, comprising combinations of brightness temperatures from 10 GHz to 85 GHz and some derived indexes, such as the scattering index, polarization corrected temperature, and polarization difference, with the help of the mutual information aided minimal redundancy maximal relevance (mRMR) criterion. It was found that the polarization corrected temperature at 85 GHz and the features derived from the "addition" operator associated with the 85 GHz channels have good statistical dependency on the S/C target class variables. Further, it was shown how the mRMR feature selection technique helps to reduce the number of features without degrading the results of the machine learning algorithms. The proposed scheme is able to delineate the S/C rain regimes with reasonable accuracy: over the validation period, the Matthews correlation coefficients are in the range of 0.60-0.70. Since the proposed method does not rely on any a priori information, it is very suitable for other microwave sensors having channels similar to the TMI. The method could possibly benefit the constellation sensors in the Global Precipitation Measurement (GPM) mission era.
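
    For reference, the Matthews correlation coefficient used in the validation above is available directly in scikit-learn; the labels below are synthetic stand-ins for the PR-derived S/C classes, with a made-up 15 % disagreement rate.

        import numpy as np
        from sklearn.metrics import matthews_corrcoef

        rng = np.random.default_rng(0)
        y_true = rng.integers(0, 2, 10000)     # 1 = convective, 0 = stratiform
        flip = rng.random(10000) < 0.15        # hypothetical classifier errors
        y_pred = np.where(flip, 1 - y_true, y_true)

        print(f"MCC = {matthews_corrcoef(y_true, y_pred):.2f}")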

  7. DHLAS: A web-based information system for statistical genetic analysis of HLA population data.

    PubMed

    Thriskos, P; Zintzaras, E; Germenis, A

    2007-03-01

    DHLAS (database HLA system) is a user-friendly, web-based information system for the analysis of human leukocyte antigen (HLA) data from population studies. DHLAS has been developed using JAVA and the R system; it runs on a Java Virtual Machine and its user interface is web-based, powered by the servlet engine TOMCAT. It utilizes STRUTS, a Model-View-Controller framework, and uses several GNU packages to perform several of its tasks. The database engine it relies upon for fast access is MySQL, but others can be used as well. The system estimates metrics, performs statistical testing and produces graphs required for HLA population studies: (i) Hardy-Weinberg equilibrium (calculated using both asymptotic and exact tests), (ii) genetic distances (Euclidean or Nei), (iii) phylogenetic trees using the unweighted pair group method with averages and the neighbor-joining method, (iv) linkage disequilibrium (pairwise and overall, including variance estimations), (v) haplotype frequencies (estimated using the expectation-maximization algorithm) and (vi) discriminant analysis. The main merit of DHLAS is the incorporation of a database, thus the data can be stored and manipulated along with integrated genetic data analysis procedures. In addition, it has an open architecture allowing the inclusion of other functions and procedures.
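
    As an illustration of item (i): for a biallelic locus the asymptotic Hardy-Weinberg test reduces to a one-degree-of-freedom chi-square test (HLA loci are multi-allelic, so this is a deliberate simplification, and the genotype counts are invented).

        import numpy as np
        from scipy.stats import chi2

        def hwe_chi2(n_AA, n_Aa, n_aa):
            """Asymptotic chi-square test of Hardy-Weinberg equilibrium
            for a biallelic locus with observed genotype counts."""
            n = n_AA + n_Aa + n_aa
            p = (2 * n_AA + n_Aa) / (2 * n)       # allele frequency of A
            q = 1.0 - p
            expected = np.array([p * p * n, 2 * p * q * n, q * q * n])
            observed = np.array([n_AA, n_Aa, n_aa])
            stat = ((observed - expected) ** 2 / expected).sum()
            # df = 3 genotype classes - 1 - 1 estimated allele frequency
            return stat, chi2.sf(stat, df=1)

        stat, pval = hwe_chi2(298, 489, 213)
        print(f"chi2={stat:.3f}, p={pval:.3f}")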

  8. Machine translation project alternatives analysis

    NASA Technical Reports Server (NTRS)

    Bajis, Catherine J.; Bedford, Denise A. D.

    1993-01-01

    The Machine Translation Project consists of several components, two of which, the Project Plan and the Requirements Analysis, have already been delivered. The Project Plan details the overall rationale, objectives and time-table for the project as a whole. The Requirements Analysis compares a number of available machine translation systems, their capabilities, possible configurations, and costs. The Alternatives Analysis has resulted in a number of conclusions and recommendations to the NASA STI program concerning the acquisition of specific MT systems and related hardware and software.

  9. Technological and economical analysis of salient pole and permanent magnet synchronous machines designed for wind turbines

    NASA Astrophysics Data System (ADS)

    Gündoğdu, Tayfun; Kömürgöz, Güven

    2012-08-01

    Chinese export restrictions have already reduced the planning reliability for investments in permanent magnet wind turbines. Today the production of permanent magnets consumes the largest proportion of rare earth elements, with 40% of the rare earth-based magnets used for generators and other electrical machines. The cost and availability of NdFeB magnets will likely determine the production rate of permanent magnet generators. The high volatility of rare earth metals makes it very difficult to quote a price. Prices may also vary from supplier to supplier by up to 50% for the same size, shape and quantity with a minor difference in quality. The paper presents an analysis and comparison of salient pole synchronous machines with field windings and of peripheral winding synchronous electrical machines, each presenting important advantages. A neodymium alloy magnet rotor structure has been considered and compared to the salient rotor case. The Salient Pole Synchronous Machine and the Permanent Magnet Synchronous Machine were designed so that the plate values remain constant. The eddy current effect on the windings is taken into account during the design, and the efficiency, output power and air-gap flux density obtained after simulation were compared. The analysis results clearly indicate that Salient Pole Synchronous Machine designs would be attractive to wind power companies. Furthermore, the importance of the design of electrical machines and the determination of criteria are emphasized. This paper will be a helpful resource for the examination and comparison of the basic structure and magnetic features of the Salient Pole Synchronous Machine and the Permanent Magnet Synchronous Machine. Furthermore, an economic analysis of the designed machines was conducted.

  10. The Perseus computational platform for comprehensive analysis of (prote)omics data.

    PubMed

    Tyanova, Stefka; Temu, Tikira; Sinitcyn, Pavel; Carlson, Arthur; Hein, Marco Y; Geiger, Tamar; Mann, Matthias; Cox, Jürgen

    2016-09-01

    A main bottleneck in proteomics is the downstream biological analysis of highly multivariate quantitative protein abundance data generated using mass-spectrometry-based analysis. We developed the Perseus software platform (http://www.perseus-framework.org) to support biological and biomedical researchers in interpreting protein quantification, interaction and post-translational modification data. Perseus contains a comprehensive portfolio of statistical tools for high-dimensional omics data analysis covering normalization, pattern recognition, time-series analysis, cross-omics comparisons and multiple-hypothesis testing. A machine learning module supports the classification and validation of patient groups for diagnosis and prognosis, and it also detects predictive protein signatures. Central to Perseus is a user-friendly, interactive workflow environment that provides complete documentation of computational methods used in a publication. All activities in Perseus are realized as plugins, and users can extend the software by programming their own, which can be shared through a plugin store. We anticipate that Perseus's arsenal of algorithms and its intuitive usability will empower interdisciplinary analysis of complex large data sets.

  11. Health-promoting vending machines: evaluation of a pediatric hospital intervention.

    PubMed

    Van Hulst, Andraea; Barnett, Tracie A; Déry, Véronique; Côté, Geneviève; Colin, Christine

    2013-01-01

    Taking advantage of a natural experiment made possible by the placement of health-promoting vending machines (HPVMs), we evaluated the impact of the intervention on consumers' attitudes toward and practices with vending machines in a pediatric hospital. Vending machines offering healthy snacks, meals, and beverages were developed to replace four vending machines offering the usual high-energy, low-nutrition fare. A pre- and post-intervention evaluation design was used; data were collected through exit surveys and six-week follow-up telephone surveys among potential vending machine users before (n=293) and after (n=226) placement of HPVMs. Chi-square statistics were used to compare pre- and post-intervention participants' responses. More than 90% of pre- and post-intervention participants were satisfied with their purchase. Post-intervention participants were more likely to state that nutritional content and appropriateness of portion size were elements that influenced their purchase. Overall, post-intervention participants were more likely than pre-intervention participants to perceive as healthy the options offered by the hospital vending machines. Thirty-three percent of post-intervention participants recalled two or more sources of information integrated in the HPVM concept. No differences were found between pre- and post-intervention participants' readiness to adopt healthy diets. While the HPVM project had challenges as well as strengths, vending machines offering healthy snacks are feasible in hospital settings.

  12. Statistical interpretation of machine learning-based feature importance scores for biomarker discovery.

    PubMed

    Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre

    2012-07-01

    Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple, fast and their output is easily interpretable by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family-wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve in terms of false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source codes of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
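
    The abstract does not list the exact procedures evaluated; one standard recipe for attaching p-values to machine learning relevance scores is a permutation null, sketched below with a random forest (B = 100 permutations and the 5 % level are illustrative choices).

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier

        X, y = make_classification(n_samples=200, n_features=20,
                                   n_informative=3, random_state=0)
        model = RandomForestClassifier(n_estimators=200, random_state=0)
        observed = model.fit(X, y).feature_importances_

        # Null distribution: refit on label permutations that break any X-y link.
        rng = np.random.default_rng(0)
        B = 100
        null = np.empty((B, X.shape[1]))
        for b in range(B):
            null[b] = model.fit(X, rng.permutation(y)).feature_importances_

        # Empirical p-value per feature, with the usual +1 smoothing.
        pvals = (1 + (null >= observed).sum(axis=0)) / (B + 1)
        print("relevant at 5%:", np.flatnonzero(pvals < 0.05))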

  13. CMM Interim Check (U)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Montano, Joshua Daniel

    2015-03-23

    Coordinate Measuring Machines (CMM) are widely used in industry, throughout the Nuclear Weapons Complex and at Los Alamos National Laboratory (LANL) to verify part conformance to design definition. Calibration cycles for CMMs at LANL are predominantly one year in length. Unfortunately, several nonconformance reports have been generated to document the discovery of a certified machine found out of tolerance during a calibration closeout. In an effort to reduce risk to product quality two solutions were proposed – shorten the calibration cycle which could be costly, or perform an interim check to monitor the machine’s performance between cycles. The CMM interim check discussed makes use of Renishaw’s Machine Checking Gauge. This off-the-shelf product simulates a large sphere within a CMM’s measurement volume and allows for error estimation. Data was gathered, analyzed, and simulated from seven machines in seventeen different configurations to create statistical process control run charts for on-the-floor monitoring.
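
    An individuals control chart of the kind used in such run charts can be computed as below; the gauge-error values are hypothetical, and d2 = 1.128 is the standard moving-range constant for subgroups of two.

        import numpy as np

        # Hypothetical interim check errors (μm) from the Machine Checking Gauge.
        rng = np.random.default_rng(2)
        x = rng.normal(1.5, 0.2, 30)

        # Individuals (X) chart: estimate sigma from the average moving range.
        mr = np.abs(np.diff(x))
        sigma_hat = mr.mean() / 1.128            # d2 = 1.128 for n = 2
        center = x.mean()
        ucl, lcl = center + 3 * sigma_hat, center - 3 * sigma_hat
        print(f"CL={center:.3f}  UCL={ucl:.3f}  LCL={lcl:.3f}")
        print("out-of-control points:", np.flatnonzero((x > ucl) | (x < lcl)))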

  14. A Comparison of Machine Learning Approaches for Corn Yield Estimation

    NASA Astrophysics Data System (ADS)

    Kim, N.; Lee, Y. W.

    2017-12-01

    Machine learning is an efficient empirical method for classification and prediction, and it is another approach to crop yield estimation. The objective of this study is to estimate corn yield in the Midwestern United States by employing machine learning approaches such as the support vector machine (SVM), random forest (RF), and deep neural networks (DNN), and to perform a comprehensive comparison of their results. We constructed the database using satellite images from MODIS, the climate data of the PRISM climate group, and GLDAS soil moisture data. In addition, to examine the seasonal sensitivities of corn yields, two period groups were set up: May to September (MJJAS) and July and August (JA). Overall, the DNN showed the highest accuracy in terms of the correlation coefficient for the two period groups. The differences between our predictions and USDA yield statistics were about 10-11%.

  15. Modeling Geomagnetic Variations using a Machine Learning Framework

    NASA Astrophysics Data System (ADS)

    Cheung, C. M. M.; Handmer, C.; Kosar, B.; Gerules, G.; Poduval, B.; Mackintosh, G.; Munoz-Jaramillo, A.; Bobra, M.; Hernandez, T.; McGranaghan, R. M.

    2017-12-01

    We present a framework for data-driven modeling of Heliophysics time series data. The Solar Terrestrial Interaction Neural net Generator (STING) is an open source python module built on top of state-of-the-art statistical learning frameworks (traditional machine learning methods as well as deep learning). To showcase the capability of STING, we deploy it for the problem of predicting the temporal variation of geomagnetic fields. The data used includes solar wind measurements from the OMNI database and geomagnetic field data taken by magnetometers at US Geological Survey observatories. We examine the predictive capability of different machine learning techniques (recurrent neural networks, support vector machines) for a range of forecasting times (minutes to 12 hours). STING is designed to be extensible to other types of data. We show how STING can be used on large sets of data from different sensors/observatories and adapted to tackle other problems in Heliophysics.

  16. Machine Learning in the Big Data Era: Are We There Yet?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sukumar, Sreenivas Rangan

    In this paper, we discuss the machine learning challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are machine learning algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across domains of national security and healthcare to suggest our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data-retrieval); (ii) the science of data challenge - the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge - the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.

  17. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    PubMed

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
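
    Among the metrics listed, Youden's index is J = sensitivity + specificity - 1, often maximized over thresholds; a scikit-learn sketch on synthetic imbalanced data (the 0.9 class weight only mimics the rarity of mortality, not the registry's actual rate).

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_curve

        # Synthetic imbalanced outcome: roughly 10% positives (deaths).
        X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
        probs = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

        fpr, tpr, thresholds = roc_curve(y, probs)
        youden = tpr - fpr                  # J = sensitivity + specificity - 1
        best = np.argmax(youden)
        print(f"max Youden J = {youden[best]:.2f} "
              f"at threshold {thresholds[best]:.2f}")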

  18. Prediction of outcome in internet-delivered cognitive behaviour therapy for paediatric obsessive-compulsive disorder: A machine learning approach.

    PubMed

    Lenhard, Fabian; Sauer, Sebastian; Andersson, Erik; Månsson, Kristoffer Nt; Mataix-Cols, David; Rück, Christian; Serlachius, Eva

    2018-03-01

    There are no consistent predictors of treatment outcome in paediatric obsessive-compulsive disorder (OCD). One reason for this might be the use of suboptimal statistical methodology. Machine learning is an approach to efficiently analyse complex data. Machine learning has been widely used within other fields, but has rarely been tested in the prediction of paediatric mental health treatment outcomes. To test four different machine learning methods in the prediction of treatment response in a sample of paediatric OCD patients who had received Internet-delivered cognitive behaviour therapy (ICBT). Participants were 61 adolescents (12-17 years) who enrolled in a randomized controlled trial and received ICBT. All clinical baseline variables were used to predict strictly defined treatment response status three months after ICBT. Four machine learning algorithms were implemented. For comparison, we also employed a traditional logistic regression approach. Multivariate logistic regression could not detect any significant predictors. In contrast, all four machine learning algorithms performed well in the prediction of treatment response, with 75 to 83% accuracy. The results suggest that machine learning algorithms can successfully be applied to predict paediatric OCD treatment outcome. Validation studies and studies in other disorders are warranted. Copyright © 2017 John Wiley & Sons, Ltd.

  19. On the Safety of Machine Learning: Cyber-Physical Systems, Decision Sciences, and Data Products.

    PubMed

    Varshney, Kush R; Alemzadeh, Homa

    2017-09-01

    Machine learning algorithms increasingly influence our decisions and interact with us in all parts of our daily lives. Therefore, just as we consider the safety of power plants, highways, and a variety of other engineered socio-technical systems, we must also take into account the safety of systems involving machine learning. Heretofore, the definition of safety has not been formalized in a machine learning context. In this article, we do so by defining machine learning safety in terms of risk, epistemic uncertainty, and the harm incurred by unwanted outcomes. We then use this definition to examine safety in all sorts of applications in cyber-physical systems, decision sciences, and data products. We find that the foundational principle of modern statistical machine learning, empirical risk minimization, is not always a sufficient objective. We discuss how four different categories of strategies for achieving safety in engineering, including inherently safe design, safety reserves, safe fail, and procedural safeguards can be mapped to a machine learning context. We then discuss example techniques that can be adopted in each category, such as considering interpretability and causality of predictive models, objective functions beyond expected prediction accuracy, human involvement for labeling difficult or rare examples, and user experience design of software and open data.

  20. Building a database for statistical characterization of ELMs on DIII-D

    NASA Astrophysics Data System (ADS)

    Fritch, B. J.; Marinoni, A.; Bortolon, A.

    2017-10-01

    Edge localized modes (ELMs) are bursty instabilities which occur in the edge region of H-mode plasmas and have the potential to damage in-vessel components of future fusion machines by exposing the divertor region to large energy and particle fluxes during each ELM event. While most ELM studies focus on average quantities (e.g. energy loss per ELM), this work investigates the statistical distributions of ELM characteristics, as a function of plasma parameters. A semi-automatic algorithm is being used to create a database documenting trigger times of the tens of thousands of ELMs for DIII-D discharges in scenarios relevant to ITER, thus allowing statistically significant analysis. Probability distributions of inter-ELM periods and energy losses will be determined and related to relevant plasma parameters such as density, stored energy, and current in order to constrain models and improve estimates of the expected inter-ELM periods and sizes, both of which must be controlled in future reactors. Work supported in part by US DoE under the Science Undergraduate Laboratory Internships (SULI) program, DE-FC02-04ER54698 and DE-FG02- 94ER54235.

  1. Statistical robustness of machine-learning estimates for characterizing a groundwater-surface water system, Southland, New Zealand

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.; Daughney, C.

    2016-12-01

    The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.

  2. Design of off-statistics axial-flow fans by means of vortex law optimization

    NASA Astrophysics Data System (ADS)

    Lazari, Andrea; Cattanei, Andrea

    2014-12-01

    Off-statistics input data sets are common in axial-flow fan design and may easily result in some violation of the requirements of a good aerodynamic blade design. In order to circumvent this problem, in the present paper, a solution to the radial equilibrium equation is found which minimizes the outlet kinetic energy and fulfills the aerodynamic constraints, thus ensuring that the resulting blade has acceptable aerodynamic performance. The presented method is based on the optimization of a three-parameter vortex law and of the meridional channel size. The aerodynamic quantities to be employed as constraints are identified and their suitable ranges of variation are proposed. The method is validated by means of a design with critical input data values and CFD analysis. Then, by means of systematic computations with different input data sets, some correlations and charts are obtained which are analogous to classic correlations based on statistical investigations of existing machines. Such new correlations help size a fan of given characteristics as well as study the feasibility of a given design.

  3. Machine Learning. Part 1. A Historical and Methodological Analysis.

    DTIC Science & Technology

    1983-05-31

    Machine learning has always been an integral part of artificial intelligence, and its methodology has evolved in concert with the major concerns of the field. In response to the difficulties of encoding ever-increasing volumes of knowledge in modern Al systems, many researchers have recently turned their attention to machine learning as a means to overcome the knowledge acquisition bottleneck. Part 1 of this paper presents a taxonomic analysis of machine learning organized primarily by learning strategies and secondarily by

  4. Relative Kerf and Sawing Variation Values for Some Hardwood Sawing Machines

    Treesearch

    Philip H. Steele; Michael W. Wade; Steven H. Bullard; Philip A. Araman

    1992-01-01

    Information on the conversion efficiency of sawing machines is important to those involved in the management, maintenance, and design of sawmills. Little information on the conversion characteristics of hardwood sawing machines has been available. This study, based on 266 studies of 6 machine types, provides an analysis of the machine characteristics of kerf width,...

  5. Machinability of titanium metal matrix composites (Ti-MMCs)

    NASA Astrophysics Data System (ADS)

    Aramesh, Maryam

    Titanium metal matrix composites (Ti-MMCs), as a new generation of materials, have various potential applications in aerospace and automotive industries. The presence of ceramic particles enhances the physical and mechanical properties of the alloy matrix. However, the hard and abrasive nature of these particles causes various issues in the field of their machinability. Severe tool wear and short tool life are the most important drawbacks of machining this class of materials. There is very limited work in the literature regarding the machinability of this class of materials especially in the area of tool life estimation and tool wear. By far, polycrystalline diamond (PCD) tools appear to be the best choice for machining MMCs from researchers' point of view. However, due to their high cost, economical alternatives are sought. Cubic boron nitride (CBN) inserts, as the second hardest available tools, show superior characteristics such as great wear resistance, high hardness at elevated temperatures, a low coefficient of friction and a high melting point. Yet, so far CBN tools have not been studied during machining of Ti-MMCs. In this study, a comprehensive study has been performed to explore the tool wear mechanisms of CBN inserts during turning of Ti-MMCs. The unique morphology of the worn faces of the tools was investigated for the first time, which led to new insights in the identification of chemical wear mechanisms during machining of Ti-MMCs. Utilizing the full tool life capacity of cutting tools is also very crucial, due to the considerable costs associated with suboptimal replacement of tools. This strongly motivates development of a reliable model for tool life estimation under any cutting conditions. In this study, a novel model based on the survival analysis methodology is developed to estimate the progressive states of tool wear under any cutting conditions during machining of Ti-MMCs. This statistical model takes into account the machining time in addition to the effect of cutting parameters. Thus, promising results were obtained which showed a very good agreement with the experimental results. Moreover, a more advanced model was constructed, by adding the tool wear as another variable to the previous model. Therefore, a new model was proposed for estimating the remaining life of worn inserts under different cutting conditions, using the current tool wear data as an input. The results of this model were validated with the experimental results. The estimated results were well consistent with the results obtained from the experiments.
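
    The survival-analysis model developed in the thesis is not reproduced here; as a much simpler illustration of fitting a tool-life distribution, the sketch below fits a two-parameter Weibull with scipy to hypothetical, uncensored tool lives (a real survival model would also handle censored runs and cutting-parameter covariates).

        import numpy as np
        from scipy.stats import weibull_min

        # Hypothetical tool lives (minutes to a wear criterion), uncensored.
        lives = np.array([12.1, 9.8, 14.3, 11.0, 10.5, 13.2, 8.9, 12.7])

        # Fit a two-parameter Weibull (location fixed at zero).
        shape, loc, scale = weibull_min.fit(lives, floc=0)
        print(f"shape k={shape:.2f}, scale lambda={scale:.2f}")

        # Probability a tool survives 10 minutes under the fitted model.
        print(f"P(life > 10 min) = {weibull_min.sf(10, shape, loc, scale):.2f}")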

  6. Are we there yet?

    PubMed

    Cristianini, Nello

    2010-05-01

    Statistical approaches to Artificial Intelligence are behind most success stories of the field in the past decade. The idea of generating non-trivial behaviour by analysing vast amounts of data has enabled recommendation systems, search engines, spam filters, optical character recognition, machine translation and speech recognition, among other things. As we celebrate the spectacular achievements of this line of research, we need to assess its full potential and its limitations. What are the next steps to take towards machine intelligence? Copyright © 2010 Elsevier Ltd. All rights reserved.

  7. STAMPS: Software Tool for Automated MRI Post-processing on a supercomputer.

    PubMed

    Bigler, Don C; Aksu, Yaman; Miller, David J; Yang, Qing X

    2009-08-01

    This paper describes a Software Tool for Automated MRI Post-processing (STAMP) of multiple types of brain MRIs on a workstation and for parallel processing on a supercomputer (STAMPS). This software tool enables the automation of nonlinear registration for a large image set and for multiple MR image types. The tool uses standard brain MRI post-processing tools (such as SPM, FSL, and HAMMER) for multiple MR image types in a pipeline fashion. It also contains novel MRI post-processing features. The STAMP image outputs can be used to perform brain analysis using Statistical Parametric Mapping (SPM) or single-/multi-image modality brain analysis using Support Vector Machines (SVMs). Since STAMPS is PBS-based, the supercomputer may be a multi-node computer cluster or one of the latest multi-core computers.

  8. Machine processing for remotely acquired data. [using multivariate statistical analysis

    NASA Technical Reports Server (NTRS)

    Landgrebe, D. A.

    1974-01-01

    This paper is a general discussion of earth resources information systems which utilize airborne and spaceborne sensors. It points out that information may be derived by sensing and analyzing the spectral, spatial and temporal variations of electromagnetic fields emanating from the earth surface. After giving an overview of system organization, the two broad categories of system types are discussed. These are systems in which high quality imagery is essential and those that are more numerically oriented. Sensors are also discussed with this categorization of systems in mind. The multispectral approach and pattern recognition are described as an example data analysis procedure for numerically-oriented systems. The steps necessary in using a pattern recognition scheme are described and illustrated with data obtained from aircraft and the Earth Resources Technology Satellite (ERTS-1).

  9. Biomarker analysis of American toad (Anaxyrus americanus) ...

    EPA Pesticide Factsheets

    The objective of the current study was to use a biomarker-based approach to investigate the influence of atrazine exposure on American toad (Anaxyrus americanus) and grey tree frog (Hyla versicolor) tadpoles. Atrazine is one of the most frequently detected herbicides in environmental matrices throughout the United States. In surface waters, it has been found at concentrations from 0.04–2859 μg/L and thus presents a likely exposure scenario for non-target species such as amphibians. Studies have examined the effect of atrazine on the metamorphic parameters of amphibians; however, the data are often contradictory. Gosner stage 22–24 tadpoles were exposed to 0 (control), 10, 50, 250 or 1250 μg/L of atrazine for 48 h. Endogenous polar metabolites were extracted and analyzed using gas chromatography coupled with mass spectrometry. Statistical analyses of the acquired spectra with machine learning classification models demonstrated identifiable changes in the metabolomic profiles between exposed and control tadpoles. Support vector machine models with recursive feature elimination created a more efficient, non-parametric data analysis and increased interpretability of metabolomic profiles. Biochemical fluxes observed in the exposed groups of both A. americanus and H. versicolor displayed perturbations in a number of classes of biological macromolecules including fatty acids, amino acids, purine nucleosides, pyrimidines, and mono- and di-saccharides. Metabolomic
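
    The support vector machine with recursive feature elimination described above corresponds to the common SVM-RFE pattern; a scikit-learn sketch on a synthetic stand-in for the metabolite intensity matrix (all sizes and parameters are illustrative).

        from sklearn.datasets import make_classification
        from sklearn.feature_selection import RFE
        from sklearn.svm import LinearSVC

        # Synthetic stand-in for GC-MS data (samples x metabolite features).
        X, y = make_classification(n_samples=60, n_features=100,
                                   n_informative=5, random_state=0)

        svm = LinearSVC(C=0.1, max_iter=10000)
        # Drop 5 features per iteration until 10 remain.
        rfe = RFE(svm, n_features_to_select=10, step=5).fit(X, y)
        print("selected feature indices:", list(rfe.get_support(indices=True)))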

  10. An Enhanced Engineering Perspective of Global Climate Systems and Statistical Formulation of Terrestrial CO2 Exchanges

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Yuanshun; Baek, Seung H.; Garcia-Diza, Alberto

    2012-01-01

    This paper designs a comprehensive approach based on the engineering machine/system concept to model, analyze, and assess the level of CO2 exchange between the atmosphere and terrestrial ecosystems, which is an important factor in understanding changes in global climate. The focus of this article is on spatial patterns and on the correlation between levels of CO2 fluxes and a variety of influencing factors in eco-environments. The engineering/machine concept used is a system protocol that includes the sequential activities of design, test, observe, and model. This concept is applied to explicitly include various influencing factors and interactions associated with CO2 fluxes. To formulate effective models of a large and complex climate system, this article introduces a modeling technique referred to as Stochastic Filtering Analysis of Variance (SF-ANOVA). The CO2 flux data observed from some sites of AmeriFlux are used to illustrate and validate the analysis, prediction and globalization capabilities of the proposed engineering approach and the SF-ANOVA technique. The SF-ANOVA modeling approach was compared to stepwise regression, ridge regression, and neural networks. The comparison indicated that the proposed approach is a valid and effective tool with similar accuracy and less complexity than the other procedures.

  11. Computational methods using genome-wide association studies to predict radiotherapy complications and to identify correlative molecular processes

    NASA Astrophysics Data System (ADS)

    Oh, Jung Hun; Kerns, Sarah; Ostrer, Harry; Powell, Simon N.; Rosenstein, Barry; Deasy, Joseph O.

    2017-02-01

    The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method results in better predictive performance compared with existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis were used to identify key biological processes and proteins that are plausible in light of other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs), yielding clinically useful risk stratification models, as well as identifying important underlying biological processes in the radiation damage and tissue repair process. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.
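
    PRFR itself is the authors' algorithm and is not reproduced here; as a rough stand-in, the sketch below shows the general shape of fitting a polygenic risk model with an off-the-shelf random forest on a synthetic SNP matrix (genotype coding, effect sizes, and cohort size are invented for illustration).

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(2)

      # Synthetic genotype matrix: 368 patients x 500 SNPs coded 0/1/2, with
      # a handful of causal SNPs each conferring a small shift in risk.
      X = rng.integers(0, 3, size=(368, 500)).astype(float)
      logit = -2.0 + 0.3 * X[:, :10].sum(axis=1)
      y = rng.random(368) < 1 / (1 + np.exp(-logit))   # binary endpoint

      rf = RandomForestClassifier(n_estimators=500, random_state=0)
      print(cross_val_score(rf, X, y, scoring="roc_auc", cv=5).mean())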

  12. Prevalence and associated factors of work related musculoskeletal disorders among commercial milling machine operators in South-Eastern Nigerian markets.

    PubMed

    Ojukwu, Chidiebele Petronilla; Anyanwu, Godson Emeka; Nwabueze, Augustine Chijindu; Anekwu, Emelie Morris; Chukwu, Sylvester Caesar

    2017-01-01

    Milling machine operators perform physically demanding tasks that can lead to work related musculoskeletal disorders (WRMSDs), but literature on WRMSDs among milling machine operators is scarce. Knowledge of the prevalence and risk factors of WRMSDs can be an appropriate base for planning and implementing ergonomics intervention programs in the workplace. This study aimed to determine the prevalence, pattern and associated factors of WRMSDs among commercial milling machine operators in Enugu, Nigeria. This cross-sectional survey involved 148 commercial milling machine operators (74 hand-operated milling machine operators (HOMMO) and 74 electrically-operated milling machine operators (EOMMO)), within the age range of 18-65 years, recruited by convenience sampling from four markets in Enugu, Nigeria. A standard Nordic questionnaire was used to assess the prevalence of WRMSDs among the participants. Data were summarized using descriptive statistics. There was a significant difference (p = 0.001) in the prevalence of WRMSDs between HOMMOs (77%) and EOMMOs (50%). All body parts were affected in both groups, with the shoulders (85.1%) and lower back (46%) showing the highest prevalence. Working in awkward or static postures, working with injury, poor workplace design, repetition of tasks, vibratory working equipment, reduced rest, high job demand and heavy lifting were significantly associated with the prevalence of WRMSDs. WRMSDs are prevalent among commercial milling machine operators, with higher occurrence in HOMMOs. Ergonomic interventions, including the re-design of milling machines and appropriate work posture education of machine operators, are recommended in the milling industry.

  13. Machine learning methods as a tool to analyse incomplete or irregularly sampled radon time series data.

    PubMed

    Janik, M; Bossew, P; Kurihara, O

    2018-07-15

    Machine learning is a class of statistical techniques which has proven to be a powerful tool for modelling the behaviour of complex systems, in which response quantities depend on assumed controls or predictors in a complicated way. In this paper, as our first purpose, we propose the application of machine learning to reconstruct incomplete or irregularly sampled indoor radon (222Rn) time series. The physical assumption underlying the modelling is that Rn concentration in the air is controlled by environmental variables such as air temperature and pressure. The algorithms "learn" from complete sections of the multivariate series, derive a dependence model and apply it to sections where the controls are available but not the response (Rn), and in this way complete the Rn series. Three machine learning techniques are applied in this study, namely random forest, its extension called the gradient boosting machine, and deep learning. For comparison, we apply classical multiple regression in a generalized linear model version. Performance of the models is evaluated through different metrics. The performance of the gradient boosting machine is found to be superior to that of the other techniques. By applying learning machines, we show, as our second purpose, that missing data or periods of Rn series can be reconstructed and resampled on a regular grid reasonably well, provided data on appropriate physical controls are available. The techniques also identify the degree to which the assumed controls contribute to imputing missing Rn values. Our third purpose, no less important from the viewpoint of physics, is identifying the degree to which physical (in this case environmental) variables are relevant as Rn predictors, or in other words, which predictors explain most of the temporal variability of Rn. We show that the variables which contribute most to the Rn series reconstruction are temperature, relative humidity and day of the year. The first two are physical predictors, while "day of the year" is a statistical proxy or surrogate for missing or unknown predictors. Copyright © 2018 Elsevier B.V. All rights reserved.
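
    A minimal sketch of the imputation scheme described above, assuming a gradient boosting model and synthetic environmental controls (the functional dependence of radon on temperature, humidity, and day of year is invented for illustration):

      import numpy as np
      import pandas as pd
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(3)
      n = 1000

      # Synthetic series: radon driven by temperature, humidity and season.
      df = pd.DataFrame({
          "temperature": rng.normal(15, 8, n),
          "humidity": rng.uniform(30, 95, n),
          "day_of_year": rng.integers(1, 366, n),
      })
      df["radon"] = (50 - 1.5 * df.temperature + 0.4 * df.humidity
                     + 10 * np.sin(2 * np.pi * df.day_of_year / 365)
                     + rng.normal(0, 5, n))
      df.loc[rng.random(n) < 0.2, "radon"] = np.nan   # knock out 20% of responses

      # "Learn" the dependence from complete sections of the series.
      known = df.dropna()
      model = GradientBoostingRegressor().fit(
          known[["temperature", "humidity", "day_of_year"]], known["radon"])

      # Fill gaps where the controls exist but the response is missing.
      gaps = df["radon"].isna()
      df.loc[gaps, "radon"] = model.predict(
          df.loc[gaps, ["temperature", "humidity", "day_of_year"]])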

  14. Inverse Problems in Geodynamics Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R. N.

    2018-01-01

    During the past few decades numerical studies have been widely employed to explore the style of circulation and mixing in the mantle of Earth and other planets. However, these numerical models depend on many poorly constrained properties from mineral physics, geochemistry, and petrology. Machine learning, a computational statistics technique and a subfield of artificial intelligence, has rapidly emerged in many fields of science and engineering. We focus here on the application of supervised machine learning (SML) algorithms to predictions of mantle flow processes. Specifically, we emphasize estimating mantle properties by employing machine learning techniques to solve an inverse problem. Using snapshots of numerical convection models as training samples, we enable machine learning models to determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at midmantle depths. Employing support vector machine algorithms, we show that SML techniques can successfully predict the magnitude of mantle density anomalies and can also be used to characterize mantle flow patterns. The technique can be extended to more complex geodynamic problems in mantle dynamics by employing deep learning algorithms to put constraints on properties such as viscosity, elastic parameters, and the nature of thermal and chemical anomalies.
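
    As an illustration of the inverse-problem setup, the sketch below regresses a mid-mantle density-anomaly magnitude from coarse snapshot features with a support vector machine; the profiles and the target relation are synthetic stand-ins for the convection-model snapshots used in the study.

      import numpy as np
      from sklearn.svm import SVR
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(4)

      # Stand-in for convection-model snapshots: each sample is a coarse
      # radial temperature profile; the target is the mid-mantle density
      # anomaly (in %) that produced it. Purely synthetic, for shape only.
      profiles = rng.normal(size=(300, 64))
      anomaly = profiles[:, 28:36].mean(axis=1) * 2.0 + rng.normal(0, 0.1, 300)

      X_tr, X_te, y_tr, y_te = train_test_split(profiles, anomaly, random_state=0)
      svr = SVR(kernel="rbf", C=10.0).fit(X_tr, y_tr)
      print(f"R^2 on held-out snapshots: {svr.score(X_te, y_te):.2f}")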

  15. A Practical Framework Toward Prediction of Breaking Force and Disintegration of Tablet Formulations Using Machine Learning Tools.

    PubMed

    Akseli, Ilgaz; Xie, Jingjin; Schultz, Leon; Ladyzhynsky, Nadia; Bramante, Tommasina; He, Xiaorong; Deanne, Rich; Horspool, Keith R; Schwabe, Robert

    2017-01-01

    Enabling the paradigm of quality by design requires the ability to quantitatively correlate material properties and process variables to measurable product performance attributes. Conventional, quality-by-test methods for determining tablet breaking force and disintegration time usually involve destructive tests, which consume a significant amount of time and labor and provide limited information. Recent advances in material characterization, statistical analysis, and machine learning have provided multiple tools that have the potential to yield nondestructive, fast, and accurate approaches in drug product development. In this work, a methodology to predict the breaking force and disintegration time of tablet formulations using nondestructive ultrasonics and machine learning tools was developed. The input variables to the model include intrinsic properties of the formulation and extrinsic process variables influencing the tablet during manufacturing. The model has been applied to predict breaking force and disintegration time using small quantities of active pharmaceutical ingredient and prototype formulation designs. The novel approach presented is a step toward rational design of a robust drug product based on insight into the performance of common materials during formulation and process development. It may also help expedite the drug product development timeline and reduce active pharmaceutical ingredient usage while improving the efficiency of the overall process. Copyright © 2016 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
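
    A schematic of the predictive mapping described above, with hypothetical formulation and process inputs (drug load, compression force, ultrasonic wave speed) and a synthetic breaking-force response; the paper's actual model and feature set are not reproduced here.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(5)
      n = 150

      # Hypothetical inputs: intrinsic formulation properties plus process
      # variables and a nondestructive ultrasonic measurement.
      X = np.column_stack([
          rng.uniform(0.1, 0.5, n),    # drug load (mass fraction)
          rng.uniform(5, 25, n),       # compression force (kN)
          rng.uniform(1000, 2500, n),  # ultrasonic wave speed (m/s)
      ])
      breaking_force = 2 + 0.8 * X[:, 1] + 0.004 * X[:, 2] + rng.normal(0, 1, n)

      model = RandomForestRegressor(n_estimators=300, random_state=0)
      print(cross_val_score(model, X, breaking_force, cv=5).mean())  # R^2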

  16. A preliminary study on the development of electronic pump system using Arduino controller

    NASA Astrophysics Data System (ADS)

    Salleh, Mohd Sharil; Miskon, Azizi; Hashim, Fakroul Ridzuan

    2018-02-01

    The implications of treatment using hemodialysis machines and associated equipment remain speculative. Most studies, case reviews and medical surveys report statistics on hypertension as a side effect of treatment with a hemodialysis machine. Therefore, specific action must be taken to prevent hypertension during treatment, especially treatment using a hemodialysis machine. To reduce the frequency of hypertension during hemodialysis treatment, many approaches to improvement have been undertaken. As a starting point, this project reviews techniques for controlling instantaneous blood pressure in the normal and hypertensive stages and describes the challenges faced by researchers in experimentally matching human stability. The methodology used in this project was to develop an electronic pump system using an Arduino controller for transferring liquid (tap water) from one tank to another. The liquid flow rate was measured using flow sensors located at the input and output. This project focused on flow rates in the range of 300 mL/min to 900 mL/min. The results show an efficiency of 97.96% at speed 30, 100.15% at speed 50, 99.54% at speed 130 and 99.87% at speed 200; the efficiencies observed in this preliminary study of the electronic pump system thus range from 97.96% to 100.15%. In addition, analysis and simulation of the system indicate improved performance efficiency.

  17. Machine learning and data science in soft materials engineering

    NASA Astrophysics Data System (ADS)

    Ferguson, Andrew L.

    2018-01-01

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by ‘de-jargonizing’ data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  18. Machine learning and data science in soft materials engineering.

    PubMed

    Ferguson, Andrew L

    2018-01-31

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by 'de-jargonizing' data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  19. Surface Characteristics of Machined NiTi Shape Memory Alloy: The Effects of Cryogenic Cooling and Preheating Conditions

    NASA Astrophysics Data System (ADS)

    Kaynak, Y.; Huang, B.; Karaca, H. E.; Jawahir, I. S.

    2017-07-01

    This experimental study focuses on the phase state and phase transformation response of the surface and subsurface of machined NiTi alloys. X-ray diffraction (XRD) analysis and differential scanning calorimetry were utilized to measure the phase state and the transformation response of machined specimens, respectively. Specimens were machined under dry conditions at ambient temperature, preheated conditions, and cryogenic cooling conditions at various cutting speeds. The findings from this research demonstrate that cryogenic machining substantially alters the austenite finish temperature of the martensitic NiTi alloy. The austenite finish (Af) temperature increases by more than 25 percent after cryogenic machining compared with the austenite finish temperature of the as-received NiTi. Dry and preheated conditions do not substantially alter the austenite finish temperature. XRD analysis shows that a distinctive transformation from martensite to austenite occurs during the machining process in all three conditions. Complete transformation from martensite to austenite is observed in dry cutting at all selected cutting speeds.

  20. Tensile strength of laser welded cobalt-chromium alloy with and without an argon atmosphere.

    PubMed

    Tartari, Anna; Clark, Robert K F; Juszczyk, Andrzej S; Radford, David R

    2010-06-01

    The tensile strength and depth of weld of two cobalt-chromium alloys before and after laser welding with and without an argon gas atmosphere were investigated. Using the two cobalt-chromium alloys, rod-shaped specimens (5 cm x 1.5 mm) were cast. Specimens were sand-blasted, sectioned and welded with a pulsed Nd:YAG laser welding machine and tested in tension using an Instron universal testing machine. A statistically significant difference in tensile strength was observed between the two alloys. The tensile strength of specimens following laser welding was significantly less than that of the unwelded controls. Scanning electron microscopy showed that the microstructure of the cast alloy was altered in the region of the weld. No statistically significant difference was found between specimens welded with or without an argon atmosphere.

  1. Statistical Mechanics of Coherent Ising Machine — The Case of Ferromagnetic and Finite-Loading Hopfield Models —

    NASA Astrophysics Data System (ADS)

    Aonishi, Toru; Mimura, Kazushi; Utsunomiya, Shoko; Okada, Masato; Yamamoto, Yoshihisa

    2017-10-01

    The coherent Ising machine (CIM) has attracted attention as one of the most effective Ising computing architectures for solving large scale optimization problems because of its scalability and high-speed computational ability. However, it is difficult to implement the Ising computation in the CIM because the theories and techniques of classical thermodynamic equilibrium Ising spin systems cannot be directly applied to the CIM. This means we have to adapt these theories and techniques to the CIM. Here we focus on a ferromagnetic model and a finite loading Hopfield model, which are canonical models sharing a common mathematical structure with almost all other Ising models. We derive macroscopic equations to capture nonequilibrium phase transitions in these models. The statistical mechanical methods developed here constitute a basis for constructing evaluation methods for other Ising computation models.

  2. Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

    PubMed Central

    2013-01-01

    Background Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733

  3. Unified risk analysis of fatigue failure in ductile alloy components during all three stages of fatigue crack evolution process.

    PubMed

    Patankar, Ravindra

    2003-10-01

    Statistical fatigue life of a ductile alloy specimen is traditionally divided into three stages, namely, crack nucleation, small crack growth, and large crack growth. Crack nucleation and small crack growth show a wide variation and hence a big spread on the cycles versus crack length graph. Large crack growth shows relatively less variation. Therefore, different models are fitted to the different stages of the fatigue evolution process, thus treating the stages as different phenomena. With these independent models, it is impossible to predict one phenomenon based on information available about another. Experimentally, it is easier to carry out crack length measurements of large cracks than of nucleating and small cracks. Thus, it is easier to collect statistical data for large crack growth than to make the painstaking effort it would take to collect statistical data for crack nucleation and small crack growth. This article presents a fracture mechanics-based stochastic model of fatigue crack growth in the ductile alloys that are commonly encountered in mechanical structures and machine components. The model was validated for crack propagation by Ray (1998) against various sets of statistical fatigue data. Based on the model, this article proposes a technique to predict statistical information on fatigue crack nucleation and small crack growth properties using the statistical properties of large crack growth under constant amplitude stress excitation, which can be obtained via experiments.

  4. Using machine learning to assess covariate balance in matching studies.

    PubMed

    Linden, Ariel; Yarnold, Paul R

    2016-12-01

    In order to assess the effectiveness of matching approaches in observational studies, investigators typically present summary statistics for each observed pre-intervention covariate, with the objective of showing that matching reduces the difference in means (or proportions) between groups to as close to zero as possible. In this paper, we introduce a new approach to distinguishing between study groups based on their distributions of the covariates, using a machine-learning algorithm called optimal discriminant analysis (ODA). Assessing covariate balance using ODA as compared with the conventional method has several key advantages: the ability to ascertain how individuals self-select based on optimal (maximum-accuracy) cut-points on the covariates; applicability to any variable metric and number of groups; insensitivity to skewed data or outliers; and the use of accuracy measures that can be widely applied to all analyses. Moreover, ODA accepts analytic weights, thereby extending the assessment of covariate balance to any study design where weights are used for covariate adjustment. By comparing the two approaches using empirical data, we demonstrate that using measures of classification accuracy as balance diagnostics produces results highly consistent with those obtained via the conventional approach (in our matched-pairs example, ODA revealed a weak statistically significant relationship not detected by the conventional approach). Thus, investigators should consider ODA as a robust complement, or perhaps an alternative, to the conventional approach for assessing covariate balance in matching studies. © 2016 John Wiley & Sons, Ltd.
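
    A rough analogue of the ODA balance diagnostic can be sketched with a depth-one classification tree: for each covariate, find the maximum-accuracy cut-point separating the groups and report the resulting accuracy, which should sit close to chance when the covariate is balanced. This uses a CART stump as a stand-in, not the actual ODA software.

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      def stump_balance(covariate, group):
          """Accuracy of the best single-split rule separating the groups.
          Values close to 0.5 suggest the covariate is balanced; this is a
          CART-stump analogue of ODA, not ODA itself."""
          stump = DecisionTreeClassifier(max_depth=1)
          X = covariate.reshape(-1, 1)
          return stump.fit(X, group).score(X, group)

      rng = np.random.default_rng(6)
      group = np.repeat([0, 1], 100)
      age_balanced = rng.normal(50, 10, 200)        # same distribution
      age_skewed = rng.normal(50 + 5 * group, 10)   # treated group is older
      print(stump_balance(age_balanced, group))     # close to chance
      print(stump_balance(age_skewed, group))       # clearly above chance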

  5. Statistical problems in measuring surface ozone and modelling its patterns

    NASA Astrophysics Data System (ADS)

    Hutchison, Paul Stewart

    The Thesis examines ground level air pollution data supplied by ITE Bush, Penicuik, Midlothian, Scotland. There is a brief examination of sulphur dioxide concentration data, but the Thesis is primarily concerned with ozone. The diurnal behaviour of ozone is the major topic, and a new methodology of classification of 'ozone days' is introduced and discussed. In chapter 2, the inverse Gaussian distribution is considered and rejected as a possible alternative to the standard approach of using the lognormal as a model for the frequency distribution of observed sulphur dioxide concentrations. In chapter 3, the behaviour of digital gas pollution analysers is investigated by making use of data obtained from two such machines operating side by side. A time series model of the differences between the readings obtained from the two machines is considered, and possible effects on modelling discussed. In chapter 4, the changes in the diurnal behaviour of ozone over a year are examined. A new approach involving a distortion of the time axis is shown to give diurnal ozone curves more homogeneous properties and have beneficial effects for modelling purposes. Chapter 5 extends the analysis of the diurnal behaviour of ozone begun in chapter 4 by considering individual 'ozone days' and attempting to classify them as one of several typical 'types' of day. The time distortion method introduced in chapter 4 is used, and a new classification methodology is introduced for considering data of this type. The statistical properties of this method are discussed in chapter 6.

  6. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.

    PubMed

    Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah

    2018-07-01

    In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the highest Area Under the Receiver Operating Characteristic curve (AUROC) belonged to boosted regression trees (0.975) and the lowest was recorded for the generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. Despite the outstanding performance of some individual models, variability among their predictions was considerable. Therefore, to reduce uncertainty and create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, in particular the EMmedian, are recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
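
    The recommended EMmedian combination reduces, per location, to the cell-wise median of the member models' susceptibility predictions, as in this minimal sketch (the prediction values and labels below are invented for illustration):

      import numpy as np
      from sklearn.metrics import roc_auc_score

      # Each row is one location; each column one individual model's
      # predicted flood-susceptibility probability (values illustrative).
      preds = np.array([
          [0.91, 0.85, 0.97, 0.70],   # location 1
          [0.12, 0.30, 0.05, 0.45],   # location 2
          [0.60, 0.75, 0.55, 0.80],   # location 3
      ])
      labels = np.array([1, 0, 1])    # observed flood / non-flood

      # EMmedian-style combination: cell-wise median of member predictions.
      em_median = np.median(preds, axis=1)
      print(em_median, roc_auc_score(labels, em_median))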

  7. Influence of the supporting die structures on the fracture strength of all-ceramic materials.

    PubMed

    Yucel, Munir Tolga; Yondem, Isa; Aykent, Filiz; Eraslan, Oğuz

    2012-08-01

    This study investigated the influence of the elastic modulus of supporting dies on the fracture strengths of all-ceramic materials used in dental crowns. Four different types of supporting die materials (dentin, epoxy resin, brass, and stainless steel) (24 per group) were prepared using a milling machine to simulate a mandibular molar all-ceramic core preparation. A total of 96 zirconia cores were fabricated using a CAD/CAM system. The specimens were divided into two groups. In the first group, cores were cemented to the substructures using a dual-cure resin cement. In the second group, cores were not cemented to the supporting dies. The specimens were loaded using a universal testing machine at a crosshead speed of 0.5 mm/min until fracture occurred. Data were statistically analyzed using two-way analysis of variance and Tukey HSD tests (α = 0.05). Geometric models of the cores and supporting die materials were developed using the finite element method to obtain the stress distribution under the applied forces. Cemented groups showed statistically higher fracture strength values than non-cemented groups. While ceramic cores on stainless steel dies showed the highest fracture strength values, ceramic cores on dentin dies showed the lowest fracture strength values among the groups. The elastic modulus of the supporting die structure is a significant factor in determining the fracture resistance of all-ceramic crowns. Using supporting die structures that have a low elastic modulus may be suitable for fracture strength tests, in order to accurately reflect clinical conditions.

  8. Automated tissue classification of intracardiac optical coherence tomography images (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Gan, Yu; Tsay, David; Amir, Syed B.; Marboe, Charles C.; Hendon, Christine P.

    2016-03-01

    Remodeling of the myocardium is associated with increased risk of arrhythmia and heart failure. Our objective is to automatically identify regions of fibrotic myocardium, dense collagen, and adipose tissue, which can serve as a way to guide radiofrequency ablation therapy or endomyocardial biopsies. Using computer vision and machine learning, we present an automated algorithm to classify tissue compositions from cardiac optical coherence tomography (OCT) images. Three-dimensional OCT volumes were obtained from 15 human hearts ex vivo within 48 hours of donor death (source, NDRI). We first segmented B-scans using a graph searching method, estimating the boundary of each region by minimizing a cost function that consisted of intensity, gradient, and contour smoothness terms. Then features, including texture measures, optical properties, and statistics of higher-order moments, were extracted. We trained a statistical model, the relevance vector machine, with the abovementioned features to classify tissue compositions. To validate our method, we applied our algorithm to 77 volumes. The validation datasets were manually segmented and classified by two investigators who were blind to our algorithm's results and identified the tissues based on trichrome histology and pathology. The difference between automated and manual segmentation was 51.78 +/- 50.96 μm. Experiments showed that the attenuation coefficients of dense collagen were significantly different from those of other tissue types (P < 0.05, ANOVA). Importantly, myocardial fibrosis tissues differed from normal myocardium in entropy and kurtosis. The tissue types were classified with an accuracy of 84%. The results show good agreement with histology.

  9. ChargeOut! : discounted cash flow compared with traditional machine-rate analysis

    Treesearch

    Ted Bilek

    2008-01-01

    ChargeOut!, a discounted cash-flow methodology in spreadsheet format for analyzing machine costs, is compared with traditional machine-rate methodologies. Four machine-rate models are compared and a common data set representative of logging skidders’ costs is used to illustrate the differences between ChargeOut! and the machine-rate methods. The study found that the...

  10. A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation

    PubMed Central

    2016-01-01

    Chinese and Vietnamese are both isolating languages; that is, words are not delimited by spaces. In machine translation, word segmentation is often done first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, it is an open question whether words should be segmented when translating between two languages in which spaces are not used between words, such as Chinese and Vietnamese. Since Chinese-Vietnamese is a low-resource language pair, the sparse data problem is evident in the translation system for this pair, and the segmentation decision becomes all the more important. In this paper, we propose a new method for translating Chinese to Vietnamese based on a combination of the advantages of character-level and word-level translation. A hybrid approach that combines statistics and rules is used at the word level, while statistical translation is used at the character level. The experimental results showed that our method improved the performance of machine translation over that of character- or word-level translation alone. PMID:27446207

  11. Fault diagnosis of automobile hydraulic brake system using statistical features and support vector machines

    NASA Astrophysics Data System (ADS)

    Jegadeeshwaran, R.; Sugumaran, V.

    2015-02-01

    Hydraulic brakes in automobiles are important components for the safety of passengers; therefore, the brakes are a good subject for condition monitoring. The condition of the brake components can be monitored by using their vibration characteristics. On-line condition monitoring using a machine learning approach is proposed in this paper as a possible solution to such problems. The vibration signals for both good and faulty conditions of the brakes were acquired from a hydraulic brake test setup with the help of a piezoelectric transducer and a data acquisition system. Descriptive statistical features were extracted from the acquired vibration signals and feature selection was carried out using the C4.5 decision tree algorithm. There is no specific method to find the right number of features required for classification in a given problem; hence an extensive study is needed to find the optimum number of features. The effect of the number of features was therefore studied, using the decision tree as well as Support Vector Machines (SVM). The selected features were classified using C-SVM and Nu-SVM with different kernel functions. The results are discussed and the conclusions of the study are presented.
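
    A minimal sketch of the two-stage scheme described above, with synthetic vibration-statistics features; scikit-learn's CART tree stands in for C4.5, and an RBF-kernel SVM plays the role of the C-SVM classifier.

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(7)

      # Stand-in for descriptive statistics of brake vibration signals
      # (mean, std, skewness, kurtosis, ...): 200 signals x 12 features.
      X = rng.normal(size=(200, 12))
      y = rng.integers(0, 2, 200)       # 0 = good brake, 1 = faulty
      X[y == 1, :3] += 0.8              # fault shifts the first features

      # Rank features with a decision tree (CART here, C4.5 in the paper).
      tree = DecisionTreeClassifier(random_state=0).fit(X, y)
      top = np.argsort(tree.feature_importances_)[::-1][:5]

      # Classify with an SVM on the selected subset.
      print(cross_val_score(SVC(kernel="rbf"), X[:, top], y, cv=5).mean())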

  12. gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels.

    PubMed

    Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

    2017-05-01

    Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics, and derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies evaluating rare variation in hereditary prostate cancer and schizophrenia, respectively. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.
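
    The paper derives its own estimators of the effective number of tests; for orientation, one widely used eigenvalue-based estimator (in the style of Li and Ji, 2005) has the form below, where \lambda_i are the eigenvalues of the M x M correlation matrix of the gene-set test statistics:

      M_{\mathrm{eff}} \;=\; \sum_{i=1}^{M} f\!\left(\lvert \lambda_i \rvert\right),
      \qquad
      f(x) \;=\; \mathbf{1}\{x \ge 1\} + \left(x - \lfloor x \rfloor\right).

    The family-wise error rate is then controlled by testing each gene set at the Šidák-adjusted level \alpha_{\mathrm{adj}} = 1 - (1 - \alpha)^{1/M_{\mathrm{eff}}}, or at \alpha / M_{\mathrm{eff}} under a Bonferroni-style correction.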

  13. SQC: secure quality control for meta-analysis of genome-wide association studies.

    PubMed

    Huang, Zhicong; Lin, Huang; Fellay, Jacques; Kutalik, Zoltán; Hubaux, Jean-Pierre

    2017-08-01

    Due to the limited power of small-scale genome-wide association studies (GWAS), researchers tend to collaborate and establish a larger consortium in order to perform large-scale GWAS. Genome-wide association meta-analysis (GWAMA) is a statistical tool that aims to synthesize results from multiple independent studies to increase the statistical power and reduce false-positive findings of GWAS. However, it has been demonstrated that the aggregate data of individual studies are subject to inference attacks, hence privacy concerns arise when researchers share study data in GWAMA. In this article, we propose a secure quality control (SQC) protocol, which enables checking the quality of data in a privacy-preserving way without revealing sensitive information to a potential adversary. SQC employs state-of-the-art cryptographic and statistical techniques for privacy protection. We implement the solution in a meta-analysis pipeline with real data to demonstrate the efficiency and scalability on commodity machines. The distributed execution of SQC on a cluster of 128 cores for one million genetic variants takes less than one hour, which is a modest cost considering the 10-month time span usually observed for the completion of the QC procedure that includes timing of logistics. SQC is implemented in Java and is publicly available at https://github.com/acs6610987/secureqc. jean-pierre.hubaux@epfl.ch. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  14. When do traumatic experiences alter risk-taking behavior? A machine learning analysis of reports from refugees.

    PubMed

    Augsburger, Mareike; Elbert, Thomas

    2017-01-01

    Exposure to traumatic stressors and subsequent trauma-related mental changes may alter a person's risk-taking behavior. It is unclear whether this relationship depends on the specific types of traumatic experiences. Moreover, the association has never been tested in displaced individuals with substantial levels of traumatic experiences. The present study assessed risk-taking behavior in 56 displaced individuals by means of the balloon analogue risk task (BART). Exposure to traumatic events and symptoms of posttraumatic stress disorder and depression were assessed by means of semi-structured interviews. Using a novel statistical approach (stochastic gradient boosting machines), we analyzed predictors of risk-taking behavior. Exposure to organized violence was associated with less risk-taking, as indicated by fewer adjusted pumps in the BART, as was the reported experience of physical abuse and neglect, emotional abuse, and peer violence in childhood. Similarly, civil traumatic stressors, as well as other events during childhood, were associated with lower risk-taking. This suggests that the association between global risk-taking behavior and exposure to traumatic stress depends on the particular type of stressor that has been experienced.

  15. Support vector machines to detect physiological patterns for EEG and EMG-based human-computer interaction: a review

    NASA Astrophysics Data System (ADS)

    Quitadamo, L. R.; Cavrini, F.; Sbernini, L.; Riillo, F.; Bianchi, L.; Seri, S.; Saggio, G.

    2017-02-01

    Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and the wide availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameter selection are reported, making it impossible to reproduce study analyses and results. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning reviewed papers are listed in tables, and statistics on the use of SVM in the literature are presented. The suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.

  16. Language extraction from zinc sulfide

    NASA Astrophysics Data System (ADS)

    Varn, Dowman Parks

    2001-09-01

    Recent advances in the analysis of one-dimensional temporal and spatial series allow for detailed characterization of disorder and computation in physical systems. One such system that has defied theoretical understanding since its discovery in 1912 is polytypism. Polytypes are layered compounds, exhibiting crystallinity in two dimensions yet having complicated stacking sequences in the third direction. They can show both ordered and disordered sequences, sometimes each in the same specimen. We demonstrate a method for extracting two-layer correlation information from ZnS diffraction patterns and employ a novel technique for epsilon-machine reconstruction. We solve a long-standing problem---that of determining structural information for disordered materials from their diffraction patterns---for this special class of disorder. Our solution offers the most complete possible statistical description of the disorder. Furthermore, from our reconstructed epsilon-machines we find the effective range of the interlayer interaction in these materials, as well as the configurational energy of both ordered and disordered specimens. Finally, we can determine the 'language' (in terms of the Chomsky hierarchy) these small rocks speak, and we find that regular languages are sufficient to describe them.

  17. Application of machine vision to pup loaf bread evaluation

    NASA Astrophysics Data System (ADS)

    Zayas, Inna Y.; Chung, O. K.

    1996-12-01

    Intrinsic end-use quality of hard winter wheat breeding lines is routinely evaluated at the USDA, ARS, USGMRL, Hard Winter Wheat Quality Laboratory. The experimental baking test of pup loaves is the ultimate test for evaluating hard wheat quality. Computer vision was applied to develop an objective methodology for bread quality evaluation of the 1994 and 1995 crop wheat breeding line samples. Computer-extracted features for bread crumb grain were studied, using subimages (32 by 32 pixels) and features computed for slices with different threshold settings. A subsampling grid was located with respect to the axis of symmetry of each slice to provide identical topological subimage information. Different ranking techniques were applied to the databases. Statistical analysis was run on the database of digital image and breadmaking features. Several ranking algorithms and data visualization techniques were employed to create a sensitive scale for the porosity patterns of bread crumb. There were significant linear correlations between the machine-vision-extracted features and breadmaking parameters. Crumb grain scores assigned by human experts correlated more highly with some image features than with breadmaking parameters.

  18. The Harvard Clean Energy Project: High-throughput screening of organic photovoltaic materials using cheminformatics, machine learning, and pattern recognition

    NASA Astrophysics Data System (ADS)

    Olivares-Amaya, Roberto; Hachmann, Johannes; Amador-Bedolla, Carlos; Daly, Aidan; Jinich, Adrian; Atahan-Evrenk, Sule; Boixo, Sergio; Aspuru-Guzik, Alán

    2012-02-01

    Organic photovoltaic devices have emerged as competitors to silicon-based solar cells, currently reaching efficiencies of over 9% and offering desirable properties for manufacturing and installation. We study conjugated donor polymers for high-efficiency bulk-heterojunction photovoltaic devices with a molecular library motivated by experimental feasibility. We use quantum mechanics and a distributed computing approach to explore this vast molecular space. We will detail the screening approach starting from the generation of the molecular library, which can easily be extended to other kinds of molecular systems. We will describe the screening method for these materials, which ranges from descriptor models, ubiquitous in the drug discovery community, to first-principles quantum chemistry methods. We will present results of the statistical analysis, based principally on machine learning, specifically partial least squares and Gaussian processes. Alongside these, clustering methods and the use of the hypergeometric distribution reveal moieties important for the donor materials and allow us to quantify structure-property relationships. These efforts enable us to accelerate materials discovery in organic photovoltaics through our collaboration with experimental groups.
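
    The partial-least-squares component of the statistical analysis can be sketched as follows; the descriptor matrix and property values are synthetic stand-ins for the project's cheminformatics descriptors and computed efficiencies.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(8)

      # Hypothetical candidate library: 500 molecules x 30 cheminformatics
      # descriptors, with a computed property (e.g. an efficiency proxy).
      X = rng.normal(size=(500, 30))
      y = X[:, :4] @ np.array([0.9, -0.5, 0.3, 0.7]) + rng.normal(0, 0.2, 500)

      pls = PLSRegression(n_components=4).fit(X, y)
      print(pls.score(X, y))    # R^2 of the latent-variable fit

      # The loadings hint at which descriptors (moieties) drive the property.
      print(np.abs(pls.x_weights_[:, 0]).argsort()[::-1][:5])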

  19. Comparison of surface roughness and chip characteristics obtained under different modes of lubrication during hard turning of AISI H13 tool work steel.

    NASA Astrophysics Data System (ADS)

    Raj, Anil; Wins, K. Leo Dev; Varadarajan, A. S.

    2016-09-01

    Surface roughness is one of the important parameters which not only affects the service life of a component but also serves as a good index of machinability. Near Dry Machining (NDM) methods are considered a sustainable alternative for workshops trying to reduce their dependence on cutting fluids and the hazards associated with their indiscriminate usage. The present work compares the surface roughness and chip characteristics obtained during hard turning of AISI H13 tool work steel using hard metal inserts under two popular NDM techniques, namely minimal fluid application and Minimum Quantity Lubrication (MQL), using an experiment designed based on Taguchi techniques. The statistical method of analysis of variance (ANOVA) was used to determine the relative significance of the input parameters, consisting of cutting speed, feed and depth of cut, on the attainable surface finish and the chip characteristics. It was observed that the performance during minimal fluid application was better than that during MQL application.

  20. Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes

    NASA Astrophysics Data System (ADS)

    Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie

    2017-03-01

    Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) according to histologic subtype has been well documented over the past decade. In parallel, quantitative image biomarkers have recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADC and 110 SqCC patients with CT image scans. Classification is performed with four different machine-learning classifiers: Support Vector Machines with a Radial Basis Function kernel (RBF-SVM), Random Forest (RF), K-nearest neighbor (KNN), and the RUSBoost algorithm. To improve the classifiers' performance, an optimal feature subset is selected from the original feature set using an iterative forward inclusion and backward elimination algorithm. Extensive experimental results demonstrate that the radiomics features achieve encouraging classification results on both the complete feature set (AUC=0.89) and the optimal feature subset (AUC=0.91).
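
    The iterative forward-inclusion/backward-elimination wrapper can be approximated with scikit-learn's SequentialFeatureSelector, as sketched below on a synthetic stand-in for the 52-feature radiomics matrix (the class-separating shift is invented).

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.feature_selection import SequentialFeatureSelector

      rng = np.random.default_rng(9)

      # Stand-in radiomics matrix: 434 nodules x 52 features (7 statistical
      # + 45 texture in the paper; the values here are synthetic).
      X = rng.normal(size=(434, 52))
      y = np.r_[np.ones(324), np.zeros(110)].astype(int)   # ADC vs SqCC
      X[y == 1, :6] += 0.5

      # Greedy forward inclusion with an RBF-SVM wrapper; sklearn also
      # supports direction="backward" for an eliminating pass.
      sfs = SequentialFeatureSelector(
          SVC(kernel="rbf"), n_features_to_select=10,
          direction="forward", cv=3)
      sfs.fit(X, y)
      print(np.where(sfs.get_support())[0])   # retained feature indices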

  1. Classification of pulmonary pathology from breath sounds using the wavelet packet transform and an extreme learning machine.

    PubMed

    Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian; Huliraj, N; Revadi, S S

    2017-06-08

    Auscultation is a medical procedure used for the initial diagnosis and assessment of lung and heart diseases. From this perspective, we propose assessing the performance of extreme learning machine (ELM) classifiers for the diagnosis of pulmonary pathology using breath sounds. Energy and entropy features were extracted from the breath sounds using the wavelet packet transform. The statistical significance of the extracted features was evaluated by one-way analysis of variance (ANOVA). The extracted features were input to the ELM classifier. The maximum classification accuracies obtained for the conventional validation (CV) of the energy and entropy features were 97.36% and 98.37%, respectively, whereas the accuracies obtained for the cross validation (CRV) of the energy and entropy features were 96.80% and 97.91%, respectively. In addition, maximum classification accuracies of 98.25% and 99.25% were obtained for the CV and CRV of the ensemble features, respectively. The results indicate that the classification accuracy obtained with the ensemble features was higher than those obtained with the energy and entropy features alone.
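
    A minimal sketch of the wavelet-packet energy/entropy feature extraction, assuming the PyWavelets package and a db4 mother wavelet (the paper's actual wavelet choice and decomposition depth are not stated here):

      import numpy as np
      import pywt  # PyWavelets

      def wp_energy_entropy(signal, wavelet="db4", level=3):
          """Energy and Shannon entropy of each terminal wavelet-packet node;
          a simplified version of the feature extraction described above."""
          wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
          feats = []
          for node in wp.get_level(level, order="natural"):
              coeffs = np.asarray(node.data)
              energy = float(np.sum(coeffs ** 2))
              p = coeffs ** 2 / (energy + 1e-12)      # normalized energy dist.
              entropy = float(-np.sum(p * np.log2(p + 1e-12)))
              feats.extend([energy, entropy])
          return np.array(feats)

      # Example on a synthetic "breath sound": noisy band-limited oscillation.
      t = np.linspace(0, 1, 4096)
      x = np.sin(2 * np.pi * 150 * t) + 0.3 * np.random.randn(t.size)
      print(wp_energy_entropy(x).shape)   # 2 features per node -> (16,)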

  2. A Systematic Strategy for Screening and Application of Specific Biomarkers in Hepatotoxicity Using Metabolomics Combined With ROC Curves and SVMs.

    PubMed

    Li, Yubo; Wang, Lei; Ju, Liang; Deng, Haoyue; Zhang, Zhenzhu; Hou, Zhiguo; Xie, Jiabin; Wang, Yuming; Zhang, Yanjun

    2016-04-01

    Current studies that evaluate toxicity based on metabolomics have primarily focused on the screening of biomarkers while largely neglecting further verification and biomarker applications. For this reason, we used drug-induced hepatotoxicity as an example to establish a systematic strategy for screening specific biomarkers and applied these biomarkers to evaluate whether drugs have potential hepatotoxicity. Carbon tetrachloride (5 ml/kg), acetaminophen (1500 mg/kg), and atorvastatin (5 mg/kg) were used to establish rat hepatotoxicity models. Fifteen common biomarkers were screened by multivariate statistical analysis and integration analysis of the metabolomics data. The receiver operating characteristic (ROC) curve was used to evaluate the sensitivity and specificity of the biomarkers. We obtained 10 specific biomarker candidates with an area under the curve greater than 0.7. Then a support vector machine model was established from the specific biomarker candidate data of hepatotoxic and nonhepatotoxic drugs; the accuracy of the model was 94.90% (92.86% sensitivity and 92.59% specificity), demonstrating that these ten biomarkers are specific. Six drugs were used to test hepatotoxicity prediction by the support vector machine model; the prediction results were consistent with the biochemical and histopathological results, demonstrating that the model is reliable. Thus, this support vector machine model can be applied to discriminate between the hepatotoxic and nonhepatotoxic potential of drugs. This approach not only presents a new strategy for screening specific biomarkers with greater diagnostic significance but also provides a new evaluation pattern for hepatotoxicity, and it will be a highly useful tool in toxicity estimation and disease diagnosis. © The Author 2016. Published by Oxford University Press on behalf of the Society of Toxicology. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
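
    A compact sketch of the screen-then-classify strategy: rank candidate biomarkers by single-feature area under the ROC curve, retain those above 0.7, and train an SVM on the retained panel. All data below are synthetic and the group sizes are invented.

      import numpy as np
      from sklearn.metrics import roc_auc_score
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(10)

      # Synthetic metabolomics data: 60 rats x 15 candidate biomarkers,
      # label 1 = treated with a hepatotoxic drug, 0 = control.
      X = rng.normal(size=(60, 15))
      y = np.repeat([0, 1], 30)
      X[y == 1, :10] += 0.9             # ten markers actually respond

      # Keep candidates whose single-feature AUC exceeds 0.7.
      aucs = np.array([roc_auc_score(y, X[:, j]) for j in range(X.shape[1])])
      specific = np.where(aucs > 0.7)[0]

      # Train an SVM on the retained biomarkers to flag hepatotoxicity.
      print(specific)
      print(cross_val_score(SVC(), X[:, specific], y, cv=5).mean())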

  3. Machine-learning-based calving prediction from activity, lying, and ruminating behaviors in dairy cattle.

    PubMed

    Borchers, M R; Chang, Y M; Proudfoot, K L; Wadsworth, B A; Stone, A E; Bewley, J M

    2017-07-01

    The objective of this study was to use automated activity, lying, and rumination monitors to characterize prepartum behavior and predict calving in dairy cattle. Data were collected from 20 primiparous and 33 multiparous Holstein dairy cattle from September 2011 to May 2013 at the University of Kentucky Coldstream Dairy. The HR Tag (SCR Engineers Ltd., Netanya, Israel) automatically collected neck activity and rumination data in 2-h increments. The IceQube (IceRobotics Ltd., South Queensferry, United Kingdom) automatically collected number of steps, lying time, standing time, number of transitions from standing to lying (lying bouts), and total motion, summed in 15-min increments. IceQube data were summed in 2-h increments to match HR Tag data. All behavioral data were collected for 14 d before the predicted calving date. Retrospective data analysis was performed using mixed linear models to examine behavioral changes by day in the 14 d before calving. Bihourly behavioral differences from baseline values over the 14 d before calving were also evaluated using mixed linear models. Changes in daily rumination time, total motion, lying time, and lying bouts occurred in the 14 d before calving. In the bihourly analysis, extreme values for all behaviors occurred in the final 24 h, indicating that the monitored behaviors may be useful in calving prediction. To determine whether the technologies were useful for predicting calving, random forest, linear discriminant analysis, and neural network machine-learning models were constructed and implemented using R version 3.1.0 (R Foundation for Statistical Computing, Vienna, Austria). These methods were applied to variables from each technology and to the combined variables from both technologies. A neural network analysis that combined variables from both technologies at the daily level yielded 100.0% sensitivity and 86.8% specificity. A neural network analysis that combined variables from both technologies in bihourly increments was used to identify 2-h periods in the 8 h before calving with 82.8% sensitivity and 80.4% specificity. Changes in behavior and machine-learning alerts indicate that commercially marketed behavioral monitors may have calving prediction potential. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  4. Statistical complexity measure of pseudorandom bit generators

    NASA Astrophysics Data System (ADS)

    González, C. M.; Larrondo, H. A.; Rosso, O. A.

    2005-08-01

    Pseudorandom number generators (PRNG) are extensively used in Monte Carlo simulations, gambling machines and cryptography as substitutes for ideal random number generators (RNG). Each application imposes different statistical requirements on PRNGs. As L’Ecuyer clearly states, “the main goal for Monte Carlo methods is to reproduce the statistical properties on which these methods are based, whereas for gambling machines and cryptology, observing the sequence of output values for some time should provide no practical advantage for predicting the forthcoming numbers better than by just guessing at random”. In accordance with these different applications, several statistical test suites have been developed to analyze the sequences generated by PRNGs. In a recent paper a new statistical complexity measure [Phys. Lett. A 311 (2003) 126] was defined. Here we propose this measure as a randomness quantifier for PRNGs. The test is applied to three very well known and widely tested PRNGs available in the literature, all of them based on mathematical algorithms. Another PRNG, based on the 3D Lorenz chaotic dynamical system, is also analyzed. PRNGs based on chaos may be considered as models for physical noise sources, and important new results have recently been reported. All the design steps of this PRNG are described, and each stage increases the PRNG's randomness using a different strategy. It is shown that the MPR statistical complexity measure is capable of quantifying this randomness improvement. The PRNG based on the chaotic 3D Lorenz dynamical system is also evaluated using traditional digital signal processing tools for comparison.
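
    A minimal sketch of how such a complexity-based randomness test can be implemented, assuming the common Bandt-Pompe ordinal-pattern construction and a Jensen-Shannon disequilibrium (the MPR measure as usually defined; the paper's exact normalization may differ):

```python
# Sketch of an MPR-style statistical complexity C = H * Q: H is the normalized
# Shannon entropy of the ordinal-pattern distribution, Q a normalized
# Jensen-Shannon disequilibrium to the uniform distribution.
import numpy as np
from itertools import permutations
from math import factorial

def ordinal_distribution(x, d=4):
    """Frequencies of ordinal patterns of embedding dimension d."""
    patterns = {p: 0 for p in permutations(range(d))}
    for i in range(len(x) - d + 1):
        patterns[tuple(np.argsort(x[i:i + d]))] += 1
    p = np.array(list(patterns.values()), dtype=float)
    return p / p.sum()

def shannon(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def mpr_complexity(x, d=4):
    N = factorial(d)
    p = ordinal_distribution(x, d)
    u = np.full(N, 1.0 / N)
    H = shannon(p) / np.log(N)                       # normalized entropy
    js = shannon((p + u) / 2) - shannon(p) / 2 - shannon(u) / 2
    delta = np.zeros(N); delta[0] = 1.0              # maximizer of JS vs. uniform
    js_max = shannon((delta + u) / 2) - shannon(u) / 2
    return H * js / js_max                           # C = H * Q

x = np.random.default_rng(2).random(10_000)          # stand-in PRNG output
print(f"complexity of uniform noise: {mpr_complexity(x):.4f}")  # near 0 for an ideal RNG
```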

  5. Machine learning-based methods for prediction of linear B-cell epitopes.

    PubMed

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction assists immunologists in designing peptide-based vaccines, diagnostic tests, disease prevention and treatment strategies, and antibody production. In comparison with T-cell epitope prediction, the performance of variable-length B-cell epitope prediction is still unsatisfactory. Fortunately, owing to increasingly available verified epitope databases, bioinformaticians can apply machine learning-based algorithms to all curated data to design improved prediction tools for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noted that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools has become a general way of constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of the support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, in addition to reviewing recently published papers, we have introduced the fundamentals of B-cell epitopes and SVM techniques. In addition, an example of a linear B-cell epitope prediction system based on physicochemical features and amino acid combinations is illustrated in detail.
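
    As a hedged illustration of the general recipe described here (residue-derived features plus an SVM), the sketch below encodes peptides as amino-acid composition vectors and fits an RBF-kernel SVM. The peptides and labels are toy placeholders, not curated epitope data.

```python
# Toy sketch: amino-acid composition features + SVM for epitope classification.
import numpy as np
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """20-dimensional amino-acid composition feature vector."""
    counts = np.array([peptide.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / len(peptide)

peptides = ["KKDGSEE", "ALLVVNN", "GGKDEEK", "VVLLAAI", "DDKSGEE", "LLVVIAA"]
labels   = [1, 0, 1, 0, 1, 0]       # 1 = epitope, 0 = non-epitope (toy labels)

X = np.array([composition(p) for p in peptides])
clf = SVC(kernel="rbf", gamma="scale").fit(X, labels)

# Score a new peptide: decision_function > 0 leans toward "epitope"
print(clf.decision_function([composition("KDGEEKS")]))
```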

  6. Simulation-driven machine learning: Bearing fault classification

    NASA Astrophysics Data System (ADS)

    Sobie, Cameron; Freitas, Carina; Nicolai, Mike

    2018-01-01

    Increasing the accuracy of mechanical fault detection has the potential to improve system safety and economic performance by minimizing scheduled maintenance and the probability of unexpected system failure. Advances in computational performance have enabled the application of machine learning algorithms across numerous applications including condition monitoring and failure detection. Past applications of machine learning to physical failure have relied explicitly on historical data, which limits the feasibility of this approach to in-service components with extended service histories. Furthermore, recorded failure data is often only valid for the specific circumstances and components for which it was collected. This work directly addresses these challenges for roller bearings with race faults by generating training data using information gained from high resolution simulations of roller bearing dynamics, which is used to train machine learning algorithms that are then validated against four experimental datasets. Several different machine learning methodologies are compared, starting from well-established statistical feature-based methods and extending to convolutional neural networks, and a novel application of dynamic time warping (DTW) to bearing fault classification is proposed as a robust, parameter-free method for race fault detection.
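
    A minimal sketch of the DTW idea applied to fault classification: a nearest-template classifier using the classic dynamic-programming DTW distance. The vibration signals below are synthetic placeholders, not the paper's simulation data.

```python
# Nearest-template bearing fault classification by DTW distance (toy signals).
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-programming DTW."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 200)
templates = {
    "healthy":    np.sin(2 * np.pi * 5 * t),
    "race_fault": np.sin(2 * np.pi * 5 * t) + (np.sin(2 * np.pi * 40 * t) > 0.95),
}
# Query: a phase-shifted, noisy faulted signal
query = np.sin(2 * np.pi * 5 * t + 0.3) + (np.sin(2 * np.pi * 40 * t + 0.3) > 0.95)
query += rng.normal(scale=0.05, size=t.size)

label = min(templates, key=lambda k: dtw_distance(query, templates[k]))
print("predicted class:", label)
```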

  7. Kernel machines for epilepsy diagnosis via EEG signal classification: a comparative study.

    PubMed

    Lima, Clodoaldo A M; Coelho, André L V

    2011-10-01

    We carry out a systematic assessment of a suite of kernel-based learning machines for the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely, Gaussian and exponential radial basis functions) were considered, as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted; four wavelet basis functions were considered in this study. Then, we provide the average accuracy values, estimated via cross-validation, delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations, whereby one can visually inspect their levels of sensitivity to the type of feature and to the kernel function/parameter value. Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions, although the choice of the wavelet family seems less critical. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). Copyright © 2011 Elsevier B.V. All rights reserved.
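
    To make the kernel-sensitivity experiment concrete, here is a hedged sketch that sweeps a few kernel radii for Gaussian and exponential RBF kernels and reports cross-validated SVM accuracy; the data, the grid, and the exact kernel parameterizations are illustrative assumptions.

```python
# Kernel parameter sweep with custom Gaussian and exponential RBF kernels.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def gaussian_rbf(sigma):
    return lambda X, Y: np.exp(-cdist(X, Y, "sqeuclidean") / (2 * sigma**2))

def exponential_rbf(sigma):
    return lambda X, Y: np.exp(-cdist(X, Y, "euclidean") / (2 * sigma**2))

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))                  # e.g., wavelet statistics per EEG segment
y = (X[:, :3].sum(axis=1) + 0.3 * rng.normal(size=100) > 0).astype(int)

for name, make_kernel in [("gaussian", gaussian_rbf), ("exponential", exponential_rbf)]:
    for sigma in (0.1, 1.0, 10.0):              # the paper sweeps 26 radius values
        acc = cross_val_score(SVC(kernel=make_kernel(sigma)), X, y, cv=5).mean()
        print(f"{name:12s} sigma={sigma:5.1f}  accuracy={acc:.3f}")
```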

  8. Machine learning for medical images analysis.

    PubMed

    Criminisi, A

    2016-10-01

    This article discusses the application of machine learning for the analysis of medical images. Specifically: (i) We show how a special type of learning models can be thought of as automatically optimized, hierarchically-structured, rule-based algorithms, and (ii) We discuss how the issue of collecting large labelled datasets applies to both conventional algorithms as well as machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods. Crown Copyright © 2016. Published by Elsevier B.V. All rights reserved.

  9. A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech

    PubMed Central

    Tóth, László; Hoffmann, Ildikó; Gosztolya, Gábor; Vincze, Veronika; Szatlóczki, Gréta; Bánréti, Zoltán; Pákáski, Magdolna; Kálmán, János

    2018-01-01

    Background: Even today the reliable diagnosis of the prodromal stages of Alzheimer’s disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method based on the analysis of spontaneous speech production while performing a memory task. In the future, this could form the basis of an Internet-based interactive screening software for the recognition of MCI. Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black and white films (one direct, one delayed), and by answering one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group could be discriminated automatically based on the acoustic features. Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process, that is, using the ASR-based features in combination with machine learning, was able to separate the two classes with an F1-score of 78.8%. Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community. PMID:29165085

  10. A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech.

    PubMed

    Toth, Laszlo; Hoffmann, Ildiko; Gosztolya, Gabor; Vincze, Veronika; Szatloczki, Greta; Banreti, Zoltan; Pakaski, Magdolna; Kalman, Janos

    2018-01-01

    Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method based on the analysis of spontaneous speech production while performing a memory task. In the future, this could form the basis of an Internet-based interactive screening software for the recognition of MCI. Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black and white films (one direct, one delayed), and by answering one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group could be discriminated automatically based on the acoustic features. The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process, that is, using the ASR-based features in combination with machine learning, was able to separate the two classes with an F1-score of 78.8%. The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
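
    A hedged sketch of the pipeline both records describe: per-recording acoustic features feed a classifier that is evaluated by F1-score. The feature values below are mock stand-ins for quantities such as hesitation ratio and speech tempo, not the study's data.

```python
# Mock acoustic-feature classification evaluated by F1-score.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(5)
n = 86                                   # 38 controls + 48 MCI patients in the study
# Columns: hesitation ratio, speech tempo, silent-pause count, utterance length
X = rng.normal(size=(n, 4))
y = np.array([0] * 38 + [1] * 48)        # 0 = control, 1 = MCI
X[y == 1, 0] += 0.8                      # MCI group hesitates more (toy effect)
X[y == 1, 1] -= 0.8                      # ...and speaks more slowly

pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=5)
print(f"F1-score: {f1_score(y, pred):.3f}")
```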

  11. An evaluation of shear bond strength of self-etch adhesive on pre-etched enamel: an in vitro study.

    PubMed

    Rao, Bhadra; Reddy, Satti Narayana; Mujeeb, Abdul; Mehta, Kanchan; Saritha, G

    2013-11-01

    To determine the shear bond strength of the self-etch adhesive G-bond on pre-etched enamel. Thirty caries-free human mandibular premolars extracted for orthodontic purposes were used for the study. The occlusal surfaces of all the teeth were flattened with a diamond bur, and silicon carbide paper was used for surface smoothening. The thirty samples were randomly divided into three groups. Three different etch systems were used for the composite buildup: group 1 (G-bond self-etch adhesive system), group 2 (G-bond) and group 3 (Adper Single Bond). Light curing was applied for 10 seconds with an LED unit for a composite buildup 8 millimeters (mm) in diameter and 3 mm thick on the occlusal surface of each tooth. The specimens in each group were tested in shear mode using a knife-edge testing apparatus in a universal testing machine at a crosshead speed of 1 mm/minute. Shear bond strength values in MPa were calculated from the peak load at failure divided by the specimen surface area. The mean shear bond strength of each group was calculated, and statistical analysis was carried out using one-way Analysis of Variance (ANOVA). The mean bond strength of group 1 was 15.5 MPa, of group 2 was 19.5 MPa and of group 3 was 20.1 MPa. Group 1 showed statistically significantly lower bond strength than groups 2 and 3, while there was no statistically significant difference between groups 2 and 3 at the p < 0.05 level. The self-etch adhesive G-bond showed increased shear bond strength on pre-etched enamel.
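
    For concreteness, the arithmetic of the analysis (bond strength = peak load / specimen surface area, compared by one-way ANOVA) can be sketched as follows; the load values are fabricated examples chosen only to land near the reported group means.

```python
# Bond strength from peak load and buildup area, compared by one-way ANOVA.
import numpy as np
from scipy import stats

area = np.pi * (8 / 2) ** 2                      # 8 mm diameter buildup, in mm^2
peak_loads = {                                   # peak load at failure, in newtons
    "group1": np.array([790, 760, 820, 770, 785]),
    "group2": np.array([990, 1010, 970, 985, 1000]),
    "group3": np.array([1020, 1005, 1015, 1010, 1008]),
}
strengths = {g: load / area for g, load in peak_loads.items()}  # N/mm^2 == MPa
for g, s in strengths.items():
    print(f"{g}: mean = {s.mean():.1f} MPa")

f, p = stats.f_oneway(*strengths.values())
print(f"one-way ANOVA: F = {f:.2f}, p = {p:.4f}")
```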

  12. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    The ideal toxicity biomarker combines the properties of prediction (it is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and a mechanistic relationship to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, together with their combination and overlap, for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, the Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or the Hotelling T-square test had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
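
    A schematic of the univariate screening step described above, assuming simulated expression data: probes are ranked by t-statistic and (log) fold change, and a classifier is fit on the overlap of both criteria. For a rigorous accuracy estimate the selection would be nested inside the cross-validation; it is done once here only for brevity.

```python
# Univariate probe screening (t-statistic + fold change) followed by classification.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 500))                   # 40 arrays x 500 probes (log scale)
y = np.array([0] * 20 + [1] * 20)                # 0 = control, 1 = nephrotoxicant
X[y == 1, :25] += 1.0                            # 25 truly responsive probes

t, _ = stats.ttest_ind(X[y == 1], X[y == 0])
fold_change = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)  # log-scale difference
selected = (np.abs(t) > 2.5) & (np.abs(fold_change) > 0.8)     # overlap of both criteria
print(f"{selected.sum()} probes selected")

acc = cross_val_score(LogisticRegression(max_iter=1000), X[:, selected], y, cv=5).mean()
print(f"classifier accuracy on selected probes: {acc:.2%}")
```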

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false-identification problem. Statistical confidence scores are an approach to combating this false positive problem that has led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics in order to train and validate the SVM. In practice, following the database search routine, a peptide is encoded in its vector representation and the SVM generates a single statistical score that is then used to classify its presence or absence in the sample.
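
    The following hedged sketch shows the shape of this approach: each peptide-spectrum match is represented by its database-search quality metrics, and an SVM emits a single score. The metric names and values are invented for illustration.

```python
# SVM scoring of peptide-spectrum matches from database-search metrics (toy data).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(7)
# Columns: e.g., cross-correlation score, delta-score, fragment-ion coverage
X_train = np.vstack([rng.normal(1.0, 0.3, size=(200, 3)),    # true identifications
                     rng.normal(0.0, 0.3, size=(200, 3))])   # false identifications
y_train = np.array([1] * 200 + [0] * 200)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

new_psm = np.array([[0.9, 0.8, 1.1]])            # metrics for one new match
score = svm.decision_function(new_psm)[0]        # single statistical score
print(f"SVM score = {score:.2f} -> {'present' if score > 0 else 'absent'}")
```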

  14. Statistical quality control for volumetric modulated arc therapy (VMAT) delivery by using the machine's log data

    NASA Astrophysics Data System (ADS)

    Cheong, Kwang-Ho; Lee, Me-Yeon; Kang, Sei-Kwon; Yoon, Jai-Woong; Park, Soah; Hwang, Taejin; Kim, Haeyoung; Kim, Kyoung Ju; Han, Tae Jin; Bae, Hoonsik

    2015-07-01

    The aim of this study is to set up statistical quality control for monitoring volumetric modulated arc therapy (VMAT) delivery error by using the machine's log data. Eclipse and a Clinac iX linac with the RapidArc system (Varian Medical Systems, Palo Alto, USA) are used for delivery of the VMAT plan. During the delivery of the RapidArc fields, the machine records the delivered monitor units (MUs) and the gantry angle's positional accuracy, and the standard deviations of the MU (σMU: dosimetric error) and of the gantry angle (σGA: geometric error) are displayed on the console monitor after completion of the RapidArc delivery. In the present study, first, the log data were analyzed to confirm their validity and usability; then, statistical process control (SPC) was applied to monitor the σMU and the σGA in a timely manner for all RapidArc fields: a total of 195 arc fields for 99 patients. The MU and the GA were determined twice for all fields, that is, first during the patient-specific plan QA and then again during the first treatment. The σMU and the σGA time series were quite stable irrespective of the treatment site; however, the σGA strongly depended on the gantry's rotation speed. The σGA of the RapidArc delivery for stereotactic body radiation therapy (SBRT) was smaller than that for typical VMAT. Therefore, SPC was applied to SBRT cases and general cases separately. Moreover, the accuracy of the potentiometer of the gantry rotation is important because the σGA can change dramatically depending on its condition. By applying SPC to the σMU and σGA, we could monitor the delivery error efficiently. However, the upper and the lower limits of SPC need to be determined carefully, with full knowledge of the machine and its log data.
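
    A minimal sketch of the SPC step, assuming an individuals control chart with three-sigma limits estimated from the average moving range (one standard construction; the paper's exact chart may differ). The σMU values are simulated stand-ins for the 195 logged fields.

```python
# Individuals control chart for per-field sigma_MU values from machine logs.
import numpy as np

rng = np.random.default_rng(8)
sigma_mu = rng.normal(0.05, 0.005, size=195)         # machine-log sigma_MU per field

center = sigma_mu.mean()
mr_bar = np.abs(np.diff(sigma_mu)).mean()            # average moving range
ucl = center + 3 * mr_bar / 1.128                    # d2 = 1.128 for subgroups of 2
lcl = center - 3 * mr_bar / 1.128

out = np.flatnonzero((sigma_mu > ucl) | (sigma_mu < lcl))
print(f"center = {center:.4f}, UCL = {ucl:.4f}, LCL = {lcl:.4f}")
print("fields outside control limits:", out)
```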

  15. A Non-Destructive Method for Distinguishing Reindeer Antler (Rangifer tarandus) from Red Deer Antler (Cervus elaphus) Using X-Ray Micro-Tomography Coupled with SVM Classifiers

    PubMed Central

    Lefebvre, Alexandre; Rochefort, Gael Y.; Santos, Frédéric; Le Denmat, Dominique; Salmon, Benjamin; Pétillon, Jean-Marc

    2016-01-01

    Over the last decade, biomedical 3D-imaging tools have gained widespread use in the analysis of prehistoric bone artefacts. While initial attempts to characterise the major categories used in osseous industry (i.e. bone, antler, and dentine/ivory) have been successful, the taxonomic determination of prehistoric artefacts remains to be investigated. The distinction between reindeer and red deer antler can be challenging, particularly in cases of anthropic and/or taphonomic modifications. In addition to the range of destructive physicochemical identification methods available (mass spectrometry, isotopic ratio, and DNA analysis), X-ray micro-tomography (micro-CT) provides convincing non-destructive 3D images and analyses. This paper presents the experimental protocol (sample scans, image processing, and statistical analysis) we have developed in order to identify modern and archaeological antler collections (from Isturitz, France). This original method is based on bone microstructure analysis combined with advanced statistical support vector machine (SVM) classifiers. A combination of six microarchitecture biomarkers (bone volume fraction, trabecular number, trabecular separation, trabecular thickness, trabecular bone pattern factor, and structure model index) were screened using micro-CT in order to characterise internal alveolar structure. Overall, reindeer alveoli presented a tighter mesh than red deer alveoli, and statistical analysis allowed us to distinguish archaeological antler by species with an accuracy of 96%, regardless of anatomical location on the antler. In conclusion, micro-CT combined with SVM classifiers proves to be a promising additional non-destructive method for antler identification, suitable for archaeological artefacts whose degree of human modification and cultural heritage or scientific value has previously made it impossible (tools, ornaments, etc.). PMID:26901355
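
    As a hedged illustration of the classification step, the sketch below trains an SVM on the six micro-CT microarchitecture biomarkers named above, using randomly generated placeholder measurements rather than the published data.

```python
# SVM over six micro-CT microarchitecture biomarkers (placeholder measurements).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FEATURES = ["BV/TV", "Tb.N", "Tb.Sp", "Tb.Th", "Tb.Pf", "SMI"]

rng = np.random.default_rng(9)
X = rng.normal(size=(80, len(FEATURES)))
y = np.array([0] * 40 + [1] * 40)        # 0 = red deer, 1 = reindeer
X[y == 1, :2] += 1.2                     # reindeer alveoli: tighter mesh (toy effect)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = cross_val_score(clf, X, y, cv=10).mean()
print(f"cross-validated accuracy: {acc:.1%}")   # the paper reports ~96%
```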

  16. Crabbing system for an electron-ion collider

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castilla, Alejandro

    2017-05-01

    As high energy and nuclear physicists continue to push the boundaries of knowledge using colliders, there is an imperative need not only to increase the colliding beams' energies, but also to improve the accuracy of the experiments and to collect a large quantity of events with good statistical sensitivity. To achieve the latter, it is necessary to collect more data by increasing the rate at which these processes are produced and detected in the machine. This rate of events depends directly on the machine's luminosity. The luminosity itself is proportional to the frequency at which the beams are being delivered and to the number of particles in each beam, and inversely proportional to the cross-sectional size of the colliding beams. There are several approaches other than increasing the luminosity that can be considered to increase the event statistics in a collider, such as running the experiments for a longer time. However, this also raises the operating expenses, while increasing the frequency at which the beams are delivered implies substantial physical changes along the accelerator and the detectors. Therefore, it is preferred to increase the beam intensities and reduce the beams' cross-sectional areas to achieve these higher luminosities. In the case where the goal is to push the limits, sometimes even beyond the machine's design parameters, one must develop a detailed high-luminosity scheme. Any high-luminosity scheme on a modern collider considers, in one of its versions, the use of crab cavities to correct the geometrical reduction of the luminosity due to the beams' crossing angle. In this dissertation, we present the design and testing of a proof-of-principle compact superconducting crab cavity, at 750 MHz, for the future electron-ion collider currently under design at Jefferson Lab. In addition to the design and validation of the cavity prototype, we present the analysis of the first-order beam dynamics and the integration of the crabbing systems into the interaction region. Following this, we propose the concept of twin crab cavities to allow machines with variable beam transverse coupling in the interaction region to have full crabbing in only the desired plane. Finally, we present recommendations for extending this work to other frequencies.
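
    For reference, the luminosity relation the abstract paraphrases is commonly written as below, with a geometric reduction factor for a nonzero crossing angle that crabbing systems are designed to recover (this is the standard accelerator-physics form, not a formula quoted from the dissertation):

```latex
\mathcal{L} = \frac{f\, n_b\, N_1 N_2}{4\pi\, \sigma_x \sigma_y}\, R(\theta_c),
\qquad
R(\theta_c) \approx \left[\, 1 + \left( \frac{\sigma_z}{\sigma_x} \tan\frac{\theta_c}{2} \right)^{2} \right]^{-1/2},
```

    where f is the revolution frequency, n_b the number of bunches, N_1 and N_2 the particles per bunch, σx and σy the transverse beam sizes at the interaction point, σz the bunch length, and θc the full crossing angle.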

  17. A new machine-learning method to prognosticate paraquat poisoned patients by combining coagulation, liver, and kidney indices.

    PubMed

    Hu, Lufeng; Li, Huaizhong; Cai, Zhennao; Lin, Feiyan; Hong, Guangliang; Chen, Huiling; Lu, Zhongqiu

    2017-01-01

    The prognosis of paraquat (PQ) poisoning is highly correlated with plasma PQ concentration, which has been identified as the most important index in PQ poisoning. This study investigated the predictive value of coagulation, liver, and kidney indices in prognosticating PQ-poisoning patients, considered alongside plasma PQ concentrations. Coagulation, liver, and kidney indices were first analyzed by variance analysis, receiver operating characteristic curves, and Fisher discriminant analysis. Then, a new, intelligent, machine learning-based system was established to effectively provide prognostic analysis of PQ-poisoning patients based on a combination of the aforementioned indices. In the proposed system, an enhanced extreme learning machine wrapped with a grey wolf optimization strategy was developed to predict the risk status from a pool of 103 patients (56 males and 47 females); of these, 52 subjects were deceased and 51 alive. The proposed method was rigorously evaluated against this real-life dataset in terms of accuracy, Matthews correlation coefficient, sensitivity, and specificity. Additionally, feature selection was investigated to identify factors correlating with risk status. The results demonstrated that there were significant differences in the coagulation, liver, and kidney indices between deceased and surviving subjects (p<0.05). Aspartate aminotransferase, prothrombin time, prothrombin activity, total bilirubin, direct bilirubin, indirect bilirubin, alanine aminotransferase, urea nitrogen, and creatinine were the most highly correlated indices in PQ poisoning and showed statistical significance (p<0.05) in predicting PQ-poisoning prognoses. According to the feature selection, the most important correlated indices were aspartate aminotransferase, the aspartate aminotransferase to alanine aminotransferase ratio, creatinine, prothrombin time, and prothrombin activity. The method proposed here showed excellent results, better than those obtained based on blood PQ concentration alone. These promising results indicate that the combination of these indices can provide a new avenue for prognosticating the outcome of PQ poisoning.
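
    A bare-bones sketch of the base learner involved, an extreme learning machine (random hidden layer, least-squares output weights); the grey wolf optimization wrapper is omitted, and the patient features below are simulated placeholders, not the study's data.

```python
# Minimal extreme learning machine (ELM) for binary risk-status prediction.
import numpy as np

class ELM:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random input weights
        self.b = self.rng.normal(size=self.n_hidden)                # random biases
        H = np.tanh(X @ self.W + self.b)                            # hidden activations
        self.beta = np.linalg.pinv(H) @ y                           # least-squares output weights
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta > 0.5).astype(int)

rng = np.random.default_rng(10)
X = rng.normal(size=(103, 9))     # 103 patients x 9 coagulation/liver/kidney indices
y = (X[:, 0] + X[:, 3] + 0.5 * rng.normal(size=103) > 0).astype(int)  # 1 = deceased
model = ELM().fit(X[:80], y[:80])
print(f"held-out accuracy: {(model.predict(X[80:]) == y[80:]).mean():.1%}")
```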

  18. Spatiotemporal modeling of node temperatures in supercomputers

    DOE PAGES

    Storlie, Curtis Byron; Reich, Brian James; Rust, William Newton; ...

    2016-06-10

    Los Alamos National Laboratory (LANL) is home to many large supercomputing clusters. These clusters require an enormous amount of power (~500-2000 kW each), and most of this energy is converted into heat. Thus, cooling the components of the supercomputer becomes a critical and expensive endeavor. Recently a project was initiated to investigate the effect that changes to the cooling system in a machine room had on three large machines that were housed there. Coupled with this goal was the aim to develop a general good practice for characterizing the effect of cooling changes and monitoring machine node temperatures in this and other machine rooms. This paper focuses on the statistical approach used to quantify the effect that several cooling changes to the room had on the temperatures of the individual nodes of the computers. The largest cluster in the room has 1,600 nodes that run a variety of jobs during general use. Since extreme temperatures are important, a Normal distribution plus a generalized Pareto distribution for the upper tail is used to model the marginal distribution, along with a Gaussian process copula to account for spatio-temporal dependence. A Gaussian Markov random field (GMRF) model is used to model the spatial effects on the node temperatures as the cooling changes take place. This model is then used to assess the condition of the node temperatures after each change to the room. The analysis approach was used to uncover the cause of a problematic episode of overheating nodes on one of the supercomputing clusters. Lastly, this same approach can easily be applied to monitor and investigate cooling systems at other data centers as well.
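
    The tail model can be sketched as follows, assuming exceedances over a high threshold are fit with a generalized Pareto distribution (with the marginal body roughly Normal); the temperatures and thresholds are simulated stand-ins.

```python
# Generalized Pareto fit to the upper tail of simulated node temperatures.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
temps = rng.normal(45, 3, size=20_000)               # node temperatures, deg C
temps[:200] += rng.gamma(2, 4, size=200)             # inject some hot excursions

u = np.quantile(temps, 0.95)                          # tail threshold
exceedances = temps[temps > u] - u
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0)

# Probability a node exceeds a critical temperature, via the fitted tail model
critical = 60.0
p_exceed = 0.05 * stats.genpareto.sf(critical - u, shape, loc=0, scale=scale)
print(f"threshold u = {u:.1f} C, GPD shape = {shape:.3f}, "
      f"P(T > {critical} C) ~= {p_exceed:.2e}")
```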

  19. Effectiveness and efficiency of different weight machine-based strength training programmes for patients with hip or knee osteoarthritis: a protocol for a quasi-experimental controlled study in the context of health services research.

    PubMed

    Krauss, Inga; Müller, Gerhard; Steinhilber, Benjamin; Haupt, Georg; Janssen, Pia; Martus, Peter

    2017-01-01

    Osteoarthritis is a chronic musculoskeletal disease with a major impact on the individual and the healthcare system. As there is no cure, therapy aims for symptom relief and reduction of disease progression. Physical exercise has been defined as a core treatment for osteoarthritis. However, research questions related to dose response, sustainability of effects, economic efficiency and safety are still open and will be evaluated in this trial, which investigates a progressive weight machine-based strength training programme. This is a quasi-experimental controlled trial in the context of health services research. The intervention group (n=300) is recruited from participants in an offer for insurants of a health insurance company suffering from hip or knee osteoarthritis. Potential control-group participants are selected from the insurance database according to predefined matching criteria and contacted in writing. The final statistical twins among the control responders will be determined via propensity score matching (n=300). The training intervention comprises 24 supervised mandatory sessions (2/week) and another 12 facultative sessions (1/week). Exercises include resistance training for the lower-extremity and core muscles using weight machines and small training devices. The training offer is available at two sites, which differ with respect to the weight machines in use, resulting in different dosage parameters. Primary outcomes are self-reported pain and function immediately after the 12-week intervention period. Health-related quality of life, self-efficacy, cost utility and safety will be evaluated as secondary outcomes. Secondary analysis will be undertaken with two strata related to study site. Participants will be followed up 6, 12 and 24 months after baseline. German Clinical Trial Register DRKS00009257. Pre-results.

  20. HUMAN DECISIONS AND MACHINE PREDICTIONS.

    PubMed

    Kleinberg, Jon; Lakkaraju, Himabindu; Leskovec, Jure; Ludwig, Jens; Mullainathan, Sendhil

    2018-02-01

    Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application. Yet comparing the algorithm to judges proves complicated. First, the available data are generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the variable the algorithm predicts; for instance, judges may care specifically about violent crimes or about racial inequities. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates. Moreover, all categories of crime, including violent crimes, show reductions; and these gains can be achieved while simultaneously reducing racial disparities. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals. JEL Codes: C10 (Econometric and statistical methods and methodology), C55 (Large datasets: Modeling and analysis), K40 (Legal procedure, the legal system, and illegal behavior).

  1. HUMAN DECISIONS AND MACHINE PREDICTIONS*

    PubMed Central

    Kleinberg, Jon; Lakkaraju, Himabindu; Leskovec, Jure; Ludwig, Jens; Mullainathan, Sendhil

    2018-01-01

    Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application. Yet comparing the algorithm to judges proves complicated. First, the available data are generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the variable the algorithm predicts; for instance, judges may care specifically about violent crimes or about racial inequities. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates. Moreover, all categories of crime, including violent crimes, show reductions; and these gains can be achieved while simultaneously reducing racial disparities. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals. JEL Codes: C10 (Econometric and statistical methods and methodology), C55 (Large datasets: Modeling and analysis), K40 (Legal procedure, the legal system, and illegal behavior) PMID:29755141

  2. Common component classification: what can we learn from machine learning?

    PubMed

    Anderson, Ariana; Labus, Jennifer S; Vianna, Eduardo P; Mayer, Emeran A; Cohen, Mark S

    2011-05-15

    Machine learning methods have been applied to classifying fMRI scans by studying locations in the brain that exhibit temporal intensity variation between groups, frequently reporting classification accuracy of 90% or better. Although empirical results are quite favorable, one might doubt the ability of classification methods to withstand changes in task ordering and the reproducibility of activation patterns over runs, and question how much of the classification machines' power is due to artifactual noise versus genuine neurological signal. To examine the true strength and power of machine learning classifiers we create and then deconstruct a classifier to examine its sensitivity to physiological noise, task reordering, and across-scan classification ability. The models are trained and tested both within and across runs to assess stability and reproducibility across conditions. We demonstrate the use of independent components analysis for both feature extraction and artifact removal and show that removal of such artifacts can reduce predictive accuracy even when data has been cleaned in the preprocessing stages. We demonstrate how mistakes in the feature selection process can cause the cross-validation error seen in publication to be a biased estimate of the testing error seen in practice and measure this bias by purposefully making flawed models. We discuss other ways to introduce bias and the statistical assumptions lying behind the data and model themselves. Finally we discuss the complications in drawing inference from the smaller sample sizes typically seen in fMRI studies, the effects of small or unbalanced samples on the Type 1 and Type 2 error rates, and how publication bias can give a false confidence of the power of such methods. Collectively this work identifies challenges specific to fMRI classification and methods affecting the stability of models. Copyright © 2010 Elsevier Inc. All rights reserved.
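
    The feature-selection bias the authors measure can be demonstrated compactly: on pure noise, screening features with all labels before cross-validation inflates accuracy, while nesting the selection inside each fold does not. A minimal sketch, with invented dimensions:

```python
# Biased vs. nested cross-validation when features are selected from labels.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(12)
X = rng.normal(size=(40, 5000))            # pure noise, like label-free features
y = rng.integers(0, 2, size=40)

# Flawed: screen features using all labels, then cross-validate
X_screened = SelectKBest(f_classif, k=20).fit_transform(X, y)
biased = cross_val_score(LogisticRegression(), X_screened, y, cv=5).mean()

# Correct: selection happens inside each training fold only
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"biased estimate: {biased:.2f}, nested estimate: {honest:.2f}  (chance = 0.50)")
```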

  3. Machine Learning Principles Can Improve Hip Fracture Prediction.

    PubMed

    Kruse, Christian; Eiken, Pia; Vestergaard, Peter

    2017-04-01

    Apply machine learning principles to predict hip fractures and estimate predictor importance in Dual-energy X-ray absorptiometry (DXA)-scanned men and women. Dual-energy X-ray absorptiometry data from two Danish regions between 1996 and 2006 were combined with national Danish patient data to comprise 4722 women and 717 men with 5 years of follow-up time (original cohort n = 6606 men and women). Twenty-four statistical models were built on 75% of the data points through 5-fold, 5-repeat cross-validation, and then validated on the remaining 25% of the data points to calculate the area under the curve (AUC) and calibrate probability estimates. The best models were retrained with restricted predictor subsets to estimate the best subsets. For women, bootstrap aggregated flexible discriminant analysis ("bagFDA") performed best with a test AUC of 0.92 [0.89; 0.94] and well-calibrated probabilities following Naïve Bayes adjustments. A "bagFDA" model limited to 11 predictors (among them bone mineral densities (BMD), biochemical glucose measurements, and general practitioner and dentist use) achieved a test AUC of 0.91 [0.88; 0.93]. For men, eXtreme Gradient Boosting ("xgbTree") performed best with a test AUC of 0.89 [0.82; 0.95], but with poor calibration for higher probabilities. A ten-predictor subset (BMD, biochemical cholesterol and liver function tests, penicillin use and osteoarthritis diagnoses) achieved a test AUC of 0.86 [0.78; 0.94] using an "xgbTree" model. Machine learning can improve hip fracture prediction beyond logistic regression using ensemble models. Compiling data from international cohorts with longer follow-up and performing similar machine learning procedures has the potential to further improve discrimination and calibration.

  4. Detection of Alzheimer's Disease by Three-Dimensional Displacement Field Estimation in Structural Magnetic Resonance Imaging.

    PubMed

    Wang, Shuihua; Zhang, Yudong; Liu, Ge; Phillips, Preetha; Yuan, Ti-Fei

    2016-01-01

    Within the past decade, computer scientists have developed many methods using computer vision and machine learning techniques to detect Alzheimer's disease (AD) in its early stages. However, some of these methods are unable to achieve excellent detection accuracy, and several other methods are unable to locate AD-related regions. Hence, our goal was to develop a novel AD brain detection method. In this study, our method was based on the three-dimensional (3D) displacement-field (DF) estimation between subjects in the healthy elder control group and the AD group. The 3D-DF was treated as the source of AD-related features. Three feature-selection measures were used: the Bhattacharyya distance, Student's t-test, and Welch's t-test (WTT). Two non-parallel support vector machines, i.e., the generalized eigenvalue proximal support vector machine and the twin support vector machine (TSVM), were then used for classification. A 50 × 10-fold cross-validation was implemented for statistical analysis. The results showed that "3D-DF+WTT+TSVM" achieved the best performance, with an accuracy of 93.05 ± 2.18, a sensitivity of 92.57 ± 3.80, a specificity of 93.18 ± 3.35, and a precision of 79.51 ± 2.86. This method also outperformed 13 state-of-the-art approaches. Additionally, we were able to detect 17 regions related to AD by using the pure computer-vision technique. These regions include sub-gyral, inferior parietal lobule, precuneus, angular gyrus, lingual gyrus, supramarginal gyrus, postcentral gyrus, third ventricle, superior parietal lobule, thalamus, middle temporal gyrus, precentral gyrus, superior temporal gyrus, superior occipital gyrus, cingulate gyrus, culmen, and insula. These regions have been reported in recent publications. The 3D-DF is effective in AD subject and related-region detection.
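
    A hedged sketch of the "WTT + SVM" idea: rank simulated displacement-field features by Welch's t-test, keep the top-ranked ones, and classify. A standard RBF-SVM stands in for the twin SVM used in the paper, and selection is done on all data here only for brevity.

```python
# Welch's t-test feature ranking followed by SVM classification (simulated data).
import numpy as np
from scipy import stats
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(13)
X = rng.normal(size=(60, 2000))           # voxel-wise 3D displacement-field features
y = np.array([0] * 30 + [1] * 30)         # 0 = healthy elder, 1 = AD
X[y == 1, :40] += 0.9                     # AD-related displacement (toy effect)

t, _ = stats.ttest_ind(X[y == 1], X[y == 0], equal_var=False)  # Welch's t-test
top = np.argsort(-np.abs(t))[:40]                              # top-ranked features

acc = cross_val_score(SVC(kernel="rbf"), X[:, top], y, cv=10)
print(f"accuracy: {acc.mean():.2%} +/- {acc.std():.2%}")
```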

  5. The reported incidence of man-machine interface issues in Army aviators using the Aviator's Night Vision System (ANVIS) in a combat theatre

    NASA Astrophysics Data System (ADS)

    Hiatt, Keith L.; Rash, Clarence E.

    2011-06-01

    Background: Army Aviators rely on the ANVIS for night operations. The human factors literature notes that the ANVIS man-machine interface results in reports of visual and spinal complaints. This is the first study that has looked at these issues in the much harsher combat environment. Last year, the authors reported on the statistically significant (p<0.01) increased complaints of visual discomfort, degraded visual cues, and incidence of static and dynamic visual illusions in the combat environment [Proc. SPIE, Vol. 7688, 76880G (2010)]. In this paper we present the findings regarding increased spinal complaints and other man-machine interface issues found in the combat environment. Methods: A survey was administered to Aircrew deployed in support of Operation Enduring Freedom (OEF). Results: 82 Aircrew (representing an aggregate of >89,000 flight hours of which >22,000 were with ANVIS) participated. Analysis demonstrated high rates of complaints of almost all levels of back and neck pain. Additionally, the use of body armor and other Aviation Life Support Equipment (ALSE) caused significant ergonomic complaints when used with ANVIS. Conclusions: ANVIS use in a combat environment resulted in more frequent and qualitatively different reports of spinal symptoms and other man-machine interface issues than previously reported. Data from this study may be more operationally relevant than that of the peacetime literature, as it is derived from actual combat and not from training flights, and it may have important implications for making combat predictions based on performance in training scenarios. Notably, Aircrew remarked that they could not execute the mission without ANVIS and ALSE and accepted the degraded ergonomic environment.

  6. Improving machine learning reproducibility in genetic association studies with proportional instance cross validation (PICV).

    PubMed

    Piette, Elizabeth R; Moore, Jason H

    2018-01-01

    Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly so for the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole. We propose a new cross validation method, proportional instance cross validation (PICV), that preserves the original distribution of an independent variable when splitting the data set into training and testing partitions. We apply PICV to simulated GWAS data with epistatic interactions of varying minor allele frequencies and prevalences and compare performance to that of a traditional cross validation procedure in which individuals are randomly allocated to training and testing partitions. Sensitivity and positive predictive value are significantly improved across all tested scenarios for PICV compared to traditional cross validation. We also apply PICV to GWAS data from a study of primary open-angle glaucoma to investigate a previously-reported interaction, which fails to significantly replicate; PICV however improves the consistency of testing and training results. Application of traditional machine learning procedures to biomedical data may require modifications to better suit intrinsic characteristics of the data, such as the potential for highly imbalanced genotype distributions in the case of epistasis detection. The reproducibility of genetic interaction findings can be improved by considering this variable imbalance in cross validation implementation, such as with PICV. This approach may be extended to problems in other domains in which imbalanced variable distributions are a concern.
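
    The core idea can be sketched with off-the-shelf tools: preserve the distribution of a key variable across folds instead of splitting at random. Below, StratifiedKFold plays the role of a proportional splitter for a rare genotype; the published PICV procedure itself may differ in detail.

```python
# Random vs. distribution-preserving test folds for a rare genotype.
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

rng = np.random.default_rng(14)
genotype = rng.choice([0, 1, 2], size=200, p=[0.80, 0.15, 0.05])  # 2 = rare homozygote

def fold_rates(splitter, strat=None):
    args = (np.zeros((200, 1)), strat) if strat is not None else (np.zeros((200, 1)),)
    return [np.mean(genotype[test] == 2) for _, test in splitter.split(*args)]

print("rare-genotype rate per test fold")
print("random:      ", np.round(fold_rates(KFold(5, shuffle=True, random_state=0)), 3))
print("proportional:", np.round(fold_rates(StratifiedKFold(5), strat=genotype), 3))
```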

  7. Data-driven mapping of the potential mountain permafrost distribution.

    PubMed

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2017-07-15

    Existing mountain permafrost distribution models generally offer a good overview of the potential extent of this phenomenon at a regional scale. They are, however, not always able to reproduce the high spatial discontinuity of permafrost at the micro-scale (the scale of a specific landform; ten to several hundred meters). To overcome this limitation, we tested an alternative modelling approach using three classification algorithms from statistics and machine learning: logistic regression, Support Vector Machines and Random forests. These supervised learning techniques infer a classification function from labelled training data (pixels of permafrost absence and presence) with the aim of predicting the permafrost occurrence where it is unknown. The research was carried out in a 588 km² area of the Western Swiss Alps. Permafrost evidence was mapped from ortho-image interpretation (rock glacier inventorying) and field data (mainly geoelectrical and thermal data). The relationship between the selected permafrost evidence and permafrost controlling factors was computed with the mentioned techniques. Classification performance, assessed with AUROC, ranges from 0.81 for logistic regression to 0.85 with Support Vector Machines and 0.88 with Random forests. The adopted machine learning algorithms proved efficient for permafrost distribution modelling, yielding results consistent with field reality. The high resolution of the input dataset (10 m) allows maps to be elaborated at the micro-scale, with a modelled spatial distribution of permafrost less optimistic than that of classic spatial models. Moreover, the probability output of the adopted algorithms offers a more precise overview of the potential distribution of mountain permafrost than simple indexes of permafrost favorability. These encouraging results also open the way to new possibilities for permafrost data analysis and mapping. Copyright © 2017 Elsevier B.V. All rights reserved.
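
    A compact analogue of the reported model comparison, assuming simulated terrain predictors and presence/absence labels: the three classifiers are trained once and compared by AUROC, using the probability outputs the abstract emphasizes.

```python
# Logistic regression vs. SVM vs. random forest, compared by AUROC (simulated pixels).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(15)
X = rng.normal(size=(2000, 5))                 # terrain predictors per pixel
logit = 1.5 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] * X[:, 3]
y = (logit + rng.logistic(size=2000) > 0).astype(int)   # 1 = permafrost present

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "Logistic regression":    LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(probability=True),
    "Random forest":          RandomForestClassifier(random_state=0),
}
for name, m in models.items():
    p = m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]     # probability output
    print(f"{name:22s} AUROC = {roc_auc_score(y_te, p):.3f}")
```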

  8. Applying Sparse Machine Learning Methods to Twitter: Analysis of the 2012 Change in Pap Smear Guidelines. A Sequential Mixed-Methods Study

    PubMed Central

    Godbehere, Andrew; Le, Gem; El Ghaoui, Laurent; Sarkar, Urmimala

    2016-01-01

    Background It is difficult to synthesize the vast amount of textual data available from social media websites. Capturing real-world discussions via social media could provide insights into individuals’ opinions and the decision-making process. Objective We conducted a sequential mixed-methods study to determine the utility of sparse machine learning techniques in summarizing Twitter dialogues. We chose a narrowly defined topic for this approach: cervical cancer discussions over a 6-month time period surrounding a change in Pap smear screening guidelines. Methods We applied statistical methodologies known as sparse machine learning algorithms to summarize Twitter messages about cervical cancer before and after the 2012 change in Pap smear screening guidelines by the US Preventive Services Task Force (USPSTF). All messages containing the search terms “cervical cancer,” “Pap smear,” and “Pap test” were analyzed during: (1) January 1–March 13, 2012, and (2) March 14–June 30, 2012. Topic modeling was used to discern the most common topics from each time period and to determine the singular value criterion for each topic. The results were then qualitatively coded from the top 10 relevant topics to determine the efficiency of the clustering method in grouping distinct ideas, and how the discussion differed before vs. after the change in guidelines. Results This machine learning method was effective in grouping the relevant discussion topics about cervical cancer during the respective time periods (~20% overall irrelevant content in both time periods). Qualitative analysis determined that a significant portion of the top discussion topics in the second time period directly reflected the USPSTF guideline change (eg, “New Screening Guidelines for Cervical Cancer”), and many topics in both time periods addressed basic screening promotion and education (eg, “It is Cervical Cancer Awareness Month! Click the link to see where you can receive a free or low cost Pap test.”) Conclusions Machine learning tools can be useful in cervical cancer prevention and screening discussions on Twitter. This method allowed us to show that significant information about cervical cancer screening is publicly available on social media sites. Moreover, we observed a direct impact of the guideline change within the Twitter messages. PMID:27288093

  9. State but not District Nutrition Policies Are Associated with Less Junk Food in Vending Machines and School Stores in US Public Schools

    PubMed Central

    KUBIK, MARTHA Y.; WALL, MELANIE; SHEN, LIJUAN; NANNEY, MARILYN S.; NELSON, TOBEN F.; LASKA, MELISSA N.; STORY, MARY

    2012-01-01

    Background Policy that targets the school food environment has been advanced as one way to increase the availability of healthy food at schools and healthy food choice by students. Although both state- and district-level policy initiatives have focused on school nutrition standards, it remains to be seen whether these policies translate into healthy food practices at the school level, where student behavior will be impacted. Objective To examine whether state- and district-level nutrition policies addressing junk food in school vending machines and school stores were associated with less junk food in school vending machines and school stores. Junk food was defined as foods and beverages with low nutrient density that provide calories primarily through fats and added sugars. Design A cross-sectional study design was used to assess self-report data collected by computer-assisted telephone interviews or self-administered mail questionnaires from state-, district-, and school-level respondents participating in the School Health Policies and Programs Study 2006. The School Health Policies and Programs Study, administered every 6 years since 1994 by the Centers for Disease Control and Prevention, is considered the largest, most comprehensive assessment of school health policies and programs in the United States. Subjects/setting A nationally representative sample (n = 563) of public elementary, middle, and high schools was studied. Statistical analysis Logistic regression adjusted for school characteristics, sampling weights, and clustering was used to analyze data. Policies were assessed for strength (required, recommended, neither required nor recommended prohibiting junk food) and whether strength was similar for school vending machines and school stores. Results School vending machines and school stores were more prevalent in high schools (93%) than middle (84%) and elementary (30%) schools. For state policies, elementary schools that required prohibiting junk food in school vending machines and school stores offered less junk food than elementary schools that neither required nor recommended prohibiting junk food (13% vs 37%; P = 0.006). Middle schools that required prohibiting junk food in vending machines and school stores offered less junk food than middle schools that recommended prohibiting junk food (71% vs 87%; P = 0.07). Similar associations were not evident for district-level polices or high schools. Conclusions Policy may be an effective tool to decrease junk food in schools, particularly in elementary and middle schools. PMID:20630161

  10. Measurement of W + bb and a search for MSSM Higgs bosons with the CMS detector at the LHC

    NASA Astrophysics Data System (ADS)

    O'Connor, Alexander Pinpin

    Tooling used to cure composite laminates in the aerospace and automotive industries must provide a dimensionally stable geometry throughout the thermal cycle applied during the part curing process. This requires that the Coefficient of Thermal Expansion (CTE) of the tooling materials match that of the composite being cured. The traditional tooling material for production applications is a nickel alloy. Poor machinability and high material costs increase the expense of metallic tooling made from nickel alloys such as 'Invar 36' or 'Invar 42'. Currently, metallic tooling is unable to meet the needs of applications requiring rapid affordable tooling solutions. In applications where the tooling is not required to have the durability provided by metals, such as for small area repair, an opportunity exists for non-metallic tooling materials like graphite, carbon foams, composites, or ceramics and machinable glasses. Nevertheless, efficient machining of brittle, non-metallic materials is challenging due to low ductility, porosity, and high hardness. The machining of a layup tool comprises a large portion of the final cost. Achieving maximum process economy requires optimization of the machining process in the given tooling material. Therefore, machinability of the tooling material is a critical aspect of the overall cost of the tool. In this work, three commercially available, brittle/porous, non-metallic candidate tooling materials were selected, namely: (AAC) Autoclaved Aerated Concrete, CB1100 ceramic block and Cfoam carbon foam. Machining tests were conducted in order to evaluate the machinability of these materials using end milling. Chip formation, cutting forces, cutting tool wear, machining induced damage, surface quality and surface integrity were investigated using High Speed Steel (HSS), carbide, diamond abrasive and Polycrystalline Diamond (PCD) cutting tools. Cutting forces were found to be random in magnitude, which was a result of material porosity. The abrasive nature of Cfoam produced rapid tool wear when using HSS and PCD type cutting tools. However, tool wear was not significant in AAC or CB1100 regardless of the type of cutting edge. Machining induced damage was observed in the form of macro-scale chipping and fracture in combination with micro-scale cracking. Transverse rupture test results revealed significant reductions in residual strength and damage tolerance in CB1100. In contrast, AAC and Cfoam showed no correlation between machining induced damage and a reduction in surface integrity. Cutting forces in machining were modeled for all materials. Cutting force regression models were developed based on Design of Experiment and Analysis of Variance. A mechanistic cutting force model was proposed based upon conventional end milling force models and statistical distributions of material porosity. In order to validate the model, predicted cutting forces were compared to experimental results. Predicted cutting forces agreed well with experimental measurements. Furthermore, over the range of cutting conditions tested, the proposed model was shown to have comparable predictive accuracy to empirically produced regression models; greatly reducing the number of cutting tests required to simulate cutting forces. Further, this work demonstrates a key adaptation of metallic cutting force models to brittle porous material; a vital step in the research into the machining of these materials using end milling.

  11. Predictive Big Data Analytics: A Study of Parkinson’s Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations

    PubMed Central

    Dinov, Ivo D.; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W.; Price, Nathan D.; Van Horn, John D.; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M.; Dauer, William; Toga, Arthur W.

    2016-01-01

    Background A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression, and quickly identify the efficacy of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning-based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning-based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetic (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions Model-free Big Data machine-learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine-learning-based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications. PMID:27494614
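
    As a concrete illustration of steps (i)-(iii), the sketch below (Python, synthetic data rather than PPMI records; feature names and sizes are hypothetical) rebalances an imbalanced cohort by oversampling the minority class and scores a model-free classifier, adaptive boosting as named in the conclusions, with n-fold cross-validation.

        import numpy as np
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import cross_val_score

        # Synthetic stand-ins for imaging/clinical features and an imbalanced label.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 20))
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 1.2).astype(int)

        # Simple rebalancing: resample minority-class rows with replacement until
        # both cohorts have equal size. (In practice the resampling would be done
        # inside each training fold to avoid leakage across folds.)
        minority = np.flatnonzero(y == 1)
        majority = np.flatnonzero(y == 0)
        extra = rng.choice(minority, size=majority.size - minority.size, replace=True)
        idx = np.concatenate([majority, minority, extra])
        X_bal, y_bal = X[idx], y[idx]

        scores = cross_val_score(AdaBoostClassifier(n_estimators=200), X_bal, y_bal, cv=5)
        print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")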

  12. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
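
    For readers unfamiliar with the mechanics, a minimal sketch follows (Python, synthetic data; the authors' own implementations are in R and are not reproduced here) of component-wise L2 boosting, one basic flavour of statistical boosting: univariate linear base-learners compete in each iteration, and updating only the best-fitting one yields the automated variable selection and implicit regularization the abstract mentions.

        import numpy as np

        # Synthetic regression problem: only covariates 0 and 3 carry signal.
        rng = np.random.default_rng(0)
        n, p = 200, 10
        X = rng.normal(size=(n, p))
        y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=n)

        beta, nu = np.zeros(p), 0.1                  # effect estimates and step length
        for _ in range(500):
            resid = y - X @ beta                     # negative gradient of the L2 loss
            # Least-squares fit of each univariate base-learner to the residuals.
            fits = (X * resid[:, None]).sum(axis=0) / (X ** 2).sum(axis=0)
            sse = ((resid[:, None] - X * fits) ** 2).sum(axis=0)
            j = int(np.argmin(sse))                  # best-fitting component wins
            beta[j] += nu * fits[j]                  # update only that effect

        print(np.round(beta, 2))                     # non-selected effects stay near 0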

  13. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for the estimation of individual probabilities, and the consistency of random forests, nearest neighbors and other learning machines is discussed. A simulation study demonstrates the validity of the methods, and the algorithms are exemplified by analyzing two well-known data sets on the diagnosis of appendicitis and of diabetes in Pima Indians. The real-data applications show the accuracy and practicality of this approach. Sample code is provided from R packages in which the probability estimation is already available, so all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses, and freely available implementations in R may be used for applications.
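
    As a sketch of the central idea, in Python rather than the R packages the authors reference: fitting a regression-consistent learner (here a random forest regressor) to a 0/1 response estimates the individual probability P(Y=1|X) directly; the data below are synthetic so the estimates can be checked against a known probability surface.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        # Simulate covariates and a binary response with a known probability law.
        rng = np.random.default_rng(0)
        X = rng.uniform(-2, 2, size=(2000, 3))
        p_true = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))      # known P(Y=1|X)
        y = rng.binomial(1, p_true)

        # A regression forest on the 0/1 labels acts as a probability machine.
        rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=25, random_state=0)
        rf.fit(X, y)
        p_hat = rf.predict(X)                               # individual probability estimates
        print(f"mean abs error vs true probabilities: {np.abs(p_hat - p_true).mean():.3f}")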

  14. [Comparison of machinability of two types of dental machinable ceramic].

    PubMed

    Fu, Qiang; Zhao, Yunfeng; Li, Yong; Fan, Xinping; Li, Yan; Lin, Xuefeng

    2002-11-01

    To address the problems of currently available dental machinable ceramics, a new type of calcium-mica glass-ceramic, PMC-I ceramic, was developed, and its machinability was compared quantitatively with that of Vita MKII. Moreover, the relationship between the strength and the machinability of PMC-I ceramic was studied. Samples of PMC-I ceramic were divided into four groups according to their nucleation procedures. 600-second drilling tests were conducted with high-speed steel tools (diameter 2.3 mm) to measure the drilling depths of Vita MKII ceramic and PMC-I ceramic, at a constant drilling speed of 600 rpm and a constant axial load of 39.2 N. The three-point bending strength of the four groups of PMC-I ceramic was also recorded. The drilling depth of Vita MKII was 0.71 mm, while the depths of the four groups of PMC-I ceramic were 0.88 mm, 1.40 mm, 0.40 mm and 0.90 mm, respectively. Group B of PMC-I ceramic showed the largest depth, 1.40 mm, and was statistically different from the other groups and from Vita MKII. The strengths of the four groups of PMC-I ceramic were 137.7, 210.2, 118.0 and 106.0 MPa, respectively. The machinability of the newly developed PMC-I dental machinable ceramic can meet clinical needs.

  15. [Hygienic assessment of student's nutrition through vending machines (fast food)].

    PubMed

    Karelin, A O; Pavlova, D V; Babalyan, A V

    2015-01-01

    The article presents the results of a study of student nutrition through vending machines (fast food), taking into account the consumer priorities of medical university students and the features and possible consequences of vending machine use. The object of study was the assortment of products sold through vending machines at the First Saint-Petersburg Medical University. Net calories, content of proteins, fats and carbohydrates, glycemic index, and glycemic load were determined for each product. Information about the use of vending machines was obtained from second- and fourth-year students of the medical and dental faculties using a standardized interview questionnaire. Most of the products sold through vending machines were found to have a high energy value, derived mainly from refined carbohydrates, with a medium-to-high glycemic load and low protein content. Most of the students (87.3%) buy some products from the vending machines, mainly because of a lack of time to visit the canteen and buffets. Only 4.2% of students were satisfied with the vending machine assortment. More than 50% of students reported gastrointestinal complaints. A statistically significant relationship was found between length of study at the university and gastrointestinal morbidity, as well as the number of students requiring a medical diet. Students who required a medical diet used fast food significantly more often (46.6% vs. 37.7% of those who did not).
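
    For reference, glycemic load is conventionally computed from the glycemic index and the available carbohydrate per serving; a minimal sketch follows (the example values are hypothetical, not product data from the study).

        # Standard glycemic-load formula: GL = glycemic index * carbohydrate (g) / 100.
        def glycemic_load(glycemic_index: float, carbs_g: float) -> float:
            return glycemic_index * carbs_g / 100.0

        # A GL of 20 or more per serving is conventionally classed as high,
        # 11-19 as medium, and 10 or less as low.
        print(glycemic_load(glycemic_index=70, carbs_g=35))   # -> 24.5, i.e. high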

  16. Machine Learning and Radiology

    PubMed Central

    Wang, Shijun; Summers, Ronald M.

    2012-01-01

    In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focus on six categories of applications: medical image segmentation; registration; computer-aided detection and diagnosis; brain function or activity analysis and neurological disease diagnosis from fMR images; content-based image retrieval systems for CT or MRI images; and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077

  17. Statistical properties of two sine waves in Gaussian noise.

    NASA Technical Reports Server (NTRS)

    Esposito, R.; Wilson, L. R.

    1973-01-01

    A detailed study is presented of some statistical properties of a stochastic process that consists of the sum of two sine waves of unknown relative phase and a normal process. Since none of the statistics investigated seems to yield a closed-form expression, all the derivations are cast in a form that is particularly suitable for machine computation. Specifically, results are presented for the probability density function (pdf) of the envelope and of the instantaneous value, the moments of these distributions, and the cumulative distribution function (cdf).
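
    Because no closed form is available, such statistics are natural targets for machine computation; a minimal Monte Carlo sketch follows (Python, with assumed amplitudes and noise level) estimating the envelope pdf of two tones of uniformly random relative phase in narrowband Gaussian noise.

        import numpy as np

        # Narrowband model: envelope of A1 + A2*e^{j*phi} plus complex Gaussian
        # noise with independent in-phase/quadrature components of variance sigma^2.
        rng = np.random.default_rng(0)
        A1, A2, sigma = 1.0, 0.7, 0.5   # assumed amplitudes and noise level
        n = 500_000
        phi = rng.uniform(0.0, 2.0 * np.pi, n)    # unknown relative phase
        nc = rng.normal(0.0, sigma, n)            # in-phase noise
        ns = rng.normal(0.0, sigma, n)            # quadrature noise
        envelope = np.abs(A1 + A2 * np.exp(1j * phi) + nc + 1j * ns)

        # Histogram estimate of the envelope pdf (no closed form is known).
        pdf, edges = np.histogram(envelope, bins=200, density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])
        print(f"mode of envelope pdf near {centers[np.argmax(pdf)]:.2f}")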

  18. Fresh Biomass Estimation in Heterogeneous Grassland Using Hyperspectral Measurements and Multivariate Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.

    2014-12-01

    Accurate estimation of grassland biomass at peak productivity can provide crucial information regarding the functioning and productivity of rangelands. Hyperspectral remote sensing has proved valuable for the estimation of vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem, owing to the large number of correlated hyperspectral reflectance measurements. The aim of this study was to examine the prospects for above-ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above-ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and least-squares support vector machine (LS-SVM) regression were used to estimate the above-ground biomass. The prediction accuracy of the multivariate calibration methods was assessed using cross-validated R2 and RMSE. The best model performance was obtained using LS-SVM and then PLSR, both calibrated with the first-derivative reflectance dataset, with R2cv = 0.88 and 0.86 and RMSEcv = 1.15 and 1.07, respectively. The weakest prediction accuracy appeared when PCR was used (R2cv = 0.31 and RMSEcv = 2.48). These results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.
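
    As an illustration of this workflow (Python, synthetic spectra in place of the GER 3700 measurements, with hypothetical band positions), the sketch below calibrates PLSR on first-derivative reflectance and reports cross-validated R2 and RMSE as in the study.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import cross_val_predict
        from sklearn.metrics import r2_score, mean_squared_error

        # Synthetic smooth spectra for 170 plots and a biomass signal tied to
        # two (hypothetical) band positions.
        rng = np.random.default_rng(0)
        n_plots, n_bands = 170, 400
        reflectance = np.cumsum(rng.normal(size=(n_plots, n_bands)), axis=1)
        biomass = reflectance[:, 120] - reflectance[:, 300] + rng.normal(scale=1.0, size=n_plots)

        d1 = np.diff(reflectance, axis=1)            # first-derivative spectra
        pls = PLSRegression(n_components=8)
        pred = cross_val_predict(pls, d1, biomass, cv=10).ravel()
        print(f"R2cv = {r2_score(biomass, pred):.2f}, "
              f"RMSEcv = {mean_squared_error(biomass, pred) ** 0.5:.2f}")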

  19. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data

    PubMed Central

    Hepworth, Philip J.; Nefedov, Alexey V.; Muchnik, Ilya B.; Morgan, Kenton L.

    2012-01-01

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data; these data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence on unseen data with an area under the receiver operating characteristic curve of 0.78. We also compare the results with those obtained by standard multivariable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems, could offer significant improvements in broiler health and welfare worldwide. PMID:22319115
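
    A minimal sketch of this evaluation pattern follows (Python, synthetic records; the farm-management variables are hypothetical stand-ins): train a support vector machine and score it on unseen data by the area under the ROC curve.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        # Synthetic stand-ins for farm management variables and a binary outcome.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 12))
        y = (X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=1.0, size=1000) > 0.5).astype(int)

        # Hold out 30% as unseen data and score the classifier by ROC AUC.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        svm = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, svm.predict_proba(X_te)[:, 1])
        print(f"AUC on unseen data: {auc:.2f}")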
