Sample records for classify unknown samples

  1. A Predictive Model for Toxicity Effects Assessment of Biotransformed Hepatic Drugs Using Iterative Sampling Method.

    PubMed

    Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella

    2016-12-09

    Measuring toxicity is one of the main steps in drug development. Hence, there is a high demand for computational models to predict the toxicity effects of potential drugs. In this study, we used a dataset that covers four toxicity effects: mutagenic, tumorigenic, irritant and reproductive effects. The proposed model consists of three phases. In the first phase, rough set-based methods are used to select the most discriminative features, reducing the classification time and improving the classification performance. Because of the imbalanced class distribution, in the second phase different sampling methods such as Random Under-Sampling, Random Over-Sampling and Synthetic Minority Oversampling Technique are used to address the problem of imbalanced datasets. An ITerative Sampling (ITS) method is proposed to avoid the limitations of those methods. The ITS method has two steps. The first step (sampling step) iteratively modifies the prior distribution of the minority and majority classes. In the second step, a data cleaning method is used to remove the overlap produced by the first step. In the third phase, a Bagging classifier is used to classify an unknown drug as toxic or non-toxic. The experimental results showed that the proposed model performed well in classifying the unknown samples according to all toxic effects in the imbalanced datasets.
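
    As an illustration of the Random Over-Sampling baseline named in this record (not the ITS method itself, whose details the abstract does not give), the sketch below duplicates minority-class samples until the classes are balanced. The function name `random_oversample`, the toy feature values and the class labels are all assumptions made for this example.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every
    class has as many samples as the largest class (plain random
    over-sampling; hypothetical helper, not the ITS method)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw the missing number of samples with replacement from this class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels

# Toy data: 6 "non-toxic" vs 2 "toxic" feature vectors (made-up values).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = ["non-toxic"] * 6 + ["toxic"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

    Because over-sampling only repeats existing points, it can encourage overfitting to the minority class, which is one of the limitations that methods such as SMOTE and ITS aim to reduce.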

  2. The decision tree classifier - Design and potential. [for Landsat-1 data]

    NASA Technical Reports Server (NTRS)

    Hauska, H.; Swain, P. H.

    1975-01-01

    A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.

  3. Using CRANID to test the population affinity of known crania.

    PubMed

    Kallenberger, Lauren; Pilbrow, Varsha

    2012-11-01

    CRANID is a statistical program used to infer the source population of a cranium of unknown origin by comparing its cranial dimensions with a worldwide craniometric database. It has great potential for estimating ancestry in archaeological, forensic and repatriation cases. In this paper we test the validity of CRANID in classifying crania of known geographic origin. Twenty-three crania of known geographic origin but unknown sex were selected from the osteological collections of the University of Melbourne. Only 18 crania showed good statistical match with the CRANID database. Without considering accuracy of sex allocation, 11 crania were accurately classified into major geographic regions and nine were correctly classified to geographically closest available reference populations. Four of the five crania with poor statistical match were nonetheless correctly allocated to major geographical regions, although none was accurately assigned to geographically closest reference samples. We conclude that if sex allocations are overlooked, CRANID can accurately assign 39% of specimens to geographically closest matching reference samples and 48% to major geographic regions. Better source population representation may improve goodness of fit, but known sex-differentiated samples are needed to further test the utility of CRANID. © 2012 The Authors Journal of Anatomy © 2012 Anatomical Society.

  4. A Litmus Test for Performance Assessment.

    ERIC Educational Resources Information Center

    Finson, Kevin D.; Beaver, John B.

    1992-01-01

    Presents 10 guidelines for developing performance-based assessment items. Presents a sample activity developed from the guidelines. The activity tests students' ability to observe, classify, and infer, using red and blue litmus paper, a pH-range finder, vinegar, ammonia, an unknown solution, distilled water, and paper towels. (PR)

  5. Bayesian Integration and Classification of Composition C-4 Plastic Explosives Based on Time-of-Flight-Secondary Ion Mass Spectrometry and Laser Ablation-Inductively Coupled Plasma Mass Spectrometry.

    PubMed

    Mahoney, Christine M; Kelly, Ryan T; Alexander, Liz; Newburn, Matt; Bader, Sydney; Ewing, Robert G; Fahey, Albert J; Atkinson, David A; Beagley, Nathaniel

    2016-04-05

    Time-of-flight-secondary ion mass spectrometry (TOF-SIMS) and laser ablation-inductively coupled plasma mass spectrometry (LA-ICPMS) were used for characterization and identification of unique signatures from a series of 18 Composition C-4 plastic explosives. The samples were obtained from various commercial and military sources around the country. Positive and negative ion TOF-SIMS data were acquired directly from the C-4 residue on Si surfaces, where the positive ion mass spectra obtained were consistent with the major composition of organic additives, and the negative ion mass spectra were more consistent with explosive content in the C-4 samples. Each series of mass spectra was subjected to partial least squares-discriminant analysis (PLS-DA), a multivariate statistical analysis approach which serves to first find the areas of maximum variance within different classes of C-4 and subsequently to classify unknown samples based on correlations between the unknown data set and the original data set (often referred to as a training data set). This method was able to successfully classify test samples of C-4, though with a limited degree of certainty. The classification accuracy of the method was further improved by integrating the positive and negative ion data using a Bayesian approach. The TOF-SIMS data was combined with a second analytical method, LA-ICPMS, which was used to analyze elemental signatures in the C-4. The integrated data were able to classify test samples with a high degree of certainty. Results indicate that this Bayesian integrated approach constitutes a robust classification method that should be employable even in dirty samples collected in the field.

  6. Probabilistic Multi-Person Tracking Using Dynamic Bayes Networks

    NASA Astrophysics Data System (ADS)

    Klinger, T.; Rottensteiner, F.; Heipke, C.

    2015-08-01

    Tracking-by-detection is a widely used practice in recent tracking systems. These usually rely on independent single-frame detections that are handled as observations in a recursive estimation framework. If these observations are imprecise, the generated trajectory is prone to be updated towards a wrong position. In contrast to existing methods, our novel approach uses a Dynamic Bayes Network in which the state vector of a recursive Bayes filter, as well as the location of the tracked object in the image, are modelled as unknowns. These unknowns are estimated in a probabilistic framework taking into account a dynamic model and a state-of-the-art pedestrian detector and classifier. The classifier is based on the Random Forest algorithm and is capable of being trained incrementally so that new training samples can be incorporated at runtime. This allows the classifier to adapt to the changing appearance of a target and to unlearn outdated features. The approach is evaluated on a publicly available benchmark. The results confirm that our approach is well suited for tracking pedestrians over long distances while at the same time achieving comparatively good geometric accuracy.

  7. Determination of Inorganic Ion Profiles of Illicit Drugs by Capillary Electrophoresis.

    PubMed

    Evans, Elizabeth; Costrino, Carolina; do Lago, Claudimir L; Garcia, Carlos D; Roux, Claude; Blanes, Lucas

    2016-11-01

    A portable capillary electrophoresis instrument with dual capacitively coupled contactless conductivity detection (C4D) was used to determine the inorganic ionic profiles of three pharmaceutical samples and precursors of two illicit drugs (contemporary samples of methylone and para-methoxymethamphetamine). The LODs ranged from 0.10 μmol/L to 1.25 μmol/L for the 10 selected cations, and from 0.13 μmol/L to 1.03 μmol/L for the eight selected anions. All separations were performed in less than 6 min with migration times and peak area RSD values ranging from 2 to 7%. The results demonstrate the potential of the analysis of inorganic ionic species to aid in the identification and/or differentiation of unknown tablets, and real samples found in illicit drug manufacture scenarios. From the resulting ionic fingerprint, the unknown tablets and samples can be further classified. © 2016 American Academy of Forensic Sciences.

  8. Selective Transfer Machine for Personalized Facial Expression Analysis

    PubMed Central

    Chu, Wen-Sheng; De la Torre, Fernando; Cohn, Jeffrey F.

    2017-01-01

    Automatic facial action unit (AU) and expression detection from videos is a long-standing problem. The problem is challenging in part because classifiers must generalize to previously unknown subjects that differ markedly in behavior and facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) from those on which the classifiers are trained. While some progress has been achieved through improvements in choices of features and classifiers, the challenge occasioned by individual differences among people remains. Person-specific classifiers would be a possible solution but for a paucity of training data. Sufficient training data for person-specific classifiers typically is unavailable. This paper addresses the problem of how to personalize a generic classifier without additional labels from the test subject. We propose a transductive learning method, which we refer to as a Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific mismatches. STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. We compared STM to both generic classifiers and cross-domain learning methods on four benchmarks: CK+ [44], GEMEP-FERA [67], RU-FACS [4] and GFT [57]. STM outperformed generic classifiers in all. PMID:28113267

  9. Robust online tracking via adaptive samples selection with saliency detection

    NASA Astrophysics Data System (ADS)

    Yan, Jia; Chen, Xi; Zhu, QiuPing

    2013-12-01

    Online tracking has been shown to be successful in tracking previously unknown objects. However, two important factors lead to the drift problem in online tracking: one is how to select correctly labeled samples even when the target locations are inaccurate, and the other is how to handle confusors that have features similar to those of the target. In this article, we propose a robust online tracking algorithm with adaptive sample selection based on saliency detection to overcome the drift problem. To deal with the problem of degrading the classifiers with mis-aligned samples, we introduce a saliency detection method into our tracking problem. Saliency maps and the strong classifiers are combined to extract the most correct positive samples. Our approach employs a simple saliency detection algorithm based on image spectral residual analysis. Furthermore, instead of using random patches as the negative samples, we propose a selection criterion in which both the saliency confidence and the similarity are considered, with the benefit that confusors in the surrounding background are incorporated into the classifier update process before drift occurs. The tracking task is formulated as a binary classification via an online boosting framework. Experimental results on several challenging video sequences demonstrate the accuracy and stability of our tracker.

  10. [Confirmation of West Nile virus seroreactivity in central nervous system infections of unknown etiology from Ankara Province, Central Anatolia, Turkey].

    PubMed

    Ergünay, Koray; Özkul, Aykut

    2011-04-01

    West Nile virus (WNV) infections may trigger febrile conditions and/or neuroinvasive disease in a portion of the exposed individuals. Serosurveillance data from various regions of Turkey indicate WNV activity. The aim of this study was to confirm the antibody specificity of the serum samples via virus neutralization assay, previously reported to be reactive for WNV IgM. The samples originated from two individuals with the preliminary diagnosis of aseptic meningitis/encephalitis of unknown etiology in 2009 and had been classified as probable WNV infections. Cerebrospinal fluid and sera samples of these patients had been evaluated as negative for WNV RNA and IgG antibodies. Only one serum sample could be included in the neutralization assay due to the limited amounts in the current investigation. The sample was observed as positive in dilutions of 1/20 and 1/40, thus confirming the diagnosis of WNV-related central nervous system infection in a 62 year-old female patient from Ankara, Central Anatolia, Turkey.

  11. The Inverse Bagging Algorithm: Anomaly Detection by Inverse Bootstrap Aggregating

    NASA Astrophysics Data System (ADS)

    Vischia, Pietro; Dorigo, Tommaso

    2017-03-01

    For data sets populated by a very well modeled process and by another process of unknown probability density function (PDF), a desired feature when manipulating the fraction of the unknown process (either to enhance it or to suppress it) is to avoid modifying the kinematic distributions of the well modeled one. A bootstrap technique is used to identify sub-samples rich in the well modeled process, and to classify each event according to the frequency with which it is part of such sub-samples. Comparisons with general MVA algorithms will be shown, as well as a study of the asymptotic properties of the method, making use of a public domain data set that models a typical search for new physics as performed at hadronic colliders such as the Large Hadron Collider (LHC).

  12. Estimation of the diagnostic threshold accounting for decision costs and sampling uncertainty.

    PubMed

    Skaltsa, Konstantina; Jover, Lluís; Carrasco, Josep Lluís

    2010-10-01

    Medical diagnostic tests are used to classify subjects as non-diseased or diseased. The classification rule usually consists of classifying subjects using the values of a continuous marker that is dichotomised by means of a threshold. Here, the optimum threshold estimate is found by minimising a cost function that accounts for both decision costs and sampling uncertainty. The cost function is optimised either analytically in a normal distribution setting or empirically in a distribution-free setting when the underlying probability distributions of diseased and non-diseased subjects are unknown. Inference on the threshold estimates is based on approximate analytical standard errors and bootstrap-based approaches. The performance of the proposed methodology is assessed by means of a simulation study, and the sample size required for a given confidence interval precision and sample size ratio is also calculated. Finally, a case example based on previously published data concerning the diagnosis of Alzheimer's patients is provided in order to illustrate the procedure.
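
    The empirical variant described in this record can be sketched as a search over candidate thresholds for the one with the lowest expected decision cost. The sketch below covers decision costs only and does not model the sampling-uncertainty term of the published cost function. It assumes that diseased subjects tend to have higher marker values; the function names, the default prevalence and the marker values are hypothetical.

```python
def expected_cost(t, diseased, healthy, prevalence, cost_fn, cost_fp):
    """Expected decision cost of calling a subject 'diseased' when marker >= t."""
    # False-negative rate: diseased subjects falling below the threshold.
    fnr = sum(v < t for v in diseased) / len(diseased)
    # False-positive rate: non-diseased subjects at or above the threshold.
    fpr = sum(v >= t for v in healthy) / len(healthy)
    return cost_fn * prevalence * fnr + cost_fp * (1 - prevalence) * fpr

def best_threshold(diseased, healthy, prevalence=0.5, cost_fn=1.0, cost_fp=1.0):
    """Return the observed marker value that minimises the empirical cost."""
    candidates = sorted(set(diseased) | set(healthy))
    return min(
        candidates,
        key=lambda t: expected_cost(t, diseased, healthy, prevalence, cost_fn, cost_fp),
    )

# Made-up marker values for illustration only.
healthy = [1.0, 1.5, 2.0, 2.5, 3.0]
diseased = [2.8, 3.5, 4.0, 4.5, 5.0]

# Costly false negatives pull the threshold down; costly false positives push it up.
low_t = best_threshold(diseased, healthy, cost_fn=5.0, cost_fp=1.0)
high_t = best_threshold(diseased, healthy, cost_fn=1.0, cost_fp=5.0)
print(low_t, high_t)
```

    Raising the cost of a false negative moves the chosen threshold toward lower marker values, so more subjects are flagged as diseased, which is the trade-off the cost function is meant to encode.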

  13. Chemometric brand differentiation of commercial spices using direct analysis in real time mass spectrometry.

    PubMed

    Pavlovich, Matthew J; Dunn, Emily E; Hall, Adam B

    2016-05-15

    Commercial spices represent an emerging class of fuels for improvised explosives. Being able to classify such spices not only by type but also by brand would represent an important step in developing methods to analytically investigate these explosive compositions. Therefore, a combined ambient mass spectrometric/chemometric approach was developed to quickly and accurately classify commercial spices by brand. Direct analysis in real time mass spectrometry (DART-MS) was used to generate mass spectra for samples of black pepper, cayenne pepper, and turmeric, along with four different brands of cinnamon, all dissolved in methanol. Unsupervised learning techniques showed that the cinnamon samples clustered according to brand. Then, we used supervised machine learning algorithms to build chemometric models with a known training set and classified the brands of an unknown testing set of cinnamon samples. Ten independent runs of five-fold cross-validation showed that the training set error for the best-performing models (i.e., the linear discriminant and neural network models) was lower than 2%. The false-positive percentages for these models were 3% or lower, and the false-negative percentages were lower than 10%. In particular, the linear discriminant model perfectly classified the testing set with 0% error. Repeated iterations of training and testing gave similar results, demonstrating the reproducibility of these models. Chemometric models were able to classify the DART mass spectra of commercial cinnamon samples according to brand, with high specificity and low classification error. This method could easily be generalized to other classes of spices, and it could be applied to authenticating questioned commercial samples of spices or to examining evidence from improvised explosives. Copyright © 2016 John Wiley & Sons, Ltd.

  14. A sampling bias in identifying children in foster care using Medicaid data.

    PubMed

    Rubin, David M; Pati, Susmita; Luan, Xianqun; Alessandrini, Evaline A

    2005-01-01

    Prior research identified foster care children using Medicaid eligibility codes specific to foster care, but it is unknown whether these codes capture all foster care children. To describe the sampling bias in relying on Medicaid eligibility codes to identify foster care children. Using foster care administrative files linked to Medicaid data, we describe the proportion of children whose Medicaid eligibility was correctly encoded as foster child during a 1-year follow-up period following a new episode of foster care. Sampling bias is described by comparing claims in mental health, emergency department (ED), and other ambulatory settings among correctly and incorrectly classified foster care children. Twenty-eight percent of the 5683 sampled children were incorrectly classified in Medicaid eligibility files. In a multivariate logistic regression model, correct classification was associated with duration of foster care (>9 vs <2 months, odds ratio [OR] 7.67, 95% confidence interval [CI] 7.17-7.97), number of placements (>3 vs 1 placement, OR 4.20, 95% CI 3.14-5.64), and placement in a group home among adjudicated dependent children (OR 1.87, 95% CI 1.33-2.63). Compared with incorrectly classified children, correctly classified foster care children were 3 times more likely to use any services, 2 times more likely to visit the ED, 3 times more likely to make ambulatory visits, and 4 times more likely to use mental health care services (P < .001 for all comparisons). Identifying children in foster care using Medicaid eligibility files is prone to sampling bias that over-represents children in foster care who use more services.

  15. Analysis of select Dalbergia and trade timber using direct analysis in real time and time-of-flight mass spectrometry for CITES enforcement.

    PubMed

    Lancaster, Cady; Espinoza, Edgard

    2012-05-15

    International trade of several Dalbergia wood species is regulated by The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). In order to supplement morphological identification of these species, a rapid chemical method of analysis was developed. Using Direct Analysis in Real Time (DART) ionization coupled with Time-of-Flight (TOF) Mass Spectrometry (MS), selected Dalbergia and common trade species were analyzed. Each of the 13 wood species was classified using principal component analysis and linear discriminant analysis (LDA). These statistical data clusters served as reliable anchors for species identification of unknowns. Analysis of 20 or more samples from the 13 species studied in this research indicates that the DART-TOFMS results are reproducible. Statistical analysis of the most abundant ions gave good classifications that were useful for identifying unknown wood samples. DART-TOFMS and LDA analysis of 13 species of selected timber samples and the statistical classification allowed for the correct assignment of unknown wood samples. This method is rapid and can be useful when anatomical identification is difficult but needed in order to support CITES enforcement. Published 2012. This article is a US Government work and is in the public domain in the USA.

  16. Classification without labels: learning from mixed samples in high energy physics

    NASA Astrophysics Data System (ADS)

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    2017-10-01

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.
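
    The optimality claim in this record can be motivated with a short likelihood-ratio argument. The notation below is an assumption introduced for illustration: $p_S$ and $p_B$ are the densities of the two pure classes, $f_1 > f_2$ are the unknown fractions of the first class in the two mixed samples $M_1$ and $M_2$, and $L = p_S/p_B$ is the fully supervised likelihood ratio.

```latex
\begin{align}
  p_{M_i}(x) &= f_i\, p_S(x) + (1 - f_i)\, p_B(x), \qquad i = 1, 2, \\
  L_{M_1/M_2}(x) &= \frac{p_{M_1}(x)}{p_{M_2}(x)}
                  = \frac{f_1 L(x) + (1 - f_1)}{f_2 L(x) + (1 - f_2)}, \\
  \frac{\partial L_{M_1/M_2}}{\partial L}
                 &= \frac{f_1 - f_2}{\bigl(f_2 L + 1 - f_2\bigr)^{2}} > 0
                    \quad \text{for } f_1 > f_2 .
\end{align}
```

    Because the mixed-sample likelihood ratio is a strictly increasing function of $L$, thresholding one is equivalent to thresholding the other, so a classifier that is optimal for separating the two mixtures ranks events in the same order as the optimal fully supervised classifier, without the fractions $f_1$ and $f_2$ being known.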

  17. Classification without labels: learning from mixed samples in high energy physics

    DOE PAGES

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    2017-10-25

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.

  18. Classification without labels: learning from mixed samples in high energy physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.

  19. Classification of Malaysia aromatic rice using multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.

    2015-05-01

    Aromatic rice (Oryza sativa L.) is considered the best-quality premium rice. These varieties are preferred by consumers for criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice because of its special growth requirements, for instance a specific climate and soil. Presently, aromatic rice quality is identified using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, human sensory panels have significant drawbacks, such as lengthy training time, proneness to fatigue as the number of samples increases, and inconsistency. GC-MS analysis, on the other hand, requires detailed procedures and lengthy analysis and is quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the odour of the samples. The samples were taken from the rice varieties. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. Visual inspection of the PCA and LDA plots of the rice shows that the instrument was able to separate the samples into different clusters. The low misclassification errors of LDA and KNN support these findings, and we conclude that the e-nose was successfully applied to the classification of the aromatic rice varieties.
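
    The KNN classification with Leave-One-Out validation mentioned in this record can be sketched as follows. This is a generic illustration, not the authors' pipeline: the two-dimensional feature vectors stand in for e-nose sensor responses, and the variety labels "A" and "B", the value of k and the helper names are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_error(data, k=3):
    """Classify each sample using all the others and return the error rate."""
    errors = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out sample i
        if knn_predict(rest, features, k) != label:
            errors += 1
    return errors / len(data)

# Made-up sensor responses for two hypothetical rice varieties.
data = [
    ((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.9, 1.0), "A"), ((1.1, 1.2), "A"),
    ((3.0, 3.1), "B"), ((3.2, 2.9), "B"), ((2.9, 3.0), "B"), ((3.1, 3.2), "B"),
]
print(leave_one_out_error(data, k=3))
print(knn_predict(data, (3.0, 3.0), k=3))
```

    Leave-One-Out is a natural choice when few samples are available, since each sample is used for testing exactly once while all remaining samples are used for training.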

  20. Classification of Malaysia aromatic rice using multivariate statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md

    Aromatic rice (Oryza sativa L.) is considered the best-quality premium rice. These varieties are preferred by consumers for criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice because of its special growth requirements, for instance a specific climate and soil. Presently, aromatic rice quality is identified using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, human sensory panels have significant drawbacks, such as lengthy training time, proneness to fatigue as the number of samples increases, and inconsistency. GC-MS analysis, on the other hand, requires detailed procedures and lengthy analysis and is quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the odour of the samples. The samples were taken from the rice varieties. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. Visual inspection of the PCA and LDA plots of the rice shows that the instrument was able to separate the samples into different clusters. The low misclassification errors of LDA and KNN support these findings, and we conclude that the e-nose was successfully applied to the classification of the aromatic rice varieties.

  1. System and method for resolving gamma-ray spectra

    DOEpatents

    Gentile, Charles A.; Perry, Jason; Langish, Stephen W.; Silber, Kenneth; Davis, William M.; Mastrovito, Dana

    2010-05-04

    A system for identifying radionuclide emissions is described. The system includes at least one processor for processing output signals from a radionuclide detecting device, at least one training algorithm run by the at least one processor for analyzing data derived from at least one set of known sample data from the output signals, at least one classification algorithm derived from the training algorithm for classifying unknown sample data, wherein the at least one training algorithm analyzes the at least one sample data set to derive at least one rule used by said classification algorithm for identifying at least one radionuclide emission detected by the detecting device.

  2. Online clustering algorithms for radar emitter classification.

    PubMed

    Liu, Jun; Lee, Jim P Y; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max

    2005-08-01

    Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.

  3. Using the concept of pseudo amino acid composition to predict resistance gene against Xanthomonas oryzae pv. oryzae in rice: an approach from chaos games representation.

    PubMed

    Jingbo, Xia; Silan, Zhang; Feng, Shi; Huijuan, Xiong; Xuehai, Hu; Xiaohui, Niu; Zhi, Li

    2011-09-07

    To evaluate the possibility that an unknown protein corresponds to a resistance gene against Xanthomonas oryzae pv. oryzae, a different mode of pseudo amino acid composition (PseAAC) is proposed to formulate the protein samples by integrating the amino acid composition with the Chaos games representation (CGR) method. Numerical comparisons of triangle, quadrangle and 12-vertex polygon CGR are carried out to evaluate the efficiency of using these fractal figures in classifiers. The numerical results show that, among the three polygon methods, the triangle method gives a good fractal visualization and performs best in classifier construction. By using triangle + 12-vertex polygon CGR as the mathematical feature, the classifier achieves 98.13% in the Jackknife test and an MCC of 0.8462. Copyright © 2011 Elsevier Ltd. All rights reserved.

  4. Classifier performance prediction for computer-aided diagnosis using a limited dataset.

    PubMed

    Sahiner, Berkman; Chan, Heang-Ping; Hadjiiski, Lubomir

    2008-04-01

    In a practical classifier design problem, the true population is generally unknown and the available sample is finite-sized. A common approach is to use a resampling technique to estimate the performance of the classifier that will be trained with the available sample. We conducted a Monte Carlo simulation study to compare the ability of the different resampling techniques in training the classifier and predicting its performance under the constraint of a finite-sized sample. The true population for the two classes was assumed to be multivariate normal distributions with known covariance matrices. Finite sets of sample vectors were drawn from the population. The true performance of the classifier is defined as the area under the receiver operating characteristic curve (AUC) when the classifier designed with the specific sample is applied to the true population. We investigated methods based on the Fukunaga-Hayes and the leave-one-out techniques, as well as three different types of bootstrap methods, namely, the ordinary, 0.632, and 0.632+ bootstrap. The Fisher's linear discriminant analysis was used as the classifier. The dimensionality of the feature space was varied from 3 to 15. The sample size n2 from the positive class was varied between 25 and 60, while the number of cases from the negative class was either equal to n2 or 3n2. Each experiment was performed with an independent dataset randomly drawn from the true population. Using a total of 1000 experiments for each simulation condition, we compared the bias, the variance, and the root-mean-squared error (RMSE) of the AUC estimated using the different resampling techniques relative to the true AUC (obtained from training on a finite dataset and testing on the population). 
Our results indicated that, under the study conditions, there can be a large difference in the RMSE obtained using different resampling methods, especially when the feature space dimensionality is relatively large and the sample size is small. Under such conditions, the 0.632 and 0.632+ bootstrap methods have the lowest RMSE, indicating that the difference between the estimated and the true performances obtained using the 0.632 and 0.632+ bootstrap will be statistically smaller than those obtained using the other three resampling methods. Of the three bootstrap methods, the 0.632+ bootstrap provides the lowest bias. Although this investigation is performed under some specific conditions, it reveals important trends for the problem of classifier performance prediction under the constraint of a limited dataset.
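
    As a rough illustration of the 0.632 bootstrap mentioned above, the sketch below blends the optimistic resubstitution score with the pessimistic out-of-bag score using weights 0.368 and 0.632. It is a simplified sketch under stated assumptions: it scores accuracy rather than AUC, uses a toy one-dimensional threshold classifier instead of Fisher's linear discriminant, and averages the out-of-bag score per replicate. The names `fit_threshold`, `accuracy`, and `bootstrap_632` are hypothetical.

```python
import random

def fit_threshold(xs, ys):
    """Toy 1-D classifier (an assumption): cut halfway between the class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    cut = (mean0 + mean1) / 2.0
    sign = 1 if mean1 >= mean0 else -1
    return lambda x: 1 if sign * (x - cut) > 0 else 0

def accuracy(model, xs, ys):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)

def bootstrap_632(xs, ys, n_boot=200, seed=0):
    """0.632 bootstrap estimate of accuracy for the toy classifier."""
    rng = random.Random(seed)
    n = len(xs)
    # Resubstitution: train and test on the same full sample (optimistic).
    resub = accuracy(fit_threshold(xs, ys), xs, ys)
    oob_scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # draw n indices with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        # Skip replicates that cannot be trained or tested.
        if not oob or 0 not in boot_y or 1 not in boot_y:
            continue
        model = fit_threshold(boot_x, boot_y)
        oob_scores.append(accuracy(model, [xs[i] for i in oob], [ys[i] for i in oob]))
    if not oob_scores:
        raise ValueError("no usable bootstrap replicates")
    oob_mean = sum(oob_scores) / len(oob_scores)     # pessimistic out-of-bag score
    return 0.368 * resub + 0.632 * oob_mean
```

    The 0.632+ variant adjusts these weights according to an estimated overfitting rate; that refinement is omitted here.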

  5. Layered classification techniques for remote sensing applications

    NASA Technical Reports Server (NTRS)

    Swain, P. H.; Wu, C. L.; Landgrebe, D. A.; Hauska, H.

    1975-01-01

    The single-stage method of pattern classification utilizes all available features in a single test which assigns the unknown to a category according to a specific decision strategy (such as the maximum likelihood strategy). The layered classifier classifies the unknown through a sequence of tests, each of which may be dependent on the outcome of previous tests. Although the layered classifier was originally investigated as a means of improving classification accuracy and efficiency, it was found that in the context of remote sensing data analysis, other advantages also accrue due to many of the special characteristics of both the data and the applications pursued. The layered classifier method and several of the diverse applications of this approach are discussed.
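
    The sequence-of-tests idea can be sketched as a walk through a small decision tree in which each test chooses the next test. This is a minimal illustration only: the feature names `nir` and `red`, the thresholds, and the class labels are hypothetical and are not taken from the report.

```python
def layered_classify(sample, node):
    """Apply tests in sequence; each outcome selects the next test or a class."""
    while isinstance(node, dict):
        outcome = node["test"](sample)
        node = node["branches"][outcome]
    return node  # a leaf holds the class label

# Hypothetical two-layer tree for a pixel described by two reflectance values.
EXAMPLE_TREE = {
    "test": lambda s: s["nir"] < 0.1,            # layer 1: very low near-infrared
    "branches": {
        True: "water",
        False: {
            # layer 2 is reached only when layer 1 did not decide "water"
            "test": lambda s: (s["nir"] - s["red"]) / (s["nir"] + s["red"]) > 0.4,
            "branches": {True: "vegetation", False: "bare soil"},
        },
    },
}
```

    A single-stage classifier would evaluate every feature for every sample, whereas here the second test runs only for samples that the first test leaves undecided.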

  6. Joint deconvolution and classification with applications to passive acoustic underwater multipath.

    PubMed

    Anderson, Hyrum S; Gupta, Maya R

    2008-11-01

    This paper addresses the problem of classifying signals that have been corrupted by noise and unknown linear time-invariant (LTI) filtering such as multipath, given labeled uncorrupted training signals. A maximum a posteriori approach to the deconvolution and classification is considered, which produces estimates of the desired signal, the unknown channel, and the class label. For cases in which only a class label is needed, the classification accuracy can be improved by not committing to an estimate of the channel or signal. A variant of the quadratic discriminant analysis (QDA) classifier is proposed that probabilistically accounts for the unknown LTI filtering, and which avoids deconvolution. The proposed QDA classifier can work either directly on the signal or on features whose transformation by LTI filtering can be analyzed; as an example a classifier for subband-power features is derived. Results on simulated data and real Bowhead whale vocalizations show that jointly considering deconvolution with classification can dramatically improve classification performance over traditional methods over a range of signal-to-noise ratios.

  7. Bayesian nonlinear structural FE model and seismic input identification for damage assessment of civil structures

    NASA Astrophysics Data System (ADS)

    Astroza, Rodrigo; Ebrahimian, Hamed; Li, Yong; Conte, Joel P.

    2017-09-01

    A methodology is proposed to update mechanics-based nonlinear finite element (FE) models of civil structures subjected to unknown input excitation. The approach allows joint estimation of unknown time-invariant model parameters of a nonlinear FE model of the structure and the unknown time histories of input excitations using spatially-sparse output response measurements recorded during an earthquake event. The unscented Kalman filter, which circumvents the computation of FE response sensitivities with respect to the unknown model parameters and unknown input excitations by using a deterministic sampling approach, is employed as the estimation tool. The use of measurement data obtained from arrays of heterogeneous sensors, including accelerometers, displacement sensors, and strain gauges, is investigated. Based on the estimated FE model parameters and input excitations, the updated nonlinear FE model can be interrogated to detect, localize, classify, and assess damage in the structure. Numerically simulated response data of a three-dimensional 4-story 2-by-1 bay steel frame structure with six unknown model parameters subjected to unknown bi-directional horizontal seismic excitation, and a three-dimensional 5-story 2-by-1 bay reinforced concrete frame structure with nine unknown model parameters subjected to unknown bi-directional horizontal seismic excitation, are used to illustrate and validate the proposed methodology. The results of the validation studies show the excellent performance and robustness of the proposed algorithm in jointly estimating unknown FE model parameters and unknown input excitations.

  8. Comparison of seven protocols to identify fecal contamination sources using Escherichia coli

    USGS Publications Warehouse

    Stoeckel, D.M.; Mathes, M.V.; Hyer, K.E.; Hagedorn, C.; Kator, H.; Lukasik, J.; O'Brien, T. L.; Fenger, T.W.; Samadpour, M.; Strickler, K.M.; Wiggins, B.A.

    2004-01-01

    Microbial source tracking (MST) uses various approaches to classify fecal-indicator microorganisms to source hosts. Reproducibility, accuracy, and robustness of seven phenotypic and genotypic MST protocols were evaluated by use of Escherichia coli from an eight-host library of known-source isolates and a separate, blinded challenge library. In reproducibility tests, measuring each protocol's ability to reclassify blinded replicates, only one (pulsed-field gel electrophoresis; PFGE) correctly classified all test replicates to host species; three protocols classified 48-62% correctly, and the remaining three classified fewer than 25% correctly. In accuracy tests, measuring each protocol's ability to correctly classify new isolates, ribotyping with EcoRI and PvuII approached 100% correct classification but only 6% of isolates were classified; four of the other six protocols (antibiotic resistance analysis, PFGE, and two repetitive-element PCR protocols) achieved better than random accuracy rates when 30-100% of challenge isolates were classified. In robustness tests, measuring each protocol's ability to recognize isolates from nonlibrary hosts, three protocols correctly classified 33-100% of isolates as "unknown origin," whereas four protocols classified all isolates to a source category. A relevance test, summarizing interpretations for a hypothetical water sample containing 30 challenge isolates, indicated that false-positive classifications would hinder interpretations for most protocols. Study results indicate that more representation in known-source libraries and better classification accuracy would be needed before field application. Thorough reliability assessment of classification results is crucial before and during application of MST protocols.

  9. Moments and Root-Mean-Square Error of the Bayesian MMSE Estimator of Classification Error in the Gaussian Model.

    PubMed

    Zollanvari, Amin; Dougherty, Edward R

    2014-06-01

    The most important aspect of any classifier is its error rate, because this quantifies its predictive capacity. Thus, the accuracy of error estimation is critical. Error estimation is problematic in small-sample classifier design because the error must be estimated using the same data from which the classifier has been designed. Use of prior knowledge, in the form of a prior distribution on an uncertainty class of feature-label distributions to which the true, but unknown, feature-distribution belongs, can facilitate accurate error estimation (in the mean-square sense) in circumstances where accurate completely model-free error estimation is impossible. This paper provides analytic asymptotically exact finite-sample approximations for various performance metrics of the resulting Bayesian Minimum Mean-Square-Error (MMSE) error estimator in the case of linear discriminant analysis (LDA) in the multivariate Gaussian model. These performance metrics include the first, second, and cross moments of the Bayesian MMSE error estimator with the true error of LDA, and therefore, the Root-Mean-Square (RMS) error of the estimator. We lay down the theoretical groundwork for Kolmogorov double-asymptotics in a Bayesian setting, which enables us to derive asymptotic expressions of the desired performance metrics. From these we produce analytic finite-sample approximations and demonstrate their accuracy via numerical examples. Various examples illustrate the behavior of these approximations and their use in determining the necessary sample size to achieve a desired RMS. The Supplementary Material contains derivations for some equations and added figures.

  10. A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt.

    PubMed

    Alfonse, Lauren E; Garrett, Amanda D; Lun, Desmond S; Duffy, Ken R; Grgicak, Catherine M

    2018-01-01

    DNA-based human identity testing is conducted by comparison of PCR-amplified polymorphic Short Tandem Repeat (STR) motifs from a known source with the STR profiles obtained from uncertain sources. Samples such as those found at crime scenes often result in signal that is a composite of incomplete STR profiles from an unknown number of unknown contributors, making interpretation an arduous task. To facilitate advancement in STR interpretation challenges we provide over 25,000 multiplex STR profiles produced from one to five known individuals at target levels ranging from one to 160 copies of DNA. The data, generated under 144 laboratory conditions, are classified by total copy number and contributor proportions. For the 70% of samples that were synthetically compromised, we report the level of DNA damage using quantitative and end-point PCR. In addition, we characterize the complexity of the signal by exploring the number of detected alleles in each profile. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. A statistical approach to combining multisource information in one-class classifiers

    DOE PAGES

    Simonson, Katherine M.; Derek West, R.; Hansen, Ross L.; ...

    2017-06-08

    A new method is introduced in this paper for combining information from multiple sources to support one-class classification. The contributing sources may represent measurements taken by different sensors of the same physical entity, repeated measurements by a single sensor, or numerous features computed from a single measured image or signal. The approach utilizes the theory of statistical hypothesis testing, and applies Fisher's technique for combining p-values, modified to handle nonindependent sources. Classifier outputs take the form of fused p-values, which may be used to gauge the consistency of unknown entities with one or more class hypotheses. The approach enables rigorous assessment of classification uncertainties, and allows for traceability of classifier decisions back to the constituent sources, both of which are important for high-consequence decision support. Application of the technique is illustrated in two challenge problems, one for skin segmentation and the other for terrain labeling. Finally, the method is seen to be particularly effective for relatively small training samples.
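
    For orientation, the classical form of Fisher's technique can be sketched as follows. This sketch assumes independent sources, so it does not include the paper's modification for nonindependent sources; the function name `fisher_combined_p` is hypothetical. The statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom the tail probability has a closed form.

```python
import math

def fisher_combined_p(p_values):
    """Fuse k independent p-values into one p-value with Fisher's method."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # half_stat is X / 2, where X = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

    A small fused p-value indicates that the sources, taken together, are inconsistent with the class hypothesis. With correlated sources this independent-case formula tends to overstate the evidence, which is the issue the paper's modification is meant to address.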

  12. A statistical approach to combining multisource information in one-class classifiers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simonson, Katherine M.; Derek West, R.; Hansen, Ross L.

    A new method is introduced in this paper for combining information from multiple sources to support one-class classification. The contributing sources may represent measurements taken by different sensors of the same physical entity, repeated measurements by a single sensor, or numerous features computed from a single measured image or signal. The approach utilizes the theory of statistical hypothesis testing, and applies Fisher's technique for combining p-values, modified to handle nonindependent sources. Classifier outputs take the form of fused p-values, which may be used to gauge the consistency of unknown entities with one or more class hypotheses. The approach enables rigorous assessment of classification uncertainties, and allows for traceability of classifier decisions back to the constituent sources, both of which are important for high-consequence decision support. Application of the technique is illustrated in two challenge problems, one for skin segmentation and the other for terrain labeling. Finally, the method is seen to be particularly effective for relatively small training samples.

  13. Automated Classification of ROSAT Sources Using Heterogeneous Multiwavelength Source Catalogs

    NASA Technical Reports Server (NTRS)

    McGlynn, Thomas; Suchkov, A. A.; Winter, E. L.; Hanisch, R. J.; White, R. L.; Ochsenbein, F.; Derriere, S.; Voges, W.; Corcoran, M. F.

    2004-01-01

    We describe an on-line system for automated classification of X-ray sources, ClassX, and present preliminary results of classification of the three major catalogs of ROSAT sources, RASS BSC, RASS FSC, and WGACAT, into six class categories: stars, white dwarfs, X-ray binaries, galaxies, AGNs, and clusters of galaxies. ClassX is based on a machine learning technology. It represents a system of classifiers, each classifier consisting of a considerable number of oblique decision trees. These trees are built as the classifier is 'trained' to recognize various classes of objects using a training sample of sources of known object types. Each source is characterized by a preselected set of parameters, or attributes; the same set is then used as the classifier conducts classification of sources of unknown identity. The ClassX pipeline features an automatic search for X-ray source counterparts among heterogeneous data sets in on-line data archives using Virtual Observatory protocols; it retrieves from those archives all the attributes required by the selected classifier and inputs them to the classifier. The user input to ClassX is typically a file with target coordinates, optionally complemented with target IDs. The output contains the class name, attributes, and class probabilities for all classified targets. We discuss ways to characterize and assess the classifier quality and performance and present the respective validation procedures. Based on both internal and external validation, we conclude that the ClassX classifiers yield reasonable and reliable classifications for ROSAT sources and have the potential to broaden class representation significantly for rare object types.

  14. The contribution of cluster and discriminant analysis to the classification of complex aquifer systems.

    PubMed

    Panagopoulos, G P; Angelopoulou, D; Tzirtzilakis, E E; Giannoulopoulos, P

    2016-10-01

    This paper presents an innovative method for discriminating groundwater samples into common groups representing the hydrogeological units from which they were pumped. The method proved very efficient even in areas with complex hydrogeological regimes. It requires chemical analyses of water samples only for major ions, meaning that it is applicable in most cases worldwide. Another benefit of the method is that it gives further insight into the aquifer hydrogeochemistry, as it identifies the ions that are responsible for the discrimination of the groups. The procedure begins with cluster analysis of the dataset in order to classify the samples into the corresponding hydrogeological units. The feasibility of the method is demonstrated by the fact that the samples of volcanic origin were separated into two different clusters, namely the lava units and the pyroclastic-ignimbritic aquifer. The second step is discriminant analysis of the data, which provides the functions that distinguish the groups from each other and the most significant variables that define the hydrochemical composition of the aquifer. The whole procedure was highly successful, as 94.7% of the samples were classified to the correct aquifer system. Finally, the resulting functions can be safely used to categorize samples of either unknown or doubtful origin, thus improving the quality and the size of existing hydrochemical databases.

  15. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since tuning the parameters of a rejection model is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. 
The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.
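
    The abstract does not list the decision rules themselves, so the sketch below shows only one generic way to add a rejection option to a multi-class classifier: reject when the top class probability is too low or too close to the runner-up. It is written in Python rather than R, and the function name and both thresholds are hypothetical; in the paper's setting such parameters would be tuned automatically rather than fixed by hand.

```python
def classify_with_rejection(class_probs, min_confidence=0.7, min_margin=0.2):
    """Return the most probable class, or None to reject the sample.

    class_probs maps class label -> probability from any multi-class classifier.
    The two thresholds are illustrative assumptions, not values from the paper.
    """
    ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    # Reject when the winner is weak or barely ahead of the runner-up.
    if best_p < min_confidence or (best_p - second_p) < min_margin:
        return None
    return best_label
```

    A rejected sample (`None`) can then be flagged for manual review as a possible member of a class that was absent from the training set.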

  16. Malignant pleural mesothelioma and mesothelial hyperplasia: A new molecular tool for the differential diagnosis.

    PubMed

    Bruno, Rossella; Alì, Greta; Giannini, Riccardo; Proietti, Agnese; Lucchi, Marco; Chella, Antonio; Melfi, Franca; Mussi, Alfredo; Fontanini, Gabriella

    2017-01-10

    Malignant pleural mesothelioma (MPM) is a rare asbestos related cancer, aggressive and unresponsive to therapies. Histological examination of pleural lesions is the gold standard of MPM diagnosis, although it is sometimes hard to discriminate the epithelioid type of MPM from benign mesothelial hyperplasia (MH). This work aims to define a new molecular tool for the differential diagnosis of MPM, using the expression profile of 117 genes deregulated in this tumour. The gene expression analysis was performed by nanoString System on tumour tissues from 36 epithelioid MPM and 17 MH patients, and on 14 mesothelial pleural samples analysed in a blind way. Data analysis included raw nanoString data normalization, unsupervised cluster analysis by Pearson correlation, non-parametric Mann-Whitney U-test and molecular classification by the Uncorrelated Shrunken Centroid (USC) Algorithm. The Mann-Whitney U-test found 35 genes upregulated and 31 downregulated in MPM. The unsupervised cluster analysis revealed two clusters, one composed only of MPM and one only of MH samples, thus revealing class-specific gene profiles. The Uncorrelated Shrunken Centroid algorithm identified two classifiers, one including 22 genes and the other 40 genes, able to properly classify all the samples as benign or malignant using gene expression data; both classifiers were also able to correctly determine, in a blind analysis, the diagnostic categories of all the 14 unknown samples. In conclusion we delineated a diagnostic tool combining molecular data (gene expression) and computational analysis (USC algorithm), which can be applied in the clinical practice for the differential diagnosis of MPM.

  17. Identification of beta-lactam antibiotics in tissue samples containing unknown microbial inhibitors.

    PubMed

    Moats, W A; Romanowski, R D; Medina, M B

    1998-01-01

    Antibiotic residues in animal tissues can be detected by various screening tests based on microbial inhibition. In the 7-plate assay used by the U.S. Department of Agriculture's Food Safety and Inspection Service (FSIS), penicillinase is incorporated into all but one plate to distinguish beta-lactam antibiotics from other types. However, beta-lactams such as cloxacillin and the cephalosporins are resistant to degradation by penicillinase. They may not be identified as beta-lactams by this procedure, and thus, they may be identified as unidentified microbial inhibitors (UMIs). However, these penicillinase-resistant compounds can be degraded by other beta-lactamases. The present study describes an improved screening protocol to identify beta-lactam antibiotics classified as UMIs. A multiresidue liquid chromatographic procedure based on a method for determining beta-lactams in milk was also used to identify and quantitate residues. The 2 methods were tested with 24 tissue FSIS samples classified as containing UMIs. Of these, 3 contained penicillin G, including one at a violative level, and 5 contained a metabolite of ceftiofur. The others were negative for beta-lactam antibiotics.

  18. Positive-unlabeled learning for disease gene identification

    PubMed Central

    Yang, Peng; Li, Xiao-Li; Mei, Jian-Ping; Kwoh, Chee-Keong; Ng, See-Kiong

    2012-01-01

    Background: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. Result: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. The weighted support vector machines are then used to build a multi-level classifier based on the four training sets and positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. Conclusion: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as unlabeled set U instead of negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. 
Availability and implementation: The executable program and data are available at http://www1.i2r.a-star.edu.sg/∼xlli/PUDI/PUDI.html. Contact: xlli@i2r.a-star.edu.sg or yang0293@e.ntu.edu.sg Supplementary information: Supplementary Data are available at Bioinformatics online. PMID:22923290

  19. Ranking and combining multiple predictors without labeled data

    PubMed Central

    Parisi, Fabio; Strino, Francesco; Nadler, Boaz; Kluger, Yuval

    2014-01-01

    In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting, where each classifier’s accuracy can be assessed using available labeled data, and raises two questions: Given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (i) reliably rank them and (ii) construct a metaclassifier more accurate than most classifiers in the ensemble? Here we present a spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, because its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), an unsupervised ensemble classifier whose weights are equal to these eigenvector entries. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth. PMID:24474744
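
    The spectral idea above can be sketched roughly as follows. This is a simplified illustration, not the authors' estimator: it takes the leading eigenvector of the full sample covariance matrix (diagonal included) by power iteration, whereas the abstract refers to the rank-one structure of the off-diagonal entries. It assumes predictions coded as -1/+1 and that most classifiers are better than random; the function name is hypothetical.

```python
import math

def spectral_meta_learner(predictions, n_iter=200):
    """Weight classifiers by the leading eigenvector of their covariance matrix.

    predictions: list of m lists, each holding n labels in {-1, +1}.
    Returns (weights, ensemble_labels). No ground-truth labels are used.
    """
    m = len(predictions)
    n = len(predictions[0])
    means = [sum(row) / n for row in predictions]
    centered = [[v - mu for v in row] for row, mu in zip(predictions, means)]
    # Sample covariance between every pair of classifiers.
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(m)] for i in range(m)]
    # Power iteration for the leading eigenvector.
    vec = [1.0] * m
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Resolve the sign ambiguity, assuming most classifiers beat chance.
    if sum(vec) < 0:
        vec = [-x for x in vec]
    # Weighted vote over the m classifiers for each of the n samples.
    ensemble = [1 if sum(w * row[k] for w, row in zip(vec, predictions)) >= 0 else -1
                for k in range(n)]
    return vec, ensemble
```

    Larger weights suggest more reliable classifiers, so sorting the weights gives an unsupervised ranking, and the weighted vote plays the role of the metaclassifier.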

  20. Motor vehicle fuel analyzer

    DOEpatents

    Hoffheins, B.S.; Lauf, R.J.

    1997-08-05

    A gas detecting system is described for classifying the type of liquid fuel in a container or tank. The system includes a plurality of semiconductor gas sensors, each of which differs from the others in its response to various organic vapors. The system includes a means of processing the responses of the plurality of sensors such that the response to any particular organic substance or mixture is sufficiently distinctive to constitute a recognizable "signature". The signatures of known substances are collected and divided into two classes based on some other known characteristic of the substances. A pattern recognition system classifies the signature of an unknown substance with reference to the two user-defined classes, thereby classifying the unknown substance with regard to the characteristic of interest, such as its suitability for a particular use.

  1. Motor vehicle fuel analyzer

    DOEpatents

    Hoffheins, Barbara S.; Lauf, Robert J.

    1997-01-01

    A gas detecting system is described for classifying the type of liquid fuel in a container or tank. The system includes a plurality of semiconductor gas sensors, each of which differs from the others in its response to various organic vapors. The system includes a means of processing the responses of the plurality of sensors such that the response to any particular organic substance or mixture is sufficiently distinctive to constitute a recognizable "signature". The signatures of known substances are collected and divided into two classes based on some other known characteristic of the substances. A pattern recognition system classifies the signature of an unknown substance with reference to the two user-defined classes, thereby classifying the unknown substance with regard to the characteristic of interest, such as its suitability for a particular use.

  2. The Cellient System for Paraffin Histology Can Be Combined with HPV Testing and Morphotyping the Vaginal Microbiome Thanks to BoonFixing

    PubMed Central

    Boon, Mathilde E.

    2013-01-01

    The Cellient Automated Cell Block System (Hologic) can be used to process cervical scrapes to paraffin sections. For the first study on this subject, cervical scrapes were fixed in the formalin-free fixative BoonFix. This pilot study was limited to cases classified as atypical squamous lesion of unknown significance (ASCUS) and high-grade squamous lesion (HSIL) as diagnosed in the ThinPrep slide. The Cellient paraffin sections were classified into negative, atypical, CIN 1, CIN 2, and CIN 3. Multiple HPV genotypes were encountered in 79% of the scrapes. This study showed that the Cellient system for paraffin sections can be combined with HPV testing thanks to the formalin-free BoonFix. In two additional studies it was shown that such samples can also be used for morphotyping the vaginal microbiome and preparing cytologic ThinPrep slides. PMID:23577033

  3. The Cellient System for Paraffin Histology Can Be Combined with HPV Testing and Morphotyping the Vaginal Microbiome Thanks to BoonFixing.

    PubMed

    Boon, Mathilde E

    2013-01-01

    The Cellient Automated Cell Block System (Hologic) can be used to process cervical scrapes to paraffin sections. For the first study on this subject, cervical scrapes were fixed in the formalin-free fixative BoonFix. This pilot study was limited to cases classified as atypical squamous lesion of unknown significance (ASCUS) and high-grade squamous lesion (HSIL) as diagnosed in the ThinPrep slide. The Cellient paraffin sections were classified into negative, atypical, CIN 1, CIN 2, and CIN 3. Multiple HPV genotypes were encountered in 79% of the scrapes. This study showed that the Cellient system for paraffin sections can be combined with HPV testing thanks to the formalin-free BoonFix. In two additional studies it was shown that such samples can also be used for morphotyping the vaginal microbiome and preparing cytologic ThinPrep slides.

  4. Classification and authentication of unknown water samples using machine learning algorithms.

    PubMed

    Kundu, Palash K; Panchariya, P C; Kundu, Madhusree

    2011-07-01

    This paper proposes a machine-learning approach to the classification and authentication of real-life water samples. The proposed techniques use experimental measurements from a pulse voltammetry method based on an electronic tongue (E-tongue) instrumentation system with silver and platinum electrodes. An E-tongue includes arrays of solid-state ion sensors, transducers (possibly of different types), data collectors, and data analysis tools, all oriented to the classification of liquid samples and the authentication of unknown liquid samples. The time series signal and the corresponding raw data represent the measurement from a multi-sensor system. The E-tongue system, implemented in a laboratory environment for six different ISI (Bureau of Indian Standards) certified water samples (Aquafina, Bisleri, Kingfisher, Oasis, Dolphin, and McDowell), was the data source for developing two types of machine learning algorithms: classification and regression. A water data set consisting of six sample classes with 4402 features was considered. A PCA (principal component analysis) based classification and authentication tool was developed in this study as the machine learning component of the E-tongue system. A partial least squares (PLS) based classifier, dedicated to authenticating a specific category of water sample, was also proposed and became an integral part of the E-tongue instrumentation system. The developed PCA and PLS based E-tongue system achieved encouraging overall authentication accuracy for the aforesaid categories of water samples. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.

  5. Automated analysis of food-borne pathogens using a novel microbial cell culture, sensing and classification system.

    PubMed

    Xiang, Kun; Li, Yinglei; Ford, William; Land, Walker; Schaffer, J David; Congdon, Robert; Zhang, Jing; Sadik, Omowunmi

    2016-02-21

    We hereby report the design and implementation of an Autonomous Microbial Cell Culture and Classification (AMC(3)) system for rapid detection of food pathogens. Traditional food testing methods require multistep procedures and long incubation period, and are thus prone to human error. AMC(3) introduces a "one click approach" to the detection and classification of pathogenic bacteria. Once the cultured materials are prepared, all operations are automatic. AMC(3) is an integrated sensor array platform in a microbial fuel cell system composed of a multi-potentiostat, an automated data collection system (Python program, Yocto Maxi-coupler electromechanical relay module) and a powerful classification program. The classification scheme consists of Probabilistic Neural Network (PNN), Support Vector Machines (SVM) and General Regression Neural Network (GRNN) oracle-based system. Differential Pulse Voltammetry (DPV) is performed on standard samples or unknown samples. Then, using preset feature extractions and quality control, accepted data are analyzed by the intelligent classification system. In a typical use, thirty-two extracted features were analyzed to correctly classify the following pathogens: Escherichia coli ATCC#25922, Escherichia coli ATCC#11775, and Staphylococcus epidermidis ATCC#12228. 85.4% accuracy range was recorded for unknown samples, and within a shorter time period than the industry standard of 24 hours.

  6. Simultaneous beta and gamma spectroscopy

    DOEpatents

    Farsoni, Abdollah T.; Hamby, David M.

    2010-03-23

    A phoswich radiation detector for simultaneous spectroscopy of beta rays and gamma rays includes three scintillators with different decay time characteristics. Two of the three scintillators are used for beta detection and the third scintillator is used for gamma detection. A pulse induced by an interaction of radiation with the detector is digitally analyzed to classify the type of event as beta, gamma, or unknown. A pulse is classified as a beta event if the pulse originated from just the first scintillator alone or from just the first and the second scintillator. A pulse from just the third scintillator is recorded as gamma event. Other pulses are rejected as unknown events.
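
    The event-classification rule described in this record can be sketched as a small lookup on which scintillators contributed to a pulse. This is only an illustration of the stated rule, not the patented pulse-shape analysis: the function name `classify_pulse`, the `Event` enum, and the convention of numbering the scintillators 1 to 3 are assumptions introduced here, and the step that decides which scintillators fired (from decay-time characteristics) is not modeled.

```python
from enum import Enum


class Event(Enum):
    BETA = "beta"
    GAMMA = "gamma"
    UNKNOWN = "unknown"


def classify_pulse(fired):
    """Classify a pulse from the set of scintillators that contributed to it.

    `fired` is an iterable of scintillator indices (1, 2, 3); the numbering
    is a hypothetical convention chosen for this sketch.
    """
    layers = set(fired)
    # Beta: first scintillator alone, or first and second together.
    if layers == {1} or layers == {1, 2}:
        return Event.BETA
    # Gamma: third scintillator alone.
    if layers == {3}:
        return Event.GAMMA
    # Every other combination is rejected as unknown.
    return Event.UNKNOWN
```

    Under these assumptions, `classify_pulse({1, 2})` yields a beta event, `classify_pulse({3})` a gamma event, and a combination such as `{2, 3}` is rejected as unknown.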

  7. An expert system shell for inferring vegetation characteristics

    NASA Technical Reports Server (NTRS)

    Harrison, P. Ann; Harrison, Patrick R.

    1992-01-01

    The NASA VEGetation Workbench (VEG) is a knowledge based system that infers vegetation characteristics from reflectance data. The report describes the extensions that have been made to the first generation version of VEG. An interface to a file of unknown cover type data has been constructed. An interface that allows the results of VEG to be written to a file has been implemented. A learning system that learns class descriptions from a data base of historical cover type data and then uses the learned class descriptions to classify an unknown sample has been built. This system has an interface that integrates it into the rest of VEG. The VEG subgoal PROPORTION.GROUND.COVER has been completed and a number of additional techniques that infer the proportion ground cover of a sample have been implemented.

  8. Classification of jet fuels by fuzzy rule-building expert systems applied to three-way data by fast gas chromatography--fast scanning quadrupole ion trap mass spectrometry.

    PubMed

    Sun, Xiaobo; Zimmermann, Carolyn M; Jackson, Glen P; Bunker, Christopher E; Harrington, Peter B

    2011-01-30

    A fast method that can be used to classify unknown jet fuel types or detect possible property changes in jet fuel physical properties is of paramount interest to national defense and the airline industries. While fast gas chromatography (GC) has been used with conventional mass spectrometry (MS) to study jet fuels, fast GC was combined with fast scanning MS and used to classify jet fuels into lot numbers or origin for the first time by using fuzzy rule-building expert system (FuRES) classifiers. In the process of building classifiers, the data were pretreated with and without wavelet transformation and evaluated with respect to performance. Principal component transformation was used to compress the two-way data images prior to classification. Jet fuel samples were successfully classified with 99.8 ± 0.5% accuracy for both with and without wavelet compression. Ten bootstrapped Latin partitions were used to validate the generalized prediction accuracy. Optimized partial least squares (o-PLS) regression results were used as positively biased references for comparing the FuRES prediction results. The prediction results for the jet fuel samples obtained with these two methods were compared statistically. The projected difference resolution (PDR) method was also used to evaluate the fast GC and fast MS data. Two batches of aliquots of ten new samples were prepared and run independently 4 days apart to evaluate the robustness of the method. The only change in classification parameters was the use of polynomial retention time alignment to correct for drift that occurred during the 4-day span of the two collections. FuRES achieved perfect classifications for four models of uncompressed three-way data. This fast GC/fast MS method furnishes characteristics of high speed, accuracy, and robustness. This mode of measurement may be useful as a monitoring tool to track changes in the chemical composition of fuels that may also lead to property changes. 
Copyright © 2010 Elsevier B.V. All rights reserved.

  9. [Using neural networks based template matching method to obtain redshifts of normal galaxies].

    PubMed

    Xu, Xin; Luo, A-li; Wu, Fu-chao; Zhao, Yong-heng

    2005-06-01

    Galaxies can be divided into two classes: normal galaxy (NG) and active galaxy (AG). In order to determine NG redshifts, an automatic effective method is proposed in this paper, which consists of the following three main steps: (1) From the template of normal galaxy, the two sets of samples are simulated, one with the redshift of 0.0-0.3, the other of 0.3-0.5, then the PCA is used to extract the main components, and train samples are projected to the main component subspace to obtain characteristic spectra. (2) The characteristic spectra are used to train a Probabilistic Neural Network to obtain a Bayes classifier. (3) An unknown real NG spectrum is first inputted to this Bayes classifier to determine the possible range of redshift, then the template matching is invoked to locate the redshift value within the estimated range. Compared with the traditional template matching technique with an unconstrained range, our proposed method not only halves the computational load, but also increases the estimation accuracy. As a result, the proposed method is particularly useful for automatic spectrum processing produced from a large-scale sky survey project.

  10. Movement activity based classification of animal behaviour with an application to data from cheetah (Acinonyx jubatus).

    PubMed

    Grünewälder, Steffen; Broekhuis, Femke; Macdonald, David Whyte; Wilson, Alan Martin; McNutt, John Weldon; Shawe-Taylor, John; Hailes, Stephen

    2012-01-01

    We propose a new method, based on machine learning techniques, for the analysis of a combination of continuous data from dataloggers and a sampling of contemporaneous behaviour observations. This data combination provides an opportunity for biologists to study behaviour at a previously unknown level of detail and accuracy; however, continuously recorded data are of little use unless the resulting large volumes of raw data can be reliably translated into actual behaviour. We address this problem by applying a Support Vector Machine and a Hidden-Markov Model that allows us to classify an animal's behaviour using a small set of field observations to calibrate continuously recorded activity data. Such classified data can be applied quantitatively to the behaviour of animals over extended periods and at times during which observation is difficult or impossible. We demonstrate the usefulness of the method by applying it to data from six cheetah (Acinonyx jubatus) in the Okavango Delta, Botswana. Cumulative activity data scores were recorded every five minutes by accelerometers embedded in GPS radio-collars for around one year on average. Direct behaviour sampling of each of the six cheetah were collected in the field for comparatively short periods. Using this approach we are able to classify each five minute activity score into a set of three key behaviour (feeding, mobile and stationary), creating a continuous behavioural sequence for the entire period for which the collars were deployed. Evaluation of our classifier with cross-validation shows the accuracy to be 83%-94%, but that the accuracy for individual classes is reduced with decreasing sample size of direct observations. We demonstrate how these processed data can be used to study behaviour identifying seasonal and gender differences in daily activity and feeding times. Results given here are unlike any that could be obtained using traditional approaches in both accuracy and detail.

  11. Movement Activity Based Classification of Animal Behaviour with an Application to Data from Cheetah (Acinonyx jubatus)

    PubMed Central

    Grünewälder, Steffen; Broekhuis, Femke; Macdonald, David Whyte; Wilson, Alan Martin; McNutt, John Weldon; Shawe-Taylor, John; Hailes, Stephen

    2012-01-01

    We propose a new method, based on machine learning techniques, for the analysis of a combination of continuous data from dataloggers and a sampling of contemporaneous behaviour observations. This data combination provides an opportunity for biologists to study behaviour at a previously unknown level of detail and accuracy; however, continuously recorded data are of little use unless the resulting large volumes of raw data can be reliably translated into actual behaviour. We address this problem by applying a Support Vector Machine and a Hidden-Markov Model that allows us to classify an animal's behaviour using a small set of field observations to calibrate continuously recorded activity data. Such classified data can be applied quantitatively to the behaviour of animals over extended periods and at times during which observation is difficult or impossible. We demonstrate the usefulness of the method by applying it to data from six cheetah (Acinonyx jubatus) in the Okavango Delta, Botswana. Cumulative activity data scores were recorded every five minutes by accelerometers embedded in GPS radio-collars for around one year on average. Direct behaviour sampling of each of the six cheetah were collected in the field for comparatively short periods. Using this approach we are able to classify each five minute activity score into a set of three key behaviour (feeding, mobile and stationary), creating a continuous behavioural sequence for the entire period for which the collars were deployed. Evaluation of our classifier with cross-validation shows the accuracy to be 83%-94%, but that the accuracy for individual classes is reduced with decreasing sample size of direct observations. We demonstrate how these processed data can be used to study behaviour identifying seasonal and gender differences in daily activity and feeding times. Results given here are unlike any that could be obtained using traditional approaches in both accuracy and detail. PMID:23185301

  12. Sample classification for improved performance of PLS models applied to the quality control of deep-frying oils of different botanic origins analyzed using ATR-FTIR spectroscopy.

    PubMed

    Kuligowski, Julia; Carrión, David; Quintás, Guillermo; Garrigues, Salvador; de la Guardia, Miguel

    2011-01-01

    The selection of an appropriate calibration set is a critical step in multivariate method development. In this work, the effect of using different calibration sets, based on a previous classification of unknown samples, on the partial least squares (PLS) regression model performance has been discussed. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil), with increasing polymerized triacylglyceride (PTG) content induced by a deep-frying process were employed. The use of a one-class-classifier partial least squares-discriminant analysis (PLS-DA) and a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independent of their PTG content. However, class separation of oil samples fried with foodstuff, was less evident. The combined use of double-cross model validation with permutation testing was used to validate the obtained PLS-DA classification models, confirming the results. To discuss the usefulness of the selection of an appropriate PLS calibration set, the PTG content was determined by calculating a PLS model based on the previously selected classes. In comparison to a PLS model calculated using a pooled calibration set containing samples from all classes, the root mean square error of prediction could be improved significantly using PLS models based on the selected calibration sets using PLS-DA, ranging between 1.06 and 2.91% (w/w).

  13. Genetic Diversity of Avian Paramyxovirus Type 6 Isolated from Wild Ducks in the Republic of Korea.

    PubMed

    Choi, Kang-Seuk; Kim, Ji-Ye; Lee, Hyun-Jeong; Jang, Min-Jun; Kwon, Hyuk-Moo; Sung, Haan-Woo

    2018-03-08

    Eleven avian paramyxovirus type 6 (APMV-6) isolates from Eurasian Wigeon (n=5; Anas penelope), Mallards (n=2; Anas platyrhynchos), and unknown species of wild ducks (n=4) from Korea were analyzed based on the nucleotide (nt) and deduced amino acid (aa) sequences of the fusion (F) gene. Fecal samples were collected in 2010-2014. Genotypes were assigned based on phylogenetic analyses. Our results revealed that APMV-6 could be classified into at least two distinct genotypes, G1 and G2. The open reading frame (ORF) of the G1 genotype was 1,668 nt in length, and the putative F0 cleavage site sequence was 113 PAPEPRL 119. The G2 genotype viruses included five isolates from Eurasian wigeons and four isolates from unknown waterfowl species, together with two reference APMV-6 strains from the Red-necked Stint (Calidris ruficollis) from Japan and an unknown duck from Italy. There was an N-truncated ORF (1,638 nt), due to an N-terminal truncation of 30 nt in the signal peptide region of the F gene, and the putative F0 cleavage site sequence was 103 SIREPRL 109. The genetic diversity and ecology of APMV-6 are discussed.

  14. Characteristic Cytokine and Chemokine Profiles in Encephalitis of Infectious, Immune-Mediated, and Unknown Aetiology

    PubMed Central

    Michael, Benedict D.; Griffiths, Michael J.; Granerod, Julia; Brown, David; Davies, Nicholas W. S.; Borrow, Ray; Solomon, Tom

    2016-01-01

    Background Encephalitis is parenchymal brain inflammation due to infectious or immune-mediated processes. However, in 15–60% the cause remains unknown. This study aimed to determine if the cytokine/chemokine-mediated host response can distinguish infectious from immune-mediated cases, and whether this may give a clue to aetiology in those of unknown cause. Methods We measured 38 mediators in serum and cerebrospinal fluid (CSF) of patients from the Health Protection Agency Encephalitis Study. Of serum from 78 patients, 38 had infectious, 20 immune-mediated, and 20 unknown aetiology. Of CSF from 37 patients, 20 had infectious, nine immune-mediated and eight unknown aetiology. Results Heat-map analysis of CSF mediator interactions was different for infectious and immune-mediated cases, and that of the unknown aetiology group was similar to the infectious pattern. Higher myeloperoxidase (MPO) concentrations were found in infectious than immune-mediated cases, in serum and CSF (p = 0.01 and p = 0.006). Serum MPO was also higher in unknown than immune-mediated cases (p = 0.03). Multivariate analysis selected serum MPO; classifying 31 (91%) as infectious (p = 0.008) and 17 (85%) as unknown (p = 0.009) as opposed to immune-mediated. CSF data also selected MPO classifying 11 (85%) as infectious as opposed to immune-mediated (p = 0.036). CSF neutrophils were detected in eight (62%) infective and one (14%) immune-mediated cases (p = 0.004); CSF MPO correlated with neutrophils (p<0.0001). Conclusions Mediator profiles of infectious aetiology differed from immune-mediated encephalitis; and those of unknown cause were similar to infectious cases, raising the hypothesis of a possible undiagnosed infectious cause. Particularly, neutrophils and MPO merit further investigation. PMID:26808276

  15. Characteristic Cytokine and Chemokine Profiles in Encephalitis of Infectious, Immune-Mediated, and Unknown Aetiology.

    PubMed

    Michael, Benedict D; Griffiths, Michael J; Granerod, Julia; Brown, David; Davies, Nicholas W S; Borrow, Ray; Solomon, Tom

    2016-01-01

    Encephalitis is parenchymal brain inflammation due to infectious or immune-mediated processes. However, in 15-60% the cause remains unknown. This study aimed to determine if the cytokine/chemokine-mediated host response can distinguish infectious from immune-mediated cases, and whether this may give a clue to aetiology in those of unknown cause. We measured 38 mediators in serum and cerebrospinal fluid (CSF) of patients from the Health Protection Agency Encephalitis Study. Of serum from 78 patients, 38 had infectious, 20 immune-mediated, and 20 unknown aetiology. Of CSF from 37 patients, 20 had infectious, nine immune-mediated and eight unknown aetiology. Heat-map analysis of CSF mediator interactions was different for infectious and immune-mediated cases, and that of the unknown aetiology group was similar to the infectious pattern. Higher myeloperoxidase (MPO) concentrations were found in infectious than immune-mediated cases, in serum and CSF (p = 0.01 and p = 0.006). Serum MPO was also higher in unknown than immune-mediated cases (p = 0.03). Multivariate analysis selected serum MPO; classifying 31 (91%) as infectious (p = 0.008) and 17 (85%) as unknown (p = 0.009) as opposed to immune-mediated. CSF data also selected MPO classifying 11 (85%) as infectious as opposed to immune-mediated (p = 0.036). CSF neutrophils were detected in eight (62%) infective and one (14%) immune-mediated cases (p = 0.004); CSF MPO correlated with neutrophils (p<0.0001). Mediator profiles of infectious aetiology differed from immune-mediated encephalitis; and those of unknown cause were similar to infectious cases, raising the hypothesis of a possible undiagnosed infectious cause. Particularly, neutrophils and MPO merit further investigation.

  16. Technical support for creating an artificial intelligence system for feature extraction and experimental design

    NASA Technical Reports Server (NTRS)

    Glick, B. J.

    1985-01-01

    Techniques for classifying objects into groups or classes go under many different names including, most commonly, cluster analysis. Mathematically, the general problem is to find a best mapping of objects into an index set consisting of class identifiers. When an a priori grouping of objects exists, the process of deriving the classification rules from samples of classified objects is known as discrimination. When such rules are applied to objects of unknown class, the process is denoted classification. The specific problem addressed involves the group classification of a set of objects that are each associated with a series of measurements (ratio, interval, ordinal, or nominal levels of measurement). Each measurement produces one variable in a multidimensional variable space. Cluster analysis techniques are reviewed and methods for including geographic location, distance measures, and spatial pattern (distribution) as parameters in clustering are examined. For the case of patterning, measures of spatial autocorrelation are discussed in terms of the kind of data (nominal, ordinal, or interval scaled) to which they may be applied.

  17. Early detection of disease: The correlation of the volatile organic profiles from patients with upper respiratory infections with subjects of normal profiles

    NASA Technical Reports Server (NTRS)

    Zlatkis, A.

    1979-01-01

    A method is described whereby a transevaporator is used for sampling 60-100 microns of aqueous sample. Volatiles are stripped from the sample either by a stream of helium and collection on a porous polymer, Tenax, or by 0.8 ml of 2-chloropropane and collected on glass beads. The volatiles are thermally desorbed into a precolumn which is connected to a capillary gas chromatographic column for analysis. The technique is shown to be reproducible and suitable for determining chromatographic profiles for a wide variety of sample types. Using a transevaporator sampling technique, the volatile profiles from 70 microns of serum were obtained by capillary column gas chromatography. The complex chromatograms were interpreted by a combination of manual and computer techniques and a two peak ratio method devised for the classification of normal and virus infected sera. Using the K-Nearest Neighbor approach, 85.7 percent of the unknown samples were classified correctly. Some preliminary results indicate the possible use of the method for the assessment of virus susceptibility.

  18. Bayes Error Rate Estimation Using Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2003-01-01

    The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error, in general, yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two problems from the Proben1 benchmarks. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.

  19. Hyperspectral analysis of clay minerals

    NASA Astrophysics Data System (ADS)

    Janaki Rama Suresh, G.; Sreenivas, K.; Sivasamy, R.

    2014-11-01

    A study was carried out by collecting soil samples from parts of Gwalior and Shivpuri district, Madhya Pradesh in order to assess the dominant clay mineral of these soils using hyperspectral data, as 0.4 to 2.5 μm spectral range provides abundant and unique information about many important earth-surface minerals. Understanding the spectral response along with the soil chemical properties can provide important clues for retrieval of mineralogical soil properties. The soil samples were collected based on stratified random sampling approach and dominant clay minerals were identified through XRD analysis. The absorption feature parameters like depth, width, area and asymmetry of the absorption peaks were derived from spectral profile of soil samples through DISPEC tool. The derived absorption feature parameters were used as inputs for modelling the dominant soil clay mineral present in the unknown samples using Random forest approach which resulted in kappa accuracy of 0.795. Besides, an attempt was made to classify the Hyperion data using Spectral Angle Mapper (SAM) algorithm with an overall accuracy of 68.43 %. Results showed that kaolinite was the dominant mineral present in the soils followed by montmorillonite in the study area.

  20. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria.

    PubMed

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem; You, Zhi-Qiang; Roux, Simon; Sullivan, Matthew B

    2017-01-01

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome 'sequence space' that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  1. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    PubMed Central

    Doulcier, Guilhem; You, Zhi-Qiang; Roux, Simon

    2017-01-01

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure. PMID:28480138

  2. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  3. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    DOE PAGES

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem; ...

    2017-05-03

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  4. Multilocus Sequence Typing of Cronobacter Strains Isolated from Retail Foods and Environmental Samples.

    PubMed

    Killer, Jiří; Skřivanová, Eva; Hochel, Igor; Marounek, Milan

    2015-06-01

    Cronobacter spp. are bacterial pathogens that affect children and immunocompromised adults. In this study, we used multilocus sequence typing (MLST) to determine sequence types (STs) in 11 Cronobacter spp. strains isolated from retail foods, 29 strains from dust samples obtained from vacuum cleaners, and 4 clinical isolates. Using biochemical tests, species-specific polymerase chain reaction, and MLST analysis, 36 strains were identified as Cronobacter sakazakii, and 6 were identified as Cronobacter malonaticus. In addition, one strain that originated from retail food and one from a dust sample from a vacuum cleaner were identified on the basis of MLST analysis as Cronobacter dublinensis and Cronobacter turicensis, respectively. Cronobacter spp. strains isolated from the retail foods were assigned to eight different MLST sequence types, seven of which were newly identified. The strains isolated from the dust samples were assigned to 7 known STs and 14 unknown STs. Three clinical isolates and one household dust isolate were assigned to ST4, which is the predominant ST associated with neonatal meningitis. One clinical isolate was classified based on MLST analysis as Cronobacter malonaticus and belonged to an as-yet-unknown ST. Three strains isolated from the household dust samples were assigned to ST1, which is another clinically significant ST. It can be concluded that Cronobacter spp. strains of different origin are genetically quite variable. The recovery of C. sakazakii strains belonging to ST1 and ST4 from the dust samples suggests the possibility that contamination could occur during food preparation. All of the novel STs and alleles for C. sakazakii, C. malonaticus, C. dublinensis, and C. turicensis determined in this study were deposited in the Cronobacter MLST database available online ( http://pubmlst.org/cronobacter/).

  5. Automatic Classification of Time-variable X-Ray Sources

    NASA Astrophysics Data System (ADS)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy of the training data is ~97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest-derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.
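
    The classification margin mentioned in this record can be illustrated with a minimal sketch. It assumes the margin is the vote share of the top class minus that of the runner-up, which is one common definition and may differ from the one used in the paper. The function names, the 0.2 threshold, and the vote shares are all hypothetical.

```python
def classification_margin(class_probs):
    """Return (top_label, margin) for one source.

    class_probs maps a class label to the fraction of trees voting for it.
    The margin is the top vote share minus the runner-up vote share
    (an assumed definition, see the note above).
    """
    if len(class_probs) < 2:
        raise ValueError("need at least two classes")
    ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    return top_label, top_p - second_p


def flag_low_margin(catalog, threshold=0.2):
    """Return names of sources whose margin falls below the threshold."""
    return [name for name, probs in catalog.items()
            if classification_margin(probs)[1] < threshold]


# Hypothetical vote shares, not values from the catalog.
catalog = {
    "src_a": {"AGN": 0.80, "CV": 0.15, "XRB": 0.05},
    "src_b": {"AGN": 0.40, "CV": 0.35, "XRB": 0.25},
}
```

    A small margin means the forest was split between two classes, so such sources are natural candidates for a closer look.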

  6. Wire connector classification with machine vision and a novel hybrid SVM

    NASA Astrophysics Data System (ADS)

    Chauhan, Vedang; Joshi, Keyur D.; Surgenor, Brian W.

    2018-04-01

    A machine vision-based system has been developed and tested that uses a novel hybrid Support Vector Machine (SVM) in a part inspection application with clear plastic wire connectors. The application required the system to differentiate between 4 different known styles of connectors plus one unknown style, for a total of 5 classes. The requirement to handle an unknown class is what necessitated the hybrid approach. The system was trained with the 4 known classes and tested with 5 classes (the 4 known plus the 1 unknown). The hybrid classification approach used two layers of SVMs: one layer was semi-supervised and the other layer was supervised. The semi-supervised SVM was a special case of unsupervised machine learning that classified test images as one of the 4 known classes (to accept) or as the unknown class (to reject). The supervised SVM classified test images as one of the 4 known classes and consequently would give false positives (FPs). Two methods were tested. The difference between the methods was that the order of the layers was switched. The method with the semi-supervised layer first gave an accuracy of 80% with 20% FPs. The method with the supervised layer first gave an accuracy of 98% with 0% FPs. Further work is being conducted to see if the hybrid approach works with other applications that have an unknown class requirement.
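
    The better-performing layer order (supervised first, then an accept/reject check) can be sketched as follows. This is not the authors' hybrid SVM: a nearest-centroid rule stands in for the supervised SVM and a distance threshold stands in for the semi-supervised SVM, purely to show the control flow. The function names and the `slack` parameter are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def fit(training):
    """training maps class label -> list of feature vectors.

    Returns label -> (centroid, radius), where radius is the largest
    training distance to the centroid. Both are stand-ins for trained SVMs.
    """
    model = {}
    for label, points in training.items():
        c = centroid(points)
        radius = max(math.dist(c, p) for p in points)
        model[label] = (c, radius)
    return model


def classify_then_reject(model, x, slack=1.5):
    """Pick the nearest known class, then reject if x lies too far from it."""
    label = min(model, key=lambda k: math.dist(model[k][0], x))
    c, radius = model[label]
    return label if math.dist(c, x) <= slack * radius else "unknown"


# Hypothetical 2-D features for two known connector styles.
model = fit({
    "A": [[0, 0], [1, 0], [0, 1]],
    "B": [[10, 10], [11, 10], [10, 11]],
})
```

    With this ordering the reject step only has to check a sample against the single class it was assigned to, rather than against all known classes at once.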

  7. Realtime motion planning for a mobile robot in an unknown environment using a neurofuzzy based approach

    NASA Astrophysics Data System (ADS)

    Zheng, Taixiong

    2005-12-01

    A neuro-fuzzy network based approach for robot motion in an unknown environment was proposed. In order to control the robot motion in an unknown environment, the behavior of the robot was classified into moving to the goal and avoiding obstacles. Then, according to the dynamics of the robot and the behavior character of the robot in an unknown environment, fuzzy control rules were introduced to control the robot motion. Finally, a 6-layer neuro-fuzzy network was designed to map what the robot sensed to robot motion control. After being trained, the network may be used for robot motion control. Simulation results show that the proposed approach is effective for robot motion control in an unknown environment.

  8. Learning in the context of distribution drift

    DTIC Science & Technology

    2017-05-09

    published in the leading data mining journal, Data Mining and Knowledge Discovery (Webb et al., 2016). We have shown that the previous qualitative...learner Low-bias learner Aggregated classifier Figure 7: Architecture for learning from streaming data in the context of variable or unknown...Learning limited dependence Bayesian classifiers, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD

  9. Speech and Language and Language Translation (SALT)

    DTIC Science & Technology

    2012-12-01

    Resources are classified as: Parallel Text Dictionaries Monolingual Text Other Dictionaries are further classified as: Text: can download entire...not clear how many are translated http://www.redsea-online.com/modules.php?name= dictionary Monolingual Text Monolingual Text; An Crubadan web...attached to a following word. A program could be written to detach the character د from unknown words, when the remaining word matches a dictionary

  10. Cocoa content influences chocolate molecular profile investigated by MALDI-TOF mass spectrometry.

    PubMed

    Bonatto, Cínthia C; Silva, Luciano P

    2015-06-01

    Chocolate authentication is a key aspect of quality control and safety. Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has been demonstrated to be useful for molecular profiling of cells, tissues, and even food. The present study evaluated whether MALDI-TOF MS analysis of low-molecular-mass profiles can classify chocolate samples according to cocoa content. The molecular profiles of seven processed commercial chocolate samples were compared by using MALDI-TOF MS. Some ions detected exclusively in chocolate samples corresponded to the metabolites of cocoa or other constituents. This method showed the presence of three distinct clusters according to confectionery and sensorial features of the chocolates and was used to establish a mass spectra database. Also, novel chocolate samples were evaluated in order to check the validity of the method and to challenge the database created with the mass spectra of the primary samples. Thus, the method was shown to be reliable for clustering unknown samples into the main chocolate categories. The simple sample preparation of the MALDI-TOF MS approach described will allow the surveillance and monitoring of constituents during the molecular profiling of chocolates. © 2014 Society of Chemical Industry.

  11. Effectively identifying compound-protein interactions by learning from positive and unlabeled examples.

    PubMed

    Cheng, Zhanzhan; Zhou, Shuigeng; Wang, Yang; Liu, Hui; Guan, Jihong; Chen, Yi-Ping Phoebe

    2016-05-18

    Prediction of compound-protein interactions (CPIs) aims to find new compound-protein pairs where a protein is targeted by at least one compound, which is a crucial step in new drug design. Currently, a number of machine learning based methods have been developed to predict new CPIs in the literature. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine learning based approaches use the unknown interactions (not validated CPIs) selected randomly as the negative examples to train classifiers for predicting new CPIs. Obviously, this is not quite reasonable and unavoidably impacts the CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that learns from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the CPI/compound-protein pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, including random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), and three types of existing CPI prediction models. Source code, datasets and related documents of PUCPI are available at: http://admis.fudan.edu.cn/projects/pucpi.html.
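
    The tensor-product feature construction described in this record can be sketched in a few lines. This is an illustration only, not the PUCPI code: the function name and the toy binary fingerprints are hypothetical, and the real feature vectors would be far longer.

```python
def tensor_product_features(compound, protein):
    """Flattened outer product: one entry per (compound, protein) feature pair."""
    return [c * p for c in compound for p in protein]


# Hypothetical binary fingerprints: 3 substructure bits, 2 domain bits.
compound_bits = [1, 0, 1]
protein_bits = [0, 1]
pair_vector = tensor_product_features(compound_bits, protein_bits)
```

    For binary inputs, an entry of the result is 1 only when both the compound substructure and the protein domain are present, so the pair vector has `len(compound) * len(protein)` entries.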

  12. Rapid analysis of pharmaceutical drugs using LIBS coupled with multivariate analysis.

    PubMed

    Tiwari, P K; Awasthi, S; Kumar, R; Anand, R K; Rai, P K; Rai, A K

    2018-02-01

    Type 2 diabetes drug tablets of various brands containing voglibose at dose strengths of 0.2 and 0.3 mg were examined using the laser-induced breakdown spectroscopy (LIBS) technique. Statistical methods such as principal component analysis (PCA) and partial least squares regression (PLSR) were applied to the LIBS spectral data to classify the drug samples and develop calibration models. We developed a ratio-based calibration model applying PLSR in which the relative spectral intensity ratios H/C, H/N and O/N are used. Further, the developed model was employed to predict the relative concentration of elements in unknown drug samples. The experiment was performed in air and in an argon atmosphere, and the results obtained were compared. The present model provides a rapid spectroscopic method for drug analysis with high statistical significance for online control and measurement processes in a wide variety of pharmaceutical industrial applications.

  13. Employing Machine-Learning Methods to Study Young Stellar Objects

    NASA Astrophysics Data System (ADS)

    Moore, Nicholas

    2018-01-01

    Vast amounts of data exist in the astronomical data archives, and yet a large number of sources remain unclassified. We developed a multi-wavelength pipeline to classify infrared sources. The pipeline uses supervised machine learning methods to classify objects into the appropriate categories. The program is fed data that is already classified to train it, and is then applied to unknown catalogues. The primary use for such a pipeline is the rapid classification and cataloging of data that would take a much longer time to classify otherwise. While our primary goal is to study young stellar objects (YSOs), the applications extend beyond the scope of this project. We present preliminary results from our analysis and discuss future applications.

  14. OGLE-IV Real-Time Transient Search

    NASA Astrophysics Data System (ADS)

    Wyrzykowski, Ł.; Kostrzewa-Rutkowska, Z.; Kozłowski, S.; Udalski, A.; Poleski, R.; Skowron, J.; Blagorodnova, N.; Kubiak, M.; Szymański, M. K.; Pietrzyński, G.; Soszyński, I.; Ulaczyk, K.; Pietrukowicz, P.; Mróz, P.

    2014-09-01

    We present the design and first results of a real-time search for transients within the 650 sq. deg. area around the Magellanic Clouds, conducted as part of the OGLE-IV project and aimed at detecting supernovae, novae and other events. The average sampling of about four days from September to May, yielded a detection of 238 transients in 2012/2013 and 2013/2014 seasons. The superb photometric and astrometric quality of the OGLE data allows for numerous applications of the discovered transients. We use this sample to prepare and train a Machine Learning-based automated classifier for early light curves, which distinguishes major classes of transients with more than 80% of correct answers. Spectroscopically classified 49 supernovae Type Ia are used to construct a Hubble Diagram with statistical scatter of about 0.3 mag and fill the least populated region of the redshifts range in the Union sample. We investigate the influence of host galaxy environments on supernovae statistics and find the mean host extinction of A_I = 0.19 ± 0.10 mag and A_V = 0.39 ± 0.21 mag based on a subsample of supernovae Type Ia. We show that the positional accuracy of the survey is of the order of 0.5 pixels (0.13'') and that the OGLE-IV Transient Detection System is capable of detecting transients within the nuclei of galaxies. We present a few interesting cases of nuclear transients of unknown type. All data on the OGLE transients are made publicly available to the astronomical community via the OGLE website.

  15. Does an uneven sample size distribution across settings matter in cross-classified multilevel modeling? Results of a simulation study.

    PubMed

    Milliren, Carly E; Evans, Clare R; Richmond, Tracy K; Dunn, Erin C

    2018-06-06

    Recent advances in multilevel modeling allow for modeling non-hierarchical levels (e.g., youth in non-nested schools and neighborhoods) using cross-classified multilevel models (CCMM). Current practice is to cluster samples from one context (e.g., schools) and utilize the observations however they are distributed from the second context (e.g., neighborhoods). However, it is unknown whether an uneven distribution of sample size across these contexts leads to incorrect estimates of random effects in CCMMs. Using the school and neighborhood data structure in Add Health, we examined the effect of neighborhood sample size imbalance on the estimation of variance parameters in models predicting BMI. We differentially assigned students from a given school to neighborhoods within that school's catchment area using three scenarios of (im)balance. 1000 random datasets were simulated for each of five combinations of school- and neighborhood-level variance and imbalance scenarios, for a total of 15,000 simulated data sets. For each simulation, we calculated 95% CIs for the variance parameters to determine whether the true simulated variance fell within the interval. Across all simulations, the "true" school and neighborhood variance parameters were estimated 93-96% of the time. Only 5% of models failed to capture neighborhood variance; 6% failed to capture school variance. These results suggest that there is no systematic bias in the ability of CCMM to capture the true variance parameters regardless of the distribution of students across neighborhoods. Ongoing efforts to use CCMM are warranted and can proceed without concern for the sample imbalance across contexts. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Characterization of Armillaria spp. from peach orchards in the southeastern United States using fatty acid methyl ester profiling.

    PubMed

    Cox, K D; Scherm, H; Riley, M B

    2006-04-01

    Limited information is available regarding the composition of cellular fatty acids in Armillaria and the extent to which fatty acid profiles can be used to characterize species in this genus. Fatty acid methyl ester (FAME) profiles generated from cultures of A. tabescens, A. mellea, and A. gallica consisted of 16-18 fatty acids ranging from 12-24 carbons in length, although some of these were present only in trace amounts. Across the three species, 9-cis,12-cis-octadecadienoic acid (9,12-C18:2), hexadecanoic acid (16:0), heneicosanoic acid (21:0), 9-cis-octadecenoic acid (9-C18:1), and 2-hydroxy-docosanoic acid (OH-22:0) were the most abundant fatty acids. FAME profiles from different thallus morphologies (mycelium, sclerotial crust, or rhizomorphs) displayed by cultures of A. gallica showed that thallus type had no significant effect on cellular fatty acid composition (P > 0.05), suggesting that FAME profiling is sufficiently robust for species differentiation despite potential differences in thallus morphology within and among species. The three Armillaria species included in this study could be distinguished from other lignicolous basidiomycete species commonly occurring on peach (Schizophyllum commune, Ganoderma lucidum, Stereum hirsutum, and Trametes versicolor) on the basis of FAME profiles using stepwise discriminant analysis (average squared canonical correlation = 0.953), whereby 9-C18:1, 9,12-C18:2, and 10-cis-hexadecenoic acid (10-C16:1) were the three strongest contributors. In a separate stepwise discriminant analysis, A. tabescens, A. mellea, and A. gallica were separated from one another based on their fatty acid profiles (average squared canonical correlation = 0.924), with 11-cis-octadecenoic acid (11-C18:1), 9-C18:1, and 2-hydroxy-hexadecanoic acid (OH-16:0) being most important for species separation. 
When fatty acids were extracted directly from mycelium dissected from naturally infected host tissue, the FAME-based discriminant functions developed in the preceding experiments classified all samples (n = 16) as A. tabescens; when applied to cultures derived from the same naturally infected samples, all unknowns were similarly classified as A. tabescens. Thus, FAME species classification of Armillaria unknowns directly from infected tissues may be feasible. Species designation of unknown Armillaria cultures by FAME analysis was identical to that indicated by IGS-RFLP classification with AluI.

  17. Indirect Tire Monitoring System - Machine Learning Approach

    NASA Astrophysics Data System (ADS)

    Svensson, O.; Thelin, S.; Byttner, S.; Fan, Y.

    2017-10-01

    The heavy vehicle industry is currently not required by law to provide a tire pressure monitoring system. This has created issues surrounding unknown tire pressure and tread depth during active service. There is also no standardization for these kinds of systems, which means that different manufacturers and third-party solutions work according to their own principles, and it can be hard to know what works for a given vehicle type. The objective is to create an indirect tire monitoring system that generalizes a method for detecting both incorrect tire pressure and tread depth for different types of vehicles within a fleet, without the need for additional physical sensors or vehicle-specific parameters. The existing sensors that are connected communicate through CAN and are interpreted by the Drivec Bridge hardware that exists in the fleet. By using supervised machine learning, a classifier was created for each axle, where the main focus was the front axle, which had the most issues. The classifier will classify the vehicle's tire condition and will be implemented in Drivec's cloud service, where it will receive its data. The resulting classifier is a random forest implemented in Python. For the front axle, with a data set consisting of 9767 samples of buses with correct tire condition and 1909 samples of buses with incorrect tire condition, the classifier has an accuracy of 90.54% (0.96%). The data sets are created from 34 unique measurements from buses between January and May 2017. This classifier has been exported and is used inside a Node.js module created for Drivec's cloud service, which is the result of the whole implementation. The developed solution is called Indirect Tire Monitoring System (ITMS) and is seen as a process. This process will predict bad classes in the cloud, which will lead to warnings. The warnings are defined as incidents. They contain only the information needed, and the bandwidth of the incidents is also controlled so that incidents are created within an acceptable range over a period of time. These incidents will be notified through the cloud for the operator to analyze for upcoming maintenance decisions.

  18. Style consistent classification of isogenous patterns.

    PubMed

    Sarkar, Prateek; Nagy, George

    2005-01-01

    In many applications of pattern recognition, patterns appear together in groups (fields) that have a common origin. For example, a printed word is usually a field of character patterns printed in the same font. A common origin induces consistency of style in features measured on patterns. The features of patterns co-occurring in a field are statistically dependent because they share the same, albeit unknown, style. Style constrained classifiers achieve higher classification accuracy by modeling such dependence among patterns in a field. Effects of style consistency on the distributions of field-features (concatenation of pattern features) can be modeled by hierarchical mixtures. Each field derives from a mixture of styles, while, within a field, a pattern derives from a class-style conditional mixture of Gaussians. Based on this model, an optimal style constrained classifier processes entire fields of patterns rendered in a consistent but unknown style. In a laboratory experiment, style constrained classification reduced errors on fields of printed digits by nearly 25 percent over singlet classifiers. Longer fields favor our classification method because they furnish more information about the underlying style.

  19. Detection of inter-patient left and right bundle branch block heartbeats in ECG using ensemble classifiers

    PubMed Central

    2014-01-01

    Background Left bundle branch block (LBBB) and right bundle branch block (RBBB) not only mask electrocardiogram (ECG) changes that reflect diseases but also indicate important underlying pathology. The timely detection of LBBB and RBBB is critical in the treatment of cardiac diseases. Inter-patient heartbeat classification is based on independent training and testing sets to construct and evaluate a heartbeat classification system. Therefore, a heartbeat classification system with a high performance evaluation possesses a strong predictive capability for unknown data. The aim of this study was to propose a method for inter-patient classification of heartbeats to accurately detect LBBB and RBBB from the normal beat (NORM). Methods This study proposed a heartbeat classification method through a combination of three different types of classifiers: a minimum distance classifier constructed between NORM and LBBB; a weighted linear discriminant classifier between NORM and RBBB based on Bayesian decision making using posterior probabilities; and a linear support vector machine (SVM) between LBBB and RBBB. Each classifier was used with matching features to obtain better classification performance. The final types of the test heartbeats were determined using a majority voting strategy through the combination of class labels from the three classifiers. The optimal parameters for the classifiers were selected using cross-validation on the training set. The effects of different lead configurations on the classification results were assessed, and the performance of these three classifiers was compared for the detection of each pair of heartbeat types. Results The study results showed that a two-lead configuration exhibited better classification results compared with a single-lead configuration. The construction of a classifier with good performance between each pair of heartbeat types significantly improved the heartbeat classification performance. 
The results showed a sensitivity of 91.4% and a positive predictive value of 37.3% for LBBB and a sensitivity of 92.8% and a positive predictive value of 88.8% for RBBB. Conclusions A multi-classifier ensemble method was proposed based on inter-patient data and demonstrated a satisfactory classification performance. This approach has the potential for application in clinical practice to distinguish LBBB and RBBB from NORM of unknown patients. PMID:24903422
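
    The majority-voting step and the two reported metrics can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the tie rule (returning `None` when the three pairwise classifiers each pick a different class) is an assumption, and both functions expect non-empty inputs with at least one true and one predicted positive.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label with the most votes, or None when the top count is tied."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # e.g. each pairwise classifier picked a different class
    return counts[0][0]


def sensitivity_and_ppv(true_labels, predicted, positive):
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tp / (tp + fp)
```

    A high sensitivity paired with a low positive predictive value, as reported for LBBB, means most true LBBB beats are found but many beats labeled LBBB belong to another class.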

  20. Detection of inter-patient left and right bundle branch block heartbeats in ECG using ensemble classifiers.

    PubMed

    Huang, Huifang; Liu, Jie; Zhu, Qiang; Wang, Ruiping; Hu, Guangshu

    2014-06-05

    Left bundle branch block (LBBB) and right bundle branch block (RBBB) not only mask electrocardiogram (ECG) changes that reflect diseases but also indicate important underlying pathology. The timely detection of LBBB and RBBB is critical in the treatment of cardiac diseases. Inter-patient heartbeat classification is based on independent training and testing sets to construct and evaluate a heartbeat classification system. Therefore, a heartbeat classification system with a high performance evaluation possesses a strong predictive capability for unknown data. The aim of this study was to propose a method for inter-patient classification of heartbeats to accurately detect LBBB and RBBB from the normal beat (NORM). This study proposed a heartbeat classification method through a combination of three different types of classifiers: a minimum distance classifier constructed between NORM and LBBB; a weighted linear discriminant classifier between NORM and RBBB based on Bayesian decision making using posterior probabilities; and a linear support vector machine (SVM) between LBBB and RBBB. Each classifier was used with matching features to obtain better classification performance. The final types of the test heartbeats were determined using a majority voting strategy through the combination of class labels from the three classifiers. The optimal parameters for the classifiers were selected using cross-validation on the training set. The effects of different lead configurations on the classification results were assessed, and the performance of these three classifiers was compared for the detection of each pair of heartbeat types. The study results showed that a two-lead configuration exhibited better classification results compared with a single-lead configuration. The construction of a classifier with good performance between each pair of heartbeat types significantly improved the heartbeat classification performance. 
The results showed a sensitivity of 91.4% and a positive predictive value of 37.3% for LBBB and a sensitivity of 92.8% and a positive predictive value of 88.8% for RBBB. A multi-classifier ensemble method was proposed based on inter-patient data and demonstrated a satisfactory classification performance. This approach has the potential for application in clinical practice to distinguish LBBB and RBBB from NORM of unknown patients.

  1. Mobile/Modular BSL-4 Facilities for Meeting Restricted Earth Return Containment Requirements

    NASA Technical Reports Server (NTRS)

    Calaway, M. J.; McCubbin, F. M.; Allton, J. H.; Zeigler, R. A.; Pace, L. F.

    2017-01-01

    NASA robotic sample return missions designated Category V Restricted Earth Return by the NASA Planetary Protection Office require sample containment and biohazard testing in a receiving laboratory as directed by NASA Procedural Requirement (NPR) 8020.12D - ensuring the preservation and protection of Earth and the sample. Currently, NPR 8020.12D classifies Restricted Earth Return for robotic sample return missions from Mars, Europa, and Enceladus with the caveat that future proposed mission locations could be added or restrictions lifted on a case by case basis as scientific knowledge and understanding of biohazards progresses. Since the 1960s, sample containment from an unknown extraterrestrial biohazard has been related to the highest containment standards and protocols known to modern science. Today, Biosafety Level (BSL) 4 standards and protocols are used to study the most dangerous high-risk diseases and unknown biological agents on Earth. Over 30 BSL-4 facilities have been constructed worldwide with 12 residing in the United States; of these, 8 are operational. In the last two decades, these brick and mortar facilities have cost in the hundreds of millions of dollars dependent on the facility requirements and size. Previous mission concept studies for constructing a NASA sample receiving facility with an integrated BSL-4 quarantine and biohazard testing facility have also been estimated in the hundreds of millions of dollars. As an alternative option, we have recently conducted an initial trade study for constructing a mobile and/or modular sample containment laboratory that would meet all BSL-4 and planetary protection standards and protocols at a fraction of the cost. Mobile and modular BSL-2 and 3 facilities have been successfully constructed and deployed world-wide for government testing of pathogens and pharmaceutical production. 
Our study showed that a modular BSL-4 construction could result in approximately 90% cost reduction when compared to traditional construction methods without compromising the preservation of the sample or Earth.

  2. An assessment of the effectiveness of a random forest classifier for land-cover classification

    NASA Astrophysics Data System (ADS)

    Rodriguez-Galiano, V. F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J. P.

    2012-01-01

    Land cover monitoring using remotely sensed data requires robust classification methods which allow for the accurate mapping of complex land cover and land use categories. Random forest (RF) is a powerful machine learning classifier that is relatively unknown in land remote sensing and has not been evaluated thoroughly by the remote sensing community compared to more conventional pattern recognition techniques. Key advantages of RF include: its non-parametric nature; high classification accuracy; and capability to determine variable importance. However, the split rules for classification are unknown, therefore RF can be considered to be a black-box type classifier. RF provides an algorithm for estimating missing values; and flexibility to perform several types of data analysis, including regression, classification, survival analysis, and unsupervised learning. In this paper, the performance of the RF classifier for land cover classification of a complex area is explored. Evaluation was based on several criteria: mapping accuracy, sensitivity to data set size and noise. Landsat-5 Thematic Mapper data captured in European spring and summer were used with auxiliary variables derived from a digital terrain model to classify 14 different land categories in the south of Spain. Results show that the RF algorithm yields accurate land cover classifications, with 92% overall accuracy and a Kappa index of 0.92. RF is robust to training data reduction and noise because significant differences in kappa values were only observed for data reduction and noise addition values greater than 50 and 20%, respectively. Additionally, variables that RF identified as most important for classifying land cover coincided with expectations. A McNemar test indicates an overall better performance of the random forest model over a single decision tree at the 0.00001 significance level.
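
    The abstract above reports both overall accuracy and a Kappa index. As a minimal sketch of how those two figures are commonly derived from reference and predicted labels, the snippet below computes overall accuracy and Cohen's kappa. The function name, the class labels, and the eight sample pixels are illustrative assumptions, not data from the study.

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Return (overall accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(reference)
    # Observed agreement: fraction of samples where both labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement: sum over classes of the product of marginal proportions.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        # Both lists contain a single identical class; kappa is undefined.
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight validation pixels (not from the paper).
reference = ["crop", "crop", "forest", "forest", "urban", "urban", "water", "water"]
predicted = ["crop", "crop", "forest", "urban", "urban", "urban", "water", "crop"]
accuracy, kappa = overall_accuracy_and_kappa(reference, predicted)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

    Kappa discounts the agreement expected by chance, so it is never higher than overall accuracy when chance agreement is positive; here six of eight pixels agree, but kappa is lower because some of that agreement would occur at random.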

  3. A Systems Engineering Survey of Artificial Intelligence and Smart Sensor Networks in a Network-Centric Environment

    DTIC Science & Technology

    2009-09-01

    problems, to better model the problem solving of computer systems. This research brought about the intertwining of AI and cognitive psychology. Much of...where symbol sequences are sequential intelligent states of the network, and must be classified as normal, abnormal, or unknown. These symbols...is associated with abnormal behavior; and abcbc is associated with unknown behavior, as it fits no known behavior. Predicted outcomes from

  4. Bayes estimation on parameters of the single-class classifier. [for remotely sensed crop data

    NASA Technical Reports Server (NTRS)

    Lin, G. C.; Minter, T. C.

    1976-01-01

    Normal procedures used for designing a Bayes classifier to classify wheat as the major crop of interest require not only training samples of wheat but also those of nonwheat. Therefore, ground truth must be available for the class of interest plus all confusion classes. The single-class Bayes classifier classifies data into the class of interest or the class 'other' but requires training samples only from the class of interest. This paper will present a procedure for Bayes estimation on the mean vector, covariance matrix, and a priori probability of the single-class classifier using labeled samples from the class of interest and unlabeled samples drawn from the mixture density function.

  5. Automatic classification of time-variable X-ray sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ∼97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of an ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  6. How large a training set is needed to develop a classifier for microarray data?

    PubMed

    Dobbin, Kevin K; Zhao, Yingdong; Simon, Richard M

    2008-01-01

    A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging. We present a model-based approach to determining the sample size required to adequately train a classifier. It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided. We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.

  7. Research and characterisation of blazar candidates among the Fermi/LAT 3FGL catalogue using multivariate classifications

    NASA Astrophysics Data System (ADS)

    Lefaucheur, Julien; Pita, Santiago

    2017-06-01

    Context. In the recently published 3FGL catalogue, the Fermi/LAT collaboration reports the detection of γ-ray emission from 3034 sources obtained after four years of observations. The nature of 1010 of those sources is unknown, whereas 2023 have well-identified counterparts in other wavelengths. Most of the associated sources are labelled as blazars (1717/2023), but the BL Lac or FSRQ nature of 573 of these blazars is still undetermined. Aims: The aim of this study was two-fold. First, to significantly increase the number of blazar candidates from a search among the large number of Fermi/LAT 3FGL unassociated sources (case A). Second, to determine the BL Lac or FSRQ nature of the blazar candidates, including those determined as such in this work and the blazar candidates of uncertain type (BCU) that are already present in the 3FGL catalogue (case B). Methods: For this purpose, multivariate classifiers - boosted decision trees and multilayer perceptron neural networks - were trained using samples of labelled sources with no caution flag from the 3FGL catalogue and carefully chosen discriminant parameters. The decisions of the classifiers were combined in order to obtain a high level of source identification along with well controlled numbers of expected false associations. Specifically for case A, dedicated classifications were generated for high (|b| > 10°) and low (|b| ≤ 10°) galactic latitude sources; in addition, the application of classifiers to samples of sources with caution flag was considered separately, and specific performance metrics were estimated. Results: We obtained a sample of 595 blazar candidates (high and low galactic latitude) among the unassociated sources of the 3FGL catalogue. We also obtained a sample of 509 BL Lacs and 295 FSRQs from the blazar candidates cited above and the BCUs of the 3FGL catalogue. The number of expected false associations is given for different samples of candidates.
It is, in particular, notably low (9/425) for the sample of high-latitude blazar candidates from case A. Full Tables 5 and 7 are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A86

  8. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    PubMed

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. 
This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could access information from all organisms in a biological system without explicit genomic information. The Memory protocol has high potential for many applications, including in situ biomonitoring of ecosystems, screening for diseases, biosensing of pathological features in water and food supplies, and non-biological information processing of memory devices, among many.

  10. Lung Adenocarcinoma with Anaplastic Lymphoma Kinase (ALK) Rearrangement Presenting as Carcinoma of Unknown Primary Site: Recognition and Treatment Implications.

    PubMed

    Hainsworth, John D; Anthony Greco, F

    2016-03-01

    Molecular cancer classifier assays are being used with increasing frequency to predict tissue of origin and direct site-specific therapy for patients with carcinoma of unknown primary site (CUP). We postulated some CUP patients predicted to have non-small-cell lung cancer (NSCLC) by molecular cancer classifier assay may have anaplastic lymphoma kinase (ALK) rearranged tumors, and benefit from treatment with ALK inhibitors. We retrospectively reviewed CUP patients who had the 92-gene molecular cancer classifier assay (CancerTYPE ID; bioTheranostics, Inc.) performed on tumor biopsies to identify patients predicted to have NSCLC. Beginning in 2011, we have tested these patients for ALK rearrangements and epidermal growth factor receptor (EGFR) activating mutations, based on the proven therapeutic value of these targets in NSCLC. We identified CUP patients with predicted NSCLC who were subsequently found to have ALK rearrangements. NSCLC was predicted by the molecular cancer classifier assay in 37 of 310 CUP patients. Twenty-one of these patients were tested for ALK rearrangements, and four had an EML4-ALK fusion gene detected. The diagnosis of lung cancer was strongly suggested in only one patient prior to molecular testing. One patient received ALK inhibitor treatment and has had prolonged benefit. We report on patients with lung adenocarcinoma and ALK rearrangements originally diagnosed as CUP who were identified using a molecular cancer classifier assay. Although ALK inhibitors treatment experience is limited, this newly identifiable group of lung cancer patients should be considered for therapy according to guidelines for stage IV ALK-positive NSCLC.

  11. Quantum pattern recognition with multi-neuron interactions

    NASA Astrophysics Data System (ADS)

    Fard, E. Rezaei; Aghayar, K.; Amniat-Talab, M.

    2018-03-01

    We present a quantum neural network with multi-neuron interactions for pattern recognition tasks by a combination of extended classic Hopfield network and adiabatic quantum computation. This scheme can be used as an associative memory to retrieve partial patterns with any number of unknown bits. Also, we propose a preprocessing approach to classifying the pattern space S to suppress spurious patterns. The results of pattern clustering show that for pattern association, the number of weights (η) should equal the number of unknown bits in the input pattern (d). It is also remarkable that associative memory function depends on the location of unknown bits apart from d and the load parameter α.
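
    For readers unfamiliar with associative recall of partial patterns, the sketch below shows the classical, pairwise Hopfield network that the abstract extends. It is not the authors' quantum multi-neuron scheme. Unknown bits are encoded as 0 and filled in by asynchronous updates. The function names and the two stored 8-bit patterns are illustrative assumptions.

```python
def train_hebbian(patterns):
    """Build a symmetric weight matrix from +1/-1 patterns with the Hebbian rule."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def recall(weights, probe, max_sweeps=10):
    """Asynchronously update a probe (+1, -1, or 0 for unknown) until it is stable."""
    state = list(probe)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            local_field = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if local_field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two orthogonal stored patterns (hypothetical example data).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hebbian(stored)

# The first pattern with two unknown bits (positions 1 and 6) set to 0.
partial = [1, 0, 1, 1, -1, -1, 0, -1]
recalled = recall(weights, partial)
print(recalled)
```

    With only two orthogonal patterns stored, the known bits pull the network toward the first pattern and both unknown bits are restored; with many stored patterns, spurious minima become more likely, which is the problem the abstract's preprocessing step targets.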

  12. Identification of Antibody Targets for Tuberculosis Serology using High-Density Nucleic Acid Programmable Protein Arrays*

    PubMed Central

    Song, Lusheng; Wallstrom, Garrick; Yu, Xiaobo; Hopper, Marika; Van Duine, Jennifer; Steel, Jason; Park, Jin; Wiktor, Peter; Kahn, Peter; Brunner, Al; Wilson, Douglas; Jenny-Avital, Elizabeth R.; Qiu, Ji; Labaer, Joshua; Magee, D. Mitchell; Achkar, Jacqueline M.

    2017-01-01

    Better and more diverse biomarkers for the development of simple point-of-care tests for active tuberculosis (TB), a clinically heterogeneous disease, are urgently needed. We generated a proteomic Mycobacterium tuberculosis (Mtb) High-Density Nucleic Acid Programmable Protein Array (HD-NAPPA) that used a novel multiplexed strategy for expedited high-throughput screening for antibody responses to the Mtb proteome. We screened sera from HIV uninfected and coinfected TB patients and controls (n = 120) from the US and South Africa (SA) using the multiplex HD-NAPPA for discovery, followed by deconvolution and validation through single protein HD-NAPPA with biologically independent samples (n = 124). We verified the top proteins with enzyme-linked immunosorbent assays (ELISA) using the original screening and validation samples (n = 244) and heretofore untested samples (n = 41). We identified 8 proteins with TB biomarker value; four (Rv0054, Rv0831c, Rv2031c and Rv0222) of these were previously identified in serology studies, and four (Rv0948c, Rv2853, Rv3405c, Rv3544c) were not known to elicit antibody responses. Using ELISA data, we created classifiers that could discriminate patients' TB status according to geography (US or SA) and HIV (HIV- or HIV+) status. With ROC curve analysis under cross validation, the classifiers performed with an AUC for US/HIV- at 0.807; US/HIV+ at 0.782; SA/HIV- at 0.868; and SA/HIV+ at 0.723. With this study we demonstrate a new platform for biomarker/antibody screening and delineate its utility to identify previously unknown immunoreactive proteins. PMID:28223349
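
    The AUC values quoted above summarize how well a classifier score separates two groups. As a minimal sketch, the AUC can be computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the scores below are hypothetical, and the study's cross-validation procedure is not reproduced.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for TB patients and controls.
tb_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
auc = roc_auc(tb_scores, control_scores)
print(f"AUC = {auc:.3f}")
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.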

  13. Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation

    NASA Technical Reports Server (NTRS)

    Sharma, Manali; Das, Kamalika; Bilgic, Mustafa; Matthews, Bryan; Nielsen, David Lynn; Oza, Nikunj C.

    2016-01-01

    A major focus of the commercial aviation community is discovery of unknown safety events in flight operations data. Data-driven unsupervised anomaly detection methods are better at capturing unknown safety events compared to rule-based methods which only look for known violations. However, not all statistical anomalies that are discovered by these unsupervised anomaly detection methods are operationally significant (e.g., represent a safety concern). Subject Matter Experts (SMEs) have to spend significant time reviewing these statistical anomalies individually to identify a few operationally significant ones. In this paper we propose an active learning algorithm that incorporates SME feedback in the form of rationales to build a classifier that can distinguish between uninteresting and operationally significant anomalies. Experimental evaluation on real aviation data shows that our approach improves detection of operationally significant events by as much as 75% compared to the state-of-the-art. The learnt classifier also generalizes well to additional validation data sets.

  14. Malware analysis using visualized image matrices.

    PubMed

    Han, KyoungSoo; Kang, BooJoong; Im, Eul Gyu

    2014-01-01

    This paper proposes a novel malware visual analysis method that contains not only a visualization method to convert binary files into images, but also a similarity calculation method between these images. The proposed method generates RGB-colored pixels on image matrices using the opcode sequences extracted from malware samples and calculates the similarities for the image matrices. Particularly, our proposed methods are available for packed malware samples by applying them to the execution traces extracted through dynamic analysis. When the images are generated, we can reduce the overheads by extracting the opcode sequences only from the blocks that include the instructions related to staple behaviors such as functions and application programming interface (API) calls. In addition, we propose a technique that generates a representative image for each malware family in order to reduce the number of comparisons for the classification of unknown samples and the colored pixel information in the image matrices is used to calculate the similarities between the images. Our experimental results show that the image matrices of malware can effectively be used to classify malware families both statically and dynamically with accuracy of 0.9896 and 0.9732, respectively.

  15. Comparison of Hybrid Classifiers for Crop Classification Using Normalized Difference Vegetation Index Time Series: A Case Study for Major Crops in North Xinjiang, China

    PubMed Central

    Hao, Pengyu; Wang, Li; Niu, Zheng

    2015-01-01

    A range of single classifiers have been proposed to classify crop types using time series vegetation indices, and hybrid classifiers are used to improve discriminatory power. Traditional fusion rules use the product of multi-single classifiers, but that strategy cannot integrate the classification output of machine learning classifiers. In this research, the performance of two hybrid strategies, multiple voting (M-voting) and probabilistic fusion (P-fusion), for crop classification using NDVI time series were tested with different training sample sizes at both pixel and object levels, and two representative counties in north Xinjiang were selected as study area. The single classifiers employed in this research included Random Forest (RF), Support Vector Machine (SVM), and See 5 (C 5.0). The results indicated that classification performance improved (increased the mean overall accuracy by 5%~10%, and reduced standard deviation of overall accuracy by around 1%) substantially with the training sample number, and when the training sample size was small (50 or 100 training samples), hybrid classifiers substantially outperformed single classifiers with higher mean overall accuracy (1%~2%). However, when abundant training samples (4,000) were employed, single classifiers could achieve good classification accuracy, and all classifiers obtained similar performances. Additionally, although object-based classification did not improve accuracy, it resulted in greater visual appeal, especially in study areas with a heterogeneous cropping pattern. PMID:26360597
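
    The abstract names two hybrid strategies without defining them. One common reading, assumed here, is that M-voting takes the majority of the single classifiers' hard labels while P-fusion averages their class probabilities. The sketch below contrasts the two under that assumption; the function names, crop labels, and probabilities are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the earliest classifier."""
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


def probabilistic_fusion(probabilities):
    """Average per-class probabilities across classifiers and return the best class."""
    classes = sorted(set().union(*probabilities))
    mean = {
        c: sum(p.get(c, 0.0) for p in probabilities) / len(probabilities)
        for c in classes
    }
    return max(classes, key=mean.get)


# Hypothetical class probabilities for one pixel from three single classifiers.
rf = {"cotton": 0.55, "maize": 0.45}
svm = {"cotton": 0.52, "maize": 0.48}
c50 = {"cotton": 0.10, "maize": 0.90}
outputs = [rf, svm, c50]

hard_labels = [max(p, key=p.get) for p in outputs]
voted = majority_vote(hard_labels)
fused = probabilistic_fusion(outputs)
print(voted, fused)
```

    The example is chosen so that the two rules disagree: two weakly confident votes for one class outnumber a single strongly confident vote under majority voting, while averaging the probabilities favours the confident classifier.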

  16. Supernova Cosmology Without Spectroscopy

    NASA Astrophysics Data System (ADS)

    Johnson, Elizabeth; Scolnic, Daniel; Kessler, Rick; Rykoff, Eli; Rozo, Eduardo

    2018-01-01

    Present and future supernovae (SN) surveys face several challenges: the ability to acquire redshifts of either the SN or its host galaxy, the ability to classify a SN without a spectrum, and unknown relations between SN luminosity and host galaxy type. We present here a new approach that addresses these challenges. From the large sample of SNe discovered and measured by the Dark Energy Survey (DES), we cull the sample to only supernovae (SNe) located in luminous red galaxies (LRGs). For these galaxies, photometric redshift estimates are expected to be accurate to a standard deviation of 0.02x(1+z). In addition, only Type Ia Supernovae are expected to exist in these galaxies, thereby providing a pure SNIa sample. Furthermore, we can combine this high-redshift sample with a low-redshift SN sample of only SNe located in LRGs, thereby producing a sample that is less sensitive to host galaxy relations because the host galaxy demographic is consistent across the redshift range. We find that the current DES sample has ~250 SNe in LRGs, a similar amount to current SNIa samples used to measure cosmological parameters. We present our method to produce a photometric-only Hubble diagram and measure cosmological parameters. Finally, we discuss systematic uncertainties from this approach, and forecast constraints from this method for LSST, which should have a sample roughly 200 times as large.

  17. Speaker identification for the improvement of the security communication between law enforcement units

    NASA Astrophysics Data System (ADS)

    Tovarek, Jaromir; Partila, Pavol

    2017-05-01

    This article discusses speaker identification for improving the security of communication between law enforcement units. The main task of this research was to develop a text-independent speaker identification system which can be used for real-time recognition. This system is designed for identification in the open set, meaning that the unknown speaker can be anyone. Communication itself is secured, but we have to check the authorization of the communicating parties. We have to decide whether the unknown speaker is authorized for the given action. The calls are recorded by an IP telephony server and then these recordings are evaluated using classification. If the system evaluates that the speaker is not authorized, it sends a warning message to the administrator. This message can indicate, for example, a stolen phone or another unusual situation. The administrator then performs the appropriate actions. Our proposed system uses a multilayer neural network for classification and it consists of three layers (input layer, hidden layer, and output layer). The number of neurons in the input layer corresponds to the length of the speech features. The output layer then represents the classified speakers. The artificial neural network classifies the speech signal frame by frame, but the final decision is made over the complete record. This rule substantially increases the accuracy of the classification. Input data for the neural network are thirteen Mel-frequency cepstral coefficients, which describe the behavior of the vocal tract. These parameters are the most used for speaker recognition. Parameters for training, testing and validation were extracted from recordings of authorized users. Recording conditions for training data correspond with the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is the system developed for text-independent speaker identification which is applied to secure communication between law enforcement units.
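
    The abstract states that frames are classified individually but the decision is taken over the whole recording. A minimal sketch of one way to do this is to average the per-frame output of the network for each enrolled speaker and pick the highest mean. The rejection threshold for the open-set case, the function name, and the speaker names are assumptions made for illustration; the abstract does not say how unauthorized speakers are detected.

```python
def decide_speaker(frame_posteriors, speakers, min_mean_posterior=0.6):
    """Average per-frame posteriors over a recording and pick one speaker.

    Returns (speaker, mean posterior); speaker is None when the best mean
    falls below the assumed open-set rejection threshold.
    """
    if not frame_posteriors:
        raise ValueError("recording contains no frames")
    n_frames = len(frame_posteriors)
    means = [
        sum(frame[k] for frame in frame_posteriors) / n_frames
        for k in range(len(speakers))
    ]
    best = max(range(len(speakers)), key=means.__getitem__)
    if means[best] < min_mean_posterior:
        return None, means[best]  # treat as an unknown (unauthorized) speaker
    return speakers[best], means[best]


speakers = ["alice", "bob"]  # hypothetical enrolled users

# Four frames; the second frame alone would be misclassified as "bob".
recording = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.7, 0.3]]
accepted, accepted_score = decide_speaker(recording, speakers)

# An ambiguous recording that no enrolled speaker explains well.
ambiguous = [[0.5, 0.5], [0.55, 0.45]]
rejected, rejected_score = decide_speaker(ambiguous, speakers)
print(accepted, rejected)
```

    Averaging over the recording lets a few misclassified frames be outvoted, which is consistent with the accuracy gain the abstract attributes to deciding over the complete record.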

  18. Metabolomics for organic food authentication: Results from a long-term field study in carrots.

    PubMed

    Cubero-Leon, Elena; De Rudder, Olivier; Maquet, Alain

    2018-01-15

    Increasing demand for organic products and their premium prices make them an attractive target for fraudulent malpractices. In this study, a large-scale comparative metabolomics approach was applied to investigate the effect of the agronomic production system on the metabolite composition of carrots and to build statistical models for prediction purposes. Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) was applied successfully to predict the origin of the agricultural system of the harvested carrots on the basis of features determined by liquid chromatography-mass spectrometry. When the training set used to build the OPLS-DA models contained samples representative of each harvest year, the models were able to classify unknown samples correctly (100% correct classification). If a harvest year was left out of the training sets and used for predictions, the correct classification rates achieved ranged from 76% to 100%. The results therefore highlight the potential of metabolomic fingerprinting for organic food authentication purposes. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  19. Streptococcus caprae sp. nov., isolated from Iberian ibex (Capra pyrenaica hispanica).

    PubMed

    Vela, A I; Mentaberre, G; Lavín, S; Domínguez, L; Fernández-Garayzábal, J F

    2016-01-01

    Biochemical and molecular genetic studies were performed on a novel Gram-stain-positive, catalase-negative, coccus-shaped organism isolated from tonsil samples of two Iberian ibexes. The micro-organism was identified as a streptococcal species based on its cellular, morphological and biochemical characteristics. 16S rRNA gene sequence comparison studies confirmed its identification as a member of the genus Streptococcus, but the organism did not correspond to any species of this genus. The nearest phylogenetic relative of the unknown coccus from ibex was Streptococcus porci 2923-03T (96.6 % 16S rRNA gene sequence similarity). Analysis based on rpoB and sodA gene sequences revealed sequence similarity values lower than 86.0 and 83.8 %, respectively, from the type strains of recognized Streptococcus species. The novel bacterial isolate was distinguished from Streptococcus porci and other Streptococcus species using biochemical tests. Based on both phenotypic and phylogenetic findings, it is proposed that the unknown bacterium be classified as representing a novel species of the genus Streptococcus, for which the name Streptococcus caprae sp. nov. is proposed. The type strain is DICM07-02790-1CT ( = CECT 8872T = CCUG 67170T).

  20. Ensemble positive unlabeled learning for disease gene identification.

    PubMed

    Yang, Peng; Li, Xiaoli; Chua, Hon-Nian; Kwoh, Chee-Keong; Ng, See-Kiong

    2014-01-01

    An increasing number of genes have been experimentally confirmed in recent years as causative genes to various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data and a single machine learning predictor prone to bias caused by inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method EPU is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. Through integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms to achieve more accurate and robust disease gene predictions. 
In the future, our EPU method provides an effective framework to integrate the additional biological and computational resources for better disease gene predictions.

  1. Using Neural Networks to Classify Digitized Images of Galaxies

    NASA Astrophysics Data System (ADS)

    Goderya, S. N.; McGuire, P. C.

    2000-12-01

    Automated classification of galaxies into Hubble types is of paramount importance for studying the large-scale structure of the Universe, particularly as survey projects like the Sloan Digital Sky Survey complete their data acquisition of one million galaxies. At present it is not possible to find robust and efficient artificial intelligence based galaxy classifiers. In this study we summarize progress made in the development of automated galaxy classifiers using neural networks as machine learning tools. We explore the Bayesian linear algorithm, the higher order probabilistic network, the multilayer perceptron neural network and the Support Vector Machine classifier. The performance of any machine classifier is dependent on the quality of the parameters that characterize the different groups of galaxies. Our effort is to develop geometric and invariant moment based parameters as input to the machine classifiers instead of the raw pixel data. Such an approach reduces the dimensionality of the classifier considerably, removes the effects of scaling and rotation, and makes it easier to solve for the unknown parameters in the galaxy classifier. To judge the quality of training and classification we develop the concept of Matthews coefficients for the galaxy classification community. Matthews coefficients are single numbers that quantify classifier performance even with unequal prior probabilities of the classes.
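
    The coefficient named in this abstract is presumably the Matthews correlation coefficient. The abstract gives no formula, so the following is only a minimal Python sketch of the standard two-class definition; the function name and the example counts are hypothetical, and a multi-class galaxy classifier would need the generalized form.

```python
import math


def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient for a two-class confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced case: 10 rare-class objects, 90 common-class objects.
tp, fn, tn, fp = 5, 5, 90, 0
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("accuracy:", accuracy)
print("Matthews coefficient:", matthews_coefficient(tp, tn, fp, fn))
```

    In this hypothetical case half of the rare class is missed, which plain accuracy largely hides because the common class dominates; the coefficient is designed to stay informative under such unequal class priors.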

  2. A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques

    NASA Astrophysics Data System (ADS)

    Techo, Jakkrit; Nattee, Cholwich; Theeramunkong, Thanaruk

    While classification techniques can be applied for automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown word candidates is far smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighing each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naïve Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected, while it reaches 97.26±0.26% when the top-ten candidates are considered, that is, 8.45% and 6.79% improvement over the conventional record-based naïve Bayes classifier and the vanilla version. Another result, obtained using only the best features, shows 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively. These are 3.97% and 9.78% improvements over naïve Bayes and the vanilla version. Finally, an error analysis is given.

  3. Annotation of Sequence Variants in Cancer Samples: Processes and Pitfalls for Routine Assays in the Clinical Laboratory.

    PubMed

    Lee, Lobin A; Arvai, Kevin J; Jones, Dan

    2015-07-01

    As DNA sequencing of multigene panels becomes routine for cancer samples in the clinical laboratory, an efficient process for classifying variants has become more critical. Determining which germline variants are significant for cancer disposition and which somatic mutations are integral to cancer development or therapy response remains difficult, even for well-studied genes such as BRCA1 and TP53. We compare and contrast the general principles and lines of evidence commonly used to distinguish the significance of cancer-associated germline and somatic genetic variants. The factors important in each step of the analysis pipeline are reviewed, as are some of the publicly available annotation tools. Given the range of indications and uses of cancer sequencing assays, including diagnosis, staging, prognostication, theranostics, and residual disease detection, the need for flexible methods for scoring of variants is discussed. The usefulness of protein prediction tools and multimodal risk-based or Bayesian approaches are highlighted. Using TET2 variants encountered in hematologic neoplasms, several examples of this multifactorial approach to classifying sequence variants of unknown significance are presented. Although there are still significant gaps in the publicly available data for many cancer genes that limit the broad application of explicit algorithms for variant scoring, the elements of a more rigorous model are outlined. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  4. Authentication of bee pollen grains in bright-field microscopy by combining one-class classification techniques and image processing.

    PubMed

    Chica, Manuel

    2012-11-01

    A novel method for authenticating pollen grains in bright-field microscopic images is presented in this work. The method is useful in many application fields such as the beekeeping sector, where laboratory experts need to distinguish fraudulent bee pollen samples from known local pollen types. Our system is based on image processing and one-class classification to reject unknown pollen grain objects. The latter classification technique allows us to tackle the major difficulty of the problem: the existence of many possible fraudulent pollen types and the impossibility of modeling all of them. Different one-class classification paradigms are compared to find the most suitable technique for solving the problem. In addition, feature selection algorithms are applied to reduce the complexity and increase the accuracy of the models. For each local pollen type, a one-class classifier is trained and aggregated into a multiclassifier model. This multiclassification scheme combines the output of all the one-class classifiers into a single final response. The proposed method is validated by authenticating pollen grains belonging to different Spanish bee pollen types. The overall accuracy of the system in classifying fraudulent microscopic pollen grain objects is 92.3%. The system is able to rapidly reject pollen grains that belong to nonlocal pollen types, reducing laboratory work and effort. The number of possible applications of this authentication method in the microscopy research field is unlimited. Copyright © 2012 Wiley Periodicals, Inc.
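
    The scheme of one one-class classifier per known type, combined into a single response that can reject unknowns, can be sketched as follows in Python. This is an illustration under stated assumptions, not the paper's method: the centroid-and-radius model, the class name `CentroidOneClass`, the function `authenticate`, and the two-dimensional feature values are all hypothetical stand-ins for the one-class paradigms and image features the abstract leaves unspecified.

```python
import math


class CentroidOneClass:
    """Toy one-class model: accept points within a radius of the class centroid."""

    def __init__(self, training_points):
        n = len(training_points)
        dim = len(training_points[0])
        self.centroid = tuple(
            sum(p[i] for p in training_points) / n for i in range(dim)
        )
        # Radius = distance of the farthest training point (assumed rule).
        self.radius = max(math.dist(p, self.centroid) for p in training_points)

    def accepts(self, x):
        return math.dist(x, self.centroid) <= self.radius


def authenticate(models, x):
    """Return the closest accepting class name, or 'rejected' if none accepts."""
    accepted = [
        (math.dist(x, model.centroid), name)
        for name, model in models.items()
        if model.accepts(x)
    ]
    return min(accepted)[1] if accepted else "rejected"


# Hypothetical two-feature descriptions of two known local pollen types.
models = {
    "type_a": CentroidOneClass([(0, 0), (1, 0), (0, 1)]),
    "type_b": CentroidOneClass([(10, 10), (11, 10), (10, 11)]),
}
print(authenticate(models, (0.3, 0.3)))
print(authenticate(models, (5, 5)))
```

    The design point is that no model of the fraudulent types is ever trained: a grain is rejected simply because no known-type model accepts it.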

  5. Soy sauce classification by geographic region and fermentation based on artificial neural network and genetic algorithm.

    PubMed

    Xu, Libin; Li, Yang; Xu, Ning; Hu, Yong; Wang, Chao; He, Jianjun; Cao, Yueze; Chen, Shigui; Li, Dongsheng

    2014-12-24

    This work demonstrated the possibility of using artificial neural networks to classify soy sauce from China. The aroma profiles of different soy sauce samples were differentiated using headspace solid-phase microextraction. The soy sauce samples were analyzed by gas chromatography-mass spectrometry, and 22 and 15 volatile aroma compounds were selected for sensitivity analysis to classify the samples by fermentation and geographic region, respectively. The 15 selected samples can be classified by fermentation and geographic region with a prediction success rate of 100%. Furans and phenols represented the variables with the greatest contribution in classifying soy sauce samples by fermentation and geographic region, respectively.

  6. An expert system shell for inferring vegetation characteristics

    NASA Technical Reports Server (NTRS)

    Harrison, P. Ann; Harrison, Patrick R.

    1993-01-01

    The NASA VEGetation Workbench (VEG) is a knowledge based system that infers vegetation characteristics from reflectance data. VEG is described in detail in several references. The first generation version of VEG was extended. In the first year of this contract, an interface to a file of unknown cover type data was constructed. An interface that allowed the results of VEG to be written to a file was also implemented. A learning system that learned class descriptions from a data base of historical cover type data and then used the learned class descriptions to classify an unknown sample was built. This system had an interface that integrated it into the rest of VEG. The VEG subgoal PROPORTION.GROUND.COVER was completed and a number of additional techniques that inferred the proportion ground cover of a sample were implemented. This work was previously described. The work carried out in the second year of the contract is described. The historical cover type database was removed from VEG and stored as a series of flat files that are external to VEG. An interface to the files was provided. The framework and interface for two new VEG subgoals that estimate the atmospheric effect on reflectance data were built. A new interface that allows the scientist to add techniques to VEG without assistance from the developer was designed and implemented. A prototype Help System that allows the user to get more information about each screen in the VEG interface was also added to VEG.

  7. [Application of precursor ion scanning method in rapid screening of illegally added phosphodiesterase-5 inhibitors and their unknown derivatives in Chinese traditional patent medicines and health foods].

    PubMed

    Sun, Jing; Cao, Ling; Feng, Youlong; Tan, Li

    2014-11-01

    Compounds with similar structures often have similar pharmacological activities, so a growing trend in illegal addition is to synthesize new derivatives of effective drugs in order to evade statutory tests. This makes illegal addition harder to police; however, modified derivatives usually share similar product ions, which allows precursor ion scanning. In this work, the precursor ion scanning mode of a triple quadrupole mass spectrometer was first applied to screen for illegally added drugs in complex matrices such as Chinese traditional patent medicines and health foods. Phosphodiesterase-5 inhibitors were used as experimental examples. Through analysis of the structures and mass spectral characteristics of the compounds, the phosphodiesterase-5 inhibitors were classified, and their common product ions were identified by full product-ion scans of typical compounds. A high performance liquid chromatography-tandem mass spectrometry (HPLC-MS/MS) method with precursor ion scanning mode was then established based on optimized MS parameters. The effects of the mass parameters and the choice of fragment ions were also studied. The method was applied to actual samples and further refined. The results demonstrated that this method meets the need for rapid screening of unknown derivatives of phosphodiesterase-5 inhibitors in complex matrices and helps prevent unknown derivatives from going undetected. The method shows advantages in sensitivity, specificity and efficiency, and merits further investigation.

  8. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Because of the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, and ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest is analysed. Among the various decision tree classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. The Immune System as a Model for Pattern Recognition and Classification

    PubMed Central

    Carter, Jerome H.

    2000-01-01

    Objective: To design a pattern recognition engine based on concepts derived from mammalian immune systems. Design: A supervised learning system (Immunos-81) was created using software abstractions of T cells, B cells, antibodies, and their interactions. Artificial T cells control the creation of B-cell populations (clones), which compete for recognition of “unknowns.” The B-cell clone with the “simple highest avidity” (SHA) or “relative highest avidity” (RHA) is considered to have successfully classified the unknown. Measurement: Two standard machine learning data sets, consisting of eight nominal and six continuous variables, were used to test the recognition capabilities of Immunos-81. The first set (Cleveland), consisting of 303 cases of patients with suspected coronary artery disease, was used to perform a ten-way cross-validation. After completing the validation runs, the Cleveland data set was used as a training set prior to presentation of the second data set, consisting of 200 unknown cases. Results: For cross-validation runs, correct recognition using SHA ranged from a high of 96 percent to a low of 63.2 percent. The average correct classification for all runs was 83.2 percent. Using the RHA metric, 11.2 percent were labeled “too close to determine” and no further attempt was made to classify them. Of the remaining cases, 85.5 percent were correctly classified. When the second data set was presented, correct classification occurred in 73.5 percent of cases when SHA was used and in 80.3 percent of cases when RHA was used. Conclusions: The immune system offers a viable paradigm for the design of pattern recognition systems. Additional research is required to fully exploit the nuances of immune computation. PMID:10641961

  10. Classification of toxicity effects of biotransformed hepatic drugs using whale optimized support vector machines.

    PubMed

    Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella

    2017-04-01

    Measuring toxicity is an important step in drug development. Nevertheless, the current experimental methods used to estimate drug toxicity are expensive and time-consuming, which makes them unsuitable for large-scale evaluation of drug toxicity in the early stage of drug development. Hence, there is a high demand for computational models that can predict drug toxicity risks. In this study, we used a dataset that consists of 553 drugs that are biotransformed in the liver. The toxic effects were calculated for the current data, namely, mutagenic, tumorigenic, irritant and reproductive effects. Each drug is represented by 31 chemical descriptors (features). The proposed model consists of three phases. In the first phase, the most discriminative subset of features is selected using rough set-based methods to reduce the classification time while improving the classification performance. In the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling and Synthetic Minority Oversampling Technique (SMOTE), BorderLine SMOTE and Safe Level SMOTE are used to solve the problem of the imbalanced dataset. In the third phase, the Support Vector Machines (SVM) classifier is used to classify an unknown drug as toxic or non-toxic. SVM parameters such as the penalty parameter and kernel parameter have a great impact on the classification accuracy of the model. In this paper, the Whale Optimization Algorithm (WOA) is proposed to optimize the parameters of SVM, so that the classification error can be reduced. The experimental results proved that the proposed model achieved high sensitivity to all toxic effects. Overall, the high sensitivity of the WOA+SVM model indicates that it could be used for the prediction of drug toxicity in the early stage of drug development. Copyright © 2017 Elsevier Inc. All rights reserved.
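
    Of the sampling methods listed in this abstract, Random Over-Sampling is the simplest to show. The Python sketch below is a generic illustration, not the authors' code: the function name `random_oversample`, the fixed seed, and the toy descriptor values are hypothetical. It duplicates randomly chosen minority-class samples until every class matches the largest one.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority samples until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw with replacement from the class's own samples.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical data: 6 non-toxic drugs and 2 toxic drugs, two descriptors each.
samples = [(0.1, 1.0), (0.2, 0.9), (0.3, 1.1), (0.2, 1.2),
           (0.1, 0.8), (0.4, 1.0), (2.0, 3.1), (2.2, 2.9)]
labels = ["non-toxic"] * 6 + ["toxic"] * 2

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

    Because the added points are exact copies, this method adds no new information about the minority class; SMOTE and its variants, also named in the abstract, interpolate new synthetic points instead.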

  11. Advances in Doppler recognition for ground moving target indication

    NASA Astrophysics Data System (ADS)

    Kealey, Paul G.; Jahangir, Mohammed

    2006-05-01

    Ground Moving Target Indication (GMTI) radar provides a day/night, all-weather, wide-area surveillance capability to detect moving vehicles and personnel. Current GMTI radar sensors are limited to only detecting and tracking targets. The exploitation of GMTI data would be greatly enhanced by a capability to accurately recognize the detections as significant classes of target. Doppler classification exploits the differential internal motion of targets, e.g. due to the tracks, limbs and rotors. Recently, the QinetiQ Bayesian Doppler classifier has been extended to include a helicopter class in addition to wheeled, tracked and personnel classes. This paper presents the performance for these four classes using a traditional low-resolution GMTI surveillance waveform with an experimental radar system. We have determined the utility of an "unknown output decision" for enhancing the accuracy of the declared target classes. A confidence method has been derived, using a threshold of the difference in certainties, to assign uncertain classifications into an "unknown class". The trade-off between fraction of targets declared and accuracy of the classifier has been measured. Determining the operating envelope of a Doppler classification algorithm requires a detailed understanding of the Signal-to-Noise Ratio (SNR) performance of the algorithm. In this study the SNR dependence of the QinetiQ classifier has been determined.
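
    The "unknown output decision" can be sketched as a threshold on the gap between class certainties. The abstract does not say which certainties are differenced, so the Python sketch below assumes the two highest ones; the function name, the threshold value, and the example certainties are hypothetical.

```python
def classify_with_unknown(certainties, threshold):
    """Declare the top class only if it beats the runner-up by `threshold`.

    certainties: dict mapping class name -> certainty (e.g. a posterior).
    Assumption: the 'difference in certainties' is top minus second-best.
    """
    ranked = sorted(certainties.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return ranked[0][0]
    (best_name, best), (_, second) = ranked[0], ranked[1]
    return best_name if best - second >= threshold else "unknown"


clear = {"wheeled": 0.70, "tracked": 0.20, "personnel": 0.05, "helicopter": 0.05}
ambiguous = {"wheeled": 0.40, "tracked": 0.35, "personnel": 0.15, "helicopter": 0.10}

print(classify_with_unknown(clear, threshold=0.3))
print(classify_with_unknown(ambiguous, threshold=0.3))
```

    Raising the threshold moves more detections into the unknown class, which is the trade-off between fraction of targets declared and accuracy that the abstract describes.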

  12. Does the Presence of Scrapie Affect the Ability of Current Statutory Discriminatory Tests To Detect the Presence of Bovine Spongiform Encephalopathy?

    PubMed Central

    Chaplin, M. J.; Vickery, C. M.; Simon, S.; Davis, L.; Denyer, M.; Lockey, R.; Stack, M. J.; O'Connor, M. J.; Bishop, K.; Gough, K. C.; Maddison, B. C.; Thorne, L.; Spiropoulos, J.

    2015-01-01

    Current European Commission (EC) surveillance regulations require discriminatory testing of all transmissible spongiform encephalopathy (TSE)-positive small ruminant (SR) samples in order to classify them as bovine spongiform encephalopathy (BSE) or non-BSE. This requires a range of tests, including characterization by bioassay in mouse models. Since 2005, naturally occurring BSE has been identified in two goats. It has also been demonstrated that more than one distinct TSE strain can coinfect a single animal in natural field situations. This study assesses the ability of the statutory methods as listed in the regulation to identify BSE in a blinded series of brain samples, in which ovine BSE and distinct isolates of scrapie are mixed at various ratios ranging from 99% to 1%. Additionally, these current statutory tests were compared with a new in vitro discriminatory method, which uses serial protein misfolding cyclic amplification (sPMCA). Western blotting consistently detected 50% BSE within a mixture, but at higher dilutions it had variable success. The enzyme-linked immunosorbent assay (ELISA) method consistently detected BSE only when it was present as 99% of the mixture, with variable success at higher dilutions. Bioassay and sPMCA reported BSE in all samples where it was present, down to 1%. sPMCA also consistently detected the presence of BSE in mixtures at 0.1%. While bioassay is the only validated method that allows comprehensive phenotypic characterization of an unknown TSE isolate, the sPMCA assay appears to offer a fast and cost-effective alternative for the screening of unknown isolates when the purpose of the investigation was solely to determine the presence or absence of BSE. PMID:26041899

  13. A methodology for the generation of the 2-D map from unknown navigation environment by traveling a short distance

    NASA Technical Reports Server (NTRS)

    Bourbakis, N.; Sarkar, D.

    1994-01-01

    A technique for generation of a 2-D space map by traveling a short distance is described. The space to be mapped can be classified as: (1) space without obstacles, (2) space with stationary obstacles, and (3) space with moving obstacles. This paper presents the methodology used to generate a 2-D map of an unknown navigation space. The ability to minimize the redundancy during traveling and maximize the confidence function for generation of the map are advantages of this technique.

  14. Consensus Classification Using Non-Optimized Classifiers.

    PubMed

    Brownfield, Brett; Lemos, Tony; Kalivas, John H

    2018-04-03

    Classifying samples into categories is a common problem in analytical chemistry and other fields. Classification is usually based on only one method, but numerous classifiers are available, with some being complex, such as neural networks, and others being simple, such as k nearest neighbors. Regardless, most classification schemes require optimization of one or more tuning parameters for best classification accuracy, sensitivity, and specificity. A process not requiring exact selection of tuning parameter values would be useful. To improve classification, several ensemble approaches have been used in past work to combine classification results from multiple optimized single classifiers. The collection of classifications for a particular sample is then combined by a fusion process such as majority vote to form the final classification. Presented in this Article is a method to classify a sample by combining multiple classification methods without specifically classifying the sample by each method, that is, the classification methods are not optimized. The approach is demonstrated on three analytical data sets. The first is a beer authentication set with samples measured on five instruments, allowing fusion of multiple instruments in three ways. The second data set is composed of textile samples from three classes based on Raman spectra. This data set is used to demonstrate the ability to classify simultaneously with different data preprocessing strategies, thereby reducing the need to determine the ideal preprocessing method, a common prerequisite for accurate classification. The third data set contains three wine cultivars for three classes measured at 13 unique chemical and physical variables. In all cases, fusion of nonoptimized classifiers improves classification. Also presented are atypical uses of Procrustes analysis and extended inverted signal correction (EISC) for distinguishing sample similarities to respective classes.
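
    The majority-vote fusion that this abstract attributes to past ensemble work can be sketched in a few lines of Python. This illustrates only that baseline fusion step, not the consensus method the Article itself proposes; the function name, the tie-handling rule (returning `None`), and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(predicted_labels):
    """Fuse per-classifier labels for one sample; return None on a tie."""
    counts = Counter(predicted_labels).most_common()
    best_count = counts[0][1]
    winners = [label for label, count in counts if count == best_count]
    return winners[0] if len(winners) == 1 else None


# Hypothetical labels from three single classifiers for one beer sample.
print(majority_vote(["authentic", "adulterated", "authentic"]))
```

    Using an odd number of classifiers for a two-class problem avoids ties; with more classes a tie-breaking rule has to be chosen explicitly, which is why the sketch returns `None` instead of guessing.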

  15. Liquid-Based Medium Used to Prepare Cytological Breast Nipple Fluid Improves the Quality of Cellular Samples Automatic Collection

    PubMed Central

    Zonta, Marco Antonio; Velame, Fernanda; Gema, Samara; Filassi, Jose Roberto; Longatto-Filho, Adhemar

    2014-01-01

    Background: Breast cancer is the second cause of death in women worldwide. The spontaneous breast nipple discharge may contain cells that can be analyzed for malignancy. Halo® Mamo Cyto Test (HMCT) was recently developed as an automated system indicated to aspirate cells from the breast ducts. The objective of this study was to standardize the methodology of sampling and sample preparation of nipple discharge obtained by the automated method Halo breast test and perform cytological evaluation in samples preserved in liquid medium (SurePath™). Methods: We analyzed 564 nipple fluid samples, from women between 20 and 85 years old, without history of breast disease and neoplasia, no pregnancy, and without gynecologic medical history, collected by HMCT method and preserved in two different vials with solutions for transport. Results: From 306 nipple fluid samples from method 1, 199 (65%) were classified as unsatisfactory (class 0), 104 (34%) samples were classified as benign findings (class II), and three (1%) were classified as undetermined to neoplastic cells (class III). From 258 samples analyzed in method 2, 127 (49%) were classified as class 0, 124 (48%) were classified as class II, and seven (2%) were classified as class III. Conclusion: Our study suggests an improvement in the quality and quantity of cellular samples when the association of the two methodologies is performed, Halo breast test and the method in liquid medium. PMID:29147397

  16. Improving Bayesian credibility intervals for classifier error rates using maximum entropy empirical priors.

    PubMed

    Gustafsson, Mats G; Wallman, Mikael; Wickenberg Bolin, Ulrika; Göransson, Hanna; Fryknäs, M; Andersson, Claes R; Isaksson, Anders

    2010-06-01

    Successful use of classifiers that learn to make decisions from a set of patient examples requires robust methods for performance estimation. Recently many promising approaches for determining an upper bound on the error rate of a single classifier have been reported, but the Bayesian credibility interval (CI) obtained from a conventional holdout test still delivers one of the tightest bounds. The conventional Bayesian CI becomes unacceptably large in real world applications where the test set sizes are less than a few hundred. The source of this problem is the fact that the CI is determined exclusively by the result on the test examples. In other words, no information at all is provided by the uniform prior density distribution employed, which reflects a complete lack of prior knowledge about the unknown error rate. Therefore, the aim of the study reported here was to study a maximum entropy (ME) based approach to improved prior knowledge and Bayesian CIs, demonstrating its relevance for biomedical research and clinical practice. It is demonstrated how a refined non-uniform prior density distribution can be obtained by means of the ME principle using empirical results from a few designs and tests using non-overlapping sets of examples. Experimental results show that ME based priors improve the CIs when applied to four quite different simulated and two real world data sets. An empirically derived ME prior seems promising for improving the Bayesian CI for the unknown error rate of a designed classifier. Copyright 2010 Elsevier B.V. All rights reserved.
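
    The conventional holdout interval that this abstract takes as its baseline can be sketched numerically. Assuming the test errors are binomially distributed and the prior on the error rate is uniform, the posterior after k errors in n test examples is Beta(k + 1, n − k + 1). The Python sketch below computes an equal-tailed interval from that posterior on a grid; the function name, the grid size, the equal-tailed choice, and the example counts are assumptions, and the paper's maximum entropy prior is not implemented here.

```python
import math


def uniform_prior_credible_interval(errors, n_test, level=0.95, grid_points=20001):
    """Equal-tailed credible interval for an error rate under a uniform prior.

    The posterior is Beta(errors + 1, n_test - errors + 1); its quantiles are
    found by accumulating the normalized density over grid midpoints.
    """
    a, b = errors + 1, n_test - errors + 1
    xs = [(i + 0.5) / grid_points for i in range(grid_points)]
    log_density = [(a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in xs]
    peak = max(log_density)
    weights = [math.exp(v - peak) for v in log_density]  # avoids underflow
    total = sum(weights)

    lower_tail = (1 - level) / 2
    upper_tail = 1 - lower_tail
    cumulative, lower, upper = 0.0, None, None
    for x, w in zip(xs, weights):
        cumulative += w / total
        if lower is None and cumulative >= lower_tail:
            lower = x
        if cumulative >= upper_tail:
            upper = x
            break
    return lower, upper


# Hypothetical holdout results with the same observed error rate of 10%.
print(uniform_prior_credible_interval(5, 50))
print(uniform_prior_credible_interval(50, 500))
```

    Comparing the two calls shows the problem the abstract describes: with the same observed error rate, the interval from the smaller test set is much wider, because the uniform prior contributes no information.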

  17. Malware Analysis Using Visualized Image Matrices

    PubMed Central

    Im, Eul Gyu

    2014-01-01

    This paper proposes a novel malware visual analysis method that contains not only a visualization method to convert binary files into images, but also a similarity calculation method between these images. The proposed method generates RGB-colored pixels on image matrices using the opcode sequences extracted from malware samples and calculates the similarities for the image matrices. Particularly, our proposed methods are available for packed malware samples by applying them to the execution traces extracted through dynamic analysis. When the images are generated, we can reduce the overheads by extracting the opcode sequences only from the blocks that include the instructions related to staple behaviors such as functions and application programming interface (API) calls. In addition, we propose a technique that generates a representative image for each malware family in order to reduce the number of comparisons for the classification of unknown samples and the colored pixel information in the image matrices is used to calculate the similarities between the images. Our experimental results show that the image matrices of malware can effectively be used to classify malware families both statically and dynamically with accuracy of 0.9896 and 0.9732, respectively. PMID:25133202

  18. Mystery Boxes: Helping Children Improve Their Reasoning

    ERIC Educational Resources Information Center

    Rule, Audrey C.

    2007-01-01

    This guest editorial describes ways teachers can use guessing games about an unknown item in a "mystery box" to help children improve their abilities to listen to others, recall information, ask purposeful questions, classify items by class, make inferences, synthesize information, and draw conclusions. The author presents information…

  19. Supervised Detection of Anomalous Light Curves in Massive Astronomical Catalogs

    NASA Astrophysics Data System (ADS)

    Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

    2014-09-01

    The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered as an outlier insofar as it has a low joint probability. By leaving out one of the classes on the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.

  1. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

    PubMed Central

    Thanh Noi, Phan; Kappas, Martin

    2017-01-01

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909

  2. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery.

    PubMed

    Thanh Noi, Phan; Kappas, Martin

    2017-12-22

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.
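
    The overall accuracy (OA) reported above is the share of correctly classified pixels among all evaluated pixels. A minimal sketch of that calculation follows; the function name and the confusion-matrix counts are illustrative assumptions and are not taken from the study.

```python
def overall_accuracy(confusion):
    """Overall accuracy (OA): correctly classified pixels / all pixels.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical 3-class confusion matrix (illustrative counts only).
example = [
    [50, 2, 1],
    [3, 45, 2],
    [0, 4, 43],
]
oa = overall_accuracy(example)
```

    OA alone can hide poor performance on rare classes, which is one reason studies of this kind also vary the balance of the training samples.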

  3. An expert support system for breast cancer diagnosis using color wavelet features.

    PubMed

    Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J

    2012-10-01

    Breast cancer diagnosis can be done through the pathologic assessment of breast tissue samples obtained by techniques such as core needle biopsy. The result of the pathologist's analysis of this sample is crucial for the breast cancer patient. In this paper, nuclei of tissue samples are investigated after decomposition by means of the Log-Gabor wavelet on the HSV color domain, and an algorithm is developed to compute the color wavelet features. These features are used for breast cancer diagnosis using a Support Vector Machine (SVM) classifier algorithm. A properly trained SVM can correctly classify patterns, which makes it particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with other multivariate classifiers such as the Naïve Bayes classifier and Artificial Neural Network. The overall accuracy of the proposed method using the SVM classifier will be further useful for automation in cancer diagnosis.

  4. Using pseudoalignment and base quality to accurately quantify microbial community composition

    PubMed Central

    Novembre, John

    2018-01-01

    Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with different length reference sequences, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies. PMID:29659582

  5. Multi-color space threshold segmentation and self-learning k-NN algorithm for surge test EUT status identification

    NASA Astrophysics Data System (ADS)

    Huang, Jian; Liu, Gui-xiong

    2016-09-01

    The identification of targets varies in different surge tests. A multi-color space threshold segmentation and self-learning k-nearest neighbor (k-NN) algorithm for identifying equipment under test (EUT) status was proposed, because identification by feature matching required training new patterns before every test. First, the color space used for segmentation (L*a*b*, hue saturation lightness (HSL), or hue saturation value (HSV)) was selected according to the high luminance points ratio and white luminance points ratio of the image. Second, the unknown class sample S_r was classified by the k-NN algorithm with training set T_z according to the feature vector, which was formed from the number of pixels, eccentricity ratio, compactness ratio, and Euler's numbers. Last, when the classification confidence coefficient equaled k, S_r was added as one sample of the pre-training set T_z'. The training set T_z was increased to T_{z+1} by T_z' once T_z' was saturated. In nine series of illuminant, indicator light, screen, and disturbance samples (a total of 21600 frames), the algorithm achieved a 98.65% identification accuracy, and five groups of samples were selected to enlarge the training set from T_0 to T_5 automatically.
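
    The self-learning step described above can be sketched as follows. This is a simplified reading, not the authors' implementation: it assumes Euclidean distance, treats "confidence coefficient equals k" as a unanimous vote among the k nearest neighbours, and uses a hypothetical `batch_size` to stand in for the point at which the pre-training set is "saturated".

```python
from collections import Counter
import math


def knn_vote(train, query, k):
    """Return (label, votes): the majority label among the k nearest
    training samples and the number of neighbours that voted for it."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    label, votes = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, votes


def classify_and_learn(train, pending, query, k, batch_size):
    """Classify `query`; keep it as a pre-training sample only when all
    k neighbours agree. Once `pending` holds `batch_size` samples
    (an assumed saturation rule), merge them into `train`."""
    label, votes = knn_vote(train, query, k)
    if votes == k:
        pending.append((query, label))
        if len(pending) >= batch_size:
            train.extend(pending)
            pending.clear()
    return label


# Hypothetical 2-D feature vectors and status labels.
train = [((0.0, 0.0), "off"), ((0.1, 0.2), "off"), ((0.2, 0.1), "off"),
         ((1.0, 1.0), "on"), ((0.9, 1.1), "on"), ((1.1, 0.9), "on")]
pending = []
first = classify_and_learn(train, pending, (0.05, 0.05), k=3, batch_size=2)
second = classify_and_learn(train, pending, (1.05, 1.0), k=3, batch_size=2)
```

    Requiring a unanimous vote before a sample is kept limits the risk of the training set absorbing its own misclassifications.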

  6. Urine cell-based DNA methylation classifier for monitoring bladder cancer.

    PubMed

    van der Heijden, Antoine G; Mengual, Lourdes; Ingelmo-Torres, Mercedes; Lozano, Juan J; van Rijt-van de Westerlo, Cindy C M; Baixauli, Montserrat; Geavlete, Bogdan; Moldoveanud, Cristian; Ene, Cosmin; Dinney, Colin P; Czerniak, Bogdan; Schalken, Jack A; Kiemeney, Lambertus A L M; Ribal, Maria J; Witjes, J Alfred; Alcaraz, Antonio

    2018-01-01

    Current standard methods used to detect and monitor bladder cancer (BC) are invasive or have low sensitivity. This study aimed to develop a urine methylation biomarker classifier for BC monitoring and validate this classifier in patients in follow-up for bladder cancer (PFBC). Voided urine samples (N = 725) from BC patients, controls, and PFBC were prospectively collected in four centers. Finally, 626 urine samples were available for analysis. DNA was extracted from the urinary cells and bisulfite modified, and methylation status was analyzed using pyrosequencing. Cytology was available from a subset of patients (N = 399). In the discovery phase, seven selected genes from the literature (CDH13, CFTR, NID2, SALL3, TMEFF2, TWIST1, and VIM2) were studied in 111 BC and 57 control samples. This training set was used to develop a gene classifier by logistic regression and was validated in 458 PFBC samples (173 with recurrence). A three-gene methylation classifier containing CFTR, SALL3, and TWIST1 was developed in the training set (AUC 0.874). The classifier achieved an AUC of 0.741 in the validation series. Cytology results were available for 308 samples from the validation set. Cytology achieved an AUC of 0.696, whereas the classifier in this subset of patients reached an AUC of 0.768. Combining the methylation classifier with cytology results achieved an AUC of 0.86 in the validation set, with a sensitivity of 96%, a specificity of 40%, and positive and negative predictive values of 56% and 92%, respectively. The combination of the three-gene methylation classifier and cytology results has high sensitivity and high negative predictive value in a real clinical scenario (PFBC). The proposed classifier is a useful test for predicting BC recurrence and decreasing the number of cystoscopies in the follow-up of BC patients. If only patients with a positive combined classifier result were cystoscopied, 36% of all cystoscopies could be prevented.
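
    Positive and negative predictive values depend on the recurrence rate in the tested group as well as on sensitivity and specificity. The sketch below shows the standard relationship. The sensitivity and specificity are the figures quoted above, but the prevalence is a hypothetical placeholder, because the abstract does not state the recurrence rate in the 308-sample cytology subset, so the result is not expected to reproduce the reported 56% and 92%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence,
    all expressed as fractions between 0 and 1."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# prevalence=0.40 is an assumed value chosen only for illustration.
ppv, npv = predictive_values(sensitivity=0.96, specificity=0.40, prevalence=0.40)
```

    A test with high sensitivity and low specificity, as here, tends to give a high NPV and a modest PPV, which is why it suits ruling out recurrence rather than confirming it.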

  7. Sustainability of bridge foundations using electrical resistivity imaging and induced polarization to support transportation safety.

    DOT National Transportation Integrated Search

    2014-04-01

    As of September 2007, there were 67,240 U.S. bridges in the National Bridge Inventory classified as having unknown foundations (FHWA 2008). The bridges spanning rivers are of critical importance due to the risks of potential scour. In fact, it is...

  8. A novel Bayesian framework for discriminative feature extraction in Brain-Computer Interfaces.

    PubMed

    Suk, Heung-Il; Lee, Seong-Whan

    2013-02-01

    As there has been a paradigm shift in the learning load from a human subject to a computer, machine learning has been considered as a useful tool for Brain-Computer Interfaces (BCIs). In this paper, we propose a novel Bayesian framework for discriminative feature extraction for motor imagery classification in an EEG-based BCI in which the class-discriminative frequency bands and the corresponding spatial filters are optimized by means of the probabilistic and information-theoretic approaches. In our framework, the problem of simultaneous spatiospectral filter optimization is formulated as the estimation of an unknown posterior probability density function (pdf) that represents the probability that a single-trial EEG of predefined mental tasks can be discriminated in a state. In order to estimate the posterior pdf, we propose a particle-based approximation method by extending a factored-sampling technique with a diffusion process. An information-theoretic observation model is also devised to measure discriminative power of features between classes. From the viewpoint of classifier design, the proposed method naturally allows us to construct a spectrally weighted label decision rule by linearly combining the outputs from multiple classifiers. We demonstrate the feasibility and effectiveness of the proposed method by analyzing the results and its success on three public databases.

  9. Protein classification using sequential pattern mining.

    PubMed

    Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I

    2006-01-01

    Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. Then a classifier uses the extracted sequential patterns for classifying proteins of unknown structure in the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
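
    The two accuracy figures above differ only in how many ranked candidates count as a hit: 36% when the single best fold must be correct, and up to 65% when the true fold may be anywhere among the three most probable. A minimal sketch of that top-k accuracy measure follows; the fold names and scores are hypothetical.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of samples whose true label is among the k
    highest-scoring candidates.

    scores: list of dicts mapping candidate label -> score.
    true_labels: list of the correct label for each sample.
    """
    hits = 0
    for class_scores, truth in zip(scores, true_labels):
        ranked = sorted(class_scores, key=class_scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Hypothetical scores for four proteins over three candidate folds.
scores = [
    {"fold_a": 0.50, "fold_b": 0.30, "fold_c": 0.20},
    {"fold_a": 0.40, "fold_b": 0.35, "fold_c": 0.25},
    {"fold_a": 0.60, "fold_b": 0.30, "fold_c": 0.10},
    {"fold_a": 0.10, "fold_b": 0.20, "fold_c": 0.70},
]
truth = ["fold_a", "fold_b", "fold_c", "fold_c"]
top1 = top_k_accuracy(scores, truth, k=1)
top2 = top_k_accuracy(scores, truth, k=2)
```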

  10. Assessment, origin, and implementation of breath volatile cancer markers

    PubMed Central

    Haick, Hossam; Broza, Yoav Y.; Mochalski, Pawel; Ruzsanyi, Vera; Amann, Anton

    2016-01-01

    A new non-invasive and potentially inexpensive frontier in the diagnosis of cancer relies on the detection of volatile organic compounds (VOCs) in exhaled breath samples. Breath can be sampled and analyzed in real-time, leading to fascinating and cost-effective clinical diagnostic procedures. Nevertheless, breath analysis is a very young field of research and faces challenges, mainly because the biochemical mechanisms behind the cancer-related VOCs are largely unknown. In this review, we present a list of 115 validated cancer-related VOCs published in the literature during the past decade, and classify them with respect to their “fat-to-blood” and “blood-to-air” partition coefficients. These partition coefficients provide an estimation of the relative concentrations of VOCs in alveolar breath, in blood and in the fat compartments of the human body. Additionally, we try to clarify controversial issues concerning possible experimental malpractice in the field, and propose ways to translate the basic science results as well as the mechanistic understanding to tools (sensors) that could serve as point-of-care diagnostics of cancer. We end this review with a conclusion and a future perspective. PMID:24305596

  11. Predicting invertebrate assemblage composition from harvesting pressure and environmental characteristics on tropical reef flats

    NASA Astrophysics Data System (ADS)

    Jimenez, H.; Dumas, P.; Ponton, D.; Ferraris, J.

    2012-03-01

    Invertebrates represent an essential component of coral reef ecosystems; they are ecologically important and a major resource, but their assemblages remain largely unknown, particularly on Pacific islands. Understanding their distribution and building predictive models of community composition as a function of environmental variables therefore constitutes a key issue for resource management. The goal of this study was to define and classify the main environmental factors influencing tropical invertebrate distributions in New Caledonian reef flats and to test the resulting predictive model. Invertebrate assemblages were sampled by visual counting during 2 years and 2 seasons, then coupled to different environmental conditions (habitat composition, hydrodynamics and sediment characteristics) and harvesting status (MPA vs. non-MPA and islets vs. coastal flats). Environmental conditions were described by a principal component analysis (PCA), and contributing variables were selected. Permutational analysis of variance (PERMANOVA) was used to test the effects of different factors (status, flat, year and season) on the invertebrate assemblage composition. Multivariate regression trees (MRT) were then used to hierarchically classify the effects of environmental and harvesting variables. MRT model explained at least 60% of the variation in structure of invertebrate communities. Results highlighted the influence of status (MPA vs. non-MPA) and location (islet vs. coastal flat), followed by habitat composition, organic matter content, hydrodynamics and sampling year. Predicted assemblages defined by indicator families were very different for each environment-exploitation scenario and correctly matched a calibration data matrix. Predictions from MRT including both environmental variables and harvesting pressure can be useful for management of invertebrates in coral reef environments.

  12. On the use of variability time-scales as an early classifier of radio transients and variables

    NASA Astrophysics Data System (ADS)

    Pietka, M.; Staley, T. D.; Pretorius, M. L.; Fender, R. P.

    2017-11-01

    We have shown previously that a broad correlation between the peak radio luminosity and the variability time-scales, approximately L ∝ τ^5, exists for variable synchrotron emitting sources and that different classes of astrophysical sources occupy different regions of luminosity and time-scale space. Based on those results, we investigate whether the most basic information available for a newly discovered radio variable or transient - their rise and/or decline rate - can be used to set initial constraints on the class of events from which they originate. We have analysed a sample of ≈800 synchrotron flares, selected from light curves of ≈90 sources observed at 5-8 GHz, representing a wide range of astrophysical phenomena, from flare stars to supermassive black holes. Selection of outbursts from the noisy radio light curves has been done automatically in order to ensure reproducibility of results. The distribution of rise/decline rates for the selected flares is modelled as a Gaussian probability distribution for each class of object, and further convolved with estimated areal density of that class in order to correct for the strong bias in our sample. We show in this way that comparing the measured variability time-scale of a radio transient/variable of unknown origin can provide an early, albeit approximate, classification of the object, and could form part of a suite of measurements used to provide early categorization of such events. Finally, we also discuss the effect scintillating sources will have on our ability to classify events based on their variability time-scales.
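
    The classification idea above can be illustrated with a small sketch: each class gets a Gaussian over the (log) rise or decline rate, and the resulting likelihoods are weighted by how common the class is on the sky. This is a simplification made for illustration. The abstract describes combining the Gaussians with areal density by convolution, whereas the sketch treats the density as a plain multiplicative weight, and every class name, mean, width, and density below is a hypothetical value.

```python
import math


def gaussian_pdf(x, mean, std):
    """Value of the normal probability density at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def class_probabilities(log_rate, classes):
    """Relative probability of each class for a measured log rate.

    classes maps a class name to (mean, std, areal_density).
    """
    weights = {
        name: gaussian_pdf(log_rate, mean, std) * density
        for name, (mean, std, density) in classes.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical class parameters (not taken from the paper).
classes = {
    "flare star": (-1.0, 0.5, 1.0),
    "active galactic nucleus": (2.0, 0.5, 1.0),
}
probs = class_probabilities(1.8, classes)
best = max(probs, key=probs.get)
```

    Raising the assumed density of one class shifts the result toward it even when the measured rate is unchanged, which is the bias correction the abstract refers to.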

  13. The predictive value of soluble biomarkers (CD14 subtype, interleukin-2 receptor, human leucocyte antigen-G) and procalcitonin in the detection of bacteremia and sepsis in pediatric oncology patients with chemotherapy-induced febrile neutropenia.

    PubMed

    Urbonas, Vincas; Eidukaitė, Audronė; Tamulienė, Indrė

    2013-04-01

    Prediction of bacteremia/sepsis in childhood oncology patients with febrile neutropenia still remains a challenge for the medical community due to the lack of reliable biomarkers, especially at the beginning of infectious process. The objective of this study was to evaluate diagnostic value of soluble biomarkers (CD14 subtype, interleukin-2 receptor, HLA-G) and procalcitonin (PCT) in the identification of infectious process at the beginning of a febrile episode in pediatric oncology patients. A total of 62 episodes of febrile neutropenia in 37 childhood oncology patients were enrolled in this study. Serum samples were collected at presentation after confirmation of febrile neutropenia and analyzed according to recommendations of manufacturers. Patients were classified into bacteremia/sepsis and fever of unknown origin groups. Median of PCT and sIL-2R were considerably higher in bacteremia/sepsis group compared to fever of unknown origin group, whereas median of sHLA-G and presepsin levels between investigated groups did not differ sufficiently. PCT and sIL-2R determination might be used as an additional diagnostic tool for the detection of bacteremia/sepsis in childhood oncology patients with febrile neutropenia. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Occurrence of Radio Minihalos in a Mass-Limited Sample of Galaxy Clusters

    NASA Technical Reports Server (NTRS)

    Giacintucci, Simona; Markevitch, Maxim; Cassano, Rossella; Venturi, Tiziana; Clarke, Tracy E.; Brunetti, Gianfranco

    2017-01-01

    We investigate the occurrence of radio minihalos (diffuse radio sources of unknown origin observed in the cores of some galaxy clusters) in a statistical sample of 58 clusters drawn from the Planck Sunyaev-Zeldovich cluster catalog using a mass cut (M_500 > 6 × 10^14 solar masses). We supplement our statistical sample with a similarly sized nonstatistical sample mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores, at least 12 out of 15 (80%), in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or "warm cores." These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of the cluster cores.

  15. Molecular differences in transition zone and peripheral zone prostate tumors

    PubMed Central

    Sinnott, Jennifer A.; Rider, Jennifer R.; Carlsson, Jessica; Gerke, Travis; Tyekucheva, Svitlana; Penney, Kathryn L.; Sesso, Howard D.; Loda, Massimo; Fall, Katja; Stampfer, Meir J.; Mucci, Lorelei A.; Pawitan, Yudi; Andersson, Sven-Olof; Andrén, Ove

    2015-01-01

    Prostate tumors arise primarily in the peripheral zone (PZ) of the prostate, but 20–30% arise in the transition zone (TZ). Zone of origin may have prognostic value or reflect distinct molecular subtypes; however, it can be difficult to determine in practice. Using whole-genome gene expression, we built a signature of zone using normal tissue from five individuals and found that it successfully classified nine tumors of known zone. Hypothesizing that this signature captures tumor zone of origin, we assessed its relationship with clinical factors among 369 tumors of unknown zone from radical prostatectomies (RPs) and found that tumors that molecularly resembled TZ tumors showed lower mortality (P = 0.09) that was explained by lower Gleason scores (P = 0.009). We further applied the signature to an earlier study of 88 RP and 333 transurethral resection of the prostate (TURP) tumor samples, also of unknown zone, with gene expression on ~6000 genes. We had observed previously substantial expression differences between RP and TURP specimens, and hypothesized that this might be because RPs capture primarily PZ tumors, whereas TURPs capture more TZ tumors. Our signature distinguished these two groups, with an area under the receiver operating characteristic curve of 87% (P < 0.0001). Our findings that zonal differences in normal tissue persist in tumor tissue and that these differences are associated with Gleason score and sample type suggest that subtypes potentially resulting from different etiologic pathways might arise in these zones. Zone of origin may be important to consider in prostate tumor biomarker research. PMID:25870172
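
    The area under the receiver operating characteristic curve quoted above can be read as the probability that a randomly chosen sample from one group scores higher than a randomly chosen sample from the other. A minimal sketch of that pairwise computation follows; the scores are hypothetical and are not the study's signature values.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half."""
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical signature scores for two groups of specimens.
group_a = [0.9, 0.8, 0.4]
group_b = [0.7, 0.3, 0.2]
auc = roc_auc(group_a, group_b)
```

    The double loop is quadratic in the number of samples; for large sets a rank-based formulation gives the same value more efficiently.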

  16. Confidence Preserving Machine for Facial Action Unit Detection

    PubMed Central

    Zeng, Jiabei; Chu, Wen-Sheng; De la Torre, Fernando; Cohn, Jeffrey F.; Xiong, Zhang

    2016-01-01

    Facial action unit (AU) detection from video has been a long-standing problem in automated facial expression analysis. While progress has been made, accurate detection of facial AUs remains challenging due to ubiquitous sources of errors, such as inter-personal variability, pose, and low-intensity AUs. In this paper, we refer to samples causing such errors as hard samples, and the remaining as easy samples. To address learning with the hard samples, we propose the Confidence Preserving Machine (CPM), a novel two-stage learning framework that combines multiple classifiers following an “easy-to-hard” strategy. During the training stage, CPM learns two confident classifiers. Each classifier focuses on separating easy samples of one class from all else, and thus preserves confidence on predicting each class. During the testing stage, the confident classifiers provide “virtual labels” for easy test samples. Given the virtual labels, we propose a quasi-semi-supervised (QSS) learning strategy to learn a person-specific (PS) classifier. The QSS strategy employs a spatio-temporal smoothness that encourages similar predictions for samples within a spatio-temporal neighborhood. In addition, to further improve detection performance, we introduce two CPM extensions: iCPM that iteratively augments training samples to train the confident classifiers, and kCPM that kernelizes the original CPM model to promote nonlinearity. Experiments on four spontaneous datasets GFT [15], BP4D [56], DISFA [42], and RU-FACS [3] illustrate the benefits of the proposed CPM models over baseline methods and state-of-the-art semisupervised learning and transfer learning methods. PMID:27479964

  17. What is the importance of classifying Aspergillus disease in cystic fibrosis patients?

    PubMed

    Jones, Andrew M; Horsley, Alex; Denning, David W

    2014-08-01

    Aspergillus species are commonly isolated from lower respiratory tract samples of patients with cystic fibrosis (CF) and markers of immunological sensitization to Aspergillus are frequently encountered in this group of patients; however, the contribution of Aspergillus to CF lung disease outside of the typical complications of ABPA and aspergilloma formation remains largely unclear. Patients with CF show discretely different responses to Aspergillus, though the underlying reasons for this variation are unknown. Recent work has begun to allow us to categorize patient responses to Aspergillus based upon molecular markers of infection and immune sensitization. Aspergillus sensitization and/or airway infection is associated with worse FEV1 in CF and other patients (asthma, chronic obstructive pulmonary disease, bronchiectasis). Classification of different clinical phenotypes of Aspergillus will enable future studies to determine the natural history of different manifestations of Aspergillus disease and evaluate the effects of intervention with antifungal therapy.

  18. Centre-based restricted nearest feature plane with angle classifier for face recognition

    NASA Astrophysics Data System (ADS)

    Tang, Linlin; Lu, Huifen; Zhao, Liang; Li, Zuohua

    2017-10-01

    An improved classifier based on the nearest feature plane (NFP), called the centre-based restricted nearest feature plane with angle (RNFPA) classifier, is proposed for face recognition problems. The well-known NFP uses the geometrical information of samples to increase the number of training samples, but it increases the computation complexity and also has an inaccuracy problem caused by the extended feature plane. To solve the above problems, RNFPA exploits a centre-based feature plane and utilizes an angle threshold to restrict the extended feature space. By choosing the appropriate angle threshold, RNFPA can improve the performance and decrease computation complexity. Experiments on the AT&T face database, AR face database and FERET face database are used to evaluate the proposed classifier. Compared with the original NFP classifier, the nearest feature line (NFL) classifier, the nearest neighbour (NN) classifier and some other improved NFP classifiers, the proposed one achieves competitive performance.

  19. The ANISA Model of Education: A Critique. Issues in Native Education.

    ERIC Educational Resources Information Center

    Four Worlds Development Project, Lethbridge (Alberta).

    The ANISA model of education (D. Streets and D. Jordan) classifies curriculum content into four areas--the physical environment, the human environment, the unknown environment, and the self--and encourages horizontal integration between content areas. The ANISA model holds that the process of learning consists of differentiation, integration, and…

  20. Mystery Powders. [Modified Primary]. Revised. Anchorage School District Elementary Science Program.

    ERIC Educational Resources Information Center

    Anchorage School District, AK.

    This publication provides information and activities for identifying objects using the five senses and process skills including observing, classifying, collecting and interpreting data, inferring, and predicting. Lessons 1 through 3 deal with the identification of an unknown substance and the physical properties of powders. Lessons 4 through 6 are…

  1. Hepatitis A and E Viruses in Wastewaters, in River Waters, and in Bivalve Molluscs in Italy.

    PubMed

    Iaconelli, M; Purpari, G; Della Libera, S; Petricca, S; Guercio, A; Ciccaglione, A R; Bruni, R; Taffon, S; Equestre, M; Fratini, M; Muscillo, M; La Rosa, Giuseppina

    2015-12-01

    Several studies have reported the detection of hepatitis A (HAV) and E (HEV) virus in sewage waters, indicating a possibility of contamination of aquatic environments. The objective of the present study was to assess the occurrence of HAV and HEV in different water environments, following the route of contamination from raw sewage through treated effluent to the surface waters receiving wastewater discharges. Bivalve molluscan shellfish samples were also analyzed, as sentinels of marine pollution. Samples were tested by RT-PCR nested type in the VP1/2A junction for HAV, and in the ORF1 and ORF2 regions for HEV. Hepatitis A RNA was detected in 12 water samples: 7/21 (33.3%) raw sewage samples, 3/21 (14.3%) treated sewage samples, and 2/27 (7.4%) river water samples. Five sequences were classified as genotype IA, while the remaining 7 sequences belonged to genotype IB. In bivalves, HAV was detected in 13/56 samples (23.2%), 12 genotype IB and one genotype IA. Whether the presence of HAV in the matrices tested indicates the potential for waterborne and foodborne transmission is unknown, since infectivity of the virus was not demonstrated. HEV was detected in one raw sewage sample and in one river sample, both belonging to genotype 3. Sequences were similar to sequences detected previously in Italy in patients with autochthonous HEV (no travel history) and in animals (swine). To our knowledge, this is the first detection of HEV in river waters in Italy, suggesting that surface water can be a potential source for exposure.

  2. Effect of separate sampling on classification accuracy.

    PubMed

    Shahrokh Esfahani, Mohammad; Dougherty, Edward R

    2014-01-15

    Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions related to the effect on the expected classifier error owing to a sampling ratio different from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All the codes for synthetic data and real data examples are written in MATLAB. A function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset, is also written in MATLAB. All the codes are available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b.
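
    One way to see why a predetermined class ratio matters is the standard decomposition of overall error into class-conditional errors weighted by the class proportions. The sketch below is only an illustration of that identity, not the paper's minimax procedure or its `mmratio` function: when the sample uses a ratio that differs from the population prior, the error observed on such a sample is weighted differently from the error in the population. All numbers are hypothetical.

```python
def mixed_error(class0_error, class1_error, class0_weight):
    """Overall error as a weighted mix of the two class-conditional
    error rates, where class0_weight is the proportion of class 0."""
    return class0_weight * class0_error + (1 - class0_weight) * class1_error


# Hypothetical class-conditional error rates of one fixed classifier.
err0, err1 = 0.05, 0.30

# Assumed true population prior for class 0 versus the sampling ratio
# that was fixed in advance when the data were collected.
population_error = mixed_error(err0, err1, class0_weight=0.9)
sample_error = mixed_error(err0, err1, class0_weight=0.5)
```

    With these assumed numbers the two quantities differ by a factor of more than two, even though the classifier itself has not changed.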

  3. Computational Short-cutting the Big Data Classification Bottleneck: Using the MODIS Land Cover Product to Derive a Consistent 30 m Landsat Land Cover Product of the Conterminous United States

    NASA Astrophysics Data System (ADS)

    Zhang, H.; Roy, D. P.

    2016-12-01

    Classification is a fundamental process in remote sensing used to relate pixel values to land cover classes present on the surface. The state of the practice for large area land cover classification is to classify satellite time series metrics with a supervised (i.e., training data dependent) non-parametric classifier. Classification accuracy generally increases with training set size. However, training data collection is expensive and the optimal training distribution over large areas is unknown. The MODIS 500 m land cover product is available globally on an annual basis and so provides a potentially very large source of land cover training data. A novel methodology to classify large volume Landsat data using high quality training data derived automatically from the MODIS land cover product is demonstrated for all of the Conterminous United States (CONUS). The known misclassification accuracy of the MODIS land cover product and the scale difference between the 500 m MODIS and 30 m Landsat data are accommodated by a novel MODIS product filtering, Landsat pixel selection, and iterative training approach to balance the proportion of local and CONUS training data used. Three years of global Web-enabled Landsat Data (WELD) for all of the CONUS are classified using a random forest classifier and the results assessed using random forest 'out-of-bag' training samples. The global WELD data are corrected to surface nadir BRDF-Adjusted Reflectance and are defined in 158 × 158 km tiles in the same projection and nested to the MODIS land cover products. This reduces the need to pre-process the considerable Landsat data volume (more than 14,000 Landsat 5 and 7 scenes per year over the CONUS covering 11,000 million 30 m pixels). The methodology is implemented in a parallel manner on a WELD tile-by-tile basis but provides a wall-to-wall seamless 30 m land cover product. Detailed tile and CONUS results are presented and the potential for global production using the recently available global WELD products is discussed.

  4. Recognition Using Hybrid Classifiers.

    PubMed

    Osadchy, Margarita; Keren, Daniel; Raviv, Dolev

    2016-04-01

    A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve an identical or better classification rate than SVM, while requiring far smaller memory and lower computational complexity to train and apply.

  5. A two-dimensional matrix image based feature extraction method for classification of sEMG: A comparative analysis based on SVM, KNN and RBF-NN.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan; Qiu, Ming; Zeng, Ming; Luo, Weizhen

    2017-01-01

    The computer mouse is an important human-computer interaction device, but patients with physical finger disability are unable to operate it. Surface EMG (sEMG) can be monitored by electrodes on the skin surface and reflects neuromuscular activity. Therefore, limb auxiliary equipment can be controlled through sEMG classification in order to help physically disabled patients operate the mouse. The objective was to develop a new method to extract sEMG generated by finger motion and to apply novel features to classify sEMG. A window-based data acquisition method was presented to extract signal samples from sEMG electrodes. Afterwards, a two-dimensional matrix image based feature extraction method, which differs from the classical methods based on time domain or frequency domain, was employed to transform signal samples to feature maps used for classification. In the experiments, sEMG data samples produced by the index and middle fingers at the click of a mouse button were separately acquired. Then, characteristics of the samples were analyzed to generate a feature map for each sample. Finally, the machine learning classification algorithms (SVM, KNN, RBF-NN) were employed to classify these feature maps on a GPU. The study demonstrated that all classifiers can identify and classify sEMG samples effectively. In particular, the accuracy of the SVM classifier reached up to 100%. The signal separation method is a convenient, efficient and quick method, which can effectively extract the sEMG samples produced by fingers. In addition, unlike the classical methods, the new method enables feature extraction by appropriately enlarging the energy of the sample signals. The classical machine learning classifiers all performed well using these features.

  6. [Pleural mesothelioma in a school teacher: asbestos exposure due to DAS paste].

    PubMed

    Barbieri, Pietro Gino; Somigliana, Anna; Girelli, Roberto; Lombardi, Sandra; Sarnico, Michela; Silvestri, Stefano

    2016-03-24

    Malignant mesothelioma cases among primary school teachers are usually linked with asbestos exposure due to the mineral contained in the building structure. Among the approximately 12,000 cases of mesothelioma described in the fourth report of the National Mesothelioma Register, 11 cases of primary school teachers are reported, in spite of the fact that the "catalogue of asbestos use" does not describe circumstances of asbestos exposure other than or different to that due to asbestos contained in the buildings. Four cases in the Brescia Provincial Mesothelioma Register are identified as teachers, without this circumstance of exposure. The objective was to characterize the asbestos concentration and fibre type retained in the lungs of a teacher reported as a new mesothelioma case and preliminarily classified as of unknown asbestos exposure. The mesothelioma case presented here was diagnosed at age 78 and malignant mesothelioma was confirmed at autopsy; the patient was interviewed directly for occupational history. Samples of lung parenchyma from necropsies were collected, stored and analyzed by scanning electron microscope (SEM) and samples of DAS paste were analyzed by SEM to detect asbestos fibre content. It was possible to confirm past exposure to DAS paste in forming and finishing dry items and toys during school recreational activity almost every day from the mid-60s to about the mid-70s. Subsequent SEM analysis showed: i) chrysotile fibres were found in an old and unused pack of DAS paste; ii) a lung burden of 1,400 asbestos bodies, 310,000 total asbestos fibres (33% chrysotile, 67% amphibole) and 210,000 talc fibres per gram of dry lung tissue was detected from necropsies performed on the subject. These results seem to be in agreement with an occupational exposure to asbestos due to past use of DAS paste. After the investigation, this case was reclassified from "unknown" to "sure" occupational asbestos exposure. The occupational origin of the tumour was recognized by the Italian Workers' Compensation Authority (INAIL). This case suggests i) the need to carry out any possible detailed studies of the circumstances and exposure sources whenever any mesothelioma case is classified as "asbestos exposure unknown", according to the guidelines of the National Mesothelioma Register, ii) handling of DAS paste can be considered as sure asbestos exposure and iii) it should be borne in mind that mesothelioma cases can occur even after cumulative low, occupational exposure, even only to chrysotile.

  7. Transferring genomics to the clinic: distinguishing Burkitt and diffuse large B cell lymphomas.

    PubMed

    Sha, Chulin; Barrans, Sharon; Care, Matthew A; Cunningham, David; Tooze, Reuben M; Jack, Andrew; Westhead, David R

    2015-01-01

    Classifiers based on molecular criteria such as gene expression signatures have been developed to distinguish Burkitt lymphoma and diffuse large B cell lymphoma, which help to explore the intermediate cases where traditional diagnosis is difficult. Transfer of these research classifiers into a clinical setting is challenging because there are competing classifiers in the literature based on different methodology and gene sets with no clear best choice; classifiers based on one expression measurement platform may not transfer effectively to another; and, classifiers developed using fresh frozen samples may not work effectively with the commonly used and more convenient formalin fixed paraffin-embedded samples used in routine diagnosis. Here we thoroughly compared two published high profile classifiers developed on data from different Affymetrix array platforms and fresh-frozen tissue, examining their transferability and concordance. Based on this analysis, a new Burkitt and diffuse large B cell lymphoma classifier (BDC) was developed and employed on Illumina DASL data from our own paraffin-embedded samples, allowing comparison with the diagnosis made in a central haematopathology laboratory and evaluation of clinical relevance. We show that both previous classifiers can be recapitulated using very much smaller gene sets than originally employed, and that the classification result is closely dependent on the Burkitt lymphoma criteria applied in the training set. The BDC classification on our data exhibits high agreement (~95%) with the original diagnosis. A simple outcome comparison in the patients presenting intermediate features on conventional criteria suggests that the cases classified as Burkitt lymphoma by BDC have worse response to standard diffuse large B cell lymphoma treatment than those classified as diffuse large B cell lymphoma. In this study, we comprehensively investigate two previous Burkitt lymphoma molecular classifiers, and implement a new gene expression classifier, BDC, that works effectively on paraffin-embedded samples and provides useful information for treatment decisions. The classifier is available as a free software package under the GNU public licence within the R statistical software environment through the link http://www.bioinformatics.leeds.ac.uk/labpages/softwares/ or on github https://github.com/Sharlene/BDC.

  8. Lake bed classification using acoustic data

    USGS Publications Warehouse

    Yin, Karen K.; Li, Xing; Bonde, John; Richards, Carl; Cholwek, Gary

    1998-01-01

    As part of our effort to identify the lake bed surficial substrates using remote sensing data, this work designs pattern classifiers by multivariate statistical methods. Probability distribution of the preprocessed acoustic signal is analyzed first. A confidence region approach is then adopted to improve the design of the existing classifier. A technique for further isolation is proposed which minimizes the expected loss from misclassification. The devices constructed are applicable for real-time lake bed categorization. A minimax approach is suggested to treat more general cases where the a priori probability distribution of the substrate types is unknown. Comparison of the suggested methods with the traditional likelihood ratio tests is discussed.

  9. Local classifier weighting by quadratic programming.

    PubMed

    Cevikalp, Hakan; Polikar, Robi

    2008-10-01

    It has been widely accepted that the classification accuracy can be improved by combining outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures -- many of which are heuristic in nature -- have been developed for this goal. In this brief, we describe a dynamic approach to combine classifiers that have expertise in different regions of the input space. To this end, we use local classifier accuracy estimates to weight classifier outputs. Specifically, we estimate local recognition accuracies of classifiers near a query sample by utilizing its nearest neighbors, and then use these estimates to find the best weights of classifiers to label the query. The problem is formulated as a convex quadratic optimization problem, which returns optimal nonnegative classifier weights with respect to the chosen objective function, and the weights ensure that locally most accurate classifiers are weighted more heavily for labeling the query sample. Experimental results on several data sets indicate that the proposed weighting scheme outperforms other popular classifier combination schemes, particularly on problems with complex decision boundaries. Hence, the results indicate that local classification-accuracy-based combination techniques are well suited for decision making when the classifiers are trained by focusing on different regions of the input space.
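    The idea of weighting classifiers by their accuracy near the query can be sketched as follows. This is a simplified stand-in, not the method of the record above: the convex quadratic program is replaced by plain normalisation of local accuracies, and the function names, the validation data and the choice k=3 are all hypothetical.

```python
import math

def local_weights(query, val_points, val_correct, k=3):
    """Weight each classifier by its accuracy on the k validation points
    nearest to `query`.

    val_points  : list of feature tuples
    val_correct : val_correct[j][i] is True if classifier j labelled
                  validation point i correctly
    """
    order = sorted(range(len(val_points)),
                   key=lambda i: math.dist(query, val_points[i]))
    neighbours = order[:k]
    local_acc = [sum(correct[i] for i in neighbours) / len(neighbours)
                 for correct in val_correct]
    total = sum(local_acc)
    if total == 0:  # no classifier is right nearby: fall back to equal weights
        return [1.0 / len(local_acc)] * len(local_acc)
    return [a / total for a in local_acc]

def weighted_vote(weights, predictions):
    """Return the label with the largest total weight."""
    score = {}
    for w, label in zip(weights, predictions):
        score[label] = score.get(label, 0.0) + w
    return max(score, key=score.get)

# Toy validation set: classifier 0 is reliable near the origin,
# classifier 1 is reliable near (5, 5).
val_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
val_correct = [
    [True, True, True, False, False, False],
    [False, False, False, True, True, True],
]

w = local_weights((0.2, 0.1), val_points, val_correct)
print(w, weighted_vote(w, ["a", "b"]))
```

    Near the origin all of the weight goes to classifier 0, so its label wins the vote; near (5, 5) the roles are reversed.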

  10. RIBAVIRIN: The analysis of a polymorphic substance by LC-MS and FTIR spectroscopy

    NASA Astrophysics Data System (ADS)

    Machal, A. C.; Flurer, R. A.; Brueggemeyer, T. W.; Ellis, L. E.; Satzger, R. D.; Stewart, K. R.

    1998-06-01

    The FTIR laboratory often has the task of identifying unknown pharmaceuticals. This case involves unknown capsules received at the Forensic Chemistry Center. Through extensive searching of pharmaceutical data bases, it was concluded that the capsules might contain ribavirin, which is classified as an anti-viral agent. Mass spectral analysis (LC-MS) concluded that the capsules contained ribavirin; however, the FTIR results did not agree with the mass spectral results. Additional experiments were performed and the results demonstrate the capabilities of FTIR to discern differences between polymorphic forms of a substance, such as ribavirin, when other techniques are unable to provide this information.

  11. An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics

    PubMed Central

    Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.

    2014-01-01

    Purpose: Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. Methods: Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results: Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836, respectively, for the naïve Bayes and SVM classifiers. We referenced a database of disease outbreak reports to confirm that the evaluation data set resulting from the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions: The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and using articles in non-English languages. PMID:21134784
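    For readers unfamiliar with the evaluation measure, the area under the ROC curve equals the probability that a randomly chosen relevant article is scored above a randomly chosen irrelevant one, with ties counted as one half. A minimal sketch of that computation follows; the scores and labels are invented for illustration and are not data from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores : classifier scores, higher means 'more likely relevant'
    labels : 1 for relevant articles, 0 for irrelevant ones
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    # Count pairs ranked correctly; a tie contributes half a pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for five articles reviewed on one day.
day_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
day_labels = [1, 0, 1, 0, 0]
print(round(roc_auc(day_scores, day_labels), 3))
```

    A daily average such as the one reported would then be the mean of this value over the evaluation days.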

  12. Use of Unlabeled Samples for Mitigating the Hughes Phenomenon

    NASA Technical Reports Server (NTRS)

    Landgrebe, David A.; Shahshahani, Behzad M.

    1993-01-01

    The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.

  13. Microbial diversity in nonsulfur, sulfur and iron geothermal steam vents.

    PubMed

    Benson, Courtney A; Bizzoco, Richard W; Lipson, David A; Kelley, Scott T

    2011-04-01

    Fumaroles, commonly called steam vents, are ubiquitous features of geothermal habitats. Recent studies have discovered microorganisms in condensed fumarole steam, but fumarole deposits have proven refractory to DNA isolation. In this study, we report the development of novel DNA isolation approaches for fumarole deposit microbial community analysis. Deposit samples were collected from steam vents and caves in Hawaii Volcanoes National Park, Yellowstone National Park and Lassen Volcanic National Park. Samples were analyzed by X-ray microanalysis and classified as nonsulfur, sulfur or iron-dominated steam deposits. We experienced considerable difficulty in obtaining high-yield, high-quality DNA for cloning: only half of all the samples ultimately yielded sequences. Analysis of archaeal 16S rRNA gene sequences showed that sulfur steam deposits were dominated by Sulfolobus and Acidianus, while nonsulfur deposits contained mainly unknown Crenarchaeota. Several of these novel Crenarchaeota lineages were related to chemoautotrophic ammonia oxidizers, indicating that fumaroles represent a putative habitat for ammonia-oxidizing Archaea. We also generated archaeal and bacterial enrichment cultures from the majority of the deposits and isolated members of the Sulfolobales. Our results provide the first evidence of Archaea in geothermal steam deposits and show that fumaroles harbor diverse and novel microbial lineages. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  14. Veterinary Medicine and Multi-Omics Research for Future Nutrition Targets: Metabolomics and Transcriptomics of the Common Degenerative Mitral Valve Disease in Dogs.

    PubMed

    Li, Qinghong; Freeman, Lisa M; Rush, John E; Huggins, Gordon S; Kennedy, Adam D; Labuda, Jeffrey A; Laflamme, Dorothy P; Hannah, Steven S

    2015-08-01

    Canine degenerative mitral valve disease (DMVD) is the most common form of heart disease in dogs. The objective of this study was to identify cellular and metabolic pathways that play a role in DMVD by performing metabolomics and transcriptomics analyses on serum and tissue (mitral valve and left ventricle) samples previously collected from dogs with DMVD or healthy hearts. Gas or liquid chromatography followed by mass spectrometry was used to identify metabolites in serum. Transcriptomics analysis of tissue samples was completed using RNA-seq, and selected targets were confirmed by RT-qPCR. Random Forest analysis was used to classify the metabolites that best predicted the presence of DMVD. Results identified 41 known and 13 unknown serum metabolites that were significantly different between healthy and DMVD dogs, representing alterations in fat and glucose energy metabolism, oxidative stress, and other pathways. The three metabolites with the greatest single effect in the Random Forest analysis were γ-glutamylmethionine, oxidized glutathione, and asymmetric dimethylarginine. Transcriptomics analysis identified 812 differentially expressed transcripts in left ventricle samples and 263 in mitral valve samples, representing changes in energy metabolism, antioxidant function, nitric oxide signaling, and extracellular matrix homeostasis pathways. Many of the identified alterations may benefit from nutritional or medical management. Our study provides evidence of the growing importance of integrative approaches in multi-omics research in veterinary and nutritional sciences.

  15. Employee adiposity and incivility: establishing a link and identifying demographic moderators and negative consequences.

    PubMed

    Sliter, Katherine A; Sliter, Michael T; Withrow, Scott A; Jex, Steve M

    2012-10-01

    The prevalence of increased adiposity among employees in the American workplace has resulted in significant economic costs to organizations. Unfortunately, relatively little research has examined the effects of excess adiposity on employees themselves. As a step toward remedying this, the current study examined a previously unknown link between adiposity and incivility, and how this might impact employee burnout and withdrawal. A student sample was used to initially establish a link between incivility and adiposity, and an applied sample of employees from across the United States was used to more fully test the relationships among incivility, adiposity, burnout, and withdrawal. Finally, the moderating effects of sex and race on these relationships were examined. Preliminary data from 341 student employees revealed that being overly adipose was related to greater reports of workplace incivility, with the effect strongest for those classified as obese. An interaction between sex and adiposity was also found, as well as a three-way interaction among sex, race, and adiposity. These relationships were replicated using a nationwide sample of 528 full-time employees. An interaction between race and adiposity was also found in this second sample. Finally, a model was tested in which incivility was shown to partially mediate the positive relationship between adiposity and the outcome of withdrawal, with both sex and race acting as moderators. Theoretical and practical implications of the findings and future directions are discussed.

  16. Biodosimetry results from space flight Mir-18.

    PubMed

    Yang, T C; George, K; Johnson, A S; Durante, M; Fedorenko, B S

    1997-11-01

    Astronauts are classified as radiation workers due to the presence of ionizing radiation in space. For the assessment of health risks, physical dosimetry has been indispensable. However, the change of the location of dosimeters on the crew members, the variation in dose rate with location inside the spacecraft and the unknown biological effects of microgravity can introduce significant uncertainties in estimating exposure. To circumvent such uncertainty, a study on the cytogenetic effects of space radiation in human lymphocytes was proposed and conducted for Mir-18, a 115-day mission. This study used fluorescence in situ hybridization (FISH) with whole-chromosome painting probes to score chromosomal exchanges and the Giemsa staining method to determine the frequency of dicentrics. The growth kinetics of cells and sister chromatid exchanges (SCEs) were examined to ensure that chromosomal aberrations were scored in the first mitosis and were induced primarily by space radiation. Our results showed that the frequency of chromosomal aberrations increased significantly in postflight samples compared to samples drawn prior to flight, and that the frequency of SCEs was similar for both pre- and postflight samples. Based on a dose-response curve for preflight samples exposed to gamma rays, the absorbed dose received by crew members during the mission was estimated to be about 14.75 cSv. Because the absorbed dose measured by physical dosimeters is 5.2 cGy for the entire mission, the RBE is about 2.8.
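    The final figure in the abstract can be checked directly. Taking the RBE here as the ratio of the biologically estimated dose to the physically measured absorbed dose (an interpretation of the abstract's wording, not a definition stated in it):

```latex
\[
\mathrm{RBE} \approx \frac{D_{\text{biological}}}{D_{\text{physical}}}
             = \frac{14.75\ \text{cSv}}{5.2\ \text{cGy}}
             \approx 2.84 \approx 2.8
\]
```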

  17. Biodosimetry results from space flight Mir-18

    NASA Technical Reports Server (NTRS)

    Yang, T. C.; George, K.; Johnson, A. S.; Durante, M.; Fedorenko, B. S.

    1997-01-01

    Astronauts are classified as radiation workers due to the presence of ionizing radiation in space. For the assessment of health risks, physical dosimetry has been indispensable. However, the change of the location of dosimeters on the crew members, the variation in dose rate with location inside the spacecraft and the unknown biological effects of microgravity can introduce significant uncertainties in estimating exposure. To circumvent such uncertainty, a study on the cytogenetic effects of space radiation in human lymphocytes was proposed and conducted for Mir-18, a 115-day mission. This study used fluorescence in situ hybridization (FISH) with whole-chromosome painting probes to score chromosomal exchanges and the Giemsa staining method to determine the frequency of dicentrics. The growth kinetics of cells and sister chromatid exchanges (SCEs) were examined to ensure that chromosomal aberrations were scored in the first mitosis and were induced primarily by space radiation. Our results showed that the frequency of chromosomal aberrations increased significantly in postflight samples compared to samples drawn prior to flight, and that the frequency of SCEs was similar for both pre- and postflight samples. Based on a dose-response curve for preflight samples exposed to gamma rays, the absorbed dose received by crew members during the mission was estimated to be about 14.75 cSv. Because the absorbed dose measured by physical dosimeters is 5.2 cGy for the entire mission, the RBE is about 2.8.

  18. A consensus prognostic gene expression classifier for ER positive breast cancer

    PubMed Central

    Teschendorff, Andrew E; Naderi, Ali; Barbosa-Morais, Nuno L; Pinder, Sarah E; Ellis, Ian O; Aparicio, Sam; Brenton, James D; Caldas, Carlos

    2006-01-01

    Background: A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer. Results: Here we perform a combined analysis of three major breast cancer microarray data sets to home in on a universally valid prognostic molecular classifier in estrogen receptor (ER) positive tumors. Using a recently developed robust measure of prognostic separation, we further validate the prognostic classifier in three external independent cohorts, confirming the validity of our molecular classifier in a total of 877 ER positive samples. Furthermore, we find that molecular classifiers may not outperform classical prognostic indices but that they can be used in hybrid molecular-pathological classification schemes to improve prognostic separation. Conclusion: The prognostic molecular classifier presented here is the first to be valid in over 877 ER positive breast cancer samples and across three different microarray platforms. Larger multi-institutional studies will be needed to fully determine the added prognostic value of molecular classifiers when combined with standard prognostic factors. PMID:17076897

  19. Machine Learning Through Signature Trees. Applications to Human Speech.

    ERIC Educational Resources Information Center

    White, George M.

    A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…

  20. Accuracy of Referring Provider and Endoscopist Impressions of Colonoscopy Indication.

    PubMed

    Naveed, Mariam; Clary, Meredith; Ahn, Chul; Kubiliun, Nisa; Agrawal, Deepak; Cryer, Byron; Murphy, Caitlin; Singal, Amit G

    2017-07-01

    Background: Referring provider and endoscopist impressions of colonoscopy indication are used for clinical care, reimbursement, and quality reporting decisions; however, the accuracy of these impressions is unknown. This study assessed the sensitivity, specificity, positive and negative predictive value, and overall accuracy of methods to classify colonoscopy indication, including referring provider impression, endoscopist impression, and administrative algorithm compared with gold standard chart review. Methods: We randomly sampled 400 patients undergoing a colonoscopy at a Veterans Affairs health system between January 2010 and December 2010. Referring provider and endoscopist impressions of colonoscopy indication were compared with gold-standard chart review. Indications were classified into 4 mutually exclusive categories: diagnostic, surveillance, high-risk screening, or average-risk screening. Results: Of 400 colonoscopies, 26% were performed for average-risk screening, 7% for high-risk screening, 26% for surveillance, and 41% for diagnostic indications. Accuracy of referring provider and endoscopist impressions of colonoscopy indication were 87% and 84%, respectively, which were significantly higher than that of the administrative algorithm (45%; P < .001 for both). There was substantial agreement between endoscopist and referring provider impressions (κ = 0.76). All 3 methods showed high sensitivity (>90%) for determining screening (vs nonscreening) indication, but specificity of the administrative algorithm was lower (40.3%) compared with referring provider (93.7%) and endoscopist (84.0%) impressions. Accuracy of endoscopist, but not referring provider, impression was lower in patients with a family history of colon cancer than in those without (65% vs 84%; P = .001). Conclusions: Referring provider and endoscopist impressions of colonoscopy indication are both accurate and may be useful data to incorporate into algorithms classifying colonoscopy indication. Copyright © 2017 by the National Comprehensive Cancer Network.
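    The performance measures named in this abstract follow from a 2 × 2 table against the gold standard, and the κ statistic from an agreement table between two raters. The sketch below shows the standard formulas; the counts are toy values made up for illustration and are not the study's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from a 2 x 2 table against a gold standard."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positives among gold-standard positives
        "specificity": tn / (tn + fp),   # true negatives among gold-standard negatives
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,
    }

def cohen_kappa(table):
    """Cohen's kappa for a square agreement table
    (rows: rater A categories, columns: rater B categories)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy counts: screening vs nonscreening indication, one method vs chart review.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=90)
# Toy agreement table between two raters over two categories.
kappa = cohen_kappa([[40, 10], [5, 45]])
print(metrics, round(kappa, 2))
```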

  1. Mechanical Fault Diagnosis of High Voltage Circuit Breakers Based on Variational Mode Decomposition and Multi-Layer Classifier.

    PubMed

    Huang, Nantian; Chen, Huaijin; Cai, Guowei; Fang, Lihua; Wang, Yuqiang

    2016-11-10

    Mechanical fault diagnosis of high-voltage circuit breakers (HVCBs) based on vibration signal analysis is one of the most significant issues in improving the reliability and reducing the outage cost for power systems. Because training samples and the types of machine faults available for HVCBs are limited, existing mechanical fault diagnostic methods easily misclassify new fault types, for which no training samples exist, as either a normal condition or a wrong fault type. A new mechanical fault diagnosis method for HVCBs based on variational mode decomposition (VMD) and multi-layer classifier (MLC) is proposed to improve the accuracy of fault diagnosis. First, HVCB vibration signals during operation are measured using an acceleration sensor. Second, a VMD algorithm is used to decompose the vibration signals into several intrinsic mode functions (IMFs). The IMF matrix is divided into submatrices to compute the local singular values (LSV). The maximum singular values of each submatrix are selected as the feature vectors for fault diagnosis. Finally, an MLC composed of two one-class support vector machines (OCSVMs) and a support vector machine (SVM) is constructed to identify the fault type. Two layers of independent OCSVM are adopted to distinguish normal or fault conditions with known or unknown fault types, respectively. On this basis, SVM recognizes the specific fault type. Real diagnostic experiments are conducted with a real SF₆ HVCB with normal and fault states. Three different faults (i.e., jam fault of the iron core, looseness of the base screw, and poor lubrication of the connecting lever) are simulated in a field experiment on a real HVCB to test the feasibility of the proposed method. Results show that the classification accuracy of the new method is superior to other traditional methods.
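    The three-stage decision logic described above can be sketched as a simple cascade. This shows only the control flow: the two one-class detectors and the final classifier are replaced by hypothetical threshold rules on a two-element feature vector, so nothing here reproduces the authors' OCSVM or SVM models, their features, or their thresholds.

```python
def diagnose(features, is_normal, is_known_fault, fault_type):
    """Three-stage cascade: normal vs fault, known vs unknown fault, fault type.

    is_normal, is_known_fault : callables returning True or False
                                (stand-ins for the two one-class detectors)
    fault_type                : callable returning a fault label
                                (stand-in for the multi-class classifier)
    """
    if is_normal(features):
        return "normal"
    if not is_known_fault(features):
        return "unknown fault"
    return fault_type(features)

# Hypothetical stand-in rules, for illustration only.
def toy_is_normal(f):
    return max(f) < 1.0

def toy_is_known_fault(f):
    return max(f) < 5.0

def toy_fault_type(f):
    return "iron core jam" if f[0] > f[1] else "base screw looseness"

for sample in ([0.2, 0.3], [2.0, 1.0], [1.0, 2.0], [9.0, 1.0]):
    print(sample, "->", diagnose(sample, toy_is_normal, toy_is_known_fault, toy_fault_type))
```

    Placing the unknown-fault check before the type classifier is what keeps a fault with no training samples from being forced into one of the known classes.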

  2. Mechanical Fault Diagnosis of High Voltage Circuit Breakers Based on Variational Mode Decomposition and Multi-Layer Classifier

    PubMed Central

    Huang, Nantian; Chen, Huaijin; Cai, Guowei; Fang, Lihua; Wang, Yuqiang

    2016-01-01

    Mechanical fault diagnosis of high-voltage circuit breakers (HVCBs) based on vibration signal analysis is one of the most significant issues in improving the reliability and reducing the outage cost of power systems. Because training samples and fault types for HVCBs are limited, existing mechanical fault diagnostic methods tend to misclassify new fault types, for which no training samples exist, as either a normal condition or a wrong fault type. A new mechanical fault diagnosis method for HVCBs based on variational mode decomposition (VMD) and a multi-layer classifier (MLC) is proposed to improve the accuracy of fault diagnosis. First, HVCB vibration signals during operation are measured using an acceleration sensor. Second, a VMD algorithm is used to decompose the vibration signals into several intrinsic mode functions (IMFs). The IMF matrix is divided into submatrices to compute the local singular values (LSV). The maximum singular values of each submatrix are selected as the feature vectors for fault diagnosis. Finally, an MLC composed of two one-class support vector machines (OCSVMs) and a support vector machine (SVM) is constructed to identify the fault type. Two layers of independent OCSVMs are adopted to distinguish normal from fault conditions and known from unknown fault types, respectively. On this basis, the SVM recognizes the specific fault type. Diagnostic experiments are conducted on a real SF₆ HVCB in normal and fault states. Three different faults (i.e., jam fault of the iron core, looseness of the base screw, and poor lubrication of the connecting lever) are simulated in a field experiment on a real HVCB to test the feasibility of the proposed method. Results show that the classification accuracy of the new method is superior to that of other traditional methods. PMID:27834902

  3. The pathology of lumbosacral lipomas: macroscopic and microscopic disparity have implications for embryogenesis and mode of clinical deterioration.

    PubMed

    Jones, Victoria; Wykes, Victoria; Cohen, Nicki; Thompson, Dominic; Jacques, Tom S

    2018-06-01

    Lumbosacral lipomas (LSL) are congenital disorders of the terminal spinal cord region that have the potential to cause significant spinal cord dysfunction in children. They are of unknown embryogenesis with variable clinical presentation and natural history. It is unclear whether the spinal cord dysfunction reflects a primary developmental dysplasia or whether it occurs secondarily to mechanical traction (spinal cord tethering) with growth. While different anatomical subtypes are recognised and classified according to radiological criteria, these subtypes correlate poorly with clinical prognosis. We have undertaken an analysis of surgical specimens in order to describe the spectrum of histological changes that occur and have correlated the histology with the anatomical type of LSL to determine if there are distinct histological subtypes. The histopathology of 64 patients who had undergone surgical resection of LSL was reviewed. The presence of additional tissues and cell types was recorded. LSLs were classified from pre-operative magnetic resonance imaging (MRI) scans according to the Chapman classification. Ninety-five per cent of the specimens consisted predominantly of mature adipocytes, with all containing thickened bands of connective tissue and peripheral nerve fibres; 91% of samples contained ectatic blood vessels with thickened walls, while 22% contained central nervous system (CNS) glial tissue. Additional tissue was identified of both mesodermal and neuroectodermal origin. Our analysis highlights the heterogeneity of tissue types within all samples, not reflected in the nomenclature. The diversity of tissue types, consistent across all subtypes, challenges currently held notions regarding the embryogenesis of LSLs and the assumption that clinical deterioration is due simply to tethering. © 2018 The Authors. Histopathology Published by John Wiley & Sons Ltd.

  4. Towards optimized methods to study viral impacts on soil microbial carbon cycling

    NASA Astrophysics Data System (ADS)

    Trubl, G. G.; Roux, S.; Jang, H. B.; Solonenko, N.; Sullivan, M. B.; Rich, V. I.

    2016-12-01

    Permafrost contains 50% of global soil carbon and is rapidly thawing. While the fate of this carbon is currently unknown, it will undoubtedly be shaped by microbes and their associated viruses, which modulate host activities via mortality and metabolic control. However, little is known about soil viruses generally and their impact on terrestrial biogeochemistry; this is partially due to the presence of inhibitory substances (e.g. humic acids) in soils that interfere with sample processing and sequence-based metagenomics surveys. To address this problem, we examined viral populations in three different peat soils along a permafrost thaw gradient. These samples yielded low viral DNA recoveries, and shallow metagenomic sequencing, but still resulted in the recovery of 40 viral genome fragments. Genome- and network-based classification suggested that these new references represented 11 viral clusters, and ecological patterns (based upon non-redundant fragment recruitment) showed that viral populations were distinct in each habitat. Although only 31% of the genes could be functionally classified, pairwise genome comparisons classified 63% of the viruses taxonomically. Additionally, comparison of the 40 viral genome fragments to 53 previously recovered fragments from the same site showed no overlap, suggesting only a small portion of the resident viral community has been sampled. A follow-up experiment was performed to remove more humics during extraction and thereby obtain better viral metagenomes. Three DNA extraction protocols were tested (CTAB, PowerSoil, and Wizard columns) and the DNA was further purified with an AMPure clean-up. The PowerSoil kit maximized DNA yield (3x CTAB and 6x Wizard), and yielded the purest DNA (based on NanoDrop 260:230 ratio). Given the important roles of viruses in biogeochemical cycles in better-studied systems, further research and humic-removal optimization on these thawing permafrost-associated viral communities is needed to clarify their involvement in carbon cycle feedbacks.

  5. Occurrence of Radio Minihalos in a Mass-limited Sample of Galaxy Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giacintucci, Simona; Clarke, Tracy E.; Markevitch, Maxim

    2017-06-01

    We investigate the occurrence of radio minihalos (diffuse radio sources of unknown origin observed in the cores of some galaxy clusters) in a statistical sample of 58 clusters drawn from the Planck Sunyaev–Zel’dovich cluster catalog using a mass cut (M_500 > 6 × 10^14 M_⊙). We supplement our statistical sample with a similarly sized nonstatistical sample mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores, at least 12 out of 15 (80%), in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or “warm cores.” These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of the cluster cores.

  6. Four groups of new aromatic halogenated disinfection byproducts: effect of bromide concentration on their formation and speciation in chlorinated drinking water.

    PubMed

    Pan, Yang; Zhang, Xiangru

    2013-02-05

    Bromide is naturally present in source waters worldwide. Chlorination of drinking water can generate a variety of chlorinated and brominated disinfection byproducts (DBPs). Although substantial efforts have been made to examine the effect of bromide concentration on the formation and speciation of halogenated DBPs, almost all previous studies have focused on trihalomethanes and haloacetic acids. Given that about 50% of total organic halogen formed in chlorination remains unknown, it is still unclear how bromide concentration affects the formation and speciation of the new/unknown halogenated DBPs. In this study, chlorinated drinking water samples with different bromide concentrations were prepared, and a novel approach (precursor ion scan using ultra performance liquid chromatography/electrospray ionization-triple quadrupole mass spectrometry) was adopted for the detection and identification of polar halogenated DBPs in these water samples. With this approach, 11 new putative aromatic halogenated DBPs were identified, and they were classified into four groups: dihalo-4-hydroxybenzaldehydes, dihalo-4-hydroxybenzoic acids, dihalo-salicylic acids, and trihalo-phenols. A mechanism for the formation of the four groups of new aromatic halogenated DBPs was proposed. It was found that increasing the bromide concentration shifted the entire polar halogenated DBPs as well as the four groups of new DBPs from being less brominated to being more brominated; these new aromatic halogenated DBPs might be important intermediate DBPs formed in drinking water chlorination. Moreover, the speciation of the four groups of new DBPs was modeled: the speciation patterns of the four groups of new DBPs well matched those determined from the model equations, and the reactivity differences between HOBr and HOCl in reactions forming the four groups of new DBPs were larger than those in reactions forming trihalomethanes and haloacetic acids.

  7. The natural abundance of 13C with different agricultural management by NIRS with fibre optic probe technology.

    PubMed

    Fuentes, Mariela; González-Martín, Inmaculada; Hernández-Hierro, Jose Miguel; Hidalgo, Claudia; Govaerts, Bram; Etchevers, Jorge; Sayre, Ken D; Dendooven, Luc

    2009-06-30

    In the present study the natural abundance of 13C is quantified in agricultural soils in Mexico that have been subjected for 17 years to different agronomic practices (zero and conventional tillage, with and without retention of crop residues, and rotation of wheat and maize), which have influenced the physical, chemical and biological characteristics of the soil. The natural abundance of 13C is quantified by near infrared spectroscopy (NIRS) with a remote reflectance fibre optic probe, applying the probe directly to the soil samples. Discriminant partial least squares analysis of the near infrared spectra allowed soils with and without residues to be classified, regardless of the type of tillage or rotation system used, with a prediction rate of 90% in the internal validation and 94% in the external validation. The NIRS calibration model using a modified partial least squares regression allowed the δ13C to be determined in soils with or without residues, with a multiple correlation coefficient of 0.81 and a standard error of prediction of 0.5‰ in soils with residues, and 0.92 and 0.2‰ in soils without residues. The ratio performance deviation for the quantification of δ13C in soil was 2.5 in soil with residues and 3.8 without residues. This indicated that the model was adequate to determine the δ13C of unknown soils in the -16.2‰ to -20.4‰ range. The NIR calibration permits analytical determination of δ13C values in unknown agricultural soils in less time and with a non-destructive method, by applying the remote reflectance fibre optic probe to the soil sample.
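
    The ratio performance deviation quoted in this abstract is commonly computed as the standard deviation of the reference values divided by the standard error of prediction. A minimal sketch of that calculation follows; it assumes the root-mean-square prediction error as the error term (definitions vary between authors), and the reference and predicted values are made up for illustration, not taken from the study.

```python
import math
import statistics

def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of the reference values / RMSE of prediction."""
    sd = statistics.stdev(reference)
    rmsep = math.sqrt(
        sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    )
    return sd / rmsep

# Illustrative values only (per-thousand units), not data from the study.
reference_values = [-16.0, -17.0, -18.0, -19.0, -20.0]
predicted_values = [-16.5, -16.5, -18.5, -18.5, -20.5]
example_rpd = rpd(reference_values, predicted_values)
```

    A larger ratio means the prediction error is small relative to the natural spread of the property being predicted.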

  8. Improving Hospital-Wide Early Resource Allocation through Machine Learning.

    PubMed

    Gartner, Daniel; Padman, Rema

    2015-01-01

    The objective of this paper is to evaluate the extent to which early determination of diagnosis-related groups (DRGs) can be used for better allocation of scarce hospital resources. When elective patients seek admission, the true DRG, currently determined only at discharge, is unknown. We approach the problem of early DRG determination in three stages: (1) test how much a Naïve Bayes classifier can improve classification accuracy as compared to a hospital's current approach; (2) develop a statistical program that makes admission and scheduling decisions based on the patients' clinical pathways and scarce hospital resources; and (3) feed the DRG as classified by the Naïve Bayes classifier and the hospital's baseline approach into the model (which we evaluate in simulation). Our results reveal that the DRG grouper performs poorly in classifying the DRG correctly before admission while the Naïve Bayes approach substantially improves the classification task. The results from the connection of the classification method with the mathematical program also reveal that resource allocation decisions can be more effective and efficient with the hybrid approach.

  9. Automatic threshold selection for multi-class open set recognition

    NASA Astrophysics Data System (ADS)

    Scherreik, Matthew; Rigling, Brian

    2017-05-01

    Multi-class open set recognition is the problem of supervised classification with additional unknown classes encountered after a model has been trained. An open set classifier often has two core components. The first component is a base classifier which estimates the most likely class of a given example. The second component consists of open set logic which estimates if the example is truly a member of the candidate class. Such a system is operated in a feed-forward fashion. That is, a candidate label is first estimated by the base classifier, and the true membership of the example to the candidate class is estimated afterward. Previous works have developed an iterative threshold selection algorithm for rejecting examples from classes which were not present at training time. In those studies, a Platt-calibrated SVM was used as the base classifier, and the thresholds were applied to class posterior probabilities for rejection. In this work, we investigate the effectiveness of other base classifiers when paired with the threshold selection algorithm and compare their performance with the original SVM solution.
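
    The feed-forward operation described in this abstract (candidate label first, membership check second) can be sketched as below. The sketch assumes the per-class thresholds have already been chosen; the iterative threshold selection algorithm itself is not described in the abstract and is not reproduced here. The function name, class names and threshold values are hypothetical.

```python
def open_set_predict(posteriors, thresholds, unknown_label="unknown"):
    """Return the candidate class, or `unknown_label` if its posterior is below threshold.

    posteriors: dict mapping class name -> posterior probability from the base classifier
    thresholds: dict mapping class name -> minimum posterior required to accept that class
    """
    # Step 1: the base classifier proposes the most likely known class.
    candidate = max(posteriors, key=posteriors.get)
    # Step 2: open set logic accepts or rejects the candidate.
    if posteriors[candidate] >= thresholds[candidate]:
        return candidate
    return unknown_label

# Hypothetical per-class thresholds.
example_thresholds = {"car": 0.8, "truck": 0.7}
```

    For example, `open_set_predict({"car": 0.9, "truck": 0.1}, example_thresholds)` accepts the candidate, while a posterior of 0.55 for the same class falls below its threshold and is rejected as unknown.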

  10. Age determination of bottled Chinese rice wine by VIS-NIR spectroscopy

    NASA Astrophysics Data System (ADS)

    Yu, Haiyan; Lin, Tao; Ying, Yibin; Pan, Xingxiang

    2006-10-01

    The feasibility of non-invasive visible and near infrared (VIS-NIR) spectroscopy for determining wine age (1, 2, 3, 4, and 5 years) of Chinese rice wine was investigated. Samples of Chinese rice wine were analyzed in 600 mL square brown glass bottles with side length of approximately 64 mm at room temperature. VIS-NIR spectra of 100 bottled Chinese rice wine samples were collected in transmission mode in the wavelength range of 350-1200 nm by a fiber spectrometer system. Discriminant models were developed based on discriminant analysis (DA) together with raw, first and second derivative spectra. The alcoholic degree, total acid, and °Brix were determined to validate the NIR results. The calibration result for raw spectra was better than that for first and second derivative spectra. The percentage of samples correctly classified for raw spectra was 98%. For the 1-, 2-, and 3-year-old sample groups, all samples were correctly classified, and for the 4- and 5-year-old sample groups, the percentage of samples correctly classified was 92.9% in each group. In validation analysis, the percentage of samples correctly classified was 100%. The results demonstrated that the VIS-NIR spectroscopic technique could be used as a non-invasive, rapid and reliable method for predicting the wine age of bottled Chinese rice wine.

  11. Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers

    PubMed Central

    2018-01-01

    Hyperspectral image classification with a limited number of training samples without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome this problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g. SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Those samples that are at smaller distances from the boundaries and have higher fuzziness are chosen as target candidates for the training set. Through detailed experimentation on three publicly available datasets, we showed that when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with the small amount of training data as opposed to the case where the training samples were selected randomly. Our experiments demonstrate the effectiveness of our proposed method, which compares favorably with the state-of-the-art methods. PMID:29304512
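
    The idea of preferring high-fuzziness samples can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the fuzziness formula, so the sketch uses a commonly cited entropy-style fuzziness measure over a membership vector, and it ranks by fuzziness alone, leaving out the boundary-distance term mentioned in the abstract. The function names and membership values are hypothetical.

```python
import math

def fuzziness(memberships):
    """Entropy-style fuzziness of a membership vector (assumed definition).

    Equals 0 for crisp memberships (all 0 or 1) and is largest when every
    membership is 0.5.
    """
    total = 0.0
    for m in memberships:
        for p in (m, 1.0 - m):
            if p > 0.0:
                total -= p * math.log(p)
    return total / len(memberships)

def select_most_fuzzy(pool, k):
    """Return the names of the k unlabeled samples with the highest fuzziness.

    pool: list of (sample_name, membership_vector) pairs.
    """
    ranked = sorted(pool, key=lambda item: fuzziness(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical class-membership outputs for three unlabeled samples.
example_pool = [("a", [0.98, 0.02]), ("b", [0.55, 0.45]), ("c", [0.70, 0.30])]
```

    Here the confidently classified sample "a" is ranked last, so a budget of two labels would be spent on "b" and "c".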

  12. Ensemble stump classifiers and gene expression signatures in lung cancer.

    PubMed

    Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn

    2007-01-01

    Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and simple stump classifiers on two data sets.
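
    A decision stump and a majority vote over several stumps can be sketched as below. This is a generic illustration of the voting idea, not the ESC method itself; the probe names, thresholds and expression values are hypothetical and do not correspond to the three-probe signature reported in the abstract.

```python
def make_stump(probe, threshold, label_above, label_below):
    """A decision stump: a one-test decision tree on a single probe."""
    def stump(sample):
        return label_above if sample[probe] > threshold else label_below
    return stump

def ensemble_predict(stumps, sample):
    """Majority vote over the stump predictions (use an odd number of stumps to avoid ties)."""
    votes = [stump(sample) for stump in stumps]
    return max(set(votes), key=votes.count)

# Hypothetical probes and thresholds, for illustration only.
example_stumps = [
    make_stump("probe_A", 1.0, "adenocarcinoma", "squamous"),
    make_stump("probe_B", 0.5, "squamous", "adenocarcinoma"),
    make_stump("probe_C", 3.0, "squamous", "adenocarcinoma"),
]
example_sample = {"probe_A": 2.0, "probe_B": 0.1, "probe_C": 5.0}
```

    For `example_sample`, two of the three stumps vote for adenocarcinoma, so the ensemble returns that label even though the third stump disagrees.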

  13. Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

    PubMed Central

    Theis, Fabian J.

    2017-01-01

    Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
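
    The general idea behind inverse-probability resampling can be sketched as follows. This is a generic illustration, not the stochastic inverse-probability oversampling or parametric inverse-probability bagging procedures proposed by the authors, and it does not use the sambia package. It assumes the selection probability of each stratum is known; the stratum names and probabilities are made up.

```python
import random

def inverse_probability_resample(records, selection_prob, size, seed=0):
    """Resample with replacement, weighting each record by 1 / P(selected into the study).

    records: list of dicts with a "stratum" key
    selection_prob: dict mapping stratum -> probability of having been sampled
    """
    rng = random.Random(seed)
    weights = [1.0 / selection_prob[r["stratum"]] for r in records]
    return rng.choices(records, weights=weights, k=size)

# Hypothetical enriched design: all cases kept, only 10% of controls sampled.
example_records = [{"stratum": "case"}] * 50 + [{"stratum": "control"}] * 50
example_probs = {"case": 1.0, "control": 0.1}
resampled = inverse_probability_resample(example_records, example_probs, size=5000)
control_fraction = sum(r["stratum"] == "control" for r in resampled) / len(resampled)
```

    In the stratified data the two strata are balanced, but after resampling the controls make up roughly 10/11 of the records, which approximates the composition of the population the enriched sample was drawn from.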

  14. Reconstruction from limited single-particle diffraction data via simultaneous determination of state, orientation, intensity, and phase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Donatelli, Jeffrey J.; Sethian, James A.; Zwart, Peter H.

    Free-electron lasers now have the ability to collect X-ray diffraction patterns from individual molecules; however, each sample is delivered at unknown orientation and may be in one of several conformational states, each with a different molecular structure. Hit rates are often low, typically around 0.1%, limiting the number of useful images that can be collected. Determining accurate structural information requires classifying and orienting each image, accurately assembling them into a 3D diffraction intensity function, and determining missing phase information. Additionally, single particles typically scatter very few photons, leading to high image noise levels. We develop a multitiered iterative phasing algorithm to reconstruct structural information from single-particle diffraction data by simultaneously determining the states, orientations, intensities, phases, and underlying structure in a single iterative procedure. We leverage real-space constraints on the structure to help guide optimization and reconstruct underlying structure from very few images with excellent global convergence properties. We show that this approach can determine structural resolution beyond what is suggested by standard Shannon sampling arguments for ideal images and is also robust to noise.

  15. Reconstruction from limited single-particle diffraction data via simultaneous determination of state, orientation, intensity, and phase

    DOE PAGES

    Donatelli, Jeffrey J.; Sethian, James A.; Zwart, Peter H.

    2017-06-26

    Free-electron lasers now have the ability to collect X-ray diffraction patterns from individual molecules; however, each sample is delivered at unknown orientation and may be in one of several conformational states, each with a different molecular structure. Hit rates are often low, typically around 0.1%, limiting the number of useful images that can be collected. Determining accurate structural information requires classifying and orienting each image, accurately assembling them into a 3D diffraction intensity function, and determining missing phase information. Additionally, single particles typically scatter very few photons, leading to high image noise levels. We develop a multitiered iterative phasing algorithm to reconstruct structural information from single-particle diffraction data by simultaneously determining the states, orientations, intensities, phases, and underlying structure in a single iterative procedure. We leverage real-space constraints on the structure to help guide optimization and reconstruct underlying structure from very few images with excellent global convergence properties. We show that this approach can determine structural resolution beyond what is suggested by standard Shannon sampling arguments for ideal images and is also robust to noise.

  16. An expert system shell for inferring vegetation characteristics: The learning system (tasks C and D)

    NASA Technical Reports Server (NTRS)

    Harrison, P. Ann; Harrison, Patrick R.

    1992-01-01

    This report describes the implementation of a learning system that uses a data base of historical cover type reflectance data taken at different solar zenith angles and wavelengths to learn class descriptions of classes of cover types. It has been integrated with the VEG system and requires that the VEG system be loaded to operate. VEG is the NASA VEGetation workbench - an expert system for inferring vegetation characteristics from reflectance data. The learning system provides three basic options. Using option one, the system learns class descriptions of one or more classes. Using option two, the system learns class descriptions of one or more classes and then uses the learned classes to classify an unknown sample. Using option three, the user can test the system's classification performance. The learning system can also be run in an automatic mode. In this mode, options two and three are executed on each sample from an input file. The system was developed using KEE. It is menu driven and contains a sophisticated window and mouse driven interface which guides the user through various computations. Input and output file management and data formatting facilities are also provided.

  17. Managing pregnancy of unknown location based on initial serum progesterone and serial serum hCG levels: development and validation of a two-step triage protocol.

    PubMed

    Van Calster, B; Bobdiwala, S; Guha, S; Van Hoorde, K; Al-Memar, M; Harvey, R; Farren, J; Kirk, E; Condous, G; Sur, S; Stalder, C; Timmerman, D; Bourne, T

    2016-11-01

    A uniform rationalized management protocol for pregnancies of unknown location (PUL) is lacking. We developed a two-step triage protocol to select PUL at high risk of ectopic pregnancy (EP), based on serum progesterone level at presentation (step 1) and the serum human chorionic gonadotropin (hCG) ratio, defined as the ratio of hCG at 48 h to hCG at presentation (step 2). This was a cohort study of 2753 PUL (301 EP), involving a secondary analysis of prospectively and consecutively collected PUL data from two London-based university teaching hospitals. Using a chronological split we used 1449 PUL for development and 1304 for validation. We aimed to assign PUL as low risk with high confidence (high negative predictive value (NPV)) while classifying most EP as high risk (high sensitivity). The first triage step assigned PUL as low risk using a threshold of serum progesterone at presentation. The remaining PUL were triaged using a novel logistic regression risk model based on hCG ratio and initial serum progesterone (second step), defining low risk as an estimated EP risk of < 5%. On validation, initial serum progesterone ≤ 2 nmol/L (step 1) classified 16.1% PUL as low risk. Second-step classification with the risk model selected an additional 46.0% of all PUL as low risk. Overall, the two-step protocol classified 62.1% of PUL as low risk, with an NPV of 98.6% and a sensitivity of 92.0%. When the risk model was used in isolation (i.e. without the first step), 60.5% of PUL were classified as low risk with 99.1% NPV and 94.9% sensitivity. PUL can be classified efficiently into being either high or low risk for complications using a two-step protocol involving initial progesterone and hCG levels and the hCG ratio. Copyright © 2016 ISUOG. Published by John Wiley & Sons Ltd.
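
    The control flow of the two-step protocol, together with the two performance measures quoted in this abstract, can be sketched as below. The cut-offs (progesterone ≤ 2 nmol/L, estimated EP risk < 5%) come from the abstract; the logistic regression model itself is not given there, so `risk_model` is a hypothetical callable supplied by the caller and the constant-risk lambdas in the example are placeholders, not a clinical model. This sketch is for illustration only and is not suitable for clinical use.

```python
def triage(progesterone_nmol_l, hcg_ratio, risk_model,
           progesterone_cutoff=2.0, risk_cutoff=0.05):
    """Two-step triage: step 1 on initial progesterone, step 2 on modelled EP risk."""
    # Step 1: low initial serum progesterone is triaged as low risk.
    if progesterone_nmol_l <= progesterone_cutoff:
        return "low risk"
    # Step 2: the remaining cases are triaged by the estimated EP risk.
    if risk_model(progesterone_nmol_l, hcg_ratio) < risk_cutoff:
        return "low risk"
    return "high risk"

def npv_and_sensitivity(calls, is_ep):
    """NPV = true negatives / all low-risk calls; sensitivity = detected EP / all EP."""
    tn = sum(c == "low risk" and not ep for c, ep in zip(calls, is_ep))
    fn = sum(c == "low risk" and ep for c, ep in zip(calls, is_ep))
    tp = sum(c == "high risk" and ep for c, ep in zip(calls, is_ep))
    return tn / (tn + fn), tp / (tp + fn)

# Placeholder risk models returning a fixed risk (hypothetical, for illustration).
low_risk_model = lambda progesterone, ratio: 0.03
high_risk_model = lambda progesterone, ratio: 0.20
```

    A high NPV means few ectopic pregnancies are hidden among the low-risk calls, while a high sensitivity means most ectopic pregnancies are routed to the high-risk group.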

  18. Simulation techniques for estimating error in the classification of normal patterns

    NASA Technical Reports Server (NTRS)

    Whitsitt, S. J.; Landgrebe, D. A.

    1974-01-01

    Methods of efficiently generating and classifying samples with specified multivariate normal distributions were discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measure for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound.
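
    The comparison between a simulated error rate and the Chernoff bound can be sketched for the simplest case. The sketch below is an assumption-laden illustration, not the report's method: it uses two equally likely univariate normal classes with a shared standard deviation rather than general multivariate distributions. For equal variances the Chernoff bound is tightest at s = 1/2, where it coincides with the Bhattacharyya bound used here. All function names and parameter values are hypothetical.

```python
import math
import random

def simulate_error(mu1, mu2, sigma, n, seed=0):
    """Monte Carlo error rate of the nearest-mean rule for two equiprobable normal classes."""
    rng = random.Random(seed)
    errors = 0
    for i in range(n):
        true_class = 1 if i % 2 == 0 else 2          # alternate classes: equal priors
        x = rng.gauss(mu1 if true_class == 1 else mu2, sigma)
        predicted = 1 if abs(x - mu1) <= abs(x - mu2) else 2
        errors += predicted != true_class
    return errors / n

def bhattacharyya_bound(mu1, mu2, sigma):
    """Upper bound on the error for equal priors and equal variances (Chernoff bound at s = 1/2)."""
    b = (mu1 - mu2) ** 2 / (8.0 * sigma ** 2)
    return 0.5 * math.exp(-b)

simulated = simulate_error(0.0, 2.0, 1.0, n=20000)
bound = bhattacharyya_bound(0.0, 2.0, 1.0)
```

    With means two standard deviations apart, the simulated error should land near the theoretical value of about 0.159, comfortably below the bound of about 0.303.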

  19. Determination of Minimum Training Sample Size for Microarray-Based Cancer Outcome Prediction–An Empirical Assessment

    PubMed Central

    Cheng, Ningtao; Wu, Leihong; Cheng, Yiyu

    2013-01-01

    The promise of microarray technology in providing prediction classifiers for cancer outcome estimation has been confirmed by a number of demonstrable successes. However, the reliability of prediction results relies heavily on the accuracy of the statistical parameters involved in classifiers. These parameters cannot be reliably estimated with only a small number of training samples. Therefore, it is of vital importance to determine the minimum number of training samples and to ensure the clinical value of microarrays in cancer outcome prediction. We evaluated the impact of training sample size on model performance extensively based on 3 large-scale cancer microarray datasets provided by the second phase of the MicroArray Quality Control project (MAQC-II). An SSNR-based (scale of signal-to-noise ratio) protocol was proposed in this study for minimum training sample size determination. External validation results based on another 3 cancer datasets confirmed that the SSNR-based approach could not only determine the minimum number of training samples efficiently, but also provide a valuable strategy for estimating the underlying performance of classifiers in advance. Once translated into clinical routine applications, the SSNR-based protocol would provide great convenience in microarray-based cancer outcome prediction in improving classifier reliability. PMID:23861920

  20. A novel modular ANN architecture for efficient monitoring of gases/odours in real-time

    NASA Astrophysics Data System (ADS)

    Mishra, A.; Rajput, N. S.

    2018-04-01

    Data pre-processing is widely used for enhanced classification of gases. However, it suppresses the concentration variances of different gas samples. The classical solution of using a single artificial neural network (ANN) architecture is also inefficient and yields degraded quantification. In this paper, a novel modular ANN design is proposed to provide an efficient and scalable solution in real-time. Here, two separate ANN blocks, viz. a classifier block and a quantifier block, are used to provide efficient and scalable gas monitoring in real-time. The classifier ANN consists of two stages. In the first stage, the Net 1-NDSRT is trained to transform raw sensor responses into corresponding virtual multi-sensor responses using normalized difference sensor response transformation (NDSRT). These responses are fed to the second stage (i.e., Net 2-classifier). The Net 2-classifier is trained to classify various gas samples into their respective classes. Further, the quantifier block has parallel ANN modules, multiplexed to quantify each gas. Therefore, the classifier ANN decides the class and the quantifier ANN decides the exact quantity of the gas/odor present in the respective sample of that class.

  1. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    NASA Astrophysics Data System (ADS)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, an SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is first predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that leveraging a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.

  2. Molecular differential diagnosis of follicular thyroid carcinoma and adenoma based on gene expression profiling by using formalin-fixed paraffin-embedded tissues

    PubMed Central

    2013-01-01

    Background: Differential diagnosis between malignant follicular thyroid cancer (FTC) and benign follicular thyroid adenoma (FTA) is a great challenge for even an experienced pathologist and requires special effort. Molecular markers may potentially support a differential diagnosis between FTC and FTA in postoperative specimens. The purpose of this study was to derive molecular support for differential post-operative diagnosis, in the form of a simple multigene mRNA-based classifier that would differentiate between FTC and FTA tissue samples. Methods: A molecular classifier was created based on a combined analysis of two microarray datasets (using 66 thyroid samples). The performance of the classifier was assessed using an independent dataset comprising 71 formalin-fixed paraffin-embedded (FFPE) samples (31 FTC and 40 FTA), which were analysed by quantitative real-time PCR (qPCR). In addition, three other microarray datasets (62 samples) were used to confirm the utility of the classifier. Results: Five of 8 genes selected from training datasets (ELMO1, EMCN, ITIH5, KCNAB1, SLCO2A1) were amplified by qPCR in FFPE material from an independent sample set. Three other genes did not amplify in FFPE material, probably due to low abundance. All 5 analysed genes were downregulated in FTC compared to FTA. The sensitivity and specificity of the 5-gene classifier tested on the FFPE dataset were 71% and 72%, respectively. Conclusions: The proposed approach could support histopathological examination: a 5-gene classifier may aid in molecular discrimination between FTC and FTA in FFPE material. PMID:24099521

  3. Investigating the Geological History of Asteroid 101955 Bennu Through Remote Sensing and Returned Sample Analyses

    NASA Technical Reports Server (NTRS)

    Messenger, S.; Connolly, H. C., Jr.; Lauretta, D. S.; Bottke, W. F.

    2014-01-01

    The NASA New Frontiers Mission OSIRIS-REx will return surface regolith samples from near-Earth asteroid 101955 Bennu in September 2023. This target is classified as a B-type asteroid and is spectrally similar to CI and CM chondrite meteorites [1]. The returned samples are thus expected to contain primitive ancient Solar System materials that formed in planetary, nebular, interstellar, and circumstellar environments. Laboratory studies of primitive astromaterials have yielded detailed constraints on the origins, properties, and evolutionary histories of a wide range of Solar System bodies. Yet, the parent bodies of meteorites and cosmic dust are generally unknown, genetic and evolutionary relationships among asteroids and comets are unsettled, and links between laboratory and remote observations remain tenuous. The OSIRIS-REx mission will offer the opportunity to coordinate detailed laboratory analyses of asteroidal materials with known and well characterized geological context from which the samples originated. A primary goal of the OSIRIS-REx mission will be to provide detailed constraints on the origin and geological and dynamical history of Bennu through coordinated analytical studies of the returned samples. These microanalytical studies will be placed in geological context through an extensive orbital remote sensing campaign that will characterize the global geological features and chemical diversity of Bennu. The first views of the asteroid surface and of the returned samples will undoubtedly bring remarkable surprises. However, a wealth of laboratory studies of meteorites and spacecraft encounters with primitive bodies provides a useful framework to formulate priority scientific questions and effective analytical approaches well before the samples are returned. Here we summarize our approach to unraveling the geological history of Bennu through returned sample analyses.

  4. Frequent Detection and Genetic Diversity of Human Bocavirus in Urban Sewage Samples.

    PubMed

    Iaconelli, M; Divizia, M; Della Libera, S; Di Bonito, P; La Rosa, Giuseppina

    2016-12-01

    The prevalence and genetic diversity of human bocaviruses (HBoVs) in sewage water samples are largely unknown. In this study, 134 raw sewage samples from 25 wastewater treatment plants (WTPs) in Italy were analyzed by nested PCR and sequencing using species-specific primer pairs and broad-range primer pairs targeting the capsid proteins VP1/VP2. A large number of samples (106, 79.1%) were positive for HBoV. Out of these, 49 were classified as HBoV species 2, and 27 as species 3. For the remaining 30 samples, sequencing results showed mixed electropherograms. By cloning PCR amplicons and sequencing, we confirmed the copresence of species 2 and 3 in 29 samples and species 2 and 4 in only one sample. A real-time PCR assay was also performed, using a newly designed TaqMan assay, for quantification of HBoVs in sewage water samples. Viral load quantification ranged from 5.51E+03 to 1.84E+05 GC/L (mean value 4.70E+04 GC/L) for bocavirus 2 and from 1.89E+03 to 1.02E+05 GC/L (mean value 2.27E+04 GC/L) for bocavirus 3. The wide distribution of HBoV in sewage suggests that this virus is common in the population and that species 2 and 3 are the most prevalent. HBoV-4 was also found, representing the first detection of this species in Italy. Although there is no indication of waterborne transmission for HBoV, the significant presence in sewage waters suggests that HBoV may spread to other water environments, and therefore, a potential role of water in HBoV transmission should not be neglected.
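
    As a quick consistency check on the counts reported above, the following sketch recomputes the positivity rate and confirms that the species breakdown adds up. The variable names are illustrative; the numbers are the ones given in the abstract.

```python
total_samples = 134
positive = 106
species_2_only = 49
species_3_only = 27
mixed = 30                           # samples with mixed electropherograms
mixed_2_and_3, mixed_2_and_4 = 29, 1 # resolved by cloning and sequencing

# Share of raw sewage samples positive for HBoV, in percent.
positive_rate = round(100 * positive / total_samples, 1)

# The three groups should account for every positive sample.
breakdown_total = species_2_only + species_3_only + mixed
```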

  5. Lightweight Adaptation of Classifiers to Users and Contexts: Trends of the Emerging Domain

    PubMed Central

    Vildjiounaite, Elena; Gimel'farb, Georgy; Kyllönen, Vesa; Peltola, Johannes

    2015-01-01

    Intelligent computer applications need to adapt their behaviour to contexts and users, but conventional classifier adaptation methods require long data collection and/or training times. Therefore classifier adaptation is often performed as follows: at design time application developers define typical usage contexts and provide reasoning models for each of these contexts, and then at runtime an appropriate model is selected from available ones. Typically, definition of usage contexts and reasoning models heavily relies on domain knowledge. However, in practice many applications are used in such diverse situations that no developer can predict them all and collect adequate training and test databases for each. Such applications have to adapt to a new user or unknown context at runtime just from interaction with the user, preferably in fairly lightweight ways, that is, requiring limited user effort to collect training data and limited time to perform the adaptation. This paper analyses adaptation trends in several emerging domains and outlines promising ideas proposed for making multimodal classifiers user-specific and context-specific without significant user effort, detailed domain knowledge, and/or complete retraining of the classifiers. Based on this analysis, this paper identifies important application characteristics and presents guidelines to consider these characteristics in adaptation design. PMID:26473165

  6. Predicting Malignant and Paramalignant Pleural Effusions by Combining Clinical, Radiological and Pleural Fluid Analytical Parameters.

    PubMed

    Herrera Lara, Susana; Fernández-Fabrellas, Estrella; Juan Samper, Gustavo; Marco Buades, Josefa; Andreu Lapiedra, Rafael; Pinilla Moreno, Amparo; Morales Suárez-Varela, María

    2017-10-01

    The usefulness of clinical, radiological and pleural fluid analytical parameters for diagnosing malignant and paramalignant pleural effusion is not clearly stated. Hence this study aimed to identify possible predictor variables of diagnosing malignancy in pleural effusion of unknown aetiology. Clinical, radiological and pleural fluid analytical parameters were obtained from consecutive patients who had suffered pleural effusion of unknown aetiology. They were classified into three groups according to their final diagnosis: malignant, paramalignant and benign pleural effusion. The CHAID (Chi-square automatic interaction detector) methodology was used to estimate the implication of the clinical, radiological and analytical variables in daily practice through decision trees. Of 71 patients, malignant (n = 31), paramalignant (n = 15) and benign (n = 25), smoking habit, dyspnoea, weight loss, radiological characteristics (mass, node, adenopathies and pleural thickening) and pleural fluid analytical parameters (pH and glucose) distinguished malignant and paramalignant pleural effusions (all with a p < 0.05). Decision tree 1 classified 77.8% of malignant and paramalignant pleural effusions in step 2. Decision tree 2 classified 83.3% of malignant pleural effusions in step 2, 73.3% of paramalignant pleural effusions and 91.7% of benign ones. The data herein suggest that the identified predictor values applied to tree diagrams, which required no extraordinary measures, have a higher rate of correct identification of malignant, paramalignant and benign effusions when compared to techniques available today and proved most useful for usual clinical practice. Future studies are still needed to further improve the classification of patients.

  7. Identification of Unknown Contaminants in Water Samples from ISS Employing Liquid Chromatography/Mass Spectrometry/Mass Spectrometry

    NASA Technical Reports Server (NTRS)

    Rutz, Jeffrey A.; Schultz, John R.

    2008-01-01

    Mass Spectrometry/Mass Spectrometry (MS/MS) is a powerful technique for identifying unknown organic compounds. For non-volatile or thermally unstable unknowns dissolved in liquids, liquid chromatography/mass spectrometry/mass spectrometry (LC/MS/MS) is often the variety of MS/MS used for the identification. One type of LC/MS/MS that is rapidly becoming popular is time-of-flight (TOF) mass spectrometry. This technique is now in use at the Johnson Space Center for identification of unknown nonvolatile organics in water samples from the space program. An example of the successful identification of one unknown is reviewed in detail in this paper. The advantages of time-of-flight instrumentation are demonstrated through this example as well as the strategy employed in using time-of-flight data to identify unknowns.

  8. [Local Regression Algorithm Based on Net Analyte Signal and Its Application in Near Infrared Spectral Analysis].

    PubMed

    Zhang, Hong-guang; Lu, Jian-gang

    2016-02-01

    To overcome the problems of significant differences among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, the net analyte signal (NAS) method is first used to obtain the net analyte signal of the calibration samples and unknown samples; the Euclidean distance between the net analyte signal of an unknown sample and those of the calibration samples is then calculated and used as a similarity index. According to this similarity index, a local calibration set is selected individually for each unknown sample. Finally, a local PLS regression model is built on the local calibration set of each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to those of the global PLS regression method and the conventional local regression algorithm based on spectral Euclidean distance.
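
    The select-then-fit structure of this local regression can be sketched in a few lines. This is a simplified illustration under stated assumptions: the NAS projection is omitted and raw vectors stand in for net analyte signals, the local model is a one-component PLS1 fit rather than a full PLS model, and the function names, the neighbourhood size k, and the toy data are all hypothetical.

```python
import math

def nearest_indices(calibration, query, k):
    """Indices of the k calibration vectors closest to query (Euclidean distance)."""
    order = sorted(range(len(calibration)),
                   key=lambda i: math.dist(calibration[i], query))
    return order[:k]

def pls1_one_component(X, y, query):
    """Predict y for query with a one-component PLS1 model fitted on X, y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return y_mean  # no covariance in the local set; fall back to the mean
    w = [v / norm for v in w]
    scores = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(t * v for t, v in zip(scores, yc)) / sum(t * t for t in scores)
    t_query = sum((query[j] - x_mean[j]) * w[j] for j in range(p))
    return y_mean + q * t_query

def local_predict(calibration, values, query, k=3):
    """Build a local calibration set for this query, then fit and predict."""
    idx = nearest_indices(calibration, query, k)
    return pls1_one_component([calibration[i] for i in idx],
                              [values[i] for i in idx], query)

# Hypothetical data: each "spectrum" is a fixed profile scaled by the property value.
profile = [1.0, 2.0, 0.5]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
calibration = [[c * s for s in profile] for c in values]
estimate = local_predict(calibration, values, [2.5 * s for s in profile])
```

    Because a separate model is fitted for each unknown sample, the cost grows with the number of predictions; the benefit is that each model only has to describe the samples most similar to the one being predicted.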

  9. Bacterial sexually transmitted infections among HIV-infected patients in the United States: estimates from the Medical Monitoring Project.

    PubMed

    Flagg, Elaine W; Weinstock, Hillard S; Frazier, Emma L; Valverde, Eduardo E; Heffelfinger, James D; Skarbinski, Jacek

    2015-04-01

    Bacterial sexually transmitted infections may facilitate HIV transmission. Bacterial sexually transmitted infection testing is recommended for sexually active HIV-infected patients annually and more frequently for those at elevated sexual risk. We estimated percentages of HIV-infected patients in the United States receiving at least one syphilis, gonorrhea, or chlamydia test, and repeat (≥2 tests, ≥3 months apart) tests for any of these sexually transmitted infections from mid-2008 through mid-2010. The Medical Monitoring Project collects behavioral and clinical characteristics of HIV-infected adults receiving medical care in the United States using nationally representative sampling. Sexual activity included self-reported oral, vaginal, or anal sex in the past 12 months. Participants reporting more than 1 sexual partner or illicit drug use before/during sex in the past year were classified as having elevated sexual risk. Among participants with only 1 sex partner and no drug use before/during sex, those reporting consistent condom use were classified as low risk; those reporting sex without a condom (or for whom this was unknown) were classified as at elevated sexual risk only if they considered their sex partner to be a casual partner, or if their partner was HIV-negative or partner HIV status was unknown. Bacterial sexually transmitted infection testing was ascertained through medical record abstraction. Among sexually active patients, 55% were tested at least once in 12 months for syphilis, whereas 23% and 24% received at least one gonorrhea and chlamydia test, respectively. Syphilis testing did not vary by sex/sexual orientation. Receipt of at least 3 CD4+ T-lymphocyte cell counts and/or HIV viral load tests in 12 months was associated with syphilis testing in men who have sex with men (MSM), men who have sex with women only, and women. Chlamydia testing was significantly higher in sexually active women (30%) compared with men who have sex with women only (19%), but not compared with MSM (22%). Forty-six percent of MSM were at elevated sexual risk; 26% of these MSM received repeat syphilis testing, whereas repeat testing for gonorrhea and chlamydia was only 7% for each infection. Bacterial sexually transmitted infection testing among sexually active HIV-infected patients was low, particularly for those at elevated sexual risk. Patient encounters in which CD4+ T-lymphocyte cell counts and/or HIV viral load testing occurs present opportunities for increased bacterial sexually transmitted infection testing.

  10. A blood-based proteomic classifier for the molecular characterization of pulmonary nodules.

    PubMed

    Li, Xiao-jun; Hayward, Clive; Fong, Pui-Yee; Dominguez, Michel; Hunsucker, Stephen W; Lee, Lik Wee; McLean, Matthew; Law, Scott; Butler, Heather; Schirm, Michael; Gingras, Olivier; Lamontagne, Julie; Allard, Rene; Chelsky, Daniel; Price, Nathan D; Lam, Stephen; Massion, Pierre P; Pass, Harvey; Rom, William N; Vachani, Anil; Fang, Kenneth C; Hood, Leroy; Kearney, Paul

    2013-10-16

    Each year, millions of pulmonary nodules are discovered by computed tomography and subsequently biopsied. Because most of these nodules are benign, many patients undergo unnecessary and costly invasive procedures. We present a 13-protein blood-based classifier that differentiates malignant and benign nodules with high confidence, thereby providing a diagnostic tool to avoid invasive biopsy on benign nodules. Using a systems biology strategy, we identified 371 protein candidates and developed a multiple reaction monitoring (MRM) assay for each. The MRM assays were applied in a three-site discovery study (n = 143) on plasma samples from patients with benign and stage IA lung cancer matched for nodule size, age, gender, and clinical site, producing a 13-protein classifier. The classifier was validated on an independent set of plasma samples (n = 104), exhibiting a negative predictive value (NPV) of 90%. Validation performance on samples from a nondiscovery clinical site showed an NPV of 94%, indicating the general effectiveness of the classifier. A pathway analysis demonstrated that the classifier proteins are likely modulated by a few transcription regulators (NF2L2, AHR, MYC, and FOS) that are associated with lung cancer, lung inflammation, and oxidative stress networks. The classifier score was independent of patient nodule size, smoking history, and age, which are risk factors used for clinical management of pulmonary nodules. Thus, this molecular test provides a potential complementary tool to help physicians in lung cancer diagnosis.

  11. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    PubMed

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed from the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
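
    The performance measures listed above (correct rate, sensitivity, specificity, positive and negative predictive value) all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical, since the abstract does not report the individual cells. The last line only checks that the reported 94.38% accuracy on 35 + 143 = 178 samples is consistent with 168 correct calls.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from the four cells of a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts for illustration only (not taken from the study).
example = binary_metrics(tp=8, fp=2, tn=18, fn=2)

# Consistency check on the reported figures: 168 of 178 correct, in percent.
implied_accuracy = round(100 * 168 / 178, 2)
```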

  12. Stackable differential mobility analyzer for aerosol measurement

    DOEpatents

    Cheng, Meng-Dawn [Oak Ridge, TN]; Chen, Da-Ren [Creve Coeur, MO]

    2007-05-08

    A multi-stage differential mobility analyzer (MDMA) for aerosol measurements includes a first electrode or grid including at least one inlet or injection slit for receiving an aerosol including charged particles for analysis. A second electrode or grid is spaced apart from the first electrode. The second electrode has at least one sampling outlet disposed at a plurality of different distances along its length. A volume between the first and the second electrode or grid between the inlet or injection slit and a distal one of the plurality of sampling outlets forms a classifying region, the first and second electrodes for charging to suitable potentials to create an electric field within the classifying region. At least one inlet or injection slit in the second electrode receives a sheath gas flow into an upstream end of the classifying region, wherein each sampling outlet functions as an independent DMA stage and classifies different size ranges of charged particles based on electric mobility simultaneously.

  13. EPA Unmix 6.0 Fundamentals & User Guide

    EPA Pesticide Factsheets

    Unmix seeks to solve the general mixture problem where the data are assumed to be a linear combination of an unknown number of sources of unknown composition, which contribute an unknown amount to each sample.
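
    The general mixture problem described above is commonly written as a bilinear model. The notation below is illustrative and is not quoted from the guide: x_ij is the measured amount of species j in sample i, N is the number of sources, f_kj is the composition of source k, a_ik is the amount source k contributes to sample i, and e_ij is a residual term added here as an assumption to cover measurement error.

```latex
x_{ij} = \sum_{k=1}^{N} a_{ik} \, f_{kj} + e_{ij}
```

    In this formulation N, a_ik and f_kj are all unknown and have to be estimated from the data x_ij alone.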

  14. Ectodermal dysplasias: A clinical classification and a causal review

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pinheiro, M.; Freire-Maia, N.

    1994-11-01

    The authors present a causal review of 154 ectodermal dysplasias (EDs) as classified into 11 clinical subgroups. The number of EDs in each subgroup varies from one to 43. The numbers of conditions due to autosomal dominant, autosomal recessive, and X-linked genes are, respectively, 41, 52, and 8. In 53 conditions cause is unknown; 35 of them present some causal (genetic) suggestion.

  15. A survey of supervised machine learning models for mobile-phone based pathogen identification and classification

    NASA Astrophysics Data System (ADS)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan

    2017-03-01

    Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can provide a useful tool for water quality monitoring in resource-limited settings.

  16. Discrimination of Clover and Citrus Honeys from Egypt According to Floral Type Using Easily Assessable Physicochemical Parameters and Discriminant Analysis: An External Validation of the Chemometric Approach.

    PubMed

    Karabagias, Ioannis K; Karabournioti, Sofia

    2018-05-03

    Twenty-two honey samples, namely clover and citrus honeys, were collected from the greater Cairo area during the harvesting year 2014–2015. The main purpose of the present study was to characterize the aforementioned honey types and to investigate whether the use of easily assessable physicochemical parameters, including color attributes in combination with chemometrics, could differentiate honey floral origin. Parameters taken into account were: pH, electrical conductivity, ash, free acidity, lactonic acidity, total acidity, moisture content, total sugars (degrees Brix-°Bx), total dissolved solids and their ratio to total acidity, salinity, CIELAB color parameters, along with browning index values. Results showed that all honey samples analyzed met the European quality standards set for honey and had variations in the aforementioned physicochemical parameters depending on floral origin. Application of linear discriminant analysis showed that eight physicochemical parameters, including color, could classify Egyptian honeys according to floral origin (p < 0.05). Correct classification rate was 95.5% using the original method and 90.9% using the cross validation method. The discriminatory ability of the developed model was further validated using unknown honey samples. The overall correct classification rate was not affected. Specific physicochemical parameter analysis in combination with chemometrics has the potential to enhance the differences in floral honeys produced in a given geographical zone.
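
    The difference between the "original" and "cross validation" classification rates can be sketched as follows. Two assumptions are made here: the cross validation is taken to be leave-one-out, which the abstract does not state, and a nearest-centroid rule is swapped in for linear discriminant analysis to keep the example short. The feature values are hypothetical. With 22 samples, the reported 95.5% and 90.9% are consistent with 21 and 20 correct assignments, respectively.

```python
def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid(train, x):
    """train maps class label -> list of feature vectors."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(train, key=lambda label: sq_dist(centroid(train[label]), x))

def group_by_label(data, skip=None):
    train = {}
    for i, (x, label) in enumerate(data):
        if i != skip:
            train.setdefault(label, []).append(x)
    return train

def resubstitution_rate(data):
    """Classify every sample with a model fitted on all samples."""
    train = group_by_label(data)
    return sum(nearest_centroid(train, x) == label for x, label in data) / len(data)

def leave_one_out_rate(data):
    """Classify each sample with a model fitted on the remaining samples."""
    hits = sum(nearest_centroid(group_by_label(data, skip=i), x) == label
               for i, (x, label) in enumerate(data))
    return hits / len(data)

# Hypothetical two-feature measurements (not values from the study).
data = [([3.9, 0.20], "clover"), ([4.0, 0.25], "clover"), ([4.1, 0.22], "clover"),
        ([3.5, 0.40], "citrus"), ([3.4, 0.45], "citrus"), ([3.6, 0.42], "citrus")]
```

    The leave-one-out rate is usually the more conservative of the two, because each sample is classified by a model that never saw it.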

  17. Influence of the Inherited Glucose-6-phosphate Dehydrogenase Deficiency on the Appearance of Neonatal Hyperbilirubinemia in Southern Croatia.

    PubMed

    Cherepnalkovski, Anet Papazovska; Marusic, Eugenija; Piperkova, Katica; Lozic, Bernarda; Skelin, Ana; Gruev, Todor; Krzelj, Vjekoslav

    2015-10-01

    Neonatal hyperbilirubinemia is a common clinical manifestation of the inherited glucose-6-phosphate dehydrogenase (G6PD) deficiency. The aim of this study was to investigate the influence of the inherited G6PD deficiency on the appearance of neonatal hyperbilirubinemia in southern Croatia. The fluorescent spot test (FST) was used in a retrospective study to screen blood samples of 513 male children who had neonatal hyperbilirubinemia, of unknown cause, higher than 240 μmol/L. Fluorescence readings were performed at the beginning and at the fifth and tenth minute of incubation and were classified into three groups: bright fluorescence (BF), weak fluorescence (WF) and no fluorescence (NF). Normal samples show bright fluorescence. All NF and WF samples at the fifth minute were quantitatively measured using the spectrophotometric method. Bright fluorescence was present in 461 patients (89.9%) at the fifth minute. The remaining 52 (10.1%) were quantitatively estimated using the spectrophotometric method. G6PD deficiency was observed in 38 patients (7.4%). The prevalence rate of G6PD deficiency among male newborns with hyperbilirubinemia in southern Croatia is significantly higher (p < 0.01) than the previously reported prevalence rate among males in the general population of southern Croatia (0.75%). We recommend FST to be performed in hyperbilirubinemic newborns in southern Croatia.

  18. Discrimination of Clover and Citrus Honeys from Egypt According to Floral Type Using Easily Assessable Physicochemical Parameters and Discriminant Analysis: An External Validation of the Chemometric Approach

    PubMed Central

    Karabournioti, Sofia

    2018-01-01

    Twenty-two honey samples, namely clover and citrus honeys, were collected from the greater Cairo area during the harvesting year 2014–2015. The main purpose of the present study was to characterize the aforementioned honey types and to investigate whether the use of easily assessable physicochemical parameters, including color attributes in combination with chemometrics, could differentiate honey floral origin. Parameters taken into account were: pH, electrical conductivity, ash, free acidity, lactonic acidity, total acidity, moisture content, total sugars (degrees Brix-°Bx), total dissolved solids and their ratio to total acidity, salinity, CIELAB color parameters, along with browning index values. Results showed that all honey samples analyzed met the European quality standards set for honey and had variations in the aforementioned physicochemical parameters depending on floral origin. Application of linear discriminant analysis showed that eight physicochemical parameters, including color, could classify Egyptian honeys according to floral origin (p < 0.05). Correct classification rate was 95.5% using the original method and 90.9% using the cross validation method. The discriminatory ability of the developed model was further validated using unknown honey samples. The overall correct classification rate was not affected. Specific physicochemical parameter analysis in combination with chemometrics has the potential to enhance the differences in floral honeys produced in a given geographical zone. PMID:29751543

  19. Determination of trace elements in bovine semen samples by inductively coupled plasma mass spectrometry and data mining techniques for identification of bovine class.

    PubMed

    Aguiar, G F M; Batista, B L; Rodrigues, J L; Silva, L R S; Campiglia, A D; Barbosa, R M; Barbosa, F

    2012-12-01

    The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma mass spectrometry (ICP-MS) using aqueous medium calibration and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  20. Classification of THz pulse signals using two-dimensional cross-correlation feature extraction and non-linear classifiers.

    PubMed

    Siuly; Yin, Xiaoxia; Hadjiloucas, Sillas; Zhang, Yanchun

    2016-04-01

    This work provides a performance comparison of four different machine learning classifiers: multinomial logistic regression with ridge estimators (MLR) classifier, k-nearest neighbours (KNN), support vector machine (SVM) and naïve Bayes (NB) as applied to terahertz (THz) transient time domain sequences associated with pixelated images of different powder samples. Although the six substances considered have similar optical properties, their complex insertion loss at the THz part of the spectrum differs significantly because of differences in their frequency-dependent THz extinction coefficients as well as in their refractive index and scattering properties. As scattering can be unquantifiable in many spectroscopic experiments, classification solely on differences in complex insertion loss can be inconclusive. The problem is addressed using two-dimensional (2-D) cross-correlations between background and sample interferograms; these ensure good noise suppression of the datasets and provide a range of statistical features that are subsequently used as inputs to the above classifiers. A cross-validation procedure is adopted to assess the performance of the classifiers. First, the measurements related to samples with a thickness of 2 mm were classified, then samples with thicknesses of 4 mm and, after that, 3 mm were classified, and the success rate and consistency of each classifier were recorded. In addition, mixtures having thicknesses of 2 and 4 mm as well as mixtures of 2, 3 and 4 mm were presented simultaneously to all classifiers. This approach provided further cross-validation of the classification consistency of each algorithm. The results confirm the superiority in classification accuracy and robustness of the MLR (least accuracy 88.24%) and KNN (least accuracy 90.19%) algorithms, which consistently outperformed the SVM (least accuracy 74.51%) and NB (least accuracy 56.86%) classifiers for the same number of feature vectors across all studies. The work establishes a general methodology for assessing the performance of other hyperspectral dataset classifiers on the basis of 2-D cross-correlations in far-infrared spectroscopy or other parts of the electromagnetic spectrum. It also advances the wider proliferation of automated THz imaging systems across new application areas, e.g., biomedical imaging, industrial processing and quality control, where interpretation of hyperspectral images is still under development. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  1. Active Self-Paced Learning for Cost-Effective and Progressive Face Identification.

    PubMed

    Lin, Liang; Wang, Keze; Meng, Deyu; Zuo, Wangmeng; Zhang, Lei

    2018-01-01

    This paper aims to develop a novel cost-effective framework for face identification, which progressively maintains a batch of classifiers with the increasing face images of different individuals. By naturally combining two recently rising techniques, active learning (AL) and self-paced learning (SPL), our framework is capable of automatically annotating new instances and incorporating them into training under weak expert recertification. We first initialize the classifier using a few annotated samples for each individual, and extract image features using convolutional neural nets. Then, a number of candidates are selected from the unannotated samples for classifier updating, in which we apply the current classifiers to rank the samples by prediction confidence. In particular, our approach utilizes the high-confidence and low-confidence samples in the self-paced and the active user-query way, respectively. The neural nets are later fine-tuned based on the updated classifiers. Such heuristic implementation is formulated as solving a concise active SPL optimization problem, which also advances the SPL development by supplementing a rational dynamic curriculum constraint. The new model finely accords with the "instructor-student-collaborative" learning mode in human education. The advantages of this proposed framework are twofold: i) The required number of annotated samples is significantly decreased while comparable performance is guaranteed. A dramatic reduction of user effort is also achieved over other state-of-the-art active learning techniques. ii) The mixture of SPL and AL effectively improves not only the classifier accuracy compared to existing AL/SPL methods but also the robustness against noisy data. We evaluate our framework on two challenging datasets, which include hundreds of persons under diverse conditions, and demonstrate very promising results. Please find the code of this project at: http://hcp.sysu.edu.cn/projects/aspl/.
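
    The confidence-based split described in this abstract can be illustrated with a minimal sketch. It is a simplification, not the authors' optimization procedure: the function name, the 0.9 threshold, and the query budget are hypothetical choices made only for illustration. High-confidence samples receive their predicted label automatically (the self-paced part), and the least confident samples are sent to a human annotator (the active part).

```python
def split_candidates(predictions, high_thresh=0.9, n_queries=2):
    """Split unlabeled samples by prediction confidence.

    predictions: dict mapping sample_id -> (predicted_label, confidence in [0, 1]).
    Returns (pseudo_labeled, to_query), where pseudo_labeled maps sample ids to
    automatically assigned labels and to_query lists ids for a human annotator.
    The threshold and query budget are hypothetical, not taken from the paper.
    """
    # Self-paced part: trust predictions whose confidence is high enough.
    pseudo_labeled = {
        sid: label
        for sid, (label, conf) in predictions.items()
        if conf >= high_thresh
    }
    # Active part: ask the annotator about the least confident remaining samples.
    uncertain = [
        (conf, sid)
        for sid, (label, conf) in predictions.items()
        if conf < high_thresh
    ]
    uncertain.sort()
    to_query = [sid for conf, sid in uncertain[:n_queries]]
    return pseudo_labeled, to_query


# Hypothetical predictions for five unannotated face images.
example = {
    "img1": ("alice", 0.97),
    "img2": ("bob", 0.55),
    "img3": ("alice", 0.30),
    "img4": ("carol", 0.92),
    "img5": ("bob", 0.70),
}
pseudo, queries = split_candidates(example)
print(pseudo)
print(queries)
```

    In a full loop, both sets would be added to the training data and the classifiers retrained before the next round of ranking.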

  2. Classification and identification of molecules through factor analysis method based on terahertz spectroscopy

    NASA Astrophysics Data System (ADS)

    Huang, Jianglou; Liu, Jinsong; Wang, Kejia; Yang, Zhengang; Liu, Xiaming

    2018-06-01

    By means of a factor analysis approach, a method of molecule classification is built based on the measured terahertz absorption spectra of the molecules. A data matrix can be obtained by sampling the absorption spectra at different frequency points. The data matrix is then decomposed into the product of two matrices: a weight matrix and a characteristic matrix. By applying K-means clustering to the weight matrix, these molecules can be classified. A group of samples (spirobenzopyran, indole, styrene derivatives and inorganic salts) has been prepared and measured via a terahertz time-domain spectrometer. These samples are classified with 75% accuracy compared to that directly classified via their molecular formulas.

  3. Classifying Acute Ischemic Stroke Onset Time using Deep Imaging Features

    PubMed Central

    Ho, King Chung; Speier, William; El-Saden, Suzie; Arnold, Corey W.

    2017-01-01

    Models have been developed to predict stroke outcomes (e.g., mortality) in an attempt to provide better guidance for stroke treatment. However, there is little work in developing classification models for the problem of unknown time-since-stroke (TSS), which determines a patient’s treatment eligibility based on a clinically defined cutoff time point (i.e., <4.5hrs). In this paper, we construct and compare machine learning methods to classify TSS<4.5hrs using magnetic resonance (MR) imaging features. We also propose a deep learning model to extract hidden representations from the MR perfusion-weighted images and demonstrate classification improvement by incorporating these additional imaging features. Finally, we discuss a strategy to visualize the learned features from the proposed deep learning model. The cross-validation results show that our best classifier achieved an area under the curve of 0.68, which improves significantly over current clinical methods (0.58), demonstrating the potential benefit of using advanced machine learning methods in TSS classification. PMID:29854156
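
    For readers unfamiliar with the area under the curve reported above, a minimal sketch of one standard way to compute it follows. It uses the pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half). The labels and scores are made-up illustration values, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels (1 = positive, e.g. TSS < 4.5 h).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up example: four cases, two positive and two negative.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(example_labels, example_scores))
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the 0.68 and 0.58 figures in the abstract.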

  4. Emotion detection model of Filipino music

    NASA Astrophysics Data System (ADS)

    Noblejas, Kathleen Alexis; Isidro, Daryl Arvin; Samonte, Mary Jane C.

    2017-02-01

    This research explored the creation of a model to detect emotion from Filipino songs. The emotion model used was based on Paul Ekman's six basic emotions. The songs were classified into the following genres: kundiman, novelty, pop, and rock. The songs were annotated by a group of music experts based on the emotion the song induces in the listener. Musical features of the songs were extracted using jAudio, while the lyric features were extracted by Bag-of-Words feature representation. The audio and lyric features of the Filipino songs were extracted for classification by the three chosen classifiers: Naïve Bayes, Support Vector Machines, and k-Nearest Neighbors. The goal of the research was to determine which classifier would work best for Filipino music. Evaluation was done by 10-fold cross validation, and accuracy, precision, recall, and F-measure results were compared. Models were also tested with unknown test data to further determine the models' accuracy through the prediction results.

  5. Ex vivo optical coherence tomography and laser induced fluorescence spectroscopy imaging of murine gastrointestinal tract

    NASA Astrophysics Data System (ADS)

    Hariri, Lida; Tumlinson, Alexandre R.; Wade, Norman; Besselsen, David; Utzinger, Urs; Gerner, Eugene; Barton, Jennifer

    2005-04-01

    Optical Coherence Tomography (OCT) and Laser Induced Fluorescence Spectroscopy (LIF) have separately been found to have clinical potential in identifying human gastrointestinal (GI) pathologies, yet their diagnostic capability in mouse models of human disease is unknown. We combine the two modalities to survey the GI tract of a variety of mouse strains and sample dysplasias and inflammatory bowel disease (IBD) of the small and large intestine. Segments of duodenum and lower colon 2.5 cm in length and the entire esophagus from 10 mice each of two colon cancer models (ApcMin and AOM treated A/J) and two IBD models (Il-2 and Il-10) and 5 mice each of their respective controls were excised. OCT images and LIF spectra were obtained simultaneously from each tissue sample within 1 hour of extraction. Histology was used to classify tissue regions as normal, Peyer's patch, dysplasia, adenoma, or IBD. Features in corresponding regions of OCT images were analyzed. Spectra from each of these categories were averaged and compared via the Student's t-test. Features in OCT images correlated to histology in both normal and diseased tissue samples. In the diseased samples, OCT was able to identify early stages of mild colitis and dysplasia. In the sample of IBD, the LIF spectra displayed unique peaks at 635nm and 670nm, which were attributed to increased porphyrin production in the proliferating bacteria of the disease. These peaks have the potential to act as a diagnostic for IBD. OCT and LIF appear to be useful and complementary modalities for imaging mouse models.

  6. Hazardous-waste-characterization survey of unknown drums at the 21st Tactical Fighter Wing, Elmendorf and Shemya Air Force Bases, and Galena and King Salmon Airports, Alaska. Final report 2-13 Aug 91

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bishop, M.S.

    1991-12-01

    At the request of the USAF Regional Hospital Elmendorf/SGPB (PACAF), the Armstrong Laboratory, Occupational and Environmental Health Directorate, conducted a hazardous waste characterization survey of unknown drums at Elmendorf AFB from 2 Aug - 13 Aug 91. The scope of the survey was to sample and characterize drums of unknown material stored at Elmendorf AFB, Shemya AFB, and Galena and King Salmon Airports. Several waste streams were sampled at Elmendorf AFB to revalidate sample results from a previous survey.

  7. Characterization of Actinomyces Isolates from Infected Root Canals of Teeth: Description of Actinomyces radicidentis sp. nov.

    PubMed Central

    Collins, Matthew D.; Hoyles, Lesley; Kalfas, Sotos; Sundquist, Goran; Monsen, Tor; Nikolaitchouk, Natalia; Falsen, Enevold

    2000-01-01

    Two strains of a previously undescribed Actinomyces-like bacterium were recovered in pure culture from infected root canals of teeth. Analysis by biochemical testing and polyacrylamide gel electrophoresis of whole-cell proteins indicated that the strains closely resembled each other phenotypically but were distinct from previously described Actinomyces and Arcanobacterium species. Comparative 16S rRNA gene-sequencing studies showed the bacterium to be a hitherto unknown subline within a group of Actinomyces species which includes Actinomyces bovis, the type species of the genus. Based on phylogenetic and phenotypic evidence, we propose that the unknown bacterium isolated from human clinical specimens be classified as Actinomyces radicidentis sp. nov. The type strain of Actinomyces radicidentis is CCUG 36733. PMID:10970390

  8. Actinomyces cardiffensis sp. nov. from Human Clinical Sources

    PubMed Central

    Hall, Val; Collins, Mattew D.; Hutson, Roger; Falsen, Enevold; Duerden, Brian I.

    2002-01-01

    Eight strains of a previously undescribed catalase-negative Actinomyces-like bacterium were recovered from human clinical specimens. The morphological and biochemical characteristics of the isolates were consistent with their assignment to the genus Actinomyces, but they did not appear to correspond to any recognized species. 16S rRNA gene sequence analysis showed the organisms represent a hitherto unknown species within the genus Actinomyces related to, albeit distinct from, a group of species which includes Actinomyces turicensis and close relatives. Based on biochemical and molecular genetic evidence, it is proposed that the unknown isolates from human clinical sources be classified as a new species, Actinomyces cardiffensis sp. nov. The type strain of Actinomyces cardiffensis is CCUG 44997T. PMID:12202588

  9. Planning schistosomiasis control: investigation of alternative sampling strategies for Schistosoma mansoni to target mass drug administration of praziquantel in East Africa.

    PubMed

    Sturrock, Hugh J W; Gething, Pete W; Ashton, Ruth A; Kolaczinski, Jan H; Kabatereine, Narcis B; Brooker, Simon

    2011-09-01

    In schistosomiasis control, there is a need to geographically target treatment to populations at high risk of morbidity. This paper evaluates alternative sampling strategies for surveys of Schistosoma mansoni to target mass drug administration in Kenya and Ethiopia. Two main designs are considered: lot quality assurance sampling (LQAS) of children from all schools; and a geostatistical design that samples a subset of schools and uses semi-variogram analysis and spatial interpolation to predict prevalence in the remaining unsurveyed schools. Computerized simulations are used to investigate the performance of sampling strategies in correctly classifying schools according to treatment needs and their cost-effectiveness in identifying high prevalence schools. LQAS performs better than geostatistical sampling in correctly classifying schools, but at a higher cost per high prevalence school correctly classified. It is suggested that the optimal surveying strategy for S. mansoni needs to take into account the goals of the control programme and the financial and drug resources available.

  10. Random forests for classification in ecology

    USGS Publications Warehouse

    Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.

    2007-01-01

    Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. © 2007 by the Ecological Society of America.

  11. Mining big data sets of plankton images: a zero-shot learning approach to retrieve labels without training data

    NASA Astrophysics Data System (ADS)

    Orenstein, E. C.; Morgado, P. M.; Peacock, E.; Sosik, H. M.; Jaffe, J. S.

    2016-02-01

    Technological advances in instrumentation and computing have allowed oceanographers to develop imaging systems capable of collecting extremely large data sets. With the advent of in situ plankton imaging systems, scientists must now commonly deal with "big data" sets containing tens of millions of samples spanning hundreds of classes, making manual classification untenable. Automated annotation methods are now considered to be the bottleneck between collection and interpretation. Typically, such classifiers learn to approximate a function that predicts a predefined set of classes for which a considerable amount of labeled training data is available. The requirement that the training data span all the classes of concern is problematic for plankton imaging systems since they sample such diverse, rapidly changing populations. These data sets may contain relatively rare, sparsely distributed taxa that will not have associated training data; a classifier trained on a limited set of classes will miss these samples. The computer vision community, leveraging advances in Convolutional Neural Networks (CNNs), has recently attempted to tackle such problems using "zero-shot" object categorization methods. Under a zero-shot framework, a classifier is trained to map samples onto a set of attributes rather than a class label. These attributes can include visual and non-visual information such as what an organism is made out of, where it is distributed globally, or how it reproduces. A second stage classifier is then used to extrapolate a class. In this work, we demonstrate a zero-shot classifier, implemented with a CNN, to retrieve out-of-training-set labels from images. This method is applied to data from two continuously imaging, moored instruments: the Scripps Plankton Camera System (SPCS) and the Imaging FlowCytobot (IFCB). Results from simulated deployment scenarios indicate zero-shot classifiers could be successful at recovering samples of rare taxa in image sets. This capability will allow ecologists to identify trends in the distribution of difficult-to-sample organisms in their data.
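
    The two-stage idea in this abstract (predict attributes first, then extrapolate a class) can be sketched as follows. This is only an illustration under stated assumptions: the class names, the three binary attributes, and the nearest-signature rule are hypothetical, and the first stage (the CNN that outputs attribute scores) is replaced by a hard-coded vector.

```python
def nearest_class(predicted_attrs, class_signatures):
    """Second-stage classifier: pick the class whose attribute signature is closest.

    predicted_attrs: list of attribute scores produced by a first-stage model.
    class_signatures: dict mapping class name -> list of reference attribute values.
    Squared Euclidean distance is an assumed choice, not the authors' method.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        class_signatures,
        key=lambda name: sq_dist(predicted_attrs, class_signatures[name]),
    )


# Hypothetical attribute order: [has silica shell, is motile, forms chains].
signatures = {
    "diatom": [1, 0, 0],
    "copepod": [0, 1, 1],
    "rare_taxon": [1, 1, 0],   # no training images, only an attribute description
}

# Stand-in for the attribute scores a CNN might output for one image.
attrs_from_image = [0.9, 0.8, 0.1]
print(nearest_class(attrs_from_image, signatures))
```

    Because the class is chosen from attribute descriptions rather than from labeled images, a taxon absent from the training set can still be returned, which is the property the abstract relies on.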

  12. [MicroRNA Target Prediction Based on Support Vector Machine Ensemble Classification Algorithm of Under-sampling Technique].

    PubMed

    Chen, Zhiru; Hong, Wenxue

    2016-02-01

    Considering the low accuracy of prediction in the positive samples and poor overall classification effects caused by unbalanced sample data of MicroRNA (miRNA) targets, we propose a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm in this paper, an under-sampling method based on the ensemble learning algorithm. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates the abnormal ones in negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the prediction of the miRNA target integrated classifier is achieved with the combination of multiple weak classifiers through the voting mechanism. The experiment revealed that the SVM-IUSM, compared with other algorithms on unbalanced dataset collections, could not only improve the accuracy of positive targets and the overall effect of classification, but also enhance the generalization ability of the miRNA target classifier.

  13. Association between microbiological and serological prevalence of human pathogenic Yersinia spp. in pigs and pig batches.

    PubMed

    Vanantwerpen, Gerty; Berkvens, Dirk; De Zutter, Lieven; Houf, Kurt

    2015-07-09

    Pigs are the main reservoir of human pathogenic Y. enterocolitica, and the microbiological and serological prevalence of this pathogen differs between pig farms. The infection status of pig batches at moment of slaughter is unknown while it is a possibility to classify batches. A relation between the presence of human pathogenic Yersinia spp. and the presence of antibodies could help to predict the infection of the pigs prior to slaughter. Pigs from 100 different batches were sampled. Tonsils and pieces of diaphragm were collected from 7047 pigs (on average 70 pigs per batch). The tonsils were analyzed using a direct plating method and the meat juice collected from the pieces of diaphragm was analyzed by Enzyme Linked ImmunoSorbent Assay. The microbiological and serological results were compared using a mixed-effects logistic regression at pig and batch level. Yersinia spp. were found in 2031 (28.8%) pigs, antibodies were present in 4692 (66.6%) pigs. According to the logistic regression, there was no relation at pig level between the presence of Yersinia spp. in tonsils and the presence of antibodies. Contrarily, at batch level, a mean activity value of 37 Optical Density (OD)% indicated a Yersinia spp. positive farm and the microbiological prevalence in pig batches could be estimated before shipment to the slaughterhouse. This offers the opportunity to classify batches based on their potential risk to contaminate carcasses with human pathogenic Yersinia spp. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Inquiry-Based Approach to a Carbohydrate Analysis Experiment

    NASA Astrophysics Data System (ADS)

    Senkbeil, Edward G.

    1999-01-01

    The analysis of an unknown carbohydrate in an inquiry-based learning format has proven to be a valuable and interesting undergraduate biochemistry laboratory experiment. Students are given a list of carbohydrates and a list of references for carbohydrate analysis. The references contain a variety of well-characterized wet chemistry and instrumental techniques for carbohydrate identification, but the students must develop an appropriate sequential protocol for unknown identification. The students are required to provide a list of chemicals and procedures and a flow chart for identification before the lab. During the 3-hour laboratory period, they utilize their accumulated information and knowledge to classify and identify their unknown. Advantages of the inquiry-based format are (i) students must be well prepared in advance to be successful in the laboratory, (ii) students feel a sense of accomplishment in both designing and carrying out a successful experiment, and (iii) the carbohydrate background information digested by the students significantly decreases the amount of lecture time required for this topic.

  15. Performance study of LMS based adaptive algorithms for unknown system identification

    NASA Astrophysics Data System (ADS)

    Javed, Shazia; Ahmad, Noor Atinah

    2014-07-01

    Adaptive filtering techniques have gained much popularity in the modeling of the unknown system identification problem. These techniques can be classified as either iterative or direct. Iterative techniques include the stochastic descent method and its improved versions in affine space. In this paper we present a comparative study of the least mean square (LMS) algorithm and some improved versions of LMS, more precisely the normalized LMS (NLMS), LMS-Newton, transform domain LMS (TDLMS) and affine projection algorithm (APA). The performance evaluation of these algorithms is carried out using an adaptive system identification (ASI) model with random input signals, in which the unknown (measured) signal is assumed to be contaminated by output noise. Simulation results are recorded to compare the performance in terms of convergence speed, robustness, misalignment, and their sensitivity to the spectral properties of input signals. The main objective of this comparative study is to observe the effects of the fast convergence rate of improved versions of LMS algorithms on their robustness and misalignment.
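
    A minimal sketch of the system identification setup described above, covering only LMS and NLMS, may help make the comparison concrete. The filter taps, step sizes, noise level, and number of iterations are assumptions chosen for illustration and are not taken from the paper. The unknown system is a short FIR filter driven by white Gaussian input, and its output is contaminated by additive noise.

```python
import random


def identify(h_true, n_steps=5000, mu=0.05, normalized=False,
             noise_std=0.01, seed=0):
    """Estimate the taps of an unknown FIR system with LMS or NLMS.

    h_true: taps of the "unknown" system (known here only to simulate it).
    mu: step size; for NLMS it is divided by the input energy at each step.
    All numeric defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    taps = len(h_true)
    w = [0.0] * taps          # adaptive filter weights
    x_buf = [0.0] * taps      # most recent input samples, newest first
    for _ in range(n_steps):
        x_buf = [rng.gauss(0.0, 1.0)] + x_buf[:-1]
        # Measured output of the unknown system, contaminated by output noise.
        d = sum(h * x for h, x in zip(h_true, x_buf)) + rng.gauss(0.0, noise_std)
        y = sum(wi * x for wi, x in zip(w, x_buf))
        e = d - y             # a priori estimation error
        step = mu
        if normalized:
            step = mu / (1e-8 + sum(x * x for x in x_buf))
        w = [wi + step * e * x for wi, x in zip(w, x_buf)]
    return w


def misalignment(w, h_true):
    """Normalized squared distance between estimated and true taps."""
    num = sum((wi - hi) ** 2 for wi, hi in zip(w, h_true))
    den = sum(hi ** 2 for hi in h_true)
    return num / den


H_TRUE = [0.5, -0.3, 0.2, 0.1]   # hypothetical unknown system
w_lms = identify(H_TRUE, mu=0.05)
w_nlms = identify(H_TRUE, mu=0.5, normalized=True)
print("LMS misalignment: ", misalignment(w_lms, H_TRUE))
print("NLMS misalignment:", misalignment(w_nlms, H_TRUE))
```

    The only difference between the two variants here is the normalization of the step size by the instantaneous input energy, which makes NLMS less sensitive to the input signal power.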

  16. Performance study of LMS based adaptive algorithms for unknown system identification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Javed, Shazia; Ahmad, Noor Atinah

    Adaptive filtering techniques have gained much popularity in the modeling of the unknown system identification problem. These techniques can be classified as either iterative or direct. Iterative techniques include the stochastic descent method and its improved versions in affine space. In this paper we present a comparative study of the least mean square (LMS) algorithm and some improved versions of LMS, more precisely the normalized LMS (NLMS), LMS-Newton, transform domain LMS (TDLMS) and affine projection algorithm (APA). The performance evaluation of these algorithms is carried out using an adaptive system identification (ASI) model with random input signals, in which the unknown (measured) signal is assumed to be contaminated by output noise. Simulation results are recorded to compare the performance in terms of convergence speed, robustness, misalignment, and their sensitivity to the spectral properties of input signals. The main objective of this comparative study is to observe the effects of the fast convergence rate of improved versions of LMS algorithms on their robustness and misalignment.

  17. Optical neural net for classifying imaging spectrometer data

    NASA Technical Reports Server (NTRS)

    Barnard, Etienne; Casasent, David P.

    1989-01-01

    The problem of determining the composition of an unknown input mixture from its measured spectrum, given the spectra of a number of elements, is studied. The Hopfield minimization procedure was used to express the determination of the compositions as a problem suitable for solution by neural nets. A mathematical description of the problem was developed and used as a basis for a neural network solution and an optical implementation.

  18. Using FIA and GIS Data to Estimate Areas and Volumes of Potential Stream Management Zones and Road Beautifying Buffers

    Treesearch

    Michael Zasada; Chris J. Cieszewski; Roger C. Lowe; Jarek Zawadzki; Mike Clutter; Jacek P. Siry

    2005-01-01

    Georgia Stream Management Zones (SMZ) are voluntary and have an unknown extent and impact. We use FIA data, Landsat TM imagery, and GAP and other GIS data to estimate the acreages and volumes of these buffers. We use stream data classified into trout, perennial, and intermittent, combined with DEM files containing elevation values, to assess buffers with widths...

  19. Infrasound Assessment of Infrastructure Report 6: Scour Detection and Riverine Health Assessment Using Infrasound

    DTIC Science & Technology

    2016-05-01

    construed as an official Department of the Army position unless so designated by other authorized documents. DESTROY THIS REPORT WHEN NO LONGER NEEDED...20 Figure 14. Seismic and infrasound detection of a barge strike on the I-20 bridge pier during the...foundations meaning that no plans, either design or as-built, existed for the structure. Initially, bridges classified as having unknown foundations

  20. SVM and SVM Ensembles in Breast Cancer Prediction.

    PubMed

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.

  1. SVM and SVM Ensembles in Breast Cancer Prediction

    PubMed Central

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807

  2. BRCA1/2 missense mutations and the value of in-silico analyses.

    PubMed

    Sadowski, Carolin E; Kohlstedt, Daniela; Meisel, Cornelia; Keller, Katja; Becker, Kerstin; Mackenroth, Luisa; Rump, Andreas; Schröck, Evelin; Wimberger, Pauline; Kast, Karin

    2017-11-01

    The clinical implications of genetic variants in BRCA1/2 in healthy and affected individuals are considerable. Variant interpretation, however, is especially challenging for missense variants. The majority of them are classified as variants of unknown clinical significance (VUS). Computational (in-silico) predictive programs are easy to access, but represent only one tool out of a wide range of complementary approaches to classify VUS. With this single-center study, we aimed to evaluate the impact of in-silico analyses in a spectrum of different BRCA1/2 missense variants. We conducted mutation analysis of BRCA1/2 in 523 index patients with suspected hereditary breast and ovarian cancer (HBOC). Classification of the genetic variants was performed according to the German Consortium (GC)-HBOC database. Additionally, all missense variants were classified by the following three in-silico prediction tools: SIFT, Mutation Taster (MT2) and PolyPhen2 (PPH2). Overall, 201 different variants, 68 of which were missense variants, were ranked as pathogenic, neutral, or unknown. The classification of missense variants by in-silico tools resulted in a higher proportion of pathogenic mutations (25% vs. 13.2%) compared to the GC-HBOC classification. Altogether, more than fifty percent (38/68, 55.9%) of missense variants were ranked differently. Sensitivity of in-silico tools for mutation prediction was 88.9% (PPH2), 100% (SIFT) and 100% (MT2). We found a relevant discrepancy in variant classification by using in-silico prediction tools, resulting in potential overestimation and/or underestimation of cancer risk. More reliable, notably gene-specific, prediction tools and functional tests are needed to improve clinical counseling. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  3. Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya

    PubMed Central

    Cohen, Ted; Zignol, Matteo; Nyakan, Edwin; Hedt-Gauthier, Bethany L.; Gardner, Adrian; Kamle, Lydia; Injera, Wilfred; Carter, E. Jane

    2016-01-01

    Objective: To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. Design: The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. Results: This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly- resistant strains. Conclusion: Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints. PMID:27167381
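
    The LQAS classification described in this abstract can be read as a binomial decision rule: screen n patients, count the resistant cases, and call the setting "high" if the count exceeds a decision value d. The sketch below computes the two misclassification probabilities for such a rule. The sample size (105) and the 1% and 5% thresholds come from the abstract; the decision value d = 2 is a hypothetical choice made only for illustration, since the abstract does not state the value used in the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def lqas_classify(cases, d):
    """Classify a setting as 'high' when the observed case count exceeds d."""
    return "high" if cases > d else "low"


def lqas_error_rates(n, d, p_low, p_high):
    """Return the two misclassification probabilities of the decision rule.

    false_high: P(classified 'high') when the true prevalence equals p_low.
    false_low:  P(classified 'low')  when the true prevalence equals p_high.
    """
    false_high = 1.0 - binom_cdf(d, n, p_low)
    false_low = binom_cdf(d, n, p_high)
    return false_high, false_low


N = 105                      # patients screened per setting (from the abstract)
P_LOW, P_HIGH = 0.01, 0.05   # lower and upper thresholds (from the abstract)
D = 2                        # hypothetical decision value, not from the abstract

false_high, false_low = lqas_error_rates(N, D, P_LOW, P_HIGH)
print(f"P(classified high | prevalence 1%) = {false_high:.3f}")
print(f"P(classified low  | prevalence 5%) = {false_low:.3f}")
print("urban (2 cases):", lqas_classify(2, D))
print("rural (1 case): ", lqas_classify(1, D))
```

    Raising d lowers the chance of wrongly calling a low-prevalence setting "high" but raises the chance of missing a truly high-prevalence setting, so the decision value is normally chosen to balance the two error rates for a fixed sample size.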

  4. Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya.

    PubMed

    Jezmir, Julia; Cohen, Ted; Zignol, Matteo; Nyakan, Edwin; Hedt-Gauthier, Bethany L; Gardner, Adrian; Kamle, Lydia; Injera, Wilfred; Carter, E Jane

    2016-01-01

    To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly- resistant strains. Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints.

  5. AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au

    In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.

  6. Stackable differential mobility analyzer for aerosol measurement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheng, Meng-Dawn; Chen, Da-Ren

    2007-05-08

    A multi-stage differential mobility analyzer (MDMA) for aerosol measurements includes a first electrode or grid including at least one inlet or injection slit for receiving an aerosol including charged particles for analysis. A second electrode or grid is spaced apart from the first electrode. The second electrode has at least one sampling outlet disposed at a plurality of different distances along its length. A volume between the first and the second electrode or grid between the inlet or injection slit and a distal one of the plurality of sampling outlets forms a classifying region, the first and second electrodes for charging to suitable potentials to create an electric field within the classifying region. At least one inlet or injection slit in the second electrode receives a sheath gas flow into an upstream end of the classifying region, wherein each sampling outlet functions as an independent DMA stage and classifies different size ranges of charged particles based on electric mobility simultaneously.

  7. Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

    PubMed Central

    Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant

    2015-01-01

    Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029

  8. Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.

    PubMed

    Ozçift, Akin

    2011-05-01

    Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling strategy based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets having an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset that has multiple classes with small sample sizes and it is therefore suitable for testing our resampling based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset. (ii) The RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling to evaluate the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is quite a high diagnostic performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of experiments demonstrated the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
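
    Two ideas in this abstract lend themselves to a short illustration: the ensemble outputs the mode of the classes predicted by the individual trees, and small classes are resampled before training. The sketch below is an assumption-laden illustration rather than the author's implementation. The abstract only says "simple random sampling", so resampling with replacement up to a minimum class size is an assumed scheme, and every function name, label, and value is hypothetical.

```python
import random
from collections import Counter, defaultdict


def majority_vote(tree_predictions):
    """Return the mode of the class labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def resample_to_min_size(samples, labels, min_size, seed=0):
    """Top up every class below min_size by random resampling with replacement.

    ASSUMED scheme: the abstract does not specify how the resampling was done.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(max(0, min_size - len(xs)))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical per-tree predictions for one sample.
votes = ["normal", "arrhythmia_A", "normal", "arrhythmia_B", "normal"]
print(majority_vote(votes))

# Hypothetical toy data with one under-represented class.
X = [[0.1], [0.2], [0.3], [0.9]]
y = ["normal", "normal", "normal", "rare"]
X_bal, y_bal = resample_to_min_size(X, y, min_size=3)
print(Counter(y_bal))
```

    Resampling with replacement duplicates existing minority samples rather than creating new information, so it mainly changes how often each tree sees the rare classes during training.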

  9. A qualitative signature for early diagnosis of hepatocellular carcinoma based on relative expression orderings.

    PubMed

    Ao, Lu; Zhang, Zimei; Guan, Qingzhou; Guo, Yating; Guo, You; Zhang, Jiahui; Lv, Xingwei; Huang, Haiyan; Zhang, Huarong; Wang, Xianlong; Guo, Zheng

    2018-04-23

    Currently, using biopsy specimens to confirm suspicious liver lesions of early hepatocellular carcinoma is not entirely reliable because of insufficient sampling amount and inaccurate sampling location. It is necessary to develop a signature to aid early hepatocellular carcinoma diagnosis using biopsy specimens even when the sampling location is inaccurate. Based on the within-sample relative expression orderings of gene pairs, we identified a simple qualitative signature to distinguish both hepatocellular carcinoma and adjacent non-tumour tissues from cirrhosis tissues of non-hepatocellular carcinoma patients. A signature consisting of 19 gene pairs was identified in the training data sets and validated in 2 large collections of samples from biopsy and surgical resection specimens. For biopsy specimens, 95.7% of 141 hepatocellular carcinoma tissues and all (100%) of 108 cirrhosis tissues of non-hepatocellular carcinoma patients were correctly classified. Notably, all (100%) of 60 hepatocellular carcinoma adjacent normal tissues and 77.5% of 80 hepatocellular carcinoma adjacent cirrhosis tissues were classified to hepatocellular carcinoma. For surgical resection specimens, 99.7% of 733 hepatocellular carcinoma specimens were correctly classified to hepatocellular carcinoma, while 96.1% of 254 hepatocellular carcinoma adjacent cirrhosis tissues and 95.9% of 538 hepatocellular carcinoma adjacent normal tissues were classified to hepatocellular carcinoma. In contrast, 17.0% of 47 cirrhosis from non-hepatocellular carcinoma patients waiting for liver transplantation were classified to hepatocellular carcinoma, indicating that some patients with long-lasting cirrhosis could have already gained hepatocellular carcinoma characteristics.
    The signature can distinguish both hepatocellular carcinoma tissues and tumour-adjacent tissues from cirrhosis tissues of non-hepatocellular carcinoma patients even using inaccurately sampled biopsy specimens, which can aid early diagnosis of hepatocellular carcinoma. © 2018 The Authors. Liver International Published by John Wiley & Sons Ltd.
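
    A signature based on within-sample relative expression orderings compares the two genes of each pair inside a single sample, so it needs no normalization across samples. The sketch below shows the general idea only. The abstract does not state how the 19 pair-wise comparisons are combined, so the majority-vote rule is an assumption, and the gene names, expression values, and function names are hypothetical.

```python
def reo_votes(expression, gene_pairs):
    """Count the pairs (a, b) for which expression[a] > expression[b] in one sample."""
    return sum(1 for a, b in gene_pairs if expression[a] > expression[b])


def classify_sample(expression, gene_pairs):
    """Label a sample 'HCC-like' when more than half of the pairs vote for it.

    ASSUMED decision rule: simple majority over the gene pairs.
    """
    votes = reo_votes(expression, gene_pairs)
    return "HCC-like" if votes > len(gene_pairs) / 2 else "non-HCC cirrhosis"


# Hypothetical gene names and expression values, for illustration only.
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
sample = {"G1": 8.2, "G2": 5.1, "G3": 3.3, "G4": 4.0, "G5": 7.7, "G6": 6.9}

print(reo_votes(sample, pairs), classify_sample(sample, pairs))
```

    Because only the ordering within each pair matters, any monotone rescaling of a sample's expression values leaves the classification unchanged, which is one reason such qualitative signatures are considered robust to measurement differences.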

  10. Multiclass classification of microarray data samples with a reduced number of genes

    PubMed Central

    2011-01-01

    Background: Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results: A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions: A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522

  11. Heterocyclic Aromatics in Petroleum Coke, Snow, Lake Sediments, and Air Samples from the Athabasca Oil Sands Region.

    PubMed

    Manzano, Carlos A; Marvin, Chris; Muir, Derek; Harner, Tom; Martin, Jonathan; Zhang, Yifeng

    2017-05-16

    The aromatic fractions of snow, lake sediment, and air samples collected during 2011-2014 in the Athabasca oil sands region were analyzed using two-dimensional gas chromatography following a nontargeted approach. Commonly monitored aromatics (parent and alkylated-polycyclic aromatic hydrocarbons and dibenzothiophenes) were excluded from the analysis, focusing mainly on other heterocyclic aromatics. The unknowns detected were classified into isomeric groups and tentatively identified using mass spectral libraries. Relative concentrations of heterocyclic aromatics were estimated and were found to decrease with distance from a reference site near the center of the developments and with increasing depth of sediments. The same heterocyclic aromatics identified in snow, lake sediments, and air were observed in extracts of delayed petroleum coke, with similar distributions. This suggests that petroleum coke particles are a potential source of heterocyclic aromatics to the local environment, but other oil sands sources must also be considered. Although the signals of these heterocyclic aromatics diminished with distance, some were detected at large distances (>100 km) in snow and surface lake sediments, suggesting that the impact of industry can extend >50 km. The list of heterocyclic aromatics and the mass spectral library generated in this study can be used for future source apportionment studies.

  12. Retronasal olfaction in vegetable liking and disliking.

    PubMed

    Lim, Juyun; Padmanabhan, Arthi

    2013-01-01

    While previous research has suggested that bitterness is a key determinant of vegetable rejection, it is unknown what role odor may play. We therefore investigated the impact of retronasal odors on hedonic responses to 4 vegetables. Subjects (N = 132) tasted small samples with the nose open and closed and rated the degree of liking/disliking, as well as the perceived intensity of sweetness, bitterness, saltiness, and vegetable flavor. The subjects were classified as "likers" or "dislikers" of each vegetable. The degree to which "likers" liked and "dislikers" disliked the vegetables was significantly less in the nose-closed condition, indicating that retronasal odor was a significant driver of vegetable hedonics. In contrast, bitterness ratings for all 4 vegetables did not differ significantly between the groups. The perceived intensity of vegetable flavor also did not differ significantly between groups, implying that the quality of vegetable odors rather than their perceived intensity drove the hedonic ratings. In a follow-up experiment, returning subjects (N = 89) rated the degree of liking/disliking of the vegetable odors alone, which were presented retronasally. Liking/disliking of specific odors was positively correlated with that for the sampled vegetables across all stimuli (r = 0.32~0.57). Overall, these results suggest that retronasal odor plays an important role in vegetable liking/disliking.

  13. Hybrid Radar Emitter Recognition Based on Rough k-Means Classifier and Relevance Vector Machine

    PubMed Central

    Yang, Zhutian; Wu, Zhilu; Yin, Zhendong; Quan, Taifan; Sun, Hongjian

    2013-01-01

    Due to the increasing complexity of electromagnetic signals, there exists a significant challenge for recognizing radar emitter signals. In this paper, a hybrid recognition approach is presented that classifies radar emitter signals by exploiting the different separability of samples. The proposed approach comprises two steps, namely the primary signal recognition and the advanced signal recognition. In the former step, a novel rough k-means classifier, which comprises three regions, i.e., certain area, rough area and uncertain area, is proposed to cluster the samples of radar emitter signals. In the latter step, the samples within the rough boundary are used to train the relevance vector machine (RVM). Then RVM is used to recognize the samples in the uncertain area; therefore, the classification accuracy is improved. Simulation results show that, for recognizing radar emitter signals, the proposed hybrid recognition approach is more accurate, and presents lower computational complexity than traditional approaches. PMID:23344380

  14. Method for genetic identification of unknown organisms

    DOEpatents

    Colston, Jr., Billy W.; Fitch, Joseph P.; Hindson, Benjamin J.; Carter, Chance J.; Beer, Neil Reginald

    2016-08-23

    A method of rapid, genome and proteome based identification of unknown pathogenic or non-pathogenic organisms in a complex sample. The entire sample is analyzed by creating millions of emulsion encapsulated microdroplets, each containing a single pathogenic or non-pathogenic organism sized particle and appropriate reagents for amplification. Following amplification, the amplified product is analyzed.

  15. Identification of usual interstitial pneumonia pattern using RNA-Seq and machine learning: challenges and solutions.

    PubMed

    Choi, Yoonha; Liu, Tiffany Ting; Pankratz, Daniel G; Colby, Thomas V; Barth, Neil M; Lynch, David A; Walsh, P Sean; Raghu, Ganesh; Kennedy, Giulia C; Huang, Jing

    2018-05-09

    We developed a classifier using RNA sequencing data that identifies the usual interstitial pneumonia (UIP) pattern for the diagnosis of idiopathic pulmonary fibrosis. We addressed significant challenges, including limited sample size, biological and technical sample heterogeneity, and reagent and assay batch effects. We identified inter- and intra-patient heterogeneity, particularly within the non-UIP group. The models classified UIP on transbronchial biopsy samples with a receiver-operating characteristic area under the curve of ~ 0.9 in cross-validation. Using in silico mixed samples in training, we prospectively defined a decision boundary to optimize specificity at ≥85%. The penalized logistic regression model showed greater reproducibility across technical replicates and was chosen as the final model. The final model showed sensitivity of 70% and specificity of 88% in the test set. We demonstrated that the suggested methodologies appropriately addressed challenges of the sample size, disease heterogeneity and technical batch effects and developed a highly accurate and robust classifier leveraging RNA sequencing for the classification of UIP.
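
    Setting a decision boundary so that specificity stays at or above a target, as this abstract describes, can be sketched as a scan over candidate thresholds. This is a generic illustration and not the authors' procedure, which also involved in silico mixed samples. The scores, labels, and function names below are hypothetical, and the convention that a score at or above the threshold means a positive (UIP) call is an assumption.

```python
def specificity(scores, labels, threshold):
    """Fraction of negatives (label 0) scored below the threshold."""
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    return sum(s < threshold for s in negatives) / len(negatives)


def sensitivity(scores, labels, threshold):
    """Fraction of positives (label 1) scored at or above the threshold."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def pick_threshold(scores, labels, min_specificity=0.85):
    """Return the lowest observed score whose specificity meets the target.

    Specificity never decreases as the threshold rises, and sensitivity never
    increases, so the lowest qualifying threshold is also the most sensitive
    one. Returns None when no observed score qualifies.
    """
    for t in sorted(set(scores)):
        if specificity(scores, labels, t) >= min_specificity:
            return t
    return None


# Hypothetical classifier scores: 7 negatives (0) and 5 positives (1).
scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.55, 0.65, 0.45, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

t = pick_threshold(scores, labels, min_specificity=0.85)
print("threshold:", t)
print("specificity:", round(specificity(scores, labels, t), 3))
print("sensitivity:", round(sensitivity(scores, labels, t), 3))
```

    Fixing the threshold on training data before looking at the test set, as the abstract's word "prospectively" implies, keeps the reported test-set sensitivity and specificity from being optimistically biased.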

  16. Molecular Characterization of Hypoderma SPP. in Domestic Ruminants from Turkey and Pakistan.

    PubMed

    Ahmed, Haroon; Simsek, Sami; Saki, Cem Ecmel; Kesik, Harun Kaya; Kilinc, Seyma Gunyakti

    2017-08-01

    The aim of this study was to determine the morphological and molecular characterization of Hypoderma spp. in cattle and yak from provinces in Turkey and Pakistan. In total, 78 Hypoderma larvae were collected from slaughtered animals in Turkey and Pakistan from October 2015 to January 2016. Thirty-eight of these 78 Hypoderma larvae were morphologically classified as third instar larvae (L3s) of Hypoderma bovis, 37 were classified as Hypoderma lineatum, and 3 were classified as suspected or unidentified. The restriction enzyme TaqI was used to differentiate the Hypoderma spp. by polymerase chain reaction (PCR)-restriction fragment length polymorphism (RFLP). According to the sequences and the PCR-RFLP results, all larval samples from cattle from Turkey were classified as H. bovis, except for 1 sample classified as H. lineatum. All Hypoderma larvae from Pakistan were classified as H. lineatum from cattle and as Hypoderma sinense from yak. This study provides the first molecular characterization of H. lineatum (cattle) and H. sinense (yak) in Pakistan based on PCR-RFLP and sequencing results.

  17. An Exemplar-Based Multi-View Domain Generalization Framework for Visual Recognition.

    PubMed

    Niu, Li; Li, Wen; Xu, Dong; Cai, Jianfei

    2018-02-01

    In this paper, we propose a new exemplar-based multi-view domain generalization (EMVDG) framework for visual recognition by learning robust classifiers that are able to generalize well to arbitrary target domains based on the training samples with multiple types of features (i.e., multi-view features). In this framework, we aim to address two issues simultaneously. First, the distribution of training samples (i.e., the source domain) is often considerably different from that of testing samples (i.e., the target domain), so the performance of the classifiers learnt on the source domain may drop significantly on the target domain. Moreover, the testing data are often unseen during the training procedure. Second, when the training data are associated with multi-view features, the recognition performance can be further improved by exploiting the relation among multiple types of features. To address the first issue, considering that it has been shown that fusing multiple SVM classifiers can enhance the domain generalization ability, we build our EMVDG framework upon exemplar SVMs (ESVMs), in which a set of ESVM classifiers are learnt with each one trained based on one positive training sample and all the negative training samples. When the source domain contains multiple latent domains, the learnt ESVM classifiers are expected to be grouped into multiple clusters. To address the second issue, we propose two approaches under the EMVDG framework based on the consensus principle and the complementary principle, respectively. Specifically, we propose an EMVDG_CO method by adding a co-regularizer to enforce the cluster structures of ESVM classifiers on different views to be consistent based on the consensus principle. Inspired by multiple kernel learning, we also propose another EMVDG_MK method by fusing the ESVM classifiers from different views based on the complementary principle.
    In addition, we further extend our EMVDG framework to an exemplar-based multi-view domain adaptation (EMVDA) framework when the unlabeled target domain data are available during the training procedure. The effectiveness of our EMVDG and EMVDA frameworks for visual recognition is clearly demonstrated by comprehensive experiments on three benchmark data sets.

  18. Effect of finite sample size on feature selection and classification: a simulation study.

    PubMed

    Way, Ted W; Sahiner, Berkman; Hadjiiski, Lubomir M; Chan, Heang-Ping

    2010-02-01

    The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples. Three feature selection techniques, the stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200. It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. 
    The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small sample sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available. None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.

  19. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. 
    Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level. PMID:25830807

  20. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining) to identify special type of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such way that significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big dataset. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using David database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. 
    Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data-matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining epigenetic effect (viz., effect of methylation) on gene expression level.

  1. The Double Burden of Obesity and Malnutrition in a Protracted Emergency Setting: A Cross-Sectional Study of Western Sahara Refugees

    PubMed Central

    Grijalva-Eternod, Carlos S.; Wells, Jonathan C. K.; Cortina-Borja, Mario; Salse-Ubach, Nuria; Tondeur, Mélody C.; Dolan, Carmen; Meziani, Chafik; Wilkinson, Caroline; Spiegel, Paul; Seal, Andrew J.

    2012-01-01

    Background Households from vulnerable groups experiencing epidemiological transitions are known to be affected concomitantly by under-nutrition and obesity. Yet, it is unknown to what extent this double burden affects refugee populations dependent on food assistance. We assessed the double burden of malnutrition among Western Sahara refugees living in a protracted emergency. Methods and Findings We implemented a stratified nutrition survey in October–November 2010 in the four Western Sahara refugee camps in Algeria. We sampled 2,005 households, collecting anthropometric measurements (weight, height, and waist circumference) in 1,608 children (6–59 mo) and 1,781 women (15–49 y). We estimated the prevalence of global acute malnutrition (GAM), stunting, underweight, and overweight in children; and stunting, underweight, overweight, and central obesity in women. To assess the burden of malnutrition within households, households were first classified according to the presence of each type of malnutrition. Households were then classified as undernourished, overweight, or affected by the double burden if they presented members with under-nutrition, overweight, or both, respectively. The prevalence of GAM in children was 9.1%, 29.1% were stunted, 18.6% were underweight, and 2.4% were overweight; among the women, 14.8% were stunted, 53.7% were overweight or obese, and 71.4% had central obesity. Central obesity (47.2%) and overweight (38.8%) in women affected a higher proportion of households than did GAM (7.0%), stunting (19.5%), or underweight (13.3%) in children. Overall, households classified as overweight (31.5%) were most common, followed by undernourished (25.8%), and then double burden–affected (24.7%). Conclusions The double burden of obesity and under-nutrition is highly prevalent in households among Western Sahara refugees. 
The results highlight the need to focus more attention on non-communicable diseases in this population and balance obesity prevention and management with interventions to tackle under-nutrition. PMID:23055833
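
The household classification rule described in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated rule; the function name and the "unaffected" label for households with neither condition are assumptions, not terms from the study.

```python
def classify_household(has_undernutrition: bool, has_overweight: bool) -> str:
    """Label a household from the presence of each type of malnutrition among its members."""
    if has_undernutrition and has_overweight:
        return "double burden"
    if has_undernutrition:
        return "undernourished"
    if has_overweight:
        return "overweight"
    # Label for households with neither condition is an assumption (not named in the abstract).
    return "unaffected"


# Example: a stunted child and a woman with central obesity in the same household.
print(classify_household(has_undernutrition=True, has_overweight=True))
```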

  2. Urinary proteomics for prediction of mortality in patients with type 2 diabetes and microalbuminuria.

    PubMed

    Currie, Gemma E; von Scholten, Bernt Johan; Mary, Sheon; Flores Guerrero, Jose-Luis; Lindhardt, Morten; Reinhard, Henrik; Jacobsen, Peter K; Mullen, William; Parving, Hans-Henrik; Mischak, Harald; Rossing, Peter; Delles, Christian

    2018-04-06

    The urinary proteomic classifier CKD273 has shown promise for prediction of progressive diabetic nephropathy (DN). Whether it is also a determinant of mortality and cardiovascular disease in patients with microalbuminuria (MA) is unknown. Urine samples were obtained from 155 patients with type 2 diabetes and confirmed microalbuminuria. Proteomic analysis was undertaken using capillary electrophoresis coupled to mass spectrometry to determine the CKD273 classifier score. A previously defined CKD273 threshold of 0.343 for identification of DN was used to categorise the cohort in Kaplan-Meier and Cox regression models with all-cause mortality as the primary endpoint. Outcomes were traced through national health registers after 6 years. CKD273 correlated with urine albumin excretion rate (UAER) (r = 0.481, p < 0.001), age (r = 0.238, p = 0.003), coronary artery calcium (CAC) score (r = 0.236, p = 0.003), N-terminal pro-brain natriuretic peptide (NT-proBNP) (r = 0.190, p = 0.018) and estimated glomerular filtration rate (eGFR) (r = 0.265, p = 0.001). On multivariate analysis only UAER (β = 0.402, p < 0.001) and eGFR (β = -0.184, p = 0.039) were statistically significant determinants of CKD273. Twenty participants died during follow-up. CKD273 was a determinant of mortality (log rank [Mantel-Cox] p = 0.004), and retained significance (p = 0.048) after adjustment for age, sex, blood pressure, NT-proBNP and CAC score in a Cox regression model. A multidimensional biomarker can provide information on outcomes associated with its primary diagnostic purpose. Here we demonstrate that the urinary proteomic classifier CKD273 is associated with mortality in individuals with type 2 diabetes and MA even when adjusted for other established cardiovascular and renal biomarkers.

  3. EPA UNMIX 6.0 USER GUIDE

    EPA Science Inventory

    The underlying philosophy of Unmix is to let the data speak for itself. Unmix seeks to solve the general mixture problem where the data are assumed to be a linear combination of an unknown number of sources of unknown composition, which contribute an unknown amount to each sample...

  4. An Intelligent System for Monitoring the Microgravity Environment Quality On-Board the International Space Station

    NASA Technical Reports Server (NTRS)

    Lin, Paul P.; Jules, Kenol

    2002-01-01

    An intelligent system for monitoring the microgravity environment quality on-board the International Space Station is presented. The monitoring system uses a new approach combining Kohonen's self-organizing feature map, learning vector quantization, and back propagation neural network to recognize and classify the known and unknown patterns. Finally, fuzzy logic is used to assess the level of confidence associated with each vibrating source activation detected by the system.

  5. [A novel treatment of cholera by a Mexican physician in the 19th century].

    PubMed

    Rodríguez-de-Romo, A C

    1995-01-01

    Doctor Felipe Castillo, head of the Hospital de San Pablo during the cholera epidemic of 1850, used "Salty water" as treatment for the patients who attended the hospital. The etiology and pathogenesis of this sickness were unknown in those days, so Castillo's conduct was surprising. This study is based on an unpublished report, classified as anonymous, that Castillo gave to the Governor of Mexico City during the cholera epidemic.

  6. An Enhanced Collaborative-Software Environment for Information Fusion at the Unit of Action

    DTIC Science & Technology

    2007-12-07

    [Figure 4: Semantic-Aggregation Hierarchy; ground-truth semantic-aggregation hierarchy (evaluation use only)]...Finally, GTOs can be aggregated into GTGs (ground-truth groups) using the provided ground-truth force structure hierarchy for GTOs. GTGs can only be

  7. Minimum distance classification in remote sensing

    NASA Technical Reports Server (NTRS)

    Wacker, A. G.; Landgrebe, D. A.

    1972-01-01

    The utilization of minimum distance classification methods in remote sensing problems, such as crop species identification, is considered. Literature concerning both minimum distance classification problems and distance measures is reviewed. Experimental results are presented for several examples. The objective of these examples is to: (a) compare the sample classification accuracy of a minimum distance classifier, with the vector classification accuracy of a maximum likelihood classifier, and (b) compare the accuracy of a parametric minimum distance classifier with that of a nonparametric one. Results show the minimum distance classifier performance is 5% to 10% better than that of the maximum likelihood classifier. The nonparametric classifier is only slightly better than the parametric version.

  8. Identifying determinants of care for tailoring implementation in chronic diseases: an evaluation of different methods.

    PubMed

    Krause, Jane; Van Lieshout, Jan; Klomp, Rien; Huntink, Elke; Aakhus, Eivind; Flottorp, Signe; Jaeger, Cornelia; Steinhaeuser, Jost; Godycki-Cwirko, Maciek; Kowalczyk, Anna; Agarwal, Shona; Wensing, Michel; Baker, Richard

    2014-08-12

    The tailoring of implementation interventions includes the identification of the determinants of, or barriers to, healthcare practice. Different methods for identifying determinants have been used in implementation projects, but which methods are most appropriate to use is unknown. The study was undertaken in five European countries, recommendations for a different chronic condition being addressed in each country: Germany (polypharmacy in multimorbid patients); the Netherlands (cardiovascular risk management); Norway (depression in the elderly); Poland (chronic obstructive pulmonary disease--COPD); and the United Kingdom (UK) (obesity). Using samples of professionals and patients in each country, three methods were compared directly: brainstorming amongst health professionals, interviews of health professionals, and interviews of patients. The additional value of discussion structured through reference to a checklist of determinants in addition to brainstorming, and determinants identified by open questions in a questionnaire survey, were investigated separately. The questionnaire, which included closed questions derived from a checklist of determinants, was administered to samples of health professionals in each country. Determinants were classified according to whether it was likely that they would inform the design of an implementation intervention (defined as plausibly important determinants). A total of 601 determinants judged to be plausibly important were identified. An additional 609 determinants were judged to be unlikely to inform an implementation intervention, and were classified as not plausibly important. Brainstorming identified 194 of the plausibly important determinants, health professional interviews 152, patient interviews 63, and open questions 48. Structured group discussion identified 144 plausibly important determinants in addition to those already identified by brainstorming. 
Systematic methods can lead to the identification of large numbers of determinants. Tailoring will usually include a process to decide, from all the determinants that are identified, those to be addressed by implementation interventions. There is no best buy of methods to identify determinants, and a combination should be used, depending on the topic and setting. Brainstorming is a simple, low cost method that could be relevant to many tailored implementation projects.

  9. Discordance in the diagnosis of diabetes: Comparison between HbA1c and fasting plasma glucose.

    PubMed

    Ho-Pham, Lan T; Nguyen, Uyen D T; Tran, Truong X; Nguyen, Tuan V

    2017-01-01

    HbA1c has been introduced as a complementary diagnostic test for diabetes, but its impact on disease prevalence is unknown. This study evaluated the concordance between HbA1c and fasting plasma glucose (FPG) in the diagnosis of diabetes in the general population. The study was designed as a population-based investigation, with participants sampled from Ho Chi Minh City, Vietnam. Blood samples were collected after overnight fasting and analyzed within 4 hours after collection. HbA1c was measured with high pressure liquid chromatography (Arkray Adams, Japan). FPG was measured by the hexokinase method (Advia Autoanalyzer; Bayer Diagnostics, Germany). Diabetes was defined as HbA1c ≥ 6.5% or FPG ≥ 7.0 mmol/L. Prediabetes was classified as HbA1c between 5.7% and 6.4%. The study included 3523 individuals (2356 women) aged 30 years and above. Based on the HbA1c test, the prevalence of diabetes and prediabetes was 9.7% (95%CI, 8.7-10.7%; n = 342) and 34.6% (33.0-36.2; n = 1219), respectively. Based on the FPG test, the prevalence of diabetes and prediabetes was 6.3% (95%CI, 5.5-7.2%; n = 223) and 12.1% (11.1-13.2; n = 427). Among the 427 individuals identified by FPG as "pre-diabetes", 28.6% were classified as diabetes by HbA1c test. The weighted kappa statistic of concordance between HbA1c and FPG was 0.55, with most of the discordance being in the prediabetes group. These data indicate that there is a significant discordance in the diagnosis of diabetes between FPG and HbA1c measurements, and the discordance could have significant impact on clinical practice. FPG appears to underestimate the burden of undiagnosed diabetes.
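
As a worked check of the arithmetic, the HbA1c-based diabetes prevalence (342 of 3523) can be recomputed with a normal-approximation (Wald) interval. The abstract does not say which interval method the authors used, so the Wald formula and the function name below are assumptions; the sketch only shows that this common method is consistent with the figures reported for this one estimate.

```python
import math


def wald_interval(cases: int, n: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the normal approximation."""
    p = cases / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# HbA1c-defined diabetes: 342 cases among 3523 participants.
p, lo, hi = wald_interval(342, 3523)
print(f"{100 * p:.1f}% ({100 * lo:.1f}-{100 * hi:.1f}%)")  # 9.7% (8.7-10.7%)
```

Other intervals in the abstract may have been computed with a different method, so small differences in the last digit would not be surprising.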

  10. Emotional Relationships between Mothers* and Infants: Knowns, Unknowns, and Unknown Unknowns

    PubMed Central

    Bornstein, Marc H.; Suwalsky, Joan T. D.; Breakstone, Dana A.

    2012-01-01

    An overview of the literature pertaining to the construct of emotional availability is presented, illustrated by a sampling of relevant studies. Methodological, statistical, and conceptual problems in the existing corpus of research are discussed, and suggestions for improving future investigations of this important construct are offered. PMID:22292998

  11. Study design in high-dimensional classification analysis.

    PubMed

    Sánchez, Brisa N; Wu, Meihua; Song, Peter X K; Wang, Wen

    2016-10-01

    Advances in high throughput technology have accelerated the use of hundreds to millions of biomarkers to construct classifiers that partition patients into different clinical conditions. Prior to classifier development in actual studies, a critical need is to determine the sample size required to reach a specified classification precision. We develop a systematic approach for sample size determination in high-dimensional (large p, small n) classification analysis. Our method utilizes the probability of correct classification (PCC) as the optimization objective function and incorporates the higher criticism thresholding procedure for classifier development. Further, we derive the theoretical bound of maximal PCC gain from feature augmentation (e.g. when molecular and clinical predictors are combined in classifier development). Our methods are motivated and illustrated by a study using proteomics markers to classify post-kidney transplantation patients into stable and rejecting classes. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  12. Using Sieving and Unknown Sand Samples for a Sedimentation-Stratigraphy Class Project with Linkage to Introductory Courses

    ERIC Educational Resources Information Center

    Videtich, Patricia E.; Neal, William J.

    2012-01-01

    Using sieving and sample "unknowns" for instructional grain-size analysis and interpretation of sands in undergraduate sedimentology courses has advantages over other techniques. Students (1) learn to calculate and use statistics; (2) visually observe differences in the grain-size fractions, thereby developing a sense of specific size…

  13. Toward diagnostic and phenotype markers for genetically transmitted speech delay.

    PubMed

    Shriberg, Lawrence D; Lewis, Barbara A; Tomblin, J Bruce; McSweeny, Jane L; Karlsson, Heather B; Scheer, Alison R

    2005-08-01

    Converging evidence supports the hypothesis that the most common subtype of childhood speech sound disorder (SSD) of currently unknown origin is genetically transmitted. We report the first findings toward a set of diagnostic markers to differentiate this proposed etiological subtype (provisionally termed speech delay-genetic) from other proposed subtypes of SSD of unknown origin. Conversational speech samples from 72 preschool children with speech delay of unknown origin from 3 research centers were selected from an audio archive. Participants differed on the number of biological, nuclear family members (0 or 2+) classified as positive for current and/or prior speech-language disorder. Although participants in the 2 groups were found to have similar speech competence, as indexed by their Percentage of Consonants Correct scores, their speech error patterns differed significantly in 3 ways. Compared with children who may have reduced genetic load for speech delay (no affected nuclear family members), children with possibly higher genetic load (2+ affected members) had (a) a significantly higher proportion of relative omission errors on the Late-8 consonants; (b) a significantly lower proportion of relative distortion errors on these consonants, particularly on the sibilant fricatives /s/, /z/, and //; and (c) a significantly lower proportion of backed /s/ distortions, as assessed by both perceptual and acoustic methods. Machine learning routines identified a 3-part classification rule that included differential weightings of these variables. The classification rule had a diagnostic accuracy value of 0.83 (95% confidence limits = 0.74-0.92), with positive and negative likelihood ratios of 9.6 (95% confidence limits = 3.1-29.9) and 0.40 (95% confidence limits = 0.24-0.68), respectively. The diagnostic accuracy findings are viewed as promising.
The error pattern for this proposed subtype of SSD is viewed as consistent with the cognitive-linguistic processing deficits that have been reported for genetically transmitted verbal disorders.
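
For readers less familiar with likelihood ratios, the sketch below uses the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, and inverts them. The abstract does not report sensitivity or specificity, so the values printed here are only what the rounded ratios (9.6 and 0.40) would imply; function names are illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def implied_sens_spec(lr_pos: float, lr_neg: float):
    """Solve the two definitions above for sensitivity and specificity."""
    specificity = (lr_pos - 1) / (lr_pos - lr_neg)
    sensitivity = lr_pos * (1 - specificity)
    return sensitivity, specificity


# Values implied by the rounded ratios in the abstract (not reported by the authors).
sens, spec = implied_sens_spec(9.6, 0.40)
print(f"sensitivity ~ {sens:.2f}, specificity ~ {spec:.2f}")
```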

  14. National quality assessment evaluating spironolactone use during hospitalization for acute myocardial infarction (AMI) in China: China Patient-centered Evaluation Assessment of Cardiac Events (PEACE)-Retrospective AMI Study, 2001, 2006, and 2011.

    PubMed

    Guan, Wenchi; Murugiah, Karthik; Downing, Nicholas; Li, Jing; Wang, Qing; Ross, Joseph S; Desai, Nihar R; Masoudi, Frederick A; Spertus, John A; Li, Xi; Krumholz, Harlan M; Jiang, Lixin

    2015-06-12

    Spironolactone, the only aldosterone antagonist available in China, improves outcomes in acute myocardial infarction (AMI) among patients with systolic dysfunction and either diabetes or heart failure (HF). However, national practice patterns in the use of spironolactone in China are unknown. From a nationally representative sample of AMI patients from 2001, 2006, and 2011, we identified 6906 patients with either diabetes or HF and classified them into 1 of 4 groups according to their eligibility for spironolactone: "ideal" (left ventricular ejection fraction [LVEF] ≤40% and without contraindications), "contraindicated," "not indicated" (neither ideal nor contraindicated), and "unknown indications" (LVEF unmeasured). The aims were to determine how frequently patient eligibility for this drug is assessed in the hospital and how it is used in the several groups, and to identify factors associated with its use in these groups. From 2001 to 2011, the proportion of patients whose eligibility for spironolactone was not assessed decreased (66.9% in 2001 to 32.8% in 2011). Spironolactone use significantly increased among ideal patients over this period (28.6% to 72.4%; P<0.001 for trend), but also in contraindicated patients (11.4% to 27.5%; P=0.002 for trend) and in other patient groups (not indicated: 27.5% to 38.3%; unknown indications: 21.3% to 35.1%; both P<0.01 for trend). In all 4 groups, patients presenting with HF on admission were more likely to receive spironolactone. Although the appropriate use of spironolactone and assessment of eligibility increased in China over the past decade, there remain marked opportunities for improvement. URL: http://www.clinicaltrials.gov Unique identifier: NCT01624883. © 2015 The Authors. Published on behalf of the American Heart Association, Inc, by Wiley Blackwell.

  15. Frog sound identification using extended k-nearest neighbor classifier

    NASA Astrophysics Data System (ADS)

    Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

    2017-09-01

    Frog sound identification based on vocalization has become important for biological research and environmental monitoring. As a result, different types of feature extraction and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents frog sound identification with the Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest-neighbor and mutual-sharing-of-neighborhood concepts, with the aim of improving classification performance. It makes a prediction based on which training samples are the nearest neighbors of the testing sample and which consider the testing sample one of their own nearest neighbors. To evaluate classification performance in frog sound identification, the EKNN classifier is compared with the competing classifiers k-Nearest Neighbor (KNN), Fuzzy k-Nearest Neighbor (FKNN), k-General Nearest Neighbor (KGNN) and Mutual k-Nearest Neighbor (MKNN) on the recorded sounds of 15 frog species obtained in Malaysian forest. The recorded sounds were segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual segmentation, and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR), while the features were extracted as Mel Frequency Cepstrum Coefficients (MFCC). The experimental results show that the EKNN classifier exhibits the best accuracy compared with the competing classifiers KNN, FKNN, KGNN and MKNN in all cases.
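
The two-way neighborhood idea in this abstract can be illustrated with a small sketch. This is a simplified reading, not the published EKNN algorithm: it assumes Euclidean distance, one vote per neighbor, and at least k+1 training points; all names and the toy data are hypothetical.

```python
import math
from collections import Counter


def two_way_knn(query, points, labels, k):
    """Vote with (a) the query's k nearest training points and
    (b) training points that would count the query among their own k nearest."""
    votes = Counter()

    # (a) The k training points nearest to the query.
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    for i in order[:k]:
        votes[labels[i]] += 1

    # (b) Training points for which the query is at least as close as
    #     their k-th nearest training neighbor.
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if math.dist(p, query) <= others[k - 1]:
            votes[labels[i]] += 1

    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors standing in for extracted audio features.
train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
species = ["a", "a", "a", "b", "b", "b"]
print(two_way_knn((0.5, 0.5), train, species, k=2))
```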

  16. An Improvement To The k-Nearest Neighbor Classifier For ECG Database

    NASA Astrophysics Data System (ADS)

    Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul

    2018-03-01

    The k-nearest neighbor (kNN) classifier is non-parametric and has been widely used for pattern classification. In practice, however, kNN often performs poorly because it lacks information on how the samples are distributed. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN concerns the weighting used when assigning the class label. To address these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, a Mahalanobis distance is applied to avoid the imbalance of sample distribution. Then, a surrounding rule is employed to obtain the nearest centroid neighbor based on the distribution of the training samples and their distance to the query point. Finally, a fuzzy membership function is employed to assign the query point to the class label most frequently represented by the nearest centroid neighbors. Experiments are conducted on electrocardiogram (ECG) signals. Classification performance is evaluated in two experimental steps, i.e., different values of k and different sizes of feature dimension. Subsequently, a comparative study of the kNN, kNCN, FkNN and MFkNCN classifiers is conducted to evaluate the performance of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds that of kNN, kNCN and FkNN, with a best classification rate of 96.5%.

  17. Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.

    PubMed

    Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J

    2007-09-21

    Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. 
Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
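
A weighted-voting combination of classifiers can be sketched as a weighted average of per-classifier scores. The abstract does not specify how the weights were derived, so using one accuracy-like weight per classifier is an assumption, and the scores and weights below are made-up illustrative numbers, not data from the study.

```python
def weighted_vote(scores, weights):
    """Combine per-classifier scores (e.g. estimated probability that a gene
    responds to stress) into one score using one weight per classifier."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Three hypothetical classifiers scoring one gene; weights are illustrative only.
combined = weighted_vote(scores=[0.9, 0.2, 0.6], weights=[0.8, 0.5, 0.7])
print(f"combined score: {combined:.2f}")
```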

  18. A Classification Table for Achondrites

    NASA Technical Reports Server (NTRS)

    Chennaoui-Aoudjehane, H.; Larouci, N.; Jambon, A.; Mittlefehldt, D. W.

    2014-01-01

    Classifying chondrites is relatively easy and the criteria are well documented. It is based on mineral compositions, textural characteristics and, more recently, magnetic susceptibility. It can be more difficult to classify achondrites, especially those that are very similar to terrestrial igneous rocks, because mineralogical, textural and compositional properties can be quite variable. Achondrites contain essentially olivine, pyroxenes, plagioclases, oxides, sulphides and accessory minerals. Their origin is attributed to differentiated parent bodies: large asteroids (Vesta); planets (Mars); a satellite (the Moon); and numerous asteroids of unknown size. In most cases, achondrites are not witnessed falls and some do not have fusion crust. Because of the mineralogical and magnetic susceptibility similarity with terrestrial igneous rocks for some achondrites, it can be difficult for classifiers to confirm their extra-terrestrial origin. We, as classifiers of meteorites, are confronted with this problem with every suspected achondrite we receive for identification. We are developing a "grid" of classification to provide an easier approach for initial classification. We use simple but reproducible criteria based on mineralogical, petrological and geochemical studies. We previously presented the classes: acapulcoites, lodranites, winonaites and Martian meteorites (shergottites, chassignites, nakhlites). In this work we are completing the classification table by including the groups: angrites, aubrites, brachinites, ureilites, HED (howardites, eucrites, and diogenites), lunar meteorites, pallasites and mesosiderites. Iron meteorites are not presented in this abstract.

  19. [Fast discrimination of edible vegetable oil based on Raman spectroscopy].

    PubMed

    Zhou, Xiu-Jun; Dai, Lian-Kui; Li, Sheng

    2012-07-01

    A novel method for fast discrimination of edible vegetable oils by Raman spectroscopy is presented. The training set is composed of different edible vegetable oils with known classes. Baseline correction and normalization were applied to their original Raman spectra to obtain standard spectra. Two characteristic peaks describing the degree of unsaturation of vegetable oil were selected as the feature vector; then the centers of all classes were calculated. For an edible vegetable oil of unknown class, the same pretreatment and feature extraction methods were used. The Euclidean distances between the feature vector of the unknown sample and the center of each class were calculated, and the class of the unknown sample was finally determined by the minimum distance. For 43 edible vegetable oil samples from seven different classes, experimental results show that the clustering of each class was more pronounced and the distance between classes was much larger with the new feature extraction method than with PCA. The above classification model can be applied to discriminate unknown edible vegetable oils rapidly and accurately.
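
The minimum-distance step described in this abstract (class centers, then the nearest center by Euclidean distance) can be sketched as follows. The two-element feature vectors stand in for the two characteristic peak intensities; the numbers and class names are invented for illustration and are not data from the study.

```python
import math


def class_centers(features, labels):
    """Mean feature vector of each class in the training set."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def classify(x, centers):
    """Assign x to the class whose center is nearest in Euclidean distance."""
    return min(centers, key=lambda y: math.dist(x, centers[y]))


# Hypothetical normalized peak intensities for two made-up oil classes.
train_x = [(0.80, 0.30), (0.82, 0.28), (0.40, 0.70), (0.42, 0.68)]
train_y = ["oil_A", "oil_A", "oil_B", "oil_B"]
centers = class_centers(train_x, train_y)
print(classify((0.78, 0.33), centers))
```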

  20. Comparison of disease prevalence in two populations in the presence of misclassification.

    PubMed

    Tang, Man-Lai; Qiu, Shi-Fang; Poon, Wai-Yin

    2012-11-01

    Comparing disease prevalence in two groups is an important topic in medical research, and prevalence rates are obtained by classifying subjects according to whether they have the disease. Either high-cost infallible gold-standard classifiers or low-cost fallible classifiers can be used to classify subjects. However, statistical analysis that is based on data sets with misclassifications leads to biased results. As a compromise between the two classification approaches, partially validated sets are often used in which all individuals are classified by fallible classifiers, and some of the individuals are validated by the accurate gold-standard classifiers. In this article, we develop several reliable test procedures and approximate sample size formulas for disease prevalence studies based on the difference between two disease prevalence rates with two independent partially validated series. Empirical studies show that (i) the Score test produces close-to-nominal level and is preferred in practice; and (ii) the sample size formula based on the Score test is also fairly accurate in terms of the empirical power and type I error rate, and is hence recommended. A real example from an aplastic anemia study is used to illustrate the proposed methodologies. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Quality Evaluation of Juniperus rigida Sieb. et Zucc. Based on Phenolic Profiles, Bioactivity, and HPLC Fingerprint Combined with Chemometrics

    PubMed Central

    Liu, Zehua; Wang, Dongmei; Li, Dengwu; Zhang, Shuai

    2017-01-01

    Juniperus rigida (J. rigida) which is endemic to East Asia, has traditionally been used as an ethnomedicinal plant in China. This study was undertaken to evaluate the quality of J. rigida samples derived from 11 primary regions in China. Ten phenolic compounds were simultaneously quantified using reversed-phase high-performance liquid chromatography (RP-HPLC), and chlorogenic acid, catechin, podophyllotoxin, and amentoflavone were found to be the main compounds in J. rigida needles, with the highest contents detected for catechin and podophyllotoxin. J. rigida from Jilin (S9, S10) and Liaoning (S11) exhibited the highest contents of phenolic profiles (total phenolics, total flavonoids and 10 phenolic compounds) and the strongest antioxidant and antibacterial activities, followed by Shaanxi (S2, S3). A similarity analysis (SA) demonstrated substantial similarities in fingerprint chromatograms, from which 14 common peaks were selected. The similarity values varied from 0.85 to 0.98. Chemometrics techniques, including hierarchical cluster analysis (HCA), principal component analysis (PCA), and discriminant analysis (DA), were further applied to facilitate accurate classification and quantification of the J. rigida samples derived from the 11 regions. The results supported HPLC data showing that all J. rigida samples exhibit considerable variations in phenolic profiles, and the samples were further clustered into three major groups coincident with their geographical regions of origin. In addition, two discriminant functions with a 100% discrimination ratio were constructed to further distinguish and classify samples with unknown membership on the basis of eigenvalues to allow optimal discrimination among the groups. Our comprehensive findings on matching phenolic profiles and bioactivities along with data from fingerprint chromatograms with chemometrics provide an effective tool for screening and quality evaluation of J. rigida and related medicinal preparations. 
PMID:28469573

  2. Multi-layer cube sampling for liver boundary detection in PET-CT images.

    PubMed

    Liu, Xinxin; Yang, Jian; Song, Shuang; Song, Hong; Ai, Danni; Zhu, Jianjun; Jiang, Yurong; Wang, Yongtian

    2018-06-01

    Liver metabolic information is considered as a crucial diagnostic marker for the diagnosis of fever of unknown origin, and liver recognition is the basis of automatic diagnosis of metabolic information extraction. However, the poor quality of PET and CT images is a challenge for information extraction and target recognition in PET-CT images. The existing detection method cannot meet the requirement of liver recognition in PET-CT images, which is the key problem in the big data analysis of PET-CT images. A novel texture feature descriptor called multi-layer cube sampling (MLCS) is developed for liver boundary detection in low-dose CT and PET images. The cube sampling feature is proposed for extracting more texture information, which uses a bi-centric voxel strategy. Neighbour voxels are divided into three regions by the centre voxel and the reference voxel in the histogram, and the voxel distribution information is statistically classified as texture feature. Multi-layer texture features are also used to improve the ability and adaptability of target recognition in volume data. The proposed feature is tested on the PET and CT images for liver boundary detection. For the liver in the volume data, mean detection rate (DR) and mean error rate (ER) reached 95.15 and 7.81% in low-quality PET images, and 83.10 and 21.08% in low-contrast CT images. The experimental results demonstrated that the proposed method is effective and robust for liver boundary detection.

  3. Standoff detection of chemical and biological threats using laser-induced breakdown spectroscopy.

    PubMed

    Gottfried, Jennifer L; De Lucia, Frank C; Munson, Chase A; Miziolek, Andrzej W

    2008-04-01

    Laser-induced breakdown spectroscopy (LIBS) is a promising technique for real-time chemical and biological warfare agent detection in the field. We have demonstrated the detection and discrimination of the biological warfare agent surrogates Bacillus subtilis (BG) (2% false negatives, 0% false positives) and ovalbumin (0% false negatives, 1% false positives) at 20 meters using standoff laser-induced breakdown spectroscopy (ST-LIBS) and linear correlation. Unknown interferent samples (not included in the model), samples on different substrates, and mixtures of BG and Arizona road dust have been classified with reasonable success using partial least squares discriminant analysis (PLS-DA). A few of the samples tested such as the soot (not included in the model) and the 25% BG:75% dust mixture resulted in a significant number of false positives or false negatives, respectively. Our preliminary results indicate that while LIBS is able to discriminate biomaterials with similar elemental compositions at standoff distances based on differences in key intensity ratios, further work is needed to reduce the number of false positives/negatives by refining the PLS-DA model to include a sufficient range of material classes and carefully selecting a detection threshold. In addition, we have demonstrated that LIBS can distinguish five different organophosphate nerve agent simulants at 20 meters, despite their similar stoichiometric formulas. Finally, a combined PLS-DA model for chemical, biological, and explosives detection using a single ST-LIBS sensor has been developed in order to demonstrate the potential of standoff LIBS for universal hazardous materials detection.

  4. Bayesian methods for the design and interpretation of clinical trials in very rare diseases

    PubMed Central

    Hampson, Lisa V; Whitehead, John; Eleftheriou, Despina; Brogan, Paul

    2014-01-01

    This paper considers the design and interpretation of clinical trials comparing treatments for conditions so rare that worldwide recruitment efforts are likely to yield total sample sizes of 50 or fewer, even when patients are recruited over several years. For such studies, the sample size needed to meet a conventional frequentist power requirement is clearly infeasible. Rather, the expectation of any such trial has to be limited to the generation of an improved understanding of treatment options. We propose a Bayesian approach for the conduct of rare-disease trials comparing an experimental treatment with a control where patient responses are classified as a success or failure. A systematic elicitation from clinicians of their beliefs concerning treatment efficacy is used to establish Bayesian priors for unknown model parameters. The process of determining the prior is described, including the possibility of formally considering results from related trials. As sample sizes are small, it is possible to compute all possible posterior distributions of the two success rates. A number of allocation ratios between the two treatment groups can be considered with a view to maximising the prior probability that the trial concludes recommending the new treatment when in fact it is non-inferior to control. Consideration of the extent to which opinion can be changed, even by data from the best feasible design, can help to determine whether such a trial is worthwhile. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24957522
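
    Because responses are binary and the total sample size is small, every possible posterior can be listed before the trial starts. The sketch below illustrates that enumeration for a single treatment arm. It assumes a conjugate Beta prior on the success rate, which is an illustrative choice and not necessarily the prior form elicited in the paper; the function names and the example prior are hypothetical.

```python
from fractions import Fraction


def posterior_params(prior_a, prior_b, successes, n):
    """Beta(prior_a, prior_b) prior plus `successes` out of `n` binary
    responses gives a Beta(prior_a + successes, prior_b + failures) posterior."""
    return prior_a + successes, prior_b + (n - successes)


def enumerate_posteriors(prior_a, prior_b, n):
    """List the posterior for every possible number of successes 0..n."""
    return [posterior_params(prior_a, prior_b, s, n) for s in range(n + 1)]


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact as a fraction."""
    return Fraction(a, a + b)


if __name__ == "__main__":
    # Hypothetical arm of 3 patients with a flat Beta(1, 1) prior.
    for a, b in enumerate_posteriors(1, 1, 3):
        print(f"Beta({a}, {b}) posterior mean = {beta_mean(a, b)}")
```

    With n patients in an arm there are only n + 1 possible posteriors, so a two-arm design with small arms can be assessed by looping over every pair of outcomes.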

  5. Prediction of Tubal Ectopic Pregnancy Using Offline Analysis of 3-Dimensional Transvaginal Ultrasonographic Data Sets: An Interobserver and Diagnostic Accuracy Study.

    PubMed

    Infante, Fernando; Espada Vaquero, Mercedes; Bignardi, Tommaso; Lu, Chuan; Testa, Antonia C; Fauchon, David; Epstein, Elisabeth; Leone, Francesco P G; Van den Bosch, Thierry; Martins, Wellington P; Condous, George

    2018-06-01

    To assess interobserver reproducibility in detecting tubal ectopic pregnancies by reading data sets from 3-dimensional (3D) transvaginal ultrasonography (TVUS) and comparing it with real-time 2-dimensional (2D) TVUS. Images were initially classified as showing pregnancies of unknown location or tubal ectopic pregnancies on real time 2D TVUS by an experienced sonologist, who acquired 5 3D volumes. Data sets were analyzed offline by 5 observers who had to classify each case as ectopic pregnancy or pregnancy of unknown location. The interobserver reproducibility was evaluated by the Fleiss κ statistic. The performance of each observer in predicting ectopic pregnancies was compared to that of the experienced sonologist. Women were followed until they were reclassified as follows: (1) failed pregnancy of unknown location; (2) intrauterine pregnancy; (3) ectopic pregnancy; or (4) persistent pregnancy of unknown location. Sixty-one women were included. The agreement between reading offline 3D data sets and the first real-time 2D TVUS was very good (80%-82%; κ = 0.89). The overall interobserver agreement among observers reading offline 3D data sets was moderate (κ = 0.52). The diagnostic performance of experienced observers reading offline 3D data sets had accuracy of 78.3% to 85.0%, sensitivity of 66.7% to 81.3%, specificity of 79.5% to 88.4%, positive predictive value of 57.1% to 72.2%, and negative predictive value of 87.5% to 91.3%, compared to the experienced sonologist's real-time 2D TVUS: accuracy of 94.5%, sensitivity of 94.4%, specificity of 94.5%, positive predictive value of 85.0%, and negative predictive value of 98.1%. The diagnostic accuracy of 3D TVUS by reading offline data sets for predicting ectopic pregnancies is dependent on experience. Reading only static 3D data sets without clinical information does not match the diagnostic performance of real time 2D TVUS combined with clinical information obtained during the scan. 
© 2017 by the American Institute of Ultrasound in Medicine.
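
    The accuracy measures and the Fleiss κ statistic quoted above follow from simple counts. The sketch below shows the standard formulas; the function names and the example counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 accuracy measures from true/false positive/negative counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters.

    counts[i][j] is the number of raters who put subject i in category j.
    Every row must sum to the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # Observed agreement, averaged over subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the overall category proportions.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical counts: 8 true positives, 1 false positive,
    # 2 false negatives, 9 true negatives.
    print(diagnostic_metrics(8, 1, 2, 9))
    # Hypothetical ratings: 5 observers, 2 categories, 2 cases.
    print(fleiss_kappa([[3, 2], [2, 3]]))
```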

  6. Etiology and clinical presentation of birth defects: population based study

    PubMed Central

    Carey, John C; Byrne, Janice L B; Krikov, Sergey; Botto, Lorenzo D

    2017-01-01

    Objective To assess causation and clinical presentation of major birth defects. Design Population based case cohort. Setting Cases of birth defects in children born 2005-09 to resident women, ascertained through Utah’s population based surveillance system. All records underwent clinical re-review. Participants 5504 cases among 270 878 births (prevalence 2.03%), excluding mild isolated conditions (such as muscular ventricular septal defects, distal hypospadias). Main outcome measures The primary outcomes were the proportion of birth defects with a known etiology (chromosomal, genetic, human teratogen, twinning) or unknown etiology, by morphology (isolated, multiple, minors only), and by pathogenesis (sequence, developmental field defect, or known pattern of birth defects). Results Definite cause was assigned in 20.2% (n=1114) of cases: chromosomal or genetic conditions accounted for 94.4% (n=1052), teratogens for 4.1% (n=46, mostly poorly controlled pregestational diabetes), and twinning for 1.4% (n=16, conjoined or acardiac). The 79.8% (n=4390) remaining were classified as unknown etiology; of these 88.2% (n=3874) were isolated birth defects. Family history (similarly affected first degree relative) was documented in 4.8% (n=266). In this cohort, 92.1% (5067/5504) were live born infants (isolated and non-isolated birth defects): 75.3% (4147/5504) were classified as having an isolated birth defect (unknown or known etiology). Conclusions These findings underscore the gaps in our knowledge regarding the causes of birth defects. For the causes that are known, such as smoking or diabetes, assigning causation in individual cases remains challenging. Nevertheless, the ongoing impact of these exposures on fetal development highlights the urgency and benefits of population based preventive interventions. For the causes that are still unknown, better strategies are needed. 
These can include greater integration of the key elements of etiology, morphology, and pathogenesis into epidemiologic studies; greater collaboration between researchers (such as developmental biologists), clinicians (such as medical geneticists), and epidemiologists; and better ways to objectively measure fetal exposures (beyond maternal self reports) and closer (prenatally) to the critical period of organogenesis. PMID:28559234

  7. Novel Approach to Classify Plants Based on Metabolite-Content Similarity.

    PubMed

    Liu, Kang; Abdullah, Azian Azamimi; Huang, Ming; Nishioka, Takaaki; Altaf-Ul-Amin, Md; Kanaya, Shigehiko

    2017-01-01

    Secondary metabolites are bioactive substances with diverse chemical structures. Depending on the ecological environment within which they are living, higher plants use different combinations of secondary metabolites for adaptation (e.g., defense against attacks by herbivores or pathogenic microbes). This suggests that the similarity in metabolite content is applicable to assess phylogenic similarity of higher plants. However, such a chemical taxonomic approach has limitations of incomplete metabolomics data. We propose an approach for successfully classifying 216 plants based on their known incomplete metabolite content. Structurally similar metabolites have been clustered using the network clustering algorithm DPClus. Plants have been represented as binary vectors, implying relations with structurally similar metabolite groups, and classified using Ward's method of hierarchical clustering. Despite incomplete data, the resulting plant clusters are consistent with the known evolutional relations of plants. This finding reveals the significance of metabolite content as a taxonomic marker. We also discuss the predictive power of metabolite content in exploring nutritional and medicinal properties in plants. As a byproduct of our analysis, we could predict some currently unknown species-metabolite relations.
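
    The clustering step described above (plants as binary vectors over metabolite groups, grouped with Ward's method) can be sketched as follows. This is a naive pure-Python illustration on hypothetical toy vectors, not the authors' pipeline; the DPClus step that forms the metabolite groups is not reproduced, and the function names are assumptions.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(vectors, k):
    """Agglomerate until k clusters remain; returns lists of row indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(
                    [vectors[m] for m in clusters[i]],
                    [vectors[m] for m in clusters[j]],
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Hypothetical plants: 1 means the plant is linked to that metabolite group.
    plants = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
    ]
    print(ward_clusters(plants, 2))
```

    The pairwise search makes this cubic in the number of plants, which is acceptable for a few hundred rows; a library implementation would be the practical choice for larger data.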

  8. Novel Approach to Classify Plants Based on Metabolite-Content Similarity

    PubMed Central

    Abdullah, Azian Azamimi; Huang, Ming; Nishioka, Takaaki

    2017-01-01

    Secondary metabolites are bioactive substances with diverse chemical structures. Depending on the ecological environment within which they are living, higher plants use different combinations of secondary metabolites for adaptation (e.g., defense against attacks by herbivores or pathogenic microbes). This suggests that the similarity in metabolite content is applicable to assess phylogenic similarity of higher plants. However, such a chemical taxonomic approach has limitations of incomplete metabolomics data. We propose an approach for successfully classifying 216 plants based on their known incomplete metabolite content. Structurally similar metabolites have been clustered using the network clustering algorithm DPClus. Plants have been represented as binary vectors, implying relations with structurally similar metabolite groups, and classified using Ward's method of hierarchical clustering. Despite incomplete data, the resulting plant clusters are consistent with the known evolutional relations of plants. This finding reveals the significance of metabolite content as a taxonomic marker. We also discuss the predictive power of metabolite content in exploring nutritional and medicinal properties in plants. As a byproduct of our analysis, we could predict some currently unknown species-metabolite relations. PMID:28164123

  9. Sloan Digital Sky Survey III photometric quasar clustering: probing the initial conditions of the Universe

    NASA Astrophysics Data System (ADS)

    Ho, Shirley; Agarwal, Nishant; Myers, Adam D.; Lyons, Richard; Disbrow, Ashley; Seo, Hee-Jong; Ross, Ashley; Hirata, Christopher; Padmanabhan, Nikhil; O'Connell, Ross; Huff, Eric; Schlegel, David; Slosar, Anže; Weinberg, David; Strauss, Michael; Ross, Nicholas P.; Schneider, Donald P.; Bahcall, Neta; Brinkmann, J.; Palanque-Delabrouille, Nathalie; Yèche, Christophe

    2015-05-01

    The Sloan Digital Sky Survey has surveyed 14,555 square degrees of the sky, and delivered over a trillion pixels of imaging data. We present the large-scale clustering of 1.6 million quasars between z=0.5 and z=2.5 that have been classified from this imaging, representing the highest density of quasars ever studied for clustering measurements. This data set spans ~11,000 square degrees and probes a volume of $80\,h^{-3}\,\mathrm{Gpc}^3$. In principle, such a large volume and medium density of tracers should facilitate high-precision cosmological constraints. We measure the angular clustering of photometrically classified quasars using an optimal quadratic estimator in four redshift slices with an accuracy of ~25% over a bin width of $\delta\ell \sim 10$-$15$ on scales corresponding to matter-radiation equality and larger ($\ell \sim 2$-$30$). Observational systematics can strongly bias clustering measurements on large scales, which can mimic cosmologically relevant signals such as deviations from Gaussianity in the spectrum of primordial perturbations. We account for systematics by employing a new method recently proposed by Agarwal et al. (2014) to the clustering of photometrically classified quasars. We carefully apply our methodology to mitigate known observational systematics and further remove angular bins that are contaminated by unknown systematics. Combining quasar data with the photometric luminous red galaxy (LRG) sample of Ross et al. (2011) and Ho et al. (2012), and marginalizing over all bias and shot noise-like parameters, we obtain a constraint on local primordial non-Gaussianity of $f_{\mathrm{NL}} = -113^{+154}_{-154}$ (1σ error). We next assume that the bias of quasar and galaxy distributions can be obtained independently from quasar/galaxy-CMB lensing cross-correlation measurements (such as those in Sherwin et al. (2013)). This can be facilitated by spectroscopic observations of the sources, enabling the redshift distribution to be completely determined, and allowing precise estimates of the bias parameters. In this paper, if the bias and shot noise parameters are fixed to their known values (which we model by fixing them to their best-fit Gaussian values), we find that the error bar reduces to $1\sigma \simeq 65$. We expect this error bar to reduce further by at least another factor of five if the data is free of any observational systematics. We therefore emphasize that in order to make best use of large scale structure data we need an accurate modeling of known systematics, a method to mitigate unknown systematics, and additionally independent theoretical models or observations to probe the bias of dark matter halos.

  10. Computer-aided diagnosis of early knee osteoarthritis based on MRI T2 mapping.

    PubMed

    Wu, Yixiao; Yang, Ran; Jia, Sen; Li, Zhanjun; Zhou, Zhiyang; Lou, Ting

    2014-01-01

    This work was aimed at studying the method of computer-aided diagnosis of early knee OA (OA: osteoarthritis). Based on the technique of MRI (MRI: Magnetic Resonance Imaging) T2 Mapping, through computer image processing, feature extraction, calculation and analysis via constructing a classifier, an effective computer-aided diagnosis method for knee OA was created to assist doctors in their accurate, timely and convenient detection of potential risk of OA. In order to evaluate this method, a total of 1380 data from the MRI images of 46 samples of knee joints were collected. These data were then modeled through linear regression on an offline general platform by the use of the ImageJ software, and a map of the physical parameter T2 was reconstructed. After the image processing, the T2 values of ten regions in the WORMS (WORMS: Whole-organ Magnetic Resonance Imaging Score) areas of the articular cartilage were extracted to be used as the eigenvalues in data mining. Then, an RBF (RBF: Radial Basis Function) network classifier was built to classify and identify the collected data. The classifier exhibited a final identification accuracy of 75%, indicating a good result of assisting diagnosis. Since the knee OA classifier constituted by a weights-directly-determined RBF neural network didn't require any iteration, our results demonstrated that the optimal weights, appropriate center and variance could be yielded through simple procedures. Furthermore, the accuracy for both the training samples and the testing samples from the normal group could reach 100%. Finally, the classifier was superior both in time efficiency and classification performance to the frequently used classifiers based on iterative learning. Thus it was suitable to be used as an aid to computer-aided diagnosis of early knee OA.

  11. Clinical application of modified bag-of-features coupled with hybrid neural-based classifier in dengue fever classification using gene expression data.

    PubMed

    Chatterjee, Sankhadeep; Dey, Nilanjan; Shi, Fuqian; Ashour, Amira S; Fong, Simon James; Sen, Soumya

    2018-04-01

    Dengue fever detection and classification have a vital role due to the recent outbreaks of different kinds of dengue fever. Recently, the advancement in the microarray technology can be employed for such classification process. Several studies have established that the gene selection phase takes a significant role in the classifier performance. Subsequently, the current study focused on detecting two different variations, namely, dengue fever (DF) and dengue hemorrhagic fever (DHF). A modified bag-of-features method has been proposed to select the most promising genes in the classification process. Afterward, a modified cuckoo search optimization algorithm has been engaged to support the artificial neural network (ANN-MCS) to classify the unknown subjects into three different classes, namely, DF, DHF, and another class containing convalescent and normal cases. The proposed method has been compared with three other well-known classifiers, namely, multilayer perceptron feed-forward network (MLP-FFN), artificial neural network (ANN) trained with cuckoo search (ANN-CS), and ANN trained with PSO (ANN-PSO). Experiments have been carried out with different numbers of clusters for the initial bag-of-features-based feature selection phase. After obtaining the reduced dataset, the hybrid ANN-MCS model has been employed for the classification process. The results have been compared in terms of the confusion matrix-based performance measuring metrics. The experimental results indicated a highly statistically significant improvement with the proposed classifier over the traditional ANN-CS model.

  12. Imagining Sisyphus happy: DNA barcoding and the unnamed majority.

    PubMed

    Blaxter, Mark

    2016-09-05

    The vast majority of life on the Earth is physically small, and is classifiable as micro- or meiobiota. These organisms are numerically dominant and it is likely that they are also abundantly speciose. By contrast, the vast majority of taxonomic effort has been expended on 'charismatic megabionts': larger organisms where a wealth of morphology has facilitated Linnaean species definition. The hugely successful Linnaean project is unlikely to be extensible to the totality of approximately 10 million species in a reasonable time frame and thus alternative toolkits and methodologies need to be developed. One such toolkit is DNA barcoding, particularly in its metabarcoding or metagenetics mode, where organisms are identified purely by the presence of a diagnostic DNA sequence in samples that are not processed for morphological identification. Building on secure Linnaean foundations, classification of unknown (and unseen) organisms to molecular operational taxonomic units (MOTUs) and deployment of these MOTUs in biodiversity science promises a rewarding resolution to the Sisyphean task of naming all the world's species. This article is part of the themed issue 'From DNA barcodes to biomes'. © 2016 The Authors.

  13. Multi-view L2-SVM and its multi-view core vector machine.

    PubMed

    Huang, Chengquan; Chung, Fu-lai; Wang, Shitong

    2016-03-01

    In this paper, a novel L2-SVM based classifier Multi-view L2-SVM is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier does not have any bias in its objective function and hence has the flexibility like μ-SVC in the sense that the number of the yielded support vectors can be controlled by a pre-specified parameter. The proposed Multi-view L2-SVM classifier can make full use of the coherence and the difference of different views through imposing the consensus among multiple views to improve the overall classification performance. Besides, based on the generalized core vector machine GCVM, the proposed Multi-view L2-SVM classifier is extended into its GCVM version MvCVM which can realize its fast training on large scale multi-view datasets, with its asymptotic linear time complexity with the sample size and its space complexity independent of the sample size. Our experimental results demonstrated the effectiveness of the proposed Multi-view L2-SVM classifier for small scale multi-view datasets and the proposed MvCVM classifier for large scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings

    USGS Publications Warehouse

    Rey, Sergio J.; Stephens, Philip A.; Laura, Jason R.

    2017-01-01

    Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. The impacts of spatial autocorrelation, number of desired classes, and form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between improved speed of the sampling approaches and loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.
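
    The trade-off studied above, full enumeration versus classifying a sample, can be sketched with a small dynamic-programming version of Fisher-Jenks. This is a minimal illustration under stated assumptions: the quadratic-time recurrence, the function names, and the simple random sample are hypothetical and are not the implementation evaluated in the paper.

```python
import random


def fisher_jenks_breaks(values, k):
    """Upper bounds of k classes that minimise the total within-class
    sum of squared deviations (exact dynamic programme)."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give each candidate class's squared deviation in O(1).
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    back[c][j] = i
    # Walk the back-pointers to recover each class's largest value.
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = back[c][j]
    return breaks[::-1]


def sampled_breaks(values, k, sample_size, seed=0):
    """Classify a simple random sample instead of the full attribute list."""
    rng = random.Random(seed)
    return fisher_jenks_breaks(rng.sample(list(values), sample_size), k)


if __name__ == "__main__":
    full = [1, 2, 3, 10, 11, 12, 20, 21, 22]
    print(fisher_jenks_breaks(full, 3))
    print(sampled_breaks(range(1000), 4, 50))
```

    The exact version costs on the order of k·n² operations, which is the motivation for sampling when n is large; the price is that sampled breaks can differ from the full-enumeration breaks.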

  15. Chemical recognition of gases and gas mixtures with terahertz waves.

    PubMed

    Jacobsen, R H; Mittleman, D M; Nuss, M C

    1996-12-15

    A time-domain chemical-recognition system for classifying gases and analyzing gas mixtures is presented. We analyze the free induction decay exhibited by gases excited by far-infrared (terahertz) pulses in the time domain, using digital signal-processing techniques. A simple geometric picture is used for the classification of the waveforms measured for unknown gas species. We demonstrate how the recognition system can be used to determine the partial pressures of an ammonia-water gas mixture.

  16. Chemical recognition of gases and gas mixtures with terahertz waves

    NASA Astrophysics Data System (ADS)

    Jacobsen, R. H.; Mittleman, D. M.; Nuss, M. C.

    1996-12-01

    A time-domain chemical-recognition system for classifying gases and analyzing gas mixtures is presented. We analyze the free induction decay exhibited by gases excited by far-infrared (terahertz) pulses in the time domain, using digital signal-processing techniques. A simple geometric picture is used for the classification of the waveforms measured for unknown gas species. We demonstrate how the recognition system can be used to determine the partial pressures of an ammonia-water gas mixture.

  17. Deep Learning Accurately Predicts Estrogen Receptor Status in Breast Cancer Metabolomics Data.

    PubMed

    Alakwaa, Fadhl M; Chaudhary, Kumardeep; Garmire, Lana X

    2018-01-05

    Metabolomics holds the promise as a new technology to diagnose highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown if deep neural network, a class of increasingly popular machine learning methods, is suitable to classify metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 positive estrogen receptor (ER+), and 67 negative estrogen receptor (ER-) to test the accuracies of feed-forward networks, a deep learning (DL) framework, as well as six widely used machine learning models, namely random forest (RF), support vector machines (SVM), recursive partitioning and regression trees (RPART), linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), and generalized boosted models (GBM). DL framework has the highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value <0.05) that cannot be discovered by other machine learning methods. Among them, protein digestion and absorption and ATP-binding cassette (ABC) transporters pathways are also confirmed in integrated analysis between metabolomics and gene expression data in these samples. In summary, deep learning method shows advantages for metabolomics based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of feed-forward networks based deep learning method in the metabolomics research community for classification.

  18. Convolutional neural networks for prostate cancer recurrence prediction

    NASA Astrophysics Data System (ADS)

    Kumar, Neeraj; Verma, Ruchika; Arora, Ashish; Kumar, Abhay; Gupta, Sanchit; Sethi, Amit; Gann, Peter H.

    2017-03-01

    Accurate prediction of the treatment outcome is important for cancer treatment planning. We present an approach to predict prostate cancer (PCa) recurrence after radical prostatectomy using tissue images. We used a cohort whose case vs. control (recurrent vs. non-recurrent) status had been determined using post-treatment follow up. Further, to aid the development of novel biomarkers of PCa recurrence, cases and controls were paired based on matching of other predictive clinical variables such as Gleason grade, stage, age, and race. For this cohort, tissue resection microarray with up to four cores per patient was available. The proposed approach is based on deep learning, and its novelty lies in the use of two separate convolutional neural networks (CNNs) - one to detect individual nuclei even in the crowded areas, and the other to classify them. To detect nuclear centers in an image, the first CNN predicts distance transform of the underlying (but unknown) multi-nuclear map from the input H&E image. The second CNN classifies the patches centered at nuclear centers into those belonging to cases or controls. Voting across patches extracted from image(s) of a patient yields the probability of recurrence for the patient. The proposed approach gave 0.81 AUC for a sample of 30 recurrent cases and 30 non-recurrent controls, after being trained on an independent set of 80 case-control pairs. If validated further, such an approach might help in choosing between a combination of treatment options such as active surveillance, radical prostatectomy, radiation, and hormone therapy. It can also generalize to the prediction of treatment outcomes in other cancers.
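
    The final aggregation step, turning patch-level outputs into one patient-level score, can be sketched as below. The abstract does not specify the voting rule, so the equal-weight majority-style vote, the 0.5 cut-off, and the function name are assumptions made only for illustration.

```python
def patient_recurrence_probability(patch_probs, threshold=0.5):
    """Fraction of a patient's patches whose 'case' probability reaches the
    threshold, treating each patch as one equal-weight vote."""
    if not patch_probs:
        raise ValueError("need at least one patch")
    votes = sum(1 for p in patch_probs if p >= threshold)
    return votes / len(patch_probs)


if __name__ == "__main__":
    # Hypothetical per-patch outputs of the second (classification) network.
    print(patient_recurrence_probability([0.9, 0.8, 0.2, 0.6]))
```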

  19. Distinguishing body mass and activity level from the lower limb: can entheses diagnose obesity?

    PubMed

    Godde, Kanya; Taylor, Rebecca Wilson

    2013-03-10

    The ability to estimate body size from the skeleton has broad applications, but is especially important to the forensic community when identifying unknown skeletal remains. This research investigates the utility of using entheses/muscle skeletal markers of the lower limb to estimate body size and to classify individuals into average, obese, and active categories, while using a biomechanical approach to interpret the results. Eighteen muscle attachment sites of the lower limb, known to be involved in the sit-to-stand transition, were scored for robusticity and stress in 105 white males (aged 31-81 years) from the William M. Bass Donated Skeletal Collection. Both logistic regression and log linear models were applied to the data to (1) test the utility of entheses as an indicator of body weight and activity level, and (2) to generate classification percentages that speak to the accuracy of the method. Thirteen robusticity scores differed significantly between the groups, but classification percentages were only slightly greater than chance. However, clear differences could be seen between the average and obese and the average and active groups. Stress scores showed no value in discriminating between groups. These results were interpreted in relation to biomechanical forces at the microscopic and macroscopic levels. Even though robusticity alone is not able to classify individuals well, its significance may show greater value when incorporated into a model that has multiple skeletal indicators. Further research needs to evaluate a larger sample and incorporate several lines of evidence to improve classification rates. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  20. VizieR Online Data Catalog: Spectroscopically Identified Hot Subdwarf Stars (Kilkenny+ 1988)

    NASA Astrophysics Data System (ADS)

    Kilkenny, D.; Heber, U.; Drilling, J. S.

    1996-05-01

    Prior to 1986 there were around 200 spectroscopically classified hot subdwarf stars. The Palomar-Green survey (Green et al., 1986ApJS...61..305G) detected over 900 hot subdwarfs, mostly in the North Galactic Cap and mostly previously unknown objects; the Kitt Peak-Downes survey found another 60 near the Galactic Plane (Downes, 1986ApJS...61..569D). These form the basis of the present catalog but new subdwarfs are continually being found by spectroscopic surveys of photographically discovered faint blue star samples; examples are the work of Wegner and his co-workers on the Kiso survey (Wegner et al., 1985AJ.....90.1511W, 1986AJ.....91..139W, 1987AJ.....94.1271W) and of Kilkenny and Muller (1987) on southern discoveries by Luyten and collaborators (e.g. Haro and Luyten, 1962, Cat. III/74; Luyten and Anderson, 1958, 1959, 1967, "A Search for Faint Blue Stars"). Only stars for which a spectroscopic classification exists have been included. There is a significant probability that stars with only photometric classifications are, for example, normal high-latitude B stars, white dwarfs or cataclysmic variables. Hot subdwarfs in binary systems have been included but not planetary nebulae nuclei classified 'sd' since the latter have been catalogued elsewhere. Although there is not a universally accepted classification scheme for hot subdwarfs, it is fairly clear that the main criterion is a surface gravity higher than that of hot main sequence stars but less than that of hot white dwarfs. Also, hot subdwarf stars typically show helium abundance anomalies. (3 data files).

  1. Target discrimination method for SAR images based on semisupervised co-training

    NASA Astrophysics Data System (ADS)

    Wang, Yan; Du, Lan; Dai, Hui

    2018-01-01

    Synthetic aperture radar (SAR) target discrimination is usually performed in a supervised manner. However, supervised methods for SAR target discrimination may need lots of labeled training samples, whose acquirement is costly, time consuming, and sometimes impossible. This paper proposes an SAR target discrimination method based on semisupervised co-training, which utilizes a limited number of labeled samples and an abundant number of unlabeled samples. First, Lincoln features, widely used in SAR target discrimination, are extracted from the training samples and partitioned into two sets according to their physical meanings. Second, two support vector machine classifiers are iteratively co-trained with the extracted two feature sets based on the co-training algorithm. Finally, the trained classifiers are exploited to classify the test data. The experimental results on real SAR images data not only validate the effectiveness of the proposed method compared with the traditional supervised methods, but also demonstrate the superiority of co-training over self-training, which only uses one feature set.
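
    The co-training loop described above can be sketched in a few lines. To keep the example self-contained, a nearest-centroid classifier is swapped in for the support vector machines, and two toy one-dimensional views stand in for the two Lincoln feature sets; all names and data are hypothetical.

```python
def centroids(points, labels):
    """Per-class mean feature vector."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_margin(cents, p):
    """Nearest-centroid label plus a confidence margin (gap between the two
    smallest squared distances). Needs at least two labelled classes."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(c, p)), y) for y, c in cents.items()
    )
    return dists[0][1], dists[1][0] - dists[0][0]


def co_train(view_a, view_b, labels, rounds=5, per_round=1):
    """labels[i] is a class label, or None for an unlabelled sample.
    Each view's classifier in turn pseudo-labels its most confident
    unlabelled samples; those labels then train the other view."""
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            known = [i for i, y in enumerate(labels) if y is not None]
            unknown = [i for i, y in enumerate(labels) if y is None]
            if not unknown:
                return labels
            cents = centroids([view[i] for i in known], [labels[i] for i in known])
            scored = sorted(
                ((predict_with_margin(cents, view[i]), i) for i in unknown),
                key=lambda t: -t[0][1],
            )
            for (label, _), i in scored[:per_round]:
                labels[i] = label
    return labels


if __name__ == "__main__":
    view_a = [[0.0], [1.0], [0.1], [0.9], [0.2], [0.8]]
    view_b = [[5.0], [9.0], [5.5], [8.5], [6.0], [8.0]]
    print(co_train(view_a, view_b, [0, 1, None, None, None, None]))
```

    Self-training, the baseline mentioned in the abstract, corresponds to running the same loop with a single view.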

  2. An evaluation of open set recognition for FLIR images

    NASA Astrophysics Data System (ADS)

    Scherreik, Matthew; Rigling, Brian

    2015-05-01

    Typical supervised classification algorithms label inputs according to what was learned in a training phase. Thus, test inputs that were not seen in training are always given incorrect labels. Open set recognition algorithms address this issue by accounting for inputs that are not present in training and providing the classifier with an option to "reject" unknown samples. A number of such techniques have been developed in the literature, many of which are based on support vector machines (SVMs). One approach, the 1-vs-set machine, constructs a "slab" in feature space using the SVM hyperplane. Inputs falling on one side of the slab or within the slab belong to a training class, while inputs falling on the far side of the slab are rejected. We note that rejection of unknown inputs can be achieved by thresholding class posterior probabilities. Another recently developed approach, the Probabilistic Open Set SVM (POS-SVM), empirically determines good probability thresholds. We apply the 1-vs-set machine, POS-SVM, and closed set SVMs to FLIR images taken from the Comanche SIG dataset. Vehicles in the dataset are divided into three general classes: wheeled, armored personnel carrier (APC), and tank. For each class, a coarse pose estimate (front, rear, left, right) is taken. In a closed set sense, we analyze these algorithms for prediction of vehicle class and pose. To test open set performance, one or more vehicle classes are held out from training. By considering closed and open set performance separately, we may closely analyze both inter-class discrimination and threshold effectiveness.
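The abstract above notes that unknown inputs can be rejected by thresholding class posterior probabilities. A minimal sketch of that idea follows; the function name `classify_with_reject`, the threshold of 0.7, and the example posteriors are illustrative assumptions, not values or code from the paper.

```python
def classify_with_reject(posteriors, threshold=0.7, reject_label="unknown"):
    """Return the most probable class, or reject_label if its posterior is too low.

    posteriors: dict mapping class name -> posterior probability (hypothetical input).
    threshold:  assumed cut-off; in practice it would be tuned on validation data.
    """
    # Pick the class with the highest posterior probability.
    label, prob = max(posteriors.items(), key=lambda kv: kv[1])
    # Accept the label only when the classifier is confident enough.
    return label if prob >= threshold else reject_label


confident = {"tank": 0.90, "wheeled": 0.05, "apc": 0.05}
ambiguous = {"tank": 0.40, "wheeled": 0.35, "apc": 0.25}
print(classify_with_reject(confident))
print(classify_with_reject(ambiguous))
```

Raising the threshold rejects more inputs; it trades coverage of known classes for fewer wrongly labeled unknowns.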

  3. An approach to the analysis of SDSS spectroscopic outliers based on self-organizing maps. Designing the outlier analysis software package for the next Gaia survey

    NASA Astrophysics Data System (ADS)

    Fustes, D.; Manteiga, M.; Dafonte, C.; Arcay, B.; Ulla, A.; Smith, K.; Borrachero, R.; Sordo, R.

    2013-11-01

    Aims: A new method applied to the segmentation and further analysis of the outliers resulting from the classification of astronomical objects in large databases is discussed. The method is being used in the framework of the Gaia satellite Data Processing and Analysis Consortium (DPAC) activities to prepare automated software tools that will be used to derive basic astrophysical information that is to be included in final Gaia archive. Methods: Our algorithm has been tested by means of simulated Gaia spectrophotometry, which is based on SDSS observations and theoretical spectral libraries covering a wide sample of astronomical objects. Self-organizing maps networks are used to organize the information in clusters of objects, as homogeneously as possible according to their spectral energy distributions, and to project them onto a 2D grid where the data structure can be visualized. Results: We demonstrate the usefulness of the method by analyzing the spectra that were rejected by the SDSS spectroscopic classification pipeline and thus classified as "UNKNOWN". First, our method can help distinguish between astrophysical objects and instrumental artifacts. Additionally, the application of our algorithm to SDSS objects of unknown nature has allowed us to identify classes of objects with similar astrophysical natures. In addition, the method allows for the potential discovery of hundreds of new objects, such as white dwarfs and quasars. Therefore, the proposed method is shown to be very promising for data exploration and knowledge discovery in very large astronomical databases, such as the archive from the upcoming Gaia mission.

  4. The evaluation of alternate methodologies for land cover classification in an urbanizing area

    NASA Technical Reports Server (NTRS)

    Smekofski, R. M.

    1981-01-01

    The usefulness of LANDSAT in classifying land cover and in identifying and classifying land use change was investigated using an urbanizing area as the study area. The question of what was the best technique for classification was the primary focus of the study. The many computer-assisted techniques available to analyze LANDSAT data were evaluated. Techniques of statistical training (polygons from CRT, unsupervised clustering, polygons from digitizer and binary masks) were tested with minimum distance to the mean, maximum likelihood and canonical analysis with minimum distance to the mean classifiers. The twelve output images were compared to photointerpreted samples, ground verified samples and a current land use data base. Results indicate that for a reconnaissance inventory, the unsupervised training with canonical analysis-minimum distance classifier is the most efficient. If more detailed ground truth and ground verification is available, the polygons from the digitizer training with the canonical analysis minimum distance is more accurate.
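One of the classifiers named above, minimum distance to the mean, can be sketched in a few lines. This is an illustrative version only: the two-band pixel values, the class names, and the use of Euclidean distance are assumptions made for the example, not details taken from the study.

```python
import math


def class_means(training):
    """Compute the mean feature vector of each class.

    training: dict mapping class label -> list of feature vectors (hypothetical data).
    """
    means = {}
    for label, vectors in training.items():
        n = len(vectors)
        # zip(*vectors) groups the values band by band.
        means[label] = [sum(band) / n for band in zip(*vectors)]
    return means


def classify(pixel, means):
    """Assign the pixel to the class whose mean is nearest (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


training = {
    "urban": [[10, 2], [12, 4]],
    "forest": [[2, 9], [4, 11]],
}
means = class_means(training)
print(classify([9, 3], means))
print(classify([3, 8], means))
```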

  5. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    PubMed Central

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862

  6. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    PubMed

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  7. Mapping raised bogs with an iterative one-class classification approach

    NASA Astrophysics Data System (ADS)

    Mack, Benjamin; Roscher, Ribana; Stenzel, Stefanie; Feilhauer, Hannes; Schmidtlein, Sebastian; Waske, Björn

    2016-10-01

    Land use and land cover maps are one of the most commonly used remote sensing products. In many applications the user only requires a map of one particular class of interest, e.g. a specific vegetation type or an invasive species. One-class classifiers are appealing alternatives to common supervised classifiers because they can be trained with labeled training data of the class of interest only. However, training an accurate one-class classification (OCC) model is challenging, particularly when facing a large image, a small class and few training samples. To tackle these problems we propose an iterative OCC approach. The presented approach uses a biased Support Vector Machine as core classifier. In an iterative pre-classification step a large part of the pixels not belonging to the class of interest is classified. The remaining data is classified by a final classifier with a novel model and threshold selection approach. The specific objective of our study is the classification of raised bogs in a study site in southeast Germany, using multi-seasonal RapidEye data and a small number of training sample. Results demonstrate that the iterative OCC outperforms other state of the art one-class classifiers and approaches for model selection. The study highlights the potential of the proposed approach for an efficient and improved mapping of small classes such as raised bogs. Overall the proposed approach constitutes a feasible approach and useful modification of a regular one-class classifier.

  8. Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks.

    PubMed

    Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi

    2014-12-08

    Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.

  9. Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks

    PubMed Central

    Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi

    2014-01-01

    Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the “small sample size” (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0–1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system. PMID:25494350

  10. ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY

    PubMed Central

    Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping

    2013-01-01

    Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and under sampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1). a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2). sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869
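To make the undersampling idea concrete, here is a minimal sketch of random undersampling, which shrinks every class to the size of the smallest one. Note that this swaps in plain random selection for simplicity; the study above reports its best results with K-Medoids based undersampling, which is not implemented here. The function name and the toy labels are assumptions for illustration.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Return a class-balanced list of (sample, label) pairs.

    Each class is randomly reduced to the size of the smallest class.
    seed is fixed so the illustration is repeatable.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Size of the minority class sets the target size for every class.
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        balanced.extend((x, y) for x in rng.sample(xs, n_min))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 7 majority-class and 3 minority-class subjects.
samples = list(range(10))
labels = ["MCI"] * 7 + ["AD"] * 3
print(random_undersample(samples, labels, seed=1))
```

Training several classifiers on different undersampled subsets and combining them is one way to build the kind of ensemble the abstract describes, since no single subset discards the same majority-class samples.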

  11. Differentiation of Candida albicans, Candida glabrata, and Candida krusei by FT-IR and chemometrics by CHROMagar™ Candida.

    PubMed

    Wohlmeister, Denise; Vianna, Débora Renz Barreto; Helfer, Virginia Etges; Calil, Luciane Noal; Buffon, Andréia; Fuentefria, Alexandre Meneghello; Corbellini, Valeriano Antonio; Pilger, Diogo André

    2017-10-01

    Pathogenic Candida species are detected in clinical infections. CHROMagar™ is a phenotypical method used to identify Candida species, although it has limitations, which indicates the need for more sensitive and specific techniques. Fourier-transform infrared spectroscopy (FT-IR) is an analytical vibrational technique used to identify metabolic fingerprint patterns of biological matrices, particularly whole microbial cell systems such as Candida sp., in association with classificatory chemometric algorithms. On the other hand, Soft Independent Modeling by Class Analogy (SIMCA) is one of the typical algorithms still little employed in microbiological classification. This study demonstrates the applicability of the FT-IR technique by specular reflectance associated with SIMCA to discriminate Candida species isolated from vaginal discharges and grown on CHROMagar™. The differences in spectra of C. albicans, C. glabrata and C. krusei were suitable for use in the discrimination of these species, which was observed by PCA. Then, a SIMCA model was constructed with standard samples of three species and using the spectral region of 1792-1561 cm⁻¹. All samples (n=48) were properly classified based on the chromogenic method using CHROMagar™ Candida. In total, 93.4% (n=45) of the samples were correctly and unambiguously classified (Class I). Two samples of C. albicans were classified correctly, though these could have been C. glabrata (Class II). Also, one C. glabrata sample could have been classified as C. krusei (Class II). Concerning these three samples, one triplicate of each was included in Class II and two in Class I. Therefore, FT-IR associated with SIMCA can be used to identify samples of C. albicans, C. glabrata, and C. krusei grown in CHROMagar™ Candida aiming to improve clinical applications of this technique. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Quasi-Supervised Scoring of Human Sleep in Polysomnograms Using Augmented Input Variables

    PubMed Central

    Yaghouby, Farid; Sunderam, Sridhar

    2015-01-01

    The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30 s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18 to 79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models, specifically Gaussian mixtures and hidden Markov models, are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. PMID:25679475
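Performance in the study above is reported with Cohen's kappa, which corrects the observed agreement p_o between two raters for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it from two label sequences; the stage labels in the example are made up for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of epochs")
    n = len(rater_a)
    # Observed agreement: fraction of epochs given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores for four epochs: W = wake, N = non-REM sleep.
human = ["W", "W", "N", "N"]
model = ["W", "N", "N", "N"]
print(cohens_kappa(human, model))  # 0.5
```

In this toy case the raters agree on 3 of 4 epochs (p_o = 0.75), chance agreement is p_e = 0.5, and kappa is therefore 0.5.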

  13. Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables.

    PubMed

    Yaghouby, Farid; Sunderam, Sridhar

    2015-04-01

    The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30 s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18-79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models, specifically Gaussian mixtures and hidden Markov models, are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Blood Based Biomarkers of Early Onset Breast Cancer

    DTIC Science & Technology

    2016-12-01

    discretizes the data, and also using logistic elastic net – a form of linear regression - we were unable to build a classifier that could accurately...classifier for differentiating cases from controls off discretized data. The first pass analysis demonstrated a 35 gene signature that differentiated...to the discretized data for mRNA gene signature, the samples used to “train” were also included in the final samples used to “test” the algorithm

  15. Self-similarity Clustering Event Detection Based on Triggers Guidance

    NASA Astrophysics Data System (ADS)

    Zhang, Xianfei; Li, Bicheng; Tian, Yuxuan

    The traditional approach to Event Detection and Characterization (EDC) treats event detection as a classification problem. It uses words as samples to train a classifier, which can lead to an imbalance between the classifier's positive and negative samples. This approach also suffers from data sparseness when the corpus is small. Rather than classifying events using words as samples, this paper clusters events when judging event types. Guided by event triggers, it uses self-similarity to converge on the value of K in the K-means algorithm, and optimizes the clustering algorithm. Then, combining named entities with their relative position information, the new method further pinpoints the event type. The new method avoids the dependence on event templates found in traditional methods, and its event detection results can be used in automatic text summarization, text retrieval, and topic detection and tracking.

  16. Spectral classifier design with ensemble classifiers and misclassification-rejection: application to elastic-scattering spectroscopy for detection of colonic neoplasia.

    PubMed

    Rodriguez-Diaz, Eladio; Castanon, David A; Singh, Satish K; Bigio, Irving J

    2011-06-01

    Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These "rejected" samples can be reexamined by simply repositioning the probe to obtain additional optical readings or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets result in a baseline performance of sensitivity = .83, specificity = .79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20-33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk.
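The trade-off described above, better sensitivity and specificity at the cost of rejecting some samples, can be illustrated with a small sketch. It assumes each sample has a signed decision score (positive meaning neoplastic), as a linear SVM would produce, and rejects samples whose score lies within a margin of zero. The rejection rule, the scores, and the labels are assumptions for illustration; the paper's actual rejection scheme is not reproduced here.

```python
def evaluate_with_rejection(scores, labels, margin):
    """Sensitivity, specificity and rejection rate when low-confidence samples are rejected.

    scores: signed decision values (> 0 predicts the positive class).
    labels: true classes, 1 = positive (neoplastic), 0 = negative.
    margin: samples with |score| < margin are rejected, not classified.
    """
    tp = fn = tn = fp = rejected = 0
    for score, truth in zip(scores, labels):
        if abs(score) < margin:
            rejected += 1
            continue
        predicted = 1 if score > 0 else 0
        if truth == 1:
            tp += predicted == 1
            fn += predicted == 0
        else:
            tn += predicted == 0
            fp += predicted == 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "rejection_rate": rejected / len(scores),
    }


# Hypothetical decision scores and ground truth for six spectra.
scores = [2.0, 0.1, -1.5, -0.2, 1.2, -0.8]
labels = [1, 0, 0, 1, 1, 0]
print(evaluate_with_rejection(scores, labels, margin=0.0))
print(evaluate_with_rejection(scores, labels, margin=0.5))
```

With no margin, the two near-zero scores are misclassified; with a margin of 0.5 they are rejected instead, so the metrics on the remaining samples improve while a third of the samples go unclassified.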

  17. Spectral classifier design with ensemble classifiers and misclassification-rejection: application to elastic-scattering spectroscopy for detection of colonic neoplasia

    PubMed Central

    Rodriguez-Diaz, Eladio; Castanon, David A.; Singh, Satish K.; Bigio, Irving J.

    2011-01-01

    Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These “rejected” samples can be reexamined by simply repositioning the probe to obtain additional optical readings or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets result in a baseline performance of sensitivity = .83, specificity = .79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20–33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk. PMID:21721830

  18. Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.

    PubMed

    Wu, Ke; Edwards, Andrea; Fan, Wei; Gao, Jing; Zhang, Kun

    2014-04-01

    Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied to date with many interesting algorithms developed. However, only a few approaches reported in literature address the intersection of these two fields due to their complex interplay. In this work, we proposed an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams of imbalanced distribution. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups with each sub-classifier (i.e. a single classifier or an ensemble) weighed by its discriminative power and stable level. The un-even class distribution, on the other hand, is typically battled by the sub-classifier built in a specific feature group with the underlying distribution rebalanced by the importance sampling technique. We derived the theoretical upper bound for the generalization error of the proposed algorithm. We also studied the empirical performance of our method on a set of benchmark synthetic and real world data, and significant improvement has been achieved over the competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.

  19. A Synoptic Study of Fecal-Indicator Bacteria in the Wind River, Bighorn River, and Goose Creek Basins, Wyoming, June-July 2000

    USGS Publications Warehouse

    Clark, Melanie L.; Gamper, Merry E.

    2003-01-01

    A synoptic study of fecal-indicator bacteria was conducted during June and July 2000 in the Wind River, Bighorn River, and Goose Creek Basins in Wyoming as part of the U.S. Geological Survey's National Water-Quality Assessment Program for the Yellowstone River Basin. Fecal-coliform concentrations ranged from 2 to 3,000 col/100 mL (colonies per 100 milliliters) for 100 samples, and Escherichia coli concentrations ranged from 1 to 2,800 col/100 mL for 97 samples. Fecal-coliform concentrations exceeded the U.S. Environmental Protection Agency's recommended limit for a single sample for recreational contact with water in 37.0 percent of the samples. Escherichia coli concentrations exceeded the U.S. Environmental Protection Agency's recommended limit for a single sample for moderate use, full-body recreational contact with water in 38.1 percent of the samples and the recommended limit for infrequent use, full-body recreational contact with water in 24.7 percent of the samples. Fecal-indicator-bacteria concentrations varied by basin. Samples from the Bighorn River Basin had the highest median concentrations for fecal coliform of 340 col/100 mL and for Escherichia coli of 300 col/100 mL. Samples from the Wind River Basin had the lowest median concentrations for fecal coliform of 50 col/100 mL and for Escherichia coli of 62 col/100 mL. Fecal-indicator-bacteria concentrations varied by land cover. Samples from sites with an urban land cover had the highest median concentrations for fecal coliform of 540 col/100 mL and for Escherichia coli of 420 col/100 mL. Maximum concentrations for fecal coliform of 3,000 col/100 mL and for Escherichia coli of 2,800 col/100 mL were in samples from sites with an agricultural land cover. The lowest median concentrations for fecal coliform of 130 col/100 mL and for Escherichia coli of 67 col/100 mL were for samples from sites with a forested land cover. 
    A strong and positive relation existed between fecal coliform and Escherichia coli (Spearman's Rho value of 0.976). The majority of the fecal coliforms were Escherichia coli during the synoptic study. Fecal-indicator-bacteria concentrations were not correlated with streamflow, water temperature, dissolved oxygen, pH, specific conductance, or alkalinity. Fecal-indicator-bacteria concentrations were moderately correlated with turbidity (Spearman's Rho values of 0.662 and 0.640 for fecal coliform and Escherichia coli, respectively) and sediment (Spearman's Rho values of 0.628 and 0.636 for fecal coliform and Escherichia coli, respectively). Escherichia coli isolates analyzed by discriminant analysis of ribotype patterns for samples from the Bighorn River at Basin, Wyoming, and Bitter Creek near Garland, Wyoming, in the Bighorn River Basin were determined to be from nonhuman and human sources. Using a confidence interval of 90 percent, more of the isolates from both sites were classified as being from nonhuman than human sources; however, both samples had additional isolates that were classified as unknown sources.
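The correlations above are Spearman's rho values, that is, the Pearson correlation computed on the ranks of the two variables, with tied values sharing their average rank. A minimal sketch follows; the concentration and turbidity values are invented for illustration and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical paired measurements at five sites.
fecal_coliform = [50, 130, 340, 540, 3000]  # col/100 mL
turbidity = [1.2, 3.5, 2.9, 8.0, 40.0]      # arbitrary units
print(spearman_rho(fecal_coliform, turbidity))
```

Because only ranks are used, the single very large concentration does not dominate the result the way it would in a Pearson correlation on raw values.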

  20. Radioassay kit for method of determining methotrexate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Charm, S.E.; Blair, H.E.

    1978-07-25

    A radioassay system for the determination of methotrexate in biological fluids based on the competitive binding of labeled and unlabeled methotrexate to the enzyme dihydrofolate reductase. Samples of unknown methotrexate level are mixed with ¹²⁵I-labeled methotrexate. A portion of the total methotrexate present is bound by the addition of enzyme, and the unbound methotrexate is removed with charcoal. The level of bound ¹²⁵I-labeled methotrexate is measured in a gamma counter. To calculate the methotrexate level of the unknown samples, the displacement of bound labeled methotrexate caused by the unknowns is compared to the displacement caused by known methotrexate standards.
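The final comparison step, reading an unknown's methotrexate level off the response of known standards, can be sketched as interpolation on a standard curve. The version below uses simple linear interpolation between neighbouring standards, which is an assumption made for brevity (real assays often fit a transformed curve); the concentrations and counts are hypothetical, not values from the patent.

```python
def interpolate_concentration(standards, unknown_counts):
    """Estimate concentration from bound-label counts using known standards.

    standards: list of (concentration, bound_counts) pairs; in a competitive
               binding assay the bound counts fall as concentration rises.
    unknown_counts: bound counts measured for the unknown sample.
    """
    points = sorted(standards)  # order by concentration
    for (c0, b0), (c1, b1) in zip(points, points[1:]):
        low, high = min(b0, b1), max(b0, b1)
        if b0 != b1 and low <= unknown_counts <= high:
            # Fraction of the way from the first standard to the second.
            fraction = (b0 - unknown_counts) / (b0 - b1)
            return c0 + fraction * (c1 - c0)
    raise ValueError("counts fall outside the range covered by the standards")


# Hypothetical standard curve: (concentration, gamma-counter counts).
standards = [(0.0, 10000.0), (10.0, 8000.0), (100.0, 4000.0)]
print(interpolate_concentration(standards, 9000.0))  # 5.0
print(interpolate_concentration(standards, 6000.0))  # 55.0
```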

  1. Classifying Microorganisms.

    ERIC Educational Resources Information Center

    Baker, William P.; Leyva, Kathryn J.; Lang, Michael; Goodmanis, Ben

    2002-01-01

    Focuses on an activity in which students sample air at school and generate ideas about how to classify the microorganisms they observe. The results are used to compare air quality among schools via the Internet. Supports the development of scientific inquiry and technology skills. (DDR)

  2. An ensemble predictive modeling framework for breast cancer classification.

    PubMed

    Nagarajan, Radhakrishnan; Upreti, Meenakshi

    2017-12-01

    Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process often resulting in sparse high-dimensional projection of the samples often comparable to that of the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen based on maximal sensitivity and minimal redundancy by choosing only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework in contrast to using them as traditional single classifier systems. Significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
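
    Two ingredients named in this abstract, majority voting over base classifiers and cosine distance between them, can be sketched as follows. This is a minimal illustration with hypothetical predictions; it does not reproduce the authors' selection procedure or their graph-based analysis.

```python
import math
from collections import Counter


def majority_vote(predictions):
    """Most common label among the base-classifier predictions for one sample.

    Ties are broken by first occurrence, so an odd number of voters is assumed.
    """
    label, _ = Counter(predictions).most_common(1)[0]
    return label


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical predictions of three base classifiers for four samples.
per_classifier = [
    ["good", "poor", "poor", "good"],
    ["good", "poor", "good", "good"],
    ["poor", "poor", "good", "good"],
]
# zip(*...) regroups the predictions per sample before voting.
ensemble = [majority_vote(votes) for votes in zip(*per_classifier)]
print(ensemble)
```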

  3. Opportunistic pathology-based screening for diabetes

    PubMed Central

    Simpson, Aaron J; Krowka, Renata; Kerrigan, Jennifer L; Southcott, Emma K; Wilson, J Dennis; Potter, Julia M; Nolan, Christopher J; Hickman, Peter E

    2013-01-01

    Objective To determine the potential of opportunistic glycated haemoglobin (HbA1c) testing of pathology samples to detect previously unknown diabetes. Design Pathology samples from participants collected for other reasons and suitable for HbA1c testing were utilised for opportunistic diabetes screening. HbA1c was measured with a Biorad Variant II turbo analyser and HbA1c levels of ≥6.5% (48 mmol/mol) were considered diagnostic for diabetes. Confirmation of previously unknown diabetes status was obtained by a review of hospital medical records and phone calls to general practitioners. Setting Hospital pathology laboratory receiving samples from hospital-based and community-based (CB) settings. Participants Participants were identified based on the blood sample collection location in the CB, emergency department (ED) and inpatient (IP) groups. Exclusions pretesting were made based on the electronic patient history of: age <18 years, previous diabetes diagnosis, query for diabetes status in the past 12 months, evidence of pregnancy and sample collected postsurgery or transfusion. Only one sample per individual participant was tested. Results Of the 22 396 blood samples collected, 4505 (1142 CB, 1113 ED, 2250 IP) were tested of which 327 (7.3%) had HbA1c levels ≥6.5% (48 mmol/mol). Of these 120 (2.7%) were determined to have previously unknown diabetes (11 (1%) CB, 21 (1.9%) ED, 88 (3.9%) IP). The prevalence of previously unknown diabetes was substantially higher (5.4%) in hospital-based (ED and IP) participants aged over 54 years. Conclusions Opportunistic testing of referred pathology samples can be an effective method of screening for diabetes, especially in hospital-based and older persons. PMID:24065696
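
    The percentages in the Results can be re-derived from the counts given in the abstract. The short check below uses only those reported counts; the variable and function names are illustrative.

```python
# Counts reported in the abstract, by collection setting.
tested = {"CB": 1142, "ED": 1113, "IP": 2250}
new_diabetes = {"CB": 11, "ED": 21, "IP": 88}
elevated_hba1c = 327

total_tested = sum(tested.values())
total_new = sum(new_diabetes.values())


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(total_tested, pct(elevated_hba1c, total_tested), pct(total_new, total_tested))
for group in tested:
    print(group, pct(new_diabetes[group], tested[group]))
```

    The group totals add up to the 4505 samples tested, and the group rates round to the 1%, 1.9% and 3.9% quoted for the CB, ED and IP groups.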

  4. Mutation risk associated with paternal and maternal age in a cohort of retinoblastoma survivors.

    PubMed

    Mills, Melissa B; Hudgins, Louanne; Balise, Raymond R; Abramson, David H; Kleinerman, Ruth A

    2012-07-01

    Autosomal dominant conditions are known to be associated with advanced paternal age, and it has been suggested that retinoblastoma (Rb) also exhibits a paternal age effect due to the paternal origin of most new germline RB1 mutations. To further our understanding of the association of parental age and risk of de novo germline RB1 mutations, we evaluated the effect of parental age in a cohort of Rb survivors in the United States. A cohort of 262 Rb patients was retrospectively identified at one institution, and telephone interviews were conducted with parents of 160 survivors (65.3%). We classified Rb survivors into three groups: those with unilateral Rb were classified as sporadic if they had no or unknown family history of Rb, those with bilateral Rb were classified as having a de novo germline mutation if they had no or unknown family history of Rb, and those with unilateral or bilateral Rb, who had a family history of Rb, were classified as familial. We built two sets of nested logistic regression models to detect an increased odds of the de novo germline mutation classification related to older parental age compared to sporadic and familial Rb classifications. The modeling strategy evaluated effects of continuous increasing maternal and paternal age and 5-year age increases adjusted for the age of the other parent. Mean maternal ages for survivors classified as having de novo germline mutations and sporadic Rb were similar (28.3 and 28.5, respectively) as were mean paternal ages (31.9 and 31.2, respectively), and all were significantly higher than the weighted general US population means. In contrast, maternal and paternal ages for familial Rb did not differ significantly from the weighted US general population means. 
Although we noted no significant differences between mean maternal and paternal ages between each of the three Rb classification groups, we found increased odds of a survivor being in the de novo germline mutation group for each 5-year increase in paternal age, but these findings were not statistically significant (de novo vs. sporadic ORs 30-34 = 1.7 [0.7-4], ≥ 35 = 1.3 [0.5-3.3]; de novo vs. familial ORs 30-34 = 2.8 [1.0-8.4], ≥ 35 = 1.6 [0.6-4.6]). Our study suggests a weak paternal age effect for Rb resulting from de novo germline mutations consistent with the paternal origin of most of these mutations.

  5. Geochemical Influence on Microbial Communities at CO2-Leakage Analog Sites.

    PubMed

    Ham, Baknoon; Choi, Byoung-Young; Chae, Gi-Tak; Kirk, Matthew F; Kwon, Man Jae

    2017-01-01

    Microorganisms influence the chemical and physical properties of subsurface environments and thus represent an important control on the fate and environmental impact of CO2 that leaks into aquifers from deep storage reservoirs. How leakage will influence microbial populations over long time scales is largely unknown. This study uses natural analog sites to investigate the long-term impact of CO2 leakage from underground storage sites on subsurface biogeochemistry. We considered two sites with elevated CO2 levels (sample groups I and II) and one control site with low CO2 content (group III). Samples from sites with elevated CO2 had pH ranging from 6.2 to 4.5 and samples from the low-CO2 control group had pH ranging from 7.3 to 6.2. Solute concentrations were relatively low for samples from the control group and group I but high for samples from group II, reflecting varying degrees of water-rock interaction. Microbial communities were analyzed through clone library and MiSeq sequencing. Each 16S rRNA analysis identified various bacteria, methane-producing archaea, and ammonia-oxidizing archaea. Both bacterial and archaeal diversities were low in groundwater with high CO2 content and community compositions between the groups were also clearly different. In group II samples, sequences classified in groups capable of methanogenesis, metal reduction, and nitrate reduction had higher relative abundance in samples with relatively high methane, iron, and manganese concentrations and low nitrate levels. Sequences close to Comamonadaceae were abundant in group I, while the taxa related to methanogens, Nitrospirae, and Anaerolineaceae were predominant in group II. Our findings provide insight into subsurface biogeochemical reactions that influence the carbon budget of the system including carbon fixation, carbon trapping, and CO2 conversion to methane. The results also suggest that monitoring groundwater microbial community can be a potential tool for tracking CO2 leakage from geologic storage sites.

  6. Geochemical Influence on Microbial Communities at CO2-Leakage Analog Sites

    PubMed Central

    Ham, Baknoon; Choi, Byoung-Young; Chae, Gi-Tak; Kirk, Matthew F.; Kwon, Man Jae

    2017-01-01

    Microorganisms influence the chemical and physical properties of subsurface environments and thus represent an important control on the fate and environmental impact of CO2 that leaks into aquifers from deep storage reservoirs. How leakage will influence microbial populations over long time scales is largely unknown. This study uses natural analog sites to investigate the long-term impact of CO2 leakage from underground storage sites on subsurface biogeochemistry. We considered two sites with elevated CO2 levels (sample groups I and II) and one control site with low CO2 content (group III). Samples from sites with elevated CO2 had pH ranging from 6.2 to 4.5 and samples from the low-CO2 control group had pH ranging from 7.3 to 6.2. Solute concentrations were relatively low for samples from the control group and group I but high for samples from group II, reflecting varying degrees of water-rock interaction. Microbial communities were analyzed through clone library and MiSeq sequencing. Each 16S rRNA analysis identified various bacteria, methane-producing archaea, and ammonia-oxidizing archaea. Both bacterial and archaeal diversities were low in groundwater with high CO2 content and community compositions between the groups were also clearly different. In group II samples, sequences classified in groups capable of methanogenesis, metal reduction, and nitrate reduction had higher relative abundance in samples with relatively high methane, iron, and manganese concentrations and low nitrate levels. Sequences close to Comamonadaceae were abundant in group I, while the taxa related to methanogens, Nitrospirae, and Anaerolineaceae were predominant in group II. Our findings provide insight into subsurface biogeochemical reactions that influence the carbon budget of the system including carbon fixation, carbon trapping, and CO2 conversion to methane. The results also suggest that monitoring groundwater microbial community can be a potential tool for tracking CO2 leakage from geologic storage sites. PMID:29170659

  7. Nuclear Forensic Inferences Using Iterative Multidimensional Statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robel, M; Kristo, M J; Heller, M A

    2009-06-09

    Nuclear forensics involves the analysis of interdicted nuclear material for specific material characteristics (referred to as 'signatures') that imply specific geographical locations, production processes, culprit intentions, etc. Predictive signatures rely on expert knowledge of physics, chemistry, and engineering to develop inferences from these material characteristics. Comparative signatures, on the other hand, rely on comparison of the material characteristics of the interdicted sample (the 'questioned sample' in FBI parlance) with those of a set of known samples. In the ideal case, the set of known samples would be a comprehensive nuclear forensics database, a database which does not currently exist. In fact, our ability to analyze interdicted samples and produce an extensive list of precise materials characteristics far exceeds our ability to interpret the results. Therefore, as we seek to develop the extensive databases necessary for nuclear forensics, we must also develop the methods necessary to produce the necessary inferences from comparison of our analytical results with these large, multidimensional sets of data. In the work reported here, we used a large, multidimensional dataset of results from quality control analyses of uranium ore concentrate (UOC, sometimes called 'yellowcake'). We have found that traditional multidimensional techniques, such as principal components analysis (PCA), are especially useful for understanding such datasets and drawing relevant conclusions. In particular, we have developed an iterative partial least squares-discriminant analysis (PLS-DA) procedure that has proven especially adept at identifying the production location of unknown UOC samples. By removing classes which fell far outside the initial decision boundary, and then rebuilding the PLS-DA model, we have consistently produced better and more definitive attributions than with a single pass classification approach. Performance of the iterative PLS-DA method compared favorably to that of classification and regression tree (CART) and k nearest neighbor (KNN) algorithms, with the best combination of accuracy and robustness, as tested by classifying samples measured independently in our laboratories against the vendor QC based reference set.
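
    The remove-and-rebuild loop described in this abstract can be sketched independently of PLS-DA. The example below swaps in a much simpler stand-in model, nearest centroid on features scaled by the pooled standard deviation of the classes still under consideration, so that rebuilding after a class is removed actually changes the model. The class names, data, and the keep_ratio threshold are hypothetical assumptions, not the authors' procedure.

```python
import statistics


def fit(train):
    """train: {label: [feature vectors]}. Returns per-feature scales and class centroids."""
    rows = [x for xs in train.values() for x in xs]
    n_feat = len(rows[0])
    # Pooled standard deviation per feature; fall back to 1.0 for constant features.
    scale = [statistics.pstdev([r[j] for r in rows]) or 1.0 for j in range(n_feat)]
    centroids = {
        label: [statistics.fmean([x[j] for x in xs]) for j in range(n_feat)]
        for label, xs in train.items()
    }
    return scale, centroids


def distances(sample, scale, centroids):
    """Scaled Euclidean distance from the sample to each class centroid."""
    return {
        label: sum(((s - c) / w) ** 2 for s, c, w in zip(sample, centroid, scale)) ** 0.5
        for label, centroid in centroids.items()
    }


def iterative_attribution(sample, train, keep_ratio=5.0, max_rounds=10):
    """Drop classes far from the sample, refit on the remainder, repeat until stable."""
    remaining = dict(train)
    for _ in range(max_rounds):
        scale, centroids = fit(remaining)
        dist = distances(sample, scale, centroids)
        best = min(dist.values())
        kept = {k: v for k, v in remaining.items() if dist[k] <= keep_ratio * best}
        if len(kept) == len(remaining):
            break
        remaining = kept
    return min(dist, key=dist.get), sorted(remaining)


# Hypothetical two-feature measurements for three production sites.
train = {
    "mine_A": [[1.0, 1.0], [1.2, 0.8]],
    "mine_B": [[2.0, 2.0], [2.2, 1.8]],
    "mine_C": [[50.0, 40.0], [52.0, 42.0]],
}
label, remaining = iterative_attribution([1.3, 1.1], train)
print(label, remaining)
```

    In the first round the distant class dominates the feature scaling; once it is removed, the rescaled model separates the two nearby classes on a much finer scale, which is the intuition behind rebuilding rather than classifying in a single pass.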

  8. [Research on fast classification based on LIBS technology and principle component analyses].

    PubMed

    Yu, Qi; Ma, Xiao-Hong; Wang, Rui; Zhao, Hua-Feng

    2014-11-01

    Laser-induced breakdown spectroscopy (LIBS) and principal component analysis (PCA) were combined to classify aluminum alloys. Classification experiments were carried out on thirteen standard aluminum alloy samples belonging to 4 types, and the results suggest that the LIBS-PCA method can be used for fast aluminum alloy classification. PCA was applied to the LIBS spectral data: the three principal components that contribute the most were identified, the principal component scores of the spectra were calculated, and the scores were plotted in three-dimensional coordinates. The spectrum sample points clustered clearly according to the type of aluminum alloy they belong to. This result established the three principal components and a preliminary zoning of aluminum alloy types. To verify its accuracy, the same experiments were repeated on 20 different aluminum alloy samples. The spectrum sample points all fell within the zone of their corresponding aluminum alloy type, which confirmed the earlier zoning based on the standard samples. On this basis, aluminum alloys of unknown type can be identified. The experimental results showed that the accuracy of the LIBS-based principal component analysis method is more than 97.14%, and that it can classify the different types effectively. Compared with commonly used chemical methods, LIBS can analyze a sample in situ and quickly with little sample preparation; combining LIBS and PCA in areas such as quality testing and on-line industrial control can therefore save considerable time and cost and greatly improve detection efficiency.
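
    To illustrate how principal component scores can separate sample types, the sketch below extracts only the first principal component by power iteration on the covariance matrix and projects each spectrum onto it. The study itself used three components; the two-channel intensities and the function name here are hypothetical.

```python
def first_principal_component(rows, iterations=200):
    """Return (unit direction, scores) of the first principal component.

    Uses power iteration on the sample covariance matrix; assumes the data
    are not constant, so the covariance matrix is non-zero.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical intensities at two emission lines: three spectra each of two alloy types.
spectra = [
    [10.0, 1.0], [11.0, 1.2], [9.5, 0.9],   # type A
    [2.0, 8.0], [2.5, 9.0], [1.8, 7.5],     # type B
]
direction, scores = first_principal_component(spectra)
print([round(s, 2) for s in scores])
```

    Spectra of the same type receive scores of the same sign, which is the one-dimensional analogue of the clustering in score space that the abstract describes.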

  9. A fuzzy classifier system for process control

    NASA Technical Reports Server (NTRS)

    Karr, C. L.; Phillips, J. C.

    1994-01-01

    A fuzzy classifier system that discovers rules for controlling a mathematical model of a pH titration system was developed by researchers at the U.S. Bureau of Mines (USBM). Fuzzy classifier systems successfully combine the strengths of learning classifier systems and fuzzy logic controllers. Learning classifier systems resemble familiar production rule-based systems, but they represent their IF-THEN rules by strings of characters rather than in the traditional linguistic terms. Fuzzy logic is a tool that allows for the incorporation of abstract concepts into rule based-systems, thereby allowing the rules to resemble the familiar 'rules-of-thumb' commonly used by humans when solving difficult process control and reasoning problems. Like learning classifier systems, fuzzy classifier systems employ a genetic algorithm to explore and sample new rules for manipulating the problem environment. Like fuzzy logic controllers, fuzzy classifier systems encapsulate knowledge in the form of production rules. The results presented in this paper demonstrate the ability of fuzzy classifier systems to generate a fuzzy logic-based process control system.

  10. AVNM: A Voting based Novel Mathematical Rule for Image Classification.

    PubMed

    Vidyarthi, Ankit; Mittal, Namita

    2016-12-01

    In machine learning, the accuracy of the system depends upon classification result. Classification accuracy plays an imperative role in various domains. Non-parametric classifier like K-Nearest Neighbor (KNN) is the most widely used classifier for pattern analysis. Besides its easiness, simplicity and effectiveness characteristics, the main problem associated with KNN classifier is the selection of a number of nearest neighbors i.e. "k" for computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm, which gives perfect accuracy in terms of low misclassification error rate. Motivated by the prescribed problem, a new sample space reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is also non-parametric in nature like KNN. AVNM uses the weighted voting mechanism with sample space reduction to learn and examine the predicted class label for unidentified sample. AVNM is free from any initial selection of predefined variable and neighbor selection as found in KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments are made on 10 standard datasets taken from UCI database and one manually created dataset. The experimental result shows that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimentation results based on confusion matrix accuracy parameter proves higher accuracy value with AVNM rule. The proposed AVNM rule is based on sample space reduction mechanism for identification of an optimal number of nearest neighbor selections. AVNM results in better classification accuracy and minimum error rate as compared with the state-of-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbor selection and improves classification rate for UCI dataset and manually created dataset. 
Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
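
    The dependence of KNN on the choice of "k", which motivates the AVNM rule, can be seen in a few lines. The sketch below implements plain KNN with optional inverse-distance weighted voting; it is not the AVNM rule itself, whose details are not given in the abstract, and the training points are hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3, weighted=False):
    """train: list of (features, label). Vote among the k nearest neighbours.

    With weighted=True each neighbour votes with weight 1/distance.
    """
    nearest = sorted((math.dist(x, query), label) for x, label in train)[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical training set: the query sits next to class "b" but class "a" has more points.
train = [
    ((0.0, 0.0), "a"), ((0.0, 2.0), "a"), ((2.0, 0.0), "a"),
    ((2.1, 2.1), "b"), ((3.0, 3.0), "b"),
]
query = (2.0, 2.0)
for k in (1, 3, 5):
    print(k, knn_predict(train, query, k=k), knn_predict(train, query, k=k, weighted=True))
```

    With unweighted voting the predicted label flips once k grows large enough to include the more numerous distant class, while distance weighting keeps the nearby class ahead in this example.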

  11. Littoral Combat Ship: Knowledge of Survivability and Lethality Capabilities Needed Prior to Making Major Funding Decisions

    DTIC Science & Technology

    2015-12-01

    USS Port Royal hit a coral reef in order to provide an independent review of the damage the ship sustained. Our classified report discussed...explosion. Underwater explosions create a shock wave and a highly compressed gas bubble that expands and contracts. This can cause a type of vertical or...conditions also remains unknown. Due to the dynamic nature of waves, the Navy cannot rely on modeling and simulation alone to provide an accurate

  12. Petrology of unshocked crystalline rocks and shock effects in lunar rocks and minerals

    USGS Publications Warehouse

    Chao, E.C.T.; James, O.B.; Minkin, J.A.; Boreman, J.A.; Jackson, E.D.; Raleigh, C.B.

    1970-01-01

    On the basis of rock modes, textures, and mineralogy, unshocked crystalline rocks are classified into a dominant ilmenite-rich suite (subdivided into intersertal, ophitic, and hornfels types) and a subordinate feldspar-rich suite (subdivided into poikilitic and granular types). Weakly to moderately shocked rocks show high strain-rate deformation and solid-state transformation of minerals to glasses; intensely shocked rocks are converted to rock glasses. Data on an unknown calcium-bearing iron metasilicate are presented.

  13. Self-organized network with a supervised training and its comparison with FALVQ in artificial odor recognition system

    NASA Astrophysics Data System (ADS)

    Kusumoputro, Benyamin; Rostiviani, Linda; Saptawijaya, Ari

    2000-07-01

    An artificial odor recognition system was developed to mimic the human sensory test in the cosmetics, perfume and beverage industries. The developed system, however, lacks the ability to recognize unknown types of odor. To improve the system's capability, a hybrid neural system with a supervised learning paradigm is developed and used as a pattern classifier. In this paper, the performance of the hybrid neural system is investigated, together with that of the FALVQ neural system.

  14. Optimal number of features as a function of sample size for various classification rules.

    PubMed

    Hua, Jianping; Xiong, Zixiang; Lowey, James; Suh, Edward; Dougherty, Edward R

    2005-04-15

    Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as resource for those working with small-sample classification. 
For the companion website, please visit http://public.tgen.org/tamu/ofs/. Contact: e-dougherty@ee.tamu.edu.

  15. Impaired glucose metabolism and type 2 diabetes in apparently healthy senior citizens.

    PubMed

    Medina Escobar, Pedro; Moser, Michel; Risch, Lorenz; Risch, Martin; Nydegger, Urs Ernst; Stanga, Zeno

    2015-01-01

    To estimate the prevalence of unknown impaired glucose metabolism, also referred to as prediabetes (PreD), and unknown type 2 diabetes mellitus (T2DM) among subjectively healthy Swiss senior citizens. The fasting plasma glucose (FPG) and glycated haemoglobin A1c (HbA1c) levels were used for screening. A total of 1 362 subjects were included (613 men and 749 women; age range 60-99 years). Subjects with known T2DM were excluded. The FPG was processed immediately for analysis under standardised preanalytical conditions in a cross-sectional cohort study; plasma glucose levels were measured by means of the hexokinase procedure, and HbA1c was measured chromatographically and classified using the current American Diabetes Association (ADA) criteria. The crude prevalence of individuals unaware of having prediabetic FPG or HbA1c levels was 64.5% (n = 878). Analogously, unknown T2DM was found in 8.4% (n = 114). On the basis of HbA1c criteria alone, significantly more subjects with unknown fasting glucose impairment and laboratory T2DM could be identified than with the FPG. The prevalence of PreD as well as of T2DM increased with age. The mean HOMA indices (homeostasis model assessment) for the different age groups, between 2.12 and 2.59, are consistent with clinically hidden disease and are in agreement with the largely orderly Body Mass Indices found in the normal range. Laboratory evidence of impaired glucose metabolism and, to a lesser extent, unknown T2DM, has a high prevalence among subjectively healthy older Swiss individuals. Laboratory identification of people with unknown out-of-range glucose values and overt diabetic hyperglycaemia might improve the prognosis by delaying the emergence of overt disease.
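
    The abstract classifies subjects by ADA criteria without listing the thresholds. The sketch below assumes the widely cited ADA cut-offs (HbA1c 5.7% and 6.5%; fasting plasma glucose 100 and 126 mg/dL); these values and the function names are assumptions not stated in the record. The prevalence figures are then re-derived from the reported counts.

```python
def classify_hba1c(percent):
    """Category by HbA1c (%), using the assumed ADA cut-offs of 5.7 and 6.5."""
    if percent >= 6.5:
        return "diabetes"
    if percent >= 5.7:
        return "prediabetes"
    return "normal"


def classify_fpg(mg_dl):
    """Category by fasting plasma glucose (mg/dL), using the assumed cut-offs of 100 and 126."""
    if mg_dl >= 126:
        return "diabetes"
    if mg_dl >= 100:
        return "prediabetes"
    return "normal"


# Counts reported in the abstract.
subjects = 613 + 749
prediabetes_prevalence = round(100.0 * 878 / subjects, 1)
diabetes_prevalence = round(100.0 * 114 / subjects, 1)
print(subjects, prediabetes_prevalence, diabetes_prevalence)
```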

  16. Characterization of distinct classes of differential gene expression in osteoblast cultures from non-syndromic craniosynostosis bone.

    PubMed

    Rojas-Peña, Monica L; Olivares-Navarrete, Rene; Hyzy, Sharon; Arafat, Dalia; Schwartz, Zvi; Boyan, Barbara D; Williams, Joseph; Gibson, Greg

    2014-01-01

    Craniosynostosis, the premature fusion of one or more skull sutures, occurs in approximately 1 in 2500 infants, with the majority of cases non-syndromic and of unknown etiology. Two common reasons proposed for premature suture fusion are abnormal compression forces on the skull and rare genetic abnormalities. Our goal was to evaluate whether different sub-classes of disease can be identified based on total gene expression profiles. RNA-Seq data were obtained from 31 human osteoblast cultures derived from bone biopsy samples collected between 2009 and 2011, representing 23 craniosynostosis fusions and 8 normal cranial bones or long bones. No differentiation between regions of the skull was detected, but variance component analysis of gene expression patterns nevertheless supports transcriptome-based classification of craniosynostosis. Cluster analysis showed 4 distinct groups of samples; 1 predominantly normal and 3 craniosynostosis subtypes. Similar constellations of sub-types were also observed upon re-analysis of a similar dataset of 199 calvarial osteoblast cultures. Annotation of gene function of differentially expressed transcripts strongly implicates physiological differences with respect to cell cycle and cell death, stromal cell differentiation, extracellular matrix (ECM) components, and ribosomal activity. Based on these results, we propose non-syndromic craniosynostosis cases can be classified by differences in their gene expression patterns and that these may provide targets for future clinical intervention.

  17. Characterization of Distinct Classes of Differential Gene Expression in Osteoblast Cultures from Non-Syndromic Craniosynostosis Bone

    PubMed Central

    Rojas-Peña, Monica L.; Olivares-Navarrete, Rene; Hyzy, Sharon; Arafat, Dalia; Schwartz, Zvi; Boyan, Barbara D.; Williams, Joseph; Gibson, Greg

    2014-01-01

    Craniosynostosis, the premature fusion of one or more skull sutures, occurs in approximately 1 in 2500 infants, with the majority of cases non-syndromic and of unknown etiology. Two common reasons proposed for premature suture fusion are abnormal compression forces on the skull and rare genetic abnormalities. Our goal was to evaluate whether different sub-classes of disease can be identified based on total gene expression profiles. RNA-Seq data were obtained from 31 human osteoblast cultures derived from bone biopsy samples collected between 2009 and 2011, representing 23 craniosynostosis fusions and 8 normal cranial bones or long bones. No differentiation between regions of the skull was detected, but variance component analysis of gene expression patterns nevertheless supports transcriptome-based classification of craniosynostosis. Cluster analysis showed 4 distinct groups of samples; 1 predominantly normal and 3 craniosynostosis subtypes. Similar constellations of sub-types were also observed upon re-analysis of a similar dataset of 199 calvarial osteoblast cultures. Annotation of gene function of differentially expressed transcripts strongly implicates physiological differences with respect to cell cycle and cell death, stromal cell differentiation, extracellular matrix (ECM) components, and ribosomal activity. Based on these results, we propose non-syndromic craniosynostosis cases can be classified by differences in their gene expression patterns and that these may provide targets for future clinical intervention. PMID:25184005

  18. Combined data mining/NIR spectroscopy for purity assessment of lime juice

    NASA Astrophysics Data System (ADS)

    Shafiee, Sahameh; Minaei, Saeid

    2018-06-01

    This paper reports the data mining study on the NIR spectrum of lime juice samples to determine their purity (natural or synthetic). NIR spectra for 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using PCA analysis. Different data mining techniques for feature selection (Genetic Algorithm (GA)) and classification (including the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree) were employed. Based on the results, SVM proved to be the most accurate classifier as it achieved the highest accuracy (97%) using the raw spectrum information. The classifier accuracy dropped to 93% when selected feature vector by GA search method was applied as classifier input. It can be concluded that some relevant features which produce good performance with the SVM classifier are removed by feature selection. Also, reduced spectra using PCA do not show acceptable performance (total accuracy of 66% by RBFNN), which indicates that dimensional reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of data mining combination with near-infrared spectroscopy for monitoring lime juice quality in terms of natural or synthetic nature.

  19. Fuzzy Nonlinear Proximal Support Vector Machine for Land Extraction Based on Remote Sensing Image

    PubMed Central

    Zhong, Xiaomei; Li, Jianping; Dou, Huacheng; Deng, Shijun; Wang, Guofei; Jiang, Yu; Wang, Yongjie; Zhou, Zebing; Wang, Li; Yan, Fei

    2013-01-01

    Currently, remote sensing technologies are widely employed in the dynamic monitoring of land. This paper presents an algorithm named fuzzy nonlinear proximal support vector machine (FNPSVM) based on ETM+ remote sensing images. The algorithm is applied to extract various land types in the city of Da’an in northern China. Two multi-category strategies for this algorithm, namely “one-against-one” and “one-against-rest”, are described in detail and then compared. A fuzzy membership function is presented to reduce the effects of noise or outliers in the data samples. The approaches to feature extraction, feature selection, and several key parameter settings are also given. Numerous experiments were carried out to evaluate its performance, including accuracy (overall accuracy and kappa coefficient), stability, training speed, and classification speed. The FNPSVM classifier was compared with three other classifiers, namely the maximum likelihood classifier (MLC), back propagation neural network (BPN), and the proximal support vector machine (PSVM), under different training conditions. The impacts of the selection of training samples, testing samples and features on the four classifiers were also evaluated in these experiments. PMID:23936016
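
    The two multi-category strategies differ in how a problem with several classes is decomposed into binary problems. The sketch below only enumerates the binary problems each strategy creates; the land-type names are hypothetical and no classifier is trained.

```python
from itertools import combinations


def one_against_rest(classes):
    """One binary problem per class: that class versus all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def one_against_one(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


# Hypothetical land types, for illustration only.
land_types = ["farmland", "water", "grassland", "built-up"]
print(len(one_against_rest(land_types)), len(one_against_one(land_types)))
```

    For k classes, one-against-rest needs k binary classifiers trained on all samples, whereas one-against-one needs k(k-1)/2 classifiers that each see only two classes, which is one reason training speed is compared between the strategies.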

  20. Generative Models for Similarity-based Classification

    DTIC Science & Technology

    2007-01-01

    NC), local nearest centroid (local NC), k-nearest neighbors (kNN), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM-KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM-KNN is a hybrid local and global classifier developed

  1. Students' Conscious Unknowns about Artefacts and Natural Objects

    ERIC Educational Resources Information Center

    Vaz-Rebelo, Piedade; Fernandes, Paula; Morgado, Julia; Monteiro, António; Otero, José

    2016-01-01

    This study attempts to characterise what 7th- and 12th-grade students believe they do not know about artefacts and natural objects, as well as the dependence of what is unknown on a knowledge of these objects. The students were asked to make explicit through questioning what they did not know about a sample of objects. The unknowns generated were…

  2. Walking Objectively Measured: Classifying Accelerometer Data with GPS and Travel Diaries

    PubMed Central

    Kang, Bumjoon; Moudon, Anne V.; Hurvitz, Philip M.; Reichley, Lucas; Saelens, Brian E.

    2013-01-01

    Purpose: This study developed and tested an algorithm to classify accelerometer data as walking or non-walking using either GPS or travel diary data within a large sample of adults under free-living conditions. Methods: Participants wore an accelerometer and a GPS unit, and concurrently completed a travel diary for 7 consecutive days. Physical activity (PA) bouts were identified using accelerometry count sequences. PA bouts were then classified as walking or non-walking based on a decision-tree algorithm consisting of 7 classification scenarios. Algorithm reliability was examined relative to two independent analysts’ classification of a 100-bout verification sample. The algorithm was then applied to the entire set of PA bouts. Results: The 706 participants (mean age 51 years, 62% female, 80% non-Hispanic white, 70% college graduate or higher) yielded 4,702 person-days of data and had a total of 13,971 PA bouts. The algorithm showed a mean agreement of 95% with the independent analysts. It classified physical activity into 8,170 (58.5%) walking bouts and 5,337 (38.2%) non-walking bouts; 464 (3.3%) bouts were not classified for lack of GPS and diary data. Nearly 70% of the walking bouts and 68% of the non-walking bouts were classified using only the objective accelerometer and GPS data. Travel diary data helped classify 30% of all bouts with no GPS data. The mean duration of PA bouts classified as walking was 15.2 min (SD=12.9). On average, participants had 1.7 walking bouts and 25.4 total walking minutes per day. Conclusions: GPS and travel diary information can be helpful in classifying most accelerometer-derived PA bouts into walking or non-walking behavior. PMID:23439414

  3. Multivariate analysis of remote LIBS spectra using partial least squares, principal component analysis, and related techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clegg, Samuel M; Barefield, James E; Wiens, Roger C

    2008-01-01

    Quantitative analysis with LIBS traditionally employs calibration curves that are complicated by the chemical matrix effects. These chemical matrix effects influence the LIBS plasma and the ratio of elemental composition to elemental emission line intensity. Consequently, LIBS calibration typically requires a priori knowledge of the unknown, in order for a series of calibration standards similar to the unknown to be employed. In this paper, three new Multivariate Analysis (MVA) techniques are employed to analyze the LIBS spectra of 18 disparate igneous and highly-metamorphosed rock samples. Partial Least Squares (PLS) analysis is used to generate a calibration model from which unknown samples can be analyzed. Principal Components Analysis (PCA) and Soft Independent Modeling of Class Analogy (SIMCA) are employed to generate a model and predict the rock type of the samples. These MVA techniques appear to exploit the matrix effects associated with the chemistries of these 18 samples.

  4. Discrimination of almonds (Prunus dulcis) geographical origin by minerals and fatty acids profiling.

    PubMed

    Amorello, Diana; Orecchio, Santino; Pace, Andrea; Barreca, Salvatore

    2016-09-01

    Twenty-one almond samples from three different geographical origins (Sicily, Spain and California) were investigated by determining their mineral and fatty acid compositions. The data were used to discriminate almond origin chemometrically by linear discriminant analysis. With respect to previous PCA profiling studies, this work provides a simpler analytical protocol for the identification of almond geographical origin. Classification using mineral content data only was correct for 77% of the samples, while, using fatty acid profiles, the percentage of samples correctly classified reached 82%. The coupling of mineral contents and fatty acid profiles led to an increased efficiency of the classification, with 87% of samples correctly classified.

  5. Cancer classification through filtering progressive transductive support vector machine based on gene expression data

    NASA Astrophysics Data System (ADS)

    Lu, Xinguo; Chen, Dan

    2017-08-01

    Traditional supervised classifiers work only with labeled data and neglect the large amount of data that lacks sufficient follow-up information. Consequently, the small sample size limits the design of appropriate classifiers. In this paper, a transductive learning method is presented that combines a filtering strategy within the transductive framework with a progressive labeling strategy. The progressive labeling strategy does not need to consider the distribution of labeled samples to evaluate the distribution of unlabeled samples, and can effectively solve the problem of estimating the proportion of positive and negative samples in the working set. Our experimental results demonstrate that the proposed technique has great potential in cancer prediction based on gene expression.

  6. Machine learning-based patient specific prompt-gamma dose monitoring in proton therapy

    NASA Astrophysics Data System (ADS)

    Gueth, P.; Dauvergne, D.; Freud, N.; Létang, J. M.; Ray, C.; Testa, E.; Sarrut, D.

    2013-07-01

    Online dose monitoring in proton therapy is currently being investigated with prompt-gamma (PG) devices. PG emission was shown to be correlated with dose deposition. This relationship is mostly unknown under real conditions. We propose a machine learning approach based on simulations to create optimized treatment-specific classifiers that detect discrepancies between planned and delivered dose. Simulations were performed with the Monte-Carlo platform Gate/Geant4 for a spot-scanning proton therapy treatment and a PG camera prototype currently under investigation. The method first builds a learning set of perturbed situations corresponding to a range of patient translation. This set is then used to train a combined classifier using distal falloff and registered correlation measures. Classifier performances were evaluated using receiver operating characteristic curves and maximum associated specificity and sensitivity. A leave-one-out study showed that it is possible to detect discrepancies of 5 mm with specificity and sensitivity of 85% whereas using only distal falloff decreases the sensitivity down to 77% on the same data set. The proposed method could help to evaluate performance and to optimize the design of PG monitoring devices. It is generic: other learning sets of deviations, other measures and other types of classifiers could be studied to potentially reach better performance. At the moment, the main limitation lies in the computation time needed to perform the simulations.
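
    The sensitivity and specificity figures quoted in this abstract are standard confusion-matrix ratios. The following is a minimal sketch of how they are computed from binary labels; the label vectors are made up for illustration and are not the study's data, and the function assumes both classes occur in the ground truth.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    1 means "discrepancy present", 0 means "no discrepancy".
    Assumes y_true contains at least one 1 and at least one 0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative labels only.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
guess = [1, 1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(truth, guess))
```

    Sweeping a decision threshold and recording these two values at each setting is what traces out a receiver operating characteristic curve.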

  7. An Investigation to Improve Classifier Accuracy for Myo Collected Data

    DTIC Science & Technology

    2017-02-01

    A naïve Bayes classifier trained with 1,360 samples from 17 volunteers performs at...movement data from 17 volunteers. Each volunteer performed 8 gestures (Freeze, Rally Point, Hurry Up, Down, Come, Stop, Line Abreast Formation, and Vehicle...line chart was plotted for each gesture’s feature (e.g., Pitch, xAcc) per user. All 10 recorded samples of a particular gesture for a single volunteer

  8. Research of mine water source identification based on LIF technology

    NASA Astrophysics Data System (ADS)

    Zhou, Mengran; Yan, Pengcheng

    2016-09-01

    Because traditional chemical methods of mine water source identification take a long time, this paper puts forward a rapid source identification system for mine water inrush based on laser-induced fluorescence (LIF) technology. The basic principle of LIF technology is analyzed in detail. The hardware composition of the LIF system is analyzed and the related modules are selected. Fluorescence spectra of water samples are obtained through fluorescence experiments on coal mine water samples in the LIF system. Traditional water source identification relies mainly on the ion concentrations representative of the water, but it is hard to analyze the ion concentration of the water from the fluorescence spectra. This paper proposes a simple and practical method for rapid identification of water by fluorescence spectrum, which measures the spatial distance between unknown water samples and standard samples; the category of the unknown water sample is then obtained by cluster analysis. Water source identification for unknown samples verified the reliability of the LIF system and addresses the current lack of real-time, online monitoring of water inrush in coal mines, which is of great significance for coal mine production safety.
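
    The idea of assigning an unknown sample to the closest standard sample can be sketched as follows. This is only an illustration under stated assumptions: Euclidean distance is assumed (the abstract does not name the metric), the spectra are reduced to three made-up intensity values, and the source labels are hypothetical.

```python
import math


def nearest_standard(unknown, standards):
    """Return the label of the standard spectrum closest to `unknown`.

    `standards` maps a label to a spectrum (list of intensities) of the
    same length as `unknown`. Euclidean distance is an assumption here.
    """
    return min(standards, key=lambda label: math.dist(unknown, standards[label]))


# Hypothetical standard spectra, three intensity values each.
standards = {
    "sandstone water": [0.10, 0.80, 0.30],
    "limestone water": [0.70, 0.20, 0.50],
    "goaf water": [0.40, 0.40, 0.90],
}

unknown_sample = [0.65, 0.25, 0.45]
print(nearest_standard(unknown_sample, standards))
```

    A real spectrum would have many more channels and would likely need normalisation before distances are compared.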

  9. Characterization of pathogenic SORL1 genetic variants for association with Alzheimer’s disease: a clinical interpretation strategy

    PubMed Central

    Holstege, Henne; van der Lee, Sven J; Hulsman, Marc; Wong, Tsz Hang; van Rooij, Jeroen GJ; Weiss, Marjan; Louwersheimer, Eva; Wolters, Frank J; Amin, Najaf; Uitterlinden, André G; Hofman, Albert; Ikram, M Arfan; van Swieten, John C; Meijers-Heijboer, Hanne; van der Flier, Wiesje M; Reinders, Marcel JT; van Duijn, Cornelia M; Scheltens, Philip

    2017-01-01

    Accumulating evidence suggests that genetic variants in the SORL1 gene are associated with Alzheimer disease (AD), but a strategy to identify which variants are pathogenic is lacking. In a discovery sample of 115 SORL1 variants detected in 1908 Dutch AD cases and controls, we identified the variant characteristics associated with SORL1 variant pathogenicity. Findings were replicated in an independent sample of 103 SORL1 variants detected in 3193 AD cases and controls. In a combined sample of the discovery and replication samples, comprising 181 unique SORL1 variants, we developed a strategy to classify SORL1 variants into five subtypes ranging from pathogenic to benign. We tested this pathogenicity screen in SORL1 variants reported in two independent published studies. SORL1 variant pathogenicity is defined by the Combined Annotation Dependent Depletion (CADD) score and the minor allele frequency (MAF) reported by the Exome Aggregation Consortium (ExAC) database. Variants predicted strongly damaging (CADD score >30), which are extremely rare (ExAC-MAF <1 × 10−5) increased AD risk by 12-fold (95% CI 4.2–34.3; P=5 × 10−9). Protein-truncating SORL1 mutations were all unknown to ExAC and occurred exclusively in AD cases. More common SORL1 variants (ExAC-MAF≥1 × 10−5) were not associated with increased AD risk, even when predicted strongly damaging. Findings were independent of gender and the APOE-ε4 allele. High-risk SORL1 variants were observed in a substantial proportion of the AD cases analyzed (2%). Based on their effect size, we propose to consider high-risk SORL1 variants next to variants in APOE, PSEN1, PSEN2 and APP for personalized risk assessments in clinical practice. PMID:28537274
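
    The two thresholds quoted in this abstract (CADD score above 30 and ExAC minor allele frequency below 1 × 10−5) can be written as a simple screening rule. The sketch below covers only that high-risk criterion, not the paper's full five-subtype scheme; the function name is hypothetical, and treating a variant absent from ExAC as rarer than the threshold is an assumption made for illustration.

```python
CADD_THRESHOLD = 30.0   # "strongly damaging" cut-off quoted in the abstract
MAF_THRESHOLD = 1e-5    # "extremely rare" cut-off quoted in the abstract


def is_high_risk(cadd_score, exac_maf):
    """Flag a variant as high-risk under the two quoted thresholds.

    `exac_maf` may be None for a variant not found in ExAC; this sketch
    assumes such a variant counts as extremely rare.
    """
    is_rare = exac_maf is None or exac_maf < MAF_THRESHOLD
    return cadd_score > CADD_THRESHOLD and is_rare


print(is_high_risk(34.0, 2e-6))   # damaging and extremely rare
print(is_high_risk(34.0, 3e-4))   # damaging but more common
print(is_high_risk(12.0, None))   # rare but not predicted strongly damaging
```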

  10. Probabilistic images (PBIS): A concise image representation technique for multiple parameters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, L.C.; Yeh, S.H.; Chen, Z.

    1984-01-01

    Based on m parametric images (PIs) derived from a dynamic series (DS), each pixel of DS is regarded as an m-dimensional vector. Given one set of normal samples (pixels) N and another of abnormal samples A, probability density functions (pdfs) of both sets are estimated. Any unknown sample is classified into N or A by calculating the probability of its being in the abnormal set using the Bayes' theorem. Instead of estimating the multivariate pdfs, a distance ratio transformation is introduced to map the m-dimensional sample space to one dimensional Euclidean space. Consequently, the image that localizes the regional abnormalities is characterized by the probability of being abnormal. This leads to the new representation scheme of PBIs. Tc-99m HIDA study for detecting intrahepatic lithiasis (IL) was chosen as an example of constructing PBI from 3 parameters derived from DS and such a PBI was compared with those 3 PIs, namely, retention ratio image (RRI), peak time image (TNMAX) and excretion mean transit time image (EMTT). 32 normal subjects and 20 patients with proved IL were collected and analyzed. The resultant sensitivity and specificity of PBI were 97% and 98% respectively. They were superior to those of any of the 3 PIs: RRI (94/97), TMAX (86/88) and EMTT (94/97). Furthermore, the contrast of PBI was much better than that of any other image. This new image formation technique, based on multiple parameters, shows the functional abnormalities in a structural way. Its good contrast makes the interpretation easy. This technique is powerful compared to the existing parametric image method.
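
    The per-pixel "probability of being abnormal" follows from Bayes' theorem once the two class densities are known. The sketch below assumes the sample has already been mapped to a single value and that each class density is Gaussian; both the Gaussian form and the parameter values are assumptions for illustration, and the paper's distance ratio transformation is not reproduced here.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def prob_abnormal(x, normal, abnormal, prior_abnormal=0.5):
    """Posterior P(abnormal | x) via Bayes' theorem.

    `normal` and `abnormal` are (mean, std) pairs of the class densities.
    """
    weighted_a = gaussian_pdf(x, *abnormal) * prior_abnormal
    weighted_n = gaussian_pdf(x, *normal) * (1.0 - prior_abnormal)
    return weighted_a / (weighted_a + weighted_n)


# Made-up class parameters: normal pixels centred at 0, abnormal at 2.
NORMAL = (0.0, 1.0)
ABNORMAL = (2.0, 1.0)

for value in (0.0, 1.0, 2.0):
    print(value, round(prob_abnormal(value, NORMAL, ABNORMAL), 3))
```

    Evaluating this posterior at every pixel and displaying the result as an intensity gives an image whose brightness is the probability of abnormality.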

  11. Distribution of Blastocystis subtypes isolated from humans from an urban community in Rio de Janeiro, Brazil.

    PubMed

    Valença Barbosa, Carolina; de Jesus Batista, Rosemary; Pereira Igreja, Ricardo; d'Avila Levy, Claudia Masini; Werneck de Macedo, Heloisa; Carneiro Santos, Helena Lúcia

    2017-10-25

    Blastocystis is a cosmopolitan protist parasite found in the human gastrointestinal tract and is highly prevalent in developing countries. Recent molecular studies have revealed extensive genetic diversity, which has been classified into different subtypes (STs) based on sequence analysis of small subunit ribosomal RNA gene. Blastocystis is one of the most common fecal parasites in Brazil, but the diversity of subtypes remains unknown in the country. This study aimed to determine the distribution of Blastocystis STs in an urban community in Duque de Caxias, Rio de Janeiro, Brazil. A total of 64 stool samples positive for Blastocystis in Pavlova's medium were subtyped by PCR and sequenced using primers targeting the small subunit rRNA gene, in addition to phylogenetic analysis and subtype-specific PCR using sequence-tagged-site (STS) primers. Endolimax nana (14%), Entamoeba complex (10.5%), Taenia sp. (0.6%), Trichuris trichiura (1.3%) and Enterobius vermicularis (1.3%) were detected in Blastocystis-positive samples. Of the 64 samples tested by PCR/DNA sequencing, 55 were identified as ST1 (42%), ST3 (49%), ST2 (7%) and ST4 (2%), and the presence of mixed ST (ST1 + ST3) infection was detected in nine samples (14%). DNA sequencing and phylogenetic analysis of Brazilian Blastocystis isolates identified four different subtypes. To our knowledge, this study provided the first genetic characterization of Blastocystis subtypes in an urban area of Rio de Janeiro, Brazil. We also identified ST4 for the first time in Brazil. Further studies are necessary to determine the distribution of STs across human populations in Rio de Janeiro.

  12. Classifying a Smoker Scale in Adult Daily and Nondaily Smokers

    PubMed Central

    2014-01-01

    Introduction: Smoker identity, or the strength of beliefs about oneself as a smoker, is a robust marker of smoking behavior. However, many nondaily smokers do not identify as smokers, underestimating their risk for tobacco-related disease and resulting in missed intervention opportunities. Assessing underlying beliefs about characteristics used to classify smokers may help explain the discrepancy between smoking behavior and smoker identity. This study examines the factor structure, reliability, and validity of the Classifying a Smoker scale among a racially diverse sample of adult smokers. Methods: A cross-sectional survey was administered through an online panel survey service to 2,376 current smokers who were at least 25 years of age. The sample was stratified to obtain equal numbers of 3 racial/ethnic groups (African American, Latino, and White) across smoking level (nondaily and daily smoking). Results: The Classifying a Smoker scale displayed a single factor structure and excellent internal consistency (α = .91). Classifying a Smoker scores significantly increased at each level of smoking, F(3,2375) = 23.68, p < .0001. Those with higher scores had a stronger smoker identity, stronger dependence on cigarettes, greater health risk perceptions, more smoking friends, and were more likely to carry cigarettes. Classifying a Smoker scores explained unique variance in smoking variables above and beyond that explained by smoker identity. Conclusions: The present study supports the use of the Classifying a Smoker scale among diverse, experienced smokers. Stronger endorsement of characteristics used to classify a smoker (i.e., stricter criteria) was positively associated with heavier smoking and related characteristics. Prospective studies are needed to inform prevention and treatment efforts. PMID:24297807

  13. Nonparametric, Coupled, Bayesian, Dictionary, and Classifier Learning for Hyperspectral Classification.

    PubMed

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which, we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  14. Unresolved Galaxy Classifier for ESA/Gaia mission: Support Vector Machines approach

    NASA Astrophysics Data System (ADS)

    Bellas-Velidis, Ioannis; Kontizas, Mary; Dapergolas, Anastasios; Livanou, Evdokia; Kontizas, Evangelos; Karampelas, Antonios

    A software package Unresolved Galaxy Classifier (UGC) is being developed for the ground-based pipeline of ESA's Gaia mission. It aims to provide an automated taxonomic classification and specific parameters estimation analyzing Gaia BP/RP instrument low-dispersion spectra of unresolved galaxies. The UGC algorithm is based on a supervised learning technique, the Support Vector Machines (SVM). The software is implemented in Java as two separate modules. An offline learning module provides functions for SVM-models training. Once trained, the set of models can be repeatedly applied to unknown galaxy spectra by the pipeline's application module. A library of galaxy models synthetic spectra, simulated for the BP/RP instrument, is used to train and test the modules. Science tests show a very good classification performance of UGC and relatively good regression performance, except for some of the parameters. Possible approaches to improve the performance are discussed.

  15. Mapping forest vegetation with ERTS-1 MSS data and automatic data processing techniques

    NASA Technical Reports Server (NTRS)

    Messmore, J.; Copeland, G. E.; Levy, G. F.

    1975-01-01

    This study was undertaken with the intent of elucidating the forest mapping capabilities of ERTS-1 MSS data when analyzed with the aid of LARS' automatic data processing techniques. The site for this investigation was the Great Dismal Swamp, a 210,000 acre wilderness area located on the Middle Atlantic coastal plain. Due to inadequate ground truth information on the distribution of vegetation within the swamp, an unsupervised classification scheme was utilized. Initially pictureprints, resembling low resolution photographs, were generated in each of the four ERTS-1 channels. Data found within rectangular training fields was then clustered into 13 spectral groups and defined statistically. Using a maximum likelihood classification scheme, the unknown data points were subsequently classified into one of the designated training classes. Training field data was classified with a high degree of accuracy (greater than 95%), and progress is being made towards identifying the mapped spectral classes.

  16. Mapping forest vegetation with ERTS-1 MSS data and automatic data processing techniques

    NASA Technical Reports Server (NTRS)

    Messmore, J.; Copeland, G. E.; Levy, G. F.

    1975-01-01

    This study was undertaken with the intent of elucidating the forest mapping capabilities of ERTS-1 MSS data when analyzed with the aid of LARS' automatic data processing techniques. The site for this investigation was the Great Dismal Swamp, a 210,000 acre wilderness area located on the Middle Atlantic coastal plain. Due to inadequate ground truth information on the distribution of vegetation within the swamp, an unsupervised classification scheme was utilized. Initially pictureprints, resembling low resolution photographs, were generated in each of the four ERTS-1 channels. Data found within rectangular training fields was then clustered into 13 spectral groups and defined statistically. Using a maximum likelihood classification scheme, the unknown data points were subsequently classified into one of the designated training classes. Training field data was classified with a high degree of accuracy (greater than 95 percent), and progress is being made towards identifying the mapped spectral classes.

  17. Mining sequential patterns for protein fold recognition.

    PubMed

    Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I

    2008-02-01

    Protein data contain discriminative patterns that can be used in many beneficial applications if they are defined correctly. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. Protein classification in terms of fold recognition plays an important role in computational protein analysis, since it can contribute to the determination of the function of a protein whose structure is unknown. Specifically, one of the most efficient SPM algorithms, cSPADE, is employed for the analysis of protein sequence. A classifier uses the extracted sequential patterns to classify proteins in the appropriate fold category. For training and evaluating the proposed method we used the protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method exhibited an overall accuracy of 25% in a classification problem with 36 candidate categories. The classification performance reaches up to 56% when the five most probable protein folds are considered.
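
    The two accuracy figures in this abstract (25% for the single best guess, 56% when the five most probable folds are considered) are top-1 and top-5 accuracy. The sketch below shows how a top-k accuracy is computed from ranked predictions; the fold labels and rankings are made up for illustration.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label is among the first k guesses.

    `ranked_predictions[i]` lists candidate labels for sample i,
    ordered from most to least probable.
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Made-up rankings over three hypothetical fold labels.
ranked = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
    ["a", "c", "b"],
]
truth = ["a", "a", "b", "b"]

print(top_k_accuracy(ranked, truth, 1))
print(top_k_accuracy(ranked, truth, 2))
```

    For context, uniform random guessing among 36 categories would be correct about 1/36 ≈ 2.8% of the time.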

  18. Spontaneous tumours in captive African hedgehogs (Atelerix albiventris): a retrospective study.

    PubMed

    Raymond, J T; Garner, M M

    2001-01-01

    Forty tumours were diagnosed in 35 (53%) of 66 captive African hedgehogs documented at Northwest ZooPath (NZP) between 1994 and 1999. Three hedgehogs had more than one type of tumour and the remaining 32 had a single type. Of the 35 hedgehogs with tumours, 14 were female, 11 were male, and 10 were of unknown gender; 21 were from zoological parks and 14 were privately owned. Twenty of the hedgehogs with tumours were adult (>1 year old) with a median age of 3.5 years (range 2-5.5 years); 15, of unreported age, were classified as adult. Thirty-four (85%) of the 40 tumours were classified as malignant and six (15%) as benign. The integumentary, haemolymphatic, digestive and endocrine systems were common sites for tumours. The most common tumours were mammary gland adenocarcinoma, lymphosarcoma and oral squamous cell carcinoma. Copyright Harcourt Publishers Ltd.

  19. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer’s Disease Diagnosis

    NASA Astrophysics Data System (ADS)

    An, Le; Adeli, Ehsan; Liu, Mingxia; Zhang, Jun; Lee, Seong-Whan; Shen, Dinggang

    2017-03-01

    Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, the Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, with the presence of many redundant features, the most discriminative features are difficult to be identified in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach, as compared with the rivals.

  20. Adverse events following yellow fever preventive vaccination campaigns in eight African countries from 2007 to 2010.

    PubMed

    Breugelmans, J G; Lewis, R F; Agbenu, E; Veit, O; Jackson, D; Domingo, C; Böthe, M; Perea, W; Niedrig, M; Gessner, B D; Yactayo, S

    2013-04-03

    Serious, but rare adverse events following immunization (AEFI) have been reported with yellow fever (YF) 17D vaccine, including severe allergic reactions, YF vaccine-associated neurologic disease (YEL-AND) and YF vaccine-associated viscerotropic disease (YEL-AVD). The frequency with which YEL-AND and YEL-AVD occur in YF endemic countries is mostly unknown. From 2007 to 2010, eight African countries - Benin, Cameroon, Guinea, Liberia, Mali, Senegal, Sierra Leone, and Togo - implemented large-scale YF preventive vaccination campaigns. Each country established vaccine pharmacovigilance systems that included standard case definitions, procedures to collect and transport biological specimens, and National Expert Committees to review data and classify cases. Staff in all countries received training and laboratory capacity was expanded. In total, just over 38 million people were vaccinated against YF and 3116 AEFIs were reported of which 164 (5%) were classified as serious. Of these, 22 (13%) were classified as YF vaccine reactions, including 11 (50%) hypersensitivity reactions, six (27%) suspected YEL-AND, and five (23%) suspected YEL-AVD. The incidence per 100,000 vaccine doses administered was 8.2 for all reported AEFIs, 0.43 for any serious AEFI, 0.058 for YF vaccine related AEFIs, 0.029 for hypersensitivity reactions, 0.016 for YEL-AND, and 0.013 for YEL-AVD. Our findings were limited by operational challenges, including difficulties in obtaining recommended biological specimens leading to incomplete laboratory evaluation, unknown case ascertainment, and variable levels of staff training and experience. Despite limitations, active case-finding in the eight different countries did not find an incidence of YF vaccine associated AEFIs that was higher than previous reports. These data reinforce the safety profile of YF vaccine and support the continued use of attenuated YF vaccine during preventive mass vaccination campaigns in YF endemic areas.
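
    The incidence figures in this abstract can be checked with a one-line rate calculation. The sketch below assumes exactly 38,000,000 doses, a rounding of the abstract's "just over 38 million", so the results are approximate reproductions rather than the authors' exact computation.

```python
def incidence_per_100k(cases, doses):
    """Cases per 100,000 vaccine doses administered."""
    return cases / doses * 100_000


DOSES = 38_000_000  # assumption: "just over 38 million", rounded down

# Case counts as reported in the abstract.
reported = {
    "all AEFIs": 3116,
    "serious AEFIs": 164,
    "YF vaccine reactions": 22,
    "hypersensitivity": 11,
    "suspected YEL-AND": 6,
    "suspected YEL-AVD": 5,
}

for label, cases in reported.items():
    print(f"{label}: {incidence_per_100k(cases, DOSES):.3f} per 100,000 doses")
```

    Under this assumption the rates round to the values quoted in the abstract (8.2, 0.43, 0.058, 0.029, 0.016 and 0.013).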

  1. Publications - GMC 337 | Alaska Division of Geological & Geophysical

    Science.gov Websites

    (102.6'-1142.5') from the US Navy Sentinel Hill Core Test #1 Authors: Unknown Publication Date: Dec 2006 Reference Unknown, 2006, Palynology and micropaleontological evaluation of core samples (102.6'-1142.5

  2. Discriminant Analysis of Defective and Non-Defective Field Pea (Pisum sativum L.) into Broad Market Grades Based on Digital Image Features.

    PubMed

    McDonald, Linda S; Panozzo, Joseph F; Salisbury, Phillip A; Ford, Rebecca

    2016-01-01

    Field peas (Pisum sativum L.) are generally traded based on seed appearance, which subjectively defines broad market-grades. In this study, we developed an objective Linear Discriminant Analysis (LDA) model to classify market grades of field peas based on seed colour, shape and size traits extracted from digital images. Seeds were imaged in a high-throughput system consisting of a camera and laser positioned over a conveyor belt. Six colour intensity digital images were captured (under 405, 470, 530, 590, 660 and 850nm light) for each seed, and surface height was measured at each pixel by laser. Colour, shape and size traits were compiled across all seed in each sample to determine the median trait values. Defective and non-defective seed samples were used to calibrate and validate the model. Colour components were sufficient to correctly classify all non-defective seed samples into correct market grades. Defective samples required a combination of colour, shape and size traits to achieve 87% and 77% accuracy in market grade classification of calibration and validation sample-sets respectively. Following these results, we used the same colour, shape and size traits to develop an LDA model which correctly classified over 97% of all validation samples as defective or non-defective.

  3. Discriminant Analysis of Defective and Non-Defective Field Pea (Pisum sativum L.) into Broad Market Grades Based on Digital Image Features

    PubMed Central

    McDonald, Linda S.; Panozzo, Joseph F.; Salisbury, Phillip A.; Ford, Rebecca

    2016-01-01

    Field peas (Pisum sativum L.) are generally traded based on seed appearance, which subjectively defines broad market-grades. In this study, we developed an objective Linear Discriminant Analysis (LDA) model to classify market grades of field peas based on seed colour, shape and size traits extracted from digital images. Seeds were imaged in a high-throughput system consisting of a camera and laser positioned over a conveyor belt. Six colour intensity digital images were captured (under 405, 470, 530, 590, 660 and 850nm light) for each seed, and surface height was measured at each pixel by laser. Colour, shape and size traits were compiled across all seed in each sample to determine the median trait values. Defective and non-defective seed samples were used to calibrate and validate the model. Colour components were sufficient to correctly classify all non-defective seed samples into correct market grades. Defective samples required a combination of colour, shape and size traits to achieve 87% and 77% accuracy in market grade classification of calibration and validation sample-sets respectively. Following these results, we used the same colour, shape and size traits to develop an LDA model which correctly classified over 97% of all validation samples as defective or non-defective. PMID:27176469

  4. Interpretation of standard leaching test BS EN 12457-2: is your sample hazardous or inert?

    PubMed

    Zandi, Mohammad; Russell, Nigel V; Edyvean, Robert G J; Hand, Russell J; Ward, Philip

    2007-12-01

    A slag sample from a lead refiner has been obtained and given to two analytical laboratories to determine the release of trace elements from the sample according to BS EN 12457-2. Samples analysed by one laboratory passed waste acceptance criteria, leading it to be classified as an inert material; samples of the same material analysed by the other laboratory failed waste acceptance criteria and were classified as hazardous. It was found that the sample preparation procedure is the critical step in the leaching analysis and that the effects of particle size on leachability should be taken into account when using this standard. The purpose of this paper is to open a debate on designing a better defined standard leaching test and making current waste acceptance criteria more flexible.

  5. Extracting scene feature vectors through modeling, volume 3

    NASA Technical Reports Server (NTRS)

    Berry, J. K.; Smith, J. A.

    1976-01-01

    The remote estimation of the leaf area index of winter wheat at Finney County, Kansas was studied. The procedure developed consists of three activities: (1) field measurements; (2) model simulations; and (3) response classifications. The first activity is designed to identify model input parameters and develop a model evaluation data set. A stochastic plant canopy reflectance model is employed to simulate reflectance in the LANDSAT bands as a function of leaf area index for two phenological stages. An atmospheric model is used to translate these surface reflectances into simulated satellite radiance. A divergence classifier determines the relative similarity between model derived spectral responses and those of areas with unknown leaf area index. The unknown areas are assigned the index associated with the closest model response. This research demonstrated that the SRVC canopy reflectance model is appropriate for wheat scenes and that broad categories of leaf area index can be inferred from the procedure developed.

  6. A method for detecting fungal contaminants in wall cavities.

    PubMed

    Spurgeon, Joe C

    2003-01-01

    This article describes a practical method for detecting the presence of both fungal spores and culturable fungi in wall cavities. Culturable fungi were collected in 25 mm cassettes containing 0.8 µm mixed cellulose ester filters using aggressive sampling conditions. Both culturable fungi and fungal spores were collected in modified slotted-disk cassettes. The sample volume was 4 L. The filters were examined microscopically and dilution plated onto multiple culture media. Collecting airborne samples in filter cassettes was an effective method for assessing wall cavities for fungal contaminants, especially because this method allowed the sample to be analyzed by both microscopy and culture media. Assessment criteria were developed that allowed the sample results to be used to classify wall cavities as either uncontaminated or contaminated. As a criterion, wall cavities with concentrations of culturable fungi below the limit of detection (LOD) were classified as uncontaminated, whereas those cavities with detectable concentrations of culturable fungi were classified as contaminated. A total of 150 wall cavities was sampled as part of a field project. The concentrations of culturable fungi were below the LOD in 34% of the samples, whereas Aspergillus and/or Penicillium were the only fungal genera detected in 69% of the samples in which culturable fungi were detected. Spore counting resulted in the detection of Stachybotrys-like spores in 25% of the samples that were analyzed, whereas Stachybotrys chartarum colonies were only detected on 2% of malt extract agar plates and on 6% of corn meal agar plates.

  7. Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine

    PubMed Central

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud

    2018-01-01

    Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919

  8. Fatty acid profiles as a potential lipidomic biomarker of exposure to brevetoxin for endangered Florida manatees (Trichechus manatus latirostris).

    PubMed

    Wetzel, Dana L; Reynolds, John E; Sprinkel, Jay M; Schwacke, Lori; Mercurio, Philip; Rommel, Sentiel A

    2010-11-15

    Fatty acid signature analysis (FASA) is an important tool by which marine mammal scientists gain insight into foraging ecology. Fatty acid profiles (resulting from FASA) represent a potential biomarker to assess exposure to natural and anthropogenic stressors. Florida manatees are well studied, and an excellent necropsy program provides a basis against which to assess this budding tool. Results using samples from 54 manatees assigned to four cause-of-death categories indicated that those animals exposed to or that died due to brevetoxin exposure (red tide, or RT samples) demonstrate a distinctive hepatic fatty acid profile. Discriminant function analysis indicated that hepatic fatty acids could be used to classify RT versus non-RT liver samples with reasonable certainty. A discriminant function was derived based on 8 fatty acids which correctly classified 100% of samples from a training dataset (10 RT and 25 non-RT) and 85% of samples in a cross-validation dataset (5 RT and 13 non-RT). Of the latter dataset, all RT samples were correctly classified, but two of thirteen non-RT samples were incorrectly classified. However, the "incorrect" samples came from manatees that died due to other causes during documented red tide outbreaks; thus although the proximal cause of death was due to watercraft collisions, exposure to brevetoxin may have affected these individuals in ways that increased their vulnerability. This use of FASA could: a) provide an additional forensic tool to help scientists and managers to understand cause of death or debilitation due to exposure to red tide in manatees; b) serve as a model that could be applied to studies to improve assessments of cause of death in other marine mammals; and c) be used, as in humans, to help diagnose metabolic disorders or disease states in manatees and other species. Copyright © 2010 Elsevier B.V. All rights reserved.

  9. Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model

    PubMed Central

    Wang, Guofeng; Yang, Yinwei; Li, Zhimeng

    2014-01-01

    Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514

  10. Force sensor based tool condition monitoring using a heterogeneous ensemble learning model.

    PubMed

    Wang, Guofeng; Yang, Yinwei; Li, Zhimeng

    2014-11-14

    Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.

  11. Classification of biosensor time series using dynamic time warping: applications in screening cancer cells with characteristic biomarkers.

    PubMed

    Rai, Shesh N; Trainor, Patrick J; Khosravi, Farhad; Kloecker, Goetz; Panchapakesan, Balaji

    2016-01-01

    The development of biosensors that produce time series data will facilitate improvements in biomedical diagnostics and in personalized medicine. The time series produced by these devices often contains characteristic features arising from biochemical interactions between the sample and the sensor. To use such characteristic features for determining sample class, similarity-based classifiers can be utilized. However, the construction of such classifiers is complicated by the variability in the time domains of such series that renders the traditional distance metrics such as Euclidean distance ineffective in distinguishing between biological variance and time domain variance. The dynamic time warping (DTW) algorithm is a sequence alignment algorithm that can be used to align two or more series to facilitate quantifying similarity. In this article, we evaluated the performance of DTW distance-based similarity classifiers for classifying time series that mimic electrical signals produced by nanotube biosensors. Simulation studies demonstrated the positive performance of such classifiers in discriminating between time series containing characteristic features that are obscured by noise in the intensity and time domains. We then applied a DTW distance-based k-nearest neighbors classifier to distinguish the presence/absence of mesenchymal biomarker in cancer cells in buffy coats in a blinded test. Using a train-test approach, we find that the classifier had high sensitivity (90.9%) and specificity (81.8%) in differentiating between EpCAM-positive MCF7 cells spiked in buffy coats and those in plain buffy coats.
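
    The DTW-based nearest-neighbour idea described in this record can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the absolute-difference cost, the unconstrained warping window, the function names and the toy series are all assumptions made for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the textbook O(len(a) * len(b)) recurrence with an
    absolute-difference local cost (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_dtw_predict(train, query, k=1):
    """Majority vote among the k training series closest to query under DTW.

    train is a list of (series, label) pairs.
    """
    ranked = sorted(train, key=lambda item: dtw_distance(item[0], query))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical toy data: "positive" series contain a spike, "negative" ones do not.
train = [
    ([0, 0, 1, 5, 1, 0, 0], "positive"),
    ([0, 1, 5, 1, 0, 0, 0], "positive"),
    ([0, 0, 0, 0, 0, 0, 0], "negative"),
    ([0, 1, 0, 1, 0, 1, 0], "negative"),
]
query = [0, 0, 0, 1, 5, 1, 0]  # the same spike, shifted in time
predicted = knn_dtw_predict(train, query, k=1)
```

    Because the warping path absorbs the time shift, the shifted spike is at DTW distance zero from the first training series, whereas a point-by-point Euclidean comparison would penalise the shift.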

  12. [Leptospirosis in animal reproduction: III. Role of the hardjo serovar in bovine leptospirosis in Rio de Janeiro, Brazil].

    PubMed

    Lilenbaum, W; Dos Santos, M R

    1995-01-01

    Four hundred and five serum samples were drawn from cows with reproductive problems, not vaccinated against leptospirosis, on 21 dairy farms. Three distinct geographic regions were determined and the farms were also classified according to production system, based on technological, zootechnical and sanitary resources. A total of 277 positive reactions were observed, corresponding to 68.39% of the samples. The predominant serovar was hardjo, reactive in 85 samples (20.98%), predominant on nine farms and observed on 17 farms (80.95%). Hardjo predominated in all studied regions and on properties classified as type "A" (22 samples) and type "B" (49 samples). The role of this serovar in bovine leptospirosis in Brazil compared with other countries is discussed.

  13. Evaluation of a segment-based LANDSAT full-frame approach to crop area estimation

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Hixson, M. M.; Davis, S. M.

    1981-01-01

    As the registration of LANDSAT full frames enters the realm of current technology, sampling methods that use data other than the segment data used for LACIE should be examined. The effect of separating the functions of sampling for training and sampling for area estimation was studied. The frame selected for analysis was acquired over north central Iowa on August 9, 1978. A stratification of the full frame was defined. Training data came from segments within the frame. Two classification and estimation procedures were compared: statistics developed on one segment were used to classify that segment, and pooled statistics from the segments were used to classify a systematic sample of pixels. Comparisons to USDA/ESCS estimates illustrate that the full-frame sampling approach can provide accurate and precise area estimates.

  14. Constructing better classifier ensemble based on weighted accuracy and diversity measure.

    PubMed

    Zeng, Xiaodong; Wong, Derek F; Chao, Lidia S

    2014-01-01

    A weighted accuracy and diversity (WAD) method is presented, a novel measure used to evaluate the quality of the classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, genetic algorithm, and forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases.
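
    The abstract describes the score as a harmonic mean of accuracy and diversity balanced by two weights, but does not give the formula. The sketch below assumes a standard weighted harmonic mean; the function name, the default weights and the candidate values are hypothetical and are not taken from the paper.

```python
def weighted_harmonic_mean(accuracy, diversity, w_acc=0.5, w_div=0.5):
    """Weighted harmonic mean of two scores in (0, 1].

    Assumed form: (w_acc + w_div) / (w_acc / accuracy + w_div / diversity).
    Returns 0.0 if either score is non-positive, since the harmonic mean
    is dominated by the smaller value.
    """
    if accuracy <= 0 or diversity <= 0:
        return 0.0
    return (w_acc + w_div) / (w_acc / accuracy + w_div / diversity)


# Hypothetical candidate ensembles as (accuracy, diversity) pairs.
candidates = {
    "accurate_but_uniform": (0.9, 0.1),
    "balanced": (0.6, 0.6),
}
scores = {name: weighted_harmonic_mean(a, d) for name, (a, d) in candidates.items()}
best = max(scores, key=scores.get)
```

    Under this assumed form, an ensemble that is strong on one criterion but weak on the other scores lower than a balanced one, which matches the trade-off the abstract describes.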

  15. Classification of neocortical interneurons using affinity propagation.

    PubMed

    Santana, Roberto; McGarry, Laura M; Bielza, Concha; Larrañaga, Pedro; Yuste, Rafael

    2013-01-01

    In spite of over a century of research on cortical circuits, it is still unknown how many classes of cortical neurons exist. In fact, neuronal classification is a difficult problem because it is unclear how to designate a neuronal cell class and what are the best characteristics to define them. Recently, unsupervised classifications using cluster analysis based on morphological, physiological, or molecular characteristics, have provided quantitative and unbiased identification of distinct neuronal subtypes, when applied to selected datasets. However, better and more robust classification methods are needed for increasingly complex and larger datasets. Here, we explored the use of affinity propagation, a recently developed unsupervised classification algorithm imported from machine learning, which gives a representative example or exemplar for each cluster. As a case study, we applied affinity propagation to a test dataset of 337 interneurons belonging to four subtypes, previously identified based on morphological and physiological characteristics. We found that affinity propagation correctly classified most of the neurons in a blind, non-supervised manner. Affinity propagation outperformed Ward's method, a current standard clustering approach, in classifying the neurons into 4 subtypes. Affinity propagation could therefore be used in future studies to validly classify neurons, as a first step to help reverse engineer neural circuits.

  16. Constructing Better Classifier Ensemble Based on Weighted Accuracy and Diversity Measure

    PubMed Central

    Chao, Lidia S.

    2014-01-01

    A weighted accuracy and diversity (WAD) method is presented, a novel measure used to evaluate the quality of the classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, genetic algorithm, and forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases. PMID:24672402

  17. Skeletal muscle biopsy studies of cardiac patients.

    PubMed

    Fekete, G; Boros, Z; Cserhalmi, L; Apor, P

    1987-01-01

    Eleven patients diagnosed and treated for congestive cardiomyopathy (COCM) of unknown aetiology, and another 10 patients, with congestive alcoholic heart muscle disease (ACOCM) were studied. Muscle biopsy samples were obtained from the vastus lateralis (VL) and the gastrocnemius (G) muscles. In part of the sample muscle the fibre pattern was classified by means of ATPase activity staining, a technique based on the pH lability of the fibres concerned. Fibre typing and area measurements were carried out by light microscope. The other part of the sample was used as muscle homogenate of which the Ca2+-activated ATPase activity as well as citrate synthetase (CS) and aldolase activities were measured. No significant difference was found in these enzyme activities between the two groups of patients. The proportion of the slow twitch (ST) fibres in the VL, mainly in the patients with ACOCM, was lower as compared to data for healthy subjects. A similar tendency was revealed for G. In both muscles tested, the area of ST fibres was smaller in the ACOCM group. The fast twitch (FT) fibre area proved to be slightly different in the two groups of subjects tested. Occurrence of degenerative signs in the histological tests was higher in the ACOCM than in the COCM group. It was concluded that differences in the skeletal muscles of patients with ACOCM and COCM may be attributable primarily to alcoholism. The disease of the heart muscle has little effect on the function of skeletal muscle. Even so, a low amount or lack of physical activity may have an unfavourable influence on the skeletal muscles of patients with heart muscle disease.

  18. Classifying Radio Galaxies with the Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Aniyan, A. K.; Thorat, K.

    2017-06-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff-Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA) Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ~200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results are comparable to those of manual classification, while being obtained much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  19. A machine learning approach for classification of anatomical coverage in CT

    NASA Astrophysics Data System (ADS)

    Wang, Xiaoyong; Lo, Pechin; Ramakrishna, Bharath; Goldin, Johnathan; Brown, Matthew

    2016-03-01

    Automatic classification of anatomical coverage of medical images is critical for big data mining and as a pre-processing step to automatically trigger specific computer aided diagnosis systems. The traditional way to identify scans through DICOM headers has various limitations due to manual entry of series descriptions and non-standardized naming conventions. In this study, we present a machine learning approach where multiple binary classifiers were used to classify different anatomical coverages of CT scans. A one-vs-rest strategy was applied. For a given training set, a template scan was selected from the positive samples and all other scans were registered to it. Each registered scan was then evenly split into k × k × k non-overlapping blocks and for each block the mean intensity was computed. This resulted in a 1 × k³ feature vector for each scan. The feature vectors were then used to train an SVM-based classifier. In this feasibility study, four classifiers were built to identify anatomic coverages of brain, chest, abdomen-pelvis, and chest-abdomen-pelvis CT scans. Each classifier was trained and tested using a set of 300 scans from different subjects, composed of 150 positive samples and 150 negative samples. Area under the ROC curve (AUC) of the testing set was measured to evaluate the performance in a two-fold cross validation setting. Our results showed good classification performance with an average AUC of 0.96.
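
    The block-mean feature extraction described in this record can be sketched as below. The sketch assumes the scan has already been registered, is stored as a nested list indexed [z][y][x], and has dimensions divisible by k; the function name and the toy volume are hypothetical, and registration and SVM training are not shown.

```python
def block_mean_features(volume, k):
    """Split a 3-D volume into k x k x k non-overlapping blocks and
    return the mean intensity of each block as a flat list of k**3 values.

    Assumes volume[z][y][x] with every dimension divisible by k
    (a simplification for this sketch).
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if nz % k or ny % k or nx % k:
        raise ValueError("each dimension must be divisible by k")
    bz, by, bx = nz // k, ny // k, nx // k
    features = []
    for iz in range(k):
        for iy in range(k):
            for ix in range(k):
                total = 0.0
                for z in range(iz * bz, (iz + 1) * bz):
                    for y in range(iy * by, (iy + 1) * by):
                        for x in range(ix * bx, (ix + 1) * bx):
                            total += volume[z][y][x]
                features.append(total / (bz * by * bx))
    return features


# Toy 4 x 4 x 4 volume whose intensity equals the slice index z.
volume = [[[float(z) for _x in range(4)] for _y in range(4)] for z in range(4)]
features = block_mean_features(volume, 2)  # 2**3 = 8 features
```

    Each scan is thereby reduced to a fixed-length vector regardless of its original resolution, which is what allows a single classifier to be trained across scans.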

  20. Probabilistic classifiers with high-dimensional data

    PubMed Central

    Kim, Kyung In; Simon, Richard

    2011-01-01

    For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n large p classification problems despite their importance in medical decision making. In this paper, we introduce 2 criteria for assessment of probabilistic classifiers: well-calibratedness and refinement and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated or at least not “anticonservative” using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set. PMID:21087946
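
    As a rough illustration of what "well calibrated" means, the sketch below bins predicted probabilities and compares the mean prediction in each bin with the observed event frequency. This is a generic reliability check, not the evaluation measures developed in the paper; the function name, the equal-width binning and the toy data are assumptions.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns a list of (bin_index, count, mean_predicted, observed_frequency)
    tuples for the non-empty bins. For a well-calibrated classifier,
    mean_predicted and observed_frequency are close in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for idx, members in enumerate(bins):
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        table.append((idx, len(members), mean_p, freq))
    return table


# Hypothetical predictions with binary outcomes (1 = event occurred).
probs = [0.1, 0.1, 0.9, 0.9]
outcomes = [0, 0, 1, 1]
table = reliability_table(probs, outcomes)
```

    A classifier whose observed frequencies are systematically less extreme than its predicted probabilities would be overconfident, which is roughly the sense in which the abstract uses "anticonservative".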

  1. Classifier utility modeling and analysis of hypersonic inlet start/unstart considering training data costs

    NASA Astrophysics Data System (ADS)

    Chang, Juntao; Hu, Qinghua; Yu, Daren; Bao, Wen

    2011-11-01

    Start/unstart detection is one of the most important issues for hypersonic inlets and is also the foundation of scramjet protection control. Inlet start/unstart detection can be treated as a standard pattern classification problem, and the cost of training samples has to be considered in classifier modeling, because both CFD numerical simulations and wind tunnel experiments of hypersonic inlets cost time and money. To solve this problem, CFD simulation of the inlet is studied as a first step, and the simulation results provide the training data for pattern classification of hypersonic inlet start/unstart. Then the classifier modeling technology and maximum classifier utility theories are introduced to analyze the effect of training data cost on classifier utility. In conclusion, it is useful to introduce support vector machine algorithms to acquire the classifier model of hypersonic inlet start/unstart, and the minimum total cost of the hypersonic inlet start/unstart classifier can be obtained by the maximum classifier utility theories.

  2. A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers

    PubMed Central

    2012-01-01

    Background: Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? Results: The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Conclusion: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway. PMID:23216969
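
    The two aggregation methods named in this record, Average Probability and Vote Threshold, can be illustrated with the sketch below. The abstract does not specify the cutoffs or the number of votes required, so the 0.5 probability cutoff, the min_votes parameter, the function names and the example probabilities are all assumptions.

```python
def average_probability(probs, cutoff=0.5):
    """Average the per-classifier probabilities of acute rejection and
    call the sample positive if the mean reaches the cutoff."""
    mean = sum(probs) / len(probs)
    return mean, mean >= cutoff


def vote_threshold(probs, cutoff=0.5, min_votes=1):
    """Count classifiers whose probability reaches the cutoff and call the
    sample positive if at least min_votes of them vote positive."""
    votes = sum(1 for p in probs if p >= cutoff)
    return votes, votes >= min_votes


# Hypothetical outputs of four individual classifiers for one patient.
probs = [0.2, 0.3, 0.7, 0.4]
mean, avg_call = average_probability(probs)          # mean is about 0.4: negative
votes, vote_call = vote_threshold(probs, min_votes=1)  # one vote: positive
```

    With a low vote requirement, a single confident classifier is enough to flag a sample that averaging would leave negative. That behaviour is consistent with the reported pattern of higher sensitivity at the cost of specificity, although the study's actual thresholds may differ.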

  3. A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers.

    PubMed

    Günther, Oliver P; Chen, Virginia; Freue, Gabriela Cohen; Balshaw, Robert F; Tebbutt, Scott J; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W Robert; McManus, Bruce M; Keown, Paul A; Ng, Raymond T

    2012-12-08

    Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
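
    The two aggregation rules named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0.5 probability cutoff and the `min_votes` parameter are assumptions, since the abstract does not state how either rule was parameterized.

```python
from statistics import mean

def average_probability(probs, cutoff=0.5):
    """Flag rejection when the mean classifier probability reaches the cutoff.

    `cutoff` is an assumed value, not taken from the paper.
    """
    return mean(probs) >= cutoff

def vote_threshold(probs, min_votes=1, cutoff=0.5):
    """Flag rejection when at least `min_votes` classifiers individually reach the cutoff.

    Both `min_votes` and `cutoff` are hypothetical parameters.
    """
    votes = sum(p >= cutoff for p in probs)
    return votes >= min_votes

# Made-up probabilities from four classifiers for one patient.
example = [0.9, 0.2, 0.3, 0.1]
avg_call = average_probability(example)           # mean is 0.375, below the cutoff
vote_call = vote_threshold(example, min_votes=1)  # one classifier votes "rejection"
```

    With a low vote threshold a single confident classifier is enough to flag rejection, which is consistent with the reported gain in sensitivity at the cost of specificity.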

  4. Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.

    PubMed

    Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping

    2014-02-15

    Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. © 2013 Elsevier Inc. All rights reserved.
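
    As a rough illustration of balancing a training set by undersampling, the sketch below uses plain random undersampling. Note that this is a swapped-in, simpler technique: the abstract reports that K-Medoids-based undersampling performed best, and that method is not reproduced here. The function name, the seed and the toy labels are assumptions.

```python
import random

def random_undersample(samples, labels, seed=0):
    """Return (sample, label) pairs with every class cut down to the minority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # Draw n_min items without replacement from each class.
        balanced.extend((x, y) for x in rng.sample(xs, n_min))
    rng.shuffle(balanced)
    return balanced

# Toy data: twice as many MCI cases as AD cases, echoing the MRI imbalance described above.
toy_labels = ["MCI"] * 6 + ["AD"] * 3
toy_samples = list(range(len(toy_labels)))
balanced = random_undersample(toy_samples, toy_labels)
```

    An ensemble in the spirit of the abstract could repeat this with several seeds and train one classifier per undersampled set.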

  5. Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes

    NASA Astrophysics Data System (ADS)

    Ashkani, Jahanshah; Naidoo, Kevin J.

    2016-05-01

    Aberrant glycosylation in tumours stems from altered glycosyltransferase (GT) gene expression, but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data is able to classify six cancers: breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlate with clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.

  6. Classification of epileptic EEG signals based on simple random sampling and sequential feature selection.

    PubMed

    Ghayab, Hadi Ratham Al; Li, Yan; Abdulla, Shahab; Diykh, Mohammed; Wan, Xiangkui

    2016-06-01

    Electroencephalogram (EEG) signals are used broadly in the medical fields. The main applications of EEG signals are the diagnosis and treatment of diseases such as epilepsy, Alzheimer, sleep problems and so on. This paper presents a new method which extracts and selects features from multi-channel EEG signals. This research focuses on three main points. Firstly, simple random sampling (SRS) technique is used to extract features from the time domain of EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least square support vector machine (LS_SVM) classifier to classify the EEG signals. The LS_SVM classifier classified the features which are extracted and selected from the SRS and the SFS. The experimental results show that the method achieves 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively.
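
    The feature selection step can be illustrated with a generic greedy forward version of sequential feature selection. This is a sketch under assumptions: the abstract does not say which scoring criterion or search direction was used, so `score` is a hypothetical caller-supplied function and the toy weights are made up.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily pick k feature indices, adding the one that maximizes score() at each step.

    `score` takes a list of feature indices and returns a number (higher is better).
    """
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy scoring function: each feature contributes a fixed, invented amount.
toy_weights = [0.1, 0.9, 0.5]

def toy_score(subset):
    return sum(toy_weights[f] for f in subset)

chosen = sequential_forward_selection(3, toy_score, k=2)
```

    In practice `score` would typically be a cross-validated accuracy of the downstream classifier on the candidate subset.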

  7. The role of cytokeratins 20 and 7 and estrogen receptor analysis in separation of metastatic lobular carcinoma of the breast and metastatic signet ring cell carcinoma of the gastrointestinal tract.

    PubMed

    Tot, T

    2000-06-01

    Metastatic signet ring cell carcinomas of unknown primary site can represent a clinical problem. Gastrointestinal signet ring cell carcinomas and invasive lobular carcinomas of the breast are the most common sources of these metastases. Immunohistochemical algorithms have been successfully used in the search for the unknown primary adenocarcinomas. In the present study a series of primary invasive lobular breast carcinomas (79 cases) and their metastases and a series of gastrointestinal signet ring cell carcinomas (22 primary and 13 metastases) were stained with monoclonal antibodies for cytokeratin (CK) 20 and CK7 and for estrogen receptors (ER). The staining was evaluated as negative (no staining), focally (less than 10% of the tumor cells stained) or diffusely positive. All the primary and metastatic gastrointestinal signet ring cell carcinomas proved to be CK20 positive, while only 2/79 (3%) of the primary and 1/21 metastatic lobular carcinomas (5%) stained positively for this CK. None of the gastrointestinal carcinomas and the majority of the lobular carcinomas expressed ER. The majority of the tumors were CK7+. Using CK20 alone, 33 of 34 metastases could be properly classified as gastrointestinal (CK20+) or mammary (CK20-). ER identified 31/34 of breast cancer metastases. By combining the results of CK20 and ER staining all the metastases could be properly classified as the CK20+/ER- pattern identified all the gastrointestinal tumors.
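
    The staining-based decision rules described above can be written down in a few lines. This is only a sketch: the CK20-alone rule follows the abstract directly, while in the combined rule the fallback to "mammary" for every pattern other than CK20+/ER- is an assumption, because the abstract does not spell out how the remaining patterns were assigned.

```python
def classify_by_ck20(ck20_positive):
    """CK20 alone: CK20+ suggests gastrointestinal origin, CK20- suggests mammary origin."""
    return "gastrointestinal" if ck20_positive else "mammary"

def classify_by_panel(ck20_positive, er_positive):
    """CK20 combined with ER: the CK20+/ER- pattern is read as gastrointestinal.

    Assigning every other pattern to "mammary" is an assumption of this sketch.
    """
    if ck20_positive and not er_positive:
        return "gastrointestinal"
    return "mammary"

# Proportion of metastases correctly classified by CK20 alone, from the counts in the abstract.
ck20_alone_accuracy = 33 / 34
```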

  8. Epidemiology and antimicrobial susceptibility of Gram-negative aerobic bacteria causing intra-abdominal infections during 2010-2011.

    PubMed

    Hawser, Stephen; Hoban, Daryl J; Badal, Robert E; Bouchillon, Samuel K; Biedenbach, Douglas; Hackel, Meredith; Morrissey, Ian

    2015-02-01

    The study for monitoring antimicrobial resistance trends (SMART) surveillance program monitors the epidemiology and trends in antibiotic resistance of intra-abdominal pathogens to currently used therapies. The current report describes such trends during 2010-2011. A total of 25,746 Gram-negative clinical isolates from intra-abdominal infections were collected and classified as hospital-associated (HA) if the hospital length of stay (LOS) at the time of specimen collection was ≥48 hours, community-associated (CA) if LOS at the time of specimen collection was <48 hours, or unknown (no designation given by participating centre). A total of 92 different species were collected of which the most common was Escherichia coli: 39% of all isolates in North America to 55% in Africa. Klebsiella pneumoniae was the second most common pathogen: 11% of all isolates from Europe to 19% of all isolates from Asia. Isolates were from multiple intra-abdominal sources of which 32% were peritoneal fluid, 20% were intra-abdominal abscesses, and 16.5% were gall bladder infections. Isolates were further classified as HA (55% of all isolates), CA (39% of all isolates), or unknown (6% of all isolates). The most active antibiotics tested were imipenem, ertapenem, amikacin, and piperacillin-tazobactam. Resistance rates to all other antibiotics tested were high. Considering the current data set and high-level resistance of intra-abdominal pathogens to various antibiotics, further monitoring of the epidemiology of intra-abdominal infections and their susceptibility to antibiotics through SMART is warranted.
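
    The isolate classification rule stated in this abstract is simple enough to express directly. In this sketch the function name is hypothetical, and representing "no designation given by participating centre" as `None` is an assumption about how such records might be encoded.

```python
from typing import Optional

def classify_isolate(los_hours: Optional[float]) -> str:
    """Classify an isolate by hospital length of stay (hours) at specimen collection."""
    if los_hours is None:
        return "unknown"  # no designation available
    return "HA" if los_hours >= 48 else "CA"

labels = [classify_isolate(h) for h in (72, 48, 47.9, None)]
```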

  9. The Role of Balanced Training and Testing Data Sets for Binary Classifiers in Bioinformatics

    PubMed Central

    Wei, Qiong; Dunbrack, Roland L.

    2013-01-01

    Training and testing of conventional machine learning models on binary classification problems depend on the proportions of the two outcomes in the relevant data sets. This may be especially important in practical terms when real-world applications of the classifier are either highly imbalanced or occur in unknown proportions. Intuitively, it may seem sensible to train machine learning models on data similar to the target data in terms of proportions of the two binary outcomes. However, we show that this is not the case using the example of prediction of deleterious and neutral phenotypes of human missense mutations in human genome data, for which the proportion of the binary outcome is unknown. Our results indicate that using balanced training data (50% neutral and 50% deleterious) results in the highest balanced accuracy (the average of True Positive Rate and True Negative Rate), Matthews correlation coefficient, and area under ROC curves, no matter what the proportions of the two phenotypes are in the testing data. Besides balancing the data by undersampling the majority class, other techniques in machine learning include oversampling the minority class, interpolating minority-class data points and various penalties for misclassifying the minority class. However, these techniques are not commonly used in either the missense phenotype prediction problem or in the prediction of disordered residues in proteins, where the imbalance problem is substantial. The appropriate approach depends on the amount of available data and the specific problem at hand. PMID:23874456
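
    Two of the metrics named above can be computed from confusion-matrix counts as sketched below. The function names and the example counts are illustrative; returning 0.0 for the Matthews correlation coefficient when its denominator is zero is a common convention assumed here, not something stated in the abstract.

```python
import math

def balanced_accuracy(tp, fn, tn, fp):
    """Average of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2

def matthews_corrcoef(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Invented counts: a classifier that labels everything as the majority (negative) class.
majority_only = dict(tp=0, fn=10, tn=990, fp=0)
plain_accuracy = (majority_only["tp"] + majority_only["tn"]) / 1000
ba_majority = balanced_accuracy(**majority_only)
mcc_majority = matthews_corrcoef(**majority_only)

# Invented counts for a more useful classifier.
ba_example = balanced_accuracy(tp=90, fn=10, tn=80, fp=20)
mcc_example = matthews_corrcoef(tp=90, fn=10, tn=80, fp=20)
```

    The majority-only example shows why plain accuracy (0.99 here) can be misleading on imbalanced data while balanced accuracy stays at chance level.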

  10. First-time detection of Mycobacterium bovis in livestock tissues and milk in the West Bank, Palestinian Territories.

    PubMed

    Ereqat, Suheir; Nasereddin, Abedelmajeed; Levine, Hagai; Azmi, Kifaya; Al-Jawabreh, Amer; Greenblatt, Charles L; Abdeen, Ziad; Bar-Gal, Gila Kahila

    2013-01-01

    Bovine tuberculosis, bTB, is classified by the WHO as one of the seven neglected zoonotic diseases that cause animal health problems and have a high potential to infect humans. In the West Bank, bTB was not studied among animals and the prevalence of human tuberculosis caused by M. bovis is unknown. Therefore, the aim of this study was to estimate the prevalence of bTB among cattle and goats and identify the molecular characteristics of bTB in our area. A total of 208 tissue samples, representing 104 animals, and 150 raw milk samples, obtained from cows and goats were examined for the presence of mycobacteria. The tissue samples were collected during routine meat inspection from the Jericho abattoir. DNA was extracted from all samples, milk and tissue biopsies (n = 358), and screened for presence of TB DNA by amplifying a 123-bp segment of the insertion sequence IS6110. Eight out of 254 animals (3.1%) were found to be TB positive based on the IS6110-PCR. Identification of M. bovis among the positive TB samples was carried out via real time PCR followed by high resolution melt curve analysis, targeting the A/G transition along the oxyR gene. Spoligotyping analysis revealed a new genotype of M. bovis in one tissue sample. Detection of M. bovis in tissue and milk of livestock suggests that apparently healthy cattle and goats are a potential source of infection of bTB and may pose a risk to public health. Hence, appropriate measures including meat inspection at abattoirs in the region are required together with promotion of a health campaign emphasizing the importance of drinking pasteurized milk. In addition, further studies are essential at the farm level to determine the exact prevalence of bTB in goats and cattle herds in the West Bank and Israel.

  11. First-Time Detection of Mycobacterium bovis in Livestock Tissues and Milk in the West Bank, Palestinian Territories

    PubMed Central

    Ereqat, Suheir; Nasereddin, Abedelmajeed; Levine, Hagai; Azmi, Kifaya; Al-Jawabreh, Amer; Greenblatt, Charles L.; Abdeen, Ziad; Bar-Gal, Gila Kahila

    2013-01-01

    Background: Bovine tuberculosis, bTB, is classified by the WHO as one of the seven neglected zoonotic diseases that cause animal health problems and have a high potential to infect humans. In the West Bank, bTB was not studied among animals and the prevalence of human tuberculosis caused by M. bovis is unknown. Therefore, the aim of this study was to estimate the prevalence of bTB among cattle and goats and identify the molecular characteristics of bTB in our area. Methodology/principal findings: A total of 208 tissue samples, representing 104 animals, and 150 raw milk samples, obtained from cows and goats were examined for the presence of mycobacteria. The tissue samples were collected during routine meat inspection from the Jericho abattoir. DNA was extracted from all samples, milk and tissue biopsies (n = 358), and screened for presence of TB DNA by amplifying a 123-bp segment of the insertion sequence IS6110. Eight out of 254 animals (3.1%) were found to be TB positive based on the IS6110-PCR. Identification of M. bovis among the positive TB samples was carried out via real time PCR followed by high resolution melt curve analysis, targeting the A/G transition along the oxyR gene. Spoligotyping analysis revealed a new genotype of M. bovis in one tissue sample. Significance: Detection of M. bovis in tissue and milk of livestock suggests that apparently healthy cattle and goats are a potential source of infection of bTB and may pose a risk to public health. Hence, appropriate measures including meat inspection at abattoirs in the region are required together with promotion of a health campaign emphasizing the importance of drinking pasteurized milk. In addition, further studies are essential at the farm level to determine the exact prevalence of bTB in goats and cattle herds in the West Bank and Israel. PMID:24069475

  12. Investigation of c-KIT and Ki67 expression in normal, preneoplastic and neoplastic canine prostate.

    PubMed

    Fonseca-Alves, Carlos Eduardo; Kobayashi, Priscilla Emiko; Palmieri, Chiara; Laufer-Amorim, Renée

    2017-12-06

    c-KIT expression has been related to bone metastasis in human prostate cancer, but whether c-KIT expression can be similarly classified in canine prostatic tissue is unknown. This study assessed c-KIT and Ki67 expression in canine prostate cancer (PC). c-KIT gene and protein expression and Ki67 expression were evaluated in forty-four canine prostatic tissues by immunohistochemistry, RT-qPCR and western blot. Additionally, we have investigated c-KIT protein expression by immunoblotting in two primary canine prostate cancer cell lines. Eleven normal prostates, 12 proliferative inflammatory atrophy (PIA) prostates, 18 PC, 3 metastatic lesions and two prostate cancer cell cultures (PC1 and PC2) were analysed. The prostatic tissue exhibited varying degrees of membranous, cytoplasmic or membranous/cytoplasmic c-KIT staining. Four normal prostates, 4 PIA and 5 prostatic carcinomas showed positive c-KIT expression. No c-KIT immunoexpression was observed in metastases. Canine prostate cancer and PIA samples contained a higher number of Ki67-positive cells compared to normal samples. The median relative quantification (RQ) for c-KIT expression in normal, PIA and prostate cancer and metastatic samples were 0.6 (0.1-2.5), 0.7 (0.09-2.1), 0.7 (0.09-5.1) and 0.1 (0.07-0.6), respectively. A positive correlation between the number of Ki67-positive cells and c-KIT transcript levels was observed in prostate cancer samples. In the cell line, PC1 was negative for c-KIT protein expression, while PC2 was weakly positive. The present study identified a strong correlation between c-KIT expression and proliferative index, suggesting that c-KIT may influence cell proliferation. Therefore, c-KIT heterogeneous protein expression among the samples (five positive and thirteen negative prostate cancer samples) indicates a personalized approach for canine prostate cancer.

  13. Triacylglycerol stereospecific analysis and linear discriminant analysis for milk speciation.

    PubMed

    Blasi, Francesca; Lombardi, Germana; Damiani, Pietro; Simonetti, Maria Stella; Giua, Laura; Cossignani, Lina

    2013-05-01

    Product authenticity is an important topic in the dairy sector. Dairy products sold for public consumption must be accurately labelled in accordance with the contained milk species. Linear discriminant analysis (LDA), a common chemometric procedure, has been applied to fatty acid percentage composition to classify pure milk samples (cow, ewe, buffalo, donkey, goat). All original grouped cases were correctly classified, while 90% of cross-validated grouped cases were correctly classified. Another objective of this research was the characterisation of cow-ewe milk mixtures in order to reveal a common fraud in the dairy field, namely the addition of cow milk to ewe milk. Stereospecific analysis of triacylglycerols (TAG), a method based on chemical-enzymatic procedures coupled with chromatographic techniques, has been carried out to detect fraudulent milk additions, in particular 1, 3 and 5% cow milk added to ewe milk. When only TAG composition data were used for the elaboration, 75% of original grouped cases were correctly classified, while all samples were correctly classified when both total and intrapositional TAG data were used. The results of cross-validation were also better when TAG stereospecific analysis data were considered as LDA variables. In particular, 100% of cross-validated grouped cases were correctly classified when 5% cow milk mixtures were considered.
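
    The difference between "original" and "cross-validated" classification rates can be illustrated with a leave-one-out loop. This sketch makes two loud assumptions: that the cross-validation in question is leave-one-out, and that a nearest-centroid classifier is an acceptable stand-in for LDA (it is a simpler, swapped-in technique, not the method of the paper). The two-feature toy measurements are invented.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean distance)."""
    groups = {}
    for row, label in zip(train_x, train_y):
        groups.setdefault(label, []).append(row)

    def dist2(label):
        c = centroid(groups[label])
        return sum((a - b) ** 2 for a, b in zip(c, x))

    return min(groups, key=dist2)

def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and report the fraction predicted correctly."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)

# Invented, well-separated toy measurements for two milk species.
toy_x = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
toy_y = ["cow", "cow", "cow", "ewe", "ewe", "ewe"]
loo_accuracy = leave_one_out_accuracy(toy_x, toy_y)
```

    Because each held-out sample plays no part in fitting, the cross-validated rate is usually the more honest estimate, which is why it can fall below the rate on the original cases.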

  14. Transfer Learning for Class Imbalance Problems with Inadequate Data.

    PubMed

    Al-Stouhi, Samir; Reddy, Chandan K

    2016-07-01

    A fundamental problem in data mining is to effectively build robust classifiers in the presence of skewed data distributions. Class imbalance classifiers are trained specifically for skewed distribution datasets. Existing methods assume an ample supply of training examples as a fundamental prerequisite for constructing an effective classifier. However, when sufficient data is not readily available, the development of a representative classification algorithm becomes even more difficult due to the unequal distribution between classes. We provide a unified framework that will potentially take advantage of auxiliary data using a transfer learning mechanism and simultaneously build a robust classifier to tackle this imbalance issue in the presence of few training samples in a particular target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are not sufficient and in this paper we will develop a method that is optimized to simultaneously augment the training data and induce balance into skewed datasets. We propose a novel boosting based instance-transfer classifier with a label-dependent update mechanism that simultaneously compensates for class imbalance and incorporates samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply to healthcare and text classification applications.

  15. Comparative transcriptome analysis of the Asteraceae halophyte Karelinia caspica under salt stress.

    PubMed

    Zhang, Xia; Liao, Maoseng; Chang, Dan; Zhang, Fuchun

    2014-12-17

    Much attention has been given to the potential of halophytes as sources of tolerance traits for introduction into cereals. However, a great deal remains unknown about the diverse mechanisms employed by halophytes to cope with salinity. To characterize salt tolerance mechanisms underlying Karelinia caspica, an Asteraceae halophyte, we performed large-scale transcriptomic analysis using a high-throughput Illumina sequencing platform. Comparative gene expression analysis was performed to correlate the effects of salt stress and ABA regulation at the molecular level. Total sequence reads generated by pyrosequencing were assembled into 287,185 non-redundant transcripts with an average length of 652 bp. Using the BLAST function in the Swiss-Prot, NCBI nr, GO, KEGG, and KOG databases, a total of 216,416 coding sequences associated with known proteins were annotated. Among these, 35,533 unigenes were classified into 69 gene ontology categories, and 18,378 unigenes were classified into 202 known pathways. Based on the fold changes observed when comparing the salt stress and control samples, 60,127 unigenes were differentially expressed, with 38,122 and 22,005 up- and down-regulated, respectively. Several of the differentially expressed genes are known to be involved in the signaling pathway of the plant hormone ABA, including ABA metabolism, transport, and sensing as well as the ABA signaling cascade. Transcriptome profiling of K. caspica contributes to a comprehensive understanding of K. caspica at the molecular level. Moreover, the global survey of differentially expressed genes in this species under salt stress and analyses of the effects of salt stress and ABA regulation will contribute to the identification and characterization of genes and molecular mechanisms underlying salt stress responses in Asteraceae plants.
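
    Calling a unigene up- or down-regulated from its fold change can be sketched as below. The abstract does not give the thresholds or the normalization that were used, so the log2 cutoff of 1, the pseudocount of 1 and the function name are all assumptions, and a real analysis would also apply a statistical test.

```python
import math

def regulation(control, stress, pseudocount=1.0, min_abs_log2fc=1.0):
    """Label a gene 'up', 'down' or 'unchanged' from expression under control vs. salt stress.

    The pseudocount avoids division by zero; both default values are assumed, not from the paper.
    """
    log2fc = math.log2((stress + pseudocount) / (control + pseudocount))
    if log2fc >= min_abs_log2fc:
        return "up"
    if log2fc <= -min_abs_log2fc:
        return "down"
    return "unchanged"

# Invented expression values for three genes: (control, stress).
calls = [regulation(2, 8), regulation(8, 2), regulation(5, 5)]
```

    The counts reported in the abstract are internally consistent: 38,122 up-regulated plus 22,005 down-regulated unigenes gives the stated 60,127.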

  16. Analysis of agreement between cardiac risk stratification protocols applied to participants of a center for cardiac rehabilitation

    PubMed Central

    Santos, Ana A. S.; Silva, Anne K. F.; Vanderlei, Franciele M.; Christofaro, Diego G. D.; Gonçalves, Aline F. L.; Vanderlei, Luiz C. M.

    2016-01-01

    Background: Cardiac risk stratification is related to the risk of the occurrence of events induced by exercise. Despite the existence of several protocols to calculate risk stratification, studies assessing the similarity between these protocols are still lacking. Objective: To evaluate the agreement between the existing protocols on cardiac risk rating in cardiac patients. Method: The records of 50 patients from a cardiac rehabilitation program were analyzed, from which the following information was extracted: age, sex, weight, height, clinical diagnosis, medical history, risk factors, associated diseases, and the results from the most recent laboratory and complementary tests performed. This information was used for risk stratification of the patients in the protocols of the American College of Sports Medicine, the Brazilian Society of Cardiology, the American Heart Association, the protocol designed by Frederic J. Pashkow, the American Association of Cardiovascular and Pulmonary Rehabilitation, the Société Française de Cardiologie, and the Sociedad Española de Cardiología. Descriptive statistics were used to characterize the sample and the analysis of agreement between the protocols was calculated using the Kappa coefficient. Differences were considered significant at the 5% level. Results: Of the 21 analyses of agreement, 12 were considered significant between the protocols used for risk classification, with nine classified as moderate and three as low. No agreements were classified as excellent. Different proportions were observed in each risk category, with significant differences between the protocols for all risk categories. Conclusion: The agreements between the protocols were considered low and moderate and the risk proportions differed between protocols. PMID:27556385
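
    The agreement statistic used in this study can be sketched with the standard two-rater Cohen's kappa. The function name and the toy risk ratings are illustrative assumptions, and returning 1.0 when chance agreement is already perfect is a convention chosen for this sketch. The last line checks that seven protocols give 21 pairwise comparisons, matching the number of analyses reported.

```python
from collections import Counter
from math import comb

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters (here, two protocols) rating the same patients."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters always use the same single category (assumed convention)
    return (observed - expected) / (1 - expected)

# Invented risk ratings from two hypothetical protocols for four patients.
protocol_1 = ["low", "low", "high", "high"]
protocol_2 = ["low", "high", "high", "high"]
kappa = cohen_kappa(protocol_1, protocol_2)

n_pairwise_analyses = comb(7, 2)  # seven protocols compared two at a time
```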

  17. Prevalence and causes of hearing impairment in Fundong Health District, North-West Cameroon.

    PubMed

    Ferrite, Silvia; Mactaggart, Islay; Kuper, Hannah; Oye, Joseph; Polack, Sarah

    2017-04-01

    To estimate the prevalence and causes of hearing impairment in Fundong Health District, North-West Cameroon. We selected 51 clusters of 80 people (all ages) through probability proportionate to size sampling. Initial hearing screening was undertaken through an otoacoustic emission (OAE) test. Participants aged 4+ years who failed this test in both ears or for whom an OAE reading could not be taken underwent a manual pure-tone audiometry (PTA) screening. Cases of hearing impairment were defined as those with pure-tone average ≥41 dBHL in adults and ≥35 dBHL in children in the better ear, or children under age 4 who failed the OAE test in both ears. Each case with hearing loss was examined by an ear, nose and throat nurse who indicated the main likely cause. We examined 3567 (86.9%) of 4104 eligible people. The overall prevalence of hearing impairment was 3.6% (95% confidence interval [CI]: 2.8-4.6). The prevalence was low in people aged 0-17 (1.1%, 0.7-1.8%) and 18-49 (1.1%, 0.5-2.6%) and then rose sharply in people aged 50+ (14.8%, 11.7-19.1%). Among cases, the majority were classified as moderate (76%), followed by severe (15%) and profound (9%). More than one-third of cases of hearing impairment were classified as unknown (37%) or conductive (37%) causes, while sensorineural causes were less common (26%). Prevalence of hearing impairment in North-West Cameroon is in line with the WHO estimate for sub-Saharan Africa. The majority of cases with known causes are treatable, with impacted wax playing a major role. © 2017 John Wiley & Sons Ltd.
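
    The case definition in this abstract can be expressed as a small decision function. This is a sketch under assumptions: the abstract does not state the age at which the adult threshold applies, so `adult_age` is a hypothetical required parameter, and the function and argument names are invented. The better ear is taken as the one with the lower pure-tone average.

```python
def is_hearing_impairment_case(age, pta_left=None, pta_right=None,
                               failed_oae_both_ears=False, *, adult_age):
    """Apply the screening case definition to one participant.

    age: age in years
    pta_left, pta_right: pure-tone averages in dB HL (required from age 4 upward)
    failed_oae_both_ears: result of the otoacoustic emission screening
    adult_age: hypothetical cutoff separating the child and adult thresholds
    """
    if age < 4:
        # Children under 4 are classified from the OAE test alone.
        return failed_oae_both_ears
    better_ear = min(pta_left, pta_right)  # lower threshold means better hearing
    threshold = 41 if age >= adult_age else 35
    return better_ear >= threshold

# Illustrative calls; adult_age=18 is an assumed value.
adult_case = is_hearing_impairment_case(60, 45, 50, adult_age=18)
adult_non_case = is_hearing_impairment_case(60, 38, 70, adult_age=18)
child_case = is_hearing_impairment_case(10, 36, 40, adult_age=18)
infant_case = is_hearing_impairment_case(2, failed_oae_both_ears=True, adult_age=18)
```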

  18. Pica and rumination behavior among individuals seeking treatment for eating disorders or obesity.

    PubMed

    Delaney, Charlotte B; Eddy, Kamryn T; Hartmann, Andrea S; Becker, Anne E; Murray, Helen B; Thomas, Jennifer J

    2015-03-01

    Pica and rumination disorder (RD)-formerly classified within DSM-IV Feeding and Eating Disorders of Infancy or Early Childhood-are now classified within DSM-5 Feeding and Eating Disorders. Though pica and RD have been studied in select populations (e.g., pregnant women, intellectually disabled persons), their typical features and overall prevalence remain unknown. This study examined the clinical characteristics and frequency of DSM-5 pica and RD among individuals seeking treatment for eating disorders and obesity. We conducted structured interviews with adolescent and young adult females from a residential eating disorder center (N = 149), and adult males and females with overweight or obesity from an outpatient weight-loss clinic (N = 100). Several participants reported ingesting non-nutritive substances (e.g., ice) for weight-control purposes. However, only 1.3% (n = 2; 95% CI: .06% to 5.1%) at the residential eating disorder center and 0% at the weight-loss clinic met DSM-5 criteria for pica, consuming gum and plastic. Although no eating disorder participants were eligible for an RD diagnosis due to DSM-5 trumping rules, 7.4% (n = 11; 95% CI: 4.0% to 12.9%) endorsed rumination behavior under varying degrees of volitional control. At the weight-loss clinic, 2.0% (n = 2; 95% CI: 0.1% to 7.4%) had RD. DSM-5 pica and RD were rare in our sample of individuals seeking treatment for eating disorders and obesity, but related behaviors were more common. The wide range of pica and rumination presentations highlights the challenges of differential diagnosis with other forms of disordered eating. © 2014 Wiley Periodicals, Inc.

  19. Color vision deficiency in a middle-aged population: the Shahroud Eye Study.

    PubMed

    Jafarzadehpur, Ebrahim; Hashemi, Hassan; Emamian, Mohammad Hassan; Khabazkhoob, Mehdi; Mehravaran, Shiva; Shariati, Mohammad; Fotouhi, Akbar

    2014-10-01

    The aim of this study was to determine the prevalence of color vision defects in the middle-aged population of Shahroud, Iran. We selected 6,311 people from the 40- to 64-year-old population through random cluster sampling. Color vision testing was performed with the Farnsworth D-15. Cases with similar and symmetric results in both eyes were classified as hereditary, and those with asymmetric results were considered acquired. Cases that did not conform to standard patterns were classified as unknown. Of 5,190 respondents (response rate 82.2%), 5,102 participants underwent the color vision test. Of these, 14.7% (95% confidence interval 13.7-15.6) had some type of color vision deficiency. Of the 2,157 male participants, 6.2% had hereditary and 10.2% had acquired deficiencies, and of the 2,945 female participants, 3.1% had hereditary and 10% had acquired deficiencies. Hereditary color deficiencies were mostly of the deutan form (63.8%), and acquired deficiencies were mostly tritan (66.1%). The prevalence of hereditary and acquired color vision deficiency, as well as different types of red-green and blue-yellow color vision defects significantly increased with age (p < 0.001). In conclusion, the pattern of color vision defects among the middle-aged population of Shahroud was significantly different from that seen in the younger population. This could be due to changes associated with age, gender, medical and ocular conditions, and differences in race and environment. Thus, results of previous examinations and the overall health status should be considered before making any judgment about the status of color vision in middle-aged people.

  20. Soil-geomorphology relationships and landscape evolution in a southwestern Atlantic tidal salt marsh in Patagonia, Argentina

    NASA Astrophysics Data System (ADS)

    Ríos, Ileana; Bouza, Pablo José; Bortolus, Alejandro; Alvarez, María del Pilar

    2018-07-01

    Salt marshes in the Patagonian ecosystem are now well covered by ecological, pollution and phytoremediation studies, but a soil genesis and geomorphology approach is currently lacking. The aim of this study was to establish the soil-geomorphology relationship in Fracasso salt marsh and to determine the successional vegetation dynamics associated with the landscape evolution. This work was carried out in Fracasso salt marsh, located in Península Valdés, Argentina, where an integrated study of the soil-geomorphology relationship and landscape evolution was performed along with sedimentological analysis and an analysis of vegetation changes (C3 photosynthesis pathway vs. C4 photosynthesis pathway plants). The latter was determined through the δ13C composition of soil organic matter (SOM). Soil descriptions and laboratory analysis of soil samples were performed. A marked relationship between the vegetation unit, the dominant landform and the type of associated soil was found. Limonium brasiliense (Lb) and Sarcocornia perennis (Sp), both C3 plants, are dominant in levees associated with tidal creeks, and soils were classified as Typic Fluvaquents, while Spartina alterniflora (Sa) soils were classified as Sodic Endoaquents and Sodic Psammaquents. Although no sulfidic materials were identified by incubation test, they were identified by hydrogen peroxide treatment in Sa soils, which are now considered potential acid sulfate soils (PASS). Sedimentological analysis of the deepest sandy C horizons indicates a beach depositional environment. On the other hand, the δ13C stable isotope composition of SOM preserved in these buried soils acting as parent materials shows the dominance of C4 plants presumably belonging to Spartina species, suggesting a possible colonization and stabilization as the pioneer salt marsh.

  1. Adaptive classifier for steel strip surface defects

    NASA Astrophysics Data System (ADS)

    Jiang, Mingming; Li, Guangyao; Xie, Li; Xiao, Mang; Yi, Li

    2017-01-01

    Surface defect detection systems have been receiving increased attention because of their precision, speed and low cost. One of the main challenges is reacting to accuracy deterioration over time as equipment ages and processes change. These variables make only a tiny change to the real-world model but have a big impact on the classification result. In this paper, we propose a new adaptive classifier with a Bayes kernel (BYEC) which updates the model with a small sample to adapt it to accuracy deterioration. Firstly, abundant features were introduced to cover a wide range of information about the defects. Secondly, we constructed a series of SVMs with random subspaces of the features. Then, a Bayes classifier was trained as an evolutionary kernel to fuse the results from the base SVMs. Finally, we proposed a method to update the Bayes evolutionary kernel. The proposed algorithm is experimentally compared with different algorithms; the experimental results demonstrate that the proposed method can be updated with a small sample and fits the changed model well. The experiments show robustness, a low requirement for samples, and adaptivity.

  2. Advanced Subspace Techniques for Modeling Channel and Session Variability in a Speaker Recognition System

    DTIC Science & Technology

    2012-03-01

    with each SVM discriminating between a pair of the N total speakers in the data set. The (( + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote , and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random

  3. Arrogance analysis of several typical pattern recognition classifiers

    NASA Astrophysics Data System (ADS)

    Jing, Chen; Xia, Shengping; Hu, Weidong

    2007-04-01

    Various kinds of classification methods have been developed. However, most of these classical methods, such as Back-Propagation (BP), the Bayesian method, Support Vector Machine (SVM) and Self-Organizing Map (SOM), are arrogant. So-called arrogance, for a human, means that his decision, even when it is a mistake, overstates his actual experience. Accordingly, we say that he is arrogant if he frequently makes arrogant decisions. Likewise, some classical pattern classifiers exhibit a similar characteristic of arrogance. Given an input feature vector, we say a classifier is arrogant in its classification if its veracity is high yet its experience is low. Typically, for a new sample which is distinguishable from the original training samples, traditional classifiers recognize it as one of the known targets. Clearly, arrogance in classification is an undesirable attribute. Conversely, a classifier is non-arrogant in its classification if there is a reasonable balance between its veracity and its experience. Inquisitiveness is, in many ways, the opposite of arrogance. In nature, inquisitiveness is an eagerness for knowledge characterized by the drive to question, to seek a deeper understanding. The human capacity to doubt present beliefs allows us to acquire new experiences and to learn from our mistakes. Within the discrete world of computers, inquisitive pattern recognition is the constructive investigation and exploitation of conflict in information. Thus, we quantify this balance and discuss new techniques that will detect arrogance in a classifier.
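
    The idea of a non-arrogant classifier can be illustrated with a reject option. The sketch below is not the authors' method: it is a minimal nearest-centroid classifier, with hypothetical names (`classify_with_reject`, `max_distance`) and made-up data, that answers "unknown" when a sample lies far from every known class instead of forcing it into one of them.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def classify_with_reject(sample, training, max_distance):
    """Return the label of the nearest class centroid, or "unknown"
    if even the nearest centroid is farther than max_distance.

    training: dict mapping label -> list of feature tuples (hypothetical data).
    max_distance: assumed rejection threshold; it would have to be tuned.
    """
    centroids = {label: centroid(pts) for label, pts in training.items()}
    label, dist = min(
        ((lbl, math.dist(sample, c)) for lbl, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= max_distance else "unknown"

# Made-up two-class training set, for illustration only.
training = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}

near = classify_with_reject((0.5, 0.5), training, max_distance=2.0)
far = classify_with_reject((20.0, 20.0), training, max_distance=2.0)
```

    Here `near` falls inside class "A", while `far` is rejected rather than being assigned to the closest known class, which is the behaviour an arrogant classifier lacks.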

  4. An evaluation of several different classification schemes - Their parameters and performance. [maximum likelihood decision for crop identification

    NASA Technical Reports Server (NTRS)

    Scholz, D.; Fuhs, N.; Hixson, M.

    1979-01-01

    The overall objective of this study was to apply and evaluate several of the currently available classification schemes for crop identification. The approaches examined were: (1) a per point Gaussian maximum likelihood classifier, (2) a per point sum of normal densities classifier, (3) a per point linear classifier, (4) a per point Gaussian maximum likelihood decision tree classifier, and (5) a texture sensitive per field Gaussian maximum likelihood classifier. Three agricultural data sets were used in the study: areas from Fayette County, Illinois, and Pottawattamie and Shelby Counties in Iowa. The segments were located in two distinct regions of the Corn Belt to sample variability in soils, climate, and agricultural practices.

  5. Multicentre prospective validation of a urinary peptidome-based classifier for the diagnosis of type 2 diabetic nephropathy

    PubMed Central

    Siwy, Justyna; Schanstra, Joost P.; Argiles, Angel; Bakker, Stephan J.L.; Beige, Joachim; Boucek, Petr; Brand, Korbinian; Delles, Christian; Duranton, Flore; Fernandez-Fernandez, Beatriz; Jankowski, Marie-Luise; Al Khatib, Mohammad; Kunt, Thomas; Lajer, Maria; Lichtinghagen, Ralf; Lindhardt, Morten; Maahs, David M; Mischak, Harald; Mullen, William; Navis, Gerjan; Noutsou, Marina; Ortiz, Alberto; Persson, Frederik; Petrie, John R.; Roob, Johannes M.; Rossing, Peter; Ruggenenti, Piero; Rychlik, Ivan; Serra, Andreas L.; Snell-Bergeon, Janet; Spasovski, Goce; Stojceva-Taneva, Olivera; Trillini, Matias; von der Leyen, Heiko; Winklhofer-Roob, Brigitte M.; Zürbig, Petra; Jankowski, Joachim

    2014-01-01

    Background Diabetic nephropathy (DN) is one of the major late complications of diabetes. Treatment aimed at slowing down the progression of DN is available but methods for early and definitive detection of DN progression are currently lacking. The ‘Proteomic prediction and Renin angiotensin aldosterone system Inhibition prevention Of early diabetic nephRopathy In TYpe 2 diabetic patients with normoalbuminuria trial’ (PRIORITY) aims to evaluate the early detection of DN in patients with type 2 diabetes (T2D) using a urinary proteome-based classifier (CKD273). Methods In this ancillary study of the recently initiated PRIORITY trial we aimed to validate for the first time the CKD273 classifier in a multicentre (9 different institutions providing samples from 165 T2D patients) prospective setting. In addition we also investigated the influence of sample containers, age and gender on the CKD273 classifier. Results We observed a high consistency of the CKD273 classification scores across the different centres with areas under the curves ranging from 0.95 to 1.00. The classifier was independent of age (range tested 16–89 years) and gender. Furthermore, the use of different urine storage containers did not affect the classification scores. Analysis of the distribution of the individual peptides of the classifier over the nine different centres showed that fragments of blood-derived and extracellular matrix proteins were the most consistently found. Conclusion We provide for the first time validation of this urinary proteome-based classifier in a multicentre prospective setting and show the suitability of the CKD273 classifier to be used in the PRIORITY trial. PMID:24589724

  6. The prevalence and determinants of overweight and obesity among French youths and adults with intellectual disabilities attending special education schools.

    PubMed

    Bégarie, Jérôme; Maïano, Christophe; Leconte, Pascale; Ninot, Grégory

    2013-05-01

    This study examines the prevalence of overweight and obesity and a panel of potential determinants among French youths and adults with an intellectual disability (ID). The sample used consisted of 1120 youths and adults with an ID, from 5 to 28 years old, attending a French special education school. The results indicated that 19.8% of the participants with an ID are classified as overweight and 8.6% as obese. Multivariate logistic regression analyses revealed that there are nearly three times more girls/women classified as overweight than boys/men. Additionally, they showed that there are nearly two times more participants from southern France classified as overweight than from northern France, and that the risk of being classified as overweight significantly increases with seniority in the school. Next, the interaction effects observed indicated first that there are nearly two times more boys/men on psychotropic medication classified as overweight than boys/men not on psychotropic medication. Second, they revealed that the odds of being classified as overweight for boys/men not on psychotropic medication are 47% lower than for girls/women not on psychotropic medication. Third, they indicated that there are nearly two times more boys/men from southern France classified as obese than boys/men from northern France. Fourth, they showed that the odds of being classified as obese for boys/men from northern France are 52% lower than for girls/women from northern France. In conclusion, these results should be viewed as preliminary and need to be replicated since, to our knowledge, this study is the first one to examine this topic while simultaneously controlling for all of the potential determinants and relying on a sample of youths and adults. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Comparison of four approaches to a rock facies classification problem

    USGS Publications Warehouse

    Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.

    2007-01-01

    In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and feed forward-back propagating artificial neural network. Determining the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field, in Southwest Kansas, was the objective. Study data include 3600 samples with known rock facies class (from core) with each sample having either four or five measured properties (wire-line log curves), and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high dimensional and not linearly correlated), and feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. © 2006 Elsevier Ltd. All rights reserved.

  8. Classifier-Guided Sampling for Complex Energy System Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Backlund, Peter B.; Eddy, John P.

    2015-09-01

    This report documents the results of a Laboratory Directed Research and Development (LDRD) effort entitled "Classifier-Guided Sampling for Complex Energy System Optimization" that was conducted during FY 2014 and FY 2015. The goal of this project was to develop, implement, and test major improvements to the classifier-guided sampling (CGS) algorithm. CGS is a type of evolutionary algorithm for performing search and optimization over a set of discrete design variables in the face of one or more objective functions. Existing evolutionary algorithms, such as genetic algorithms, may require a large number of objective function evaluations to identify optimal or near-optimal solutions. Reducing the number of evaluations can result in significant time savings, especially if the objective function is computationally expensive. CGS reduces the evaluation count by using a Bayesian network classifier to filter out non-promising candidate designs, prior to evaluation, based on their posterior probabilities. In this project, both the single-objective and multi-objective versions of the CGS are developed and tested on a set of benchmark problems. As a domain-specific case study, CGS is used to design a microgrid for use in islanded mode during an extended bulk power grid outage.

  9. Finding a Needle in a Haystack: Distinguishing Mexican Maize Landraces Using a Small Number of SNPs

    PubMed Central

    Caldu-Primo, Jose L.; Mastretta-Yanes, Alicia; Wegier, Ana; Piñero, Daniel

    2017-01-01

    In Mexico's territory, the center of origin and domestication of maize (Zea mays), there is a large phenotypic diversity of this crop. This diversity has been classified into “landraces.” Previous studies have reported that genomic variation in Mexican maize is better explained by environmental factors, particularly those related with altitude, than by landrace. Still, landraces are extensively used by agronomists, who recognize them as stable and discriminatory categories for the classification of samples. In order to investigate the genomic foundation of maize landraces, we analyzed genomic data (35,909 SNPs from Illumina MaizeSNP50 BeadChip) obtained from 50 samples representing five maize landraces (Comiteco, Conejo, Tehua, Zapalote Grande, and Zapalote Chico), and searched for markers suitable for landrace assignment. Landrace clusters could not be identified taking all the genomic information, but they become manifest taking only a subset of SNPs with high FST among landraces. Discriminant analysis of principal components was conducted to classify samples using SNP data. Two classification analyses were done, first classifying samples by landrace and then by altitude category. Through this classification method, we identified 20 landrace-informative SNPs and 14 altitude-informative SNPs, with only 6 SNPs in common for both analyses. These results show that Mexican maize phenotypic diversity can be classified in landraces using a small number of genomic markers, given the fact that landrace genomic diversity is influenced by environmental factors as well as artificial selection due to bio-cultural practices. PMID:28458682

  10. Benchmarking contactless acquisition sensor reproducibility for latent fingerprint trace evidence

    NASA Astrophysics Data System (ADS)

    Hildebrandt, Mario; Dittmann, Jana

    2015-03-01

    Optical, nano-meter range, contactless, non-destructive sensor devices are promising acquisition techniques in crime scene trace forensics, e.g. for digitizing latent fingerprint traces. Before new approaches are introduced in crime investigations, innovations need to be positively tested and quality ensured. In this paper we investigate sensor reproducibility by studying different scans from four sensors: two chromatic white light sensors (CWL600/CWL1mm), one confocal laser scanning microscope, and one NIR/VIS/UV reflection spectrometer. Firstly, we perform intra-sensor reproducibility testing for CWL600 with a privacy-conformant test set of artificial-sweat printed, computer generated fingerprints. We use 24 different fingerprint patterns as original samples (printing samples/templates) for printing with artificial sweat (physical trace samples) and their acquisition with contactless sensors, resulting in 96 sensor images, called scan or acquired samples. The second test set for inter-sensor reproducibility assessment consists of the first three patterns from the first test set, acquired in two consecutive scans using each device. We suggest using a simple feature space set in spatial and frequency domain known from signal processing and test its suitability for six different classifiers classifying scan data into small differences (reproducible) and large differences (non-reproducible). Furthermore, we suggest comparing the classification results with biometric verification scores (calculated with NBIS, with threshold of 40) as biometric reproducibility score. The Bagging classifier is in nearly all cases the most reliable classifier in our experiments, and the results are also confirmed by the biometric matching rates.

  11. Influence of diagnostic criteria on the interpretation of adrenal vein sampling.

    PubMed

    Lethielleux, Gaëlle; Amar, Laurence; Raynaud, Alain; Plouin, Pierre-François; Steichen, Olivier

    2015-04-01

    Guidelines promote the use of adrenal vein sampling (AVS) to document lateralized aldosterone hypersecretion in primary aldosteronism. However, there are large discrepancies between institutions in the criteria used to interpret its results. This study evaluates the consequences of these differences on the classification and management of patients. The results of all 537 AVS procedures performed between January 2001 and July 2010 in our institution were interpreted with 4 diagnostic criteria used in experienced institutions where AVS is performed without cosyntropin (Brisbane, Padua, Paris, and Turin) and with criteria proposed by a recent consensus statement. AVS procedures were classified as unsuccessful, lateralized, or not lateralized according to each set of criteria. Almost 5× more AVS procedures were classified as unsuccessful with the strictest criteria than with the least strict criteria (18% versus 4%, respectively). Similarly, over 2× more AVS procedures were classified as lateralized with the least stringent criteria than with the most stringent criteria (60% versus 26%, respectively). Multiple samples were available from ≥1 side for 155 AVS procedures. These procedures were classified differently by ≥2 right-left sample pairs in 12% to 20% of cases. Thus, different sets of criteria used to interpret AVS in experienced institutions translate into heterogeneous classifications and hence management decisions, for patients with primary aldosteronism. Defining the most appropriate procedures and diagnostic criteria is needed for AVS to achieve optimal performance and fully justify its status as a gold standard. © 2015 American Heart Association, Inc.

  12. Prevalence and etiology of epilepsy in a Norwegian county-A population based study.

    PubMed

    Syvertsen, Marte; Nakken, Karl Otto; Edland, Astrid; Hansen, Gunnar; Hellum, Morten Kristoffer; Koht, Jeanette

    2015-05-01

    Epilepsy represents a substantial personal and social burden worldwide. When addressing the multifaceted issues of epilepsy care, updated epidemiologic studies using recent guidelines are essential. The aim of this study was to find the prevalence and causes of epilepsy in a representative Norwegian county, implementing the new guidelines and terminology suggested by the International League Against Epilepsy (ILAE). Included in the study were all patients from Buskerud County in Norway with a diagnosis of epilepsy at Drammen Hospital and the National Center for Epilepsy at Oslo University Hospital. The study period was 1999-2014. Patients with active epilepsy were identified through a systematic review of medical records, containing information about case history, electroencephalography (EEG), cerebral magnetic resonance imaging (MRI), genetic tests, blood samples, treatment, and other investigations. Epilepsies were classified according to the revised terminology suggested by the ILAE in 2010. In a population of 272,228 inhabitants, 1,771 persons had active epilepsy. Point prevalence on January 1, 2014 was 0.65%. Of the subjects registered with a diagnostic code of epilepsy, 20% did not fulfill the ILAE criteria of the diagnosis. Epilepsy etiology was structural-metabolic in 43%, genetic/presumed genetic in 20%, and unknown in 32%. Due to lack of information, etiology could not be determined in 4%. Epilepsy is a common disorder, affecting 0.65% of the subjects in this cohort. Every fifth subject registered with a diagnosis of epilepsy was misdiagnosed. In those with a reliable epilepsy diagnosis, every third patient had an unknown etiology. Future advances in genetic research will probably lead to an increased identification of genetic and hopefully treatable causes of epilepsy. Wiley Periodicals, Inc. © 2015 International League Against Epilepsy.
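
    The reported point prevalence follows directly from the two counts given in the abstract. The short check below reproduces that arithmetic; the variable names are illustrative only.

```python
# Counts taken from the abstract above.
population = 272_228      # inhabitants of the county
active_cases = 1_771      # persons with active epilepsy

# Point prevalence = cases / population at a single point in time.
prevalence = active_cases / population
prevalence_pct = round(100 * prevalence, 2)   # as a percentage
per_1000 = round(1000 * prevalence, 1)        # cases per 1,000 inhabitants

print(f"{prevalence_pct}% ({per_1000} per 1,000 inhabitants)")
```

    The result agrees with the 0.65% quoted in the abstract.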

  13. Digital microbiology: detection and classification of unknown bacterial pathogens using a label-free laser light scatter-sensing system

    NASA Astrophysics Data System (ADS)

    Rajwa, Bartek; Dundar, M. Murat; Akova, Ferit; Patsekin, Valery; Bae, Euiwon; Tang, Yanjie; Dietz, J. Eric; Hirleman, E. Daniel; Robinson, J. Paul; Bhunia, Arun K.

    2011-06-01

    The majority of tools for pathogen sensing and recognition are based on physiological or genetic properties of microorganisms. However, there is enormous interest in devising label-free and reagentless biosensors that would operate utilizing the biophysical signatures of samples without the need for labeling and reporting biochemistry. Optical biosensors are closest to realizing this goal and vibrational spectroscopies are examples of well-established optical label-free biosensing techniques. A recently introduced forward-scatter phenotyping (FSP) also belongs to the broad class of optical sensors. However, in contrast to spectroscopies, the remarkable specificity of FSP derives from the morphological information that bacterial material encodes on a coherent optical wavefront passing through the colony. The system collects elastically scattered light patterns that, given a constant environment, are unique to each bacterial species and/or serovar. Both FSP technology and spectroscopies rely on statistical machine learning to perform recognition and classification. However, the commonly used methods utilize either simplistic unsupervised learning or traditional supervised techniques that assume completeness of training libraries. This restrictive assumption is known to be false for real-life conditions, resulting in unsatisfactory levels of accuracy, and consequently limited overall performance for biodetection and classification tasks. The presented work demonstrates preliminary studies on the use of FSP system to classify selected serotypes of non-O157 Shiga toxin-producing E. coli in a nonexhaustive framework, that is, without full knowledge about all the possible classes that can be encountered. Our study uses a Bayesian approach to learning with a nonexhaustive training dataset to allow for the automated and distributed detection of unknown bacterial classes.

  14. Ionic liquid-based reagents improve the stability of midterm fecal sample storage.

    PubMed

    Hao, Lilan; Xia, Zhongkui; Yang, Huanming; Wang, Jian; Han, Mo

    2017-08-01

    Fecal samples are widely used in metagenomic research, which aims to elucidate the relationship between human health and the intestinal microbiota. However, the best conditions for stable and reliable storage and transport of these samples at room temperature are still unknown, and whether samples stored at room temperature for several days will maintain their microbiota composition is still unknown. Here, we established and tested a preservation method using reagents containing imidazolium- or pyridinium-based ionic liquids. We stored human fecal samples in these reagents for up to 7 days at different temperatures. Subsequently, all samples were sequenced and compared with fresh samples and/or samples treated under other conditions. The 16S rRNA sequencing results suggested that ionic liquid-based reagents could stabilize the composition of the microbiota in fecal samples during a 7-day storage period, particularly when stored at room temperature. Thus, this method may have implications in the storage of fecal samples for metagenomic research. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Classification of time-of-flight secondary ion mass spectrometry spectra from complex Cu-Fe sulphides by principal component analysis and artificial neural networks.

    PubMed

    Kalegowda, Yogesh; Harmer, Sarah L

    2013-01-08

    Artificial neural network (ANN) and a hybrid principal component analysis-artificial neural network (PCA-ANN) classifiers have been successfully implemented for classification of static time-of-flight secondary ion mass spectrometry (ToF-SIMS) mass spectra collected from complex Cu-Fe sulphides (chalcopyrite, bornite, chalcocite and pyrite) at different flotation conditions. ANNs are very good pattern classifiers because of: their ability to learn and generalise patterns that are not linearly separable; their fault and noise tolerance capability; and high parallelism. In the first approach, fragments from the whole ToF-SIMS spectrum were used as input to the ANN, the model yielded high overall correct classification rates of 100% for feed samples, 88% for conditioned feed samples and 91% for Eh modified samples. In the second approach, the hybrid pattern classifier PCA-ANN was integrated. PCA is a very effective multivariate data analysis tool applied to enhance species features and reduce data dimensionality. Principal component (PC) scores which accounted for 95% of the raw spectral data variance, were used as input to the ANN, the model yielded high overall correct classification rates of 88% for conditioned feed samples and 95% for Eh modified samples. Copyright © 2012 Elsevier B.V. All rights reserved.
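
    The second approach keeps only the principal component scores that together account for 95% of the variance. How such a cut-off can be chosen is sketched below under stated assumptions: the explained-variance ratios are invented for illustration and the helper name `components_for_variance` is hypothetical; neither comes from the paper.

```python
def components_for_variance(explained_ratios, target=0.95):
    """Return the smallest number of leading principal components whose
    cumulative explained-variance ratio reaches the target.

    explained_ratios: per-component variance fractions, sorted in
    descending order and summing to (about) 1.
    """
    cumulative = 0.0
    for count, ratio in enumerate(explained_ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return count
    raise ValueError("target variance is not reached by the given components")

# Invented ratios, for illustration only (not ToF-SIMS data).
ratios = [0.60, 0.20, 0.10, 0.06, 0.03, 0.01]
n_components = components_for_variance(ratios, target=0.95)
```

    With these made-up ratios the first three components cover 90% of the variance and the fourth pushes the total past 95%, so four scores per spectrum would be passed to the ANN.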

  16. Sample size determination for disease prevalence studies with partially validated data.

    PubMed

    Qiu, Shi-Fang; Poon, Wai-Yin; Tang, Man-Lai

    2016-02-01

    Disease prevalence is an important topic in medical research, and its study is based on data that are obtained by classifying subjects according to whether a disease has been contracted. Classification can be conducted with high-cost gold standard tests or low-cost screening tests, but the latter are subject to the misclassification of subjects. As a compromise between the two, many research studies use partially validated datasets in which all data points are classified by fallible tests, and some of the data points are validated in the sense that they are also classified by the completely accurate gold-standard test. In this article, we investigate the determination of sample sizes for disease prevalence studies with partially validated data. We use two approaches. The first is to find sample sizes that can achieve a pre-specified power of a statistical test at a chosen significance level, and the second is to find sample sizes that can control the width of a confidence interval with a pre-specified confidence level. Empirical studies have been conducted to demonstrate the performance of various testing procedures with the proposed sample sizes. The applicability of the proposed methods is illustrated by a real-data example. © The Author(s) 2012.
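
    For orientation, the confidence-interval approach can be compared with the classical sample-size formula for a single proportion measured by an error-free test, n = z² p(1 − p) / d², where d is the desired half-width. The sketch below implements only that textbook baseline; it ignores misclassification and partial validation, so it is not the method proposed in the article, and the function name and example values are assumptions.

```python
import math
from statistics import NormalDist

def sample_size_for_ci(expected_prevalence, half_width, confidence=0.95):
    """Classical sample size so that a Wald confidence interval for a
    proportion has the requested half-width.

    Assumes a perfectly accurate test and simple random sampling.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_prevalence
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)

# Worst case p = 0.5 with a +/- 5 percentage point interval.
n_worst_case = sample_size_for_ci(0.5, 0.05)
# A rarer disease (assumed 10% prevalence) with a +/- 2 point interval.
n_rare = sample_size_for_ci(0.10, 0.02)
```

    When a fallible screening test is used, the required sample size is generally larger than this baseline, which is the situation the article addresses.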

  17. A Structure-Adaptive Hybrid RBF-BP Classifier with an Optimized Learning Strategy

    PubMed Central

    Wen, Hui; Xie, Weixin; Pei, Jihong

    2016-01-01

    This paper presents a structure-adaptive hybrid RBF-BP (SAHRBF-BP) classifier with an optimized learning strategy. SAHRBF-BP is composed of a structure-adaptive RBF network and a BP network of cascade, where the number of RBF hidden nodes is adjusted adaptively according to the distribution of sample space, the adaptive RBF network is used for nonlinear kernel mapping and the BP network is used for nonlinear classification. The optimized learning strategy is as follows: firstly, a potential function is introduced into training sample space to adaptively determine the number of initial RBF hidden nodes and node parameters, and a form of heterogeneous samples repulsive force is designed to further optimize each generated RBF hidden node parameters, the optimized structure-adaptive RBF network is used for adaptively nonlinear mapping the sample space; then, according to the number of adaptively generated RBF hidden nodes, the number of subsequent BP input nodes can be determined, and the overall SAHRBF-BP classifier is built up; finally, different training sample sets are used to train the BP network parameters in SAHRBF-BP. Compared with other algorithms applied to different data sets, experiments show the superiority of SAHRBF-BP. Especially on most low dimensional and large number of data sets, the classification performance of SAHRBF-BP outperforms other training SLFNs algorithms. PMID:27792737

  18. Classification of ductal carcinoma in situ by gene expression profiling.

    PubMed

    Hannemann, Juliane; Velds, Arno; Halfwerk, Johannes B G; Kreike, Bas; Peterse, Johannes L; van de Vijver, Marc J

    2006-01-01

    Ductal carcinoma in situ (DCIS) is characterised by the intraductal proliferation of malignant epithelial cells. Several histological classification systems have been developed, but assessing the histological type/grade of DCIS lesions is still challenging, making treatment decisions based on these features difficult. To obtain insight in the molecular basis of the development of different types of DCIS and its progression to invasive breast cancer, we have studied differences in gene expression between different types of DCIS and between DCIS and invasive breast carcinomas. Gene expression profiling using microarray analysis has been performed on 40 in situ and 40 invasive breast cancer cases. DCIS cases were classified as well- (n = 6), intermediately (n = 18), and poorly (n = 14) differentiated type. Of the 40 invasive breast cancer samples, five samples were grade I, 11 samples were grade II, and 24 samples were grade III. Using two-dimensional hierarchical clustering, the basal-like type, ERB-B2 type, and the luminal-type tumours originally described for invasive breast cancer could also be identified in DCIS. Using supervised classification, we identified a gene expression classifier of 35 genes, which differed between DCIS and invasive breast cancer; a classifier of 43 genes could be identified separating between well- and poorly differentiated DCIS samples.

  19. Classification of ductal carcinoma in situ by gene expression profiling

    PubMed Central

    Hannemann, Juliane; Velds, Arno; Halfwerk, Johannes BG; Kreike, Bas; Peterse, Johannes L; van de Vijver, Marc J

    2006-01-01

    Introduction Ductal carcinoma in situ (DCIS) is characterised by the intraductal proliferation of malignant epithelial cells. Several histological classification systems have been developed, but assessing the histological type/grade of DCIS lesions is still challenging, making treatment decisions based on these features difficult. To obtain insight in the molecular basis of the development of different types of DCIS and its progression to invasive breast cancer, we have studied differences in gene expression between different types of DCIS and between DCIS and invasive breast carcinomas. Methods Gene expression profiling using microarray analysis has been performed on 40 in situ and 40 invasive breast cancer cases. Results DCIS cases were classified as well- (n = 6), intermediately (n = 18), and poorly (n = 14) differentiated type. Of the 40 invasive breast cancer samples, five samples were grade I, 11 samples were grade II, and 24 samples were grade III. Using two-dimensional hierarchical clustering, the basal-like type, ERB-B2 type, and the luminal-type tumours originally described for invasive breast cancer could also be identified in DCIS. Conclusion Using supervised classification, we identified a gene expression classifier of 35 genes, which differed between DCIS and invasive breast cancer; a classifier of 43 genes could be identified separating between well- and poorly differentiated DCIS samples. PMID:17069663

  20. Clustered lot quality assurance sampling: a pragmatic tool for timely assessment of vaccination coverage.

    PubMed

    Greenland, K; Rondy, M; Chevez, A; Sadozai, N; Gasasira, A; Abanida, E A; Pate, M A; Ronveaux, O; Okayasu, H; Pedalino, B; Pezzoli, L

    2011-07-01

    To evaluate oral poliovirus vaccine (OPV) coverage of the November 2009 round in five Northern Nigeria states with ongoing wild poliovirus transmission using clustered lot quality assurance sampling (CLQAS). We selected four local government areas in each pre-selected state and sampled six clusters of 10 children in each Local Government Area, defined as the lot area. We used three decision thresholds to classify OPV coverage: 75-90%, 55-70% and 35-50%. A full lot was completed, but we also assessed in retrospect the potential time-saving benefits of stopping sampling when a lot had been classified. We accepted two local government areas (LGAs) with vaccination coverage above 75%. Of the remaining 18 rejected LGAs, 11 also failed to reach 70% coverage, of which four also failed to reach 50%. The average time taken to complete a lot was 10 h. By stopping sampling when a decision was reached, we could have classified lots in 5.3, 7.7 and 7.3 h on average at the 90%, 70% and 50% coverage targets, respectively. Clustered lot quality assurance sampling was feasible and useful to estimate OPV coverage in Northern Nigeria. The multi-threshold approach provided useful information on the variation of IPD vaccination coverage. CLQAS is a very timely tool, allowing corrective actions to be directly taken in insufficiently covered areas. © 2011 Blackwell Publishing Ltd.
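
    The early-stopping idea described above (stop sampling as soon as a lot can be classified) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `classify_lot` and the decision value `max_unvaccinated` are hypothetical, and the abstract does not state the decision values that were actually used.

```python
from typing import Iterable, Tuple


def classify_lot(vaccinated: Iterable[bool], lot_size: int,
                 max_unvaccinated: int) -> Tuple[str, int]:
    """Classify one lot, stopping as soon as the outcome is certain.

    `max_unvaccinated` is a hypothetical decision value: the lot is rejected
    once more than this many unvaccinated children have been found.
    Returns (decision, number_of_children_sampled).
    """
    unvaccinated = 0
    sampled = 0
    for is_vaccinated in vaccinated:
        sampled += 1
        if not is_vaccinated:
            unvaccinated += 1
        # Reject early: the decision value has already been exceeded.
        if unvaccinated > max_unvaccinated:
            return "reject", sampled
        # Accept early: even if every remaining child were unvaccinated,
        # the decision value could no longer be exceeded.
        remaining = lot_size - sampled
        if unvaccinated + remaining <= max_unvaccinated:
            return "accept", sampled
    # Fewer observations than lot_size were supplied and no decision was reached.
    return "undecided", sampled
```

    With a lot of 60 children (six clusters of 10, as in the study) and an assumed decision value of 9, a lot in which every child is unvaccinated is rejected after 10 children, while a fully vaccinated lot is accepted after 51.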

  1. The classification of secondary colorectal liver cancer in human biopsy samples using angular dispersive x-ray diffraction and multivariate analysis

    NASA Astrophysics Data System (ADS)

    Theodorakou, Chrysoula; Farquharson, Michael J.

    2009-08-01

    The motivation behind this study is to assess whether angular dispersive x-ray diffraction (ADXRD) data, processed using multivariate analysis techniques, can be used for classifying secondary colorectal liver cancer tissue and normal surrounding liver tissue in human liver biopsy samples. The ADXRD profiles from a total of 60 samples of normal liver tissue and colorectal liver metastases were measured using a synchrotron radiation source. The data were analysed for 56 samples using nonlinear peak-fitting software. Four peaks were fitted to all of the ADXRD profiles, and the amplitude, area, amplitude and area ratios for three of the four peaks were calculated and used for the statistical and multivariate analysis. The statistical analysis showed that there are significant differences between all the peak-fitting parameters and ratios between the normal and the diseased tissue groups. The technique of soft independent modelling of class analogy (SIMCA) was used to classify normal liver tissue and colorectal liver metastases resulting in 67% of the normal tissue samples and 60% of the secondary colorectal liver tissue samples being classified correctly. This study has shown that the ADXRD data of normal and secondary colorectal liver cancer are statistically different and x-ray diffraction data analysed using multivariate analysis have the potential to be used as a method of tissue classification.

  2. Industrial Application of Valuable Materials Generated from PLK Rock-A Bauxite Mining Waste

    NASA Astrophysics Data System (ADS)

    Swain, Ranjita; Routray, Sunita; Mohapatra, Abhisek; Ranjan Patra, Biswa

    2018-03-01

    PLK rock was classified into two products after selective grinding to a particular size fraction. The rock was ground to below 45 μm and then fed to a classifier, i.e. a hydrocyclone. The ground product was classified using different apex and vortex finder sizes. A pressure gauge was attached to measure the pressure. The production of fines increased with increasing vortex finder diameter. To increase the feed capacity of the hydrocyclone, a vortex finder diameter of 11.1 mm and a spigot diameter of 8.0 mm were considered the optimum condition for recovery of fines from the PLK rock sample. The overflow sample contains 5.39% iron oxide (Fe2O3) with 0.97% TiO2, and the underflow sample contains 1.87% Fe2O3 with 2.39% TiO2. The cut point, or separation size, of the overflow sample is 25 μm. The efficiency of separation, or the so-called imperfection I, is at 6 μm size. In this study, the iron oxide content of the underflow sample is less than 2%, which makes it suitable for refractory applications. The overflow sample is very fine and can also serve as a raw material for the ceramic industry as well as for cosmetic products.

  3. Bayes-LQAS: classifying the prevalence of global acute malnutrition

    PubMed Central

    2010-01-01

    Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications. PMID:20534159

  4. Bayes-LQAS: classifying the prevalence of global acute malnutrition.

    PubMed

    Olives, Casey; Pagano, Marcello

    2010-06-09

    Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications.
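
    To illustrate how prior information can enter an LQAS-style classification, the sketch below computes the posterior probability that prevalence exceeds a threshold under a Beta prior and binomial sampling. It is an assumption-laden illustration rather than the B-LQAS procedure of the paper: the Beta-binomial model, the function name `posterior_prob_above`, and the simple midpoint integration are choices made here for clarity.

```python
import math


def posterior_prob_above(cases: int, sample_size: int, threshold: float,
                         prior_a: float = 1.0, prior_b: float = 1.0,
                         steps: int = 20000) -> float:
    """P(prevalence > threshold | data) under a Beta(prior_a, prior_b) prior.

    The posterior is Beta(prior_a + cases, prior_b + sample_size - cases);
    its upper tail is integrated numerically with the midpoint rule.
    """
    a = prior_a + cases
    b = prior_b + sample_size - cases
    # Log of the Beta normalising constant 1 / B(a, b).
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = (1.0 - threshold) / steps
    total = 0.0
    for i in range(steps):
        p = threshold + (i + 0.5) * width  # midpoint, strictly inside (0, 1)
        total += math.exp(log_norm + (a - 1) * math.log(p)
                          + (b - 1) * math.log(1.0 - p))
    return total * width
```

    A uniform prior (`prior_a = prior_b = 1`) plays the role of a non-informative prior; passing other values shows how an informative prior shifts the posterior probability and hence the classification.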

  5. Enhanced Mass Defect Filtering To Simplify and Classify Complex Mixtures of Lignin Degradation Products.

    PubMed

    Dier, Tobias K F; Egele, Kerstin; Fossog, Verlaine; Hempelmann, Rolf; Volmer, Dietrich A

    2016-01-19

    High resolution mass spectrometry was utilized to study the highly complex product mixtures resulting from electrochemical breakdown of lignin. As most of the chemical structures of the degradation products were unknown, enhanced mass defect filtering techniques were implemented to simplify the characterization of the mixtures. It was shown that the implemented ionization techniques had a major impact on the range of detectable breakdown products, with atmospheric pressure photoionization in negative ionization mode providing the widest coverage in our experiments. Different modified Kendrick mass plots were used as a basis for mass defect filtering, where Kendrick mass defect and the mass defect of the lignin-specific guaiacol (C7H7O2) monomeric unit were utilized, readily allowing class assignments independent of the oligomeric state of the product. The enhanced mass defect filtering strategy therefore provided rapid characterization of the sample composition. In addition, the structural similarities between the compounds within a degradation sequence were determined by comparison to a tentatively identified product of this compound series. In general, our analyses revealed that primarily breakdown products with low oxygen content were formed under electrochemical conditions using protic ionic liquids as solvent for lignin.
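
    The mass defect calculation behind such plots can be sketched in a few lines. This shows only the generic Kendrick rescaling with a user-chosen base unit, not the authors' modified plots; the helper names are hypothetical and the sign convention (rounded Kendrick mass minus Kendrick mass) is one of several in use.

```python
# Monoisotopic and nominal masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
NOMINAL = {"C": 12, "H": 1, "O": 16}

# Guaiacol-type monomeric unit named in the abstract.
GUAIACOL_UNIT = {"C": 7, "H": 7, "O": 2}


def unit_masses(formula):
    """Return (exact mass, nominal mass) of a formula given as {element: count}."""
    exact = sum(MONOISOTOPIC[el] * n for el, n in formula.items())
    nominal = sum(NOMINAL[el] * n for el, n in formula.items())
    return exact, nominal


def kendrick_mass_defect(mass, base_formula):
    """Rescale `mass` so the base unit has an integer mass, then take the defect."""
    exact, nominal = unit_masses(base_formula)
    kendrick_mass = mass * nominal / exact
    return round(kendrick_mass) - kendrick_mass
```

    Two species that differ only by whole base units share the same mass defect, which is why the defect can be used for class assignment independent of the oligomeric state.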

  6. Scene Segmentation For Autonomous Robotic Navigation Using Sequential Laser Projected Structured Light

    NASA Astrophysics Data System (ADS)

    Brown, C. David; Ih, Charles S.; Arce, Gonzalo R.; Fertell, David A.

    1987-01-01

    Vision systems for mobile robots or autonomous vehicles navigating in an unknown terrain environment must provide a rapid and accurate method of segmenting the scene ahead into regions of pathway and background. A major distinguishing feature between the pathway and background is the three dimensional texture of these two regions. Typical methods of textural image segmentation are very computationally intensive, often lack the required robustness, and are incapable of sensing the three dimensional texture of various regions of the scene. A method is presented where scanned laser projected lines of structured light, viewed by a stereoscopically located single video camera, resulted in an image in which the three dimensional characteristics of the scene were represented by the discontinuity of the projected lines. This image was conducive to processing with simple regional operators to classify regions as pathway or background. Design of some operators and application methods, and demonstration on sample images are presented. This method provides rapid and robust scene segmentation capability that has been implemented on a microcomputer in near real time, and should result in higher speed and more reliable robotic or autonomous navigation in unstructured environments.

  7. Remarkable difference of somatic mutation patterns between oncogenes and tumor suppressor genes.

    PubMed

    Liu, Haoxuan; Xing, Yuhang; Yang, Sihai; Tian, Dacheng

    2011-12-01

    Cancers arise owing to mutations that confer selective growth advantages on the cells in a subset of tumor suppressor and/or oncogenes. To understand oncogenesis and diagnose cancers, it is crucial to discriminate these two groups of genes by using the difference in their mutation patterns. Here, we investigated >120,000 mutation samples in 66 well-known tumor suppressor genes and oncogenes of the COSMIC database, and found a set of significant differences in mutation patterns (e.g., non-3n-indel, non-sense SNP and mutation hotspot) between them. By screening the best measurement, we developed indices to readily distinguish one from another and predict clearly the unknown oncogenesis genes as tumor suppressors (e.g., ASXL1, HNF1A and KDM6A) or oncogenes (e.g., FOXL2, MYD88 and TSHR). Based on our results, a third gene group can be classified, which has a mutational pattern between tumor suppressors and oncogenes. The concept of the third gene group could help to understand gene function in different cancers or individual patients and to know the exact function of genes in oncogenesis. In conclusion, our study provides further insights into cancer-related genes and identifies several potential therapeutic targets.

  8. An exploratory study of clinical measures associated with subsyndromal pathological gambling in patients with binge eating disorder.

    PubMed

    Yip, Sarah W; White, Marney A; Grilo, Carlos M; Potenza, Marc N

    2011-06-01

    Both binge eating disorder (BED) and pathological gambling (PG) are characterized by impairments in impulse control. Subsyndromal levels of PG have been associated with measures of adverse health. The nature and significance of PG features in individuals with BED is unknown. Ninety-four patients with BED (28 men and 66 women) were classified by gambling group based on inclusionary criteria for Diagnostic and Statistical Manual-IV (DSM-IV) PG and compared on a range of behavioral, psychological and eating disorder (ED) psychopathology variables. One individual (1.1% of the sample) met criteria for PG, although 18.7% of patients with BED displayed one or more DSM-IV criteria for PG, hereafter referred to as problem gambling features. Men were more likely than women to have problem gambling features. BED patients with problem gambling features were distinguished by lower self-esteem and greater substance problem use. After controlling for gender, findings of reduced self-esteem and increased substance problem use among patients with problem gambling features remained significant. In patients with BED, problem gambling features are associated with a number of heightened clinical problems.

  9. Tactile Evaluation Feedback System for Multi-Layered Structure Inspired by Human Tactile Perception Mechanism.

    PubMed

    Hashim, Iza Husna Mohamad; Kumamoto, Shogo; Takemura, Kenjiro; Maeno, Takashi; Okuda, Shin; Mori, Yukio

    2017-11-11

    Tactile sensation is one type of valuable feedback in evaluating a product. Conventionally, sensory evaluation is used to get direct subjective responses from the consumers, in order to improve the product's quality. However, this method is a time-consuming and costly process. Therefore, this paper proposes a novel tactile evaluation system that can give tactile feedback from a sensor's output. The main concept of this system is hierarchically layering the tactile sensation, which is inspired by the flow of human perception. The tactile sensation is classified from low-order of tactile sensation (LTS) to high-order of tactile sensation (HTS), and also to preference. Here, LTS will be correlated with physical measures. Furthermore, the physical measures that are used to correlate with LTS are selected based on four main aspects of haptic information (roughness, compliance, coldness, and slipperiness), which are perceived through human tactile sensors. By using statistical analysis, the correlation between each hierarchy was obtained, and the preference was derived in terms of physical measures. A verification test was conducted by using unknown samples to determine the reliability of the system. The results showed that the system developed was capable of estimating preference with an accuracy of approximately 80%.

  10. ANN expert system screening for illicit amphetamines using molecular descriptors

    NASA Astrophysics Data System (ADS)

    Gosav, S.; Praisler, M.; Dorohoi, D. O.

    2007-05-01

    The goal of this study was to develop an artificial neural network (ANN) based on computed descriptors, which would be able to classify the molecular structures of potential illicit amphetamines and to derive their biological activity according to the similarity of their molecular structure with amphetamines of known toxicity. The system is necessary for testing new molecular structures for epidemiological, clinical, and forensic purposes. It was built using a database formed by 146 compounds representing drugs of abuse (mainly central stimulants, hallucinogens, sympathomimetic amines, narcotics and other potent analgesics), precursors, or derivatized counterparts. Their molecular structures were characterized by computing three types of descriptors: 38 constitutional descriptors (CDs), 69 topological descriptors (TDs) and 160 3D-MoRSE descriptors (3DDs). An ANN system was built for each category of variables. All three networks (CD-NN, TD-NN and 3DD-NN) were trained to distinguish between stimulant amphetamines, hallucinogenic amphetamines, and nonamphetamines. A selection of variables was performed when necessary. The efficiency with which each network identifies the class identity of an unknown sample was evaluated by calculating several figures of merit. The results of the comparative analysis are presented.

  11. A Portable Electronic Nose for Toxic Vapor Detection, Identification, and Quantification

    NASA Technical Reports Server (NTRS)

    Linnell, B. R.; Young, R. C.; Griffin, T. P.; Meneghelli, B. J.; Peterson, B. V.; Brooks, K. B.

    2005-01-01

    The Space Program and military use large quantities of hydrazine and monomethyl hydrazine as rocket propellant, which are very toxic and suspected human carcinogens. Current off-the-shelf portable instruments require 10 to 20 minutes of exposure to detect these compounds at the minimum required concentrations and are prone to false positives, making them unacceptable for many operations. In addition, post-mission analyses of grab bag air samples from the Shuttle have confirmed the occasional presence of on-board volatile organic contaminants, which also need to be monitored to ensure crew safety. A new prototype instrument based on electronic nose (e-nose) technology has demonstrated the ability to qualify (identify) and quantify many of these vapors at their minimum required concentrations, and may easily be adapted to detect many other toxic vapors. To do this, it was necessary to develop algorithms to classify unknown vapors, recognize when a vapor is not any of the vapors of interest, and estimate the concentrations of the contaminants. This paper describes the design of the portable e-nose instrument, test equipment setup, test protocols, pattern recognition algorithms, concentration estimation methods, and laboratory test results.

  12. Serum calprotectin levels correlate with biochemical and histological markers of disease activity in TNBS colitis

    PubMed Central

    Cury, Didia Bismara; Mizsputen, Sender Jankiel; Versolato, Clara; Miiji, Luciana Odashiro; Pereira, Edson; Delboni, Maria Aparecida; Schor, Nestor; Moss, Alan C.

    2014-01-01

    Background and aim Serum calprotectin is elevated in patients with inflammatory bowel disease (IBD). Whether it correlates with other markers of disease activity is unknown. The aim of this study was to correlate serum calprotectin with biochemical and histological measures of intestinal inflammation. Materials and methods TNBS colitis was induced in Wistar rats, and serial blood samples were collected at 0, 3, and 12 days. Animals were subsequently sacrificed for pathological evaluation at day 12. Serum calprotectin and cytokines were measured by ELISA. Pathologic changes were classified at the macroscopic and microscopic levels. Results TNBS colitis induced elevated serum calprotectin, TNF and IL-6 within 24 h. Levels of serum calprotectin remained elevated in parallel to persistence of loose stool and weight loss to day 12. Serum calprotectin levels correlated with serum levels of TNF-α and IL-6 (p < 0.001), but not CRP. Animals with liquid stool had significantly higher levels of serum calprotectin than control animals. There was a correlation between macroscopic colitis scores and levels of serum calprotectin. Conclusion Serum calprotectin levels correlate with biochemical and histological markers of inflammation in TNBS colitis. This biomarker may have potential for diagnostic use in patients with IBD. PMID:23685388

  13. The diagnostic value of interleukin-6 and interleukin-8 for early prediction of bacteremia and sepsis in children with febrile neutropenia and cancer.

    PubMed

    Urbonas, Vincas; Eidukaitė, Audronė; Tamulienė, Indrė

    2012-03-01

    Early diagnosis of sepsis in children with febrile neutropenia and cancer still remains a challenge for modern medicine because of lack of specific laboratory markers and clinical signs especially at the beginning of the infection. The objective of this study was to evaluate the ability of interleukin-6 and interleukin-8 to predict bacteremia and sepsis during the first 2 days in oncohematologic patients with febrile neutropenia. A total of 61 febrile neutropenic episodes in 37 children were studied. Serum samples were collected on day 1 and day 2 from the onset of fever and analyzed using an automated random access analyzer. Neutropenic children with febrile episodes were classified into the following 2 groups: (1) fever of unknown origin group--patients with a negative blood culture--and (2) bacteremia/sepsis group--patients with a positive blood culture or clinical sepsis. High negative predictive values were found on day 1 for interleukin-6 and interleukin-8 (89% and 82%, respectively) for exclusion of bacteremia/sepsis. These interleukins could be used as a screening tool for the rejection of sepsis or bacteremia on the first day of fever in neutropenic children with cancer.
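
    For readers unfamiliar with the metric, the negative predictive value reported above is the fraction of test-negative episodes that truly have no bacteremia/sepsis. The sketch below shows the calculation; the function name, the cutoff, and any marker values passed to it are hypothetical and are not data from the study.

```python
def negative_predictive_value(marker_levels, has_sepsis, cutoff):
    """NPV = true negatives / (true negatives + false negatives).

    An episode counts as test-negative when its marker level is below
    `cutoff` (a hypothetical threshold chosen by the caller).
    """
    true_negatives = 0
    false_negatives = 0
    for level, sick in zip(marker_levels, has_sepsis):
        if level < cutoff:  # test-negative episode
            if sick:
                false_negatives += 1
            else:
                true_negatives += 1
    negatives = true_negatives + false_negatives
    if negatives == 0:
        raise ValueError("no test-negative episodes; NPV is undefined")
    return true_negatives / negatives
```

    A high NPV supports using the marker to rule out sepsis, which matches the screening role proposed in the abstract.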

  14. [Food allergy or food intolerance?].

    PubMed

    Maître, S; Maniu, C-M; Buss, G; Maillard, M H; Spertini, F; Ribi, C

    2014-04-16

    Adverse food reactions can be classified into two main categories depending on whether an immune mechanism is involved or not. The first category includes immune-mediated reactions such as IgE-mediated food allergy, eosinophilic oesophagitis, food protein-induced enterocolitis syndrome and celiac disease. The second category comprises non-immune-mediated adverse food reactions, also called food intolerances. Intoxications and pharmacologic, metabolic, physiologic and psychologic reactions, as well as reactions with an unknown mechanism, belong to this category. We present a classification of adverse food reactions based on the pathophysiologic mechanism that can be useful for both the diagnostic approach and management.

  15. [Congenital esophageal stenosis: a case report].

    PubMed

    Oquendo, Raquel; Resumil, Gisela; Villafañe, Vanesa; Flores, Mariana; Navacchia, Daniel; Quintana, Carlos

    2014-03-01

    Congenital esophageal stenosis, a rare disease of unknown cause for which reports have increased in the last few years, requires a high index of suspicion for its diagnosis and treatment. It can be classified into three types based on the etiology of the stenosis: tracheobronchial rest, fibromuscular hypertrophy and membranous diaphragm. Symptoms may vary depending on the location and severity of the stenosis. Treatment options are based on clinical suspicion of the histologic type and include balloon dilation or surgical resection of the stenotic segment. The definitive diagnosis is made by histological study.

  16. Molecular toolbox for the identification of unknown genetically modified organisms.

    PubMed

    Ruttink, Tom; Demeyer, Rolinde; Van Gulck, Elke; Van Droogenbroeck, Bart; Querci, Maddalena; Taverniers, Isabel; De Loose, Marc

    2010-03-01

    Competent laboratories monitor genetically modified organisms (GMOs) and products derived thereof in the food and feed chain in the framework of labeling and traceability legislation. In addition, screening is performed to detect the unauthorized presence of GMOs including asynchronously authorized GMOs or GMOs that are not officially registered for commercialization (unknown GMOs). Currently, unauthorized or unknown events are detected by screening blind samples for commonly used transgenic elements, such as p35S or t-nos. If (1) positive detection of such screening elements shows the presence of transgenic material and (2) all known GMOs are tested by event-specific methods but are not detected, then the presence of an unknown GMO is inferred. However, such evidence is indirect because it is based on negative observations and inconclusive because the procedure does not identify the causative event per se. In addition, detection of unknown events is hampered in products that also contain known authorized events. Here, we outline alternative approaches for analytical detection and GMO identification and develop new methods to complement the existing routine screening procedure. We developed a fluorescent anchor-polymerase chain reaction (PCR) method for the identification of the sequences flanking the p35S and t-nos screening elements. Thus, anchor-PCR fingerprinting allows the detection of unique discriminative signals per event. In addition, we established a collection of in silico calculated fingerprints of known events to support interpretation of experimentally generated anchor-PCR GM fingerprints of blind samples. Here, we first describe the molecular characterization of a novel GMO, which expresses recombinant human intrinsic factor in Arabidopsis thaliana. Next, we purposefully treated the novel GMO as a blind sample to simulate how the new methods lead to the molecular identification of a novel unknown event without prior knowledge of its transgene sequence. The results demonstrate that the new methods complement routine screening procedures by providing direct conclusive evidence and may also be useful to resolve masking of unknown events by known events.

  17. Front-End Processing of Cell Lysates for Enhanced Chip-Based Detection

    DTIC Science & Technology

    2006-07-28

    manipulation used in lab-on-a-chip devices. A small unknown sample is first mixed with the PNA surfactants (“PNAA”) to tag the DNA targets, and then the...unknown sample is first mixed with the PNA surfactants (hereafter referred to as “PNA amphiphiles” or “PNAA”) to tag the DNA targets, and then the...prolate ellipsoid, and mixed PNAA/SDS micelles form spherical micelles. On addition of complementary DNA, the PNAA/DNA duplexes do not participate in

  18. A comparative study of nonparametric methods for pattern recognition

    NASA Technical Reports Server (NTRS)

    Hahn, S. F.; Nelson, G. D.

    1972-01-01

    The applied research discussed in this report determines and compares the correct classification percentage of the nonparametric sign test, Wilcoxon's signed rank test, and K-class classifier with the performance of the Bayes classifier. The performance is determined for data which have Gaussian, Laplacian and Rayleigh probability density functions. The correct classification percentage is shown graphically for differences in modes and/or means of the probability density functions for four, eight and sixteen samples. The K-class classifier performed very well with respect to the other classifiers used. Since the K-class classifier is a nonparametric technique, it usually performed better than the Bayes classifier which assumes the data to be Gaussian even though it may not be. The K-class classifier has the advantage over the Bayes in that it works well with non-Gaussian data without having to determine the probability density function of the data. It should be noted that the data in this experiment was always unimodal.

  19. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    NASA Astrophysics Data System (ADS)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it by sex. Colour measurements were performed on gonads extracted from female and male mackerel (fresh and thawed) to obtain differences between the sexes. Several linear and nonlinear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect the sample proximities accurately. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve on classifiers based on a single dissimilarity.
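
    The idea of combining classifiers built on different dissimilarities can be sketched with a small majority-vote ensemble of nearest-neighbour classifiers. This is only an illustration of the general principle, not the algorithm of the paper: the three dissimilarities, the 1-NN base classifier and all function names are assumptions made here.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_dissimilarity(a, b):
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_label(query, samples, labels, dissimilarity):
    """1-NN prediction under a single dissimilarity."""
    best = min(range(len(samples)),
               key=lambda i: dissimilarity(query, samples[i]))
    return labels[best]


def ensemble_predict(query, samples, labels,
                     dissimilarities=(euclidean, manhattan, cosine_dissimilarity)):
    """Majority vote over one 1-NN classifier per dissimilarity."""
    votes = Counter(nearest_label(query, samples, labels, d)
                    for d in dissimilarities)
    return votes.most_common(1)[0][0]
```

    With three voters and two classes a tie cannot occur; for other configurations a tie-breaking rule would have to be chosen explicitly.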

  20. Actively learning to distinguish suspicious from innocuous anomalies in a batch of vehicle tracks

    NASA Astrophysics Data System (ADS)

    Qiu, Zhicong; Miller, David J.; Stieber, Brian; Fair, Tim

    2014-06-01

    We investigate the problem of actively learning to distinguish between two sets of anomalous vehicle tracks, "innocuous" and "suspicious", starting from scratch, without any initial examples of "suspicious" and with no prior knowledge of what an operator would deem suspicious. This two-class problem is challenging because it is a priori unknown which track features may characterize the suspicious class. Furthermore, there is inherent imbalance in the sizes of the labeled "innocuous" and "suspicious" sets, even after some suspicious examples are identified. We present a comprehensive solution wherein a classifier learns to discriminate suspicious from innocuous based on derived p-value track features. Through active learning, our classifier thus learns the types of anomalies on which to base its discrimination. Our solution encompasses: i) judicious choice of kinematic p-value based features conditioned on the road of origin, along with more explicit features that capture unique vehicle behavior (e.g. U-turns); ii) novel semi-supervised learning that exploits information in the unlabeled (test batch) tracks, and iii) evaluation of several classifier models (logistic regression, SVMs). We find that two active labeling streams are necessary in practice in order to have efficient classifier learning while also forwarding (for labeling) the most actionable tracks. Experiments on wide-area motion imagery (WAMI) tracks, extracted via a system developed by Toyon Research Corporation, demonstrate the strong ROC AUC performance of our system, with sparing use of operator-based active labeling.

  1. Identification of DNA-Binding Proteins Using Structural, Electrostatic and Evolutionary Features

    PubMed Central

    Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2009-01-01

    Summary DNA binding proteins (DBPs) often take part in various crucial processes of the cell's life cycle. Therefore, the identification and characterization of these proteins are of great importance. We present here a random forests classifier for identifying DBPs among proteins with known three-dimensional structures. First, clusters of evolutionarily conserved regions (patches) on the protein's surface are detected using the PatchFinder algorithm; previous studies showed that these regions are typically the proteins' functionally important regions. Next, we train a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein including its dipole moment. Using 10-fold cross validation on a dataset of 138 DNA-binding proteins and 110 proteins which do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of previously published methods. Furthermore, when we tested 5 different methods on 11 new DBPs which did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA. PMID:19233205

  2. Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.

    PubMed

    Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2009-04-10

    DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.

  3. Accurate determination of imaging modality using an ensemble of text- and image-based classifiers.

    PubMed

    Kahn, Charles E; Kalpathy-Cramer, Jayashree; Lam, Cesar A; Eldredge, Christina E

    2012-02-01

    Imaging modality can aid retrieval of medical images for clinical practice, research, and education. We evaluated whether an ensemble classifier could outperform its constituent individual classifiers in determining the modality of figures from radiology journals. Seventeen automated classifiers analyzed 77,495 images from two radiology journals. Each classifier assigned one of eight imaging modalities--computed tomography, graphic, magnetic resonance imaging, nuclear medicine, positron emission tomography, photograph, ultrasound, or radiograph--to each image based on visual and/or textual information. Three physicians determined the modality of 5,000 randomly selected images as a reference standard. A "Simple Vote" ensemble classifier assigned each image to the modality that received the greatest number of individual classifiers' votes. A "Weighted Vote" classifier weighted each individual classifier's vote based on performance over a training set. For each image, this classifier's output was the imaging modality that received the greatest weighted vote score. We measured precision, recall, and F score (the harmonic mean of precision and recall) for each classifier. Individual classifiers' F scores ranged from 0.184 to 0.892. The simple vote and weighted vote classifiers correctly assigned 4,565 images (F score, 0.913; 95% confidence interval, 0.905-0.921) and 4,672 images (F score, 0.934; 95% confidence interval, 0.927-0.941), respectively. The weighted vote classifier performed significantly better than all individual classifiers. An ensemble classifier correctly determined the imaging modality of 93% of figures in our sample. The imaging modality of figures published in radiology journals can be determined with high accuracy, which will improve systems for image retrieval.
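
    The two voting schemes and the F score described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the votes and the weights below are hypothetical, and the abstract does not say exactly how the weights were derived from training-set performance.

```python
from collections import defaultdict


def simple_vote(votes):
    """Return the label with the most votes (ties go to the label seen first)."""
    counts = defaultdict(float)
    for label in votes:
        counts[label] += 1.0
    return max(counts, key=counts.get)


def weighted_vote(votes, weights):
    """Return the label with the largest sum of classifier weights."""
    scores = defaultdict(float)
    for label, weight in zip(votes, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical votes from five classifiers for one image.
votes = ["CT", "MRI", "MRI", "CT", "CT"]
# Hypothetical per-classifier weights (for example, training-set F scores).
weights = [0.30, 0.90, 0.85, 0.20, 0.25]

print(simple_vote(votes))             # majority label
print(weighted_vote(votes, weights))  # label backed by the stronger classifiers
```

    With these made-up numbers the two schemes disagree, which shows how a weighted vote can let a few reliable classifiers override a weak majority.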

  4. Label-free capture of breast cancer cells spiked in buffy coats using carbon nanotube antibody micro-arrays

    NASA Astrophysics Data System (ADS)

    Khosravi, Farhad; Trainor, Patrick; Rai, Shesh N.; Kloecker, Goetz; Wickstrom, Eric; Panchapakesan, Balaji

    2016-04-01

    We demonstrate the rapid and label-free capture of breast cancer cells spiked in buffy coats using nanotube-antibody micro-arrays. Single wall carbon nanotube arrays were manufactured using photo-lithography, metal deposition, and etching techniques. Anti-epithelial cell adhesion molecule (EpCAM) antibodies were functionalized to the surface of the nanotube devices using 1-pyrene-butanoic acid succinimidyl ester functionalization method. Following functionalization, plain buffy coat and MCF7 cell spiked buffy coats were adsorbed on to the nanotube device and electrical signatures were recorded for differences in interaction between samples. A statistical classifier for the ‘liquid biopsy’ was developed to create a predictive model based on dynamic time warping to classify device electrical signals that corresponded to plain (control) or spiked buffy coats (case). In training test, the device electrical signals originating from buffy versus spiked buffy samples were classified with ~100% sensitivity, ~91% specificity and ~96% accuracy. In the blinded test, the signals were classified with ~91% sensitivity, ~82% specificity and ~86% accuracy. A heatmap was generated to visually capture the relationship between electrical signatures and the sample condition. Confocal microscopic analysis of devices that were classified as spiked buffy coats based on their electrical signatures confirmed the presence of cancer cells, their attachment to the device and overexpression of EpCAM receptors. The cell numbers were counted to be ~1-17 cells per 5 μl per device suggesting single cell sensitivity in spiked buffy coats that is scalable to higher volumes using the micro-arrays.
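
    As a rough illustration of classification based on dynamic time warping (DTW), the sketch below computes the classic DTW distance and assigns a signal the label of its nearest training signal. The nearest-neighbour rule, the absolute-difference cost and the toy signals are assumptions made for the example; the abstract does not describe the authors' model at this level of detail.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify_1nn(signal, labelled_signals):
    """Label of the training signal closest to `signal` under DTW."""
    return min(labelled_signals, key=lambda item: dtw_distance(signal, item[0]))[1]


# Hypothetical toy signals; real device traces would be much longer.
training = [
    ([0.0, 0.0, 0.1, 0.0], "control"),
    ([0.0, 1.0, 2.0, 1.0], "case"),
]
print(classify_1nn([0.0, 0.0, 1.1, 2.1, 0.9], training))
```

    DTW lets signals of different lengths be compared, because one sample may be matched against several neighbouring samples of the other signal.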

  5. Classifying Radio Galaxies with the Convolutional Neural Network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aniyan, A. K.; Thorat, K.

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results are comparable to those of manual classification, while being obtained much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  6. Oregon ground-water quality and its relation to hydrogeological factors; a statistical approach

    USGS Publications Warehouse

    Miller, T.L.; Gonthier, J.B.

    1984-01-01

    An appraisal of Oregon ground-water quality was made using existing data accessible through the U.S. Geological Survey computer system. The data available for about 1,000 sites were separated by aquifer units and hydrologic units. Selected statistical moments were described for 19 constituents including major ions. About 96 percent of all sites in the data base were sampled only once. The sample data were classified by aquifer unit and hydrologic unit and analysis of variance was run to determine if significant differences exist between the units within each of these two classifications for the same 19 constituents on which statistical moments were determined. Results of the analysis of variance indicated both classification variables performed about the same, but aquifer unit did provide more separation for some constituents. Samples from the Rogue River basin were classified by location within the flow system and type of flow system. The samples were then analyzed using analysis of variance on 14 constituents to determine if there were significant differences between subsets classified by flow path. Results of this analysis were not definitive, but classification as to the type of flow system did indicate potential for segregating water-quality data into distinct subsets. (USGS)

  7. Accuracy of body fat percent and adiposity indicators cut off values to detect metabolic risk factors in a sample of Mexican adults

    PubMed Central

    2014-01-01

    Background Although body fat percent (BF%) may be used for screening metabolic risk factors, its accuracy compared to BMI and waist circumference is unknown in a Mexican population. We compared the classification accuracy of BF%, BMI and WC for the detection of metabolic risk factors in a sample of Mexican adults; optimized cutoffs as well as sensitivity and specificity at commonly used BF% and BMI international cutoffs were estimated. We also estimated conditional BF% means at BMI international cutoffs. Methods We performed a cross-sectional analysis of data on body composition, anthropometry and metabolic risk factors (high glucose, high triglycerides, low HDL cholesterol and hypertension) from 5,100 Mexican men and women. The association between BMI, WC and BF% was evaluated with linear regression models. The BF%, BMI and WC optimal cutoffs for the detection of metabolic risk factors were selected at the point where sensitivity was closest to specificity. Areas under the ROC Curve (AUC) were compared among classifiers using a non-parametric method. Results After adjustment for WC, a 1% increase in BMI was associated with a BF% rise of 0.05 percentage points (p.p.) in men (P < 0.05) and 0.25 p.p. in women (P < 0.001). At BMI = 25.0 predicted BF% was 27.6 ± 0.16 (mean ± SE) in men and 41.2 ± 0.07 in women. Estimated BF% cutoffs for detection of metabolic risk factors were close to 30.0 in men and close to 44.0 in women. In men WC had higher AUC than BF% for the classification of all conditions whereas BMI had higher AUC than BF% for the classification of high triglycerides and hypertension. In women BMI and WC had higher AUC than BF% for the classification of all metabolic risk factors. Conclusions BMI and WC were more accurate than BF% for classifying the studied metabolic disorders. International BF% cutoffs had very low specificity and thus produced a high rate of false positives in both sexes. PMID:24721260
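
    The cutoff rule stated in the Methods (pick the point where sensitivity is closest to specificity) can be sketched as follows. The measurements and disease labels are invented for illustration, and the sketch assumes that higher values indicate higher risk and that both groups are present in the data.

```python
def sensitivity_specificity(values, has_condition, cutoff):
    """Treat value >= cutoff as a positive screen; return (sensitivity, specificity)."""
    pairs = list(zip(values, has_condition))
    tp = sum(1 for v, c in pairs if c and v >= cutoff)
    fn = sum(1 for v, c in pairs if c and v < cutoff)
    tn = sum(1 for v, c in pairs if not c and v < cutoff)
    fp = sum(1 for v, c in pairs if not c and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(values, has_condition):
    """Observed value whose sensitivity is closest to its specificity."""
    def gap(cutoff):
        sens, spec = sensitivity_specificity(values, has_condition, cutoff)
        return abs(sens - spec)
    return min(sorted(set(values)), key=gap)


# Hypothetical body-fat percentages and risk-factor status for eight adults.
values = [22, 24, 26, 28, 30, 32, 34, 36]
has_condition = [False, False, False, True, False, True, True, True]

cutoff = balanced_cutoff(values, has_condition)
print(cutoff, sensitivity_specificity(values, has_condition, cutoff))
```

    Only observed values are tried as candidate cutoffs here; a finer grid could be substituted without changing the rule.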

  8. Partner Violence Before and After Couples-Based Alcoholism Treatment for Female Alcoholic Patients

    PubMed Central

    Schumm, Jeremiah A.; O'Farrell, Timothy J.; Murphy, Christopher M.; Fals-Stewart, William

    2010-01-01

    This study examined partner violence before and in the first and second year after behavioral couples therapy (BCT) for 103 married or cohabiting women seeking alcohol dependence treatment and their male partners, and used a demographically matched non-alcoholic comparison sample. The treatment sample received M = 16.7 BCT sessions over 5-6 months. Follow-up rates for the treatment sample at years 1 and 2 were 88% and 83%, respectively. In the year before BCT, 68% of female alcoholic patients had been violent toward their male partner, nearly five times the comparison sample rate of 15%. In the year after BCT, violence prevalence decreased significantly to 31% of the treatment sample. Women were classified as remitted after treatment if they demonstrated abstinence or minimal substance use and no serious consequences related to substance use. In year 1 following BCT, 45% were classified as remitted, and 49% were classified as remitted in year 2. Among remitted patients in the year after BCT, violence prevalence of 22% did not differ from the comparison sample and was significantly lower than the rate among relapsed patients (38%). Results for male-perpetrated violence and for the second year after BCT were similar to the first year. Results supported predictions that partner violence would decrease after BCT, and that clinically significant violence reductions to the level of a non-alcoholic comparison sample would occur for patients whose alcoholism was remitted after BCT. These findings replicate previous research among men with alcoholism. PMID:19968389

  9. Classification of Odours for Mobile Robots Using an Ensemble of Linear Classifiers

    NASA Astrophysics Data System (ADS)

    Trincavelli, Marco; Coradeschi, Silvia; Loutfi, Amy

    2009-05-01

    This paper investigates the classification of odours using an electronic nose mounted on a mobile robot. The samples are collected as the robot explores the environment. Under such conditions, the sensor response differs from typical three phase sampling processes. In this paper, we focus particularly on the classification problem and how it is influenced by the movement of the robot. To cope with these influences, an algorithm consisting of an ensemble of classifiers is presented. Experimental results show that this algorithm increases classification performance compared to other traditional classification methods.

  10. Decimated Input Ensembles for Improved Generalization

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Oza, Nikunj C.; Norvig, Peter (Technical Monitor)

    1999-01-01

    Recently, many researchers have demonstrated that using classifier ensembles (e.g., averaging the outputs of multiple classifiers before reaching a classification decision) leads to improved performance for many difficult generalization problems. However, in many domains there are serious impediments to such "turnkey" classification accuracy improvements. Most notable among these is the deleterious effect of highly correlated classifiers on the ensemble performance. One particular solution to this problem is generating "new" training sets by sampling the original one. However, with finite number of patterns, this causes a reduction in the training patterns each classifier sees, often resulting in considerably worsened generalization performance (particularly for high dimensional data domains) for each individual classifier. Generally, this drop in the accuracy of the individual classifier performance more than offsets any potential gains due to combining, unless diversity among classifiers is actively promoted. In this work, we introduce a method that: (1) reduces the correlation among the classifiers; (2) reduces the dimensionality of the data, thus lessening the impact of the 'curse of dimensionality'; and (3) improves the classification performance of the ensemble.

  11. Rock images classification by using deep convolution neural network

    NASA Astrophysics Data System (ADS)

    Cheng, Guojian; Guo, Wenhui

    2017-08-01

    Granularity analysis is one of the most essential issues in authenticate under microscope. To improve the efficiency and accuracy of traditional manual work, a convolutional neural network based method is proposed for granularity analysis from thin section images, which chooses and extracts features from image samples while building a classifier to recognize the granularity of input image samples. 4800 samples from Ordos basin are used for experiments under the colour spaces of HSV, YCbCr and RGB respectively. On the test dataset, the correct rate in RGB colour space is 98.5%, and it is believable in HSV and YCbCr colour space. The results show that the convolutional neural network can classify the rock images with high reliability.

  12. 7 CFR 27.24 - Delivery of samples of cotton.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Delivery of samples of cotton. 27.24 Section 27.24... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Inspection and Samples § 27.24 Delivery of samples of cotton. The original sample from each bale to be classified shall be delivered to...

  13. 7 CFR 27.24 - Delivery of samples of cotton.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Delivery of samples of cotton. 27.24 Section 27.24... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Inspection and Samples § 27.24 Delivery of samples of cotton. The original sample from each bale to be classified shall be delivered to...

  14. 7 CFR 27.24 - Delivery of samples of cotton.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Delivery of samples of cotton. 27.24 Section 27.24... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Inspection and Samples § 27.24 Delivery of samples of cotton. The original sample from each bale to be classified shall be delivered to...

  15. 7 CFR 27.24 - Delivery of samples of cotton.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Delivery of samples of cotton. 27.24 Section 27.24... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Inspection and Samples § 27.24 Delivery of samples of cotton. The original sample from each bale to be classified shall be delivered to...

  16. 7 CFR 27.24 - Delivery of samples of cotton.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Delivery of samples of cotton. 27.24 Section 27.24... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Inspection and Samples § 27.24 Delivery of samples of cotton. The original sample from each bale to be classified shall be delivered to...

  17. DNA barcoding using skin exuviates can improve identification and biodiversity studies of snakes.

    PubMed

    Khedkar, Trupti; Sharma, Rashmi; Tiknaik, Anita; Khedkar, Gulab; Naikwade, Bhagwat S; Ron, Tetsuzan Benny; Haymer, David

    2016-01-01

    Snakes represent a taxonomically underdeveloped group of animals in India with a lack of experts and incomplete taxonomic descriptions being the main deterrents to advances in this area. Molecular taxonomic approaches using DNA barcoding could aid in snake identification as well as studies of biodiversity. Here a non-invasive sampling method using DNA barcoding is tested using skin exuviates. Taxonomically authenticated samples were collected and tested for validation and comparisons to unknown snake exuviate samples. This approach was also used to construct the first comprehensive study targeting the snake species from Maharashtra state in India. A total of 92 skin exuviate samples were collected and tested for this study. Of these, 81 samples were successfully DNA barcoded and compared with unknown samples for assignment of taxonomic identity. Good quality DNA was obtained irrespective of age and quality of the exuviate material, and all unknown samples were successfully identified. A total of 23 species of snakes were identified, six of which were in the list of Endangered species (Red Data Book). Intra- and inter-specific distance values were also calculated, and these were sufficient to allow discrimination among species and between species without ambiguity in most cases. Two samples were suspected to represent cryptic species based on deep K2P divergence values (>3%), and one sample could be identified to the genus level only. Eleven samples failed to amplify COI sequences, suggesting the need for alternative PCR primer pairs. This study clearly documents how snake skin exuviates can be used for DNA barcoding, estimates of diversity and population genetic structuring in a noninvasive manner.
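
    The K2P (Kimura 2-parameter) divergence mentioned above is computed from the proportions of transitions (P) and transversions (Q) between two aligned sequences as d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The sketch below applies that standard formula; it assumes gap-free, equal-length, non-empty sequences containing only A, C, G and T, and the toy sequences are hypothetical rather than real COI barcodes.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq1)
    q = transversions / len(seq1)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 100-base sequences differing by two transitions and one transversion.
reference = "A" * 100
query = "GG" + "C" + "A" * 97
print(round(k2p_distance(reference, query), 4))
```

    In this toy case the distance is slightly above 3%, the divergence level the abstract treats as a sign of possible cryptic species.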

  18. Expanding the World of Marine Bacterial and Archaeal Clades

    PubMed Central

    Yilmaz, Pelin; Yarza, Pablo; Rapp, Josephine Z.; Glöckner, Frank O.

    2016-01-01

    Determining which microbial taxa are out there, where they live, and what they are doing is a driving approach in marine microbial ecology. The importance of these questions is underlined by concerted, large-scale, and global ocean sampling initiatives, for example the International Census of Marine Microbes, Ocean Sampling Day, or Tara Oceans. Given decades of effort, we know that the large majority of marine Bacteria and Archaea belong to about a dozen phyla. In addition to the classically culturable Bacteria and Archaea, at least 50 “clades,” at different taxonomic depths, exist. These account for the majority of marine microbial diversity, but there is still an underexplored and less abundant portion remaining. We refer to these hitherto unrecognized clades as unknown, as their boundaries, names, and classifications are not available. In this work, we were able to characterize up to 92 of these unknown clades found within the bacterial and archaeal phylogenetic diversity currently reported for marine water column environments. We mined the SILVA 16S rRNA gene datasets for sequences originating from the marine water column. Instead of the usual subjective taxa delineation and nomenclature methods, we applied the candidate taxonomic unit (CTU) circumscription system, along with a standardized nomenclature to the sequences in newly constructed phylogenetic trees. With this new phylogenetic and taxonomic framework, we performed an analysis of ICoMM rRNA gene amplicon datasets to gain insights into the global distribution of the new marine clades, their ecology, biogeography, and interaction with oceanographic variables. Most of the new clades we identified were interspersed by known taxa with cultivated members, whose genome sequences are available. This result encouraged us to perform metabolic predictions for the novel marine clades using the PICRUSt approach. 
    Our work also provides an update on the taxonomy of several phyla and widely known marine clades as our CTU approach breaks down these randomly lumped clades into smaller objectively calculated subgroups. Finally, all taxa were classified and named following standards compatible with the Bacteriological Code rules, enhancing their digitization, and comparability with future microbial ecological and taxonomy studies. PMID:26779174

  19. Integrated metagenomic data analysis demonstrates that a loss of diversity in oral microbiota is associated with periodontitis.

    PubMed

    Ai, Dongmei; Huang, Ruocheng; Wen, Jin; Li, Chao; Zhu, Jiangping; Xia, Li Charlie

    2017-01-25

    Periodontitis is an inflammatory disease affecting the tissues supporting teeth (periodontium). Integrative analysis of metagenomic samples from multiple periodontitis studies is a powerful way to examine microbiota diversity and interactions within host oral cavity. A total of 43 subjects were recruited to participate in two previous studies profiling the microbial community of human subgingival plaque samples using shotgun metagenomic sequencing. We integrated metagenomic sequence data from those two studies, including six healthy controls, 14 sites representative of stable periodontitis, 16 sites representative of progressing periodontitis, and seven periodontal sites of unknown status. We applied phylogenetic diversity, differential abundance, and network analyses, as well as clustering, to the integrated dataset to compare microbiological community profiles among the different disease states. We found alpha-diversity, i.e., mean species diversity in sites or habitats at a local scale, to be the single strongest predictor of subjects' periodontitis status (P < 0.011). More specifically, healthy subjects had the highest alpha-diversity, while subjects with stable sites had the lowest alpha-diversity. From these results, we developed an alpha-diversity logistic model-based naive classifier able to perfectly predict the disease status of the seven subjects with unknown periodontal status (not used in training). Phylogenetic profiling resulted in the discovery of nine marker microbes, and these species are able to differentiate between stable and progressing periodontitis, achieving an accuracy of 94.4%. Finally, we found that the reduction of negatively correlated species is a notable signature of disease progression. 
    Our results consistently show a strong association between the loss of oral microbiota diversity and the progression of periodontitis, suggesting that metagenomics sequencing and phylogenetic profiling are predictive of early periodontitis, leading to potential therapeutic intervention. Our results also support a keystone pathogen-mediated polymicrobial synergy and dysbiosis (PSD) model to explain the etiology of periodontitis. Apart from P. gingivalis, we identified three additional keystone species potentially mediating the progression of periodontitis based on pathogenic characteristics similar to those of known keystone pathogens.

  20. Performance of the fourth-generation Bio-Rad GS HIV Combo Ag/Ab enzyme immunoassay for diagnosis of HIV infection in Southern Africa

    PubMed Central

    Piwowar-Manning, Estelle; Fogel, Jessica M.; Richardson, Paul; Wolf, Shauna; Clarke, William; Marzinke, Mark A.; Fiamma, Agnès; Donnell, Deborah; Kulich, Michal; Mbwambo, Jessie K.K.; Richter, Linda; Gray, Glenda; Sweat, Michael; Coates, Thomas J.; Eshleman, Susan H.

    2015-01-01

    Background Fourth-generation HIV assays detect both antigen and antibody, facilitating detection of acute/early HIV infection. The Bio-Rad GS HIV Combo Ag/Ab assay (Bio-Rad Combo) is an enzyme immunoassay that simultaneously detects HIV p24 antigen and antibodies to HIV-1 and HIV-2 in serum or plasma. Objective To evaluate the performance of the Bio-Rad Combo assay for detection of HIV infection in adults from Southern Africa. Study design Samples were obtained from adults in Soweto and Vulindlela, South Africa and Dar es Salaam, Tanzania (300 HIV-positive samples; 300 HIV-negative samples; 12 samples from individuals previously classified as having acute/early HIV infection). The samples were tested with the Bio-Rad Combo assay. Additional testing was performed to characterize the 12 acute/early samples. Results All 300 HIV-positive samples were reactive using the Bio-Rad Combo assay; false positive test results were obtained for 10 (3.3%) of the HIV-negative samples (sensitivity: 100%, 95% confidence interval [CI]: 98.8–100%; specificity: 96.7%, 95% CI: 94.0–98.4%). The assay detected 10 of the 12 infections classified as acute/early. The two infections that were not detected had viral loads < 400 copies/mL; one of those samples contained antiretroviral drugs consistent with antiretroviral therapy. Conclusions The Bio-Rad Combo assay correctly classified the majority of study specimens. The specificity reported here may be higher than that seen in other settings, since HIV-negative samples were pre-screened using a different fourth-generation test. The assay also had high sensitivity for detection of acute/early infection. False-negative test results may be obtained in individuals who are virally suppressed. PMID:25542477
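
    The reported sensitivity and specificity follow directly from the counts in the Results. The short check below reproduces them; the confidence-interval step is an assumption, since the abstract does not name the interval method. It uses the exact (Clopper-Pearson) closed form for the lower bound when every sample is detected, which happens to match the reported 98.8%.

```python
# Counts taken from the abstract above.
hiv_positive_total = 300
hiv_positive_reactive = 300
hiv_negative_total = 300
hiv_negative_false_positive = 10

# Sensitivity: share of true positives detected; specificity: share of true negatives cleared.
sensitivity = hiv_positive_reactive / hiv_positive_total
specificity = (hiv_negative_total - hiv_negative_false_positive) / hiv_negative_total

# Assumed method: exact binomial lower bound when all n trials succeed is (alpha/2)**(1/n).
alpha = 0.05
sensitivity_lower = (alpha / 2) ** (1 / hiv_positive_total)

print(f"sensitivity = {sensitivity:.1%}, lower 95% bound = {sensitivity_lower:.1%}")
print(f"specificity = {specificity:.1%}")
```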

  1. Intestinal parasites and genotyping of Giardia duodenalis in children: first report of genotype B in isolates from human clinical samples in Mexico

    PubMed Central

    Torres-Romero, Julio César; Euan-Canto, Antonio de Jesus; Benito-González, Namibya; Padilla-Montaño, Nayely; Huchin-Chan, Claribel; Lara-Riegos, Julio; Cedillo-Rivera, Roberto

    2014-01-01

    Giardia duodenalis is one of the most prevalent enteroparasites in children. This parasite produces several clinical manifestations. The aim of this study was to determine the prevalence of genotypes of G. duodenalis causing infection in a region of southeastern Mexico. G. duodenalis cysts were isolated (33/429) from stool samples of children and molecular genotyping was performed by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis, targeting the triosephosphate isomerase (tpi) and glutamate dehydrogenase (gdh) genes. The tpi gene was amplified in all of the cyst samples, either for assemblage A (27 samples) or assemblage B (6 samples). RFLP analysis classified the 27 tpi-A amplicons in assemblage A, subgenotype I. Samples classified as assemblage B were further analysed using PCR-RFLP of the gdh gene and identified as assemblage B, subgenotype III. To our knowledge, this is the first report of assemblage B of G. duodenalis in human clinical samples from Mexico. PMID:24676655

  2. Feature selection and classification of multiparametric medical images using bagging and SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
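
    The bagging step relies on the fact that each bootstrap sample leaves some subjects out, and those left-out subjects can be used to score the base classifier trained on the rest. The sketch below shows only that index bookkeeping; the sample size and seed are arbitrary, and no classifier is trained here.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for round_index in range(3):
    in_bag, out_of_bag = bootstrap_split(10, rng)
    # A base classifier would be trained on in_bag and scored (e.g. by ROC AUC)
    # on out_of_bag when tuning its parameters.
    print(round_index, sorted(set(in_bag)), out_of_bag)
```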

  3. Multiwavelet grading of prostate pathological images

    NASA Astrophysics Data System (ADS)

    Soltanian-Zadeh, Hamid; Jafari-Khouzani, Kourosh

    2002-05-01

    We have developed image analysis methods to automatically grade pathological images of the prostate. The proposed method assigns Gleason grades to images, where each image is assigned a grade between 1 and 5. This is done using features extracted from multiwavelet transformations. We extract energy and entropy features from submatrices obtained in the decomposition. Next, we apply a k-NN classifier to grade the image. To find the optimal multiwavelet basis, preprocessing, and classifier, we use features extracted by different multiwavelets with either critically sampled preprocessing or repeated row preprocessing and different k-NN classifiers and compare their performances, evaluated by total misclassification rate (TMR). To evaluate sensitivity to noise, we add white Gaussian noise to images and compare the results (TMRs). We applied the proposed methods to 100 images. We evaluated the first and second levels of decomposition using Geronimo, Hardin, and Massopust (GHM), Chui and Lian (CL), and Shen (SA4) multiwavelets. We also evaluated the k-NN classifier for k=1,2,3,4,5. Experimental results illustrate that the first level of decomposition is quite noisy. They also show that critically sampled preprocessing outperforms repeated row preprocessing and has less sensitivity to noise. Finally, comparison studies indicate that the SA4 multiwavelet and k-NN classifier (k=1) generate optimal results (with smallest TMR of 3%).
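
    The classification and evaluation steps described above can be sketched roughly as follows. This is a plain Euclidean k-NN with a total misclassification rate (TMR) helper; the two-element feature vectors stand in for the energy and entropy features and are made-up values, not data from the study.

```python
import math
from collections import Counter

def knn_predict(train, query, k=1):
    """Return the majority label among the k training samples nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def total_misclassification_rate(train, test, k=1):
    """Fraction of test samples whose predicted label differs from the true one."""
    errors = sum(1 for features, label in test
                 if knn_predict(train, features, k) != label)
    return errors / len(test)

# Hypothetical (energy, entropy) features with Gleason grades as labels.
train = [((0.1, 0.2), 1), ((0.5, 0.5), 3), ((0.9, 0.8), 5)]
test = [((0.12, 0.18), 1), ((0.88, 0.85), 5), ((0.52, 0.49), 5)]
tmr = total_misclassification_rate(train, test, k=1)
```

    Sweeping `k` over 1 to 5 and comparing the resulting TMR values mirrors the comparison of k-NN classifiers mentioned in the abstract.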

  4. Guidelines for the identification of unknown samples for laboratories performing forensic analyses for chemical terrorism.

    PubMed

    Magnuson, Matthew L; Satzger, R Duane; Alcaraz, Armando; Brewer, Jason; Fetterolf, Dean; Harper, Martin; Hrynchuk, Ronald; McNally, Mary F; Montgomery, Madeline; Nottingham, Eric; Peterson, James; Rickenbach, Michael; Seidel, Jimmy L; Wolnik, Karen

    2012-05-01

    Since the early 1990s, the FBI Laboratory has sponsored Scientific Working Groups to improve discipline practices and build consensus among the forensic community. The Scientific Working Group on the Forensic Analysis of Chemical, Biological, Radiological and Nuclear Terrorism developed guidance, contained in this document, on issues forensic laboratories encounter when accepting and analyzing unknown samples associated with chemical terrorism, including laboratory capabilities and analytical testing plans. In the context of forensic analysis of chemical terrorism, this guidance defines an unknown sample and addresses what constitutes definitive and tentative identification. Laboratory safety, reporting issues, and postreporting considerations are also discussed. Utilization of these guidelines, as part of planning for forensic analysis related to a chemical terrorism incident, may help avoid unfortunate consequences not only to the public but also to the laboratory personnel. 2011 American Academy of Forensic Sciences. Published 2011. This article is a U.S. Government work and is in the public domain in the U.S.A.

  5. Soil-Gas Radon Anomaly Map of an Unknown Fault Zone Area, Chiang Mai, Northern Thailand

    NASA Astrophysics Data System (ADS)

    Udphuay, S.; Kaweewong, C.; Imurai, W.; Pondthai, P.

    2015-12-01

    A soil-gas radon concentration anomaly map was constructed to help detect an unknown subsurface fault location in San Sai District, Chiang Mai Province, Northern Thailand, where a 5.1-magnitude earthquake took place in December 2006. It was suspected that this earthquake may have been associated with an unrecognized active fault in the area. In this study, soil-gas samples were collected from eighty-four measuring stations covering an area of approximately 50 km². Radon in soil-gas samples was quantified using a Scintrex Radon Detector, RDA-200. Sampling was conducted twice: during December 2014-January 2015 and March 2015-April 2015. The soil-gas radon map obtained from this study reveals a linear NNW-SSE trend of high concentration. This anomaly corresponds to the direction of the prospective fault system interpreted from satellite images. The findings from this study support the existence of this unknown fault system. However, a more detailed investigation should be conducted in order to confirm its geometry, orientation and lateral extent.

  6. Methods for detecting and correcting inaccurate results in inductively coupled plasma-atomic emission spectrometry

    DOEpatents

    Chan, George C. Y. [Bloomington, IN]; Hieftje, Gary M [Bloomington, IN]

    2010-08-03

    A method for detecting and correcting inaccurate results in inductively coupled plasma-atomic emission spectrometry (ICP-AES). ICP-AES analysis is performed across a plurality of selected locations in the plasma on an unknown sample, collecting the light intensity at one or more selected wavelengths of one or more sought-for analytes, creating a first dataset. The first dataset is then calibrated with a calibration dataset creating a calibrated first dataset curve. If the calibrated first dataset curve has a variability along the location within the plasma for a selected wavelength, errors are present. Plasma-related errors are then corrected by diluting the unknown sample and performing the same ICP-AES analysis on the diluted unknown sample creating a calibrated second dataset curve (accounting for the dilution) for the one or more sought-for analytes. The cross-over point of the calibrated dataset curves yields the corrected value (free from plasma related errors) for each sought-for analyte.
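
    To illustrate the final step described above, the following Python sketch locates the cross-over point of two calibrated curves sampled at the same plasma locations, using linear interpolation between neighbouring points. The interpolation scheme, the function name and the sample values are assumptions made for illustration and are not taken from the patent.

```python
def crossover(positions, curve_a, curve_b):
    """Return (position, value) where two sampled curves cross, or None.

    Both curves must be sampled at the same positions. The crossing is found
    by linear interpolation inside the first interval where the sign of
    (curve_a - curve_b) changes.
    """
    diffs = [a - b for a, b in zip(curve_a, curve_b)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return positions[i], curve_a[i]
        if d0 * d1 < 0:
            t = d0 / (d0 - d1)  # fraction of the interval at which the difference is zero
            x = positions[i] + t * (positions[i + 1] - positions[i])
            y = curve_a[i] + t * (curve_a[i + 1] - curve_a[i])
            return x, y
    if diffs[-1] == 0:
        return positions[-1], curve_a[-1]
    return None

# Hypothetical calibrated concentrations for the original and diluted sample.
positions = [0.0, 1.0, 2.0]
original = [1.0, 2.0, 3.0]
diluted = [3.0, 2.5, 2.0]
crossing = crossover(positions, original, diluted)
```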

  7. Raman spectroscopic detection and identification of Burkholderia mallei and Burkholderia pseudomallei in feedstuff.

    PubMed

    Stöckel, Stephan; Meisel, Susann; Elschner, Mandy; Melzer, Falk; Rösch, Petra; Popp, Jürgen

    2015-01-01

    Burkholderia mallei (the etiologic agent of glanders in equines and rarely humans) and Burkholderia pseudomallei, causing melioidosis in humans and animals, are designated category B biothreat agents. The intrinsically high resistance of both agents to many antibiotics, their potential use as bioweapons, and their low infectious dose, necessitate the need for rapid and accurate detection methods. Current methods to identify these organisms may require up to 1 week, as they rely on phenotypic characteristics and an extensive set of biochemical reactions. In this study, Raman microspectroscopy, a cultivation-independent typing technique for single bacterial cells with the potential for being a rapid point-of-care analysis system, is evaluated to identify and differentiate B. mallei and B. pseudomallei within hours. Here, not only broth-cultured microbes but also bacteria isolated out of pelleted animal feedstuff were taken into account. A database of Raman spectra allowed a calculation of classification functions, which were trained to differentiate Raman spectra of not only both pathogens but also of five further Burkholderia spp. and four species of the closely related genus Pseudomonas. The developed two-stage classification system comprising two support vector machine (SVM) classifiers was then challenged by a test set of 11 samples to simulate the case of a real-world-scenario, when "unknown samples" are to be identified. In the end, all test set samples were identified correctly, even if the contained bacterial strains were not incorporated in the database before or were isolated out of animal feedstuff. Specifically, the five test samples bearing B. mallei and B. pseudomallei were correctly identified on species level with accuracies between 93.9 and 98.7%. The sample analysis itself requires no biomass enrichment step prior to the analysis and can be performed under biosafety level 1 (BSL 1) conditions after inactivating the bacteria with formaldehyde.

  8. The Preservation of Two Infant Temperaments into Adolescence

    ERIC Educational Resources Information Center

    Kagan, Jerome; Snidman, Nancy; Kahn, Vali; Towsley, Sara

    2007-01-01

    This "Monograph" reports theoretically relevant behavioral, biological, and self-report assessments of a sample of 14-17-year-olds who had been classified into one of four temperamental groups at 4 months of age. The infant temperamental categories were based on observed behavior to a battery of unfamiliar stimuli. The infants classified as high…

  9. The use of light's criteria in hospitalized children with a pleural effusion of unknown etiology.

    PubMed

    McGraw, Matthew D; Robison, Kyle; Kupfer, Oren; Brinton, John T; Stillwell, Paul C

    2018-05-27

    Pleural effusions are common in pediatrics. When the etiology of a pleural effusion remains unknown, adult literature recommends the use of Light's criteria to differentiate a transudate from an exudate. Pediatricians may rely on adult literature for the diagnostic management of pleural effusions as Light's criteria has not been validated in children. The purpose of this study was to review the use of Light's criteria in hospitalized children with a pleural effusion of unknown etiology. Retrospective review was performed on children hospitalized with a pleural effusion requiring chest tube placement or thoracentesis between January 1, 2016 to January 1, 2017 at Children's Hospital Colorado. Charts were reviewed for primary team, use of Light's criteria, pleural effusion diagnosis, and 30-day recurrence of repeat intervention or fluid analysis. Sixty-eight patients were hospitalized with a pleural effusion of unknown etiology requiring intervention. Only 16 pleural effusions (24%) were classified using Light's criteria. In those patients for whom Light's criteria was used, a diagnosis or change in management occurred in 10 of 16 patients (63%). Pleural effusions were most common on the cardiology service (26/68). Use of Light's criteria was most frequent on the oncology service (7/8). Thirty-day need for repeat intervention was lower in those with Light's criteria (13%) compared to those without (27%). Light's criteria were utilized infrequently in hospitalized children with a pleural effusion of unknown etiology at a single institution. There was considerable practice variation among provider teams. When utilized, Light's criteria assisted in making a diagnosis or changing management in many patients, and may lead to a reduction in 30-day recurrence requiring repeat intervention. © 2018 Wiley Periodicals, Inc.
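
    For readers unfamiliar with Light's criteria, a minimal Python sketch follows. The abstract does not state the thresholds; the values used here (protein ratio above 0.5, LDH ratio above 0.6, pleural LDH above two-thirds of the upper limit of normal for serum LDH) are the commonly cited ones, and the function name and example values are illustrative only. This is not clinical software.

```python
def is_exudate(pleural_protein, serum_protein, pleural_ldh, serum_ldh,
               serum_ldh_upper_limit):
    """Classify a pleural effusion as an exudate if any criterion is met.

    Protein values must share one unit, and LDH values must share one unit.
    Thresholds are the commonly cited ones, not taken from the abstract above.
    """
    protein_ratio = pleural_protein / serum_protein
    ldh_ratio = pleural_ldh / serum_ldh
    return (protein_ratio > 0.5
            or ldh_ratio > 0.6
            or pleural_ldh > (2 / 3) * serum_ldh_upper_limit)

# Hypothetical laboratory values (protein in g/dL, LDH in U/L).
likely_exudate = is_exudate(4.0, 6.0, 300, 400, 222)
likely_transudate = not is_exudate(2.0, 7.0, 100, 300, 222)
```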

  10. A false sense of security? Can tiered approach be trusted to accurately classify immunogenicity samples?

    PubMed

    Jaki, Thomas; Allacher, Peter; Horling, Frank

    2016-09-05

    Detecting and characterizing anti-drug antibodies (ADA) against a protein therapeutic is crucially important for monitoring unwanted immune responses. Usually, a multi-tiered approach is employed for testing patient samples for ADA activity: samples are first rapidly screened, and positive samples are subsequently confirmed in a separate assay. In this manuscript we evaluate the ability of different methods to classify subjects with screening and competition-based confirmatory assays. We find that, for the overall performance of the multi-stage process, the method used for confirmation is most important, and a t-test is best when differences are moderate to large. Moreover, we find that when differences between positive and negative samples are not sufficiently large, using a competition-based confirmation step yields poor classification of positive samples. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  11. LCC: Light Curves Classifier

    NASA Astrophysics Data System (ADS)

    Vo, Martin

    2017-08-01

    Light Curves Classifier uses data mining and machine learning to obtain and classify desired objects. This task can be accomplished by attributes of light curves or any time series, including shapes, histograms, or variograms, or by other available information about the inspected objects, such as color indices, temperatures, and abundances. After specifying features which describe the objects to be searched, the software trains on a given training sample, and can then be used for unsupervised clustering for visualizing the natural separation of the sample. The package can also be used to automatically tune the parameters of the methods used (for example, number of hidden neurons or binning ratio). Trained classifiers can be used for filtering outputs from astronomical databases or data stored locally. The Light Curve Classifier can also be used for simple downloading of light curves and all available information of queried stars. It natively can connect to OgleII, OgleIII, ASAS, CoRoT, Kepler, Catalina and MACHO, and new connectors or descriptors can be implemented. In addition to direct usage of the package and command line UI, the program can be used through a web interface. Users can create jobs for "training" methods on given objects, querying databases and filtering outputs by trained filters. Preimplemented descriptors, classifiers and connectors can be picked by simple clicks and their parameters can be tuned by giving ranges of these values. All combinations are then calculated and the best one is used for creating the filter. Natural separation of the data can be visualized by unsupervised clustering.

  12. Association between statin-associated myopathy and skeletal muscle damage

    PubMed Central

    Mohaupt, Markus G.; Karas, Richard H.; Babiychuk, Eduard B.; Sanchez-Freire, Verónica; Monastyrskaya, Katia; Iyer, Lakshmanan; Hoppeler, Hans; Breil, Fabio; Draeger, Annette

    2009-01-01

    Background Many patients taking statins often complain of muscle pain and weakness. The extent to which muscle pain reflects muscle injury is unknown. Methods We obtained biopsy samples from the vastus lateralis muscle of 83 patients. Of the 44 patients with clinically diagnosed statin-associated myopathy, 29 were currently taking a statin, and 15 had discontinued statin therapy before the biopsy (minimal duration of discontinuation 3 weeks). We also included 19 patients who were taking statins and had no myopathy, and 20 patients who had never taken statins and had no myopathy. We classified the muscles as injured if 2% or more of the muscle fibres in a biopsy sample showed damage. Using reverse transcriptase polymerase chain reaction, we evaluated the expression levels of candidate genes potentially related to myocyte injury. Results Muscle injury was observed in 25 (of 44) patients with myopathy and in 1 patient without myopathy. Only 1 patient with structural injury had a circulating level of creatine phosphokinase that was elevated more than 1950 U/L (10× the upper limit of normal). Expression of ryanodine receptor 3 was significantly upregulated in patients with biopsy evidence of structural damage (1.7, standard error of the mean 0.3). Interpretation Persistent myopathy in patients taking statins reflects structural muscle damage. A lack of elevated levels of circulating creatine phosphokinase does not rule out structural muscle injury. Upregulation of the expression of ryanodine receptor 3 is suggestive of an intracellular calcium leak. PMID:19581603

  13. Association between statin-associated myopathy and skeletal muscle damage.

    PubMed

    Mohaupt, Markus G; Karas, Richard H; Babiychuk, Eduard B; Sanchez-Freire, Verónica; Monastyrskaya, Katia; Iyer, Lakshmanan; Hoppeler, Hans; Breil, Fabio; Draeger, Annette

    2009-07-07

    Many patients taking statins often complain of muscle pain and weakness. The extent to which muscle pain reflects muscle injury is unknown. We obtained biopsy samples from the vastus lateralis muscle of 83 patients. Of the 44 patients with clinically diagnosed statin-associated myopathy, 29 were currently taking a statin, and 15 had discontinued statin therapy before the biopsy (minimal duration of discontinuation 3 weeks). We also included 19 patients who were taking statins and had no myopathy, and 20 patients who had never taken statins and had no myopathy. We classified the muscles as injured if 2% or more of the muscle fibres in a biopsy sample showed damage. Using reverse transcriptase polymerase chain reaction, we evaluated the expression levels of candidate genes potentially related to myocyte injury. Muscle injury was observed in 25 (of 44) patients with myopathy and in 1 patient without myopathy. Only 1 patient with structural injury had a circulating level of creatine phosphokinase that was elevated more than 1950 U/L (10x the upper limit of normal). Expression of ryanodine receptor 3 was significantly upregulated in patients with biopsy evidence of structural damage (1.7, standard error of the mean 0.3). Persistent myopathy in patients taking statins reflects structural muscle damage. A lack of elevated levels of circulating creatine phosphokinase does not rule out structural muscle injury. Upregulation of the expression of ryanodine receptor 3 is suggestive of an intracellular calcium leak.

  14. The first catalog of active galactic nuclei detected by the FERMI large area telescope

    DOE PAGES

    Abdo, A. A.; Ackermann, M.; Ajello, M.; ...

    2010-04-29

    Here, we present the first catalog of active galactic nuclei (AGNs) detected by the Large Area Telescope (LAT), corresponding to 11 months of data collected in scientific operation mode. The First LAT AGN Catalog (1LAC) includes 671 γ-ray sources located at high Galactic latitudes (|b|>10°) that are detected with a test statistic greater than 25 and associated statistically with AGNs. Some LAT sources are associated with multiple AGNs, and consequently, the catalog includes 709 AGNs, comprising 300 BL Lacertae objects, 296 flat-spectrum radio quasars, 41 AGNs of other types, and 72 AGNs of unknown type. We also classify the blazars based on their spectral energy distributions as archival radio, optical, and X-ray data permit. In addition to the formal 1LAC sample, we provide AGN associations for 51 low-latitude LAT sources and AGN "affiliations" (unquantified counterpart candidates) for 104 high-latitude LAT sources without AGN associations. The overlap of the 1LAC with existing γ-ray AGN catalogs (LBAS, EGRET, AGILE, Swift, INTEGRAL, TeVCat) is briefly discussed. Various properties—such as γ-ray fluxes and photon power-law spectral indices, redshifts, γ-ray luminosities, variability, and archival radio luminosities—and their correlations are presented and discussed for the different blazar classes. Lastly, we compare the 1LAC results with predictions regarding the γ-ray AGN populations, and we comment on the power of the sample to address the question of the blazar sequence.

  15. Vehicle classification in WAMI imagery using deep network

    NASA Astrophysics Data System (ADS)

    Yi, Meng; Yang, Fan; Blasch, Erik; Sheaff, Carolyn; Liu, Kui; Chen, Genshe; Ling, Haibin

    2016-05-01

    Humans have always had a keen interest in understanding activities and the surrounding environment for mobility, communication, and survival. Thanks to recent progress in photography and breakthroughs in aviation, we are now able to capture tens of megapixels of ground imagery, namely Wide Area Motion Imagery (WAMI), at multiple frames per second from unmanned aerial vehicles (UAVs). WAMI serves as a great source for many applications, including security, urban planning and route planning. These applications require fast and accurate image understanding which is time consuming for humans, due to the large data volume and city-scale area coverage. Therefore, automatic processing and understanding of WAMI imagery has been gaining attention in both industry and the research community. This paper focuses on an essential step in WAMI imagery analysis, namely vehicle classification. That is, deciding whether a certain image patch contains a vehicle or not. We collect a set of positive and negative sample image patches, for training and testing the detector. Positive samples are 64 × 64 image patches centered on annotated vehicles. We generate two sets of negative images. The first set is generated from positive images with some location shift. The second set of negative patches is generated from randomly sampled patches. We also discard those patches if a vehicle accidentally locates at the center. Both positive and negative samples are randomly divided into 9000 training images and 3000 testing images. We propose to train a deep convolution network for classifying these patches. The classifier is based on a pre-trained AlexNet Model in the Caffe library, with an adapted loss function for vehicle classification. The performance of our classifier is compared to several traditional image classifier methods using Support Vector Machine (SVM) and Histogram of Oriented Gradient (HOG) features. While the SVM+HOG method achieves an accuracy of 91.2%, the accuracy of our deep network-based classifier reaches 97.9%.
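
    The random negative-sampling step described above might look roughly like the Python sketch below. Only the 64-pixel patch size comes from the abstract; the rejection radius `min_dist`, the image size and the function names are hypothetical choices for illustration.

```python
import random

PATCH = 64  # patch side length in pixels, as stated in the abstract

def near_vehicle(cx, cy, vehicles, radius):
    """True if (cx, cy) lies within radius pixels of any annotated vehicle centre."""
    return any((cx - vx) ** 2 + (cy - vy) ** 2 <= radius ** 2 for vx, vy in vehicles)

def sample_negative_centers(vehicles, width, height, count, rng, min_dist=16):
    """Draw random patch centres, discarding any with a vehicle near the centre.

    min_dist is an assumed rejection radius. The loop does not terminate if no
    valid centre exists, so it is only suitable for sparsely annotated images.
    """
    half = PATCH // 2
    centers = []
    while len(centers) < count:
        cx = rng.randrange(half, width - half)   # keep the whole patch inside the image
        cy = rng.randrange(half, height - half)
        if not near_vehicle(cx, cy, vehicles, min_dist):
            centers.append((cx, cy))
    return centers

rng = random.Random(1)
vehicles = [(100, 100)]  # hypothetical annotated vehicle centre
negatives = sample_negative_centers(vehicles, 256, 256, 20, rng)
```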

  16. Neurons from the adult human dentate nucleus: neural networks in the neuron classification.

    PubMed

    Grbatinić, Ivan; Marić, Dušica L; Milošević, Nebojša T

    2015-04-07

    Topological (central vs. border neuron type) and morphological classification of adult human dentate nucleus neurons according to their quantified histomorphological properties using neural networks on real and virtual neuron samples. In the real sample, 53.1% and 14.1% of central and border neurons, respectively, are classified correctly, with a total of 32.8% of neurons misclassified. The most important result is that 62.2% of neurons in the border group are misclassified, which is even greater than the proportion of correctly classified neurons (37.8%) in that group, showing an obvious failure of the network to classify neurons correctly based on the computational parameters used in our study. In the virtual sample, 97.3% of neurons in the border group are misclassified, which is much greater than the proportion of correctly classified neurons (2.7%) in that group, again confirming the obvious failure of the network to classify neurons correctly. Statistical analysis shows that there is no statistically significant difference between central and border neurons for each measured parameter (p>0.05). A total of 96.74% of neurons are morphologically classified correctly by neural networks and each one belongs to one of the four histomorphological types: (a) neurons with small soma and short dendrites, (b) neurons with small soma and long dendrites, (c) neurons with large soma and short dendrites, (d) neurons with large soma and long dendrites. Statistical analysis supports these results (p<0.05). Human dentate nucleus neurons can be classified into four neuron types according to their quantitative histomorphological properties. These neuron types consist of two neuron sets, small and large ones with respect to their perikarya, with subtypes differing in dendrite length, i.e. neurons with short vs. long dendrites. Besides confirming the classification of neurons into small and large ones, already shown in the literature, we found two new subtypes, i.e. neurons with small soma and long dendrites and with large soma and short dendrites. These neurons are most probably equally distributed throughout the dentate nucleus as no significant difference in their topological distribution is observed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. FT-Raman and chemometric tools for rapid determination of quality parameters in milk powder: Classification of samples for the presence of lactose and fraud detection by addition of maltodextrin.

    PubMed

    Rodrigues Júnior, Paulo Henrique; de Sá Oliveira, Kamila; de Almeida, Carlos Eduardo Rocha; De Oliveira, Luiz Fernando Cappa; Stephani, Rodrigo; Pinto, Michele da Silva; de Carvalho, Antônio Fernandes; Perrone, Ítalo Tuler

    2016-04-01

    FT-Raman spectroscopy has been explored as a quick screening method to evaluate the presence of lactose and identify milk powder samples adulterated with maltodextrin (2.5-50% w/w). Raman measurements can easily differentiate samples of milk powder, without the need for sample preparation, while traditional quality control methods, including high performance liquid chromatography, are cumbersome and slow. FT-Raman spectra were obtained from samples of whole lactose and low-lactose milk powder, both without and with addition of maltodextrin. Differences were observed between the spectra involved in identifying samples with low lactose content, as well as adulterated samples. Exploratory data analysis using Raman spectroscopy and multivariate analysis was also developed to classify samples with PCA and PLS-DA. The PLS-DA models obtained allowed to correctly classify all samples. These results demonstrate the utility of FT-Raman spectroscopy in combination with chemometrics to infer about the quality of milk powder. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. A fast learning method for large scale and multi-class samples of SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class SVM (Support Vector Machine) classification based on a binary tree is presented to address the low learning efficiency of SVMs when processing large-scale multi-class samples. This paper adopts a bottom-up method to set up the binary tree hierarchy; according to the resulting hierarchy, a sub-classifier learns from the corresponding samples at each node. During learning, several class clusters are generated by a first clustering of the training samples. First, central points are extracted from those class clusters that contain only one type of sample. For those that contain two types of samples, the cluster numbers of their positive and negative samples are set according to their degree of mixture, a secondary clustering is then performed, and central points are extracted from the resulting sub-class clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by combining the extracted central points. Simulation experiments show that this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce the number of samples and effectively improve learning efficiency.
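
    The sample-reduction idea described above can be sketched in Python as follows. This is a deliberate simplification: a pure cluster is replaced by its single central point, while a mixed cluster is split by class and reduced to one central point per class, which corresponds to a secondary clustering with one sub-cluster per class. The abstract instead sets the number of sub-clusters from the degree of mixture. All names and data are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def reduce_cluster(cluster):
    """Replace a cluster of (point, label) pairs by one central point per class.

    A pure cluster yields a single (centroid, label) pair; a mixed cluster
    yields one pair for each class it contains.
    """
    by_label = {}
    for point, label in cluster:
        by_label.setdefault(label, []).append(point)
    return [(centroid(points), label) for label, points in by_label.items()]

# Hypothetical clusters produced by a first clustering pass.
pure = [((0.0, 0.0), "pos"), ((2.0, 2.0), "pos")]
mixed = [((0.0, 0.0), "pos"), ((2.0, 0.0), "pos"), ((10.0, 10.0), "neg")]
reduced = reduce_cluster(pure) + reduce_cluster(mixed)
```

    The `reduced` list is the kind of compact training set that a sub-classifier at a tree node would then learn from.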

  19. Method for determining the concentration of atomic species in gases and solids

    DOEpatents

    Loge, Gary W.

    1998-01-01

    Method for determining the concentration of atomic species in gases and solids. Measurement of at least two emission intensities from a species in a sample that is excited by incident laser radiation, which generates a plasma therein, after a sufficient time period has elapsed and during a second time period, permits an instantaneous temperature to be established within the sample. The concentration of the atomic species to be determined is then derived from the known emission intensity of a predetermined concentration of that species in the sample at the measured temperature, a quantity which is measured prior to the determination of the unknown concentration, and the actual measured emission from the unknown species, or by this latter emission and the emission intensity of a species having known concentration within the sample such as nitrogen for gaseous air samples.

  20. Adverse Childhood Experiences (ACEs) questionnaire and Adult Attachment Interview (AAI): implications for parent child relationships.

    PubMed

    Murphy, Anne; Steele, Miriam; Dube, Shanta Rishi; Bate, Jordan; Bonuck, Karen; Meissner, Paul; Goldman, Hannah; Steele, Howard

    2014-02-01

    Although Adverse Childhood Experiences (ACEs) are linked to increased health problems and risk behaviors in adulthood, there are no studies on the association between ACEs and adults' states of mind regarding their early childhood attachments, loss, and trauma experiences. To validate the ACEs questions, we analyzed the association between ACEs and emotional support indicators and Adult Attachment Interview (AAI) classifications in terms of unresolved mourning regarding past loss or trauma and discordant states of mind in cannot classify (U/CC) interviews. Seventy-five urban women (41 clinical and 34 community) completed a questionnaire on ACEs, which included 10 categories of abuse, neglect, and household dysfunction, in addition to emotional support. Internal psychological processes or states of mind concerning attachment were assessed using the AAI. ACE responses were internally consistent (Cronbach's α=.88). In the clinical sample, 84% reported≥4 ACEs compared to 27% among the community sample. AAIs judged U/CC occurred in 76% of the clinical sample compared to 9% in the community sample. When ACEs were≥4, 65% of AAIs were classified U/CC. Absence of emotional support in the ACEs questionnaire was associated with 72% of AAIs being classified U/CC. As the number of ACEs and the lack of emotional support increases so too does the probability of AAIs being classified as U/CC. Findings provide rationale for including ACEs questions in pediatric screening protocols to identify and offer treatment reducing the intergenerational transmission of risk associated with problematic parenting. Copyright © 2013 Elsevier Ltd. All rights reserved.
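
    The internal-consistency statistic reported above (Cronbach's α) can be computed as in the Python sketch below, using the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The response matrices are made-up examples, not study data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances for both the items and the per-respondent totals.
    """
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: items that move together perfectly give alpha = 1.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha_consistent = cronbach_alpha(consistent)

# Hypothetical binary ACE-style answers (1 = category endorsed).
binary = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
alpha_binary = cronbach_alpha(binary)
```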

  1. 3D Semantic Labeling of ALS Data Based on Domain Adaption by Transferring and Fusing Random Forest Models

    NASA Astrophysics Data System (ADS)

    Wu, J.; Yao, W.; Zhang, J.; Li, Y.

    2018-04-01

    Labeling 3D point cloud data with traditional supervised learning methods requires a considerable number of labelled samples, the collection of which is costly and time-consuming. This work focuses on adopting the concept of domain adaption to transfer existing trained random forest classifiers (based on the source domain) to new data scenes (the target domain), with the aim of reducing the dependence of accurate 3D semantic labeling in point clouds on training samples from the new data scene. Firstly, two random forest classifiers were trained with existing samples previously collected for other data. They differed from each other in using two different decision tree construction algorithms: C4.5 with information gain ratio and CART with Gini index. Secondly, four random forest classifiers adapted to the target domain were derived by transferring each tree in the source random forest models with two types of operations: structure expansion and reduction (SER) and structure transfer (STRUT). Finally, points in the target domain were labelled by fusing the four newly derived random forest classifiers using a weights-of-evidence-based fusion model. To validate our method, experimental analysis was conducted using 3 datasets: one was used as the source domain data (Vaihingen data for 3D Semantic Labelling); the other two were used as the target domain data from two cities in China (Jinmen city and Dunhuang city). Overall accuracies of 85.5% and 83.3% for 3D labelling were achieved for Jinmen city and Dunhuang city data respectively, with only 1/3 newly labelled samples compared to the cases without domain adaption.
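
    As a rough illustration of the final fusion step, the Python sketch below combines per-class probabilities from several classifiers with a plain weighted average. This is a simple stand-in, not the weights-of-evidence fusion model used in the paper; the class names, probabilities and weights are hypothetical.

```python
def fuse_predictions(prob_list, weights):
    """Weighted average of per-class probabilities from several classifiers.

    prob_list holds one dict (class label -> probability) per classifier and
    weights holds one non-negative weight per classifier. Returns the winning
    label together with the fused probabilities.
    """
    total = sum(weights)
    fused = {}
    for probs, weight in zip(prob_list, weights):
        for label, p in probs.items():
            fused[label] = fused.get(label, 0.0) + weight * p / total
    best = max(fused, key=fused.get)
    return best, fused

# Hypothetical outputs of two adapted forests for a single point.
label, fused = fuse_predictions(
    [{"ground": 0.7, "roof": 0.3}, {"ground": 0.4, "roof": 0.6}],
    weights=[3, 1],
)
```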

  2. The Discovery of Novel Biomarkers Improves Breast Cancer Intrinsic Subtype Prediction and Reconciles the Labels in the METABRIC Data Set

    PubMed Central

    Milioli, Heloisa Helena; Vimieiro, Renato; Riveros, Carlos; Tishchenko, Inna; Berretta, Regina; Moscato, Pablo

    2015-01-01

    Background The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis and prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named CM1 score, and b) apply an ensemble learning, as opposed to the use of a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction. Methods and Findings The microarray transcriptome data sets used in this study are: the METABRIC breast cancer data recorded for over 2000 patients, and the public integrated source from ROCK database with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We further assessed the ability of 42 selected probes on assigning correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied on the list of 50 genes from the PAM50 method. Conclusions The CM1 score portrayed 30 novel biomarkers for predicting breast cancer subtypes, with the confirmation of the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of current clinical markers ER, PR and HER2, and survival curves in the METABRIC and ROCK data sets. 
Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing single sample classifiers to predict breast cancer intrinsic subtypes. PMID:26132585
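
    The abstract contrasts a single classifier with an ensemble for subtype assignment but does not say how the individual predictions are combined. A minimal sketch, assuming simple majority voting across classifiers; the function names, labels, and classifier outputs below are hypothetical and are not taken from the study:

```python
from collections import Counter

def majority_vote(votes):
    """Return the most frequent label; ties go to the label that appears first."""
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label

def ensemble_assign(per_classifier_labels):
    """Combine one label list per classifier into one consensus label per sample."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_labels)]

# Hypothetical outputs of three classifiers for four samples.
clf_a = ["LumA", "LumB", "Basal", "Her2"]
clf_b = ["LumA", "LumA", "Basal", "Her2"]
clf_c = ["LumB", "LumB", "Basal", "Normal"]

consensus = ensemble_assign([clf_a, clf_b, clf_c])
print(consensus)
```

    Majority voting is only one possible combination rule; weighted or probability-averaging schemes would change the tie behaviour shown here.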

  3. Outpatient endometrial aspiration: an alternative to methotrexate for pregnancy of unknown location.

    PubMed

    Insogna, Iris G; Farland, Leslie V; Missmer, Stacey A; Ginsburg, Elizabeth S; Brady, Paula C

    2017-08-01

    Pregnancies of unknown location with abnormal beta-human chorionic gonadotropin trends are frequently treated as presumed ectopic pregnancies with methotrexate. Preliminary data suggest that outpatient endometrial aspiration may be an effective tool to diagnose pregnancy location, while also sparing women exposure to methotrexate. The purpose of this study was to evaluate the utility of an endometrial sampling protocol for the diagnosis of pregnancies of unknown location after in vitro fertilization. A retrospective cohort study of 14,505 autologous fresh and frozen in vitro fertilization cycles from October 2007 to September 2015 was performed; 110 patients were diagnosed with pregnancy of unknown location, defined as a positive beta-human chorionic gonadotropin without ultrasound evidence of intrauterine or ectopic pregnancy and an abnormal beta-human chorionic gonadotropin trend (<53% rise or <15% fall in 2 days). These patients underwent outpatient endometrial sampling with Karman cannula aspiration. Patients with a beta-human chorionic gonadotropin decline ≥15% within 24 hours of sampling and/or villi detected on pathologic analysis were diagnosed with failing intrauterine pregnancy and had weekly beta-human chorionic gonadotropin measurements thereafter. Those patients with beta-human chorionic gonadotropin declines <15% and no villi identified were diagnosed with ectopic pregnancy and treated with intramuscular methotrexate (50 mg/m²) or laparoscopy. Across 8 years of follow up, among women with pregnancy of unknown location, failed intrauterine pregnancy was diagnosed in 46 patients (42%), and ectopic pregnancy was diagnosed in 64 patients (58%).
Clinical variables that included fresh or frozen embryo transfer, day of embryo transfer, serum beta-human chorionic gonadotropin at the time of sampling, endometrial thickness, and presence of an adnexal mass were not significantly different between patients with failed intrauterine pregnancy or ectopic pregnancy. In patients with failed intrauterine pregnancy, 100% demonstrated adequate postsampling beta-human chorionic gonadotropin declines; villi were identified in just 46% (n=21 patients). Patients with failed intrauterine pregnancy had significantly shorter time to resolution (negative serum beta-human chorionic gonadotropin) after sampling compared with patients with ectopic pregnancy (12.6 vs 26.3 days; P<.001). With the use of this safe and effective protocol of endometrial aspiration with Karman cannula, a large proportion of women with pregnancy of unknown location are spared methotrexate, with a shorter time to pregnancy resolution than those who receive methotrexate. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

    PubMed

    McIntyre, Alexa B R; Ounit, Rachid; Afshinnekoo, Ebrahim; Prill, Robert J; Hénaff, Elizabeth; Alexander, Noah; Minot, Samuel S; Danko, David; Foox, Jonathan; Ahsanuddin, Sofia; Tighe, Scott; Hasan, Nur A; Subramanian, Poorani; Moffat, Kelly; Levy, Shawn; Lonardi, Stefano; Greenfield, Nick; Colwell, Rita R; Rosen, Gail L; Mason, Christopher E

    2017-09-21

    One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
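
    Two of the strategies named above, abundance filtering and tool intersection, can be sketched in a few lines. This is an illustration only: the taxon names, read counts, threshold, and function names are hypothetical, and the benchmarked tools each produce their own output formats that would need parsing first.

```python
def abundance_filter(profile, min_fraction):
    """Keep taxa whose share of classified reads is at least min_fraction."""
    total = sum(profile.values())
    if total == 0:
        return set()
    return {taxon for taxon, reads in profile.items() if reads / total >= min_fraction}

def tool_intersection(calls_per_tool, min_tools):
    """Keep taxa reported by at least min_tools of the tools."""
    support = {}
    for taxa in calls_per_tool:
        for taxon in taxa:
            support[taxon] = support.get(taxon, 0) + 1
    return {taxon for taxon, n in support.items() if n >= min_tools}

# Hypothetical read counts per taxon from two classifiers run on the same sample.
tool_a = {"E. coli": 900, "S. aureus": 95, "B. anthracis": 5}
tool_b = {"E. coli": 800, "S. aureus": 150, "Y. pestis": 50}

filtered_a = abundance_filter(tool_a, min_fraction=0.01)
filtered_b = abundance_filter(tool_b, min_fraction=0.01)
agreed = tool_intersection([filtered_a, filtered_b], min_tools=2)
print(sorted(agreed))
```

    Both filters trade recall for precision: a genuinely present low-abundance species is dropped along with the false positives, which is consistent with the abstract's caution that these strategies do not fully eliminate errors.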

  5. Plasma and muscle cortisol measurements as indicators of meat quality and stress in pigs.

    PubMed

    Shaw, F D; Trout, G R; McPhee, C P

    1995-01-01

    Post-slaughter blood samples and muscle samples were collected from pigs slaughtered at the completion of a live-animal performance trial. There were two lines of pigs in which the halothane allele (n) was segregating. The lines were a lean line selected for rapid lean growth and an unselected fat line. There were homozygous normal (NN), homozygous halothane positive (nn) and heterozygous (Nn) genotypes in both lines. Cortisol was measured in the plasma of the blood samples and in muscle juice obtained by high-speed centrifugation. Meat quality was assessed using pH, colour, fibre-optic probe, drip loss and cure yield measurements. Plasma cortisol concentrations in the fat line were significantly (P < 0·05) greater than those in the lean line but concentrations did not differ significantly for the three halothane genotypes. Carcasses classified as dark, firm and dry (DFD) had significantly (P < 0·05) greater muscle cortisol concentrations than those classified as normal. Plasma and muscle cortisol concentrations of carcasses classified as pale, soft and exudative (PSE) did not differ significantly from those classified as normal. Correlations between muscle cortisol and meat quality attributes were generally highly significant (r = 0·31 to r = 0·51, P < 0·001). There was a highly significant correlation (r = 0·73, P < 0·0001) between plasma and muscle cortisol concentrations.

  6. Classification and Identification of Plant Fibrous Material with Different Species Using near Infrared Technique—A New Way to Approach Determining Biomass Properties Accurately within Different Species

    PubMed Central

    Jiang, Wei; Zhou, Chengfeng; Han, Guangting; Via, Brian; Swain, Tammy; Fan, Zhaofei; Liu, Shaoyang

    2017-01-01

    Plant fibrous material is a good resource in textile and other industries. Normally, the several kinds of plant fibrous materials used in one process need to be identified and characterized in advance. It is easy to identify them when they are in raw condition. However, most of the materials are semi-products which are ground, rotted or pre-hydrolyzed. Classifying these samples, which include different species, with high accuracy is a big challenge. In this research, both qualitative and quantitative analysis methods were chosen to classify six different species of samples, including softwood, hardwood, bast, and aquatic plant. Soft Independent Modeling of Class Analogy (SIMCA) and partial least squares (PLS) were used. The algorithm to classify different species of samples using PLS was created independently in this research. The results show that the six species can be successfully classified using the SIMCA and PLS methods, and that the two methods give similar results. The identification rates of kenaf, ramie and pine are 100%, and the identification rates of lotus, eucalyptus and tallow are higher than 94%. It is also found that spectra loadings can help pick the best wavenumber ranges for constructing the NIR model. Inter-material distance shows how close two species are. The scores graph is helpful for choosing the number of principal components during model construction. PMID:28105037

  7. On-Site Classification of Pansteatitis in Mozambique Tilapia (Oreochromis mossambicus) using a Portable Lipid-Based Analyzer

    PubMed Central

    Somerville, Stephen E.; Cantu, Theresa M.; Guillette, Matthew P.; Botha, Hannes; Boggs, Ashley S. P.; Luus-Powell, Wilmien; Guillette, Louis J.

    2017-01-01

    While no pansteatitis-related large-scale mortality events have occurred since 2008, the current status of pansteatitis (presence and pervasiveness) in the Olifants River system and other regions of South Africa remains largely unknown. In part, this is due to both a lack of known biological markers of pansteatitis and a lack of suitable non-invasive assays capable of rapidly classifying the disease. Here, we propose the application of a point-of-care (POC) device using lipid-based test strips (total cholesterol (TC) and total triglyceride (TG)), for classifying pansteatitis status in the whole blood of pre-spawning Mozambique tilapia (Oreochromis mossambicus). Using the TC strips, the POC device was able to non-lethally classify the tilapia as either healthy or pansteatitis-affected; the sexes were examined independently because sexual dimorphism was observed for TC (males p = 0.0364, females χ2 = 0.0007). No significant difference between healthy and pansteatitis-affected tilapia was observed using the TG strips. This is one of the first described applications of using POC devices for on-site environmental disease state testing. A discussion on the merits of using portable lipid-based analyzers as an in-field disease-state diagnostic tool is provided. PMID:28729886

  8. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure.

    PubMed

    Dagan-Wiener, Ayana; Nissim, Ido; Ben Abu, Natalie; Borgonovo, Gigliola; Bassoli, Angela; Niv, Masha Y

    2017-09-21

    Bitter taste is an innately aversive taste modality that is considered to protect animals from consuming toxic compounds. Yet, bitterness is not always noxious and some bitter compounds have beneficial effects on health. Hundreds of bitter compounds were reported (and are accessible via the BitterDB http://bitterdb.agri.huji.ac.il/dbbitter.php ), but numerous additional bitter molecules are still unknown. The dramatic chemical diversity of bitterants makes bitterness prediction a difficult task. Here we present a machine learning classifier, BitterPredict, which predicts whether a compound is bitter or not, based on its chemical structure. BitterDB was used as the positive set, and non-bitter molecules were gathered from literature to create the negative set. Adaptive Boosting (AdaBoost), a machine-learning algorithm based on decision trees, was applied to molecules that were represented using physicochemical and ADME/Tox descriptors. BitterPredict correctly classifies over 80% of the compounds in the hold-out test set, and 70-90% of the compounds in three independent external sets and in sensory test validation, providing a quick and reliable tool for classifying large sets of compounds into bitter and non-bitter groups. BitterPredict suggests that about 40% of random molecules, and a large portion (66%) of clinical and experimental drugs, and of natural products (77%) are bitter.

  9. The fusion of large scale classified side-scan sonar image mosaics.

    PubMed

    Reed, Scott; Tena, Ruiz Ioseba; Capus, Chris; Petillot, Yvan

    2006-07-01

    This paper presents a unified framework for the creation of classified maps of the seafloor from sonar imagery. Significant challenges in photometric correction, classification, navigation and registration, and image fusion are addressed. The techniques described are directly applicable to a range of remote sensing problems. Recent advances in side-scan data correction are incorporated to compensate for the sonar beam pattern and motion of the acquisition platform. The corrected images are segmented using pixel-based textural features and standard classifiers. In parallel, the navigation of the sonar device is processed using Kalman filtering techniques. A simultaneous localization and mapping framework is adopted to improve the navigation accuracy and produce georeferenced mosaics of the segmented side-scan data. These are fused within a Markovian framework and two fusion models are presented. The first uses a voting scheme regularized by an isotropic Markov random field and is applicable when the reliability of each information source is unknown. The Markov model is also used to inpaint regions where no final classification decision can be reached using pixel level fusion. The second model formally introduces the reliability of each information source into a probabilistic model. Evaluation of the two models using both synthetic images and real data from a large scale survey shows significant quantitative and qualitative improvement using the fusion approach.

  10. Identification of DEP domain-containing proteins by a machine learning method and experimental analysis of their expression in human HCC tissues

    NASA Astrophysics Data System (ADS)

    Liao, Zhijun; Wang, Xinrui; Zeng, Yeting; Zou, Quan

    2016-12-01

    The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins based only on the amino acid sequences, remains unknown. Here, we describe a computational method to segregate DEPDCs and non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequences for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features of DEPDCs and non-DEPDCs and classified them with random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we designed experimental verification methods of human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDCs superfamily can be divided into three clusters. Moreover, the 188D and 20D features can both be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domain was common in human DEPDCs, human HCC and the adjacent tissues that widely expressed DEPDCs. However, their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.

  11. Publications - GMC 185 | Alaska Division of Geological & Geophysical

    Science.gov Websites

    North Slope well and surface Late Jurassic-Neocomian samples. Authors: Unknown. Publication Date: 1991. Unknown, 1991, Porosity, permeability, and grain density determinations of North Slope well and surface

  12. Improving imbalanced scientific text classification using sampling strategies and dictionaries.

    PubMed

    Borrajo, L; Romero, R; Iglesias, E L; Redondo Marey, C M

    2011-09-15

    Many real applications have the imbalanced class distribution problem, where one of the classes is represented by a very small number of cases compared to the other classes. Among the systems affected are those related to the retrieval and classification of scientific documentation. Sampling strategies such as Oversampling and Subsampling are popular in tackling the problem of class imbalance. In this work, we study their effects on three types of classifiers (kNN, SVM and Naive-Bayes) when they are applied to search on the PubMed scientific database. Another purpose of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad-hoc subset of the UniProt database named Protein) using the mentioned classifiers and sampling strategies. Best results were obtained with NLPBA and Protein dictionaries and the SVM classifier using the Subsampling balancing technique. These results were compared with those obtained by other authors using the TREC Genomics 2005 public corpus. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.
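
    The two balancing strategies named in this abstract can be illustrated with random sampling. The sketch below is an assumption about the general technique, not the authors' implementation: the document identifiers, labels, and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def group_by_label(samples):
    """Group (item, label) pairs by label."""
    groups = defaultdict(list)
    for item, label in samples:
        groups[label].append((item, label))
    return groups

def subsample(samples, rng):
    """Randomly shrink every class to the size of the smallest class."""
    groups = group_by_label(samples)
    target = min(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, target))
    return balanced

def oversample(samples, rng):
    """Randomly duplicate items until every class matches the largest class."""
    groups = group_by_label(samples)
    target = max(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(g)
        balanced.extend(rng.choices(g, k=target - len(g)))
    return balanced

# Hypothetical corpus: 8 non-relevant documents and 2 relevant ones.
docs = [(f"doc{i}", "non-relevant") for i in range(8)]
docs += [(f"doc{i}", "relevant") for i in range(8, 10)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
under = subsample(docs, rng)
over = oversample(docs, rng)
print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

    Subsampling discards majority-class documents, while oversampling repeats minority-class documents; either should be applied to the training split only, so that duplicated items do not leak into evaluation.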

  13. Multicategory nets of single-layer perceptrons: complexity and sample-size issues.

    PubMed

    Raudys, Sarunas; Kybartas, Rimantas; Zavadskas, Edmundas Kazimieras

    2010-05-01

    The standard cost function of multicategory single-layer perceptrons (SLPs) does not minimize the classification error rate. In order to reduce classification error, it is necessary to: 1) reject the traditional cost function, 2) obtain near to optimal pairwise linear classifiers by specially organized SLP training and optimal stopping, and 3) fuse their decisions properly. To obtain better classification in unbalanced training set situations, we introduce the unbalance correcting term. It was found that fusion based on the Kullback-Leibler (K-L) distance and the Wu-Lin-Weng (WLW) method result in approximately the same performance in situations where sample sizes are relatively small. This observation is explained by the theoretically known fact that an excessive minimization of inexact criteria becomes harmful at times. Comprehensive comparative investigations of six real-world pattern recognition (PR) problems demonstrated that employment of SLP-based pairwise classifiers is comparable to, and as often as not outperforms, the linear support vector (SV) classifiers in moderate dimensional situations. The colored noise injection used to design pseudovalidation sets proves to be a powerful tool for facilitating finite sample problems in moderate-dimensional PR tasks.

  14. A 10-Gene Classifier for Indeterminate Thyroid Nodules: Development and Multicenter Accuracy Study

    PubMed Central

    González, Hernán E.; Martínez, José R.; Vargas-Salas, Sergio; Solar, Antonieta; Veliz, Loreto; Cruz, Francisco; Arias, Tatiana; Loyola, Soledad; Horvath, Eleonora; Tala, Hernán; Traipe, Eufrosina; Meneses, Manuel; Marín, Luis; Wohllk, Nelson; Diaz, René E.; Véliz, Jesús; Pineda, Pedro; Arroyo, Patricia; Mena, Natalia; Bracamonte, Milagros; Miranda, Giovanna; Bruce, Elsa

    2017-01-01

    Background: In most of the world, diagnostic surgery remains the most frequent approach for indeterminate thyroid cytology. Although several molecular tests are available for testing in centralized commercial laboratories in the United States, there are no available kits for local laboratory testing. The aim of this study was to develop a prototype in vitro diagnostic (IVD) gene classifier for the further characterization of nodules with an indeterminate thyroid cytology. Methods: In a first stage, the expression of 18 genes was determined by quantitative polymerase chain reaction (qPCR) in a broad histopathological spectrum of 114 fresh-tissue biopsies. Expression data were used to train several classifiers by supervised machine learning approaches. Classifiers were tested in an independent set of 139 samples. In a second stage, the best classifier was chosen as a model to develop a multiplexed-qPCR IVD prototype assay, which was tested in a prospective multicenter cohort of fine-needle aspiration biopsies. Results: In tissue biopsies, the best classifier, using only 10 genes, reached an optimal and consistent performance in the ninefold cross-validated testing set (sensitivity 93% and specificity 81%). In the multicenter cohort of fine-needle aspiration biopsy samples, the 10-gene signature, built into a multiplexed-qPCR IVD prototype, showed an area under the curve of 0.97, a positive predictive value of 78%, and a negative predictive value of 98%. By Bayes' theorem, the IVD prototype is expected to achieve a positive predictive value of 64–82% and a negative predictive value of 97–99% in patients with a cancer prevalence range of 20–40%. Conclusions: A new multiplexed-qPCR IVD prototype is reported that accurately classifies thyroid nodules and may provide a future solution suitable for local reference laboratory testing. PMID:28521616
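
    The dependence of predictive values on prevalence follows from Bayes' theorem. The sketch below shows the calculation; it plugs in the tissue-stage sensitivity and specificity quoted above (93% and 81%) purely as illustrative inputs. The abstract does not state the sensitivity and specificity of the fine-needle aspiration cohort, so the numbers printed here are not expected to reproduce the reported 64–82% and 97–99% ranges.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Illustrative inputs only (tissue-stage figures from the abstract).
for prevalence in (0.20, 0.30, 0.40):
    ppv, npv = predictive_values(0.93, 0.81, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

    As prevalence rises, the positive predictive value increases and the negative predictive value falls, which is why the abstract reports both as ranges over a 20–40% prevalence interval.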

  15. A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.

    PubMed

    Gao, Xiang; Lin, Huaiying; Dong, Qunfeng

    2017-01-01

    Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. 
Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
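
    The posterior calculation described in this abstract can be sketched directly from the Dirichlet-multinomial probability mass function. This is an illustration under stated assumptions, not the DMBC R package: the alpha vectors are hypothetical fixed values (the paper estimates them by maximum likelihood), the class names and priors are invented, and no feature selection is performed.

```python
import math

def dm_log_likelihood(counts, alpha):
    """Log probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    log_p += math.lgamma(a) - math.lgamma(n + a)
    log_p += sum(math.lgamma(x + al) - math.lgamma(al)
                 for x, al in zip(counts, alpha))
    return log_p

def posterior(counts, class_alphas, priors):
    """Posterior class probabilities via Bayes' theorem (log-sum-exp for stability)."""
    log_joint = {c: math.log(priors[c]) + dm_log_likelihood(counts, class_alphas[c])
                 for c in class_alphas}
    peak = max(log_joint.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {c: math.exp(v - log_norm) for c, v in log_joint.items()}

# Hypothetical parameters for two taxa and two classes.
class_alphas = {"disease": [8.0, 2.0], "healthy": [2.0, 8.0]}
priors = {"disease": 0.5, "healthy": 0.5}

sample_counts = [9, 1]  # hypothetical read counts for the two taxa
post = posterior(sample_counts, class_alphas, priors)
print(post)
```

    Replacing the equal priors with a published disease prevalence changes only the `priors` dictionary, which is the property the abstract highlights as an advantage of a Bayes classifier.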

  16. Human papillomaviruses and skin cancer.

    PubMed

    Smola, Sigrun

    2014-01-01

    Human papillomaviruses (HPVs) infect squamous epithelia and can induce hyperproliferative lesions. More than 120 different HPV types have been characterized and classified into five different genera. While mucosal high-risk HPVs have a well-established causal role in anogenital carcinogenesis, the biology of cutaneous HPVs is less well understood. The clinical relevance of genus beta-PV infection has clearly been demonstrated in patients suffering from epidermodysplasia verruciformis (EV), a rare inherited disease associated with a high rate of skin cancer. In the normal population genus beta-PV are suspected to have an etiologic role in skin carcinogenesis as well but this is still controversially discussed. Their oncogenic potency has been investigated in mouse models and in vitro. In 2009, the International Agency for Research on Cancer (IARC) classified the genus beta HPV types 5 and 8 as "possible carcinogenic" biological agents (group 2B) in EV disease. This chapter will give an overview on the knowns and unknowns of infections with genus beta-PV and discuss their potential impact on skin carcinogenesis in the general population.

  17. Seismic event classification system

    DOEpatents

    Dowla, F.U.; Jarpe, S.P.; Maurer, W.

    1994-12-13

    In the computer interpretation of seismic data, the critical first step is to identify the general class of an unknown event. For example, the classification might be: teleseismic, regional, local, vehicular, or noise. Self-organizing neural networks (SONNs) can be used for classifying such events. Both Kohonen and Adaptive Resonance Theory (ART) SONNs are useful for this purpose. Given the detection of a seismic event and the corresponding signal, computation is made of: the time-frequency distribution, its binary representation, and finally a shift-invariant representation, which is the magnitude of the two-dimensional Fourier transform (2-D FFT) of the binary time-frequency distribution. This pre-processed input is fed into the SONNs. These neural networks are able to group events that look similar. The ART SONN has an advantage in classifying the event because the types of cluster groups do not need to be pre-defined. The results from the SONNs together with an expert seismologist's classification are then used to derive event classification probabilities. 21 figures.
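
    The shift-invariant representation mentioned above rests on a property of the Fourier transform: shifting the input changes only the phase of each coefficient, not its magnitude. A minimal sketch of that property, using a naive pure-Python 2-D DFT on a small hypothetical binary grid standing in for a binarized time-frequency distribution; the invariance is exact for circular shifts, which is the case shown here.

```python
import cmath

def dft2_magnitude(grid):
    """Magnitude of the 2-D discrete Fourier transform (naive O(N^4) version)."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    total += grid[r][c] * cmath.exp(1j * angle)
            out_row.append(abs(total))
        result.append(out_row)
    return result

def circular_shift(grid, dr, dc):
    """Shift the grid by dr rows and dc columns with wrap-around."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
            for r in range(rows)]

# Hypothetical 4x4 binary time-frequency pattern.
pattern = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
shifted = circular_shift(pattern, dr=1, dc=2)

mag_a = dft2_magnitude(pattern)
mag_b = dft2_magnitude(shifted)
max_diff = max(abs(a - b) for ra, rb in zip(mag_a, mag_b) for a, b in zip(ra, rb))
print(max_diff)
```

    A practical system would use a fast FFT routine rather than this quadruple loop; the naive form is used only to keep the sketch dependency-free.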

  18. Application of the pessimistic pruning to increase the accuracy of C4.5 algorithm in diagnosing chronic kidney disease

    NASA Astrophysics Data System (ADS)

    Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.

    2018-03-01

    Data mining is a technique for extracting valuable, previously unknown patterns hidden in large data collections. Data mining has been applied in the healthcare industry. One data mining technique is classification. Decision trees are a classification method, and C4.5 is a decision tree algorithm. In this work, a classifier for diagnosing chronic kidney disease is designed by applying pessimistic pruning to the C4.5 algorithm. Pessimistic pruning identifies and removes branches that are not needed, in order to avoid overfitting of the decision tree generated by the C4.5 algorithm. In this paper, the results obtained using these classifiers are presented and discussed. Pessimistic pruning increases the accuracy of the C4.5 algorithm by 1.5 percentage points, from 95% to 96.5%, in diagnosing chronic kidney disease.

  19. Seismic event classification system

    DOEpatents

    Dowla, Farid U.; Jarpe, Stephen P.; Maurer, William

    1994-01-01

    In the computer interpretation of seismic data, the critical first step is to identify the general class of an unknown event. For example, the classification might be: teleseismic, regional, local, vehicular, or noise. Self-organizing neural networks (SONNs) can be used for classifying such events. Both Kohonen and Adaptive Resonance Theory (ART) SONNs are useful for this purpose. Given the detection of a seismic event and the corresponding signal, computation is made of: the time-frequency distribution, its binary representation, and finally a shift-invariant representation, which is the magnitude of the two-dimensional Fourier transform (2-D FFT) of the binary time-frequency distribution. This pre-processed input is fed into the SONNs. These neural networks are able to group events that look similar. The ART SONN has an advantage in classifying the event because the types of cluster groups do not need to be pre-defined. The results from the SONNs together with an expert seismologist's classification are then used to derive event classification probabilities.

  20. Facial soft tissue thickness differences among three skeletal classes in Japanese population.

    PubMed

    Utsuno, Hajime; Kageyama, Toru; Uchida, Keiichi; Kibayashi, Kazuhiko

    2014-03-01

    Facial reconstruction is used in forensic anthropology to recreate the face from unknown human skeletal remains, and to elucidate the antemortem facial appearance. This requires accurate assessment of the skull (age, sex, ancestry, etc.) and thickness data. However, additional information is required to reconstruct the face as the information obtained from the skull is limited. Here, we aimed to examine the information from the skull that is required for accurate facial reconstruction. The human facial profile is classified into 3 shapes: straight, convex, and concave. These facial profiles facilitate recognition of individuals. The skeletal classes used in orthodontics are classified according to these 3 facial types. We have previously reported the differences between Japanese females. In the present study, we applied this classification for facial tissue measurement, compared the differences in tissue depth of each skeletal class for both sexes in the Japanese population, and elucidated the differences between the skeletal classes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  1. Misophonia: diagnostic criteria for a new psychiatric disorder.

    PubMed

    Schröder, Arjan; Vulink, Nienke; Denys, Damiaan

    2013-01-01

    Some patients report a preoccupation with a specific aversive human sound that triggers impulsive aggression. This condition is relatively unknown and has hitherto never been described, although the phenomenon has anecdotally been named misophonia. 42 patients who reported misophonia were recruited by our hospital website. All patients were interviewed by an experienced psychiatrist and were screened with an adapted version of the Y-BOCS, HAM-D, HAM-A, SCL-90 and SCID II. The misophonia patients shared a similar pattern of symptoms in which an auditory or visual stimulus provoked an immediate aversive physical reaction with anger, disgust and impulsive aggression. The intensity of these emotions caused subsequent obsessions with the cue, avoidance and social dysfunctioning with intense suffering. The symptoms cannot be classified in the current nosological DSM-IV TR or ICD-10 systems. We suggest that misophonia should be classified as a discrete psychiatric disorder. Diagnostic criteria could help to officially recognize the patients and the disorder, improve its identification by professional health carers, and encourage scientific research.

  2. Learning disordered topological phases by statistical recovery of symmetry

    NASA Astrophysics Data System (ADS)

    Yoshioka, Nobuyuki; Akagi, Yutaka; Katsura, Hosho

    2018-05-01

    We apply the artificial neural network in a supervised manner to map out the quantum phase diagram of disordered topological superconductors in class DIII. Given the disorder that keeps the discrete symmetries of the ensemble as a whole, translational symmetry which is broken in the quasiparticle distribution individually is recovered statistically by taking an ensemble average. By using this, we classify the phases by the artificial neural network that learned the quasiparticle distribution in the clean limit and show that the result is totally consistent with the calculation by the transfer matrix method or noncommutative geometry approach. If all three phases, namely the Z2, trivial, and thermal metal phases, appear in the clean limit, the machine can classify them with high confidence over the entire phase diagram. If only the former two phases are present, we find that the machine remains confused in a certain region, leading us to conclude the detection of the unknown phase which is eventually identified as the thermal metal phase.

  3. Hierarchical classification of dynamically varying radar pulse repetition interval modulation patterns.

    PubMed

    Kauppi, Jukka-Pekka; Martikainen, Kalle; Ruotsalainen, Ulla

    2010-12-01

    The central purpose of passive signal intercept receivers is to perform automatic categorization of unknown radar signals. Currently, there is an urgent need to develop intelligent classification algorithms for these devices due to emerging complexity of radar waveforms. Especially multifunction radars (MFRs) capable of performing several simultaneous tasks by utilizing complex, dynamically varying scheduled waveforms are a major challenge for automatic pattern classification systems. To assist recognition of complex radar emissions in modern intercept receivers, we have developed a novel method to recognize dynamically varying pulse repetition interval (PRI) modulation patterns emitted by MFRs. We use robust feature extraction and classifier design techniques to assist recognition in unpredictable real-world signal environments. We classify received pulse trains hierarchically which allows unambiguous detection of the subpatterns using a sliding window. Accuracy, robustness and reliability of the technique are demonstrated with extensive simulations using both static and dynamically varying PRI modulation patterns. Copyright © 2010 Elsevier Ltd. All rights reserved.

  4. Validity of the CAGE questionnaire for men who have sex with men (MSM) in China.

    PubMed

    Chen, Yen-Tyng; Ibragimov, Umedjon; Nehl, Eric J; Zheng, Tony; He, Na; Wong, Frank Y

    2016-03-01

    Detection of heavy drinking among men who have sex with men (MSM) is crucial for both intervention and treatment. The CAGE questionnaire is a popular screening instrument for alcohol use problems. However, the validity of CAGE for Chinese MSM is unknown. Data were from three waves of cross-sectional assessments among general MSM (n=523) and men who sell sex to other men ("money boys" or MBs, n=486) in Shanghai, China. Specifically, participants were recruited using respondent-driven, community popular opinion leader, and venue-based sampling methods. The validity of the CAGE was examined for different cutoff scores and individual CAGE items using self-reported heavy drinking (≥14 drinks in the past week) as a criterion. In the full sample, 75 (7.4%) of participants were classified as heavy drinkers. 32 (6.1%) of general MSM and 43 (8.9%) of MBs were heavy drinkers. The area under the curve statistic for the overall sample was 0.7 (95% CI: 0.36-0.77). Overall, the sensitivities (ranging from 18.7 to 66.7%), specificities (ranging from 67.5 to 95.8%), and positive predictive values (ranging from 14.1 to 26.4%) for different cutoff scores were inadequate using past week heavy drinking as the criterion. The ability of CAGE to discriminate heavy drinkers from non-heavy drinkers was limited. Our findings showed the inadequate validity of CAGE as a screening instrument for current heavy drinking in Chinese MSM. Further research using a combination of validity criteria is needed to determine the applicability of CAGE for this population. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
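
    The sensitivity, specificity and positive predictive value reported per cutoff score can be illustrated with a minimal Python sketch. It assumes that a score at or above the cutoff counts as a positive screen; the function name and the toy scores below are hypothetical and are not the study's data.

```python
def _ratio(numerator, denominator):
    # Return None instead of dividing by zero when a group is empty.
    return numerator / denominator if denominator else None


def screening_metrics(scores, is_case, cutoff):
    """Compare a screening score against a criterion (True = criterion met).

    A score >= cutoff is treated as a positive screen (an assumption).
    """
    tp = fp = tn = fn = 0
    for score, case in zip(scores, is_case):
        positive = score >= cutoff
        if positive and case:
            tp += 1
        elif positive:
            fp += 1
        elif case:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),  # cases that screen positive
        "specificity": _ratio(tn, tn + fp),  # non-cases that screen negative
        "ppv": _ratio(tp, tp + fp),          # positive screens that are cases
    }


# Hypothetical toy data: eight respondents with scores 0-4.
toy_scores = [0, 1, 2, 3, 4, 0, 1, 2]
toy_cases = [False, False, True, True, True, False, False, False]
toy_result = screening_metrics(toy_scores, toy_cases, cutoff=2)
```

    Raising the cutoff generally trades sensitivity for specificity, which is why the abstract reports a range for each measure. The prevalence figures follow the same counting: 75 of the 1009 participants (523 + 486) is about 7.4%.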

  5. A naive Bayes algorithm for tissue origin diagnosis (TOD-Bayes) of synchronous multifocal tumors in the hepatobiliary and pancreatic system.

    PubMed

    Jiang, Weiqin; Shen, Yifei; Ding, Yongfeng; Ye, Chuyu; Zheng, Yi; Zhao, Peng; Liu, Lulu; Tong, Zhou; Zhou, Linfu; Sun, Shuo; Zhang, Xingchen; Teng, Lisong; Timko, Michael P; Fan, Longjiang; Fang, Weijia

    2018-01-15

    Synchronous multifocal tumors are common in the hepatobiliary and pancreatic system, but because of similarities in their histological features, oncologists have difficulty in identifying their precise tissue clonal origin through routine histopathological methods. To address this problem and assist in more precise diagnosis, we developed a computational approach for tissue origin diagnosis based on the naive Bayes algorithm (TOD-Bayes) using ubiquitous RNA-Seq data. Massive tissue-specific RNA-Seq data sets were first obtained from The Cancer Genome Atlas (TCGA) and ∼1,000 feature genes were used to train and validate the TOD-Bayes algorithm. The accuracy of the model was >95% based on tenfold cross validation using the TCGA data. A total of 18 clinical cancer samples (including six negative controls) with definitive tissue origin were subsequently used for external validation and 17 of the 18 samples were classified correctly in our study (94.4%). Furthermore, we included as case studies seven tumor samples, taken from two individuals who suffered from synchronous multifocal tumors across tissues, where the efforts to make a definitive primary cancer diagnosis by traditional diagnostic methods had failed. Using our TOD-Bayes analysis, the two clinical test cases were successfully diagnosed as pancreatic cancer (PC) and cholangiocarcinoma (CC), respectively, in agreement with their clinical outcomes. Based on our findings, we believe that the TOD-Bayes algorithm is a powerful novel methodology to accurately identify the tissue origin of synchronous multifocal tumors of unknown primary cancers using RNA-Seq data and an important step toward more precision-based medicine in cancer diagnosis and treatment. © 2017 UICC.
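
    The abstract does not describe the internals of TOD-Bayes, so the following is only a generic Gaussian naive Bayes sketch of the underlying idea: each class (tissue) gets per-feature means and variances, and a sample is assigned to the class with the highest log posterior. The function names, the variance floor and the two-feature toy expression values are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The floor keeps a constant feature from producing zero variance.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class whose log prior plus Gaussian log likelihood is largest."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical two-gene expression values for two tissue classes.
toy_X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [4.0, 1.0], [4.2, 0.8], [3.9, 1.1]]
toy_y = ["liver"] * 3 + ["pancreas"] * 3
toy_model = fit_gaussian_nb(toy_X, toy_y)
```

    Working in log space avoids numerical underflow when many features (here, on the order of 1,000 genes) contribute one likelihood term each.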

  6. Combined radiogrammetry and texture analysis for early diagnosis of osteoporosis using Indian and Swiss data.

    PubMed

    Areeckal, Anu Shaju; Kamath, Jagannath; Zawadynski, Sophie; Kocher, Michel; S, Sumam David

    2018-05-26

    Osteoporosis is a bone disorder characterized by bone loss and decreased bone strength. The most widely used technique for detection of osteoporosis is the measurement of bone mineral density (BMD) using dual energy X-ray absorptiometry (DXA). But DXA scans are expensive and not widely available in low-income economies. In this paper, we propose a low cost pre-screening tool for the detection of low bone mass, using cortical radiogrammetry of third metacarpal bone and trabecular texture analysis of distal radius from hand and wrist radiographs. An automatic segmentation algorithm to automatically locate and segment the third metacarpal bone and distal radius region of interest (ROI) is proposed. Cortical measurements such as combined cortical thickness (CCT), cortical area (CA), percent cortical area (PCA) and Barnett Nordin index (BNI) were taken from the shaft of third metacarpal bone. Texture analysis of trabecular network at the distal radius was performed using features obtained from histogram, gray level Co-occurrence matrix (GLCM) and morphological gradient method (MGM). The significant cortical and texture features were selected using independent sample t-test and used to train classifiers to classify healthy subjects and people with low bone mass. The proposed pre-screening tool was validated on two ethnic groups, Indian sample population and Swiss sample population. Data of 134 subjects from Indian sample population and 65 subjects from Swiss sample population were analysed. The proposed automatic segmentation approach shows a detection accuracy of 86% in detecting the third metacarpal bone shaft and 90% in accurately locating the distal radius ROI. 
    Comparison of the automatic radiogrammetry to the ground truth provided by experts shows a mean absolute error of 0.04 mm for cortical width of healthy group, 0.12 mm for cortical width of low bone mass group, 0.22 mm for medullary width of healthy group, and 0.26 mm for medullary width of low bone mass group. An independent sample t-test was used to select the most discriminant features, to be used as input for training the classifiers. Pearson correlation analysis of the extracted features with DXA-BMD of lumbar spine (DXA-LS) shows significantly high correlation values. Classifiers were trained with the most significant features in the Indian and Swiss sample data. The weighted KNN classifier shows the best test accuracy of 78% for Indian sample data and 100% for Swiss sample data. Hence, combined automatic radiogrammetry and texture analysis is shown to be an effective low cost pre-screening tool for early diagnosis of osteoporosis. Copyright © 2018 Elsevier Ltd. All rights reserved.
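
    A weighted k-nearest-neighbour classifier lets closer training samples count for more than distant ones. The abstract does not state which weighting scheme was used, so this sketch assumes inverse-distance weights; the function name, the `eps` guard and the toy points are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Classify `query` by inverse-distance-weighted votes of its k nearest neighbours."""
    neighbours = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        # eps avoids division by zero when the query coincides with a sample.
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Hypothetical toy case: one very close "A" sample and two distant "B" samples.
toy_train_X = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
toy_train_y = ["A", "B", "B"]
toy_label = weighted_knn_predict(toy_train_X, toy_train_y, [0.1, 0.1], k=3)
```

    In the toy case an unweighted majority vote over the three neighbours would pick "B", while the distance weighting picks "A" because that sample is far closer to the query.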

  7. A novel approach for small sample size family-based association studies: sequential tests.

    PubMed

    Ilk, Ozlem; Rajabli, Farid; Dungul, Dilay Ciglidag; Ozdag, Hilal; Ilk, Hakki Gokhan

    2011-08-01

    In this paper, we propose a sequential probability ratio test (SPRT) to overcome the problem of limited samples in studies related to complex genetic diseases. The results of this novel approach are compared with those obtained from the traditional transmission disequilibrium test (TDT) on simulated data. Although TDT classifies single-nucleotide polymorphisms (SNPs) into only two groups (SNPs associated with the disease and the others), SPRT has the flexibility of assigning SNPs to a third group, that is, those for which we do not have enough evidence and should keep sampling. It is shown that SPRT results in smaller ratios of false positives and negatives, as well as better accuracy and sensitivity values for classifying SNPs when compared with TDT. By using SPRT, data with small sample size become usable for an accurate association analysis.
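
    The three-way outcome described above (associated, not associated, keep sampling) is the defining feature of Wald's sequential probability ratio test. The sketch below applies it to a generic stream of 0/1 observations with two simple hypotheses about the success probability. It is not the authors' family-based test statistic; the hypotheses `p0` and `p1`, the error rates and the function name are assumptions chosen for illustration.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 versus H1: p = p1 on 0/1 observations.

    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)  # at or above this: accept H1
    lower = math.log(beta / (1 - alpha))  # at or below this: accept H0
    llr = 0.0                             # running log-likelihood ratio
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    # Evidence is still between the two thresholds: the third group.
    return "keep sampling", len(observations)
```

    With `p0=0.5` and `p1=0.8`, a run of successes crosses the upper threshold after seven observations, whereas a short mixed sequence stays between the thresholds and is reported as "keep sampling".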

  8. METHODS TO CLASSIFY ENVIRONMENTAL SAMPLES BASED ON MOLD ANALYSES BY QPCR

    EPA Science Inventory

    Quantitative PCR (QPCR) analysis of molds in indoor environmental samples produces highly accurate speciation and enumeration data. In a number of studies, eighty of the most common or potentially problematic indoor molds were identified and quantified in dust samples from homes...

  9. ADEQUACY OF VISUALLY CLASSIFIED PARTICLE COUNT STATISTICS FROM REGIONAL STREAM HABITAT SURVEYS

    EPA Science Inventory

    Streamlined sampling procedures must be used to achieve a sufficient sample size with limited resources in studies undertaken to evaluate habitat status and potential management-related habitat degradation at a regional scale. At the same time, these sampling procedures must achi...

  10. Active learning based segmentation of Crohns disease from abdominal MRI.

    PubMed

    Mahapatra, Dwarikanath; Vos, Franciscus M; Buhmann, Joachim M

    2016-05-01

    This paper proposes a novel active learning (AL) framework, and combines it with semi supervised learning (SSL) for segmenting Crohns disease (CD) tissues from abdominal magnetic resonance (MR) images. Robust fully supervised learning (FSL) based classifiers require lots of labeled data of different disease severities. Obtaining such data is time consuming and requires considerable expertise. SSL methods use a few labeled samples, and leverage the information from many unlabeled samples to train an accurate classifier. AL queries labels of most informative samples and maximizes gain from the labeling effort. Our primary contribution is in designing a query strategy that combines novel context information with classification uncertainty and feature similarity. Combining SSL and AL gives a robust segmentation method that: (1) optimally uses few labeled samples and many unlabeled samples; and (2) requires lower training time. Experimental results show our method achieves higher segmentation accuracy than FSL methods with fewer samples and reduced training effort. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
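
    The query step of active learning can be sketched with plain uncertainty sampling: the unlabeled samples whose predicted class probabilities have the highest entropy are sent for labeling first. The paper's strategy additionally combines context information and feature similarity, which this sketch omits; the function names and toy probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(unlabeled_probs, n_queries):
    """Return indices of the n_queries most uncertain unlabeled samples."""
    ranked = sorted(
        range(len(unlabeled_probs)),
        key=lambda i: entropy(unlabeled_probs[i]),
        reverse=True,
    )
    return ranked[:n_queries]


# Hypothetical classifier outputs for four unlabeled samples (two classes).
toy_probs = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [1.0, 0.0]]
toy_queries = select_queries(toy_probs, n_queries=2)
```

    Samples the classifier is already sure about (such as the last toy sample) are never queried, which is how the labeling effort is concentrated where it changes the model most.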

  11. [Studies on the brand traceability of milk powder based on NIR spectroscopy technology].

    PubMed

    Guan, Xiao; Gu, Fang-Qing; Liu, Jing; Yang, Yong-Jian

    2013-10-01

    This paper studies brand traceability of several kinds of milk powder by combining near infrared diffuse reflectance spectroscopy with soft independent modeling of class analogy (SIMCA). Near infrared spectra of 138 samples, comprising 54 Guangming, 43 Netherlands, 33 Nestle and 8 Yili milk powder samples, were collected. After pretreatment of the full spectrum data variables in the training set, principal component analysis was performed, and the contribution rate of the cumulative variance of the first three principal components was about 99.07%. A milk powder principal component regression model based on SIMCA was established and used to classify the milk powder samples in the prediction sets. The results showed that the recognition rates of Guangming, Netherlands and Nestle milk powder were 78%, 75% and 100%, and the rejection rates were 100%, 87% and 88%, respectively. Therefore, near infrared spectroscopy combined with the SIMCA model can classify milk powder with high accuracy, and is a promising identification method for milk powder variety.
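
    The "contribution rate of the cumulative variance" is the share of total variance carried by the leading principal components, that is, the sum of the largest eigenvalues of the covariance matrix divided by the sum of all of them. A minimal sketch, assuming the eigenvalues are already available; the function name and the eigenvalues below are hypothetical, not the study's.

```python
def cumulative_contribution(eigenvalues, n_components):
    """Fraction of total variance explained by the n_components largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:n_components]) / sum(ordered)


# Hypothetical eigenvalues of a covariance matrix with five components.
toy_eigenvalues = [6.0, 3.0, 0.5, 0.3, 0.2]
toy_rate = cumulative_contribution(toy_eigenvalues, 3)
```

    A rate near 1 for the first few components, as reported in the abstract, means the spectra can be represented in a low-dimensional space with little loss of variance.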

  12. The use of wavelength dispersive X-ray fluorescence in the identification of the elemental composition of vanilla samples and the determination of the geographic origin by discriminant function analysis.

    PubMed

    Hondrogiannis, Ellen; Rotta, Kathryn; Zapf, Charles M

    2013-03-01

    Sixteen elements found in 37 vanilla samples from Madagascar, Uganda, India, Indonesia (all Vanilla planifolia species), and Papua New Guinea (Vanilla tahitensis species) were measured by wavelength dispersive X-ray fluorescence (WDXRF) spectroscopy for the purpose of determining the elemental concentrations to discriminate among the origins. Pellets were prepared from the samples and elemental concentrations were calculated based on calibration curves created using 4 National Institute of Standards and Technology (NIST) standards. Discriminant analysis was used to successfully classify the vanilla samples by their species and their geographical region. Our method allows for higher throughput in the rapid screening of vanilla samples in less time than analytical methods currently available. Wavelength dispersive X-ray fluorescence spectroscopy and discriminant function analysis were used to classify vanilla from different origins resulting in a model that could potentially serve to rapidly validate these samples before purchasing from a producer. © 2013 Institute of Food Technologists®

  13. Bayesian geostatistics in health cartography: the perspective of malaria.

    PubMed

    Patil, Anand P; Gething, Peter W; Piel, Frédéric B; Hay, Simon I

    2011-06-01

    Maps of parasite prevalences and other aspects of infectious diseases that vary in space are widely used in parasitology. However, spatial parasitological datasets rarely, if ever, have sufficient coverage to allow exact determination of such maps. Bayesian geostatistics (BG) is a method for finding a large sample of maps that can explain a dataset, in which maps that do a better job of explaining the data are more likely to be represented. This sample represents the knowledge that the analyst has gained from the data about the unknown true map. BG provides a conceptually simple way to convert these samples to predictions of features of the unknown map, for example regional averages. These predictions account for each map in the sample, yielding an appropriate level of predictive precision.
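
    Converting a sample of maps into a prediction for a regional average can be sketched as follows: compute the regional average separately in every sampled map, then summarise the resulting values by their mean and an interval. The representation of a map as a flat list of per-cell prevalences, the nearest-index quantile rule and all names below are assumptions for illustration.

```python
import statistics


def nearest_rank(sorted_values, q):
    """Value at quantile q (between 0 and 1) using the nearest index."""
    return sorted_values[round(q * (len(sorted_values) - 1))]


def regional_average_summary(map_samples, region_cells, interval=0.95):
    """Summarise the regional average over a sample of candidate maps.

    map_samples: list of maps, each a list of per-cell prevalences.
    region_cells: indices of the cells that make up the region.
    """
    averages = sorted(
        sum(m[c] for c in region_cells) / len(region_cells)
        for m in map_samples
    )
    tail = (1.0 - interval) / 2.0
    return {
        "mean": statistics.mean(averages),
        "lower": nearest_rank(averages, tail),
        "upper": nearest_rank(averages, 1.0 - tail),
    }


# Hypothetical sample of five maps over four cells; the region is cells 0 and 1.
toy_maps = [
    [0.10, 0.30, 0.5, 0.5],
    [0.20, 0.40, 0.5, 0.5],
    [0.30, 0.50, 0.5, 0.5],
    [0.40, 0.60, 0.5, 0.5],
    [0.50, 0.70, 0.5, 0.5],
]
toy_summary = regional_average_summary(toy_maps, [0, 1], interval=0.5)
```

    Because every sampled map contributes one value, the width of the interval reflects how much the maps disagree about the region, which is the "appropriate level of predictive precision" the abstract refers to.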

  14. Bayesian geostatistics in health cartography: the perspective of malaria

    PubMed Central

    Patil, Anand P.; Gething, Peter W.; Piel, Frédéric B.; Hay, Simon I.

    2011-01-01

    Maps of parasite prevalences and other aspects of infectious diseases that vary in space are widely used in parasitology. However, spatial parasitological datasets rarely, if ever, have sufficient coverage to allow exact determination of such maps. Bayesian geostatistics (BG) is a method for finding a large sample of maps that can explain a dataset, in which maps that do a better job of explaining the data are more likely to be represented. This sample represents the knowledge that the analyst has gained from the data about the unknown true map. BG provides a conceptually simple way to convert these samples to predictions of features of the unknown map, for example regional averages. These predictions account for each map in the sample, yielding an appropriate level of predictive precision. PMID:21420361

  15. Design of an audio advertisement dataset

    NASA Astrophysics Data System (ADS)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    As more and more advertisements are broadcast on radio, it is necessary to establish an audio advertising dataset that can be used to analyze and classify advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav file format and annotated with a txt file that contains its file name, sampling frequency, channel number, broadcasting time and class. The soundness of the classification of the advertisements in this dataset is verified by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related experimental studies of audio advertisements.

  16. Guide to the US collection of antarctic meteorites 1976-1988 (everything you wanted to know about the meteorite collection). Antarctic Meteorite Newsletter, Volume 13, Number 1

    NASA Technical Reports Server (NTRS)

    Score, Roberta; Lindstrom, Marilyn M.

    1990-01-01

    The state of the collection of Antarctic Meteorites is summarized. This guide is intended to help investigators plan their meteorite research and select and request samples. Useful information is presented for all classified meteorites from the 1976 to 1988 collections, as of Sept. 1989. The meteorite collection has grown over 13 years to include 4264 samples of which 2754 have been classified. Most of the unclassified meteorites are ordinary chondrites because the collections have been culled for specimens of special petrologic type. The guide consists of two large classification tables. They are preceded by a list of sample locations and important notes to make the tables understandable.

  17. Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests

    ERIC Educational Resources Information Center

    Rudner, Lawrence

    2016-01-01

    In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…

  18. The Impact of Five Missing Data Treatments on a Cross-Classified Random Effects Model

    ERIC Educational Resources Information Center

    Hoelzle, Braden R.

    2012-01-01

    The present study compared the performance of five missing data treatment methods within a Cross-Classified Random Effects Model environment under various levels and patterns of missing data given a specified sample size. Prior research has shown the varying effect of missing data treatment options within the context of numerous statistical…

  19. Chemical classification of iron meteorites. XI - Multi-element studies of 38 new irons and the high abundance of ungrouped irons from Antarctica

    NASA Technical Reports Server (NTRS)

    Wasson, John T.; Ouyang, Xinwei; Wang, Jianmin; Jerde, Eric

    1989-01-01

    Concentrations of 14 elements in the metal of 38 iron meteorites and a pallasite are reported. Three samples are paired with previously classified irons, raising the number of well-classified, independent iron meteorites to 598. Several of the new irons are from Antarctica. Of 24 independent irons from Antarctica, eight are ungrouped, a much higher fraction than that among all classified irons. The difference is probably related to the fact that the median mass of Antarctic irons is about two orders of magnitude smaller than that of non-Antarctic irons. Smaller meteoroids may tend to sample a larger number of asteroidal source regions, perhaps because small meteoroids tend to have higher ejection velocities or because they have random-walked a greater increment of orbital semimajor axis away from that of the parent body.

  20. 2010 NCCA oligochaete trophic index results to inform benthic ...

    EPA Pesticide Factsheets

    Over 400 sites were sampled in the nearshore of the U.S. Great Lakes during the National Coastal Condition Assessment (NCCA) field survey in summer 2010. To assess benthic ecological condition, 393 PONARs were attempted, and collected macroinvertebrates were identified and enumerated. Biological condition at each site was classified as good, fair or poor using the Oligochaete Trophic Index (OTI). The Great Lakes coasts were then classified by calculating percent area within a condition class: good (20.3%), fair (11.6%), and poor (18.0%). Due to unsuccessful PONARs, unclassified oligochaetes or no oligochaetes captured, 50.1% of the sampled area was classified as missing. In order to help focus future discussion and development of a Great Lakes benthic index, OTI results were compared to other traditional biotic integrity indices. In addition, unclassified sites were examined to determine possible methods or metrics that could prevent missing data in a newly developed index.
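
    The percent-area figures can be read as area-weighted shares: each site stands for some amount of nearshore area, and the shares of the condition classes, including "missing", add up to 100% (20.3 + 11.6 + 18.0 + 50.1 = 100.0). A minimal sketch, assuming each site carries an area weight and that a site without an OTI class is recorded as `None`; the names and toy weights are hypothetical.

```python
from collections import defaultdict


def percent_area_by_class(sites):
    """sites: iterable of (area_weight, condition); condition None means no OTI class."""
    totals = defaultdict(float)
    for area, condition in sites:
        totals[condition or "missing"] += area
    grand_total = sum(totals.values())
    return {c: 100.0 * a / grand_total for c, a in totals.items()}


# Hypothetical sites with area weights in arbitrary units.
toy_sites = [(2.0, "good"), (1.0, "fair"), (2.0, "poor"), (5.0, None)]
toy_percent = percent_area_by_class(toy_sites)
```

    Keeping "missing" as its own class makes the extent of unclassified area explicit instead of silently rescaling the good, fair and poor shares.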
