Sample records for classification model applied

  1. Looking at the ICF and human communication through the lens of classification theory.

    PubMed

    Walsh, Regina

    2011-08-01

    This paper explores the insights that classification theory can provide about the application of the International Classification of Functioning, Disability and Health (ICF) to communication. It first considers the relationship between conceptual models and classification systems, highlighting that classification systems in speech-language pathology (SLP) have not historically been based on conceptual models of human communication. It then provides an overview of the key concepts and criteria of classification theory. Applying classification theory to the ICF and communication raises a number of issues, some previously highlighted through clinical application. Six focus questions from classification theory are used to explore these issues, and to propose the creation of an ICF-related conceptual model of communicating for the field of communication disability, which would address some of the issues raised. Developing a conceptual model of communication for SLP purposes, closely articulated with the ICF, would foster productive intra-professional discourse, while at the same time allowing the profession to continue to use the ICF in inter-disciplinary discourse. The paper concludes by suggesting that the insights of classification theory can assist professionals to apply the ICF to communication with the necessary rigour, and to work further in developing a conceptual model of human communication.

  2. Landcover Classification Using Deep Fully Convolutional Neural Networks

    NASA Astrophysics Data System (ADS)

    Wang, J.; Li, X.; Zhou, S.; Tang, J.

    2017-12-01

    Land cover classification has always been an essential application in remote sensing. Certain image features are needed for land cover classification whether pixel-based or object-based methods are used. Unlike other machine learning methods, deep learning models not only extract useful information from multiple bands/attributes, but also learn spatial characteristics. In recent years, deep learning methods have developed rapidly and been widely applied in image recognition, semantic understanding, and other application domains. However, there are limited studies applying deep learning methods to land cover classification. In this research, we used fully convolutional networks (FCN) as the deep learning model to classify land cover. The National Land Cover Database (NLCD) within the state of Kansas was used as the training dataset, and Landsat images were classified using the trained FCN model. We also applied an image segmentation method to improve the original results from the FCN model. In addition, the pros and cons of deep learning relative to several machine learning methods were compared and explored. Our research indicates: (1) FCN is an effective classification model with an overall accuracy of 75%; (2) image segmentation improves the classification results with a better match of spatial patterns; (3) FCN has an excellent learning ability that can attain higher accuracy and better spatial patterns compared with several machine learning methods.
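    The segmentation-based refinement step described above can be illustrated with a spatially aware post-processing sketch: a majority filter stands in for the paper's segmentation method, smoothing a noisy per-pixel classification. The scene, noise level, and filter are invented for illustration, not the paper's FCN/NLCD pipeline.

```python
# Sketch: smoothing a per-pixel land cover classification with a
# segmentation-like majority filter, illustrating why spatially aware
# post-processing improves patchy pixel-wise results. Synthetic labels only.
import numpy as np

rng = np.random.default_rng(1)

# Ground truth: two land cover classes split down the middle of a 40x40 scene.
truth = np.zeros((40, 40), dtype=int)
truth[:, 20:] = 1

# A noisy pixel-wise classifier: ~15% of pixels get salt-and-pepper errors.
noisy = truth.copy()
flip = rng.random(truth.shape) < 0.15
noisy[flip] = 1 - noisy[flip]

def majority_filter(labels, k=3):
    """Replace each pixel with the most common label in its k x k window."""
    pad = k // 2
    padded = np.pad(labels, pad, mode="edge")
    out = np.empty_like(labels)
    for i in range(labels.shape[0]):
        for j in range(labels.shape[1]):
            window = padded[i:i + k, j:j + k]
            out[i, j] = np.bincount(window.ravel()).argmax()
    return out

smoothed = majority_filter(noisy)
acc_before = (noisy == truth).mean()
acc_after = (smoothed == truth).mean()
print(acc_before, acc_after)
```

    Isolated misclassified pixels are outvoted by their neighbors, so the smoothed map matches the true spatial pattern more closely, which is the effect the abstract reports.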

  3. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds.

    PubMed

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140 Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classification, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%), with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flight, were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy in classifying basic behaviors at sampling frequencies as low as 10 Hz; the KNN model did so at sampling frequencies as low as 20 Hz. Classification of accelerometer data collected from free-ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequences of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
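    The RF-vs-KNN comparison described above can be sketched with off-the-shelf classifiers on synthetic windowed accelerometer features. The window length, feature set (mean, standard deviation, dominant-frequency power) and simulated signals below are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch: supervised classification of windowed accelerometry features,
# comparing random forest (RF) and K-nearest neighbor (KNN) models on
# synthetic data for three behaviors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_windows(n, freq, label):
    """Simulate n one-second accelerometer windows for one behavior."""
    t = np.linspace(0, 1, freq, endpoint=False)
    base = {"flapping": 4.0, "soaring": 0.5, "sitting": 0.1}[label]
    sig = base * np.sin(2 * np.pi * 5 * t) + rng.normal(0, 0.2, (n, freq))
    # Summary features per window: mean, std, and 5 Hz spectral power.
    feats = np.column_stack([sig.mean(1), sig.std(1),
                             np.abs(np.fft.rfft(sig, axis=1))[:, 5]])
    return feats, np.full(n, label)

X, y = zip(*(make_windows(100, 140, b) for b in ("flapping", "soaring", "sitting")))
X, y = np.vstack(X), np.concatenate(y)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0, stratify=y)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
knn = KNeighborsClassifier(n_neighbors=5).fit(Xtr, ytr)
rf_acc, knn_acc = rf.score(Xte, yte), knn.score(Xte, yte)
print(rf_acc, knn_acc)
```

    On well-separated synthetic features both models do well; the paper's point is that their relative performance diverges on finer-grained behaviors and lower sampling rates.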

  4. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    PubMed Central

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140 Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classification, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%), with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flight, were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy in classifying basic behaviors at sampling frequencies as low as 10 Hz; the KNN model did so at sampling frequencies as low as 20 Hz. Classification of accelerometer data collected from free-ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequences of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data. PMID:28403159

  5. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    USGS Publications Warehouse

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael J.; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140 Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classification, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%), with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flight, were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy in classifying basic behaviors at sampling frequencies as low as 10 Hz; the KNN model did so at sampling frequencies as low as 20 Hz. Classification of accelerometer data collected from free-ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequences of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.

  6. An Evaluation of Item Response Theory Classification Accuracy and Consistency Indices

    ERIC Educational Resources Information Center

    Wyse, Adam E.; Hao, Shiqi

    2012-01-01

    This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…

  7. One input-class and two input-class classifications for differentiating olive oil from other edible vegetable oils by use of the normal-phase liquid chromatography fingerprint of the methyl-transesterified fraction.

    PubMed

    Jiménez-Carvelo, Ana M; Pérez-Castaño, Estefanía; González-Casado, Antonio; Cuadros-Rodríguez, Luis

    2017-04-15

    A new method for differentiating olive oil (independently of the quality category) from other vegetable oils (canola, safflower, corn, peanut, seeds, grapeseed, palm, linseed, sesame and soybean) has been developed. The analytical procedure for chromatographic fingerprinting of the methyl-transesterified fraction of each vegetable oil, using normal-phase liquid chromatography, is described, and the chemometric strategies applied are discussed. Several chemometric methods, such as k-nearest neighbours (kNN), partial least squares discriminant analysis (PLS-DA), support vector machine classification analysis (SVM-C), and soft independent modelling of class analogies (SIMCA), were applied to build classification models. Performance of the classification was evaluated and ranked using several classification quality metrics. The discriminant analysis based on the use of one input class (plus a dummy class) was applied for the first time in this study. Copyright © 2016 Elsevier Ltd. All rights reserved.
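    The one-input-class versus two-input-class distinction above can be sketched with generic stand-ins: a one-class SVM models only the target class (olive oil) and flags everything else, while a conventional two-class SVM is trained on both classes. The simulated "fingerprint" features and the specific estimators are illustrative assumptions, not the paper's chemometric tools.

```python
# Sketch: one input-class vs. two input-class classification strategies on
# synthetic chromatographic-fingerprint-like features.
import numpy as np
from sklearn.svm import OneClassSVM, SVC

rng = np.random.default_rng(2)
olive = rng.normal(0.0, 0.5, (200, 4))    # target class: "olive oil"
others = rng.normal(3.0, 0.5, (200, 4))   # non-target vegetable oils

# One input-class: model the target class only; +1 = accepted as olive.
occ = OneClassSVM(gamma="scale", nu=0.05).fit(olive[:150])
pred_one = occ.predict(np.vstack([olive[150:], others[150:]]))

# Two input-class: train a discriminant on both classes.
X = np.vstack([olive[:150], others[:150]])
y = np.array([1] * 150 + [-1] * 150)
svc = SVC(kernel="rbf").fit(X, y)
pred_two = svc.predict(np.vstack([olive[150:], others[150:]]))

truth = np.array([1] * 50 + [-1] * 50)
acc_one = (pred_one == truth).mean()
acc_two = (pred_two == truth).mean()
print(acc_one, acc_two)
```

    The one-class model needs no examples of the non-target oils at training time, which is the practical appeal of the one-input-class design when the "other" class is open-ended.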

  8. Model-based classification of CPT data and automated lithostratigraphic mapping for high-resolution characterization of a heterogeneous sedimentary aquifer

    PubMed Central

    Rogiers, Bart; Mallants, Dirk; Batelaan, Okke; Gedeon, Matej; Huysmans, Marijke; Dassargues, Alain

    2017-01-01

    Cone penetration testing (CPT) is one of the most efficient and versatile methods currently available for geotechnical, lithostratigraphic and hydrogeological site characterization. Currently available methods for soil behaviour type (SBT) classification of CPT data, however, have severe limitations, often restricting their application to a local scale. For parameterization of regional groundwater flow or geotechnical models, and delineation of regional hydro- or lithostratigraphy, regional SBT classification would be very useful. This paper investigates the use of model-based clustering for SBT classification, and the influence of different clustering approaches on the properties and spatial distribution of the obtained soil classes. We additionally propose a methodology for automated lithostratigraphic mapping of regionally occurring sedimentary units using SBT classification. The methodology is applied to a large CPT dataset covering a groundwater basin of ~60 km² with predominantly unconsolidated sandy sediments in northern Belgium. Results show that the model-based approach is superior in detecting the true lithological classes when compared to more frequently applied unsupervised classification approaches or literature classification diagrams. We demonstrate that automated mapping of lithostratigraphic units using advanced SBT classification techniques can provide a large gain in efficiency compared to more time-consuming manual approaches and yield at least equally accurate results. PMID:28467468

  9. Model-based classification of CPT data and automated lithostratigraphic mapping for high-resolution characterization of a heterogeneous sedimentary aquifer.

    PubMed

    Rogiers, Bart; Mallants, Dirk; Batelaan, Okke; Gedeon, Matej; Huysmans, Marijke; Dassargues, Alain

    2017-01-01

    Cone penetration testing (CPT) is one of the most efficient and versatile methods currently available for geotechnical, lithostratigraphic and hydrogeological site characterization. Currently available methods for soil behaviour type (SBT) classification of CPT data, however, have severe limitations, often restricting their application to a local scale. For parameterization of regional groundwater flow or geotechnical models, and delineation of regional hydro- or lithostratigraphy, regional SBT classification would be very useful. This paper investigates the use of model-based clustering for SBT classification, and the influence of different clustering approaches on the properties and spatial distribution of the obtained soil classes. We additionally propose a methodology for automated lithostratigraphic mapping of regionally occurring sedimentary units using SBT classification. The methodology is applied to a large CPT dataset covering a groundwater basin of ~60 km² with predominantly unconsolidated sandy sediments in northern Belgium. Results show that the model-based approach is superior in detecting the true lithological classes when compared to more frequently applied unsupervised classification approaches or literature classification diagrams. We demonstrate that automated mapping of lithostratigraphic units using advanced SBT classification techniques can provide a large gain in efficiency compared to more time-consuming manual approaches and yield at least equally accurate results.
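    "Model-based clustering" here generally means fitting a probabilistic mixture model rather than a purely geometric partition. A minimal sketch, assuming Gaussian mixtures as the model-based method (the paper's exact formulation may differ) and invented CPT-like features with elongated, unequally shaped clusters, where a full-covariance mixture typically beats k-means:

```python
# Sketch: model-based clustering (Gaussian mixture) vs. k-means for
# recovering two soil-class-like clusters from synthetic 2-D features.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
# Two clusters with very different shapes (one strongly elongated).
a = rng.multivariate_normal([0, 0], [[4.0, 0], [0, 0.1]], 300)
b = rng.multivariate_normal([0, 2], [[0.1, 0], [0, 0.1]], 300)
X = np.vstack([a, b])
truth = np.array([0] * 300 + [1] * 300)

gmm_labels = GaussianMixture(2, covariance_type="full",
                             random_state=0).fit_predict(X)
km_labels = KMeans(2, n_init=10, random_state=0).fit_predict(X)

gmm_ari = adjusted_rand_score(truth, gmm_labels)
km_ari = adjusted_rand_score(truth, km_labels)
print(gmm_ari, km_ari)
```

    Because the mixture model estimates each cluster's covariance, it can recover anisotropic classes that distance-based partitioning tends to cut apart.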

  10. Assessing the Accuracy and Consistency of Language Proficiency Classification under Competing Measurement Models

    ERIC Educational Resources Information Center

    Zhang, Bo

    2010-01-01

    This article investigates how measurement models and statistical procedures can be applied to estimate the accuracy of proficiency classification in language testing. The paper starts with a concise introduction of four measurement models: the classical test theory (CTT) model, the dichotomous item response theory (IRT) model, the testlet response…

  11. Diagnostic Classification Models: Thoughts and Future Directions

    ERIC Educational Resources Information Center

    Henson, Robert A.

    2009-01-01

    The paper by Drs. Rupp and Templin provides a much needed step toward the general application of diagnostic classification models (DCMs). The authors have provided a summary of many of the concepts that one must consider to properly apply a DCM (which ranges from model selection and estimation, to assessing the appropriateness of the model using…

  12. A Pruning Neural Network Model in Credit Classification Analysis

    PubMed Central

    Tang, Yajiao; Ji, Junkai; Dai, Hongwei; Yu, Yang; Todo, Yuki

    2018-01-01

    Nowadays, credit classification models are widely applied because they can help financial decision-makers to handle credit classification issues. Among them, artificial neural networks (ANNs) have been widely accepted as convincing methods in the credit industry. In this paper, we propose a pruning neural network (PNN) and apply it to the credit classification problem using the well-known Australian and Japanese credit datasets. The model is inspired by the synaptic nonlinearity of a dendritic tree in a biological neural model, and it is trained by an error back-propagation algorithm. The model is capable of realizing a neuronal pruning function by removing superfluous synapses and useless dendrites, and it forms a tidy dendritic morphology at the end of learning. Furthermore, we utilize logic circuits (LCs) to simulate the dendritic structures, which allows the PNN to be implemented effectively in hardware. The statistical results of our experiments verify that the PNN obtains superior performance in comparison with other classical algorithms in terms of accuracy and computational efficiency. PMID:29606961
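    The general idea of pruning a trained network can be sketched with simple magnitude-based pruning on a synthetic credit-style dataset. Note this is a generic stand-in: the paper's PNN prunes synapses and dendrites in a dendritic-neuron model, which works differently from zeroing small MLP weights.

```python
# Sketch: magnitude-based pruning of a small trained network -- zero the
# smallest weights and check that accuracy largely survives. Synthetic data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=14, n_informative=6,
                           random_state=0)  # 14 features, invented scale
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(Xtr, ytr)
acc_full = mlp.score(Xte, yte)

# Prune: zero the 50% smallest-magnitude weights in each layer, in place.
for W in mlp.coefs_:
    thresh = np.quantile(np.abs(W), 0.5)
    W[np.abs(W) < thresh] = 0.0
acc_pruned = mlp.score(Xte, yte)
print(acc_full, acc_pruned)
```

    Pruning trades a small (often negligible) accuracy loss for a much sparser network, which is what makes hardware realizations such as the paper's logic-circuit simulation tractable.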

  13. Time-reversal imaging for classification of submerged elastic targets via Gibbs sampling and the Relevance Vector Machine.

    PubMed

    Dasgupta, Nilanjan; Carin, Lawrence

    2005-04-01

    Time-reversal imaging (TRI) is analogous to matched-field processing, although TRI is typically very wideband and is appropriate for subsequent target classification (in addition to localization). Time-reversal techniques, as applied to acoustic target classification, are highly sensitive to channel mismatch. Hence, it is crucial to estimate the channel parameters before time-reversal imaging is performed. The channel-parameter statistics are estimated here by applying a geoacoustic inversion technique based on Gibbs sampling. The maximum a posteriori (MAP) estimate of the channel parameters is then used to perform time-reversal imaging. Time-reversal implementation requires a fast forward model, implemented here in a normal-mode framework. In addition to imaging, extraction of features from the time-reversed images is explored, with these applied to subsequent target classification. The classification of time-reversed signatures is performed by the relevance vector machine (RVM). The efficacy of the technique is analyzed on simulated in-channel data generated by a free-field finite element method (FEM) code in conjunction with a channel propagation model, wherein the final classification performance is demonstrated to be relatively insensitive to the associated channel parameters. The underlying theory of Gibbs sampling and TRI is presented along with the feature extraction and target classification via the RVM.

  14. A Method for Application of Classification Tree Models to Map Aquatic Vegetation Using Remotely Sensed Images from Different Sensors and Dates

    PubMed Central

    Jiang, Hao; Zhao, Dehua; Cai, Ying; An, Shuqing

    2012-01-01

    In previous attempts to identify aquatic vegetation from remotely-sensed images using classification trees (CT), the images used to apply CT models to different times or locations necessarily originated from the same satellite sensor as the original images used in model development, greatly limiting the application of CT. We have developed an effective normalization method to improve the robustness of CT models when applied to images originating from different sensors and dates. A total of 965 ground-truth samples of aquatic vegetation types were obtained in 2009 and 2010 in Taihu Lake, China. Using relevant spectral indices (SI) as classifiers, we manually developed a stable CT model structure and then applied a standard CT algorithm to obtain quantitative (optimal) thresholds from 2009 ground-truth data and images from Landsat7-ETM+, HJ-1B-CCD, Landsat5-TM and ALOS-AVNIR-2 sensors. Optimal CT thresholds produced average classification accuracies of 78.1%, 84.7% and 74.0% for emergent vegetation, floating-leaf vegetation and submerged vegetation, respectively. However, the optimal CT thresholds for different sensor images differed from each other, with an average relative variation (RV) of 6.40%. We developed and evaluated three new approaches to normalizing the images. The best-performing method, 0.1% index scaling, normalized the SI images using tailored percentages of extreme pixel values. Using the images normalized by 0.1% index scaling, CT models for a particular sensor in which thresholds were replaced by those from the models developed for images originating from other sensors provided average classification accuracies of 76.0%, 82.8% and 68.9% for emergent vegetation, floating-leaf vegetation and submerged vegetation, respectively. Applying the CT models developed for normalized 2009 images to 2010 images resulted in high classification (78.0%–93.3%) and overall (92.0%–93.1%) accuracies. Our results suggest that 0.1% index scaling provides a feasible way to apply CT models directly to images from sensors or time periods that differ from those of the images used to develop the original models.
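    One plausible reading of "0.1% index scaling" is a linear rescaling between the 0.1% and 99.9% pixel quantiles, so that a handful of extreme pixels cannot set the range. The sketch below illustrates that idea on synthetic "sensor" images with different gain and offset; the exact scaling used in the paper may differ.

```python
# Sketch: percentile-clipped rescaling of a spectral-index image so images
# from different sensors share a comparable [0, 1] range.
import numpy as np

def index_scale(img, tail=0.001):
    """Linearly rescale img to [0, 1] between its tail and 1-tail quantiles."""
    lo, hi = np.quantile(img, [tail, 1 - tail])
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

rng = np.random.default_rng(4)
# Two "sensors" seeing the same scene with different gain/offset, plus a
# few extreme outlier pixels that would wreck plain min-max scaling.
scene = rng.random((100, 100))
sensor_a = 0.8 * scene + 0.1
sensor_b = 1.3 * scene - 0.2
sensor_a.ravel()[:5] = 50.0   # outlier pixels (e.g., sensor artifacts)

norm_a, norm_b = index_scale(sensor_a), index_scale(sensor_b)
# After normalization the two sensors' images agree closely, so one set of
# CT thresholds can plausibly serve both.
gap = np.abs(np.median(norm_a) - np.median(norm_b))
print(gap)
```

    Clipping at the 0.1% tails makes the scaling robust to outliers, which is what allows thresholds learned on one sensor's images to transfer to another's.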

  15. Predicting Flavonoid UGT Regioselectivity

    PubMed Central

    Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip

    2011-01-01

    Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849

  16. Combining Unsupervised and Supervised Classification to Build User Models for Exploratory Learning Environments

    ERIC Educational Resources Information Center

    Amershi, Saleema; Conati, Cristina

    2009-01-01

    In this paper, we present a data-based user modeling framework that uses both unsupervised and supervised classification to build student models for exploratory learning environments. We apply the framework to build student models for two different learning environments and using two different data sources (logged interface and eye-tracking data).…

  17. A classification scheme for edge-localized modes based on their probability distributions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shabbir, A. (E-mail: aqsa.shabbir@ugent.be; Max Planck Institute for Plasma Physics, D-85748 Garching); Hornung, G.

    We present here an automated classification scheme which is particularly well suited to scenarios where the parameters have significant uncertainties or are stochastic quantities. To this end, the parameters are modeled with probability distributions in a metric space and classification is conducted using the notion of nearest neighbors. The presented framework is then applied to the classification of type I and type III edge-localized modes (ELMs) from a set of carbon-wall plasmas at JET. This provides a fast, standardized classification of ELM types which is expected to significantly reduce the effort of ELM experts in identifying ELM types. Further, the classification scheme is general and can be applied to various other plasma phenomena as well.
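    Treating each observation as a probability distribution and classifying by nearest neighbors in a metric on distributions can be sketched as follows. The 1-D Wasserstein distance is used as the metric and the two "ELM type" signatures are synthetic Gaussians; the paper's actual metric-space construction and JET data are not reproduced here.

```python
# Sketch: nearest-neighbor classification where each observation is a
# sample-based distribution and distance is the 1-D Wasserstein metric.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(5)

def sample(kind, n=200):
    """Synthetic parameter samples for two invented 'ELM type' classes."""
    return rng.normal(0, 1, n) if kind == "I" else rng.normal(2, 1.5, n)

train = [(sample("I"), "I") for _ in range(20)] + \
        [(sample("III"), "III") for _ in range(20)]

def classify(obs):
    """Label of the training distribution nearest to obs."""
    dists = [wasserstein_distance(obs, ref) for ref, _ in train]
    return train[int(np.argmin(dists))][1]

test = [(sample("I"), "I") for _ in range(10)] + \
       [(sample("III"), "III") for _ in range(10)]
acc = np.mean([classify(o) == lab for o, lab in test])
print(acc)
```

    Comparing whole distributions rather than point estimates is exactly what makes such a scheme robust when the underlying parameters are stochastic or carry large uncertainties.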

  18. Motor Oil Classification using Color Histograms and Pattern Recognition Techniques.

    PubMed

    Ahmadi, Shiva; Mani-Varnosfaderani, Ahmad; Habibi, Biuck

    2018-04-20

    Motor oil classification is important for quality control and the identification of oil adulteration. In this work, we propose a simple, rapid, inexpensive and nondestructive approach based on image analysis and pattern recognition techniques for the classification of nine different types of motor oils according to their corresponding color histograms. For this, we applied color histograms in different color spaces, such as red green blue (RGB), grayscale, and hue saturation intensity (HSI), in order to extract features that can help with the classification procedure. These color histograms and their combinations were used as input for model development and then were statistically evaluated using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM) techniques. Here, two common solutions for solving a multiclass classification problem were applied: (1) transformation to a binary classification problem using a one-against-all (OAA) approach and (2) extension from binary classifiers to a single globally optimized multilabel classification model. In the OAA strategy, LDA, QDA, and SVM reached up to 97% in terms of accuracy, sensitivity, and specificity for both the training and test sets. In the extension from the binary case, despite good performance by the SVM classification model, QDA and LDA provided better results, up to 92% for RGB-grayscale-HSI color histograms and up to 93% for the HSI color map, respectively. In order to reduce the number of independent variables for modeling, a principal component analysis algorithm was used. Our results suggest that the proposed method is promising for the identification and classification of different types of motor oils.
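    The histogram-features-into-a-discriminant pipeline above can be sketched end to end. The synthetic "oil" images, characteristic colors, and 8-bin histograms are invented stand-ins for the paper's photographs and settings.

```python
# Sketch: per-channel RGB color histograms as features for a linear
# discriminant classifier, on synthetic single-color "oil" images.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)

def fake_image(mean_rgb):
    """A 32x32 RGB image clustered around a characteristic oil color."""
    img = rng.normal(mean_rgb, 20, (32, 32, 3))
    return np.clip(img, 0, 255)

def rgb_histogram(img, bins=8):
    """Concatenate per-channel histograms into one feature vector."""
    return np.concatenate([np.histogram(img[..., c], bins=bins,
                                        range=(0, 255))[0] for c in range(3)])

classes = {0: (180, 160, 40), 1: (140, 120, 30), 2: (200, 190, 80)}
X = np.array([rgb_histogram(fake_image(classes[c]))
              for c in classes for _ in range(30)])
y = np.repeat(list(classes), 30)

lda = LinearDiscriminantAnalysis().fit(X[::2], y[::2])   # even rows: train
acc = lda.score(X[1::2], y[1::2])                        # odd rows: test
print(acc)
```

    Other color spaces (grayscale, HSI) would simply contribute additional histogram blocks to the same feature vector, which is how the combined histograms in the paper are formed.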

  19. Recent development of feature extraction and classification multispectral/hyperspectral images: a systematic literature review

    NASA Astrophysics Data System (ADS)

    Setiyoko, A.; Dharma, I. G. W. S.; Haryanto, T.

    2017-01-01

    Multispectral and hyperspectral data acquired from satellite sensors have the ability to detect various objects on the earth, ranging from low-scale to high-scale modeling. These data are increasingly being used to produce geospatial information for rapid analysis by running feature extraction or classification processes. Applying the most suitable model for this data mining is still challenging because of issues regarding accuracy and computational cost. The aim of this research is to develop a better understanding of object feature extraction and classification applied to satellite images by systematically reviewing related recent research projects. The method used in this research is based on the PRISMA statement. After deriving important points from trusted sources, pixel-based and texture-based feature extraction techniques emerge as promising techniques for further analysis in the recent development of feature extraction and classification.

  20. Autoradiographic Distribution and Applied Pharmacological Characteristics of Dextromethorphan and Related Antitussive/Anticonvulsant Drugs and Novel Analogs

    DTIC Science & Technology

    1993-10-01

    Contract No. DAMD17-90-C-0124. Title: Autoradiographic Distribution and Applied Pharmacological Characteristics of Dextromethorphan and Related Antitussive/Anticonvulsant Drugs and Novel Analogs. Subject terms: anticonvulsants, antitussives, dextromethorphan, autoradiography, pharmacokinetics. The report describes a middle cerebral artery occlusion model studied with dextromethorphan, carbetapentane and three of the carbetapentane analogues, 11, B and D.

  1. Remote sensing of aquatic vegetation distribution in Taihu Lake using an improved classification tree with modified thresholds.

    PubMed

    Zhao, Dehua; Jiang, Hao; Yang, Tangwu; Cai, Ying; Xu, Delin; An, Shuqing

    2012-03-01

    Classification trees (CT) have been used successfully in the past to classify aquatic vegetation from spectral indices (SI) obtained from remotely-sensed images. However, applying CT models developed for certain image dates to other time periods within the same year or among different years can reduce the classification accuracy. In this study, we developed CT models with modified thresholds using extreme SI values (CT(m)) to improve the stability of the models when applying them to different time periods. A total of 903 ground-truth samples were obtained in September of 2009 and 2010 and classified as emergent, floating-leaf, or submerged vegetation or other cover types. Classification trees were developed for 2009 (Model-09) and 2010 (Model-10) using field samples and a combination of two images from winter and summer. Overall accuracies of these models were 92.8% and 94.9%, respectively, which confirmed the ability of CT analysis to map aquatic vegetation in Taihu Lake. However, Model-10 had only 58.9-71.6% classification accuracy and 31.1-58.3% agreement (i.e., pixels classified the same in the two maps) for aquatic vegetation when it was applied to image pairs from both a different time period in 2010 and a similar time period in 2009. We developed a method to estimate the effects of extrinsic (EF) and intrinsic (IF) factors on model uncertainty using MODIS images. Results indicated that 71.1% of the instability in classification between time periods was due to EF, which might include changes in atmospheric conditions, sun-view angle and water quality. The remainder was due to IF, such as phenological and growth status differences between time periods. The modified version of Model-10 (i.e., CT(m)) performed better than traditional CT with different image dates. When applied to 2009 images, the CT(m) version of Model-10 had very similar thresholds and performance to Model-09, with overall accuracies of 92.8% and 90.5% for Model-09 and the CT(m) version of Model-10, respectively. CT(m) decreased the variability related to EF and IF and thereby improved the applicability of the models to different time periods. In both practice and theory, our results suggested that CT(m) was more stable than traditional CT models and could be used to map aquatic vegetation in time periods other than the one for which the model was developed. Copyright © 2011 Elsevier Ltd. All rights reserved.

  2. Patterns of Use of an Agent-Based Model and a System Dynamics Model: The Application of Patterns of Use and the Impacts on Learning Outcomes

    ERIC Educational Resources Information Center

    Thompson, Kate; Reimann, Peter

    2010-01-01

    A classification system that was developed for the use of agent-based models was applied to strategies used by school-aged students to interrogate an agent-based model and a system dynamics model. These were compared, and relationships between learning outcomes and the strategies used were also analysed. It was found that the classification system…

  3. Measuring CAMD technique performance. 2. How "druglike" are drugs? Implications of Random test set selection exemplified using druglikeness classification models.

    PubMed

    Good, Andrew C; Hermsmeier, Mark A

    2007-01-01

    Research into the advancement of computer-aided molecular design (CAMD) has a tendency to focus on the discipline of algorithm development. Such efforts often come at the expense of the data set selection and analysis used to validate said algorithms. Here we highlight the potential problems this can cause in the context of druglikeness classification. More rigorous efforts are applied to the selection of decoy (nondruglike) molecules from the ACD. Comparisons are made between model performance using the standard technique of random test set creation and test sets derived from explicit ontological separation by drug class. The dangers of viewing druglike space as sufficiently coherent to permit simple classification are highlighted. In addition, the issues inherent in applying unfiltered data and random test set selection to (Q)SAR models built on large and supposedly heterogeneous databases are discussed.

  4. Chinese Sentence Classification Based on Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Gu, Chengwei; Wu, Ming; Zhang, Chuang

    2017-10-01

    Sentence classification is one of the significant issues in Natural Language Processing (NLP), and feature extraction is often regarded as its key point. Traditional machine learning approaches, such as the naive Bayes model, cannot take high-level features into consideration, whereas neural networks for sentence classification can make use of contextual information to achieve better results. In this paper, we focus on classifying Chinese sentences, and most importantly, we propose a novel Convolutional Neural Network (CNN) architecture for Chinese sentence classification. In particular, whereas most previous methods use a softmax classifier for prediction, we embed a linear support vector machine in place of softmax in the deep neural network model, minimizing a margin-based loss to obtain a better result, and we use tanh rather than ReLU as the activation function. The CNN model improves the results of Chinese sentence classification tasks. Experimental results on a Chinese news title database validate the effectiveness of our model.
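
The margin-based objective that replaces softmax cross-entropy can be written as a multiclass hinge loss over the network's final-layer class scores. A minimal NumPy sketch of one common form of that loss (the Crammer-Singer-style batch formulation below is an assumption for illustration, not necessarily the paper's exact loss):

```python
import numpy as np

def multiclass_hinge_loss(scores, y, margin=1.0):
    """Average multiclass hinge loss over a batch.

    scores: (n, k) array of class scores from the last linear layer;
    y: (n,) array of integer true labels."""
    n = scores.shape[0]
    correct = scores[np.arange(n), y][:, None]           # (n, 1) true-class score
    margins = np.maximum(0.0, scores - correct + margin) # per-class violations
    margins[np.arange(n), y] = 0.0                       # no loss on the true class
    return margins.sum() / n
```

When every wrong class trails the true class by at least the margin, the loss is zero; otherwise the gradient pushes the true-class score up relative to the violators, which is the behavior the linear SVM output layer relies on.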

  5. Multilingual Twitter Sentiment Classification: The Role of Human Annotators

    PubMed Central

    Mozetič, Igor; Grčar, Miha; Smailović, Jasmina

    2016-01-01

    What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered. PMID:27149621
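
The agreement measures used to bound model performance can be illustrated with Cohen's kappa for a pair of annotators, which corrects raw agreement for chance (one common choice; the paper applies several agreement measures, not necessarily this one):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa of 1 means perfect agreement; 0 means agreement no better than chance, which is why it is a more honest ceiling for classifier performance than raw percent agreement.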

  6. A Wavelet Polarization Decomposition Net Model for Polarimetric SAR Image Classification

    NASA Astrophysics Data System (ADS)

    He, Chu; Ou, Dan; Yang, Teng; Wu, Kun; Liao, Mingsheng; Chen, Erxue

    2014-11-01

    In this paper, a deep model based on wavelet texture is proposed for Polarimetric Synthetic Aperture Radar (PolSAR) image classification, inspired by the recent success of deep learning methods. Our model is intended to learn powerful and informative representations that improve generalization in complex scene classification tasks. Given the influence of speckle noise in PolSAR images, wavelet polarization decomposition is applied first to obtain basic, discriminative texture features, which are then fed into a Deep Neural Network (DNN) to compose multi-layer, higher-level representations. We demonstrate that the model produces a powerful representation that captures information otherwise hard to trace in PolSAR images, and it shows promising results in comparison with traditional SAR image classification methods on the SAR image dataset.

  7. Modeling of Complex Mixtures: JP-8 Toxicokinetics

    DTIC Science & Technology

    2008-10-01

    We developed generic tissue compartments in which we have combined diffusion limitation and deep tissue (global tissue model). We have also applied a QSAR approach for estimating blood and tissue …, applied, as necessary, to the interaction of specific compounds with specific tissues. Subject terms: jet fuel, JP-8, PBPK modeling, complex mixtures, nonane, decane, naphthalene, QSAR, alternative fuels.

  8. Comparing the performance of flat and hierarchical Habitat/Land-Cover classification models in a NATURA 2000 site

    NASA Astrophysics Data System (ADS)

    Gavish, Yoni; O'Connell, Jerome; Marsh, Charles J.; Tarantino, Cristina; Blonda, Palma; Tomaselli, Valeria; Kunin, William E.

    2018-02-01

    The increasing need for high-quality Habitat/Land-Cover (H/LC) maps has triggered considerable research into novel machine-learning based classification models. In many cases, H/LC classes follow pre-defined hierarchical classification schemes (e.g., CORINE), in which fine H/LC categories are thematically nested within more general categories. However, none of the existing machine-learning algorithms account for this pre-defined hierarchical structure. Here we introduce a novel Random Forest (RF) based application of hierarchical classification, which fits a separate local classification model at every branching point of the thematic tree and then integrates all the local models into a single global prediction. We applied the hierarchical RF approach in a NATURA 2000 site in Italy, using two land-cover (CORINE, FAO-LCCS) and one habitat classification scheme (EUNIS) that differ from one another in the shape of the class hierarchy. For all three classification schemes, both the hierarchical model and a flat-model alternative provided accurate predictions, with kappa values mostly above 0.9 (despite using only 2.2-3.2% of the study area as training cells). The flat approach slightly outperformed the hierarchical models when the hierarchy was relatively simple, while the hierarchical model worked better under more complex thematic hierarchies. Most misclassifications came from habitat pairs that are thematically distant yet spectrally similar. In two out of three classification schemes, the additional constraints of the hierarchical model resulted in fewer such serious misclassifications relative to the flat model. The hierarchical model also provided valuable information on variable importance, which can shed light on "black-box" machine learning algorithms like RF. We suggest various ways by which hierarchical classification models can increase the accuracy and interpretability of H/LC classification maps.
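
The per-branching-point structure can be sketched as a tree walk in which every internal node holds its own local model and prediction descends until a leaf class is reached. In the sketch below the local models are simple threshold rules and the class names are invented for illustration; in the paper each node would hold a fitted Random Forest:

```python
def predict_hierarchical(x, node):
    """Walk a thematic class tree, applying the local classifier at each
    branching point until a leaf (a final H/LC class label) is reached."""
    while isinstance(node, dict):
        branch = node["classifier"](x)   # local model chooses a child branch
        node = node["children"][branch]
    return node

# Toy two-level hierarchy over a feature vector x = (index1, index2).
tree = {
    "classifier": lambda x: "vegetated" if x[0] > 0.3 else "non-vegetated",
    "children": {
        "non-vegetated": "water",
        "vegetated": {
            "classifier": lambda x: "forest" if x[1] > 0.5 else "grassland",
            "children": {"forest": "forest", "grassland": "grassland"},
        },
    },
}
```

Because each local model only discriminates among siblings, a misclassification deep in the tree still lands in a thematically nearby class, which is the constraint credited with reducing serious misclassifications.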

  9. Median Filter Noise Reduction of Image and Backpropagation Neural Network Model for Cervical Cancer Classification

    NASA Astrophysics Data System (ADS)

    Wutsqa, D. U.; Marwah, M.

    2017-06-01

    In this paper, we apply a spatial median filter operation to reduce the noise in cervical images produced by a colposcopy tool. The backpropagation neural network (BPNN) model is then applied to the colposcopy images to classify cervical cancer. The classification process requires image feature extraction using the gray level co-occurrence matrix (GLCM) method; the resulting image features are used as inputs of the BPNN model. The benefit of noise reduction is evaluated by comparing the performances of BPNN models with and without the spatial median filter operation. The experimental result shows that the spatial median filter can improve the accuracy of the BPNN model for cervical cancer classification.
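
The noise-reduction step can be illustrated with a plain median filter: each pixel is replaced by the median of its k x k neighborhood, which removes impulse noise while preserving edges better than mean smoothing. A stdlib-only sketch (edge handling by replication is an assumption; implementations vary):

```python
def median_filter(img, k=3):
    """k x k median filter over a 2-D list of pixel values,
    replicating edge pixels at the borders."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Gather the neighborhood, clamping indices at the image edge.
            window = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                      for di in range(-r, r + 1) for dj in range(-r, r + 1)]
            window.sort()
            out[i][j] = window[len(window) // 2]
    return out
```

A single salt-noise pixel surrounded by background is wiped out entirely, since the median of its window ignores the outlier.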

  10. Computer discrimination procedures applicable to aerial and ERTS multispectral data

    NASA Technical Reports Server (NTRS)

    Richardson, A. J.; Torline, R. J.; Allen, W. A.

    1970-01-01

    Two statistical models are compared in the classification of crops recorded on color aerial photographs. A theory of error ellipses is applied to the pattern recognition problem. An elliptical boundary condition classification model (EBC), useful for recognition of candidate patterns, evolves out of error ellipse theory. The EBC model is compared with the minimum distance to the mean (MDM) classification model in terms of pattern recognition ability. The pattern recognition results of both models are interpreted graphically using scatter diagrams to represent measurement space. Measurement space, for this report, is determined by optical density measurements collected from Kodak Ektachrome Infrared Aero Film 8443 (EIR). The EBC model is shown to be a significant improvement over the MDM model.

  11. Gastric precancerous diseases classification using CNN with a concise model.

    PubMed

    Zhang, Xu; Hu, Weiling; Chen, Fei; Liu, Jiquan; Yang, Yuanhang; Wang, Liangjing; Duan, Huilong; Si, Jianmin

    2017-01-01

    Gastric precancerous diseases (GPD) may deteriorate into early gastric cancer if misdiagnosed, so it is important to help doctors recognize GPD accurately and quickly. In this paper, we realize the classification of 3-class GPD, namely polyp, erosion, and ulcer, using convolutional neural networks (CNN) with a concise model called the Gastric Precancerous Disease Network (GPDNet). GPDNet introduces fire modules from SqueezeNet to reduce the model size and number of parameters by about a factor of 10 while improving speed for quick classification. To maintain classification accuracy with fewer parameters, we propose an innovative method called iterative reinforced learning (IRL). After training GPDNet from scratch, we apply IRL to fine-tune the parameters whose values are close to 0, and then we take the modified model as a pretrained model for the next training. The result shows that IRL can improve the accuracy by about 9% after 6 iterations. The final classification accuracy of our GPDNet was 88.90%, which is promising for clinical GPD recognition.

  12. Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

    NASA Astrophysics Data System (ADS)

    Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen

    Data base classification suffers from two well-known difficulties, i.e., high dimensionality and non-stationary variations within large historic data. This paper presents a hybrid classification model that integrates a case-based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is mainly based on the idea that the historic data base can be transformed into a smaller case base together with a group of fuzzy decision rules. As a result, the model can respond more accurately to the data currently being classified, using the inductions of the fuzzy decision trees built on these smaller case bases. Hit rate is applied as a performance measure, and the effectiveness of the proposed model is demonstrated by experimental comparison with other approaches on different data base classification applications. The average hit rate of the proposed model is the highest among those compared.

  13. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    PubMed Central

    Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.

    2013-01-01

    Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a more recently developed method in the field of machine learning: Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using the SVM as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
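
The GA-with-SVM-fitness scheme can be sketched as evolving bit masks over the variables, where each mask's fitness would normally be the cross-validated accuracy of an SVM trained on the masked variables. In the stdlib-only sketch below, the SVM fitness is replaced by a cheap stand-in scoring function, and the population sizes, operators, and the `target` subset are all invented for illustration:

```python
import random

def ga_select(n_vars, fitness, pop_size=20, generations=30, seed=0):
    """Evolve a population of bit masks (1 = keep the variable) toward
    the mask maximizing the caller-supplied `fitness` function."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)       # one-point crossover
            cut = rng.randrange(1, n_vars)
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_vars)] ^= 1     # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Stand-in fitness: reward masks close to a known informative subset
# (purely illustrative; a real run would train an SVM per mask).
target = [1, 0, 1, 1, 0, 0]
fit = lambda mask: -sum(m != t for m, t in zip(mask, target))
```

Swapping `fit` for an SVM cross-validation score turns this into the variable-selection loop the abstract describes; elitism guarantees the best mask found is never lost between generations.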

  14. Modeling ready biodegradability of fragrance materials.

    PubMed

    Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola

    2015-06-01

    In the present study, quantitative structure activity relationships were developed for predicting ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with BIOWIN global models, based on group contribution method, show that specific models developed in the present study perform better in prediction than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.

  15. The Use of Multilevel Modeling to Estimate Which Measures Are Most Influential in Determining an Institution's Placement in Carnegie's New Doctoral/Research University Classification Schema

    ERIC Educational Resources Information Center

    Micceri, Theodore

    2007-01-01

    This research sought to determine whether any measure(s) used in the Carnegie Foundation's classification of Doctoral/Research Universities contribute to a greater degree than other measures to final rank placement. Multilevel Modeling (MLM) was applied to all eight of the Carnegie Foundation's predictor measures using final rank…

  16. Active Learning of Classification Models with Likert-Scale Feedback.

    PubMed

    Xue, Yanbing; Hauskrecht, Milos

    2017-01-01

    Annotation of classification data by humans can be a time-consuming and tedious process. Finding ways of reducing the annotation effort is critical for building the classification models in practice and for applying them to a variety of classification tasks. In this paper, we develop a new active learning framework that combines two strategies to reduce the annotation effort. First, it relies on label uncertainty information obtained from the human in terms of the Likert-scale feedback. Second, it uses active learning to annotate examples with the greatest expected change. We propose a Bayesian approach to calculate the expectation and an incremental SVM solver to reduce the time complexity of the solvers. We show the combination of our active learning strategy and the Likert-scale feedback can learn classification models more rapidly and with a smaller number of labeled instances than methods that rely on either Likert-scale labels or active learning alone.
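
The selection step of active learning can be illustrated with the basic uncertainty-sampling rule: query the unlabeled example whose predicted class distribution has the highest entropy. The paper's criterion, greatest expected model change, refines this idea; the stdlib-only sketch below shows only the simpler rule:

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(pool_probs):
    """Index of the unlabeled example the current model is least sure
    about, i.e. whose predicted class distribution has maximal entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))
```

An example predicted at (0.5, 0.5) is queried before one predicted at (0.99, 0.01), which is exactly why active learning needs fewer labeled instances than random annotation.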

  18. A tool for urban soundscape evaluation applying Support Vector Machines for developing a soundscape classification model.

    PubMed

    Torija, Antonio J; Ruiz, Diego P; Ramos-Ridao, Angel F

    2014-06-01

    To ensure appropriate soundscape management in urban environments, urban-planning authorities need a range of tools that enable such a task to be performed. An essential step in managing urban areas from a sound standpoint is the evaluation of the area's soundscape. It has been widely acknowledged that a subjective and acoustical categorization of a soundscape is the first step in evaluating it, and it provides a basis for designing or adapting the soundscape to match people's expectations as well. This work therefore proposes a model for the automatic classification of urban soundscapes based on underlying acoustical and perceptual criteria, intended to be used as a tool for comprehensive urban soundscape evaluation. Because of the great complexity associated with the problem, two machine learning techniques, Support Vector Machines (SVM) and Support Vector Machines trained with Sequential Minimal Optimization (SMO), are implemented in developing the classification model. The results indicate that the SMO model outperforms the SVM model in the specific task of soundscape classification. With the implementation of the SMO algorithm, the classification model achieves an outstanding performance (91.3% of instances correctly classified). © 2013 Elsevier B.V. All rights reserved.

  19. Highly efficient classification and identification of human pathogenic bacteria by MALDI-TOF MS.

    PubMed

    Hsieh, Sen-Yung; Tseng, Chiao-Li; Lee, Yun-Shien; Kuo, An-Jing; Sun, Chien-Feng; Lin, Yen-Hsiu; Chen, Jen-Kun

    2008-02-01

    Accurate and rapid identification of pathogenic microorganisms is of critical importance in disease treatment and public health. Conventional work flows are time-consuming, and procedures are multifaceted. MS can be an alternative but is limited by low efficiency for amino acid sequencing as well as low reproducibility for spectrum fingerprinting. We systematically analyzed the feasibility of applying MS for rapid and accurate bacterial identification. Directly applying bacterial colonies to MALDI-TOF MS analysis, without further protein extraction, revealed rich peak contents and high reproducibility. The MS spectra derived from 57 isolates comprising six human pathogenic bacterial species were analyzed using both unsupervised hierarchical clustering and supervised model construction via the Genetic Algorithm. Hierarchical clustering analysis categorized the spectra into six groups precisely corresponding to the six bacterial species. Precise classification was also maintained in an independently prepared set of bacteria even when the number of m/z values was reduced to six. In parallel, classification models were constructed via Genetic Algorithm analysis. A model containing 18 m/z values accurately classified independently prepared bacteria and identified species originally not used for model construction. Moreover, fewer than 10^4 bacterial cells, as well as different species within bacterial mixtures, were identified using the classification model approach. In conclusion, the application of MALDI-TOF MS in combination with suitable model construction provides a highly accurate method for bacterial classification and identification. The approach can identify bacteria of low abundance even in mixed flora, suggesting that rapid and accurate bacterial identification using MS techniques, even before culture, can be attained in the near future.

  20. Classification of wines according to their production regions with the contained trace elements using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Tian, Ye; Yan, Chunhua; Zhang, Tianlong; Tang, Hongsheng; Li, Hua; Yu, Jialu; Bernard, Jérôme; Chen, Li; Martin, Serge; Delepine-Gilon, Nicole; Bocková, Jana; Veis, Pavel; Chen, Yanping; Yu, Jin

    2017-09-01

    Laser-induced breakdown spectroscopy (LIBS) has been applied to classify French wines according to their production regions. The use of a surface-assisted (or surface-enhanced) sample preparation method enabled a sub-ppm limit of detection (LOD), which led to the detection and identification of at least 22 metal and nonmetal elements in a typical wine sample, including majors, minors and traces. An ensemble of 29 bottles of French wines, either red or white, from five production regions, Alsace, Bourgogne, Beaujolais, Bordeaux and Languedoc, was analyzed together with a wine from California, considered as an outlier. A non-supervised classification model based on principal component analysis (PCA) was first developed for the classification. The results showed the limited separation power of this model, which nevertheless allowed us, in a step-by-step approach, to understand the physical reasons behind each step of sample separation and especially to observe the influence of the matrix effect on sample classification. A supervised classification model was then developed based on random forest (RF), which is, in addition, a nonlinear algorithm. The classification results obtained were satisfactory: when the parameters of the model were optimized, classification accuracy reached 100% for the tested samples. In the paper we especially discuss the effect of spectrum normalization with an internal reference, the choice of input variables for the classification models, and the optimization of parameters for the developed classification models.
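
The unsupervised PCA step amounts to centering the spectra and projecting them onto the leading directions of variance. A minimal NumPy sketch (spectra as rows; computing the components via SVD is an implementation choice of this sketch, not the authors' code):

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project row-wise spectra onto their first principal components.

    X: (n_samples, n_channels) array of spectra."""
    Xc = X - X.mean(axis=0)                       # center each channel
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # (n_samples, n_components)
```

Plotting the first two score columns per wine is what produces the separation (or overlap) pattern the abstract interprets in its step-by-step analysis.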

  1. Classification Consistency and Accuracy for Complex Assessments Using Item Response Theory

    ERIC Educational Resources Information Center

    Lee, Won-Chan

    2010-01-01

    In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons…

  2. Analyzing Student Inquiry Data Using Process Discovery and Sequence Classification

    ERIC Educational Resources Information Center

    Emond, Bruno; Buffett, Scott

    2015-01-01

    This paper reports on results of applying process discovery mining and sequence classification mining techniques to a data set of semi-structured learning activities. The main research objective is to advance educational data mining to model and support self-regulated learning in heterogeneous environments of learning content, activities, and…

  3. Classification and Sequential Pattern Analysis for Improving Managerial Efficiency and Providing Better Medical Service in Public Healthcare Centers

    PubMed Central

    Chung, Sukhoon; Rhee, Hyunsill; Suh, Yongmoo

    2010-01-01

    Objectives This study sought to find answers to the following questions: 1) Can we predict whether a patient will revisit a healthcare center? 2) Can we anticipate the diseases of patients who revisit the center? Methods For the first question, we applied 5 classification algorithms (decision tree, artificial neural network, logistic regression, Bayesian networks, and Naïve Bayes) and the stacking-bagging method for building classification models. To answer the second question, we performed sequential pattern analysis. Results We determined: 1) In general, the most influential variables impacting whether a patient of a public healthcare center will revisit it or not are personal burden, insurance bill, period of prescription, age, systolic pressure, name of disease, and postal code. 2) The best plain classification model is dependent on the dataset. 3) Based on the average classification accuracy, the proposed stacking-bagging method outperformed all traditional classification models, and our sequential pattern analysis revealed 16 sequential patterns. Conclusions Classification models and sequential patterns can help public healthcare centers plan and implement healthcare service programs and businesses that are more appropriate to local residents, encouraging them to revisit public health centers. PMID:21818426

  4. Estimation of Lithological Classification in Taipei Basin: A Bayesian Maximum Entropy Method

    NASA Astrophysics Data System (ADS)

    Wu, Meng-Ting; Lin, Yuan-Chien; Yu, Hwa-Lung

    2015-04-01

    In environmental and other scientific applications, we must have a certain understanding of the geological lithological composition. Because of the restrictions of real conditions, only a limited amount of data can be acquired. To find out the lithological distribution in the study area, many spatial statistical methods are used to estimate the lithological composition at unsampled points or grids. This study applied the Bayesian Maximum Entropy (BME) method, an emerging method in the field of geological spatiotemporal statistics. The BME method can identify the spatiotemporal correlation of the data and combine not only hard data but also soft data to improve estimation. Lithological classification data are discrete categorical data; therefore, this research applied categorical BME to establish a complete three-dimensional lithological estimation model. We apply the limited hard data from the cores, together with the soft data generated from the geological dating data and the virtual wells, to estimate the three-dimensional lithological classification in the Taipei Basin. Keywords: Categorical Bayesian Maximum Entropy method, Lithological Classification, Hydrogeological Setting

  5. Utilizing Biological Models to Determine the Recruitment of the IRA by Modeling the Voting Behavior of Sinn Fein

    DTIC Science & Technology

    2006-03-01

    Drawing on the strengths and weaknesses of sociological and biological models, the thesis applies a biological model, the Lotka-Volterra predator-prey model, to a highly suggestive case study: that of the Irish Republican Army and the voting behavior of Sinn Féin. Subject terms: Biological Model, Irish Republican Army, Sinn Féin, Lotka-Volterra Predator Prey Model, Recruitment, British Army.
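
The Lotka-Volterra model referred to above couples a prey population x and a predator population y through dx/dt = a*x - b*x*y and dy/dt = d*x*y - g*y. A minimal Euler-integration sketch (the parameter values in the usage note are illustrative, not estimates from the thesis):

```python
def lotka_volterra(x0, y0, a, b, d, g, dt=0.001, steps=10000):
    """Forward-Euler integration of the predator-prey equations
    dx/dt = a*x - b*x*y,  dy/dt = d*x*y - g*y."""
    x, y = x0, y0
    for _ in range(steps):
        dx = a * x - b * x * y
        dy = d * x * y - g * y
        x, y = x + dx * dt, y + dy * dt
    return x, y
```

Started exactly at the equilibrium (x, y) = (g/d, a/b) both derivatives vanish and the populations stay fixed; started elsewhere, the two populations cycle, which is the oscillatory dynamic such recruitment analogies exploit.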

  6. A Novel Approach to ECG Classification Based upon Two-Layered HMMs in Body Sensor Networks

    PubMed Central

    Liang, Wei; Zhang, Yinlong; Tan, Jindong; Li, Yang

    2014-01-01

    This paper presents a novel approach to ECG signal filtering and classification. Unlike the traditional techniques which aim at collecting and processing the ECG signals with the patient being still, lying in bed in hospitals, our proposed algorithm is intentionally designed for monitoring and classifying the patient's ECG signals in the free-living environment. The patients are equipped with wearable ambulatory devices the whole day, which facilitates the real-time heart attack detection. In ECG preprocessing, an integral-coefficient-band-stop (ICBS) filter is applied, which omits time-consuming floating-point computations. In addition, two-layered Hidden Markov Models (HMMs) are applied to achieve ECG feature extraction and classification. The periodic ECG waveforms are segmented into ISO intervals, P subwave, QRS complex and T subwave respectively in the first HMM layer where expert-annotation assisted Baum-Welch algorithm is utilized in HMM modeling. Then the corresponding interval features are selected and applied to categorize the ECG into normal type or abnormal type (PVC, APC) in the second HMM layer. For verifying the effectiveness of our algorithm on abnormal signal detection, we have developed an ECG body sensor network (BSN) platform, whereby real-time ECG signals are collected, transmitted, displayed and the corresponding classification outcomes are deduced and shown on the BSN screen. PMID:24681668
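
The decoding step inside each HMM layer, recovering the most likely hidden segment sequence from the observed features, is classically done with the Viterbi algorithm. A generic stdlib-only sketch (the toy two-state model below is invented for illustration, not the paper's ECG waveform model):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete observation
    sequence, via dynamic programming with backpointers."""
    # Each cell stores (best probability of reaching this state, best predecessor).
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        V.append({s: max(((V[-1][r][0] * trans_p[r][s] * emit_p[s][o]), r)
                         for r in states) for s in states})
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):          # follow backpointers
        last = V[t][last][1]
        path.append(last)
    return path[::-1]

# Toy model: hidden states "A"/"B" emitting symbols "x"/"y".
states = ("A", "B")
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
```

In the paper's first layer, the states would instead be the ISO interval, P subwave, QRS complex, and T subwave, with emission and transition probabilities estimated by the expert-annotation-assisted Baum-Welch step.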

  7. Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples.

    PubMed

    Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang

    2016-11-16

    The use of speech-based data in the classification of Parkinson's disease (PD) has been shown to provide an effective, non-invasive mode of classification in recent years. Thus, there has been increased interest in speech pattern analysis methods applicable to parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is reducing noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is trained on the selected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined using a more recently deposited public dataset and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the largest improvement in classification accuracy (29.44%) compared with the other algorithms examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method can improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.

  8. Remote sensing of Earth terrain

    NASA Technical Reports Server (NTRS)

    Kong, Jin Au; Shin, Robert T.; Nghiem, Son V.; Yueh, Herng-Aung; Han, Hsiu C.; Lim, Harold H.; Arnold, David V.

    1990-01-01

    Remote sensing of earth terrain is examined. The layered random medium model is used to investigate the fully polarimetric scattering of electromagnetic waves from vegetation. The model is used to interpret the measured data for vegetation fields such as rice, wheat, or soybean over water or soil. Accurate calibration of polarimetric radar systems is essential for the polarimetric remote sensing of earth terrain. A polarimetric calibration algorithm using three arbitrary in-scene reflectors is developed. In the interpretation of active and passive microwave remote sensing data from the earth terrain, the random medium model was shown to be quite successful. A multivariate K-distribution is proposed to model the statistics of fully polarimetric radar returns from earth terrain. In the terrain cover classification using the synthetic aperture radar (SAR) images, the applications of the K-distribution model will provide better performance than the conventional Gaussian classifiers. The layered random medium model is used to study the polarimetric response of sea ice. Supervised and unsupervised classification procedures are also developed and applied to synthetic aperture radar polarimetric images in order to identify their various earth terrain components for more than two classes. These classification procedures were applied to San Francisco Bay and Traverse City SAR images.

  9. Integration of adaptive guided filtering, deep feature learning, and edge-detection techniques for hyperspectral image classification

    NASA Astrophysics Data System (ADS)

    Wan, Xiaoqing; Zhao, Chunhui; Gao, Bing

    2017-11-01

    The integration of an edge-preserving filtering technique in the classification of a hyperspectral image (HSI) has been proven effective in enhancing classification performance. This paper proposes an ensemble strategy for HSI classification using an edge-preserving filter along with a deep learning model and edge detection. First, an adaptive guided filter is applied to the original HSI to reduce the noise in degraded images and to extract powerful spectral-spatial features. Second, the extracted features are fed as input to a stacked sparse autoencoder to adaptively exploit more invariant and deep feature representations; then, a random forest classifier is applied to fine-tune the entire pretrained network and determine the classification output. Third, a Prewitt compass operator is further applied to the HSI to extract the edges of the first principal component after dimension reduction. Moreover, a region-growing rule is applied to the resulting edge logical image to determine the local region for each unlabeled pixel. Finally, the categories of the corresponding neighborhood samples are determined in the original classification map; then, a majority voting mechanism is implemented to generate the final output. Extensive experiments proved that the proposed method achieves competitive performance compared with several traditional approaches.
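The final voting step can be illustrated in isolation; a minimal sketch, assuming the region around an unlabeled pixel has already been grown into a boolean mask (`region_mask` is a hypothetical name, not from the paper):

```python
import numpy as np

def vote_label(class_map, region_mask):
    """Majority vote over the classes found inside one grown region of the
    edge image; the winning class is assigned to the unlabeled pixel."""
    labels = class_map[region_mask]          # classes of the region's pixels
    return np.bincount(labels).argmax()      # most frequent class wins
```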

  10. Vesicular stomatitis forecasting based on Google Trends

    PubMed Central

    Lu, Yi; Zhou, GuangYa; Chen, Qin

    2018-01-01

    Background Vesicular stomatitis (VS) is an important viral disease of livestock. The main feature of VS is irregular blisters that occur on the lips, tongue, oral mucosa, hoof crown and nipple. Humans can also be infected with vesicular stomatitis and develop meningitis. This study analyses the 2014 American VS outbreaks in order to accurately predict vesicular stomatitis outbreak trends. Methods American VS outbreak data were collected from the OIE. The data for VS keywords were obtained by inputting 24 disease-related keywords into Google Trends. After calculating the Pearson and Spearman correlation coefficients, it was found that there was a relationship between outbreaks and keywords derived from Google Trends. Finally, the predictive model was constructed based on qualitative classification and quantitative regression. Results For the regression model, the Pearson correlation coefficients between the predicted outbreaks and actual outbreaks are 0.953 and 0.948, respectively. For the qualitative classification model, we constructed five classification predictive models and chose the best one as the result. The results showed that the SN (sensitivity), SP (specificity) and ACC (prediction accuracy) values of the best classification predictive model are 78.52%, 72.5% and 77.14%, respectively. Conclusion This study applied Google search data to construct a qualitative classification model and a quantitative regression model. The results show that the method is effective and that these two models obtain accurate forecasts. PMID:29385198
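The two correlation measures used for keyword screening behave differently on monotonic but nonlinear relationships, which is why both are typically reported; a short scipy sketch on toy series (the data below are invented, not the study's):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# On a monotone-but-nonlinear pair, Spearman (rank-based) saturates at 1
# while Pearson (linear) stays below it.
search_volume = np.arange(1.0, 11.0)      # toy Google Trends series
outbreaks = search_volume ** 3            # toy outbreak counts, monotone in it

r_pearson, _ = pearsonr(search_volume, outbreaks)
r_spearman, _ = spearmanr(search_volume, outbreaks)
```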

  11. Theory and analysis of statistical discriminant techniques as applied to remote sensing data

    NASA Technical Reports Server (NTRS)

    Odell, P. L.

    1973-01-01

    Classification of remote earth resources sensing data according to normed exponential density statistics is reported. The use of density models appropriate for several physical situations provides an exact solution for the probabilities of classifications associated with the Bayes discriminant procedure even when the covariance matrices are unequal.
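The Bayes discriminant procedure with unequal covariance matrices reduces to a quadratic discriminant rule in the Gaussian special case; a minimal numpy sketch of that case (the paper's normed exponential densities are more general, so this is only illustrative):

```python
import numpy as np

def qda_fit(X, y):
    """Per-class mean, covariance and prior for the Bayes discriminant."""
    return {c: (X[y == c].mean(0), np.cov(X[y == c].T), np.mean(y == c))
            for c in np.unique(y)}

def qda_predict(params, x):
    """Assign x to the class with the largest quadratic discriminant score
    g_c(x) = -0.5 log|S_c| - 0.5 (x-mu_c)' S_c^{-1} (x-mu_c) + log pi_c."""
    def score(mu, S, pi):
        d = x - mu
        return (-0.5 * np.log(np.linalg.det(S))
                - 0.5 * d @ np.linalg.solve(S, d) + np.log(pi))
    return max(params, key=lambda c: score(*params[c]))
```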

  12. Passive polarimetric imagery-based material classification robust to illumination source position and viewpoint.

    PubMed

    Thilak Krishna, Thilakam Vimal; Creusere, Charles D; Voelz, David G

    2011-01-01

    Polarization, a property of light that conveys information about the transverse electric field orientation, complements other attributes of electromagnetic radiation such as intensity and frequency. Using multiple passive polarimetric images, we develop an iterative, model-based approach to estimate the complex index of refraction and apply it to target classification.

  13. Modeling EEG Waveforms with Semi-Supervised Deep Belief Nets: Fast Classification and Anomaly Measurement

    PubMed Central

    Wulsin, D. F.; Gupta, J. R.; Mani, R.; Blanco, J. A.; Litt, B.

    2011-01-01

    Clinical electroencephalography (EEG) records vast amounts of complex human data yet is still reviewed primarily by human readers. Deep Belief Nets (DBNs) are a relatively new type of multi-layer neural network commonly tested on two-dimensional image data, but are rarely applied to time-series data such as EEG. We apply DBNs in a semi-supervised paradigm to model EEG waveforms for classification and anomaly detection. DBN performance was comparable to standard classifiers on our EEG dataset, and classification time was found to be 1.7 to 103.7 times faster than the other high-performing classifiers. We demonstrate how the unsupervised step of DBN learning produces an autoencoder that can naturally be used in anomaly measurement. We compare the use of raw, unprocessed data—a rarity in automated physiological waveform analysis—to hand-chosen features and find that raw data produces comparable classification and better anomaly measurement performance. These results indicate that DBNs and raw data inputs may be more effective for online automated EEG waveform recognition than other common techniques. PMID:21525569
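The anomaly-measurement idea — score a waveform by its autoencoder reconstruction error — can be illustrated with a linear (PCA) autoencoder standing in for the DBN; a sketch under that simplifying assumption:

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Least-squares (PCA) autoencoder: encode into the top-k principal
    directions, decode back; a linear stand-in for the DBN's unsupervised stage."""
    mu = X.mean(0)
    components = np.linalg.svd(X - mu, full_matrices=False)[2][:k]
    encode = lambda x: (x - mu) @ components.T
    decode = lambda z: z @ components + mu
    return encode, decode

def anomaly_score(x, encode, decode):
    """Reconstruction error: large for waveforms unlike the training data."""
    return np.linalg.norm(x - decode(encode(x)))
```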

  14. On the difficulty to delimit disease risk hot spots

    NASA Astrophysics Data System (ADS)

    Charras-Garrido, M.; Azizi, L.; Forbes, F.; Doyle, S.; Peyrard, N.; Abrial, D.

    2013-06-01

    Representing the health state of a region is a helpful tool to highlight spatial heterogeneity and localize high risk areas. For ease of interpretation and to determine where to apply control procedures, we need to clearly identify and delineate homogeneous regions in terms of disease risk, and in particular disease risk hot spots. However, even if practical purposes require the delineation of different risk classes, such a classification does not correspond to a reality and is thus difficult to estimate. Working with grouped data, a first natural choice is to apply disease mapping models. We apply a usual disease mapping model, producing continuous estimations of the risks that requires a post-processing classification step to obtain clearly delimited risk zones. We also apply a risk partition model that build a classification of the risk levels in a one step procedure. Working with point data, we will focus on the scan statistic clustering method. We illustrate our article with a real example concerning the bovin spongiform encephalopathy (BSE) an animal disease whose zones at risk are well known by the epidemiologists. We show that in this difficult case of a rare disease and a very heterogeneous population, the different methods provide risk zones that are globally coherent. But, related to the dichotomy between the need and the reality, the exact delimitation of the risk zones, as well as the corresponding estimated risks are quite different.

  15. Automatic classification of animal vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.

    2005-11-01

    Bioacoustics, the study of animal vocalizations, has begun to use increasingly sophisticated analysis techniques in recent years. Some common tasks in bioacoustics are repertoire determination, call detection, individual identification, stress detection, and behavior correlation. Each research study, however, uses a wide variety of different measured variables, called features, and classification systems to accomplish these tasks. The well-established field of human speech processing has developed a number of different techniques to perform many of the aforementioned bioacoustics tasks. Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients are two popular feature sets. The hidden Markov model (HMM), a statistical model similar to a finite automaton, is the most commonly used supervised classification model and is capable of modeling both temporal and spectral variations. This research designs a framework that applies models from human speech processing for bioacoustic analysis tasks. The development of the generalized perceptual linear prediction (gPLP) feature extraction model is one of the more important novel contributions of the framework. Perceptual information from the species under study can be incorporated into the gPLP feature extraction model to represent the vocalizations as the animals might perceive them. By including this perceptual information and modifying parameters of the HMM classification system, this framework can be applied to a wide range of species. The effectiveness of the framework is shown by analyzing African elephant and beluga whale vocalizations. The features extracted from the African elephant data are used as input to a supervised classification system and compared to results from traditional statistical tests. The gPLP features extracted from the beluga whale data are used in an unsupervised classification system and the results are compared to labels assigned by experts.
The development of a framework from which to build animal vocalization classifiers will provide bioacoustics researchers with a consistent platform to analyze and classify vocalizations. A common framework will also allow studies to compare results across species and institutions. In addition, the use of automated classification techniques can speed analysis and uncover behavioral correlations not readily apparent using traditional techniques.
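A per-class HMM classifier of the kind described scores a feature sequence under each class model and picks the most likely. A minimal discrete-symbol forward algorithm in numpy illustrates the scoring; the framework's HMMs operate on continuous spectral features, so this is only a sketch of the decision rule:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-symbol HMM
    with initial probs pi, transition matrix A and emissions B[state, sym]."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik

def classify(obs, models):
    """Pick the class whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda c: forward_loglik(obs, *models[c]))
```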

  16. An information-based network approach for protein classification

    PubMed Central

    Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.

    2017-01-01

    Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and phylogenetic trees to classify proteins. These methods use binary trees to present protein classifications. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, the protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835

  17. Real-Time Subject-Independent Pattern Classification of Overt and Covert Movements from fNIRS Signals

    PubMed Central

    Rana, Mohit; Prasad, Vinod A.; Guan, Cuntai; Birbaumer, Niels; Sitaram, Ranganatha

    2016-01-01

    Recently, studies have reported the use of Near Infrared Spectroscopy (NIRS) for developing Brain–Computer Interface (BCI) by applying online pattern classification of brain states from subject-specific fNIRS signals. The purpose of the present study was to develop and test a real-time method for subject-specific and subject-independent classification of multi-channel fNIRS signals using support-vector machines (SVM), so as to determine its feasibility as an online neurofeedback system. Towards this goal, we used left versus right hand movement execution and movement imagery as study paradigms in a series of experiments. In the first two experiments, activations in the motor cortex during movement execution and movement imagery were used to develop subject-dependent models that obtained high classification accuracies thereby indicating the robustness of our classification method. In the third experiment, a generalized classifier-model was developed from the first two experimental data, which was then applied for subject-independent neurofeedback training. Application of this method in new participants showed mean classification accuracy of 63% for movement imagery tasks and 80% for movement execution tasks. These results, and their corresponding offline analysis reported in this study demonstrate that SVM based real-time subject-independent classification of fNIRS signals is feasible. This method has important applications in the field of hemodynamic BCIs, and neuro-rehabilitation where patients can be trained to learn spatio-temporal patterns of healthy brain activity. PMID:27467528
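The subject-independent scheme — train an SVM on data pooled from several subjects, then apply it unchanged to an unseen subject — can be sketched with scikit-learn on synthetic features (all names and data here are illustrative, not the study's fNIRS signals):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def subject_data(n=40):
    """Toy two-channel features: 'left' trials around one mean, 'right'
    trials around another (purely illustrative)."""
    left = rng.normal((-1.0, -1.0), 0.3, size=(n, 2))
    right = rng.normal((1.0, 1.0), 0.3, size=(n, 2))
    return np.vstack([left, right]), np.array([0] * n + [1] * n)

# Pool two "subjects" to build the generalized model, test on an unseen one.
X_a, y_a = subject_data()
X_b, y_b = subject_data()
X_test, y_test = subject_data()
clf = SVC(kernel="linear").fit(np.vstack([X_a, X_b]), np.hstack([y_a, y_b]))
acc = clf.score(X_test, y_test)
```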

  18. Classification Model for Damage Localization in a Plate Structure

    NASA Astrophysics Data System (ADS)

    Janeliukstis, R.; Ruchevskis, S.; Chate, A.

    2018-01-01

    The present study is devoted to the problem of damage localization by means of data classification. The commercial ANSYS finite-element program was used to make a model of a cantilevered composite plate equipped with numerous strain sensors. The plate was divided into zones, and, for data classification purposes, each of them housed several points to which a point mass of 5 and 10% of the plate mass was applied. At each of these points, a numerical modal analysis was performed, from which the first few natural frequencies and strain readings were extracted. The strain data for every point were the input for a classification procedure involving k nearest neighbors and decision trees. The classification model was trained and optimized by fine-tuning the key parameters of both algorithms. Finally, two new query points were simulated and subjected to classification in terms of assigning a label to one of the zones of the plate, thus localizing these points. Damage localization results were compared for both algorithms and were found to be in good agreement with the actual positions of the applied point load.
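The two classifiers used for zone assignment can be sketched with scikit-learn on toy strain-feature data (the zone layout and feature values below are invented for illustration, not the paper's FE results):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for strain/frequency features: each zone's simulated load
# points cluster around a zone-specific feature vector.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])  # 4 zones
X = np.vstack([rng.normal(c, 0.2, size=(20, 2)) for c in centers])
zones = np.repeat(np.arange(4), 20)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, zones)
tree = DecisionTreeClassifier(random_state=0).fit(X, zones)
query = np.array([[4.9, 5.1]])   # a new "query point" to localize
```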

  19. Optimizing spectral CT parameters for material classification tasks

    NASA Astrophysics Data System (ADS)

    Rigie, D. S.; La Rivière, P. J.

    2016-06-01

    In this work, we propose a framework for optimizing spectral CT imaging parameters and hardware design with regard to material classification tasks. Compared with conventional CT, many more parameters must be considered when designing spectral CT systems and protocols. These choices will impact material classification performance in a non-obvious, task-dependent way with direct implications for radiation dose reduction. In light of this, we adapt Hotelling Observer formalisms typically applied to signal detection tasks to the spectral CT, material-classification problem. The result is a rapidly computable metric that makes it possible to sweep out many system configurations, generating parameter optimization curves (POC’s) that can be used to select optimal settings. The proposed model avoids restrictive assumptions about the basis-material decomposition (e.g. linearity) and incorporates signal uncertainty with a stochastic object model. This technique is demonstrated on dual-kVp and photon-counting systems for two different, clinically motivated material classification tasks (kidney stone classification and plaque removal). We show that the POC’s predicted with the proposed analytic model agree well with those derived from computationally intensive numerical simulation studies.

  20. Optimizing Spectral CT Parameters for Material Classification Tasks

    PubMed Central

    Rigie, D. S.; La Rivière, P. J.

    2017-01-01

    In this work, we propose a framework for optimizing spectral CT imaging parameters and hardware design with regard to material classification tasks. Compared with conventional CT, many more parameters must be considered when designing spectral CT systems and protocols. These choices will impact material classification performance in a non-obvious, task-dependent way with direct implications for radiation dose reduction. In light of this, we adapt Hotelling Observer formalisms typically applied to signal detection tasks to the spectral CT, material-classification problem. The result is a rapidly computable metric that makes it possible to sweep out many system configurations, generating parameter optimization curves (POC’s) that can be used to select optimal settings. The proposed model avoids restrictive assumptions about the basis-material decomposition (e.g. linearity) and incorporates signal uncertainty with a stochastic object model. This technique is demonstrated on dual-kVp and photon-counting systems for two different, clinically motivated material classification tasks (kidney stone classification and plaque removal). We show that the POC’s predicted with the proposed analytic model agree well with those derived from computationally intensive numerical simulation studies. PMID:27227430

  1. 14 CFR 1203.501 - Applying derivative classification markings.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... INFORMATION SECURITY PROGRAM Derivative Classification § 1203.501 Applying derivative classification markings... classification decisions: (b) Verify the information's current level of classification so far as practicable...

  2. 14 CFR 1203.501 - Applying derivative classification markings.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... INFORMATION SECURITY PROGRAM Derivative Classification § 1203.501 Applying derivative classification markings... classification decisions: (b) Verify the information's current level of classification so far as practicable...

  3. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable to tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces the fuzzy support vector machine, a learning algorithm based on a combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that the fuzzy support vector machine, applied in combination with filter or wrapper feature selection methods, develops a robust model with higher accuracy than conventional microarray classification models such as support vector machines, artificial neural networks, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule base inferred from the fuzzy support vector machine helps extract biological knowledge from microarray data. The fuzzy support vector machine, as a new classification model with high generalization power, robustness, and good interpretability, seems to be a promising tool for gene expression microarray classification.

  4. Pattern classification using an olfactory model with PCA feature selection in electronic noses: study and application.

    PubMed

    Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao

    2012-01-01

    Biologically-inspired models and algorithms are considered promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with increasing dimensions of the input feature vector (outer factor) as well as of its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, of three classes of wine derived from different cultivars and of five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case, the results showed that the average correct classification rate increased as more principal components were put into the feature vector. In the latter case, the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6-8 channels of the model, with principal components accounting for at least 90% of cumulative variance in the feature vector, are adequate for a classification task of 3-5 pattern classes, considering the trade-off between time consumption and classification rate.
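Retaining principal components up to a cumulative-variance threshold, as the 90% criterion above suggests, is a one-liner in scikit-learn (the sensor data below are synthetic, for illustration only):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy sensor-array data: two dominant response directions plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
X[:, :2] *= 10.0                      # two components carry most of the variance

pca = PCA(n_components=0.90)          # keep components up to 90% cumulative variance
Z = pca.fit_transform(X)
```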

  5. Extension of Companion Modeling Using Classification Learning

    NASA Astrophysics Data System (ADS)

    Torii, Daisuke; Bousquet, François; Ishida, Toru

    Companion Modeling is a methodology for refining initial models for understanding reality through a role-playing game (RPG) and a multiagent simulation. In this research, we propose a novel agent model construction methodology in which classification learning is applied to the RPG log data in Companion Modeling. This methodology enables a systematic model construction that handles multiple parameters, independent of the modeler's ability. There are three problems in applying classification learning to the RPG log data: 1) it is difficult to gather enough data for the number of features because the cost of gathering data is high; 2) noise can affect the learning results because the amount of data may be insufficient; 3) the learning results should be explainable as a human decision-making model and should be recognized by the expert as reflecting reality. We realized an agent model construction system using the following two approaches: 1) using a feature selection method, the feature subset that has the best prediction accuracy is identified; in this process, the important features chosen by the expert are always included; 2) the expert eliminates irrelevant features from the learning results after evaluating the learning model through a visualization of the results. Finally, using the RPG log data from the Companion Modeling of agricultural economics in northeastern Thailand, we confirm the capability of this methodology.

  6. Traffic sign classification with dataset augmentation and convolutional neural network

    NASA Astrophysics Data System (ADS)

    Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

    2018-04-01

    This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, we first convert a color image to grayscale and then normalize it into the range (-1,1) as the preprocessing step. To increase the robustness of the classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connected layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.
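The preprocessing step (grayscale conversion, then normalization into (-1, 1)) might look like the following; the luminance weights are the usual ITU-R BT.601 coefficients, which is an assumption since the paper does not state its grayscale conversion:

```python
import numpy as np

def preprocess(rgb):
    """Grayscale conversion followed by normalization of 8-bit pixel
    values into (-1, 1). The BT.601 weights (0.299, 0.587, 0.114) are
    assumed, not taken from the paper."""
    gray = rgb.astype(float) @ np.array([0.299, 0.587, 0.114])   # H x W
    return gray / 127.5 - 1.0
```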

  7. A model-based test for treatment effects with probabilistic classifications.

    PubMed

    Cavagnaro, Daniel R; Davis-Stober, Clintin P

    2018-05-21

    Within modern psychology, computational and statistical models play an important role in describing a wide variety of human behavior. Model selection analyses are typically used to classify individuals according to the model(s) that best describe their behavior. These classifications are inherently probabilistic, which presents challenges for performing group-level analyses, such as quantifying the effect of an experimental manipulation. We answer this challenge by presenting a method for quantifying treatment effects in terms of distributional changes in model-based (i.e., probabilistic) classifications across treatment conditions. The method uses hierarchical Bayesian mixture modeling to incorporate classification uncertainty at the individual level into the test for a treatment effect at the group level. We illustrate the method with several worked examples, including a reanalysis of the data from Kellen, Mata, and Davis-Stober (2017), and analyze its performance more generally through simulation studies. Our simulations show that the method is both more powerful and less prone to type-1 errors than Fisher's exact test when classifications are uncertain. In the special case where classifications are deterministic, we find a near-perfect power-law relationship between the Bayes factor, derived from our method, and the p value obtained from Fisher's exact test. We provide code in an online supplement that allows researchers to apply the method to their own data. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
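The baseline the method is compared against, Fisher's exact test on deterministic classification counts, is available in scipy; a sketch with an invented 2x2 table of per-condition classification counts:

```python
from scipy.stats import fisher_exact

# Hypothetical deterministic classifications: rows are treatment
# conditions, columns count participants classified as Model A vs Model B.
table = [[8, 2],    # condition 1: 8 classified as A, 2 as B
         [1, 9]]    # condition 2: 1 classified as A, 9 as B
odds_ratio, p_value = fisher_exact(table)
```

The paper's hierarchical Bayesian method replaces this test when the classifications are probabilistic rather than deterministic, which is exactly the case the table above cannot represent.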

  8. Scattering property based contextual PolSAR speckle filter

    NASA Astrophysics Data System (ADS)

    Mullissa, Adugna G.; Tolpekin, Valentyn; Stein, Alfred

    2017-12-01

    Reliability of the scattering model based polarimetric SAR (PolSAR) speckle filter depends upon the accurate decomposition and classification of the scattering mechanisms. This paper presents an improved scattering property based contextual speckle filter based upon an iterative classification of the scattering mechanisms. It applies a Cloude-Pottier eigenvalue-eigenvector decomposition and a fuzzy H/α classification to determine the scattering mechanisms on a pre-estimate of the coherency matrix. The H/α classification identifies pixels with homogeneous scattering properties. A coarse pixel selection rule groups pixels that are either single bounce, double bounce or volume scatterers. A fine pixel selection rule is applied to pixels within each canonical scattering mechanism. We filter the PolSAR data and depending on the type of image scene (urban or rural) use either the coarse or fine pixel selection rule. Iterative refinement of the Wishart H/α classification reduces the speckle in the PolSAR data. Effectiveness of this new filter is demonstrated by using both simulated and real PolSAR data. It is compared with the refined Lee filter, the scattering model based filter and the non-local means filter. The study concludes that the proposed filter compares favorably with other polarimetric speckle filters in preserving polarimetric information, point scatterers and subtle features in PolSAR data.

  9. Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

    NASA Astrophysics Data System (ADS)

    Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

    2018-03-01

    In the recent decade, analyzing remotely sensed imagery has been considered one of the most common and widely used procedures in environmental studies. In this case, supervised image classification techniques play a central role. Hence, taking a high-resolution Worldview-3 image over a mixed urbanized landscape in Iran, three less applied image classification methods, including bagged CART, the stochastic gradient boosting model and a neural network with feature extraction, were tested and compared with two prevalent methods: random forest and support vector machine with a linear kernel. To do so, each method was run ten times, and three validation techniques were used to estimate the accuracy statistics, consisting of cross validation, independent validation and validation with the total training data. Moreover, using ANOVA and the Tukey test, the statistical significance of differences between the classification methods was surveyed. In general, the results showed that random forest, with a marginal difference compared to bagged CART and the stochastic gradient boosting model, is the best performing method, whilst based on independent validation there was no significant difference between the performances of the classification methods. It should finally be noted that the neural network with feature extraction and the linear support vector machine had better processing speed than the others.
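The evaluation loop described — repeated cross-validated accuracy per classifier, then a one-way ANOVA across methods — can be sketched with scikit-learn and scipy (synthetic data; the Tukey post-hoc test and three of the five classifiers are omitted for brevity):

```python
from scipy.stats import f_oneway
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the image samples; each method gets repeated
# cross-validated accuracies, which ANOVA then compares across methods.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
methods = {"rf": RandomForestClassifier(random_state=0),
           "svm": SVC(kernel="linear")}
scores = {name: cross_val_score(clf, X, y, cv=5)
          for name, clf in methods.items()}
F, p = f_oneway(*scores.values())   # is any method's mean accuracy different?
```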

  10. Evaluation of image features and classification methods for Barrett's cancer detection using VLE imaging

    NASA Astrophysics Data System (ADS)

    Klomp, Sander; van der Sommen, Fons; Swager, Anne-Fré; Zinger, Svitlana; Schoon, Erik J.; Curvers, Wouter L.; Bergman, Jacques J.; de With, Peter H. N.

    2017-03-01

    Volumetric Laser Endomicroscopy (VLE) is a promising technique for the detection of early neoplasia in Barrett's Esophagus (BE). VLE generates hundreds of high-resolution, grayscale, cross-sectional images of the esophagus. However, at present, classifying these images is a time-consuming and cumbersome effort performed by an expert using a clinical prediction model. This paper explores the feasibility of using computer vision techniques to accurately predict the presence of dysplastic tissue in VLE BE images. Our contribution is threefold. First, a benchmark is performed for widely applied machine learning techniques and feature extraction methods. Second, three new features based on the clinical detection model are proposed, having superior classification accuracy and speed compared to earlier work. Third, we evaluate automated parameter tuning by applying simple grid search and feature selection methods. The results are evaluated on a clinically validated dataset of 30 dysplastic and 30 non-dysplastic VLE images. Optimal classification accuracy is obtained by applying a support vector machine and using our modified Haralick features and optimal image cropping, obtaining an area under the receiver operating characteristic curve of 0.95, compared to the clinical prediction model at 0.81. Optimal execution time is achieved using a proposed mean and median feature, which is extracted at least a factor of 2.5 faster than alternative features with comparable performance.

  11. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    PubMed

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, it is necessary to assign that sample to its respective group. Thus, the discrimination and classification ability for coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples, but not enough to be satisfactory for every group considered.
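The LDA-based group assignment can be sketched with scikit-learn, using two invented spectral features standing in for the NIR predictor variables (three groups instead of the paper's six, purely for illustration):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy two-feature "spectra" for three coal groups (invented numbers):
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(3), 30)
X = np.vstack([rng.normal((g * 3.0, g * 2.0), 0.4, size=(30, 2))
               for g in range(3)])

lda = LinearDiscriminantAnalysis().fit(X, groups)
new_sample = np.array([[3.1, 1.9]])      # near the group-1 cluster center
group = lda.predict(new_sample)[0]       # group assignment before calibration
```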

  12. Using landscape limnology to classify freshwater ecosystems for multi-ecosystem management and conservation

    USGS Publications Warehouse

    Soranno, Patricia A.; Cheruvelil, Kendra Spence; Webster, Katherine E.; Bremigan, Mary T.; Wagner, Tyler; Stow, Craig A.

    2010-01-01

    Governmental entities are responsible for managing and conserving large numbers of lake, river, and wetland ecosystems that can be addressed only rarely on a case-by-case basis. We present a system for predictive classification modeling, grounded in the theoretical foundation of landscape limnology, that creates a tractable number of ecosystem classes to which management actions may be tailored. We demonstrate our system by applying two types of predictive classification modeling approaches to develop nutrient criteria for eutrophication management in 1,998 north temperate lakes. Our predictive classification system promotes the effective management of multiple ecosystems across broad geographic scales by explicitly connecting management and conservation goals to the classification modeling approach, considering multiple spatial scales as drivers of ecosystem dynamics, and acknowledging the hierarchical structure of freshwater ecosystems. Such a system is critical for adaptive management of complex mosaics of freshwater ecosystems and for balancing competing needs for ecosystem services in a changing world.

  13. Land Covers Classification Based on Random Forest Method Using Features from Full-Waveform LIDAR Data

    NASA Astrophysics Data System (ADS)

    Ma, L.; Zhou, M.; Li, C.

    2017-09-01

    In this study, a Random Forest (RF) based land cover classification method is presented to predict the types of land cover in the Miyun area. The returned full waveforms, acquired by a LiteMapper 5600 airborne LiDAR system, were processed through waveform filtering, waveform decomposition, and feature extraction. The commonly used features, namely distance, intensity, Full Width at Half Maximum (FWHM), skewness, and kurtosis, were extracted. These waveform features were used as attributes of the training data for generating the RF prediction model. The RF prediction model was applied to predict the types of land cover in the Miyun area as trees, buildings, farmland, and ground. The classification results for these four types of land cover were assessed against ground truth information acquired from CCD image data of the same region. The RF classification results were compared with those of an SVM method and showed better performance. The RF classification accuracy reached 89.73% and the classification Kappa was 0.8631.
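
    A rough sketch of how such per-echo features could be computed from a decomposed return waveform (generic moment formulas, not the LiteMapper processing chain; the function name, the sample-unit FWHM, and the dictionary keys are invented for this example):

```python
import math

def waveform_features(samples):
    """Compute simple features from one return waveform: peak
    intensity, full width at half maximum (in sample units),
    skewness, and excess kurtosis. A generic sketch, not the
    paper's exact processing chain."""
    n = len(samples)
    peak = max(samples)
    half = peak / 2.0
    above = [i for i, v in enumerate(samples) if v >= half]
    fwhm = above[-1] - above[0] + 1
    mean = sum(samples) / n
    var = sum((v - mean) ** 2 for v in samples) / n
    sd = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in samples) / (n * sd ** 3)
    kurt = sum((v - mean) ** 4 for v in samples) / (n * var ** 2) - 3.0
    return {"peak": peak, "fwhm": fwhm, "skew": skew, "kurtosis": kurt}
```

    Vectors of such features, one per echo, would form the attribute table handed to the Random Forest.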

  14. ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.

    PubMed

    Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won

    2016-07-01

    In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and a new index (lactate concentration/perfusion)). Machine learning methods for multicategory classification were applied to a rat model of acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of the support vector regression and MLR models with relative values, obtained by predicting blood loss in percent, were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% from the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values point toward a possible future clinical decision support system for ATLS classification. The perfusion index and new index were more informative as relative changes than as absolute values.
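
    Once a regression model outputs an estimated blood loss percentage, mapping it to a shock class is a simple thresholding step. The sketch below uses the standard ATLS textbook cut-offs (I: <15%, II: 15-30%, III: 30-40%, IV: >40%); the function name and exact boundary handling are assumptions for illustration:

```python
def atls_class(blood_loss_pct):
    """Map a predicted blood loss (% of total blood volume) to an
    ATLS hypovolemic shock class using the standard textbook
    cut-offs. The upstream regression model is not reproduced."""
    if blood_loss_pct < 15:
        return 1
    if blood_loss_pct < 30:
        return 2
    if blood_loss_pct <= 40:
        return 3
    return 4
```

    Converting the continuous prediction at the end is what lets a single regression replace a four-way classifier.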

  15. Statistical Analysis of Q-matrix Based Diagnostic Classification Models

    PubMed Central

    Chen, Yunxiao; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2014-01-01

    Diagnostic classification models have recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. Central to the model specification is the so-called Q-matrix that provides a qualitative specification of the item-attribute relationship. In this paper, we develop identifiability theory for the Q-matrix under the DINA and DINO models. We further propose an estimation procedure for the Q-matrix via regularized maximum likelihood. The applicability of this procedure is not limited to the DINA or DINO model; it can be applied to essentially all Q-matrix based diagnostic classification models. Simulation studies are conducted to illustrate its performance. Furthermore, two case studies are presented. The first is a data set on fraction subtraction (an educational application) and the second is a subsample of the National Epidemiological Survey on Alcohol and Related Conditions concerning social anxiety disorder (a psychiatric application). PMID:26294801
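
    The DINA model underlying this abstract can be illustrated compactly: an examinee produces the ideal response to an item only when mastering every attribute that the item's Q-matrix row requires, and slip/guess parameters perturb that ideal response. A minimal sketch (function names are invented for this example):

```python
def dina_ideal_response(alpha, q_row):
    """Ideal response eta under DINA: 1 iff the examinee's attribute
    profile alpha covers every attribute the Q-matrix row requires."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))

def dina_correct_prob(alpha, q_row, slip, guess):
    """Probability of a correct answer: 1 - slip when the ideal
    response is 1, otherwise the guessing probability."""
    return (1 - slip) if dina_ideal_response(alpha, q_row) else guess
```

    Under DINO the `all` would become `any`: mastering a single required attribute suffices.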

  16. Multi-Layer Identification of Highly-Potent ABCA1 Up-Regulators Targeting LXRβ Using Multiple QSAR Modeling, Structural Similarity Analysis, and Molecular Docking.

    PubMed

    Chen, Meimei; Yang, Fafu; Kang, Jie; Yang, Xuemei; Lai, Xinmei; Gao, Yuxing

    2016-11-29

    In this study, in silico approaches, including multiple QSAR modeling, structural similarity analysis, and molecular docking, were applied to develop QSAR classification models as a fast screening tool for identifying highly-potent ABCA1 up-regulators targeting LXRβ based on a series of new flavonoids. Initially, four modeling approaches, including linear discriminant analysis, support vector machine, radial basis function neural network, and classification and regression trees, were applied to construct different QSAR classification models. The statistical results indicated that these four kinds of QSAR models were powerful tools for screening highly potent ABCA1 up-regulators. Then, a consensus QSAR model was developed by combining the predictions from these four models. To discover new ABCA1 up-regulators at maximum accuracy, the compounds in the ZINC database that fulfilled the requirement of a structural similarity of 0.7 compared to known potent ABCA1 up-regulators were subjected to the consensus QSAR model, which led to the discovery of 50 compounds. Finally, they were docked into the LXRβ binding site to understand their role in up-regulating ABCA1 expression. The excellent binding modes and docking scores of 10 hit compounds suggested they were highly-potent ABCA1 up-regulators targeting LXRβ. Overall, this study provides an effective strategy to discover highly potent ABCA1 up-regulators.
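
    A consensus model of the kind described can be as simple as a majority vote over the individual classifiers' predicted labels. An illustrative sketch (not necessarily the authors' exact combination rule; the function name is invented):

```python
from collections import Counter

def consensus_predict(predictions):
    """Combine class labels from several classifiers by majority
    vote. Counter.most_common orders equal counts by first
    appearance, so ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]
```

    A screening pipeline would call this once per candidate compound with the four models' labels.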

  17. Property Specification Patterns for intelligence building software

    NASA Astrophysics Data System (ADS)

    Chun, Seungsu

    2018-03-01

    In this paper, we present a single framework for intelligent building software, based on research into property specification patterns for the logical aspects of the modal mu (μ) calculus. Dwyer's property specification pattern classification is broken down into state (S) and action (A) properties, which are further subdivided into strong (A) and weak (E) variants. By means of this hierarchical pattern classification, an analysis of the logical aspects of the mu (μ) calculus was applied to the classification of example patterns used in an actual model checker. As a result, the scheme not only yields a more accurate classification than existing classification systems, but also makes the specified properties easier to create and understand.

  18. Applied Chaos Level Test for Validation of Signal Conditions Underlying Optimal Performance of Voice Classification Methods

    ERIC Educational Resources Information Center

    Liu, Boquan; Polce, Evan; Sprott, Julien C.; Jiang, Jack J.

    2018-01-01

    Purpose: The purpose of this study is to introduce a chaos level test to evaluate linear and nonlinear voice type classification method performances under varying signal chaos conditions without subjective impression. Study Design: Voice signals were constructed with differing degrees of noise to model signal chaos. Within each noise power, 100…

  19. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    PubMed

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to interpret. Our objective was to explore the use of conditional logistic regression to increase prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics and clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. We first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression to each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tended to achieve better sensitivity, by more than 10%, than the standard classification models, which can be translated into the correct labeling of an additional 400-500 readmissions of heart failure patients in the state of California over a year. Several key predictors identified from the HCUP data include the disposition location at discharge, the number of chronic conditions, and the number of acute procedures. It is beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide cohort stratification, and potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed here offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise awareness of collecting data on additional markers and developing the necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.

  20. Hierarchical relaxation methods for multispectral pixel classification as applied to target identification

    NASA Astrophysics Data System (ADS)

    Cohen, E. A., Jr.

    1985-02-01

    This report provides insights into the approaches toward image modeling as applied to target detection. The approach is that of examining the energy in prescribed wave-bands which emanate from a target and correlating the emissions. Typically, one might be looking at two or three infrared bands, possibly together with several visual bands. The target is segmented, using both first and second order modeling, into a set of interesting components and these components are correlated so as to enhance the classification process. A Markov-type model is used to provide an a priori assessment of the spatial relationships among critical parts of the target, and a stochastic model using the output of an initial probabilistic labeling is invoked. The tradeoff between this stochastic model and the Markov model is then optimized to yield a best labeling for identification purposes. In an identification of friend or foe (IFF) context, this methodology could be of interest, for it provides the ingredients for such a higher level of understanding.

  1. Spectral-spatial classification of hyperspectral image using three-dimensional convolution network

    NASA Astrophysics Data System (ADS)

    Liu, Bing; Yu, Xuchu; Zhang, Pengqiang; Tan, Xiong; Wang, Ruirui; Zhi, Lu

    2018-01-01

    Recently, hyperspectral image (HSI) classification has become a focus of research. However, the complex structure of an HSI makes feature extraction difficult to achieve. Most current methods build classifiers based on complex handcrafted features computed from the raw inputs. The design of an improved 3-D convolutional neural network (3D-CNN) model for HSI classification is described. This model extracts features from both the spectral and spatial dimensions through the application of 3-D convolutions, thereby capturing the important discrimination information encoded in multiple adjacent bands. The designed model views the HSI cube data altogether without relying on any pre- or postprocessing. In addition, the model is trained in an end-to-end fashion without any handcrafted features. The designed model was applied to three widely used HSI datasets. The experimental results demonstrate that the 3D-CNN-based method outperforms conventional methods even with limited labeled training samples.

  2. Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    PubMed Central

    Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao

    2012-01-01

    Biologically-inspired models and algorithms are considered promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with increasing dimensions of the input feature vector (outer factor) as well as its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, of three classes of wine derived from different cultivars and of five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put into the feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We conclude that 6∼8 channels of the model, with a principal component feature vector covering at least 90% cumulative variance, are adequate for a classification task of 3∼5 pattern classes, considering the trade-off between time consumption and classification rate. PMID:22736979
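
    The "90% cumulative variance" criterion mentioned above corresponds to choosing the smallest number of leading principal components whose eigenvalues cover that share of the total variance. A sketch (assuming the eigenvalues are already computed and sorted in descending order; the function name is invented):

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Return the smallest number of leading principal components
    whose cumulative explained variance reaches the threshold."""
    total = sum(eigenvalues)
    cum = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        cum += ev
        if cum / total >= threshold:
            return k
    return len(eigenvalues)
```

    The chosen k then fixes the dimension of the feature vector fed into the olfactory model.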

  3. Optimal land use/cover classification using remote sensing imagery for hydrological modelling in a Himalayan watershed

    NASA Astrophysics Data System (ADS)

    Saran, Sameer; Sterk, Geert; Kumar, Suresh

    2007-10-01

    Land use/cover is an important watershed surface characteristic that affects surface runoff and erosion. Many of the available hydrological models divide the watershed into Hydrological Response Units (HRUs), which are spatial units with expected similar hydrological behaviour. The division into HRUs requires good-quality spatial data on land use/cover. This paper presents different approaches for attaining an optimal land use/cover map based on remote sensing imagery for a Himalayan watershed in northern India. First, digital classifications using a maximum likelihood classifier (MLC) and a decision tree classifier were applied. The results obtained from the decision tree were better, and improved further after post-classification sorting. But the obtained land use/cover map was not sufficient for the delineation of HRUs, since the agricultural land use/cover class did not discriminate between the two major crops in the area, i.e. paddy and maize. Therefore we adopted a visual classification approach using optical data alone and also fused with ENVISAT ASAR data. This second step, with a detailed classification system, resulted in better classification accuracy within the 'agricultural land' class, which will be further combined with topography and soil type to derive HRUs for physically-based hydrological modelling.

  4. Populations, Natural Selection, and Applied Organizational Science.

    ERIC Educational Resources Information Center

    McKelvey, Bill; Aldrich, Howard

    1983-01-01

    Deficiencies in existing models in organizational science may be remedied by applying the population approach, with its concepts of taxonomy, classification, evolution, and population ecology; and natural selection theory, with its principles of variation, natural selection, heredity, and struggle for existence, to the idea of organizational forms…

  5. Multispectral LiDAR Data for Land Cover Classification of Urban Areas

    PubMed Central

    Morsy, Salem; Shaker, Ahmed; El-Rabbany, Ahmed

    2017-01-01

    Airborne Light Detection And Ranging (LiDAR) systems usually operate at a monochromatic wavelength measuring the range and the strength of the reflected energy (intensity) from objects. Recently, multispectral LiDAR sensors, which acquire data at different wavelengths, have emerged. This allows for recording of a diversity of spectral reflectance from objects. In this context, we aim to investigate the use of multispectral LiDAR data in land cover classification using two different techniques. The first is image-based classification, where intensity and height images are created from LiDAR points and then a maximum likelihood classifier is applied. The second is point-based classification, where ground filtering and Normalized Difference Vegetation Indices (NDVIs) computation are conducted. A dataset of an urban area located in Oshawa, Ontario, Canada, is classified into four classes: buildings, trees, roads and grass. An overall accuracy of up to 89.9% and 92.7% is achieved from image classification and 3D point classification, respectively. A radiometric correction model is also applied to the intensity data in order to remove the attenuation due to the system distortion and terrain height variation. The classification process is then repeated, and the results demonstrate that there are no significant improvements achieved in the overall accuracy. PMID:28445432
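
    The point-based branch relies on NDVI, which for two spectral channels is (NIR − Red) / (NIR + Red). A minimal sketch of the per-point computation (guarding against a zero denominator is an implementation choice for this example, not from the paper):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one LiDAR point or
    pixel; returns 0.0 when both bands are zero to avoid division
    by zero."""
    if nir + red == 0:
        return 0.0
    return (nir - red) / (nir + red)
```

    High values indicate vegetation (trees, grass), low or negative values indicate built surfaces, which is what makes NDVI useful for separating the four classes.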

  6. Multispectral LiDAR Data for Land Cover Classification of Urban Areas.

    PubMed

    Morsy, Salem; Shaker, Ahmed; El-Rabbany, Ahmed

    2017-04-26

    Airborne Light Detection And Ranging (LiDAR) systems usually operate at a monochromatic wavelength measuring the range and the strength of the reflected energy (intensity) from objects. Recently, multispectral LiDAR sensors, which acquire data at different wavelengths, have emerged. This allows for recording of a diversity of spectral reflectance from objects. In this context, we aim to investigate the use of multispectral LiDAR data in land cover classification using two different techniques. The first is image-based classification, where intensity and height images are created from LiDAR points and then a maximum likelihood classifier is applied. The second is point-based classification, where ground filtering and Normalized Difference Vegetation Indices (NDVIs) computation are conducted. A dataset of an urban area located in Oshawa, Ontario, Canada, is classified into four classes: buildings, trees, roads and grass. An overall accuracy of up to 89.9% and 92.7% is achieved from image classification and 3D point classification, respectively. A radiometric correction model is also applied to the intensity data in order to remove the attenuation due to the system distortion and terrain height variation. The classification process is then repeated, and the results demonstrate that there are no significant improvements achieved in the overall accuracy.

  7. Mining Predictors of Success in Air Force Flight Training Regiments via Semantic Analysis of Instructor Evaluations

    DTIC Science & Technology

    2018-03-01

    We apply our methodology to the criticism text written in the flight-training program student evaluations in order to construct a model that... factors.

  8. Remote Sensing Image Classification Applied to the First National Geographical Information Census of China

    NASA Astrophysics Data System (ADS)

    Yu, Xin; Wen, Zongyong; Zhu, Zhaorong; Xia, Qiang; Shun, Lan

    2016-06-01

    Image classification still has a long way to go, although it has been studied for almost half a century. Researchers have gained many fruits in the image classification domain, but there remains a large distance between theory and practice. However, new methods from the artificial intelligence domain will be absorbed into the image classification domain, drawing on the strength of each to offset the weakness of the other, which will open up new prospects. Networks often play the role of a high-level language, as seen in artificial intelligence and statistics, because they are used to build complex models from simple components. In recent years, Bayesian networks, a class of probabilistic networks, have become a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian Networks (TAN) to texture classification of high-resolution remote sensing images and propose a new method to construct the network topology structure in terms of training accuracy based on the training samples. Since 2013, the Chinese government has been carrying out the first national geographical information census project, which mainly interprets geographical information based on high-resolution remote sensing images. Therefore, this paper applies Bayesian networks to remote sensing image classification, in order to improve image interpretation in the first national geographical information census project. In the experiment, we chose remote sensing images in Beijing. Experimental results demonstrate that TAN outperforms the Naive Bayesian Classifier (NBC) and the Maximum Likelihood Classification (MLC) method in overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve work efficiency. Although it is time consuming, it will be an attractive and effective method for assisting the office operation of image interpretation.

  9. Identification of an Efficient Gene Expression Panel for Glioblastoma Classification

    PubMed Central

    Zelaya, Ivette; Laks, Dan R.; Zhao, Yining; Kawaguchi, Riki; Gao, Fuying; Kornblum, Harley I.; Coppola, Giovanni

    2016-01-01

    We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu. PMID:27855170

  10. Texture classification using autoregressive filtering

    NASA Technical Reports Server (NTRS)

    Lawton, W. M.; Lee, M.

    1984-01-01

    A general theory of image texture models is proposed and its applicability to the problem of scene segmentation using texture classification is discussed. An algorithm, based on half-plane autoregressive filtering, which optimally utilizes second order statistics to discriminate between texture classes represented by arbitrary wide-sense stationary random fields is described. Empirical results of applying this algorithm to natural and synthesized scenes are presented and future research is outlined.

  11. Machine Learning for Biological Trajectory Classification Applications

    NASA Technical Reports Server (NTRS)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms are compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well compared to humans, suggesting a robust tool for trajectory classification in biological applications.

  12. Simultaneous fecal microbial and metabolite profiling enables accurate classification of pediatric irritable bowel syndrome.

    PubMed

    Shankar, Vijay; Reo, Nicholas V; Paliy, Oleg

    2015-12-09

    We previously showed that stool samples of pre-adolescent and adolescent US children diagnosed with diarrhea-predominant IBS (IBS-D) had different compositions of microbiota and metabolites compared to healthy age-matched controls. Here we explored whether observed fecal microbiota and metabolite differences between these two adolescent populations can be used to discriminate between IBS and health. We constructed individual microbiota- and metabolite-based sample classification models based on the partial least squares multivariate analysis and then applied a Bayesian approach to integrate individual models into a single classifier. The resulting combined classification achieved 84 % accuracy of correct sample group assignment and 86 % prediction for IBS-D in cross-validation tests. The performance of the cumulative classification model was further validated by the de novo analysis of stool samples from a small independent IBS-D cohort. High-throughput microbial and metabolite profiling of subject stool samples can be used to facilitate IBS diagnosis.
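
    One simple way to integrate per-model probabilities into a single classifier, in the spirit of the Bayesian integration described (though not necessarily the authors' exact formulation), is a normalized product rule under an independence assumption and equal class priors; the function name is invented:

```python
def combine_probabilities(p_list):
    """Fuse each model's posterior probability for the positive
    class (here, IBS-D) via a naive-Bayes-style normalized product:
    multiply the per-model probabilities for each class, then
    renormalize."""
    pos = 1.0
    neg = 1.0
    for p in p_list:
        pos *= p
        neg *= (1.0 - p)
    return pos / (pos + neg)
```

    Two weakly confident models that agree thus yield a more confident combined prediction than either alone.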

  13. Semi-supervised vibration-based classification and condition monitoring of compressors

    NASA Astrophysics Data System (ADS)

    Potočnik, Primož; Govekar, Edvard

    2017-09-01

    Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.

  14. A hybrid method for classifying cognitive states from fMRI data.

    PubMed

    Parida, S; Dehuri, S; Cho, S-B; Cacha, L A; Poznanski, R R

    2015-09-01

    Functional magnetic resonance imaging (fMRI) makes it possible to detect brain activities in order to elucidate cognitive states. The complex nature of fMRI data requires understanding of the analyses applied to produce possible avenues for developing models of cognitive state classification and improving brain activity prediction. While many models for the classification task in fMRI data analysis have been developed, in this paper we present a novel hybrid technique combining the best attributes of genetic algorithms (GAs) and the ensemble decision tree technique that consistently outperforms the other methods in use for cognitive-state classification. Specifically, this paper illustrates the combined effort of a decision-tree ensemble and GAs for feature selection through an extensive simulation study, and discusses the classification performance with respect to fMRI data. We show that our proposed method exhibits a significant reduction in the number of features while holding a clear edge in classification accuracy over an ensemble of decision trees.

  15. Using genetically modified tomato crop plants with purple leaves for absolute weed/crop classification.

    PubMed

    Lati, Ran N; Filin, Sagi; Aly, Radi; Lande, Tal; Levin, Ilan; Eizenberg, Hanan

    2014-07-01

    Weed/crop classification is considered the main problem in developing precise weed-management methodologies, because both crops and weeds share similar hues. Great effort has been invested in the development of classification models, most based on expensive sensors and complicated algorithms. However, satisfactory results are not consistently obtained due to imaging conditions in the field. We report on an innovative approach that combines advances in genetic engineering and robust image-processing methods to detect weeds and distinguish them from crop plants by manipulating the crop's leaf color. We demonstrate this on genetically modified tomato (germplasm AN-113) which expresses a purple leaf color. An autonomous weed/crop classification is performed using an invariant-hue transformation that is applied to images acquired by a standard consumer camera (visible wavelength) and handles variations in illumination intensities. The integration of these methodologies is simple and effective, and classification results were accurate and stable under a wide range of imaging conditions. Using this approach, we simplify the most complicated stage in image-based weed/crop classification models. © 2013 Society of Chemical Industry.
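
    The invariant-hue idea can be illustrated with a plain hue computation: hue depends on the ratios between color channels, so it varies little with illumination intensity, which is why a purple-leaved crop stays separable from green weeds under changing light. A sketch using the standard library (channel values assumed in [0, 1]; the function name is invented):

```python
import colorsys

def hue_degrees(r, g, b):
    """Hue of an RGB pixel in degrees (0-360). Scaling all channels
    by the same illumination factor leaves the hue unchanged."""
    h, _, _ = colorsys.rgb_to_hls(r, g, b)
    return h * 360.0
```

    Thresholding such a hue channel is one way to separate purple crop pixels from green weed pixels; the paper's actual transformation is more elaborate.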

  16. Neuromorphic Computing for Temporal Scientific Data Classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schuman, Catherine D.; Potok, Thomas E.; Young, Steven

    In this work, we apply a spiking neural network model and an associated memristive neuromorphic implementation to an application in classifying temporal scientific data. We demonstrate that the spiking neural network model achieves comparable results to a previously reported convolutional neural network model, with significantly fewer neurons and synapses required.

  17. Reduction in training time of a deep learning model in detection of lesions in CT

    NASA Astrophysics Data System (ADS)

    Makkinejad, Nazanin; Tajbakhsh, Nima; Zarshenas, Amin; Khokhar, Ashfaq; Suzuki, Kenji

    2018-02-01

    Deep learning (DL) has emerged as a powerful tool for object detection and classification in medical images. Building a well-performing DL model, however, requires a huge number of training images, and it takes days to train a DL model even on a cutting-edge high-performance computing platform. This study is aimed at developing a method for selecting a "small" number of representative samples from a large collection of training samples to train a DL model that could be used to detect polyps in CT colonography (CTC), without compromising the classification performance. Our proposed method for representative sample selection (RSS) consists of a K-means clustering algorithm. For the performance evaluation, we applied the proposed method to select samples for the training of a massive-training artificial neural network based DL model, to be used for the classification of polyps and non-polyps in CTC. Our results show that the proposed method reduces the training time by a factor of 15 while maintaining classification performance equivalent to the model trained using the full training set. We compare the performance using the area under the receiver operating characteristic curve (AUC).
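    The representative-sample-selection idea can be sketched with a small K-means routine: cluster the training pool and keep only the sample nearest each centroid. This is a toy illustration on assumed data, not the authors' CTC pipeline:

    ```python
    import numpy as np

    # Hedged sketch: K-means clustering over the training pool, then one
    # representative per cluster (the sample nearest each centroid).
    def select_representatives(X, k, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # Assign each sample to its nearest center.
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                pts = X[labels == j]
                if len(pts):
                    centers[j] = pts.mean(axis=0)   # recompute centroid
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return np.unique(d.argmin(axis=0))   # index of nearest sample per center

    # Toy pool: two tight clusters; two representatives cover the pool.
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
    idx = select_representatives(X, k=2)
    ```

    Training on the `k` selected samples instead of all of them is the source of the speed-up; the paper's contribution is showing the accuracy is preserved at a much larger scale.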

  18. Adaptive sleep-wake discrimination for wearable devices.

    PubMed

    Karlen, Walter; Floreano, Dario

    2011-04-01

    Sleep/wake classification systems that rely on physiological signals suffer from intersubject differences that make accurate classification with a single, subject-independent model difficult. To overcome the limitations of intersubject variability, we suggest a novel online adaptation technique that updates the sleep/wake classifier in real time. The objective of the present study was to evaluate the performance of a newly developed adaptive classification algorithm that was embedded on a wearable sleep/wake classification system called SleePic. The algorithm processed ECG and respiratory effort signals for the classification task and applied behavioral measurements (obtained from accelerometer and press-button data) for the automatic adaptation task. When trained as a subject-independent classifier algorithm, the SleePic device was only able to correctly classify 74.94 ± 6.76% of the human-rated sleep/wake data. By using the suggested automatic adaptation method, the mean classification accuracy could be significantly improved to 92.98 ± 3.19%. A subject-independent classifier based on activity data only showed a comparable accuracy of 90.44 ± 3.57%. We demonstrated that subject-independent models used for online sleep-wake classification can successfully be adapted to previously unseen subjects without the intervention of human experts or off-line calibration.

  19. Automatic detection and classification of artifacts in single-channel EEG.

    PubMed

    Olund, Thomas; Duun-Henriksen, Jonas; Kjaer, Troels W; Sorensen, Helge B D

    2014-01-01

    Ambulatory EEG monitoring can provide medical doctors with important diagnostic information without hospitalizing the patient. These recordings are, however, more exposed to noise and artifacts than clinically recorded EEG. An automatic artifact detection and classification algorithm for single-channel EEG is proposed to help identify these artifacts. Features are extracted from the EEG signal and its wavelet subbands. Subsequently, a selection algorithm is applied in order to identify the best discriminating features. A non-linear support vector machine discriminates among the artifact classes using the selected features. Single-channel (Fp1-F7) EEG recordings were obtained from experiments with 12 healthy subjects performing artifact-inducing movements. The dataset was used to construct and validate the model. Both subject-specific and generic implementations are investigated. The detection algorithm yields an average sensitivity and specificity above 95% for both the subject-specific and generic models. The classification algorithm shows a mean accuracy of 78% and 64% for the subject-specific and generic models, respectively. The classification model was additionally validated on a reference dataset, with similar results.

  20. A bayesian hierarchical model for classification with selection of functional predictors.

    PubMed

    Zhu, Hongxiao; Vannucci, Marina; Cox, Dennis D

    2010-06-01

    In functional data classification, functional observations are often contaminated by various systematic effects, such as random batch effects caused by device artifacts, or fixed effects caused by sample-related factors. These effects may lead to classification bias and thus should not be neglected. Another issue of concern is the selection of functions when the predictors consist of multiple functions, some of which may be redundant. The above issues arise in a real data application in which we use fluorescence spectroscopy to detect cervical precancer. In this article, we propose a Bayesian hierarchical model that takes into account random batch effects and selects effective functions among multiple functional predictors. Fixed effects and predictors in nonfunctional form are also included in the model. The dimension of the functional data is reduced through orthonormal basis expansion or functional principal components. For posterior sampling, we use a hybrid Metropolis-Hastings/Gibbs sampler, which suffers from slow mixing; an evolutionary Monte Carlo algorithm is applied to improve the mixing. Simulations and a real data application show that the proposed model provides accurate selection of functional predictors as well as good classification.

  1. Classification and recognition of texture collagen obtaining by multiphoton microscope with neural network analysis

    NASA Astrophysics Data System (ADS)

    Wu, Shulian; Peng, Yuanyuan; Hu, Liangjun; Zhang, Xiaoman; Li, Hui

    2016-01-01

    Second harmonic generation microscopy (SHGM) was used to monitor chronological skin aging in vivo. Collagen structures of mouse models of different ages were imaged using SHGM. Texture features, namely contrast, correlation and entropy, were then extracted and analysed using the grey-level co-occurrence matrix. Finally, the neural network tool of Matlab was applied to train on the collagen textures at different states of the aging process, and a simulation on mouse collagen texture was carried out. The results indicated that the classification accuracy reached 85%. These results demonstrate that the proposed approach effectively detects the target object in collagen texture images during chronological aging, and that the neural-network-based classification and feature extraction method is feasible for skin analysis.
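    A minimal sketch of the GLCM texture features mentioned above (contrast and entropy for a horizontal pixel offset; the correlation feature and the neural network training stage are omitted). The image below is a toy example, not SHGM data:

    ```python
    import numpy as np

    # Grey-level co-occurrence matrix for horizontally adjacent pixel pairs.
    def glcm(img, levels):
        m = np.zeros((levels, levels))
        for row in img:
            for a, b in zip(row[:-1], row[1:]):   # horizontal neighbour pairs
                m[a, b] += 1
        return m / m.sum()                        # normalise to probabilities

    def contrast(p):
        i, j = np.indices(p.shape)
        return ((i - j) ** 2 * p).sum()           # weight by grey-level gap

    def entropy(p):
        nz = p[p > 0]
        return -(nz * np.log2(nz)).sum()          # Shannon entropy in bits

    img = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [2, 2, 3, 3],
                    [2, 2, 3, 3]])
    p = glcm(img, levels=4)
    ```

    For this toy image the co-occurrence mass spreads over six equally likely pairs, giving a contrast of 1/3 and an entropy of log2(6) bits; on real collagen images these values shift with age-related texture changes.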

  2. Use of circulation types classifications to evaluate AR4 climate models over the Euro-Atlantic region

    NASA Astrophysics Data System (ADS)

    Pastor, M. A.; Casado, M. J.

    2012-10-01

    This paper presents an evaluation of the multi-model simulations for the 4th Assessment Report (AR4) of the Intergovernmental Panel on Climate Change (IPCC) in terms of their ability to simulate the ERA40 circulation types over the Euro-Atlantic region in the winter season. Two classification schemes, k-means and SANDRA, have been considered to test the sensitivity of the evaluation results to the classification procedure. The assessment allows different rankings to be established according to the spatial and temporal features of the circulation types. Regarding temporal characteristics, in general, all AR4 models tend to underestimate the frequency of occurrence. The best model at simulating spatial characteristics is UKMO-HadGEM1, whereas CCSM3, UKMO-HadGEM1 and CGCM3.1(T63) are the best at simulating the temporal features, for both classification schemes. This result agrees with the AR4 model ranking obtained when analysing the ability of the same AR4 models to simulate Euro-Atlantic variability modes. This study demonstrates the utility of such a synoptic climatology approach as a diagnostic tool for model assessment. The ability of the models to properly reproduce the position of ridges and troughs and the frequency of synoptic patterns will improve our confidence in the response of models to future climate change.

  3. Clinical relevance of rare germline sequence variants in cancer genes: evolution and application of classification models.

    PubMed

    Spurdle, Amanda B

    2010-06-01

    Multifactorial models developed for BRCA1/2 variant classification have proved very useful for delineating BRCA1/2 variants associated with very high risk of cancer, or with little clinical significance. Recent linkage of this quantitative assessment of risk to clinical management guidelines has provided a basis to standardize variant reporting, variant classification and the management of families with such variants, and can theoretically be applied to any disease gene. As proof of principle, the multifactorial approach already shows great promise for application to the evaluation of mismatch repair gene variants identified in families with suspected Lynch syndrome. However, there is a need to be cautious about the noted limitations and caveats of the current model, some of which may be exacerbated by differences in ascertainment and biological pathways to disease for different cancer syndromes.

  4. Multiresource analysis and information system concepts for incorporating LANDSAT and GIS technology into large area forest surveys. [South Carolina

    NASA Technical Reports Server (NTRS)

    Langley, P. G.

    1981-01-01

    A method of relating different classifications at each stage of a multistage, multiresource inventory using remotely sensed imagery is discussed. A class transformation matrix allowing the conversion of a set of proportions at one stage, to a set of proportions at the subsequent stage through use of a linear model, is described. The technique was tested by applying it to Kershaw County, South Carolina. Unsupervised LANDSAT spectral classifications were correlated with interpretations of land use aerial photography, the correlations employed to estimate land use classifications using the linear model, and the land use proportions used to stratify current annual increment (CAI) field plot data to obtain a total CAI for the county. The estimate differed by 1% from the published figure for land use. Potential sediment loss and a variety of land use classifications were also obtained.

  5. Case-based statistical learning applied to SPECT image classification

    NASA Astrophysics Data System (ADS)

    Górriz, Juan M.; Ramírez, Javier; Illán, I. A.; Martínez-Murcia, Francisco J.; Segovia, Fermín.; Salas-Gonzalez, Diego; Ortiz, A.

    2017-03-01

    Statistical learning and decision theory play a key role in many areas of science and engineering. Some examples include time series regression and prediction, optical character recognition, signal detection in communications, and biomedical applications for diagnosis and prognosis. This paper deals with learning from biomedical image data in the classification problem. In a typical scenario we have a training set that is employed to fit a prediction model, or learner, and a testing set to which the learner is applied in order to predict the outcome for new, unseen patterns. Both processes are usually kept completely separate to avoid over-fitting and because, in practice, the unseen new objects (testing set) have unknown outcomes. However, the outcome takes one of a discrete set of values, as in the binary diagnosis problem. Thus, assumptions about these outcome values can be established to obtain the most likely prediction model at the training stage, which could improve the overall classification accuracy on the testing set, or at least keep its performance at the level of the selected statistical classifier. In this sense, a novel case-based learning (c-learning) procedure is proposed which combines hypothesis testing over a discrete set of expected outcomes with a cross-validated classification stage.

  6. Deep learning for tumor classification in imaging mass spectrometry.

    PubMed

    Behrmann, Jens; Etmann, Christian; Boskamp, Tobias; Casadonte, Rita; Kriegsmann, Jörg; Maaß, Peter

    2018-04-01

    Tumor classification using imaging mass spectrometry (IMS) data has a high potential for future applications in pathology. Due to the complexity and size of the data, automated feature extraction and classification steps are required to fully process the data. Since mass spectra exhibit certain structural similarities to image data, deep learning may offer a promising strategy for classification of IMS data as it has been successfully applied to image classification. Methodologically, we propose an adapted architecture based on deep convolutional networks to handle the characteristics of mass spectrometry data, as well as a strategy to interpret the learned model in the spectral domain based on a sensitivity analysis. The proposed methods are evaluated on two algorithmically challenging tumor classification tasks and compared to a baseline approach. Competitiveness of the proposed methods is shown on both tasks by studying the performance via cross-validation. Moreover, the learned models are analyzed by the proposed sensitivity analysis revealing biologically plausible effects as well as confounding factors of the considered tasks. Thus, this study may serve as a starting point for further development of deep learning approaches in IMS classification tasks. https://gitlab.informatik.uni-bremen.de/digipath/Deep_Learning_for_Tumor_Classification_in_IMS. jbehrmann@uni-bremen.de or christianetmann@uni-bremen.de. Supplementary data are available at Bioinformatics online.

  7. Fast-HPLC Fingerprinting to Discriminate Olive Oil from Other Edible Vegetable Oils by Multivariate Classification Methods.

    PubMed

    Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis

    2017-03-01

    A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.

  8. Identification of terrain cover using the optimum polarimetric classifier

    NASA Technical Reports Server (NTRS)

    Kong, J. A.; Swartz, A. A.; Yueh, H. A.; Novak, L. M.; Shin, R. T.

    1988-01-01

    A systematic approach for the identification of terrain media such as vegetation canopy, forest, and snow-covered fields is developed using the optimum polarimetric classifier. The covariance matrices for various terrain cover are computed from theoretical models of random medium by evaluating the scattering matrix elements. The optimal classification scheme makes use of a quadratic distance measure and is applied to classify a vegetation canopy consisting of both trees and grass. Experimentally measured data are used to validate the classification scheme. Analytical and Monte Carlo simulated classification errors using the fully polarimetric feature vector are compared with classification based on single features which include the phase difference between the VV and HH polarization returns. It is shown that the full polarimetric results are optimal and provide better classification performance than single feature measurements.
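    The quadratic distance measure can be sketched as follows, assuming the common zero-mean complex Gaussian form d_k(x) = ln|C_k| + x^H C_k^{-1} x over a polarimetric feature vector x = [HH, HV, VV]; the diagonal covariance matrices below are toy stand-ins for the model-derived ones in the paper:

    ```python
    import numpy as np

    # Quadratic (Gaussian maximum-likelihood) distance for one class:
    #   d(x) = ln|C| + x^H C^{-1} x
    def quadratic_distance(x, C):
        sign, logdet = np.linalg.slogdet(C)
        return logdet + np.real(x.conj() @ np.linalg.solve(C, x))

    def classify(x, covariances):
        d = [quadratic_distance(x, C) for C in covariances]
        return int(np.argmin(d))          # assign to the nearest class

    # Two toy classes: "trees" with strong HH power, "grass" with strong VV.
    C_trees = np.diag([4.0, 1.0, 1.0]).astype(complex)
    C_grass = np.diag([1.0, 1.0, 4.0]).astype(complex)
    x = np.array([2.0 + 0j, 0.1 + 0j, 0.2 + 0j])   # HH-dominated return
    label = classify(x, [C_trees, C_grass])         # -> 0 (trees)
    ```

    Using the full complex feature vector, as here, is what the record contrasts with single-feature rules such as thresholding the VV-HH phase difference.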

  9. Weakly supervised classification in high energy physics

    DOE PAGES

    Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco; ...

    2017-05-01

    As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. Here, this paper presents a new approach called weakly supervised classification, in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics - quark versus gluon tagging - we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervised classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.
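    A toy sketch of learning from class proportions only (one simple way to realize weakly supervised classification; the paper's actual loss and models are not reproduced here): a logistic model is fit so that its batch-mean prediction matches each batch's known class fraction, with no per-sample labels ever seen by the learner.

    ```python
    import numpy as np

    # Hedged sketch: proportion-only supervision. Hidden labels generate the
    # data, but training sees only each batch's class fraction.
    rng = np.random.default_rng(0)

    def make_batch(frac, n=200):
        y = rng.random(n) < frac                       # hidden labels
        x = rng.normal(loc=np.where(y, 1.0, -1.0), scale=1.0)
        return x.reshape(-1, 1), frac

    batches = [make_batch(0.2), make_batch(0.8)]       # two known proportions

    w, b = 0.0, 0.0
    for _ in range(500):
        for X, frac in batches:
            p = 1 / (1 + np.exp(-(w * X[:, 0] + b)))   # per-sample scores
            err = p.mean() - frac                      # batch-level error only
            g = p * (1 - p)                            # logistic derivative
            w -= 0.5 * err * (g * X[:, 0]).mean()      # gradient descent on
            b -= 0.5 * err * g.mean()                  # (mean(p) - frac)^2 / 2

    # Per-sample discrimination emerges despite proportion-only supervision:
    score_pos = 1 / (1 + np.exp(-(w * 2.0 + b)))
    score_neg = 1 / (1 + np.exp(-(w * -2.0 + b)))
    ```

    The batches here play the role of the mixed quark/gluon samples with known proportions; the model never needs a per-event label.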

  10. Weakly supervised classification in high energy physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco

    As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. Here, this paper presents a new approach called weakly supervised classification, in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics - quark versus gluon tagging - we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervised classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.

  11. Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization

    NASA Astrophysics Data System (ADS)

    Jamaluddin; Siringoringo, Rimbun

    2017-12-01

    Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The fuzzy concepts in this method successfully improve its performance on almost all classification problems. The main drawback of FkNN is that its parameters, the number of neighbors (k) and the fuzzy strength (m), are difficult to determine. Both parameters are very sensitive, and no theory or guide deduces what proper values of 'm' and 'k' should be, which makes FkNN difficult to control. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best values of 'k' and 'm'. MPSO is based on the constriction factor method, an improvement of PSO designed to avoid local optima. The model proposed in this study was tested on the German Credit Dataset, a standard benchmark from the UCI Machine Learning Repository that is widely applied to classification problems. Applying MPSO to the determination of the FkNN parameters is expected to increase classification performance. The experiments indicate that the proposed model achieves better classification performance than the plain FkNN model: it attains an accuracy of 81%, versus 70% for FkNN alone. Finally, the proposed model is compared with two other classifiers, Naive Bayes and Decision Tree; it again performs better, with Naive Bayes reaching 75% accuracy and the decision tree 70%.
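    The FkNN decision rule itself (without the MPSO parameter search) can be sketched directly from its usual definition, with distance-weighted memberships softened by the fuzzy strength m; the data below are toy values:

    ```python
    import numpy as np

    # Fuzzy k-NN sketch: membership in each class is a distance-weighted vote
    # over the k nearest neighbours, with weights d^(-2/(m-1)). k and m are
    # exactly the two parameters the record tunes with modified PSO (omitted).
    def fknn_predict(X_train, y_train, x, k=3, m=2.0):
        d = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(d)[:k]                         # k nearest neighbours
        w = 1.0 / np.maximum(d[nn], 1e-12) ** (2.0 / (m - 1.0))
        classes = np.unique(y_train)
        memberships = {c: w[y_train[nn] == c].sum() / w.sum() for c in classes}
        return max(memberships, key=memberships.get), memberships

    X = np.array([[0.0], [0.2], [0.1], [1.0], [1.1], [0.9]])
    y = np.array([0, 0, 0, 1, 1, 1])
    label, mem = fknn_predict(X, y, np.array([0.15]), k=3, m=2.0)
    ```

    Larger m flattens the weights toward a plain majority vote, which is why the method is so sensitive to the choice of m.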

  12. A Comprehensive Study of Retinal Vessel Classification Methods in Fundus Images

    PubMed Central

    Miri, Maliheh; Amini, Zahra; Rabbani, Hossein; Kafieh, Raheleh

    2017-01-01

    Nowadays, it is evident that there is a relationship between changes in the retinal vessel structure and diseases such as diabetes, hypertension, stroke, and other cardiovascular diseases in adults, as well as retinopathy of prematurity in infants. Retinal fundus images provide non-invasive visualization of the retinal vessel structure. Applying image processing techniques to digital color fundus photographs and analyzing their vasculature is a reliable approach for early diagnosis of the aforementioned diseases. A reduction in the arteriolar-venular ratio of the retina is one of the primary signs of hypertension, diabetes, and cardiovascular disease, and can be calculated by analyzing fundus images. To achieve precise measurement of this parameter and meaningful diagnostic results, accurate classification of arteries and veins is necessary. Classification of vessels in fundus images faces several challenges. In this paper, a comprehensive study of the proposed methods for classification of arteries and veins in fundus images is presented. Considering that these methods are evaluated on different datasets and use different evaluation criteria, it is not possible to conduct a fair comparison of their performance. Therefore, we evaluate the classification methods from a modeling perspective. This analysis reveals that most of the proposed approaches have focused on statistical and geometric models in the spatial domain, whereas transform-domain models have received less attention. This suggests the possibility of using transform models, especially data-adaptive ones, for modeling fundus images in future classification approaches. PMID:28553578

  13. Optimal land use/land cover classification using remote sensing imagery for hydrological modeling in a Himalayan watershed

    NASA Astrophysics Data System (ADS)

    Saran, Sameer; Sterk, Geert; Kumar, Suresh

    2009-10-01

    Land use/land cover is an important watershed surface characteristic that affects surface runoff and erosion. Many of the available hydrological models divide the watershed into Hydrological Response Units (HRUs), which are spatial units with expected similar hydrological behaviour. The division into HRUs requires good-quality spatial data on land use/land cover. This paper presents different approaches to attain an optimal land use/land cover map based on remote sensing imagery for a Himalayan watershed in northern India. First, digital classifications using a maximum likelihood classifier (MLC) and a decision tree classifier were applied. The results obtained from the decision tree were better, and improved further after post-classification sorting. But the obtained land use/land cover map was not sufficient for the delineation of HRUs, since the agricultural land use/land cover class did not discriminate between the two major crops in the area, i.e. paddy and maize. Subsequently, digital classification on fused data (ASAR and ASTER) was attempted to map land use/land cover classes with emphasis on delineating the paddy and maize crops, but supervised classification over the fused datasets did not provide the desired accuracy or proper delineation of the paddy and maize crops. Eventually, we adopted a visual classification approach on the fused data. This second step, with a detailed classification system, resulted in better classification accuracy within the 'agricultural land' class, which will be further combined with topography and soil type to derive HRUs for physically based hydrological modeling.

  14. BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data.

    PubMed

    Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn

    2018-04-11

    The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks, learning hyper-representations with cascaded ensembles of decision trees. The deep forest model has been shown to have competitive or even better performance than deep neural networks to some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample-size, high-dimensional biology data. In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets; it can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage ensemble diversity; meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in the cascade forests, propagating the benefits of discriminative features among cascade layers to improve classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms state-of-the-art methods in cancer subtype classification. The multi-class-grained scanning and boosting strategies in our model provide an effective solution that eases the overfitting challenge and improves the robustness of the deep forest model on small-scale data. Our model thus provides a useful approach to the classification of cancer subtypes by deep learning on high-dimensional, small-scale biology data.

  15. Visual word ambiguity.

    PubMed

    van Gemert, Jan C; Veenman, Cor J; Smeulders, Arnold W M; Geusebroek, Jan-Mark

    2010-07-01

    This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been successfully applied for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes completely deteriorate the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.
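    The hard-versus-soft assignment contrast can be sketched with a Gaussian-kernel codebook (one common soft-assignment variant; the kernel width sigma below is an assumed toy value). An ambiguous feature midway between two visual words contributes to both histogram bins under soft assignment, instead of being forced into one:

    ```python
    import numpy as np

    # Hard assignment: each feature votes for exactly one visual word.
    def hard_histogram(features, words):
        d = np.linalg.norm(features[:, None] - words[None, :], axis=2)
        h = np.bincount(d.argmin(axis=1), minlength=len(words)).astype(float)
        return h / h.sum()

    # Soft (kernel codebook) assignment: votes weighted by a Gaussian kernel
    # of the feature-to-word distance, normalised per feature.
    def soft_histogram(features, words, sigma=0.5):
        d = np.linalg.norm(features[:, None] - words[None, :], axis=2)
        k = np.exp(-d ** 2 / (2 * sigma ** 2))
        k /= k.sum(axis=1, keepdims=True)
        return k.mean(axis=0)

    words = np.array([[0.0, 0.0], [1.0, 0.0]])
    feats = np.array([[0.5, 0.0],             # ambiguous: midway between words
                      [0.1, 0.0]])            # clearly near the first word
    hard = hard_histogram(feats, words)
    soft = soft_histogram(feats, words)
    ```

    The hard histogram collapses all mass onto one word, while the soft histogram records the ambiguity of the midway feature, which is the effect the paper shows improves classification as vocabularies grow.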

  16. Families with Noncompliant Children: Applications of the Systemic Model.

    ERIC Educational Resources Information Center

    Neilans, Thomas H.; And Others

    This paper describes the application of a systems approach model to assessing families with a labeled noncompliant child. The first section describes and comments on the applied methodology for the model. The second section describes the classification of 61 families containing a child labeled by the family as noncompliant. An analysis of data…

  17. Dimensional Representation and Gradient Boosting for Seismic Event Classification

    NASA Astrophysics Data System (ADS)

    Semmelmayer, F. C.; Kappedal, R. D.; Magana-Zook, S. A.

    2017-12-01

    In this research, we conducted experiments with representational structures on 5009 seismic signals, with the intent of finding a method to classify signals as either an explosion or an earthquake in an automated fashion, and applied a gradient boosted classifier. While perfect classification was not attained (approximately 88% was our best model), in many cases events can be filtered out as having a very high probability of being explosions or earthquakes, diminishing subject-matter experts' (SME) workload for first-stage analysis. It is our hope that these methods can be refined, further increasing the classification probability.

  18. Developing collaborative classifiers using an expert-based model

    USGS Publications Warehouse

    Mountrakis, G.; Watts, R.; Luo, L.; Wang, Jingyuan

    2009-01-01

    This paper presents a hierarchical, multi-stage adaptive strategy for image classification. We iteratively apply various classification methods (e.g., decision trees, neural networks), identify regions of parametric and geographic space where accuracy is low, and in these regions, test and apply alternate methods repeating the process until the entire image is classified. Currently, classifiers are evaluated through human input using an expert-based system; therefore, this paper acts as the proof of concept for collaborative classifiers. Because we decompose the problem into smaller, more manageable sub-tasks, our classification exhibits increased flexibility compared to existing methods since classification methods are tailored to the idiosyncrasies of specific regions. A major benefit of our approach is its scalability and collaborative support since selected low-accuracy classifiers can be easily replaced with others without affecting classification accuracy in high accuracy areas. At each stage, we develop spatially explicit accuracy metrics that provide straightforward assessment of results by non-experts and point to areas that need algorithmic improvement or ancillary data. Our approach is demonstrated in the task of detecting impervious surface areas, an important indicator for human-induced alterations to the environment, using a 2001 Landsat scene from Las Vegas, Nevada. ?? 2009 American Society for Photogrammetry and Remote Sensing.

  19. Hierarchical structure for audio-video based semantic classification of sports video sequences

    NASA Astrophysics Data System (ADS)

    Kolekar, M. H.; Sengupta, S.

    2005-07-01

    A hierarchical structure for sports event classification based on audio and video content analysis is proposed in this paper. Compared to event classification in other games, that of cricket is very challenging and as yet unexplored. We have successfully solved the cricket video classification problem using a six-level hierarchical structure. The first level performs event detection based on audio energy and the Zero Crossing Rate (ZCR) of the short-time audio signal. In the subsequent levels, we classify the events based on video features using a Hidden Markov Model implemented through Dynamic Programming (HMM-DP), with color or motion as the likelihood function. For some of the game-specific decisions, a rule-based classification is also performed. Our proposed hierarchical structure can easily be applied to other sports. Our results are very promising, and we have moved a step forward towards addressing semantic classification problems in general.

  20. Behavior Based Social Dimensions Extraction for Multi-Label Classification

    PubMed Central

    Li, Le; Xu, Junyi; Xiao, Weidong; Ge, Bin

    2016-01-01

    Classification based on social dimensions is commonly used to handle the multi-label classification task in heterogeneous networks. However, traditional methods, which mostly rely on the community detection algorithms to extract the latent social dimensions, produce unsatisfactory performance when community detection algorithms fail. In this paper, we propose a novel behavior based social dimensions extraction method to improve the classification performance in multi-label heterogeneous networks. In our method, nodes’ behavior features, instead of community memberships, are used to extract social dimensions. By introducing Latent Dirichlet Allocation (LDA) to model the network generation process, nodes’ connection behaviors with different communities can be extracted accurately, which are applied as latent social dimensions for classification. Experiments on various public datasets reveal that the proposed method can obtain satisfactory classification results in comparison to other state-of-the-art methods on smaller social dimensions. PMID:27049849
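    The pipeline described above, LDA on nodes' connection-behaviour counts, then a conventional classifier on the inferred dimensions, can be sketched as follows. All data here is synthetic with two planted behaviour patterns; the feature layout is our assumption, not the paper's dataset.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# 60 nodes x 30 behaviour counts (each node's row acts as a "document").
counts = np.vstack([rng.poisson(lam=p, size=(30, 30))
                    for p in ([3] * 15 + [0.3] * 15,
                              [0.3] * 15 + [3] * 15)])
labels = np.array([0] * 30 + [1] * 30)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
social_dims = lda.fit_transform(counts)      # latent social dimensions
clf = LogisticRegression().fit(social_dims, labels)
acc = clf.score(social_dims, labels)
```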

  1. Desert plains classification based on Geomorphometrical parameters (Case study: Aghda, Yazd)

    NASA Astrophysics Data System (ADS)

    Tazeh, mahdi; Kalantari, Saeideh

    2013-04-01

    This research focuses on plains. Several methods and classification schemes have been presented for plain classification. One natural-resource-based classification, widely used in Iran, divides plains into three types: Erosional Pediment, Denudational Pediment, and Aggradational Piedmont. Qualitative and quantitative factors are used to differentiate these types from one another. In this study, geomorphometrical parameters effective in differentiating landforms were applied to plains. Geomorphometrical parameters are calculable and can be extracted from a digital elevation model using mathematical equations and the corresponding relations. The parameters used in this study were percent slope, plan curvature, profile curvature, minimum curvature, maximum curvature, cross-sectional curvature, longitudinal curvature and Gaussian curvature. The results indicated that the most important geomorphometrical parameters for plain and desert classification are percent slope, minimum curvature, profile curvature, and longitudinal curvature. Key words: plain, geomorphometry, classification, biophysical, Yazd Khezarabad.
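    Two of the parameters named above, percent slope and a curvature measure, can be derived from a DEM with finite differences. A minimal sketch on a synthetic tilted plane (the Laplacian is used as a simple curvature proxy, not the full directional profile curvature):

```python
import numpy as np

def slope_percent(dem, cell=1.0):
    """Slope in percent from a DEM via finite differences."""
    dzdy, dzdx = np.gradient(dem, cell)
    return 100.0 * np.sqrt(dzdx**2 + dzdy**2)

def curvature_proxy(dem, cell=1.0):
    """Simple curvature proxy: Laplacian of the elevation surface."""
    dzdy, dzdx = np.gradient(dem, cell)
    d2y, _ = np.gradient(dzdy, cell)
    _, d2x = np.gradient(dzdx, cell)
    return d2x + d2y

# A tilted plane has constant slope and zero curvature everywhere.
y, x = np.mgrid[0:50, 0:50]
dem = 0.1 * x
s = slope_percent(dem)     # 10% everywhere
c = curvature_proxy(dem)   # ~0 everywhere
```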

  2. Classification of Birds and Bats Using Flight Tracks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cullinan, Valerie I.; Matzner, Shari; Duberstein, Corey A.

    Classification of birds and bats that use areas targeted for offshore wind farm development and the inference of their behavior is essential to evaluating the potential effects of development. The current approach to assessing the number and distribution of birds at sea involves transect surveys using trained individuals in boats or airplanes or using high-resolution imagery. These approaches are costly and have safety concerns. Based on a limited annotated library extracted from a single-camera thermal video, we provide a framework for building models that classify birds and bats and their associated behaviors. As an example, we developed a discriminant model for theoretical flight paths and applied it to data (N = 64 tracks) extracted from 5-min video clips. The agreement between model- and observer-classified path types was initially only 41%, but it increased to 73% when small-scale jitter was censored and path types were combined. Classification of 46 tracks of bats, swallows, gulls, and terns on average was 82% accurate, based on a jackknife cross-validation. Model classification of bats and terns (N = 4 and 2, respectively) was 94% and 91% correct, respectively; however, the variance associated with the tracks from these targets is poorly estimated. Model classification of gulls and swallows (N ≥ 18) was on average 73% and 85% correct, respectively. The models developed here should be considered preliminary because they are based on a small data set both in terms of the numbers of species and the identified flight tracks. Future classification models would be greatly improved by including a measure of distance between the camera and the target.
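    A discriminant model validated by jackknife (leave-one-out) cross-validation, as used above, can be sketched with standard tools. The 64 "tracks" and their five flight-path features below are synthetic stand-ins:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# 64 synthetic tracks, each summarised by 5 flight-path features, 3 classes.
X, y = make_classification(n_samples=64, n_features=5, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=1)

# Jackknife: fit on 63 tracks, test on the held-out one, repeat 64 times.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
jackknife_acc = scores.mean()
```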

  3. Classification of Brazilian and foreign gasolines adulterated with alcohol using infrared spectroscopy.

    PubMed

    da Silva, Neirivaldo C; Pimentel, Maria Fernanda; Honorato, Ricardo S; Talhavini, Marcio; Maldaner, Adriano O; Honorato, Fernanda A

    2015-08-01

    The smuggling of products across the border regions of many countries is a practice to be fought. Brazilian authorities are increasingly worried about the illicit trade of fuels along the frontiers of the country. In order to confirm this as a crime, the Federal Police must have a means of identifying the origin of the fuel. This work describes the development of a rapid and nondestructive methodology to classify gasoline as to its origin (Brazil, Venezuela and Peru), using infrared spectroscopy and multivariate classification. Partial Least Squares Discriminant Analysis (PLS-DA) and Soft Independent Modelling of Class Analogy (SIMCA) models were built. Direct standardization (DS) was employed to standardize the spectra obtained in different laboratories of the border units of the Federal Police. Two approaches were considered in this work: (1) local and (2) global classification models. When using Approach 1, the PLS-DA achieved 100% correct classification, and the deviation of the predicted values for the secondary instrument decreased considerably after performing DS. In this case, SIMCA models were not efficient in the classification, even after standardization. Using a global model (Approach 2), both PLS-DA and SIMCA techniques were effective after performing DS. Considering that real situations may involve questioned samples from other nations (such as Peru), the SIMCA method developed according to Approach 2 is more adequate, since the sample will be classified as neither Brazilian nor Venezuelan. This methodology could be applied to other forensic problems involving the chemical classification of a product, provided that a specific modeling is performed. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  4. Target Scattering Metrics: Model-Model and Model-Data Comparisons

    DTIC Science & Technology

    2017-12-13

    measured synthetic aperture sonar (SAS) data or from numerical models is investigated. Metrics are needed for quantitative comparisons for signals...candidate metrics for model-model comparisons are examined here with a goal to consider raw data prior to its reduction to data products, which may...be suitable for input to classification schemes. The investigated metrics are then applied to model-data comparisons.

  6. Protein classification based on text document classification techniques.

    PubMed

    Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith

    2005-03-01

    The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
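    The document-classification analogy above maps directly onto standard text tooling: character n-grams as "words", chi-square feature selection, and a Naive Bayes classifier. A toy sketch with invented sequences (not real GPCR data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

seqs = ["MKTAYIAKQR", "MKTAYLAKQR", "MKSAYIAKNR",   # family A-like
        "GGSLWWQQPL", "GGSLWWQHPL", "GGALWWQQPL"]   # family B-like
labels = [0, 0, 0, 1, 1, 1]

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 3)),  # peptide n-grams
    SelectKBest(chi2, k=20),                               # chi-square selection
    MultinomialNB(),
).fit(seqs, labels)
pred = clf.predict(["MKTAYIAKNR", "GGSLWWQHPL"])           # -> [0, 1]
```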

  7. Classification of postural profiles among mouth-breathing children by learning vector quantization.

    PubMed

    Mancini, F; Sousa, F S; Hummel, A D; Falcão, A E J; Yi, L C; Ortolani, C F; Sigulem, D; Pisa, I T

    2011-01-01

    Mouth breathing is a chronic syndrome that may bring about postural changes. Finding characteristic patterns of changes occurring in the complex musculoskeletal system of mouth-breathing children has been a challenge. Learning vector quantization (LVQ) is an artificial neural network model that can be applied for this purpose. The aim of the present study was to apply LVQ to determine the characteristic postural profiles shown by mouth-breathing children, in order to further understand abnormal posture among mouth breathers. Postural training data on 52 children (30 mouth breathers and 22 nose breathers) and postural validation data on 32 children (22 mouth breathers and 10 nose breathers) were used. The performance of LVQ was compared with that of other classification models: self-organizing maps, multilayer perceptrons trained with back-propagation, Bayesian networks, naive Bayes, J48 decision trees, and k-nearest-neighbor classifiers. Classifier accuracy was assessed by means of leave-one-out cross-validation, area under the ROC curve (AUC), and inter-rater agreement (Kappa statistics). By using the LVQ model, five postural profiles for mouth-breathing children could be determined. LVQ showed satisfactory results for mouth-breathing and nose-breathing classification: sensitivity and specificity rates of 0.90 and 0.95, respectively, when using the training dataset, and 0.95 and 0.90, respectively, when using the validation dataset. The five postural profiles for mouth-breathing children suggested by LVQ were incorporated into application software for classifying the severity of mouth breathers' abnormal posture.
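    For readers unfamiliar with LVQ: it keeps a small set of labelled prototype vectors and, for each training sample, moves the winning prototype toward the sample if the labels match and away otherwise. A minimal LVQ1 sketch on synthetic 2-D data (one prototype per class, plain numpy):

```python
import numpy as np

def train_lvq1(X, y, n_epochs=30, lr=0.05, seed=0):
    """Minimal LVQ1: one prototype per class, attracted/repelled by samples."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    protos = np.array([X[y == c][0].astype(float) for c in classes])
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            k = np.argmin(((protos - X[i]) ** 2).sum(axis=1))  # winner
            sign = 1.0 if classes[k] == y[i] else -1.0
            protos[k] += sign * lr * (X[i] - protos[k])
    return protos, classes

def predict_lvq(protos, classes, X):
    d = ((X[:, None, :] - protos[None]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(2, 0.3, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
protos, classes = train_lvq1(X, y)
acc = (predict_lvq(protos, classes, X) == y).mean()
```

    Real LVQ variants use several prototypes per class, which is how the study's five postural profiles per group would arise.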

  8. Automated connectionist-geostatistical classification as an approach to identify sea ice and land ice types, properties and provinces

    NASA Astrophysics Data System (ADS)

    Goetz-Weiss, L. R.; Herzfeld, U. C.; Trantow, T.; Hunke, E. C.; Maslanik, J. A.; Crocker, R. I.

    2016-12-01

    An important problem in model-data comparison is the identification of parameters that can be extracted from observational data as well as used in numerical models, which are typically based on idealized physical processes. Here, we present a suite of approaches to characterization and classification of sea ice and land ice types, properties and provinces based on several types of remote-sensing data. Applications are given not only to illustrate the approach, but to employ it in model evaluation and understanding of physical processes. (1) In a geostatistical characterization, spatial sea-ice properties in the Chukchi and Beaufort Seas and in Elson Lagoon are derived from analysis of RADARSAT and ERS-2 SAR data. (2) The analysis is taken further by utilizing multi-parameter feature vectors as inputs for unsupervised and supervised statistical classification, which facilitates classification of different sea-ice types. (3) Characteristic sea-ice parameters resulting from the classification can then be applied in model evaluation, as demonstrated for the ridging scheme of the Los Alamos sea ice model, CICE, using high-resolution altimeter and image data collected from unmanned aircraft over Fram Strait during the Characterization of Arctic Sea Ice Experiment (CASIE). The characteristic parameters chosen in this application are directly related to deformation processes, which also underlie the ridging scheme. (4) The method capable of the most complex classification tasks is the connectionist-geostatistical classification method. This approach has been developed to identify currently up to 18 different crevasse types in order to map progression of the surge through the complex Bering-Bagley Glacier System, Alaska, in 2011-2014. The analysis utilizes airborne altimeter data, video image data and satellite image data. Results of the crevasse classification are compared to fracture modeling and found to match.

  9. Neural Network Classifier Architectures for Phoneme Recognition. CRC Technical Note No. CRC-TN-92-001.

    ERIC Educational Resources Information Center

    Treurniet, William

    A study applied artificial neural networks, trained with the back-propagation learning algorithm, to modelling phonemes extracted from the DARPA TIMIT multi-speaker, continuous speech data base. A number of proposed network architectures were applied to the phoneme classification task, ranging from the simple feedforward multilayer network to more…

  10. Rotationally invariant clustering of diffusion MRI data using spherical harmonics

    NASA Astrophysics Data System (ADS)

    Liptrot, Matthew; Lauze, François

    2016-03-01

    We present a simple approach to the voxelwise classification of brain tissue acquired with diffusion weighted MRI (DWI). The approach leverages the power of spherical harmonics to summarise the diffusion information, sampled at many points over a sphere, using only a handful of coefficients. We use simple features that are invariant to the rotation of the highly orientational diffusion data. This provides a way to directly classify voxels whose diffusion characteristics are similar yet whose primary diffusion orientations differ. Subsequent application of machine learning to the spherical harmonic coefficients may therefore permit classification of DWI voxels according to their inferred underlying fibre properties, whilst ignoring the specifics of orientation. After smoothing apparent diffusion coefficient volumes, we apply a spherical harmonic transform, which models the multi-directional diffusion data as a collection of spherical basis functions. We use the derived coefficients as voxelwise feature vectors for classification. Using a simple Gaussian mixture model, we examined the classification performance for a range of sub-classes (3-20). The results were compared against existing alternatives for tissue classification, e.g., fractional anisotropy (FA) or the standard model used by Camino [1]. The approach was implemented on two publicly available datasets: an ex-vivo pig brain and an in-vivo human brain from the Human Connectome Project (HCP). We have demonstrated how a robust classification of DWI data can be performed without the need for a model reconstruction step. This avoids the potential confounds and uncertainty that such models may impose, and has the benefit of being computable directly from the DWI volumes. As such, the method could prove useful in subsequent pre-processing stages, such as model fitting, where it could inform about individual voxel complexities and improve model parameter choice.
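    The clustering step described above amounts to fitting a Gaussian mixture to per-voxel feature vectors and reading off component memberships. A sketch with synthetic features standing in for the rotation-invariant spherical-harmonic coefficients:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# 300 synthetic "voxels", 6 invariant features, 3 planted tissue types.
centres = rng.normal(scale=3.0, size=(3, 6))
feats = np.vstack([c + rng.normal(scale=0.5, size=(100, 6)) for c in centres])

gmm = GaussianMixture(n_components=3, random_state=0).fit(feats)
tissue = gmm.predict(feats)        # sub-class label per voxel
```

    Varying `n_components` over a range (the paper tried 3-20) and comparing e.g. BIC would reproduce the sub-class sweep.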

  11. Mapping and monitoring changes in vegetation communities of Jasper Ridge, CA, using spectral fractions derived from AVIRIS images

    NASA Technical Reports Server (NTRS)

    Sabol, Donald E., Jr.; Roberts, Dar A.; Adams, John B.; Smith, Milton O.

    1993-01-01

    An important application of remote sensing is to map and monitor changes over large areas of the land surface. This is particularly significant with the current interest in monitoring vegetation communities. Most traditional methods for mapping different types of plant communities are based upon statistical classification techniques (e.g., parallelepiped, nearest-neighbor, etc.) applied to uncalibrated multispectral data. Classes from these techniques are typically difficult to interpret (particularly to a field ecologist/botanist). Also, classes derived for one image can be very different from those derived from another image of the same area, making interpretation of observed temporal changes nearly impossible. More recently, neural networks have been applied to classification. Neural network classification, based upon spectral matching, is weak in dealing with spectral mixtures (a condition prevalent in images of natural surfaces). Another approach to mapping vegetation communities is based on spectral mixture analysis, which can provide a consistent framework for image interpretation. Roberts et al. (1990) mapped vegetation using the band residuals from a simple mixing model (the same spectral endmembers applied to all image pixels). Sabol et al. (1992b) and Roberts et al. (1992) used different methods to apply the most appropriate spectral endmembers to each image pixel, thereby allowing mapping of vegetation based upon the different endmember spectra. In this paper, we describe a new approach to classification of vegetation communities based upon the spectral fractions derived from spectral mixture analysis. This approach was applied to three 1992 AVIRIS images of Jasper Ridge, California to observe seasonal changes in surface composition.

  12. A comprehensive simulation study on classification of RNA-Seq data.

    PubMed

    Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet

    2017-01-01

    RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g., microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data, since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performance. A comprehensive simulation study was conducted and the results were compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As with differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
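    The "bring counts closer to microarrays" route can be sketched with a simple log transform (a crude stand-in for vst/rlog) followed by a random forest; counts below are simulated, with 20 planted differentially expressed genes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_genes = 200
base = rng.gamma(2.0, 50.0, size=n_genes)         # baseline expression levels
up = base.copy()
up[:20] *= 4                                      # 20 differentially expressed genes
counts = np.vstack([rng.poisson(base, size=(30, n_genes)),
                    rng.poisson(up,   size=(30, n_genes))])
y = np.array([0] * 30 + [1] * 30)

X = np.log2(counts + 1)                           # variance-taming transform
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
train_acc = rf.score(X, y)
```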

  13. HIV classification using the coalescent theory

    PubMed Central

    Bulla, Ingo; Schultz, Anne-Kathrin; Schreiber, Fabian; Zhang, Ming; Leitner, Thomas; Korber, Bette; Morgenstern, Burkhard; Stanke, Mario

    2010-01-01

    Motivation: Existing coalescent models and phylogenetic tools based on them are not designed for studying the genealogy of sequences like those of HIV, since in HIV, recombinants with multiple cross-over points between the parental strains frequently arise. Hence, ambiguous cases in the classification of HIV sequences into subtypes and circulating recombinant forms (CRFs) have been treated with ad hoc methods for lack of tools based on a comprehensive coalescent model accounting for complex recombination patterns. Results: We developed the program ARGUS, which scores classifications of sequences into subtypes and recombinant forms. It reconstructs ancestral recombination graphs (ARGs) that reflect the genealogy of the input sequences given a classification hypothesis. An ARG with maximal probability is approximated using a Markov chain Monte Carlo approach. ARGUS was able to distinguish the correct classification with a low error rate from plausible alternative classifications in simulation studies with realistic parameters. We applied our algorithm to decide between two recently debated alternatives in the classification of CRF02 of HIV-1 and find that CRF02 is indeed a recombinant of Subtypes A and G. Availability: ARGUS is implemented in C++ and the source code is available at http://gobics.de/software Contact: ibulla@uni-goettingen.de Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:20400454

  14. Using Gaussian mixture models to detect and classify dolphin whistles and pulses.

    PubMed

    Peso Parada, Pablo; Cardenal-López, Antonio

    2014-06-01

    In recent years, a number of automatic detection systems for free-ranging cetaceans have been proposed that aim to detect not just surfaced, but also submerged, individuals. These systems are typically based on pattern-recognition techniques applied to underwater acoustic recordings. Using a Gaussian mixture model, a classification system was developed that detects sounds in recordings and classifies them as one of four types: background noise, whistles, pulses, and combined whistles and pulses. The classifier was tested using a database of underwater recordings made off the Spanish coast during 2011. Using cepstral-coefficient-based parameterization, a sound detection rate of 87.5% was achieved for a 23.6% classification error rate. To improve these results, two parameters computed using the multiple signal classification algorithm and an unpredictability measure were included in the classifier. These parameters, which helped to classify the segments containing whistles, increased the detection rate to 90.3% and reduced the classification error rate to 18.1%. Finally, the potential of the multiple signal classification algorithm and unpredictability measure for estimating whistle contours and classifying cetacean species was also explored, with promising results.
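    The detector above follows a standard pattern: fit one Gaussian mixture per sound class on training frames, then label each new frame by maximum log-likelihood. A sketch with synthetic stand-ins for the cepstral-coefficient frames (two classes instead of four):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
train = {0: rng.normal(0, 1, (200, 4)),   # e.g. background-noise frames
         1: rng.normal(4, 1, (200, 4))}   # e.g. whistle frames

# One GMM per class.
models = {c: GaussianMixture(n_components=2, random_state=0).fit(f)
          for c, f in train.items()}

def classify(frames):
    ll = np.column_stack([m.score_samples(frames) for m in models.values()])
    return np.array(list(models))[ll.argmax(axis=1)]  # max-likelihood class

test_frames = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(4, 1, (20, 4))])
pred = classify(test_frames)
```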

  15. Automatic Estimation of Osteoporotic Fracture Cases by Using Ensemble Learning Approaches.

    PubMed

    Kilic, Niyazi; Hosgormez, Erkan

    2016-03-01

    Ensemble learning methods are among the most powerful tools for pattern classification problems. In this paper, the effects of ensemble learning methods and some physical bone densitometry parameters on osteoporotic fracture detection were investigated. Six feature set models were constructed including different physical parameters, and they were fed into the ensemble classifiers as input features. As ensemble learning techniques, bagging, gradient boosting and the random subspace method (RSM) were used. Instance-based learning (IBk) and random forest (RF) classifiers were applied to the six feature set models. The patients were classified into three groups: osteoporosis, osteopenia and control (healthy), using ensemble classifiers. Total classification accuracy and f-measure were also used to evaluate the diagnostic performance of the proposed ensemble classification system. Classification accuracy reached 98.85 % with model 6 (five BMD + five T-score values) using the RSM-RF classifier. The findings of this paper suggest that patients could be warned before a bone fracture occurs by examining physical parameters that can easily be measured without invasive operations.
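    The random subspace method used above trains each ensemble member on a random subset of the features and combines predictions by majority vote. A minimal hand-rolled RSM sketch on synthetic data (decision trees as base learners):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
rng = np.random.default_rng(0)
members = []
for _ in range(25):
    feats = rng.choice(10, size=5, replace=False)  # random feature subspace
    members.append((feats, DecisionTreeClassifier(random_state=0)
                    .fit(X[:, feats], y)))

votes = np.stack([m.predict(X[:, f]) for f, m in members])
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
acc = (pred == y).mean()
```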

  16. G0-WISHART Distribution Based Classification from Polarimetric SAR Images

    NASA Astrophysics Data System (ADS)

    Hu, G. C.; Zhao, Q. H.

    2017-09-01

    Enormous scientific and technical developments have further improved remote sensing over recent decades, particularly the Polarimetric Synthetic Aperture Radar (PolSAR) technique, so classification methods based on PolSAR images have received considerable attention from scholars and related agencies around the world. The multilook polarimetric G0-Wishart model is a flexible model that describes homogeneous, heterogeneous and extremely heterogeneous regions in an image. Moreover, the polarimetric G0-Wishart distribution does not include the modified Bessel function of the second kind; it is a simple statistical distribution model with few parameters. To demonstrate its feasibility, a classification process was tested on a fully polarimetric Synthetic Aperture Radar (SAR) image. First, multilook polarimetric SAR data processing and a speckle filter are applied to reduce the influence of speckle on the classification result. The image is then initially classified into sixteen classes by H/A/α decomposition. Finally, the ICM algorithm classifies features based on the G0-Wishart distance. Qualitative and quantitative results show that the proposed method can classify polarimetric SAR data effectively and efficiently.

  17. Machine learning algorithms for meteorological event classification in the coastal area using in-situ data

    NASA Astrophysics Data System (ADS)

    Sokolov, Anton; Gengembre, Cyril; Dmitriev, Egor; Delbarre, Hervé

    2017-04-01

    We consider the problem of classifying local atmospheric meteorological events in coastal areas, such as sea breezes, fogs and storms. In-situ meteorological data, such as wind speed and direction, temperature, humidity and turbulence, are used as predictors. Local atmospheric events of 2013-2014 in the coastal area of the English Channel at Dunkirk (France) were analysed manually to train the classification algorithms. Ultrasonic anemometer data and LIDAR wind-profiler data were then used as predictors. Several algorithms were applied to identify meteorological events from local data: a decision tree, a nearest-neighbour classifier and a support vector machine. The classification algorithms were compared, and the most important predictors for each event type were determined. In more than 80 percent of the cases, the machine learning algorithms detected the meteorological class correctly. We expect that this methodology could also be applied to classify events in climatological in-situ data or in modelling data, allowing the frequency of each event type to be estimated in the perspective of climate change.
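    Comparing the three classifier families named above is a few lines with standard tooling; the data here is a synthetic stand-in for the in-situ predictors.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 6 synthetic predictors (wind, temperature, humidity, ...), 3 event classes.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
results = {name: cross_val_score(clf, X, y, cv=5).mean()
           for name, clf in [("tree", DecisionTreeClassifier(random_state=0)),
                             ("knn", KNeighborsClassifier(5)),
                             ("svm", SVC())]}
```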

  18. Classifier dependent feature preprocessing methods

    NASA Astrophysics Data System (ADS)

    Rodriguez, Benjamin M., II; Peterson, Gilbert L.

    2008-04-01

    In mobile applications, computational complexity is an issue that limits sophisticated algorithms from being implemented on these devices. This paper provides an initial solution to applying pattern recognition systems on mobile devices by combining existing preprocessing algorithms for recognition. In pattern recognition systems, it is essential to properly apply feature preprocessing tools prior to training classification models in an attempt to reduce computational complexity and improve the overall classification accuracy. The feature preprocessing tools extended for the mobile environment are feature ranking, feature extraction, data preparation and outlier removal. Most desktop systems today can run the majority of available classification algorithms without performance concerns, while the same is not true on mobile platforms. As an application of pattern recognition for mobile devices, the recognition system targets the problem of steganalysis, determining if an image contains hidden information. The measure of performance shows that feature preprocessing increases the overall steganalysis classification accuracy by an average of 22%. The methods in this paper are tested on a workstation and a Nokia 6620 (Symbian operating system) camera phone with similar results.

  19. A theory of fine structure image models with an application to detection and classification of dementia.

    PubMed

    O'Neill, William; Penn, Richard; Werner, Michael; Thomas, Justin

    2015-06-01

    Estimation of stochastic process models from data is a common application of time series analysis methods. Such system identification processes are often cast as hypothesis testing exercises whose intent is to estimate model parameters and test them for statistical significance. Ordinary least squares (OLS) regression and the Levenberg-Marquardt algorithm (LMA) have proven invaluable computational tools for models being described by non-homogeneous, linear, stationary, ordinary differential equations. In this paper we extend stochastic model identification to linear, stationary, partial differential equations in two independent variables (2D) and show that OLS and LMA apply equally well to these systems. The method employs an original nonparametric statistic as a test for the significance of estimated parameters. We show gray scale and color images are special cases of 2D systems satisfying a particular autoregressive partial difference equation which estimates an analogous partial differential equation. Several applications to medical image modeling and classification illustrate the method by correctly classifying demented and normal OLS models of axial magnetic resonance brain scans according to subject Mini Mental State Exam (MMSE) scores. Comparison with 13 image classifiers from the literature indicates our classifier is at least 14 times faster than any of them and has a classification accuracy better than all but one. Our modeling method applies to any linear, stationary, partial differential equation and the method is readily extended to 3D whole-organ systems. Further, in addition to being a robust image classifier, estimated image models offer insights into which parameters carry the most diagnostic image information and thereby suggest finer divisions could be made within a class. Image models can be estimated in milliseconds which translate to whole-organ models in seconds; such runtimes could make real-time medicine and surgery modeling possible.
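    The core estimation step above, fitting a 2D autoregressive image model by OLS, can be illustrated directly: predict each pixel from its causal neighbours and recover the coefficients by least squares. A first-order sketch with known ground-truth parameters (the model order and neighbourhood are our simplification):

```python
import numpy as np

rng = np.random.default_rng(0)
img = np.zeros((64, 64))
# Synthesize an image from a known 2D AR model plus noise:
# I[r, c] = a*I[r-1, c] + b*I[r, c-1] + e
a_true, b_true = 0.5, 0.45
for r in range(1, 64):
    for c in range(1, 64):
        img[r, c] = (a_true * img[r - 1, c] + b_true * img[r, c - 1]
                     + rng.normal(scale=0.1))

# OLS regression of each pixel on its up- and left-neighbours.
Z = np.column_stack([img[:-1, 1:].ravel(), img[1:, :-1].ravel()])
t = img[1:, 1:].ravel()
(a_hat, b_hat), *_ = np.linalg.lstsq(Z, t, rcond=None)
```

    In the paper's setting, the estimated coefficient vector (per scan) is the feature used for classification; a higher-order neighbourhood plays the role of the partial difference equation.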

  20. Combining two open source tools for neural computation (BioPatRec and Netlab) improves movement classification for prosthetic control.

    PubMed

    Prahm, Cosima; Eckstein, Korbinian; Ortiz-Catalan, Max; Dorffner, Georg; Kaniusas, Eugenijus; Aszmann, Oskar C

    2016-08-31

    Controlling a myoelectric prosthesis for upper limbs is increasingly challenging for the user as more electrodes and joints become available. Motion classification based on pattern recognition with a multi-electrode array allows multiple joints to be controlled simultaneously. Previous pattern recognition studies are difficult to compare because individual research groups use their own data sets. To resolve this shortcoming and to facilitate comparisons, open access data sets were analysed using components of the BioPatRec and Netlab pattern recognition models. Performances of the artificial neural networks, linear models, and training program components were compared. Evaluation took place within the BioPatRec environment, a Matlab-based open source platform that provides feature extraction, processing and motion classification algorithms for prosthetic control. The algorithms were applied to myoelectric signals for individual and simultaneous classification of movements, with the aim of finding the best performing algorithm and network model. Evaluation criteria included classification accuracy and training time. Results in both the linear and the artificial neural network models demonstrated that Netlab's implementation using the scaled conjugate gradient training algorithm reached significantly higher accuracies than BioPatRec's. It is concluded that the best movement classification performance would be achieved by integrating Netlab training algorithms into the BioPatRec environment, so that future prosthesis training can be shortened and control made more reliable. Netlab was therefore included in the newest release of BioPatRec (v4.0).

  1. Unsupervised domain adaptation for early detection of drought stress in hyperspectral images

    NASA Astrophysics Data System (ADS)

    Schmitter, P.; Steinrücken, J.; Römer, C.; Ballvora, A.; Léon, J.; Rascher, U.; Plümer, L.

    2017-09-01

    Hyperspectral images can be used to uncover physiological processes in plants if interpreted properly. Machine learning methods such as Support Vector Machines (SVM) and Random Forests have been applied to estimate biomass development and to detect and predict plant diseases and drought stress. One basic requirement of machine learning is that training and testing data come from the same domain and follow the same distribution. Different genotypes, environmental conditions, illumination and sensors violate this requirement in most practical circumstances. Here, we present an approach which enables the detection of physiological processes by transferring the prior knowledge within an existing model into a related target domain where no label information is available. We propose a two-step transformation of the target features, which enables a direct application of an existing model. The transformation is evaluated by an objective function that includes additional prior knowledge about classification and physiological processes in plants. We applied the approach to three sets of hyperspectral images, which were acquired from different plant species in different environments observed with different sensors. It is shown that a classification model derived on one of the sets delivers satisfactory classification results on the transformed features of the other data sets. Furthermore, in all cases early non-invasive detection of drought stress was possible.
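The paper's two-step transformation also optimises an objective with physiological prior knowledge; as a much simpler stand-in, the core idea of mapping unlabelled target features onto the source distribution can be sketched as per-band moment matching.

```python
import numpy as np

def match_moments(target, source):
    """Map target-domain features onto the source distribution:
    (1) standardise each target band, then (2) rescale it to the source
    band's mean and variance. A crude stand-in for the paper's two-step
    transformation; no target labels are needed, only feature statistics."""
    t_mu, t_sd = target.mean(axis=0), target.std(axis=0) + 1e-12
    s_mu, s_sd = source.mean(axis=0), source.std(axis=0)
    return (target - t_mu) / t_sd * s_sd + s_mu
```

A classifier trained on `source` can then be applied directly to `match_moments(target, source)`, which is the "direct application of an existing model" the abstract describes.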

  2. Applying deep neural networks to HEP job classification

    NASA Astrophysics Data System (ADS)

    Wang, L.; Shi, J.; Yan, X.

    2015-12-01

    The cluster of the IHEP computing center is a mid-sized computing system which provides 10,000 CPU cores, 5 PB of disk storage, and 40 GB/s IO throughput. Its 1000+ users come from a variety of HEP experiments. In such a system, job classification is an indispensable task. Although an experienced administrator can classify a HEP job by its IO pattern, it is impractical to classify millions of jobs manually. We present how to solve this problem with deep neural networks (DNNs) in a supervised learning setting. Firstly, we built a training data set of 320K samples using an IO pattern collection agent and a semi-automatic process of sample labelling. Then we implemented and trained DNN models with Torch. During model training, several meta-parameters were tuned with cross-validation. Test results show that a 5-hidden-layer DNN model achieves 96% precision on the classification task. By comparison, it outperforms a linear model by 8% in precision.
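As a toy illustration of the supervised approach (not the authors' Torch implementation, which used five hidden layers on IO-pattern features), a one-hidden-layer network trained by batch gradient descent on a binary task can be written in plain NumPy:

```python
import numpy as np

def train_mlp(X, y, hidden=16, lr=0.5, epochs=500, seed=0):
    """Tiny one-hidden-layer tanh network with a sigmoid output, trained
    with batch gradient descent on binary cross-entropy. Returns a
    predict function. A toy stand-in for a deeper Torch DNN."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden); b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                      # hidden activations
        p = 1 / (1 + np.exp(-(h @ W2 + b2)))          # output probabilities
        g = (p - y) / len(y)                          # dL/dlogit for BCE loss
        gh = np.outer(g, W2) * (1 - h ** 2)           # backprop through tanh
        W2 -= lr * (h.T @ g); b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(0)
    return lambda Z: (1 / (1 + np.exp(-(np.tanh(Z @ W1 + b1) @ W2 + b2))) > 0.5).astype(int)
```

The 8% gap the abstract reports between the DNN and a linear model comes from exactly this extra hidden nonlinearity, which lets the network separate IO patterns a linear boundary cannot.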

  3. Data-driven advice for applying machine learning to bioinformatics problems

    PubMed Central

    Olson, Randal S.; La Cava, William; Mustahsan, Zairah; Varik, Akshay; Moore, Jason H.

    2017-01-01

    As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems. PMID:29218881
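A comparison of this kind rests on cross-validated accuracy per algorithm and dataset. A minimal sketch with two stand-in classifiers (nearest centroid and 1-NN; the fold logic, not these particular algorithms, is the point):

```python
import numpy as np

def kfold_score(clf_fit_predict, X, y, k=5, seed=0):
    """Mean accuracy of a fit/predict callable over k shuffled folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = clf_fit_predict(X[train], y[train], X[test])
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))

def nearest_centroid(Xtr, ytr, Xte):
    """Assign each test point to the class with the closest mean."""
    classes = np.unique(ytr)
    cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = ((Xte[:, None, :] - cents[None]) ** 2).sum(-1)
    return classes[d.argmin(1)]

def one_nn(Xtr, ytr, Xte):
    """Assign each test point the label of its nearest training point."""
    d = ((Xte[:, None, :] - Xtr[None]) ** 2).sum(-1)
    return ytr[d.argmin(1)]
```

Ranking `kfold_score` across many datasets and tuned hyperparameter settings is, in miniature, the experiment the abstract describes at the scale of 13 algorithms and 165 problems.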

  4. Metabolomics for organic food authentication: Results from a long-term field study in carrots.

    PubMed

    Cubero-Leon, Elena; De Rudder, Olivier; Maquet, Alain

    2018-01-15

    Increasing demand for organic products and their premium prices make them an attractive target for fraudulent malpractices. In this study, a large-scale comparative metabolomics approach was applied to investigate the effect of the agronomic production system on the metabolite composition of carrots and to build statistical models for prediction purposes. Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) was applied successfully to predict the origin of the agricultural system of the harvested carrots on the basis of features determined by liquid chromatography-mass spectrometry. When the training set used to build the OPLS-DA models contained samples representative of each harvest year, the models were able to classify unknown samples correctly (100% correct classification). If a harvest year was left out of the training sets and used for predictions, the correct classification rates achieved ranged from 76% to 100%. The results therefore highlight the potential of metabolomic fingerprinting for organic food authentication purposes. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  5. A Neural Relevance Model for Feature Extraction from Hyperspectral Images, and Its Application in the Wavelet Domain

    DTIC Science & Technology

    2006-08-01

    Nikolas Avouris. Evaluation of classifiers for an uneven class distribution problem. Applied Artificial Intellegence , pages 1-24, 2006. Draft manuscript...data by a hybrid artificial neural network so we may evaluate the classification capabilities of the baseline GRLVQ and our improved GRLVQI. Chapter 4...performance of GRLVQ(I), we compare the results against a baseline classification of the 23-class problem with a hybrid artificial neural network (ANN

  6. Deep neural network and noise classification-based speech enhancement

    NASA Astrophysics Data System (ADS)

    Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei

    2017-07-01

    In this paper, a speech enhancement method using noise classification and Deep Neural Network (DNN) was proposed. Gaussian mixture model (GMM) was employed to determine the noise type in speech-absent frames. DNN was used to model the relationship between noisy observation and clean speech. Once the noise type was determined, the corresponding DNN model was applied to enhance the noisy speech. GMM was trained with mel-frequency cepstrum coefficients (MFCC) and the parameters were estimated with an iterative expectation-maximization (EM) algorithm. Noise type was updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method could achieve better objective speech quality and smaller distortion under stationary and non-stationary conditions.
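The noise-classification stage can be sketched as picking, per utterance, the noise model with the highest likelihood on speech-absent frames. Here a single diagonal Gaussian per noise type stands in for the paper's MFCC-trained GMM, and the returned key would select the matching enhancement DNN:

```python
import numpy as np

def log_gauss(x, mu, var):
    """Frame-wise log-likelihood under a diagonal Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum(axis=-1)

def pick_noise_model(frames, noise_stats):
    """Choose the noise type whose Gaussian best explains the speech-absent
    frames (a single-component stand-in for the paper's GMM). noise_stats
    maps a noise name to (mean vector, variance vector) over features."""
    scores = {name: log_gauss(frames, mu, var).sum()
              for name, (mu, var) in noise_stats.items()}
    return max(scores, key=scores.get)
```

In the full system the VAD supplies the speech-absent frames, and the winning key indexes into a dictionary of noise-specific enhancement networks.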

  7. New Casemix Classification as an Alternative Method for Budget Allocation in Thai Oral Healthcare Service: A Pilot Study

    PubMed Central

    Wisaijohn, Thunthita; Pimkhaokham, Atiphan; Lapying, Phenkhae; Itthichaisri, Chumpot; Pannarunothai, Supasit; Igarashi, Isao; Kawabuchi, Koichi

    2010-01-01

    This study aimed to develop a new casemix classification system as an alternative method for the budget allocation of oral healthcare service (OHCS). Initially, the International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Thai Modification (ICD-10-TM) codes related to OHCS were used for developing the software “Grouper”. This model was designed to allow the translation of dental procedures into eight-digit codes. Multiple regression analysis was used to analyze the relationship between the factors used for developing the model and the resource consumption. Furthermore, the coefficient of variation, reduction in variance, and relative weight (RW) were applied to test the validity. The results demonstrated that 1,624 OHCS classifications, according to the diagnoses and the procedures performed, showed high homogeneity within groups and heterogeneity between groups. Moreover, the RW of the OHCS could be used to predict and control the production costs. In conclusion, this new OHCS casemix classification has potential use in global decision making. PMID:20936134

  9. Multi-sparse dictionary colorization algorithm based on the feature classification and detail enhancement

    NASA Astrophysics Data System (ADS)

    Yan, Dan; Bai, Lianfa; Zhang, Yi; Han, Jing

    2018-02-01

    To address the problems of missing details and limited performance in sparse-representation-based colorization, we propose a conceptual model framework for colorizing gray-scale images, and on this framework we build a multi-sparse dictionary colorization algorithm based on feature classification and detail enhancement (CEMDC). The algorithm achieves a natural colorized effect for a gray-scale image that is consistent with human vision. First, the algorithm establishes a multi-sparse dictionary classification colorization model. Then, to improve the accuracy of the classification, a corresponding local constraint algorithm is proposed. Finally, we propose a detail enhancement based on the Laplacian pyramid, which is effective in solving the problem of missing details and improving the speed of image colorization. In addition, the algorithm not only realizes the colorization of visual gray-scale images, but can also be applied to other areas, such as color transfer between color images, colorizing gray fusion images, and infrared images.

  10. Career Decision Making and Its Evaluation.

    ERIC Educational Resources Information Center

    Miller-Tiedeman, Anna

    1979-01-01

    The author discusses a career decision-making program which she designed and implemented using a pyramidal model of exploration, crystallization, choice, and classification. Her article outlines the value of rigorous evaluation techniques applied by the local practitioner. (MF)

  11. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils.

    PubMed

    Devos, Olivier; Downey, Gerard; Duponchel, Ludovic

    2014-04-01

    Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However, such methods require optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved by using pre-processing to remove unwanted variance in the spectra. In this paper we propose a new methodology based on a genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean-centred data, GENOPT-SVM) have been tested and statistically compared using McNemar's statistical test. For the two datasets, SVM with optimised pre-processing gives models with higher accuracy than the ones obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps are required to obtain an SVM model with a significant accuracy improvement (82.2%) compared to the one obtained with PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to obtain higher classification rates. Copyright © 2013 Elsevier Ltd. All rights reserved.
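The GENOPT-SVM idea of evolving pre-processing and SVM hyperparameters jointly can be sketched with a chromosome holding one pre-processing choice plus log-scaled C and gamma. The pre-processing names, parameter ranges, and GA settings below are illustrative assumptions, and in a real run `fitness` would wrap a cross-validated SVM rather than an analytic function:

```python
import random

# candidate spectral pre-processing steps (illustrative names)
PREPROC = ["none", "snv", "msc", "savgol", "detrend"]

def random_chrom():
    """Chromosome: one pre-processing step plus log2-scaled SVM C and gamma."""
    return {"prep": random.choice(PREPROC),
            "logC": random.uniform(-5, 15),
            "logG": random.uniform(-15, 3)}

def mutate(ch, rate=0.3):
    """Perturb each gene with the given probability."""
    ch = dict(ch)
    if random.random() < rate: ch["prep"] = random.choice(PREPROC)
    if random.random() < rate: ch["logC"] += random.gauss(0, 1)
    if random.random() < rate: ch["logG"] += random.gauss(0, 1)
    return ch

def ga_optimise(fitness, pop_size=20, gens=30):
    """Elitist GA: keep the best half, refill with mutated elites.
    fitness(chrom) would be cross-validated SVM accuracy in practice."""
    pop = [random_chrom() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        pop = elite + [mutate(random.choice(elite)) for _ in elite]
    return max(pop, key=fitness)
```

Because pre-processing choice and SVM parameters interact, searching them jointly, as here, is exactly what distinguishes the paper's approach from tuning each in isolation.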

  12. Automated detection of radioisotopes from an aircraft platform by pattern recognition analysis of gamma-ray spectra.

    PubMed

    Dess, Brian W; Cardarelli, John; Thomas, Mark J; Stapleton, Jeff; Kroutil, Robert T; Miller, David; Curry, Timothy; Small, Gary W

    2018-03-08

    A generalized methodology was developed for automating the detection of radioisotopes from gamma-ray spectra collected from an aircraft platform using sodium-iodide detectors. Employing data provided by the U.S. Environmental Protection Agency Airborne Spectral Photometric Environmental Collection Technology (ASPECT) program, multivariate classification models based on nonparametric linear discriminant analysis were developed for application to spectra that were preprocessed through a combination of altitude-based scaling and digital filtering. Training sets of spectra for use in building classification models were assembled from a combination of background spectra collected in the field and synthesized spectra obtained by superimposing laboratory-collected spectra of target radioisotopes onto field backgrounds. This approach eliminated the need for field experimentation with radioactive sources for use in building classification models. Through a bi-Gaussian modeling procedure, the discriminant scores that served as the outputs from the classification models were related to associated confidence levels. This provided an easily interpreted result regarding the presence or absence of the signature of a specific radioisotope in each collected spectrum. Through the use of this approach, classifiers were built for cesium-137 (137Cs) and cobalt-60 (60Co), two radioisotopes that are of interest in airborne radiological monitoring applications. The optimized classifiers were tested with field data collected from a set of six geographically diverse sites, three of which contained either 137Cs, 60Co, or both. When the optimized classification models were applied, the overall percentages of correct classifications for spectra collected at these sites were 99.9 and 97.9% for the 60Co and 137Cs classifiers, respectively. Copyright © 2018 Elsevier Ltd. All rights reserved.
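The bi-Gaussian mapping from discriminant score to confidence can be read as a two-class posterior. A sketch assuming equal priors and known background/target score moments (the paper's exact procedure may differ):

```python
import numpy as np

def confidence(score, mu_bg, sd_bg, mu_tg, sd_tg):
    """Map a discriminant score to a detection confidence by modelling
    background and target scores as two Gaussians: the posterior
    probability of 'target' under equal priors."""
    def pdf(x, mu, sd):
        return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    p_bg = pdf(score, mu_bg, sd_bg)   # likelihood under background model
    p_tg = pdf(score, mu_tg, sd_tg)   # likelihood under target model
    return p_tg / (p_bg + p_tg)
```

The appeal, as the abstract notes, is interpretability: an analyst sees "92% confidence of 137Cs" rather than a raw discriminant value.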

  13. Identification of cortex in magnetic resonance images

    NASA Astrophysics Data System (ADS)

    VanMeter, John W.; Sandon, Peter A.

    1992-06-01

    The overall goal of the work described here is to make available to the neurosurgeon in the operating room an on-line, three-dimensional, anatomically labeled model of the patient's brain, based on pre-operative magnetic resonance (MR) images. A stereotactic operating microscope is currently in experimental use, which allows structures that have been manually identified in MR images to be made available on-line. We have been working to enhance this system by combining image processing techniques applied to the MR data with an anatomically labeled 3-D brain model developed from the Talairach and Tournoux atlas. Here we describe the process of identifying cerebral cortex in the patient's MR images. MR images of brain tissue are reasonably well described by material mixture models, which identify each pixel as corresponding to one of a small number of materials, or as being a composite of two materials. Our classification algorithm consists of three steps. First, we apply hierarchical, adaptive grayscale adjustments to correct for nonlinearities in the MR sensor. The goal of this preprocessing step, based on the material mixture model, is to make the grayscale distribution of each tissue type constant across the entire image. Next, we perform an initial classification of all tissue types according to gray level. We have used a sum-of-Gaussians approximation of the histogram to perform this classification. Finally, we identify pixels corresponding to cortex by taking into account the spatial patterns characteristic of this tissue. For this purpose, we use a set of matched filters to identify image locations having the appropriate configuration of gray matter (cortex), cerebrospinal fluid and white matter, as determined by the previous classification step.

  14. Target-classification approach applied to active UXO sites

    NASA Astrophysics Data System (ADS)

    Shubitidze, F.; Fernández, J. P.; Shamatava, Irma; Barrowes, B. E.; O'Neill, K.

    2013-06-01

    This study is designed to illustrate the discrimination performance at two active UXO sites (Oklahoma's Fort Sill and the Massachusetts Military Reservation) of a set of advanced electromagnetic induction (EMI) inversion/discrimination models which include the orthonormalized volume magnetic source (ONVMS), joint diagonalization (JD), and differential evolution (DE) approaches and whose power and flexibility greatly exceed those of the simple dipole model. The Fort Sill site is highly contaminated by a mix of the following types of munitions: 37-mm target practice tracers, 60-mm illumination mortars, 75-mm and 4.5'' projectiles, 3.5'', 2.36'', and LAAW rockets, antitank mine fuzes with and without hex nuts, practice MK2 and M67 grenades, 2.5'' ballistic windshields, M2A1-mines with/without bases, M19-14 time fuzes, and 40-mm practice grenades with/without cartridges. The MMR site contains targets of yet other sizes. In this work we apply our models to EMI data collected using the MetalMapper (MM) and 2 × 2 TEMTADS sensors. The data for each anomaly are inverted to extract estimates of the extrinsic and intrinsic parameters associated with each buried target. (The latter include the total volume magnetic source, or NVMS, which relates to size, shape, and material properties; the former include location, depth, and orientation.) The estimated intrinsic parameters are then used for classification performed via library matching and the use of statistical classification algorithms; this process yielded prioritized dig-lists that were submitted to the Institute for Defense Analyses (IDA) for independent scoring. The models' classification performance is illustrated and assessed based on these independent evaluations.

  15. Validation of statistical predictive models meant to select melanoma patients for sentinel lymph node biopsy.

    PubMed

    Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G

    2012-01-01

    To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.
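The three reported metrics follow directly from the confusion counts of a "predicted node-negative" rule. A small helper, with the label convention (1 = node-positive) assumed for illustration:

```python
def sln_metrics(y_true, y_pred):
    """Metrics used to validate SLN prediction models (1 = node-positive):
    negative predictive value (NPV), rate of negative predictions (RNP),
    and false-negative rate (FNR)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    neg_pred = tn + fn                 # patients the model would spare SLNB
    pos = sum(y_true)                  # truly node-positive patients
    npv = tn / neg_pred if neg_pred else float("nan")
    rnp = neg_pred / len(y_true)
    fnr = fn / pos if pos else float("nan")
    return npv, rnp, fnr
```

The trade-off in the abstract is visible in these definitions: a model can only reduce biopsies (raise RNP) at the cost of admitting more false negatives unless its specificity is genuinely high.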

  16. Incremental Transductive Learning Approaches to Schistosomiasis Vector Classification

    NASA Astrophysics Data System (ADS)

    Fusco, Terence; Bi, Yaxin; Wang, Haiying; Browne, Fiona

    2016-08-01

    Collecting epidemic disease data for our analysis purposes is a labour-intensive, time-consuming and expensive process, which results in the availability of only sparse sample data from which to develop prediction models. To address this sparse data issue, we present novel Incremental Transductive methods that circumvent the data collection process by applying previously acquired data to provide consistent, confidence-based labelling alternatives to field survey research. We investigated various reasoning approaches for semi-supervised machine learning, including Bayesian models, for labelling data. The results show that using the proposed methods, we can label instances of data with a class of vector density at a high level of confidence. By applying the Liberal and Strict Training Approaches, we provide a labelling and classification alternative to standalone algorithms. The methods in this paper are components in the process of reducing the proliferation of the Schistosomiasis disease and its effects.

  17. Complex versus simple models: ion-channel cardiac toxicity prediction.

    PubMed

    Mistry, Hitesh B

    2018-01-01

    There is growing interest in applying detailed mathematical models of the heart to ion-channel-related cardiac toxicity prediction. However, there is debate as to whether such complex models are required. Here an assessment of the predictive performance of two established large-scale biophysical cardiac models against a simple linear model, Bnet, was conducted. Three ion-channel datasets were extracted from the literature. Each compound was designated a cardiac risk category using two different classification schemes based on information within CredibleMeds. The predictive performance of each model within each dataset for each classification scheme was assessed via leave-one-out cross-validation. Overall, the Bnet model performed as well as the leading cardiac models on two of the datasets and outperformed both cardiac models on the latest one. These results highlight the importance of benchmarking complex versus simple models, and also encourage the development of simple models.
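The leave-one-out protocol used for the comparison is easy to state precisely. A generic sketch taking any fit-and-predict callable (the choice of classifier is entirely up to the caller):

```python
def loo_accuracy(fit_predict, X, y):
    """Leave-one-out cross-validation: refit on all-but-one compound and
    predict the held-out risk category. fit_predict(Xtr, ytr, x) returns
    the predicted label for a single sample x."""
    hits = 0
    for i in range(len(y)):
        Xtr = [x for j, x in enumerate(X) if j != i]
        ytr = [t for j, t in enumerate(y) if j != i]
        hits += fit_predict(Xtr, ytr, X[i]) == y[i]
    return hits / len(y)
```

With small compound panels like these, LOO uses the data maximally, and the same harness scores a simple linear model and a biophysical simulator on identical terms, which is what makes the benchmark fair.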

  18. Probabilistic grammatical model for helix‐helix contact site classification

    PubMed Central

    2013-01-01

    Background Hidden Markov Models power many state‐of‐the‐art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium‐ and long‐range residue‐residue interactions. This requires an expressive power of at least context‐free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. Results In this work, we present a probabilistic grammatical framework for problem‐specific protein languages and apply it to classification of transmembrane helix‐helix pair configurations. The core of the model consists of a probabilistic context‐free grammar, automatically inferred by a genetic algorithm from only a generic set of expert‐based rules and positive training samples. The model was applied to produce sequence‐based descriptors of four classes of transmembrane helix‐helix contact site configurations. The best classifier reached an AUC ROC of 0.70. The analysis of grammar parse trees revealed the ability to represent structural features of helix‐helix contact sites. Conclusions We demonstrated that our probabilistic context‐free framework for analysis of protein sequences outperforms the state of the art in the task of helix‐helix contact site classification. However, this is achieved without necessarily requiring modeling long range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human‐readable. Thus they could provide biologically meaningful information for molecular biologists. PMID:24350601

  19. Evaluating terrain based criteria for snow avalanche exposure ratings using GIS

    NASA Astrophysics Data System (ADS)

    Delparte, Donna; Jamieson, Bruce; Waters, Nigel

    2010-05-01

    Snow avalanche terrain in backcountry regions of Canada is increasingly being assessed based upon the Avalanche Terrain Exposure Scale (ATES). ATES is a terrain based classification introduced in 2004 by Parks Canada to identify "simple", "challenging" and "complex" backcountry areas. The ATES rating system has been applied to well over 200 backcountry routes, has been used in guidebooks, trailhead signs and maps and is part of the trip planning component of the AVALUATOR™, a simple decision-support tool for backcountry users. Geographic Information Systems (GIS) offers a means to model and visualize terrain based criteria through the use of digital elevation model (DEM) and land cover data. Primary topographic variables such as slope, aspect and curvature are easily derived from a DEM and are compatible with the equivalent evaluation criteria in ATES. Other components of the ATES classification are difficult to extract from a DEM as they are not strictly terrain based. An overview is provided of the terrain variables that can be generated from DEM and land cover data; criteria from ATES which are not clearly terrain based are identified for further study or revision. The second component of this investigation was the development of an algorithm for inputting suitable ATES criteria into a GIS, thereby mimicking the process avalanche experts use when applying the ATES classification to snow avalanche terrain. GIS based classifications were compared to existing expert assessments for validity. The advantage of automating the ATES classification process through GIS is to assist avalanche experts with categorizing and mapping remote backcountry terrain.
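The primary terrain variables are one gradient computation away from the DEM. A sketch, in which the 30-45 degree release-zone band is an assumed illustrative threshold for demonstration, not part of the ATES specification:

```python
import numpy as np

def slope_deg(dem, cell=25.0):
    """Slope in degrees from a DEM (regular grid, cell size in metres)
    via central differences -- one of the primary topographic variables
    used when evaluating ATES criteria in a GIS."""
    dz_dy, dz_dx = np.gradient(dem, cell)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def steep_band_flag(dem, cell=25.0, lo=30.0, hi=45.0):
    """Illustrative flag for cells in an assumed 30-45 degree band,
    where slab avalanches most often release."""
    s = slope_deg(dem, cell)
    return (s >= lo) & (s <= hi)
```

Aspect and curvature come from the same `np.gradient` call, which is why the terrain-based parts of ATES automate so readily, while criteria such as terrain traps or forest density need additional land cover data.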

  20. Interannual drought index variations in Central Europe related to the large-scale atmospheric circulation—application and evaluation of statistical downscaling approaches based on circulation type classifications

    NASA Astrophysics Data System (ADS)

    Beck, Christoph; Philipp, Andreas; Jacobeit, Jucundus

    2015-08-01

    This contribution investigates the relationship between the large-scale atmospheric circulation and interannual variations of the standardized precipitation index (SPI) in Central Europe. To this end, circulation types (CT) have been derived from a variety of circulation type classifications (CTC) applied to daily sea level pressure (SLP) data and mean circulation indices of vorticity (V), zonality (Z) and meridionality (M) have been calculated. Occurrence frequencies of CTs and circulation indices have been utilized as predictors within multiple regression models (MRM) for the estimation of gridded 3-month SPI values over Central Europe, for the period 1950 to 2010. CTC-based MRMs used in the analyses comprise variants concerning the basic method for CT classification, the number of CTs, the size and location of the spatial domain used for CTCs and the exclusive use of CT frequencies or the combined use of CT frequencies and mean circulation indices as predictors. Adequate MRM predictor combinations have been identified by applying stepwise multiple regression analyses within a resampling framework. The performance (robustness) of the resulting MRMs has been quantified based on a leave-one-out cross-validation procedure applying several skill scores. Furthermore, the relative importance of individual predictors has been estimated for each MRM. From these analyses, it can be stated that model skill is improved by (i) the consideration of vorticity characteristics within CTCs, (ii) a relatively small size of the spatial domain to which CTCs are applied and (iii) the inclusion of mean circulation indices. However, model skill exhibits distinct variations between seasons and regions. Whereas promising skill can be stated for the western and northwestern parts of the Central European domain, only unsatisfactory skill is reached in the more continental regions and particularly during summer.
Thus, it can be concluded that the presented approaches feature the potential for the downscaling of Central European drought index variations from the large-scale circulation, at least for some regions. Further improvements of CTC-based approaches may be expected from the optimization of CTCs for explaining the SPI, e.g. via the inclusion of additional variables in the classification procedure.
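Each CTC-based MRM reduces to ordinary multiple regression of seasonal SPI on circulation-type occurrence frequencies (optionally augmented with the V, Z, M indices as extra columns). A minimal sketch of the estimation step:

```python
import numpy as np

def fit_mrm(ct_freq, spi):
    """Multiple regression model estimating 3-month SPI from seasonal
    circulation-type occurrence frequencies (one column per CT, rows are
    seasons). Returns regression coefficients plus intercept."""
    X = np.column_stack([ct_freq, np.ones(len(spi))])
    beta, *_ = np.linalg.lstsq(X, spi, rcond=None)
    return beta

def predict_mrm(beta, ct_freq):
    """Apply a fitted MRM to new circulation-type frequencies."""
    return np.column_stack([ct_freq, np.ones(len(ct_freq))]) @ beta
```

The paper's stepwise predictor selection and leave-one-out skill scores would wrap around this fit, refitting with candidate column subsets and held-out seasons.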

  1. CARSVM: a class association rule-based classification framework and its application to gene expression data.

    PubMed

    Kianmehr, Keivan; Alhajj, Reda

    2008-09-01

    In this study, we aim at building a classification framework, namely the CARSVM model, which integrates association rule mining and the support vector machine (SVM). The goal is to benefit from the advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that improves the interpretability of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework, instead of using the original training set, a set of rule-based feature vectors, generated based on the discriminative ability of class association rules over the training samples, is presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-quality source of discrimination knowledge that can substantially improve the prediction power of SVM and associative classification techniques. They also provide users with more convenience in terms of understandability and interpretability. We used four datasets from the UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as a real-world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. We then describe how this combination provides biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model. 
From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real world applications with some adjustments.
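The central idea of CARSVM, replacing raw attributes with rule-firing indicators, can be illustrated independently of the paper's implementation. The sketch below is hypothetical: the samples, the three class association rules and all attribute names are invented for illustration; in the actual framework the rules would be mined from the training data and the resulting vectors passed to an SVM.

```python
import numpy as np

# Toy training samples with a binary class label each.
samples = [
    {"age": 63, "bp": "high", "chol": 250},
    {"age": 41, "bp": "low",  "chol": 180},
    {"age": 58, "bp": "high", "chol": 300},
    {"age": 35, "bp": "low",  "chol": 190},
]
labels = np.array([1, 0, 1, 0])

# Hypothetical class association rules mined beforehand: each rule is a
# predicate over a sample's attributes (the antecedent); its consequent
# would be a class label.
rules = [
    lambda s: s["bp"] == "high",   # rule 1 -> class 1
    lambda s: s["chol"] > 240,     # rule 2 -> class 1
    lambda s: s["age"] < 45,       # rule 3 -> class 0
]

def rule_feature_vector(sample, rules):
    """Map a raw sample to a binary vector of rule firings."""
    return np.array([1.0 if r(sample) else 0.0 for r in rules])

# These vectors, not the raw attributes, would be fed to the SVM learner.
X_rules = np.vstack([rule_feature_vector(s, rules) for s in samples])
```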

  2. Surface Water Detection Using Fused Synthetic Aperture Radar, Airborne LiDAR and Optical Imagery

    NASA Astrophysics Data System (ADS)

    Braun, A.; Irwin, K.; Beaulne, D.; Fotopoulos, G.; Lougheed, S. C.

    2016-12-01

    Each remote sensing technique has its unique set of strengths and weaknesses, but by combining techniques the classification accuracy can be increased. The goal of this project is to underline the strengths and weaknesses of Synthetic Aperture Radar (SAR), LiDAR and optical imagery data and highlight the opportunities where integration of the three data types can increase the accuracy of identifying water in a principally natural landscape. The study area is located at the Queen's University Biological Station, Ontario, Canada. TerraSAR-X (TSX) data was acquired between April and July 2016, consisting of four single polarization (HH) staring spotlight mode backscatter intensity images. Grey-level thresholding is used to extract surface water bodies, before identifying and masking zones of radar shadow and layover by using LiDAR elevation models to estimate the canopy height and applying simple geometry algorithms. The airborne LiDAR survey was conducted in June 2014, resulting in a discrete return dataset with a density of 1 point/m2. Radiometric calibration to correct for range and incidence angle is applied, before classifying the points as water or land based on corrected intensity, elevation, roughness, and intensity density. Panchromatic and multispectral (4-band) imagery from Quickbird was collected in September 2005 at spatial resolutions of 0.6m and 2.5m respectively. Pixel-based classification is applied to identify and distinguish water bodies from land. A classification system which inputs SAR-, LiDAR- and optically-derived water presence models in raster formats is developed to exploit the strengths and weaknesses of each technique. The total percentage of water detected in the sample area for SAR backscatter, LiDAR intensity, and optical imagery was 27%, 19% and 18% respectively. The output matrix of the classification system indicates that in over 72% of the study area all three methods agree on the classification. 
Analysis was specifically targeted towards areas where the methods disagree, highlighting how each technique should be properly weighted over these areas to increase the classification accuracy of water. The conclusions and techniques developed in this study are applicable to other areas where similar environmental conditions and data availability exist.
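Grey-level thresholding of SAR backscatter, the first step described above, is commonly automated with Otsu's method, which picks the grey level maximizing between-class variance. The sketch below is a generic illustration on synthetic data, not the study's processing chain; the intensity distributions for water and land are invented.

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Return the grey level that maximizes between-class variance."""
    hist, edges = np.histogram(img, bins=nbins)
    p = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                     # class-0 probability per split
    w1 = 1.0 - w0
    mu0 = np.cumsum(p * centers)          # unnormalized cumulative class mean
    mu_t = mu0[-1]                        # overall mean
    with np.errstate(divide="ignore", invalid="ignore"):
        var_between = (mu_t * w0 - mu0) ** 2 / (w0 * w1)
    var_between[~np.isfinite(var_between)] = 0.0
    return centers[np.argmax(var_between)]

# Synthetic backscatter: dark, smooth water vs. brighter land clutter.
rng = np.random.default_rng(1)
water = rng.normal(30, 5, size=5000)
land = rng.normal(160, 15, size=5000)
img = np.concatenate([water, land])
t = otsu_threshold(img)
water_mask = img < t                      # low backscatter -> water
```

In the study's workflow this mask would then be cleaned by masking radar shadow and layover zones derived from the LiDAR elevation model.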

  3. Benthic Habitat Mapping by Combining Lyzenga’s Optical Model and Relative Water Depth Model in Lintea Island, Southeast Sulawesi

    NASA Astrophysics Data System (ADS)

    Hafizt, M.; Manessa, M. D. M.; Adi, N. S.; Prayudha, B.

    2017-12-01

    Benthic habitat mapping using satellite data is a challenging task for practitioners and academics, as benthic objects are covered by a light-attenuating water column that obscures object discrimination. One common method to reduce this water-column effect is to use a depth-invariant index (DII) image. However, applying the correction in shallow coastal areas is challenging, as a dark object such as seagrass can have a very low pixel value, preventing its reliable identification and classification. This limitation can be addressed by applying the classification process separately to areas with different water depth levels. The water depth level can be extracted from satellite imagery using the Relative Water Depth Index (RWDI). This study proposes a new approach to improve mapping accuracy, particularly for dark benthic objects, by combining the DII of Lyzenga's water-column correction method and the RWDI of Stumpf's method. The research was conducted on Lintea Island, which has a high variation of benthic cover, using Sentinel-2A imagery. To assess the effectiveness of the proposed approach for benthic habitat mapping, two different classification procedures were implemented. The first is the method commonly applied in benthic habitat mapping, where the DII image is used as input data for the entire coastal area in the image classification process, regardless of depth variation. The second is the proposed new approach, whose initial step separates the study area into shallow and deep waters using the RWDI image. The shallow area was then classified using the sunglint-corrected image as input data, and the deep area was classified using the DII image as input data. The final classification maps of the two areas were merged into a single benthic habitat map. A confusion matrix was then applied to evaluate the mapping accuracy of the final map. 
The result shows that the newly proposed mapping approach can be used to map all benthic objects across all depth ranges and achieves better accuracy than the classification map produced using the DII alone.
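Lyzenga's depth-invariant index for a band pair can be written as DII = ln(Li − Lsi) − (ki/kj)·ln(Lj − Lsj), where Ls is the deep-water radiance and ki/kj the ratio of the bands' attenuation coefficients. The sketch below is a self-contained illustration (all radiances, attenuation coefficients and depths are synthetic): for a single bottom type viewed through different depths the index comes out constant, which is exactly the water-column effect the correction removes.

```python
import numpy as np

def depth_invariant_index(Li, Lj, Lsi, Lsj, ki_over_kj):
    """Lyzenga's depth-invariant index for a band pair (i, j).

    Li, Lj     : observed radiance in bands i and j
    Lsi, Lsj   : deep-water radiance of each band
    ki_over_kj : ratio of the bands' effective attenuation coefficients
    """
    return np.log(Li - Lsi) - ki_over_kj * np.log(Lj - Lsj)

# Synthetic check of depth invariance: one bottom type seen through
# different water depths, following Li = Lsi + a_i * exp(-2 * k_i * z).
ki, kj = 0.08, 0.05          # attenuation coefficients (assumed)
ai, aj = 40.0, 60.0          # bottom reflectance terms (assumed)
Lsi, Lsj = 5.0, 8.0          # deep-water radiances (assumed)
z = np.array([1.0, 3.0, 7.0, 12.0])   # depths in metres
Li = Lsi + ai * np.exp(-2 * ki * z)
Lj = Lsj + aj * np.exp(-2 * kj * z)
dii = depth_invariant_index(Li, Lj, Lsi, Lsj, ki / kj)
# dii is identical at every depth: the depth dependence cancels.
```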

  4. Accelerometer and Camera-Based Strategy for Improved Human Fall Detection.

    PubMed

    Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane

    2016-12-01

    In this paper, we address the problem of detecting human falls using anomaly detection. Detection and classification of falls are based on accelerometric data and variations in human silhouette shape. First, we use the exponentially weighted moving average (EWMA) monitoring scheme to detect a potential fall in the accelerometric data. We used the EWMA to identify features that correspond with a particular type of fall, allowing us to classify falls. Only features corresponding with detected falls were used in the classification phase. Using a subset of the original data to design classification models minimizes training time and simplifies the models. Based on features corresponding to detected falls, we used the support vector machine (SVM) algorithm to distinguish between true falls and fall-like events. We apply this strategy to the publicly available fall detection databases from the University of Rzeszow. Results indicated that our strategy accurately detected and classified fall events, suggesting its potential application to early alert mechanisms in the event of fall situations and its capability for classification of detected falls. Comparison of the classification results of the EWMA-based SVM classifier with those achieved using three commonly used machine learning classifiers, neural network, K-nearest neighbor and naïve Bayes, proved our model superior.
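An EWMA monitoring scheme of the kind described can be sketched as follows. This is an illustrative reimplementation on synthetic accelerometer magnitudes, not the authors' code; the smoothing weight, the control-limit multiplier and the signal model are assumptions.

```python
import numpy as np

def ewma_alarms(x, lam=0.2, L=3.0):
    """Flag points where the EWMA statistic leaves its control limits.

    The in-control mean/std are estimated from the series itself here;
    in practice they would come from a fall-free training period.
    """
    mu, sigma = np.median(x), np.std(x)
    z = np.empty_like(x, dtype=float)
    z[0] = mu
    for t in range(1, len(x)):
        z[t] = lam * x[t] + (1 - lam) * z[t - 1]
    halfwidth = L * sigma * np.sqrt(lam / (2 - lam))
    return np.abs(z - mu) > halfwidth

# Synthetic acceleration magnitude: quiet activity with one fall-like spike.
rng = np.random.default_rng(2)
signal = rng.normal(1.0, 0.05, size=300)   # roughly 1 g at rest
signal[200:205] += 3.0                     # impact spike
alarms = ewma_alarms(signal)
```

Only the windows flagged by such a detector would then be passed to the SVM stage that separates true falls from fall-like events.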

  5. Classification of Microarray Data Using Kernel Fuzzy Inference System

    PubMed Central

    Kumar Rath, Santanu

    2014-01-01

    The DNA microarray classification technique has gained popularity in both research and practice. In real data analysis, such as microarray data, the dataset contains a huge number of insignificant and irrelevant features, which tend to obscure useful information. The selected features are generally those with high relevance and significance, and they determine the classification of samples into their respective classes. In this paper, the kernel fuzzy inference system (K-FIS) algorithm is applied to classify microarray data (leukemia), using the t-test as a feature selection method. Kernel functions are used to map the original data points into a higher-dimensional (possibly infinite-dimensional) feature space defined by a (usually nonlinear) function ϕ through a mathematical process called the kernel trick. This paper also presents a comparative study of classification using K-FIS along with the support vector machine (SVM) for different sets of features (genes). Performance parameters available in the literature, such as precision, recall, specificity, F-measure, ROC curve, and accuracy, are considered to analyze the efficiency of the classification model. The proposed K-FIS model obtains results similar to those of the SVM model, an indication that the performance of the approach rests on the kernel function. PMID:27433543
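The kernel trick mentioned above can be made concrete with the Gaussian (RBF) kernel, which both K-FIS and kernel SVMs can use. The sketch below computes a kernel matrix for a few synthetic samples; the gamma value and the data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2).

    The kernel trick: K holds inner products <phi(x_i), phi(y_j)> in an
    infinite-dimensional feature space without ever computing phi.
    """
    sq = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 4))   # e.g. 6 samples described by 4 selected genes
K = rbf_kernel(X, X)
```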

  6. Robust BMPM training based on second-order cone programming and its application in medical diagnosis.

    PubMed

    Peng, Xiang; King, Irwin

    2008-01-01

    The Biased Minimax Probability Machine (BMPM) constructs a classifier for imbalanced learning tasks. It provides a worst-case bound on the probability of misclassification of future data points, based on reliable estimates of the means and covariance matrices of the classes from the training data samples, and achieves promising performance. In this paper, we develop a novel and critical extension of the training algorithm for BMPM that is based on Second-Order Cone Programming (SOCP). Moreover, we apply the biased classification model to medical diagnosis problems to demonstrate its usefulness. By removing some crucial assumptions in the original solution to this model, we make the new method more accurate and robust. We outline the theoretical derivation of the biased classification model and reformulate it into an SOCP problem, which can be solved efficiently with a global optimality guarantee. We evaluate our proposed SOCP-based BMPM (BMPMSOCP) scheme in comparison with traditional solutions on medical diagnosis tasks, where the objective is to improve the sensitivity (the accuracy of the more important class, say "ill" samples) rather than the overall accuracy of the classification. Empirical results show that our method is more effective and robust in handling imbalanced classification problems than traditional classification approaches and the original Fractional Programming-based BMPM (BMPMFP).

  7. Cognitive Modeling of Learning Abilities: A Status Report of LAMP.

    ERIC Educational Resources Information Center

    Kyllonen, Patrick C.; Christal, Raymond E.

    Research activities underway as part of the Air Force's Learning Abilities Measurement Program (LAMP) are described. A major objective of the program is to devise new models of the nature and organization of human abilities that could be applied to improve personnel selection and classification systems. The activities of the project have been…

  8. Validation of Statistical Predictive Models Meant to Select Melanoma Patients for Sentinel Lymph Node Biopsy

    PubMed Central

    Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.

    2013-01-01

    Introduction To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid SLN biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced lower values for NPV and for the reduction in biopsy rate (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely account for the underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and, ultimately, clinical utility. PMID:21822550
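The three reported metrics are simple functions of confusion-matrix counts. The sketch below shows the arithmetic on illustrative counts that are not taken from the study.

```python
def sln_screening_metrics(tp, fp, tn, fn):
    """Metrics for a rule that predicts a negative sentinel node.

    A "negative prediction" means the model would spare the patient a
    biopsy; fn counts node-positive patients such a rule would miss.
    """
    n = tp + fp + tn + fn
    npv = tn / (tn + fn)        # negative predictive value
    rnp = (tn + fn) / n         # rate of negative predictions
    fnr = fn / (fn + tp)        # false negative rate
    return npv, rnp, fnr

# Invented counts for illustration: 500 patients, 60 of whom the rule
# calls node-negative, 3 of those actually node-positive.
npv, rnp, fnr = sln_screening_metrics(tp=97, fp=343, tn=57, fn=3)
```

The clinical tension is visible in the formulas: raising RNP (more biopsies avoided) tends to lower NPV and raise FNR, which is why a high-specificity model is needed before the biopsy rate can be reduced safely.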

  9. Cohen's Kappa and classification table metrics 2.0: An ArcView 3.x extension for accuracy assessment of spatially explicit models

    Treesearch

    Jeff Jenness; J. Judson Wynne

    2005-01-01

    In the field of spatially explicit modeling, well-developed accuracy assessment methodologies are often poorly applied. Deriving model accuracy metrics has been possible for decades, but these calculations were made by hand or with the use of a spreadsheet application. Accuracy assessments may be useful for: (1) ascertaining the quality of a model; (2) improving model...
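Cohen's kappa itself is straightforward to compute from a classification (confusion) table; the sketch below, with an invented two-class table, shows the observed-versus-chance agreement calculation that tools like this extension automate.

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix
    (rows: model predictions, columns: reference data)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                   # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    return (po - pe) / (1.0 - pe)

# Illustrative 2-class accuracy assessment of a presence/absence model.
cm = [[45, 15],
      [25, 15]]
kappa = cohens_kappa(cm)   # agreement only modestly better than chance
```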

  10. A joint latent class model for classifying severely hemorrhaging trauma patients.

    PubMed

    Rahbar, Mohammad H; Ning, Jing; Choi, Sangbum; Piao, Jin; Hong, Chuan; Huang, Hanwen; Del Junco, Deborah J; Fox, Erin E; Rahbar, Elaheh; Holcomb, John B

    2015-10-24

    In trauma research, "massive transfusion" (MT), historically defined as receiving ≥10 units of red blood cells (RBCs) within 24 h of admission, has been routinely used as a "gold standard" for quantifying bleeding severity. Due to early in-hospital mortality, however, MT is subject to survivor bias and thus a poorly defined criterion to classify bleeding trauma patients. Using the data from a retrospective trauma transfusion study, we applied a latent-class (LC) mixture model to identify severely hemorrhaging (SH) patients. Based on the joint distribution of cumulative units of RBCs and binary survival outcome at 24 h of admission, we applied an expectation-maximization (EM) algorithm to obtain model parameters. Estimated posterior probabilities were used for patients' classification and compared with the MT rule. To evaluate predictive performance of the LC-based classification, we examined the role of six clinical variables as predictors using two separate logistic regression models. Out of 471 trauma patients, 211 (45 %) were MT, while our latent SH classifier identified only 127 (27 %) of patients as SH. The agreement between the two classification methods was 73 %. A non-ignorable portion of patients (17 out of 68, 25 %) who died within 24 h were not classified as MT but the SH group included 62 patients (91 %) who died during the same period. Our comparison of the predictive models based on MT and SH revealed significant differences between the coefficients of potential predictors of patients who may be in need of activation of the massive transfusion protocol. The traditional MT classification does not adequately reflect transfusion practices and outcomes during the trauma reception and initial resuscitation phase. 
Although we have demonstrated that joint latent class modeling could be used to correct for potential bias caused by misclassification of severely bleeding patients, improvement in this approach could be made in the presence of time to event data from prospective studies.
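A joint latent class analysis of this kind rests on the EM algorithm for mixture models. As a simplified, univariate stand-in for the paper's joint model (which couples cumulative RBC units with the survival outcome), the sketch below fits a two-component Gaussian mixture by EM on synthetic transfusion counts and classifies patients by posterior probability; all distributions and sample sizes are invented.

```python
import numpy as np

def em_two_gaussians(x, n_iter=60):
    """EM for a two-component 1-D Gaussian mixture; returns parameters and
    posterior membership probabilities (the basis for classification)."""
    mu = np.array([x.min(), x.max()])      # crude but stable initialization
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each component per observation.
        dens = (pi / (sigma * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma, resp

# Synthetic "cumulative RBC units": a low-use group and a high-use group.
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(3, 1, 300), rng.normal(14, 2, 100)])
pi, mu, sigma, resp = em_two_gaussians(x)
severe = resp[:, np.argmax(mu)] > 0.5      # classify by posterior probability
```

The paper's model additionally makes the component membership inform a binary survival outcome, which is what lets the latent classification correct for the survivor bias of a fixed 10-unit cutoff.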

  11. ISBDD Model for Classification of Hyperspectral Remote Sensing Imagery

    PubMed Central

    Li, Na; Xu, Zhaopeng; Zhao, Huijie; Huang, Xinchen; Drummond, Jane; Wang, Daming

    2018-01-01

    The diverse density (DD) algorithm was proposed to handle the problem of low classification accuracy when training samples contain interference such as mixed pixels. The DD algorithm can learn a feature vector from training bags, which comprise instances (pixels). However, the feature vector learned by the DD algorithm cannot always effectively represent one type of ground cover. To handle this problem, an instance space-based diverse density (ISBDD) model that employs a novel training strategy is proposed in this paper. In the ISBDD model, the DD values of each pixel are computed instead of learning a feature vector, and as a result, the pixel can be classified according to its DD values. Airborne hyperspectral data collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor and the Push-broom Hyperspectral Imager (PHI) are applied to evaluate the performance of the proposed model. Results show that the overall classification accuracy of the ISBDD model on the AVIRIS and PHI images is up to 97.65% and 89.02%, respectively, while the kappa coefficient is up to 0.97 and 0.88, respectively. PMID:29510547

  12. Hyperspectral Imaging Analysis for the Classification of Soil Types and the Determination of Soil Total Nitrogen

    PubMed Central

    Jia, Shengyao; Li, Hongyang; Wang, Yanjie; Tong, Renyuan; Li, Qing

    2017-01-01

    Soil is an important environment for crop growth. Quick and accurate access to soil nutrient content information is a prerequisite for scientific fertilization. In this work, hyperspectral imaging (HSI) technology was applied to the classification of soil types and the measurement of soil total nitrogen (TN) content. A total of 183 soil samples, collected from Shangyu City (People’s Republic of China), were scanned by a near-infrared hyperspectral imaging system with a wavelength range of 874–1734 nm. The soil samples belonged to three major soil types typical of this area: paddy soil, red soil and seashore saline soil. The successive projections algorithm (SPA) was utilized to select effective wavelengths from the full spectrum. Pattern texture features (energy, contrast, homogeneity and entropy) were extracted from the gray-scale images at the effective wavelengths. The support vector machine (SVM) and partial least squares regression (PLSR) methods were used to establish classification and prediction models, respectively. The results showed that by using the combined data sets of effective wavelengths and texture features for modelling, an optimal correct classification rate of 91.8% could be achieved. The soil samples were first classified, and then local models were established for soil TN according to soil type, which achieved better prediction results than the general models. The overall results indicated that hyperspectral imaging technology can be used for soil type classification and soil TN determination, and that data fusion combining spectral and image texture information shows advantages for the classification of soil types. PMID:28974005
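The four texture features named above (energy, contrast, homogeneity and entropy) are classically computed from a grey-level co-occurrence matrix (GLCM). The sketch below is a generic single-offset GLCM implementation on synthetic images, not the study's pipeline; the quantization level and the horizontal offset are assumptions.

```python
import numpy as np

def glcm_features(img, levels=8):
    """Energy, contrast, homogeneity and entropy from a grey-level
    co-occurrence matrix (horizontal neighbours at distance 1)."""
    q = np.floor(img.astype(float) / img.max() * (levels - 1)).astype(int)
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1
    p = glcm / glcm.sum()
    ii, jj = np.indices((levels, levels))
    energy = np.sum(p ** 2)
    contrast = np.sum(p * (ii - jj) ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(ii - jj)))
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return energy, contrast, homogeneity, entropy

rng = np.random.default_rng(6)
smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))  # gentle horizontal ramp
rough = rng.random((32, 32))                          # speckled texture
f_smooth = glcm_features(smooth)
f_rough = glcm_features(rough)
```

As expected, the smooth image has low contrast and high homogeneity, while the speckled image shows the opposite pattern.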

  13. Multi-level discriminative dictionary learning with application to large scale image classification.

    PubMed

    Shen, Li; Sun, Gang; Huang, Qingming; Wang, Shuhui; Lin, Zhouchen; Wu, Enhua

    2015-10-01

    The sparse coding technique has shown flexibility and capability in image representation and analysis. It is a powerful tool in many visual applications. Some recent work has shown that incorporating the properties of the task (such as discrimination for a classification task) into dictionary learning is effective for improving the accuracy. However, traditional supervised dictionary learning methods suffer from high computation complexity when dealing with a large number of categories, making them less satisfactory in large scale applications. In this paper, we propose a novel multi-level discriminative dictionary learning method and apply it to large scale image classification. Our method takes advantage of hierarchical category correlation to encode multi-level discriminative information. Each internal node of the category hierarchy is associated with a discriminative dictionary and a classification model. The dictionaries at different layers are learnt to capture the information of different scales. Moreover, each node at lower layers also inherits the dictionary of its parent, so that the categories at lower layers can be described with multi-scale information. The learning of dictionaries and associated classification models is jointly conducted by minimizing an overall tree loss. The experimental results on challenging data sets demonstrate that our approach achieves excellent accuracy and competitive computation cost compared with other sparse coding methods for large scale image classification.

  14. Classification of diesel pool refinery streams through near infrared spectroscopy and support vector machines using C-SVC and ν-SVC.

    PubMed

    Alves, Julio Cesar L; Henriques, Claudete B; Poppi, Ronei J

    2014-01-03

    Near infrared (NIR) spectroscopy combined with chemometric methods has been widely used in the petroleum and petrochemical industry and provides suitable methods for process control and quality control. The support vector machines (SVM) algorithm has been demonstrated to be a powerful chemometric tool for the development of classification models due to its nonlinear modeling ability and high generalization capability; these characteristics can be especially important for treating NIR spectroscopy data of complex mixtures such as petroleum refinery streams. In this work, a study of the performance of the support vector machines algorithm for classification was carried out using C-SVC and ν-SVC, applied to NIR spectroscopy data of the different types of streams that make up the diesel pool in a petroleum refinery: light gas oil, heavy gas oil, hydrotreated diesel, kerosene, heavy naphtha and external diesel. In addition to these six streams, the final diesel blend produced in the refinery was added to complete the data set. C-SVC and ν-SVC classification models with 2, 4, 6 and 7 classes were developed for comparison among their results, and also for comparison with the results of soft independent modeling of class analogy (SIMCA) models. The results demonstrate the superior performance of the SVC models, especially ν-SVC, for the development of classification models with 6 and 7 classes, leading to an improvement in sensitivity on validation sample sets of 24% and 15%, respectively, when compared to SIMCA models, and providing better identification of the chemical compositions of different diesel pool refinery streams. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Can SLE classification rules be effectively applied to diagnose unclear SLE cases?

    PubMed Central

    Mesa, Annia; Fernandez, Mitch; Wu, Wensong; Narasimhan, Giri; Greidinger, Eric L.; Mills, DeEtta K.

    2016-01-01

    Summary Objective To develop novel classification criteria to distinguish between unclear SLE and MCTD cases. Methods A total of 205 variables from 111 SLE and 55 MCTD patients were evaluated to uncover unique molecular and clinical markers for each disease. Binomial logistic regressions (BLR) were performed on currently used SLE and MCTD classification criteria sets to obtain six reduced models with power to discriminate between unclear SLE and MCTD patients, which were confirmed by Receiver Operating Characteristic (ROC) curves. Decision trees were employed to delineate novel classification rules to discriminate between unclear SLE and MCTD patients. Results SLE and MCTD patients exhibited contrasting molecular markers and clinical manifestations. Furthermore, the reduced models highlighted that SLE patients exhibit a prevalence of skin rashes and renal disease while MCTD cases show a dominance of myositis and muscle weakness. Additionally, decision tree analyses revealed a novel classification rule tailored to differentiate unclear SLE and MCTD patients (Lu-vs-M) with an overall accuracy of 88%. Conclusions Validation of our novel proposed classification rule (Lu-vs-M) includes novel contrasting characteristics (calcinosis, elevated CPK and anti-IgM reactivity for U1-70K, U1A and U1C) between SLE and MCTD patients and showed a 33% improvement in distinguishing these disorders when compared to currently used classification criteria sets. Pending additional validation, our novel classification rule is a promising method to distinguish between patients with unclear SLE and MCTD diagnoses. PMID:27353506

  16. Refining Time-Activity Classification of Human Subjects Using the Global Positioning System.

    PubMed

    Hu, Maogui; Li, Wei; Li, Lianfa; Houston, Douglas; Wu, Jun

    2016-01-01

    Detailed spatial location information is important for accurately estimating personal exposure to air pollution. The Global Positioning System (GPS) has been widely used in tracking personal paths and activities. Previous researchers have developed time-activity classification models based on GPS data, but most of them were developed for specific regions. An adaptive model for time-location classification can be widely applied to air pollution studies that use GPS to track individual-level time-activity patterns. Time-activity data were collected for seven days using GPS loggers and accelerometers from thirteen adult participants from Southern California under free-living conditions. We developed an automated model based on random forests to classify major time-activity patterns (i.e. indoor, outdoor-static, outdoor-walking, and in-vehicle travel). Sensitivity analysis was conducted to examine the contribution of the accelerometer data and the supplemental spatial data (i.e. roadway and tax parcel data) to the accuracy of time-activity classification. Our model was evaluated using both leave-one-fold-out and leave-one-subject-out methods. Maximum speeds in averaging time intervals of 7 and 5 minutes, and distance to primary highways with limited access, were found to be the three most important variables in the classification model. Leave-one-fold-out cross-validation showed an overall accuracy of 99.71%. Sensitivities varied from 84.62% (outdoor walking) to 99.90% (indoor). Specificities varied from 96.33% (indoor) to 99.98% (outdoor static). The exclusion of accelerometer and ambient light sensor variables caused a slight loss in sensitivity for outdoor walking, but little loss in overall accuracy. However, leave-one-subject-out cross-validation showed a considerable loss in sensitivity for the outdoor-static and outdoor-walking conditions. The random forests classification model can achieve high accuracy for the four major time-activity categories. 
The model also performed well with just GPS, road and tax parcel data. However, caution is warranted when generalizing the model developed from a small number of subjects to other populations.

  17. Wavelet-based multicomponent denoising on GPU to improve the classification of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Quesada-Barriuso, Pablo; Heras, Dora B.; Argüello, Francisco; Mouriño, J. C.

    2017-10-01

    Supervised classification handles a wide range of remote sensing hyperspectral applications. Enhancing the spatial organization of the pixels in the image has proven to be beneficial for the interpretation of the image content, thus increasing the classification accuracy. Denoising in the spatial domain of the image has been shown to be a technique that enhances the structures in the image. This paper proposes a multi-component denoising approach to increase the classification accuracy when a classification method is applied. It is computed on multicore CPUs and NVIDIA GPUs. The method combines feature extraction based on a 1D discrete wavelet transform (DWT) applied in the spectral dimension, followed by an Extended Morphological Profile (EMP) and a classifier (SVM or ELM). The multi-component noise reduction is applied to the EMP just before the classification. The denoising recursively applies a separable 2D DWT, after which the number of wavelet coefficients is reduced by using a threshold. Finally, inverse 2D DWT filters are applied to reconstruct the noise-free original component. The computational cost of the classifiers, as well as that of the whole classification chain, is high, but it is reduced, achieving real-time behavior for some applications, through computation on NVIDIA multi-GPU platforms.
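The coefficient-thresholding step at the heart of such wavelet denoising can be illustrated with a one-level 1-D Haar transform and soft thresholding. The paper uses recursive separable 2-D DWTs on GPU; this sketch only shows the principle, and the signal, noise level and threshold are synthetic assumptions.

```python
import numpy as np

def haar_dwt(x):
    """One-level 1-D Haar transform (orthonormal); x must have even length."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt: perfect reconstruction."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, thresh):
    """Wavelet shrinkage: soft-threshold the detail coefficients, reconstruct."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
    return haar_idwt(a, d)

# A smooth "spectral profile" corrupted with noise.
rng = np.random.default_rng(5)
clean = np.sin(np.linspace(0, np.pi, 128))
noisy = clean + rng.normal(0, 0.1, 128)
rec = denoise(noisy, thresh=0.15)
```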

  18. SeaQuaKE: Sea-Optimized Quantum Key Exchange

    DTIC Science & Technology

    2014-08-01

    This report describes the Sea-Optimized Quantum Key Exchange (SeaQuaKE) project, which is led by Applied Communications Sciences under the ONR Free Space Optical Quantum Key Distribution Special Notice (13-SN-0004 under ONRBAA13...). The reported work includes aerosol model scenarios. Subject terms: quantum communications, free-space optical communications.

  19. Combination of support vector machine, artificial neural network and random forest for improving the classification of convective and stratiform rain using spectral features of SEVIRI data

    NASA Astrophysics Data System (ADS)

    Lazri, Mourad; Ameur, Soltane

    2018-05-01

    A model combining three classifiers, namely support vector machine, artificial neural network and random forest (SAR), is designed for improving the classification of convective and stratiform rain. This SAR model has been trained and then tested on a dataset derived from MSG-SEVIRI (Meteosat Second Generation-Spinning Enhanced Visible and Infrared Imager). Well-classified, mid-classified and misclassified pixels are determined from the combination of the three classifiers. Mid-classified and misclassified pixels, which are considered unreliable, are reclassified using a novel training pass of the developed scheme in which only the input data corresponding to the pixels in question are used. This whole process is repeated a second time and applied to mid-classified and misclassified pixels separately. Learning and validation of the developed scheme are performed against co-located data observed by ground radar. The developed scheme outperformed the individual classifiers used separately and reached an overall classification accuracy of 97.40%.
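The agreement-based bookkeeping of well-, mid- and misclassified pixels can be sketched as a majority vote over three label maps. The code below is an illustrative reconstruction, not the authors' scheme; the toy labels and the fallback rule for full disagreement are assumptions.

```python
import numpy as np

def combine_votes(pred_a, pred_b, pred_c):
    """Combine three classifiers' labels per pixel.

    Returns the majority label plus a reliability tag per pixel:
    2 = well-classified (all three agree), 1 = mid-classified (two agree),
    0 = misclassified/unreliable (all three disagree).
    """
    preds = np.stack([pred_a, pred_b, pred_c])
    n_agree = np.max(
        [np.sum(preds == preds[i], axis=0) for i in range(3)], axis=0
    )
    # Majority vote: the label proposed by at least two classifiers;
    # when all three disagree we fall back to classifier A's label.
    majority = np.where(
        np.sum(preds == preds[0], axis=0) >= 2, preds[0],
        np.where(np.sum(preds == preds[1], axis=0) >= 2, preds[1], preds[0]),
    )
    reliability = np.where(n_agree == 3, 2, np.where(n_agree == 2, 1, 0))
    return majority, reliability

# Toy labels for four pixels: 0 = stratiform, 1 = convective, 2 = no rain.
a = np.array([1, 1, 0, 2])
b = np.array([1, 0, 0, 1])
c = np.array([1, 1, 2, 0])
maj, rel = combine_votes(a, b, c)
```

In the paper's scheme, pixels tagged 1 or 0 here would be routed into the dedicated retraining pass rather than simply accepted.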

  20. Probabilistic neural networks for diagnosis of Alzheimer's disease using conventional and wavelet coherence.

    PubMed

    Sankari, Ziad; Adeli, Hojjat

    2011-04-15

    Recently, the authors presented an EEG (electroencephalogram) coherence study of Alzheimer's disease (AD) and found statistically significant differences between AD and control groups. In this paper a probabilistic neural network (PNN) model is presented for classification of AD patients and healthy controls using features extracted in coherence and wavelet coherence studies of cortical connectivity in AD. The model is verified using EEGs obtained from 20 probable AD patients and 7 healthy control subjects based on a standard 10-20 electrode configuration on the scalp. It is shown that extracting features from EEG sub-bands using coherence, as a measure of cortical connectivity, can discriminate AD patients from healthy controls effectively when a mixed-band classification model is applied. For the data set used, a classification accuracy of 100% is achieved using conventional coherence and a spread parameter of the Gaussian function in a particular range found in this research. Copyright © 2011 Elsevier B.V. All rights reserved.

  1. A tool for enhancing strategic health planning: a modeled use of the International Classification of Functioning, Disability and Health

    PubMed Central

    Sinclair, Lisa Bundara; Fox, Michael H.; Betts, Donald R.

    2015-01-01

    SUMMARY This article describes use of the International Classification of Functioning, Disability and Health (ICF) as a tool for strategic planning. The ICF is the international classification system for factors that influence health, including Body Structures, Body Functions, Activities and Participation and Environmental Factors. An overview of strategic planning and the ICF is provided. Selected ICF concepts and nomenclature are used to demonstrate its utility in helping develop a classic planning framework, objectives, measures and actions. Some issues and resolutions for applying the ICF are described. Applying the ICF for strategic health planning is an innovative approach that fosters the inclusion of social ecological health determinants and broad populations. If employed from the onset of planning, the ICF can help public health organizations systematically conceptualize, organize and communicate a strategic health plan. This article is a US Government work and is in the public domain in the USA. PMID:23147247

  2. Application of linear discriminant analysis and Attenuated Total Reflectance Fourier Transform Infrared microspectroscopy for diagnosis of colon cancer.

    PubMed

    Khanmohammadi, Mohammadreza; Bagheri Garmarudi, Amir; Samani, Simin; Ghasemi, Keyvan; Ashuri, Ahmad

    2011-06-01

    Attenuated Total Reflectance Fourier Transform Infrared (ATR-FTIR) microspectroscopy was applied for detection of colon cancer according to the spectral features of colon tissues. Supervised classification models can be trained to identify the tissue type based on the spectroscopic fingerprint. A total of 78 colon tissues were used in the spectroscopy studies. Major spectral differences were observed in the 1,740-900 cm(-1) spectral region. Several chemometric methods such as analysis of variance (ANOVA), cluster analysis (CA) and linear discriminant analysis (LDA) were applied for classification of the IR spectra. Utilizing the chemometric techniques, clear and reproducible differences were observed between the spectra of normal and cancer cases, suggesting that infrared microspectroscopy in conjunction with spectral data processing would be useful for diagnostic classification. Using the LDA technique, the spectra were classified into cancer and normal tissue classes with an accuracy of 95.8%. The sensitivity and specificity were 100% and 93.1%, respectively.
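
    The two-class discriminant step can be sketched with a classic Fisher LDA (a numpy sketch of the general technique, not the chemometric software used in the study; the class labels are illustrative):

```python
import numpy as np

def fisher_lda(X0, X1):
    """Two-class Fisher discriminant: w = Sw^-1 (m1 - m0), midpoint threshold."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix (sum of class scatters).
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    # Small ridge term keeps Sw invertible for few samples / many features.
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), m1 - m0)
    c = w @ (m0 + m1) / 2.0
    return w, c

def predict(w, c, X):
    """1 for the class of X1 (e.g. 'cancer'), 0 for the class of X0."""
    return (X @ w > c).astype(int)
```
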

  3. A tool for enhancing strategic health planning: a modeled use of the International Classification of Functioning, Disability and Health.

    PubMed

    Sinclair, Lisa Bundara; Fox, Michael H; Betts, Donald R

    2013-01-01

    This article describes use of the International Classification of Functioning, Disability and Health (ICF) as a tool for strategic planning. The ICF is the international classification system for factors that influence health, including Body Structures, Body Functions, Activities and Participation and Environmental Factors. An overview of strategic planning and the ICF is provided. Selected ICF concepts and nomenclature are used to demonstrate its utility in helping develop a classic planning framework, objectives, measures and actions. Some issues and resolutions for applying the ICF are described. Applying the ICF for strategic health planning is an innovative approach that fosters the inclusion of social ecological health determinants and broad populations. If employed from the onset of planning, the ICF can help public health organizations systematically conceptualize, organize and communicate a strategic health plan. Published 2012. This article is a US Government work and is in the public domain in the USA.

  4. An Iterative Inference Procedure Applying Conditional Random Fields for Simultaneous Classification of Land Cover and Land Use

    NASA Astrophysics Data System (ADS)

    Albert, L.; Rottensteiner, F.; Heipke, C.

    2015-08-01

    Land cover and land use exhibit strong contextual dependencies. We propose a novel approach for the simultaneous classification of land cover and land use, where semantic and spatial context is considered. The image sites for land cover and land use classification form a hierarchy consisting of two layers: a land cover layer and a land use layer. We apply Conditional Random Fields (CRF) at both layers. The layers differ with respect to the image entities corresponding to the nodes, the employed features and the classes to be distinguished. In the land cover layer, the nodes represent super-pixels; in the land use layer, the nodes correspond to objects from a geospatial database. Both CRFs model spatial dependencies between neighbouring image sites. The complex semantic relations between land cover and land use are integrated in the classification process by using contextual features. We propose a new iterative inference procedure for the simultaneous classification of land cover and land use, in which the two classification tasks mutually influence each other. This helps to improve the classification accuracy for certain classes. The main idea of this approach is that semantic context helps to refine the class predictions, which, in turn, leads to more expressive context information. Thus, potentially wrong decisions can be reversed at later stages. The approach is designed for input data based on aerial images. Experiments are carried out on a test site to evaluate the performance of the proposed method. We show the effectiveness of the iterative inference procedure and demonstrate that a smaller size of the super-pixels has a positive influence on the classification result.

  5. 3D texture analysis for classification of second harmonic generation images of human ovarian cancer

    NASA Astrophysics Data System (ADS)

    Wen, Bruce; Campbell, Kirby R.; Tilbury, Karissa; Nadiarnykh, Oleg; Brewer, Molly A.; Patankar, Manish; Singh, Vikas; Eliceiri, Kevin. W.; Campagnola, Paul J.

    2016-10-01

    Remodeling of the collagen architecture in the extracellular matrix (ECM) has been implicated in ovarian cancer. To quantify these alterations we implemented a form of 3D texture analysis to delineate the fibrillar morphology observed in 3D Second Harmonic Generation (SHG) microscopy image data of normal (1) and high risk (2) ovarian stroma, benign ovarian tumors (3), low grade (4) and high grade (5) serous tumors, and endometrioid tumors (6). We developed a tailored set of 3D filters which extract textural features from the 3D image sets to build (or learn) statistical models of each tissue class. By applying k-nearest neighbor classification using these learned models, we achieved 83-91% accuracy for the six classes. The 3D method outperformed the analogous 2D classification on the same tissues, which we suggest is due to the increased information content. This classification based on ECM structural changes will complement conventional classification based on genetic profiles and can serve as an additional biomarker. Moreover, the texture analysis algorithm is quite general, as it does not rely on single morphological metrics such as fiber alignment, length, and width, but on their combined convolution with a customizable basis set.
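
    The classification stage, once texture features have been extracted, is plain k-nearest-neighbour voting. A minimal numpy sketch (the 3D filter bank itself is not reproduced; feature vectors and labels here are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """k-nearest-neighbour labels by Euclidean distance, majority vote."""
    # Pairwise distances: (n_test, n_train).
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]   # indices of the k closest training points
    votes = y_train[nn]                 # their labels
    return np.array([np.bincount(v).argmax() for v in votes])
```
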

  6. Improving galaxy morphologies for SDSS with Deep Learning

    NASA Astrophysics Data System (ADS)

    Domínguez Sánchez, H.; Huertas-Company, M.; Bernardi, M.; Tuccillo, D.; Fischer, J. L.

    2018-05-01

    We present a morphological catalogue for ˜670 000 galaxies in the Sloan Digital Sky Survey in two flavours: T-type, related to the Hubble sequence, and the Galaxy Zoo 2 (GZ2 hereafter) classification scheme. By combining accurate existing visual classification catalogues with machine learning, we provide the largest and most accurate morphological catalogue to date. The classifications are obtained with Deep Learning algorithms using Convolutional Neural Networks (CNNs). We use two visual classification catalogues, GZ2 and Nair & Abraham (2010), for training CNNs with colour images in order to obtain T-types and a series of GZ2-type questions (disc/features, edge-on galaxies, bar signature, bulge prominence, roundness, and mergers). We also provide an additional probability enabling the separation of pure elliptical (E) galaxies from S0s, where the T-type model is not so efficient. For the T-type, our results show smaller offset and scatter than previous models trained with support vector machines. For the GZ2-type questions, our models reach high accuracy (>97 per cent), precision and recall values (>90 per cent) when applied to a test sample with the same characteristics as the one used for training. The catalogue is publicly released with the paper.

  7. Sparse Multivariate Autoregressive Modeling for Mild Cognitive Impairment Classification

    PubMed Central

    Li, Yang; Wee, Chong-Yaw; Jie, Biao; Peng, Ziwen

    2014-01-01

    Brain connectivity networks derived from functional magnetic resonance imaging (fMRI) are becoming increasingly prevalent in research related to cognitive and perceptual processes. The capability to detect causal or effective connectivity is highly desirable for understanding the cooperative nature of the brain network, particularly when the ultimate goal is to obtain good control-patient classification performance with biologically meaningful interpretations. Understanding directed functional interactions between brain regions via a brain connectivity network is a challenging task. Since many genetic and biomedical networks are intrinsically sparse, incorporating the sparsity property into connectivity modeling can make the derived models more biologically plausible. Accordingly, we propose an effective connectivity modeling of resting-state fMRI data based on the multivariate autoregressive (MAR) modeling technique, which is widely used to characterize temporal information of dynamic systems. This MAR modeling technique allows for the identification of effective connectivity using the Granger causality concept and reduces spurious causal connectivity in the assessment of directed functional interactions from fMRI data. A forward orthogonal least squares (OLS) regression algorithm is further used to construct a sparse MAR model. By applying the proposed modeling to mild cognitive impairment (MCI) classification, we identify several most discriminative regions, including the middle cingulate gyrus, posterior cingulate gyrus, lingual gyrus and caudate, in line with results reported in previous findings. A relatively high classification accuracy of 91.89% is also achieved, an improvement of 5.4% over the fully-connected, non-directional Pearson-correlation-based functional connectivity approach. PMID:24595922
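
    The sparsity-inducing step is a greedy forward regression: regressors (here, candidate lagged signals) are added one at a time, each chosen to maximally reduce the residual. A simplified numpy sketch of forward selection (the paper's orthogonalized OLS variant and the MAR lag structure are not reproduced; the data below are synthetic):

```python
import numpy as np

def forward_ols(X, y, n_terms):
    """Greedy forward selection: repeatedly add the column of X that most
    reduces the residual sum of squares of an ordinary least-squares fit."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_terms):
        best, best_rss = None, np.inf
        for j in remaining:
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best, best_rss = j, rss
        selected.append(best)
        remaining.remove(best)
    beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    return selected, beta
```
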

  8. Support vector machine based classification of fast Fourier transform spectroscopy of proteins

    NASA Astrophysics Data System (ADS)

    Lazarevic, Aleksandar; Pokrajac, Dragoljub; Marcano, Aristides; Melikechi, Noureddine

    2009-02-01

    Fast Fourier transform spectroscopy has proved to be a powerful method for studying the secondary structure of proteins, since peak positions and their relative amplitudes are affected by the number of hydrogen bridges that sustain this secondary structure. However, to the best of our knowledge, the method has not yet been used for identification of proteins within a complex matrix like a blood sample. The principal reason is the apparent similarity of protein infrared spectra, with actual differences usually masked by the solvent contribution and other interactions. In this paper, we propose a novel machine learning based method that uses protein spectra for classification and identification of such proteins within a given sample. The proposed method uses principal component analysis (PCA) to identify the most important linear combinations of original spectral components and then employs a support vector machine (SVM) classification model applied to such identified combinations to categorize proteins into one of the given groups. Our experiments have been performed on a set of four different proteins, namely: Bovine Serum Albumin, Leptin, Insulin-like Growth Factor 2 and Osteopontin. Our proposed method of applying principal component analysis along with support vector machines exhibits excellent classification accuracy when identifying proteins using their infrared spectra.
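
    The PCA front end can be sketched with an SVD of the mean-centred spectra; the resulting scores are what would be fed to the SVM (a numpy sketch of the dimensionality-reduction step only; the SVM itself and all names are illustrative):

```python
import numpy as np

def pca_fit(X, n_components):
    """PCA via SVD of the mean-centred data; returns (mean, components)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]        # rows of Vt are principal directions

def pca_transform(X, mu, comps):
    """Project spectra onto the retained principal components."""
    return (X - mu) @ comps.T
```

    In the pipeline the abstract describes, `pca_transform` would replace each high-dimensional spectrum with a few scores, and the SVM would be trained on those scores.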

  9. Modeling and Simulation of Ceramic Arrays to Improve Ballistic Performance

    DTIC Science & Technology

    2014-01-17

    30cal AP M2 Projectile, 762x39 PS Projectile, SPH , Aluminum 5083, SiC, DoP Expeminets, AutoDyn Simulations, Tile Gap 16. SECURITY CLASSIFICATION...particle hydrodynamics ( SPH ) is applied for all parts. The SPH particle size is .4 mm, with the assumption that modeling dust smaller than .4 mm can be

  10. Statistical Signal Models and Algorithms for Image Analysis

    DTIC Science & Technology

    1984-10-25

    In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction

  11. Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores.

    PubMed

    Rios, Anthony; Kavuluru, Ramakanth

    2017-11-01

    The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance specific predictions. Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance specific prediction interpretation by identifying words that led to a particular decision. 
In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
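
    The shared task's normalized score can be computed directly from its definition, 100·(1-MMAE). A small numpy sketch (mapping absent/mild/moderate/severe to the integers 0-3 is an assumption for illustration):

```python
import numpy as np

def mmae(y_true, y_pred, labels=(0, 1, 2, 3)):
    """Macro mean absolute error over ordinal classes: the MAE is computed
    within each true class and then averaged, so rare classes count equally."""
    maes = [np.mean(np.abs(y_pred[y_true == c] - c)) for c in labels]
    return float(np.mean(maes))

def normalized_score(y_true, y_pred, labels=(0, 1, 2, 3)):
    """The task metric 100*(1 - MMAE); higher is better."""
    return 100.0 * (1.0 - mmae(y_true, y_pred, labels))
```

    This is the metric under which mistakes two severity steps away are penalized twice as heavily as adjacent-class mistakes, which is what motivates the ordinal loss.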

  12. Deep Recurrent Neural Networks for Supernovae Classification

    NASA Astrophysics Data System (ADS)

    Charnock, Tom; Moss, Adam

    2017-03-01

    We apply deep recurrent neural networks, which are capable of learning complex sequential information, to classify supernovae (code available at https://github.com/adammoss/supernovae). The observational time and filter fluxes are used as inputs to the network, but since the inputs are agnostic, additional data such as host galaxy information can also be included. Using the Supernovae Photometric Classification Challenge (SPCC) data, we find that deep networks are capable of learning about light curves, however the performance of the network is highly sensitive to the amount of training data. For a training size of 50% of the representational SPCC data set (around 10⁴ supernovae) we obtain a type-Ia versus non-type-Ia classification accuracy of 94.7%, an area under the Receiver Operating Characteristic curve AUC of 0.986 and an SPCC figure-of-merit F1 = 0.64. When using only the data for the early-epoch challenge defined by the SPCC, we achieve a classification accuracy of 93.1%, AUC of 0.977, and F1 = 0.58, results almost as good as with the whole light curve. By employing bidirectional neural networks, we can acquire impressive classification results between supernovae types I, II and III at an accuracy of 90.4% and AUC of 0.974. We also apply a pre-trained model to obtain classification probabilities as a function of time and show that it can give early indications of supernovae type. Our method is competitive with existing algorithms and has applications for future large-scale photometric surveys.

  13. SU-G-BRC-13: Model Based Classification for Optimal Position Selection for Left-Sided Breast Radiotherapy: Free Breathing, DIBH, Or Prone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, H; Liu, T; Xu, X

    Purpose: There are clinical decision challenges in selecting the optimal treatment position for left-sided breast cancer patients—supine free breathing (FB), supine Deep Inspiration Breath Hold (DIBH) and prone free breathing (prone). Physicians often make the decision based on experience and trials, which might not always result in optimal OAR doses. We herein propose a mathematical model to predict the lowest OAR doses among these three positions, providing a quantitative tool for the corresponding clinical decision. Methods: Patients were scanned in FB, DIBH, and prone positions under an IRB-approved protocol. Tangential beam plans were generated for each position, and OAR doses were calculated. The position with the least OAR doses is defined as the optimal position. The following features were extracted from each scan to build the model: heart, ipsilateral lung and breast volume; in-field heart and ipsilateral lung volume; distance between heart and target; laterality of heart; and dose to heart and ipsilateral lung. Principal Components Analysis (PCA) was applied to remove the co-linearity of the input data and to lower the data dimensionality. Feature selection, another method to reduce dimensionality, was applied as a comparison. A Support Vector Machine (SVM) was then used for classification. Thirty-seven patients' data were acquired; up to now, five patient plans were available. K-fold cross validation was used to validate the accuracy of the classifier model with the small training size. Results: The classification results and K-fold cross validation demonstrated the model is capable of predicting the optimal position for patients. The accuracy of K-fold cross validation reached 80%. Compared to PCA, feature selection allows causal features of dose to be determined, which provides more clinical insight. Conclusion: The proposed classification system appeared to be feasible.
    We are generating plans for the rest of the 37 patient images, and more statistically significant results are to be presented.
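
    K-fold cross validation, as used above to validate a classifier on a small sample, amounts to rotating which fold is held out for testing. A minimal numpy sketch of the index bookkeeping:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and yield (train_idx, test_idx) for each of k folds."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

    Each sample appears in exactly one test fold, so the averaged fold accuracies estimate generalization even when, as here, only a handful of labelled plans exist.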

  14. Skin lesion computational diagnosis of dermoscopic images: Ensemble models based on input feature manipulation.

    PubMed

    Oliveira, Roberta B; Pereira, Aledir S; Tavares, João Manuel R S

    2017-10-01

    The number of deaths worldwide due to melanoma has risen in recent times, in part because melanoma is the most aggressive type of skin cancer. Computational systems have been developed to assist dermatologists in early diagnosis of skin cancer, or even to monitor skin lesions. However, there still remains a challenge to improve classifiers for the diagnosis of such skin lesions. The main objective of this article is to evaluate different ensemble classification models based on input feature manipulation to diagnose skin lesions. Input feature manipulation processes are based on feature subset selections from shape properties, colour variation and texture analysis to generate diversity for the ensemble models. Three subset selection models are presented here: (1) a subset selection model based on specific feature groups, (2) a correlation-based subset selection model, and (3) a subset selection model based on feature selection algorithms. Each ensemble classification model is generated using an optimum-path forest classifier and integrated with a majority voting strategy. The proposed models were applied on a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by the first ensemble classification model that generates a feature subset ensemble based on specific feature groups. The skin lesion diagnosis computational system achieved 94.3% accuracy, 91.8% sensitivity and 96.7% specificity. The input feature manipulation process based on specific feature subsets generated the greatest diversity for the ensemble classification model with very promising results. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. FT-MIR and NIR spectral data fusion: a synergetic strategy for the geographical traceability of Panax notoginseng.

    PubMed

    Li, Yun; Zhang, Jin-Yu; Wang, Yuan-Zhong

    2018-01-01

    Three data fusion strategies (low-level, mid-level, and high-level) combined with a multivariate classification algorithm (random forest, RF) were applied to authenticate the geographical origins of Panax notoginseng collected from five regions of Yunnan province in China. In low-level fusion, the original data from the two spectra (Fourier transform mid-IR spectrum and near-IR spectrum) were directly concatenated into a new matrix, which was then used for classification. In mid-level fusion, variables extracted from the spectral data were input into an RF classification model; the extracted variables were processed by iterative variable selection with the RF model and by principal component analysis. High-level fusion combined the decision making of each spectroscopic technique into an ensemble decision. The results showed that mid-level and high-level data fusion exploit the information synergy of the two spectroscopic techniques and perform better than independent decision making. High-level data fusion is the most effective strategy, with classification results better than those of the other fusion strategies: accuracy rates ranged between 93% and 96% for low-level fusion, between 95% and 98% for mid-level fusion, and between 98% and 100% for high-level fusion. In conclusion, the high-level data fusion strategy for Fourier transform mid-IR and near-IR spectra can be used as a reliable tool for correct geographical identification of P. notoginseng. Graphical abstract: The analytical steps of Fourier transform mid-IR and near-IR spectral data fusion for the geographical traceability of Panax notoginseng.
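
    The low-level and high-level strategies reduce to simple operations on matrices and decisions: concatenation of raw spectra versus a vote over per-spectrum model outputs. A numpy sketch of both (the RF models themselves and the variable-selection step of mid-level fusion are not reproduced):

```python
import numpy as np

def low_level_fusion(mir, nir):
    """Concatenate the raw MIR and NIR spectra sample-wise into one matrix."""
    return np.hstack([mir, nir])

def high_level_fusion(*decisions):
    """Majority vote over the class decisions of the per-spectrum models."""
    D = np.stack(decisions)
    return np.array([np.bincount(col).argmax() for col in D.T])
```
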

  16. Investigation of hydrometeor classification uncertainties through the POLARRIS polarimetric radar simulator

    NASA Astrophysics Data System (ADS)

    Dolan, B.; Rutledge, S. A.; Barnum, J. I.; Matsui, T.; Tao, W. K.; Iguchi, T.

    2017-12-01

    The POLarimetric Radar Retrieval and Instrument Simulator (POLARRIS) is a framework developed to simulate radar observations from cloud resolving model (CRM) output and to subject model data and observations to the same retrievals, analysis and visualization. This framework not only enables validation of bulk microphysical model-simulated properties, but also offers an opportunity to study the uncertainties associated with retrievals such as hydrometeor classification (HID). For the CSU HID, membership beta functions (MBFs) are built using a set of simulations with realistic microphysical assumptions about axis ratio, density, canting angles and size distributions for each of ten hydrometeor species. These assumptions are tested using POLARRIS to understand their influence on the resulting simulated polarimetric data and final HID classification. Several of these parameters (density, size distributions) are set by the model microphysics, and therefore the specific assumptions of axis ratio and canting angle are carefully studied. Through these sensitivity studies, we hope to provide uncertainties in retrieved polarimetric variables and HID as applied to CRM output. HID retrievals assign a classification to each point by determining the highest score, thereby identifying the dominant hydrometeor type within a volume. In nature, however, there is rarely just a single hydrometeor type at a particular point, and models allow for mixing ratios of different hydrometeors within a grid point. We use the mixing ratios from CRM output in concert with the HID scores and classifications to understand how the HID algorithm can provide information about mixtures within a volume, as well as to calculate a confidence in the classifications. We also leverage the POLARRIS framework to probe radar wavelength differences toward the possibility of a multi-wavelength HID that could utilize the strengths of different wavelengths to improve HID classifications.
With these uncertainties and algorithm improvements, cases of convection are studied in a continental (Oklahoma) and maritime (Darwin, Australia) regime. Observations from C-band polarimetric data in both locations are compared to CRM simulations from NU-WRF using the POLARRIS framework.

  17. Virtual Sensor of Surface Electromyography in a New Extensive Fault-Tolerant Classification System.

    PubMed

    de Moura, Karina de O A; Balbinot, Alexandre

    2018-05-01

    A few prosthetic control systems in the scientific literature obtain pattern recognition algorithms adapted to changes that occur in the myoelectric signal over time, and such systems are frequently not natural and intuitive. These are some of the several challenges facing myoelectric prostheses for everyday use. The concept of the virtual sensor, whose fundamental objective is to estimate unavailable measures based on other available measures, is already used in other fields of research. Applied to surface electromyography, the virtual sensor technique can help to minimize these problems, which are typically related to degradation of the myoelectric signal that usually leads to a decrease in the classification accuracy of the movements characterized by computational intelligent systems. This paper presents a virtual sensor in a new extensive fault-tolerant classification system that maintains classification accuracy after the occurrence of the following contaminants: ECG interference, electrode displacement, movement artifacts, power line interference, and saturation. The Time-Varying Autoregressive Moving Average (TVARMA) and Time-Varying Kalman filter (TVK) models are compared to define the most robust model for the virtual sensor. Movement classification results are presented comparing the usual classification techniques with the method of degraded-signal replacement and classifier retraining. The experimental results were evaluated for these five noise types in 16 surface electromyography (sEMG) channel degradation case studies. Without classifier retraining, the proposed system recovered mean classification accuracy by 4% to 38% for electrode displacement, movement artifacts, and saturation noise. The best mean classification over all signal contaminants and channel combinations evaluated was obtained with the retraining method, replacing the degraded channel by the TVARMA virtual sensor.
This method recovered the classification accuracy after the degradations, reaching an average of only 5.7% below the classification accuracy on the clean (uncontaminated) signal. Moreover, the proposed intelligent technique minimizes the impact of signal contamination on motion classification over time. The virtual sensor model and the algorithm optimization still need further development before clinical application in myoelectric prostheses, but the approach already presents results robust enough to enable research with virtual sensors on biological signals with stochastic behavior.
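
    The replace-the-degraded-channel idea can be sketched with a plain least-squares virtual sensor that predicts one channel from the others (a numpy stand-in for the paper's TVARMA/TVK models, on synthetic data; all names are illustrative):

```python
import numpy as np

def fit_virtual_sensor(X, ch):
    """Learn to predict channel `ch` from the remaining channels by least
    squares on clean training data (stand-in for the TVARMA/TVK models)."""
    others = np.delete(X, ch, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, ch], rcond=None)
    return coef

def replace_channel(X, ch, coef):
    """Substitute the degraded channel with the virtual-sensor estimate."""
    X = X.copy()
    X[:, ch] = np.delete(X, ch, axis=1) @ coef
    return X
```

    The classifier then sees a full set of channels again, which is what lets accuracy recover without retraining on the contaminated data.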

  18. Virtual Sensor of Surface Electromyography in a New Extensive Fault-Tolerant Classification System

    PubMed Central

    Balbinot, Alexandre

    2018-01-01

    A few prosthetic control systems in the scientific literature obtain pattern recognition algorithms adapted to changes that occur in the myoelectric signal over time, and such systems are frequently not natural and intuitive. These are some of the several challenges facing myoelectric prostheses for everyday use. The concept of the virtual sensor, whose fundamental objective is to estimate unavailable measures based on other available measures, is already used in other fields of research. Applied to surface electromyography, the virtual sensor technique can help to minimize these problems, which are typically related to degradation of the myoelectric signal that usually leads to a decrease in the classification accuracy of the movements characterized by computational intelligent systems. This paper presents a virtual sensor in a new extensive fault-tolerant classification system that maintains classification accuracy after the occurrence of the following contaminants: ECG interference, electrode displacement, movement artifacts, power line interference, and saturation. The Time-Varying Autoregressive Moving Average (TVARMA) and Time-Varying Kalman filter (TVK) models are compared to define the most robust model for the virtual sensor. Movement classification results are presented comparing the usual classification techniques with the method of degraded-signal replacement and classifier retraining. The experimental results were evaluated for these five noise types in 16 surface electromyography (sEMG) channel degradation case studies. Without classifier retraining, the proposed system recovered mean classification accuracy by 4% to 38% for electrode displacement, movement artifacts, and saturation noise. The best mean classification over all signal contaminants and channel combinations evaluated was obtained with the retraining method, replacing the degraded channel by the TVARMA virtual sensor.
    This method recovered the classification accuracy after the degradations, reaching an average of 5.7% below the accuracy for the clean signal, i.e. the original signal without contaminants. Moreover, the proposed intelligent technique minimizes the impact on motion classification caused by signal contamination from degrading events over time. Improvements to the virtual sensor model and to algorithm optimization still need further development to broaden the clinical application of myoelectric prostheses, but the system already presents results robust enough to enable research on virtual sensors for biological signals with stochastic behavior. PMID:29723994
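    The virtual-sensor idea above can be sketched in a few lines. The data, channel count, and linear predictor below are illustrative stand-ins (the paper uses TVARMA/TVK models on real sEMG); the point is replacing a degraded channel with an estimate computed from the remaining channels instead of retraining the classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 4-channel sEMG recording: channels are
# correlated mixtures of two latent muscle sources (hypothetical data).
sources = rng.standard_normal((2, 1000))
mixing = rng.standard_normal((4, 2))
emg = mixing @ sources + 0.05 * rng.standard_normal((4, 1000))

# "Virtual sensor": learn a linear predictor of channel 0 from the
# remaining channels on clean training data (a simple stand-in for the
# paper's TVARMA/TVK models).
train = slice(0, 600)
X = emg[1:, train].T            # predictors: channels 1-3
y = emg[0, train]               # target: channel 0
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Channel 0 becomes degraded (saturation) on the test segment;
# replace it with the virtual-sensor estimate instead of retraining.
test = slice(600, 1000)
degraded = np.clip(emg[0, test], -0.1, 0.1)   # saturated channel
virtual = emg[1:, test].T @ coef              # reconstructed channel

err_degraded = np.mean((emg[0, test] - degraded) ** 2)
err_virtual = np.mean((emg[0, test] - virtual) ** 2)
print(err_virtual < err_degraded)   # reconstruction beats the saturated signal
```

    In the paper the reconstructed channel then feeds the original, unchanged movement classifier; here the reconstruction error serves as a proxy for that downstream benefit.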

  19. Molecular classification of pesticides including persistent organic pollutants, phenylurea and sulphonylurea herbicides.

    PubMed

    Torrens, Francisco; Castellano, Gloria

    2014-06-05

    Pesticide residues in wine were analyzed by liquid chromatography-tandem mass spectrometry. Retentions are modelled by structure-property relationships. Bioplastic evolution is an evolutionary perspective conjugating effect of acquired characters and evolutionary indeterminacy-morphological determination-natural selection principles; its application to design co-ordination index barely improves correlations. Fractal dimensions and partition coefficient differentiate pesticides. Classification algorithms are based on information entropy and its production. Pesticides allow a structural classification by nonplanarity, and number of O, S, N and Cl atoms and cycles; different behaviours depend on number of cycles. The novelty of the approach is that the structural parameters are related to retentions. Classification algorithms are based on information entropy. When applying procedures to moderate-sized sets, excessive results appear compatible with data suffering a combinatorial explosion. However, equipartition conjecture selects criterion resulting from classification between hierarchical trees. Information entropy permits classifying compounds agreeing with principal component analyses. Periodic classification shows that pesticides in the same group present similar properties; those also in equal period, maximum resemblance. The advantage of the classification is to predict the retentions for molecules not included in the categorization. Classification extends to phenyl/sulphonylureas and the application will be to predict their retentions.

  20. A binary genetic programing model for teleconnection identification between global sea surface temperature and local maximum monthly rainfall events

    NASA Astrophysics Data System (ADS)

    Danandeh Mehr, Ali; Nourani, Vahid; Hrnjica, Bahrudin; Molajou, Amir

    2017-12-01

    The effectiveness of genetic programming (GP) for solving regression problems in hydrology has been recognized in recent studies. However, its capability to solve classification problems has not been sufficiently explored so far. This study develops and applies a novel classification-forecasting model, namely Binary GP (BGP), for teleconnection studies between sea surface temperature (SST) variations and maximum monthly rainfall (MMR) events. The BGP integrates certain types of data pre-processing and post-processing methods with conventional GP engine to enhance its ability to solve both regression and classification problems simultaneously. The model was trained and tested using SST series of Black Sea, Mediterranean Sea, and Red Sea as potential predictors as well as classified MMR events at two locations in Iran as predictand. Skill of the model was measured in regard to different rainfall thresholds and SST lags and compared to that of the hybrid decision tree-association rule (DTAR) model available in the literature. The results indicated that the proposed model can identify potential teleconnection signals of surrounding seas beneficial to long-term forecasting of the occurrence of the classified MMR events.

  1. Modeling Spatial Dependencies and Semantic Concepts in Data Mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju

    Data mining is the process of discovering new patterns and relationships in large datasets. However, several studies have shown that general data mining techniques often fail to extract meaningful patterns and relationships from spatial data owing to the violation of fundamental geospatial principles. In this tutorial, we introduce basic principles behind explicit modeling of spatial and semantic concepts in data mining. In particular, we focus on modeling these concepts in the widely used classification, clustering, and prediction algorithms. Classification is the process of learning a structure or model (from user-given inputs) and applying the known model to new data. Clustering is the process of discovering groups and structures in the data that are "similar," without applying any known structures in the data. Prediction is the process of finding a function that models (explains) the data with least error. One common assumption among all these methods is that the data is independent and identically distributed. Such assumptions do not hold well in spatial data, where spatial dependency and spatial heterogeneity are the norm. In addition, spatial semantics are often ignored by data mining algorithms. In this tutorial we cover recent advances in explicit modeling of spatial dependencies and semantic concepts in data mining.
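    The i.i.d. violation the tutorial refers to can be made concrete with a spatial autocorrelation statistic. The sketch below computes Moran's I with rook adjacency on a regular grid (synthetic data; a smooth field violates independence, pure noise does not):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 20x20 raster: smooth spatial field vs. pure noise.
n = 20
xs, ys = np.meshgrid(np.arange(n), np.arange(n))
smooth = np.sin(xs / 4.0) + np.cos(ys / 4.0)   # spatially autocorrelated
noise = rng.standard_normal((n, n))            # i.i.d.

def morans_i(grid):
    """Moran's I with rook-adjacency weights on a regular grid."""
    z = grid - grid.mean()
    num = 0.0
    w_sum = 0
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        shifted = np.roll(z, (dx, dy), axis=(0, 1))
        # mask out the wrap-around edges introduced by np.roll
        mask = np.ones_like(z, dtype=bool)
        if dx == 1: mask[0, :] = False
        if dx == -1: mask[-1, :] = False
        if dy == 1: mask[:, 0] = False
        if dy == -1: mask[:, -1] = False
        num += (z * shifted)[mask].sum()
        w_sum += mask.sum()
    return (z.size / w_sum) * num / (z ** 2).sum()

print(morans_i(smooth) > 0.8)    # strong spatial dependence
print(abs(morans_i(noise)) < 0.2)  # near 0: consistent with i.i.d.
```

    A value near +1 signals exactly the spatial dependency that makes standard classifiers, which assume i.i.d. samples, misbehave on raster data.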

  2. Personnel and Vehicle Data Collection at Aberdeen Proving Ground (APG) and its Distribution for Research

    DTIC Science & Technology

    2015-10-01

    Report documentation fragment: instrumentation includes an Applied Physics Model 1540 digital 3-axis fluxgate magnetometer and Alligator Technologies USBPGF-S1 programmable instrumentation amplifiers. Keywords: acoustic, seismic, magnetic, footstep, vehicle, magnetometer, geophone, unattended ground sensor (UGS).

  3. Camera-Model Identification Using Markovian Transition Probability Matrix

    NASA Astrophysics Data System (ADS)

    Xu, Guanshuo; Gao, Shang; Shi, Yun Qing; Hu, Ruimin; Su, Wei

    Detecting the (brands and) models of digital cameras from given digital images has become a popular research topic in the field of digital forensics. As most images are JPEG-compressed before they are output from cameras, we propose to use an effective image statistical model to characterize the difference JPEG 2-D arrays of Y and Cb components from the JPEG images taken by various camera models. Specifically, the transition probability matrices derived from four different directional Markov processes applied to the image difference JPEG 2-D arrays are used to identify the statistical differences caused by image formation pipelines inside different camera models. All elements of the transition probability matrices, after a thresholding technique, are directly used as features for classification purposes. Multi-class support vector machines (SVM) are used as the classification tool. The effectiveness of our proposed statistical model is demonstrated by large-scale experimental results.
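    The feature-extraction step can be sketched as follows. This is a simplified single-direction version (horizontal differences only, on a random stand-in image); the paper builds such matrices for four directions on the Y and Cb difference JPEG 2-D arrays:

```python
import numpy as np

def markov_features(img, T=3):
    """Thresholded horizontal-difference array and its Markov transition
    probability matrix, flattened into a feature vector (a sketch of the
    paper's approach; T is the thresholding parameter)."""
    diff = img[:, :-1].astype(int) - img[:, 1:].astype(int)
    diff = np.clip(diff, -T, T)               # threshold to [-T, T]
    # transition counts between horizontally adjacent difference values
    counts = np.zeros((2 * T + 1, 2 * T + 1))
    a = diff[:, :-1] + T                      # shift values into index range
    b = diff[:, 1:] + T
    np.add.at(counts, (a.ravel(), b.ravel()), 1)
    row = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)
    return probs.ravel()                      # (2T+1)^2 = 49 features

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(64, 64))     # stand-in for a decoded JPEG plane
f = markov_features(img)
print(f.shape)   # (49,)
print(np.isclose(f.reshape(7, 7).sum(axis=1).max(), 1.0))  # rows are probabilities
```

    In the paper, feature vectors like this one (concatenated over directions and components) are fed to a multi-class SVM to separate camera models.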

  4. Identification and classification of conopeptides using profile Hidden Markov Models.

    PubMed

    Laht, Silja; Koua, Dominique; Kaplinski, Lauris; Lisacek, Frédérique; Stöcklin, Reto; Remm, Maido

    2012-03-01

    Conopeptides are small toxins produced by predatory marine snails of the genus Conus. They are studied with increasing intensity due to their potential in neurosciences and pharmacology. The number of existing conopeptides is estimated to be 1 million, but only about 1000 have been described to date. Thanks to new high-throughput sequencing technologies the number of known conopeptides is likely to increase exponentially in the near future. There is therefore a need for a fast and accurate computational method for identification and classification of the novel conopeptides in large data sets. 62 profile Hidden Markov Models (pHMMs) were built for prediction and classification of all described conopeptide superfamilies and families, based on the different parts of the corresponding protein sequences. These models showed very high specificity in detection of new peptides. 56 out of 62 models do not give a single false positive in a test with the entire UniProtKB/Swiss-Prot protein sequence database. Our study demonstrates the usefulness of mature peptide models for automatic classification with accuracy of 96% for the mature peptide models and 100% for the pro- and signal peptide models. Our conopeptide profile HMMs can be used for finding and annotation of new conopeptides from large datasets generated by transcriptome or genome sequencing. To our knowledge this is the first time this kind of computational method has been applied to predict all known conopeptide superfamilies and some conopeptide families. Copyright © 2012 Elsevier B.V. All rights reserved.

  5. DockQ: A Quality Measure for Protein-Protein Docking Models

    PubMed Central

    Basu, Sankar

    2016-01-01

    The state-of-the-art to assess the structural quality of docking models is currently based on three related yet independent quality measures: Fnat, LRMS, and iRMS as proposed and standardized by CAPRI. These quality measures quantify different aspects of the quality of a particular docking model and need to be viewed together to reveal the true quality, e.g. a model with relatively poor LRMS (>10Å) might still qualify as 'acceptable' with a decent Fnat (>0.50) and iRMS (<3.0Å). This is also the reason why the so-called CAPRI criteria for assessing the quality of docking models are defined by applying various ad-hoc cutoffs on these measures to classify a docking model into the four classes: Incorrect, Acceptable, Medium, or High quality. This classification has been useful in CAPRI, but since models are grouped in only four bins it is also rather limiting, making it difficult to rank models, correlate with scoring functions or use it as a target function in machine learning algorithms. Here, we present DockQ, a continuous protein-protein docking model quality measure derived by combining Fnat, LRMS, and iRMS into a single score in the range [0, 1] that can be used to assess the quality of protein docking models. By using DockQ on CAPRI models it is possible to almost completely reproduce the original CAPRI classification into Incorrect, Acceptable, Medium and High quality. An average PPV of 94% at 90% Recall demonstrates that there is no need to apply predefined ad-hoc cutoffs to classify docking models. Since DockQ recapitulates the CAPRI classification almost perfectly, it can be viewed as a higher resolution version of the CAPRI classification, making it possible to estimate model quality in a more quantitative way using Z-scores or sums of top ranked models, which has been so valuable for the CASP community. 
The possibility to directly correlate a quality measure to a scoring function has been crucial for the development of scoring functions for protein structure prediction, and DockQ should be useful in a similar development in the protein docking field. DockQ is available at http://github.com/bjornwallner/DockQ/ PMID:27560519
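    The combination of the three measures into one score can be sketched directly. The functional form below (mean of Fnat and two scaled-RMSD terms of the form 1/(1 + (rms/d)²), with scaling constants d1 = 8.5 Å for LRMS and d2 = 1.5 Å for iRMS) follows the DockQ publication; treat it as a sketch and consult the released code at the URL above for the authoritative implementation:

```python
def dockq(fnat, lrms, irms, d1=8.5, d2=1.5):
    """DockQ-style score: mean of Fnat and scaled RMSD terms
    1 / (1 + (rms/d)^2); d1, d2 are the scaling constants reported
    for LRMS and iRMS in the publication."""
    scaled_lrms = 1.0 / (1.0 + (lrms / d1) ** 2)
    scaled_irms = 1.0 / (1.0 + (irms / d2) ** 2)
    return (fnat + scaled_lrms + scaled_irms) / 3.0

# The example from the abstract: poor LRMS (>10 Å) but decent Fnat and
# iRMS still yields a mid-range continuous score rather than a hard bin.
print(round(dockq(0.50, 11.0, 2.9), 2))  # 0.36
```

    Because the score is continuous on [0, 1], it can be ranked, summed over top models, or used directly as a regression target, which is exactly what the four CAPRI bins prevent.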

  6. Sequence-structure relationship study in all-α transmembrane proteins using an unsupervised learning approach.

    PubMed

    Esque, Jérémy; Urbain, Aurélie; Etchebest, Catherine; de Brevern, Alexandre G

    2015-11-01

    Transmembrane proteins (TMPs) are major drug targets, but knowledge of their precise topology and structure remains highly limited compared with globular proteins. In spite of the difficulties in obtaining their structures, an important effort has been made in recent years to increase their number from both an experimental and a computational point of view. In view of this emerging challenge, the development of computational methods to extract knowledge from these data is crucial for a better understanding of their functions and for improving the quality of structural models. Here, we revisit an efficient unsupervised learning procedure, called the Hybrid Protein Model (HPM), which is applied to the analysis of transmembrane proteins belonging to the all-α structural class. The HPM method is an original classification procedure that efficiently combines sequence and structure learning. The procedure was initially applied to the analysis of globular proteins. In the present case, HPM classifies a set of overlapping protein fragments, extracted from a non-redundant databank of TMP 3D structures. After fine-tuning of the learning parameters, the optimal classification results in 65 clusters. They best represent the relationships between sequence and local structure properties of TMPs. Interestingly, HPM distinguishes among the resulting clusters two helical regions with distinct hydrophobic patterns. This underlines the complexity of the topology of these proteins. The HPM classification highlights unusual relationships between amino acids in TMP fragments, which can be useful to elaborate new amino acid substitution matrices. Finally, two challenging applications are described: the first one aims at annotating protein functions (channel or not), the second one intends to assess the quality of the structures (X-ray or models) via a new scoring function deduced from the HPM classification.

  7. Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification

    PubMed Central

    Hou, Le; Samaras, Dimitris; Kurc, Tahsin M.; Gao, Yi; Davis, James E.; Saltz, Joel H.

    2016-01-01

    Convolutional Neural Networks (CNN) are state-of-the-art models for many image classification tasks. However, to recognize cancer subtypes automatically, training a CNN on gigapixel resolution Whole Slide Tissue Images (WSI) is currently computationally impossible. The differentiation of cancer subtypes is based on cellular-level visual features observed on image patch scale. Therefore, we argue that in this situation, training a patch-level classifier on image patches will perform better than or similar to an image-level classifier. The challenge becomes how to intelligently combine patch-level classification results and model the fact that not all patches will be discriminative. We propose to train a decision fusion model to aggregate patch-level predictions given by patch-level CNNs, which to the best of our knowledge has not been shown before. Furthermore, we formulate a novel Expectation-Maximization (EM) based method that automatically locates discriminative patches robustly by utilizing the spatial relationships of patches. We apply our method to the classification of glioma and non-small-cell lung carcinoma cases into subtypes. The classification accuracy of our method is similar to the inter-observer agreement between pathologists. Although it is impossible to train CNNs on WSIs, we experimentally demonstrate using a comparable non-cancer dataset of smaller images that a patch-based CNN can outperform an image-based CNN. PMID:27795661
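    The decision-fusion idea above can be sketched without a CNN: each image yields many patch-level predictions, only some of which are discriminative, and an image-level decision is taken on an aggregate of the patch votes. Everything below is simulated stand-in data (the paper trains patch-level CNNs and a learned fusion model on the histogram of their outputs):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_image(subtype, n_patches=200, discriminative=0.3):
    """Patch-level predictions for one slide: discriminative patches
    vote for the true subtype, the rest vote uniformly over 3 subtypes."""
    informative = rng.random(n_patches) < discriminative
    votes = rng.integers(0, 3, size=n_patches)
    votes[informative] = subtype
    return votes

def fuse(votes, n_classes=3):
    """Decision-fusion stand-in: normalized histogram of patch votes
    (the paper trains a fusion model on such aggregated predictions)."""
    return np.bincount(votes, minlength=n_classes) / len(votes)

# Majority over the fused histogram recovers the image-level label even
# though most individual patches are uninformative.
correct = 0
for _ in range(50):
    subtype = rng.integers(0, 3)
    hist = fuse(simulate_image(subtype))
    correct += int(np.argmax(hist) == subtype)
print(correct >= 45)   # fusion recovers nearly all image-level labels
```

    The paper's EM step goes further by down-weighting non-discriminative patches when estimating the histogram; the sketch shows why aggregation alone already buys robustness.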

  8. 22 CFR 9.4 - Original classification.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 1 2014-04-01 2014-04-01 false Original classification. 9.4 Section 9.4... classification. (a) Definition. Original classification is the initial determination that certain information... classification. (b) Classification levels. (1) Top Secret shall be applied to information the unauthorized...

  9. 22 CFR 9.4 - Original classification.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 1 2013-04-01 2013-04-01 false Original classification. 9.4 Section 9.4... classification. (a) Definition. Original classification is the initial determination that certain information... classification. (b) Classification levels. (1) Top Secret shall be applied to information the unauthorized...

  10. 22 CFR 9.4 - Original classification.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 1 2012-04-01 2012-04-01 false Original classification. 9.4 Section 9.4... classification. (a) Definition. Original classification is the initial determination that certain information... classification. (b) Classification levels. (1) Top Secret shall be applied to information the unauthorized...

  11. 22 CFR 9.4 - Original classification.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Original classification. 9.4 Section 9.4... classification. (a) Definition. Original classification is the initial determination that certain information... classification. (b) Classification levels. (1) Top Secret shall be applied to information the unauthorized...

  12. Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity.

    PubMed

    Schneider, Nadine; Lowe, Daniel M; Sayle, Roger A; Landrum, Gregory A

    2015-01-26

    Fingerprint methods applied to molecules have proven to be useful for similarity determination and as inputs to machine-learning models. Here, we present the development of a new fingerprint for chemical reactions and validate its usefulness in building machine-learning models and in similarity assessment. Our final fingerprint is constructed as the difference of the atom-pair fingerprints of products and reactants and includes agents via calculated physicochemical properties. We validated the fingerprints on a large data set of reactions text-mined from granted United States patents from the last 40 years that have been classified using a substructure-based expert system. We applied machine learning to build a 50-class predictive model for reaction-type classification that correctly predicts 97% of the reactions in an external test set. Impressive accuracies were also observed when applying the classifier to reactions from an in-house electronic laboratory notebook. The performance of the novel fingerprint for assessing reaction similarity was evaluated by a cluster analysis that recovered 48 out of 50 of the reaction classes with a median F-score of 0.63 for the clusters. The data sets used for training and primary validation as well as all python scripts required to reproduce the analysis are provided in the Supporting Information.
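    The core construction, a difference fingerprint formed by subtracting the summed reactant fingerprints from the summed product fingerprints, can be sketched with plain feature-count dictionaries. The feature keys below are made up for illustration (the paper uses atom-pair fingerprints plus agent physicochemical properties):

```python
from collections import Counter

def reaction_fingerprint(reactant_fps, product_fps):
    """Difference fingerprint: sum of product fingerprints minus sum of
    reactant fingerprints. Each molecular fingerprint here is a Counter
    of feature counts; the keys are illustrative only."""
    fp = Counter()
    for m in product_fps:
        fp.update(m)
    for m in reactant_fps:
        fp.subtract(m)
    return {k: v for k, v in fp.items() if v != 0}

# Toy esterification-like example with hypothetical feature keys:
acid = Counter({"C=O": 1, "O-H": 2, "C-C": 3})
alcohol = Counter({"O-H": 1, "C-C": 1})
ester = Counter({"C=O": 1, "C-O-C": 1, "O-H": 1, "C-C": 4})
water = Counter({"O-H": 2})

diff = reaction_fingerprint([acid, alcohol], [ester, water])
print(diff)   # {'C-O-C': 1}: only the net structural change survives
```

    Features that are unchanged by the reaction cancel out, so the fingerprint encodes the transformation itself rather than the molecules, which is what makes it suitable for reaction-type classification.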

  13. Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study.

    PubMed

    Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah

    2016-01-01

    Breast cancer survival has been analyzed by many standard data mining algorithms, a group of which belongs to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason to apply an algorithm from this category in the current study, which had not been done before for this cohort. Classification and regression trees (CART) were applied to a breast cancer database containing information on 569 patients in 2007-2010. Gini impurity, a measure used for categorical target variables, was utilized. The classification error, which is a function of tree size, was measured by 10-fold cross-validation experiments. The performance of the created model was evaluated by criteria such as accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were clinically meaningful. They showed, in if-then format, that Stage was the most important variable for predicting breast cancer survival. The scores of accuracy, sensitivity and specificity were 80.3%, 93.5% and 53%, respectively. The model in the current study, the first created with CART for this cohort, was able to extract useful hidden rules from a relatively small dataset.
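    The CART splitting criterion used here, Gini impurity minimized over candidate thresholds, can be sketched in a few lines. The data below are a toy stand-in (the study uses 569 real patient records); the code shows one split step of the tree-growing procedure:

```python
import numpy as np

def gini(labels):
    """Gini impurity, the CART measure for categorical targets."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """One CART step: the threshold on x minimizing the weighted
    impurity of the two child nodes."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (np.inf, None)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue
        left, right = y[:i], y[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        thr = (x[i] + x[i - 1]) / 2
        if score < best[0]:
            best = (score, thr)
    return best

# Toy 'Stage' variable vs. survival: stage separates the classes well,
# mirroring the study's finding that Stage is the top split variable.
stage = np.array([1, 1, 2, 2, 3, 3, 4, 4])
survived = np.array([1, 1, 1, 1, 0, 0, 0, 0])
impurity, threshold = best_split(stage, survived)
print(impurity)    # 0.0: a perfect split exists
print(threshold)   # 2.5
```

    Growing the full tree repeats this split recursively on each child node, and the 10-fold cross-validation in the study selects the tree size that minimizes classification error.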

  14. A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques

    NASA Astrophysics Data System (ADS)

    Techo, Jakkrit; Nattee, Cholwich; Theeramunkong, Thanaruk

    While classification techniques can be applied to automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown-word candidates is dominantly smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighing each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naïve Bayes classifier and our vanilla version without ensemble learning. As the result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected, while it gains 97.26±0.26% when the top-ten candidates are considered, that is, 8.45% and 6.79% improvement over the conventional record-based naïve Bayes classifier and the vanilla version. Another result on applying only the best features shows 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively. These are 3.97% and 9.78% improvement over naïve Bayes and the vanilla version. Finally, an error analysis is given.
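    The group-based weighting step can be illustrated loosely as follows. This sketch only captures the idea of weighing candidates by rank and correctness within one unknown-word group (the weighting rules and constants here are hypothetical, not the paper's exact scheme):

```python
def group_weights(scores, labels):
    """Within one unknown-word group (which must contain the correct
    candidate, label 1), rank candidates by the current model's score
    and up-weight mistakes: negatives ranked above the correct answer,
    and the answer itself in proportion to how far it was pushed down.
    Weighting constants are illustrative only."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = {i: r for r, i in enumerate(order)}            # 0 = top rank
    correct_rank = min(ranks[i] for i, y in enumerate(labels) if y == 1)
    weights = []
    for i, y in enumerate(labels):
        if y == 0 and ranks[i] < correct_rank:
            weights.append(2.0)                # negative beat the answer
        elif y == 1:
            weights.append(1.0 + correct_rank)  # answer pushed down
        else:
            weights.append(1.0)
    return weights

# One group of 4 candidates; the true unknown word (label 1) is ranked 3rd.
scores = [0.9, 0.8, 0.6, 0.3]
labels = [0, 0, 1, 0]
print(group_weights(scores, labels))   # [2.0, 2.0, 3.0, 1.0]
```

    A succeeding ensemble member is then trained on the re-weighted candidates, so later models concentrate on the groups the earlier models ranked incorrectly.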

  15. Label-noise resistant logistic regression for functional data classification with an application to Alzheimer's disease study.

    PubMed

    Lee, Seokho; Shin, Hyejin; Lee, Sang Han

    2016-12-01

    Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance tests, with a potential risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), the CC thickness can be used as a functional covariate in the AD classification problem for a diagnosis. However, misclassified class labels negatively impact classification performance. Motivated by AD-CC association studies, we propose a logistic regression for functional data classification that is robust to misdiagnosis or label noise. Specifically, our model is constructed by adding individual intercepts to the functional logistic regression model. This approach makes it possible to indicate which observations are possibly mislabeled and also leads to a robust and efficient classifier. An effective estimation procedure based on the MM (majorization-minimization) algorithm provides simple closed-form update formulas. We test our method using synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy controls based on CC thickness from MRI. © 2016, The International Biometric Society.
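    The individual-intercept device can be sketched on synthetic non-functional data. Below, each observation gets its own intercept gamma_i with an L1 penalty, so most intercepts stay at zero and large ones flag likely mislabeled cases. Note this is a proximal-gradient illustration, not the paper's closed-form MM updates, and the data are simulated:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def robust_logistic(X, y, lam=0.3, lr=0.1, iters=500):
    """Logistic regression with per-observation intercepts gamma_i.
    Fitted by proximal gradient with an L1 penalty on gamma (a simple
    stand-in for the paper's MM algorithm)."""
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(iters):
        r = y - sigmoid(X @ beta + gamma)     # residuals
        beta += lr * X.T @ r / n
        gamma += lr * r
        # soft-thresholding keeps most gamma_i exactly zero
        gamma = np.sign(gamma) * np.maximum(np.abs(gamma) - lr * lam, 0.0)
    return beta, gamma

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 2))
margin = X @ np.array([3.0, -3.0])
y = (sigmoid(margin) > rng.random(200)).astype(float)

# Flip the labels of the 5 most confidently classified cases
# (simulated misdiagnosis).
idx = np.argsort(-np.abs(margin))[:5]
y[idx] = 1 - y[idx]

beta, gamma = robust_logistic(X, y)
flagged = np.argsort(-np.abs(gamma))[:5]      # largest intercepts
print(len(set(flagged.tolist()) & set(idx.tolist())))  # most noise recovered
```

    The nonzero-intercept observations absorb the label noise, so the slope estimate beta stays close to what a fit on clean labels would give.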

  16. Algorithmic framework for group analysis of differential equations and its application to generalized Zakharov-Kuznetsov equations

    NASA Astrophysics Data System (ADS)

    Huang, Ding-jiang; Ivanova, Nataliya M.

    2016-02-01

    In this paper, we explain in more detail the modern treatment of the problem of group classification of (systems of) partial differential equations (PDEs) from the algorithmic point of view. More precisely, we revise the classical Lie algorithm for the construction of symmetries of differential equations, describe the group classification algorithm and discuss the process of reduction of (systems of) PDEs to (systems of) equations with a smaller number of independent variables in order to construct invariant solutions. The group classification algorithm and reduction process are illustrated by the example of the generalized Zakharov-Kuznetsov (GZK) equations of the form u_t + (F(u))_xxx + (G(u))_xyy + (H(u))_x = 0. As a result, a complete group classification of the GZK equations is performed and a number of new interesting nonlinear invariant models which have non-trivial invariance algebras are obtained. Lie symmetry reductions and exact solutions for two important invariant models, i.e., the classical and modified Zakharov-Kuznetsov equations, are constructed. The algorithmic framework for group analysis of differential equations presented in this paper can also be applied to other nonlinear PDEs.

  17. EEG-based driver fatigue detection using hybrid deep generic model.

    PubMed

    Phyo Phyo San; Sai Ho Ling; Rifai Chai; Tran, Yvonne; Craig, Ashley; Hung Nguyen

    2016-08-01

    Classification of electroencephalography (EEG) signals is one of the important processes in biomedical engineering. Driver fatigue is a major cause of traffic accidents worldwide and has been considered a significant problem in recent decades. In this paper, a hybrid deep generic model (DGM)-based support vector machine is proposed for accurate detection of driver fatigue. Traditionally, a probabilistic DGM with deep architecture is quite good at learning invariant features, but it is not always optimal for classification because its trainable parameters are concentrated in the middle layers. Alternatively, the Support Vector Machine (SVM) itself is unable to learn complicated invariances, but produces good decision surfaces when applied to well-behaved features. Consolidating unsupervised high-level feature extraction (DGM) with SVM classification makes the integrated framework stronger, with the two stages enhancing each other in feature extraction and classification. The experimental results showed that the proposed DGM-based driver fatigue monitoring system achieves a better testing accuracy of 73.29% with 91.10% sensitivity and 55.48% specificity. In short, the proposed hybrid DGM-based SVM is an effective method for the detection of driver fatigue in EEG.
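    The two-stage pattern, unsupervised feature extraction followed by a supervised margin classifier, can be sketched with simple stand-ins. Everything below is synthetic and illustrative: PCA replaces the deep generic model and a least-squares linear classifier replaces the SVM, but the division of labor between the stages is the same:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical EEG-like data: two latent components plus sensor noise;
# the class (alert vs. fatigued) depends on the first latent component.
n, d = 300, 30
latent = rng.standard_normal((n, 2))
labels = (latent[:, 0] > 0).astype(int)
W = rng.standard_normal((2, d))
X = latent @ W + 0.1 * rng.standard_normal((n, d))

# Stage 1 - unsupervised high-level feature extraction (PCA as a simple
# stand-in for the paper's deep generic model).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
features = Xc @ Vt[:2].T

# Stage 2 - supervised classifier on the learned features (least-squares
# linear classifier as a stand-in for the SVM stage).
A = np.hstack([features, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(A, 2 * labels - 1.0, rcond=None)
pred = (A @ coef > 0).astype(int)
acc = (pred == labels).mean()
print(acc > 0.9)   # the hybrid pipeline separates the two classes
```

    The unsupervised stage compresses 30 noisy channels into a low-dimensional representation where a simple supervised boundary suffices, which is the argument the abstract makes for pairing a DGM with an SVM.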

  18. Big Data: A Parallel Particle Swarm Optimization-Back-Propagation Neural Network Algorithm Based on MapReduce.

    PubMed

    Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan

    2016-01-01

    A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design. The PSO algorithm was used to optimize the BP neural network's initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data.
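    The PSO step used to choose the BP network's initial weights can be sketched as follows. The velocity/position update is the standard PSO rule; the objective here is a simple quadratic stand-in for the network training error the paper actually evaluates, and the MapReduce parallelization is omitted:

```python
import numpy as np

rng = np.random.default_rng(6)

def pso_minimize(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Standard PSO: velocity update with inertia w plus cognitive (c1)
    and social (c2) attraction toward personal and global bests."""
    pos = rng.uniform(-1, 1, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# Stand-in objective: loss as a function of the network's initial
# weights (a quadratic here; the paper evaluates BP training error).
target = np.array([0.3, -0.7, 0.5])
loss = lambda p: np.sum((p - target) ** 2)
best, val = pso_minimize(loss, dim=3)
print(val < 1e-3)   # PSO finds near-optimal initial weights
```

    In the paper's design, each particle's fitness evaluation (one BP training run) is what gets distributed across the Hadoop cluster via MapReduce, since the evaluations are independent.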

  19. Detection and classification of Breast Cancer in Wavelet Sub-bands of Fractal Segmented Cancerous Zones.

    PubMed

    Shirazinodeh, Alireza; Noubari, Hossein Ahmadi; Rabbani, Hossein; Dehnavi, Alireza Mehri

    2015-01-01

    Recent studies on wavelet transform and fractal modeling applied to mammograms for the detection of cancerous tissues indicate that microcalcifications and masses can be utilized for the study of the morphology and diagnosis of cancerous cases. It has been shown that the use of fractal modeling, as applied to a given image, can clearly discern cancerous zones from noncancerous areas. For fractal modeling, the original image is first segmented into appropriate fractal boxes, followed by identifying the fractal dimension of each windowed section using a computationally efficient two-dimensional box-counting algorithm. Furthermore, using appropriate wavelet sub-bands and image reconstruction based on modified wavelet coefficients, it is possible to arrive at enhanced features for the detection of cancerous zones. In this paper, we have attempted to benefit from the advantages of both fractals and wavelets by introducing a new algorithm, named F1W2. The original image is first segmented into appropriate fractal boxes, and the fractal dimension of each windowed section is extracted. Then, by applying a maximum-level threshold on the matrix of fractal dimensions, the best-segmented boxes are selected. In the next step, the candidate segmented cancerous zones are decomposed using the standard orthogonal wavelet transform with the db2 wavelet at three different resolution levels, and after nullifying the wavelet coefficients of the image at the first scale and the low-frequency band of the third scale, the modified reconstructed image is successfully utilized for the detection of breast cancer regions by applying an appropriate threshold. For detection of cancerous zones, our simulations indicate an accuracy of 90.9% for masses and 88.99% for microcalcifications using the F1W2 method. 
For the classification of detected microcalcifications into benign and malignant cases, eight features are identified and utilized in a radial basis function neural network. Our simulation results indicate an accuracy of 92% classification using the F1W2 method.
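    The two-dimensional box-counting step can be sketched directly: count occupied boxes N(s) at several box sizes s and estimate the fractal dimension D from the slope of log N(s) against log s. The binary masks below are synthetic stand-ins for segmented mammogram regions:

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16, 32)):
    """2-D box-counting dimension of a binary mask: count occupied
    boxes N(s) at each box size s and fit log N(s) ~ -D log s."""
    counts = []
    for s in sizes:
        h, w = mask.shape
        # trim so the grid divides evenly, then pool s x s boxes
        grid = mask[: h - h % s, : w - w % s]
        boxes = grid.reshape(grid.shape[0] // s, s, grid.shape[1] // s, s)
        counts.append(np.count_nonzero(boxes.any(axis=(1, 3))))
    D, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -D

# Sanity checks: a filled square region has dimension close to 2,
# a straight line close to 1.
img = np.zeros((128, 128), dtype=bool)
img[32:96, 32:96] = True
print(abs(box_counting_dimension(img) - 2.0) < 0.2)

line = np.zeros((128, 128), dtype=bool)
line[64, :] = True
print(abs(box_counting_dimension(line) - 1.0) < 0.2)
```

    In the F1W2 pipeline this dimension is computed per windowed section, and thresholding the resulting matrix of dimensions selects the candidate cancerous boxes that go on to the wavelet stage.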

  20. Systematic Model-in-the-Loop Test of Embedded Control Systems

    NASA Astrophysics Data System (ADS)

    Krupp, Alexander; Müller, Wolfgang

    Current model-based development processes offer new opportunities for verification automation, e.g., in automotive development. The duty of functional verification is the detection of design flaws. Current functional verification approaches exhibit a major gap between requirement definition and formal property definition, especially when analog signals are involved. Besides a lack of methodical support for natural language formalization, there does not exist a standardized and accepted means for formal property definition as a target for verification planning. This article addresses several shortcomings of embedded system verification. An Enhanced Classification Tree Method is developed based on the established Classification Tree Method for Embedded Systems (CTM/ES), which applies a hardware verification language to define a verification environment.

  1. CNN for breaking text-based CAPTCHA with noise

    NASA Astrophysics Data System (ADS)

    Liu, Kaixuan; Zhang, Rong; Qing, Ke

    2017-07-01

    A CAPTCHA ("Completely Automated Public Turing test to tell Computers and Human Apart") system is a program that most humans can pass but current computer programs could hardly pass. As the most common type of CAPTCHAs , text-based CAPTCHA has been widely used in different websites to defense network bots. In order to breaking textbased CAPTCHA, in this paper, two trained CNN models are connected for the segmentation and classification of CAPTCHA images. Then base on these two models, we apply sliding window segmentation and voting classification methods realize an end-to-end CAPTCHA breaking system with high success rate. The experiment results show that our method is robust and effective in breaking text-based CAPTCHA with noise.

  2. Assessing therapeutic relevance of biologically interesting, ampholytic substances based on their physicochemical and spectral characteristics with chemometric tools

    NASA Astrophysics Data System (ADS)

    Judycka, U.; Jagiello, K.; Bober, L.; Błażejowski, J.; Puzyn, T.

    2018-06-01

    Chemometric tools were applied to investigate the biological behaviour of ampholytic substances in relation to their physicochemical and spectral properties. Results of the Principal Component Analysis suggest that the size of the molecules and their electronic and spectral characteristics are the key properties required to predict the therapeutic relevance of the compounds examined. These properties were used to develop a structure-activity classification model. The classification model allows the therapeutic behaviour of ampholytic substances to be assessed solely on the basis of descriptor values that can be obtained computationally. Thus, prediction is possible without time-consuming and expensive laboratory tests, which is the main advantage of the approach.

  3. FT-Raman and NIR spectroscopy data fusion strategy for multivariate qualitative analysis of food fraud.

    PubMed

    Márquez, Cristina; López, M Isabel; Ruisánchez, Itziar; Callao, M Pilar

    2016-12-01

    Two data fusion strategies (high- and mid-level) combined with a multivariate classification approach (Soft Independent Modelling of Class Analogy, SIMCA) have been applied to take advantage of the synergistic effect of the information obtained from two spectroscopic techniques: FT-Raman and NIR. Mid-level data fusion consists of merging some of the previously selected variables from the spectra obtained from each spectroscopic technique and then applying the classification technique. High-level data fusion combines the SIMCA classification results obtained individually from each spectroscopic technique. Of the possible ways to make the necessary combinations, we decided to use fuzzy aggregation connective operators. As a case study, we considered the possible adulteration of hazelnut paste with almond. Using the two-class SIMCA approach, class 1 consisted of unadulterated hazelnut samples and class 2 of samples adulterated with almond. The models' performance was also studied with samples adulterated with chickpea. The results show that data fusion is an effective strategy, since the performance parameters are better than the individual ones: sensitivity and specificity values were between 75% and 100% for the individual techniques, versus 96-100% and 88-100% for the mid- and high-level data fusion strategies, respectively. Copyright © 2016 Elsevier B.V. All rights reserved.
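The high-level fusion step can be illustrated with three classical fuzzy aggregation connectives applied to per-class scores from each technique. The membership values below are hypothetical, and the paper's exact choice of operator may differ:

```python
import numpy as np

# Hypothetical per-class "membership" scores for one sample (e.g. derived
# from SIMCA distances), one vector per spectroscopic technique.
raman = np.array([0.80, 0.15])  # [class 1: unadulterated, class 2: adulterated]
nir   = np.array([0.60, 0.35])

# Three classical fuzzy aggregation connective operators.
t_norm    = np.minimum(raman, nir)   # pessimistic (intersection-like)
t_conorm  = np.maximum(raman, nir)   # optimistic (union-like)
averaging = (raman + nir) / 2        # compromise operator

for name, fused in [("min", t_norm), ("max", t_conorm), ("mean", averaging)]:
    print(name, fused, "-> class", fused.argmax() + 1)
```

Here all three operators agree on class 1; in borderline cases the choice of connective decides how conservatively the two techniques are combined.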

  4. A theory of fine structure image models with an application to detection and classification of dementia

    PubMed Central

    Penn, Richard; Werner, Michael; Thomas, Justin

    2015-01-01

    Background Estimation of stochastic process models from data is a common application of time series analysis methods. Such system identification processes are often cast as hypothesis testing exercises whose intent is to estimate model parameters and test them for statistical significance. Ordinary least squares (OLS) regression and the Levenberg-Marquardt algorithm (LMA) have proven invaluable computational tools for models described by non-homogeneous, linear, stationary, ordinary differential equations. Methods In this paper we extend stochastic model identification to linear, stationary, partial differential equations in two independent variables (2D) and show that OLS and LMA apply equally well to these systems. The method employs an original nonparametric statistic as a test for the significance of estimated parameters. Results We show that gray-scale and color images are special cases of 2D systems satisfying a particular autoregressive partial difference equation which estimates an analogous partial differential equation. Several applications to medical image modeling and classification illustrate the method by correctly classifying demented and normal OLS models of axial magnetic resonance brain scans according to subject Mini Mental State Exam (MMSE) scores. Comparison with 13 image classifiers from the literature indicates our classifier is at least 14 times faster than any of them and has a classification accuracy better than all but one. Conclusions Our modeling method applies to any linear, stationary, partial differential equation and the method is readily extended to 3D whole-organ systems. Further, in addition to being a robust image classifier, estimated image models offer insights into which parameters carry the most diagnostic image information and thereby suggest finer divisions could be made within a class. 
Image models can be estimated in milliseconds which translate to whole-organ models in seconds; such runtimes could make real-time medicine and surgery modeling possible. PMID:26029638
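The core estimation idea, fitting an autoregressive partial difference equation to an image by OLS, can be sketched as follows. The two-neighbor causal model and all coefficients here are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthesize an image from a known 2D autoregressive difference equation:
#   I[i,j] = a * I[i-1,j] + b * I[i,j-1] + noise
# (a toy stand-in for the partial difference equations used in the paper).
a_true, b_true = 0.5, 0.4
img = np.zeros((64, 64))
img[0, :] = rng.normal(size=64)
img[:, 0] = rng.normal(size=64)
for i in range(1, 64):
    for j in range(1, 64):
        img[i, j] = a_true * img[i - 1, j] + b_true * img[i, j - 1] \
                    + 0.1 * rng.normal()

# OLS estimation: regress each interior pixel on its two causal neighbors.
y = img[1:, 1:].ravel()
X = np.column_stack([img[:-1, 1:].ravel(), img[1:, :-1].ravel()])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(a_hat, 2), round(b_hat, 2))  # close to the true 0.5 and 0.4
```

The estimated coefficient vector is the "image model"; classification would then operate on such coefficient vectors rather than raw pixels.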

  5. 19 CFR 152.16 - Judicial changes in classification.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... OF THE TREASURY (CONTINUED), CLASSIFICATION AND APPRAISEMENT OF MERCHANDISE, Classification, § 152.16 Judicial changes in classification. The following procedures apply to changes in classification made by... (19 Customs Duties, Vol. 2, 2010-04-01)

  6. Common component classification: what can we learn from machine learning?

    PubMed

    Anderson, Ariana; Labus, Jennifer S; Vianna, Eduardo P; Mayer, Emeran A; Cohen, Mark S

    2011-05-15

    Machine learning methods have been applied to classifying fMRI scans by studying locations in the brain that exhibit temporal intensity variation between groups, frequently reporting classification accuracy of 90% or better. Although empirical results are quite favorable, one might doubt the ability of classification methods to withstand changes in task ordering and the reproducibility of activation patterns over runs, and question how much of the classification machines' power is due to artifactual noise versus genuine neurological signal. To examine the true strength and power of machine learning classifiers we create and then deconstruct a classifier to examine its sensitivity to physiological noise, task reordering, and across-scan classification ability. The models are trained and tested both within and across runs to assess stability and reproducibility across conditions. We demonstrate the use of independent components analysis for both feature extraction and artifact removal and show that removal of such artifacts can reduce predictive accuracy even when data has been cleaned in the preprocessing stages. We demonstrate how mistakes in the feature selection process can cause the cross-validation error seen in publication to be a biased estimate of the testing error seen in practice and measure this bias by purposefully making flawed models. We discuss other ways to introduce bias and the statistical assumptions lying behind the data and model themselves. Finally we discuss the complications in drawing inference from the smaller sample sizes typically seen in fMRI studies, the effects of small or unbalanced samples on the Type 1 and Type 2 error rates, and how publication bias can give a false confidence of the power of such methods. Collectively this work identifies challenges specific to fMRI classification and methods affecting the stability of models. Copyright © 2010 Elsevier Inc. All rights reserved.
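The biased-cross-validation effect described above can be reproduced with a deliberately flawed pipeline: selecting features on the full data set (including the held-out sample) before cross-validation inflates the accuracy estimate even on pure noise. The nearest-centroid classifier and all sizes below are illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, k = 40, 2000, 10            # samples, random features, features kept
X = rng.normal(size=(n, p))       # pure noise: true accuracy is chance
y = np.repeat([0, 1], n // 2)

def top_k(Xtr, ytr, k):
    # Pick the k features most correlated with the labels.
    corr = np.abs((Xtr - Xtr.mean(0)).T @ (ytr - ytr.mean()))
    return np.argsort(corr)[-k:]

def centroid_cv(X, y, select_inside):
    # Leave-one-out CV with a nearest-centroid classifier.
    hits = 0
    outside = top_k(X, y, k)      # flawed: selection has seen the test sample
    for i in range(len(y)):
        tr = np.arange(len(y)) != i
        feats = top_k(X[tr], y[tr], k) if select_inside else outside
        Xtr, Xte = X[tr][:, feats], X[i, feats]
        c0, c1 = Xtr[y[tr] == 0].mean(0), Xtr[y[tr] == 1].mean(0)
        pred = int(np.linalg.norm(Xte - c1) < np.linalg.norm(Xte - c0))
        hits += pred == y[i]
    return hits / len(y)

print("selection outside CV (biased):", centroid_cv(X, y, False))
print("selection inside CV (honest): ", centroid_cv(X, y, True))
```

The honest estimate hovers around chance, while the leaky pipeline reports much higher accuracy, which is exactly the bias the authors measure with purposefully flawed models.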

  7. Data preprocessing methods of FT-NIR spectral data for the classification of cooking oil

    NASA Astrophysics Data System (ADS)

    Ruah, Mas Ezatul Nadia Mohd; Rasaruddin, Nor Fazila; Fong, Sim Siong; Jaafar, Mohd Zuli

    2014-12-01

    This work describes data pre-processing methods for FT-NIR spectroscopy datasets of cooking oil and its quality parameters using chemometric methods. Pre-processing of near-infrared (NIR) spectral data has become an integral part of chemometric modelling. Hence, this work investigates the utility and effectiveness of pre-processing algorithms, namely row scaling, column scaling, and single scaling with Standard Normal Variate (SNV). The combinations of these scaling methods have an impact on exploratory analysis and classification via Principal Component Analysis (PCA) plots. The samples were divided into palm oil and non-palm cooking oil. The classification model was built using FT-NIR cooking oil spectra in absorbance mode in the range of 4000 cm-1 to 14000 cm-1. A Savitzky-Golay derivative was applied before developing the classification model. The data were then separated into a training set and a test set using the Duplex method, with the number in each class kept equal to 2/3 of the class with the minimum number of samples. The t-statistic was employed as a variable selection method to determine which variables are significant for the classification models. The data pre-processing was evaluated using the modified silhouette width (mSW), PCA, and the percentage correctly classified (%CC). The results show that different pre-processing strategies lead to substantial differences in model quality, as indicated by mSW and %CC. With a two-PC model, all five classifiers gave a high %CC except Quadratic Distance Analysis.
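Of the pre-processing steps mentioned, Standard Normal Variate is simple to state: each spectrum is centred and scaled by its own mean and standard deviation, removing multiplicative scatter differences between samples. A minimal sketch (the tiny "spectra" are of course synthetic):

```python
import numpy as np

def snv(spectra):
    # Standard Normal Variate: centre and scale each spectrum (row)
    # by its own mean and standard deviation.
    spectra = np.asarray(spectra, dtype=float)
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, keepdims=True)
    return (spectra - mu) / sd

X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])   # same shape, different scale
Z = snv(X)
print(np.allclose(Z[0], Z[1]))  # True: the scaling difference is removed
```

After SNV, the two rows are identical, which is why such scaling can dominate the appearance of subsequent PCA plots.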

  8. Probabilistic topic modeling for the analysis and classification of genomic sequences

    PubMed Central

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mer representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra-short sequences, and it exhibits a smooth decrease in performance, at every taxonomic level, when the sequence length is decreased. 
PMID:25916734
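The k-mer representation that feeds the topic model can be sketched as a bag-of-words construction: each overlapping k-mer is a "word", and each sequence becomes a count vector over the k-mer vocabulary. The helper names and toy sequence are illustrative; an off-the-shelf LDA implementation would consume the resulting vectors:

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k=3):
    # Represent a DNA sequence as a bag of overlapping k-mers -- the
    # "words" fed to the probabilistic topic model.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def to_vector(counts, k=3):
    # Fixed-order count vector over the 4**k possible k-mers.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(w, 0) for w in vocab]

c = kmer_counts("ACGTACGT", k=3)
print(c["ACG"], c["CGT"], sum(c.values()))  # 2 2 6
```

With k fixed, every sequence of any length maps to a vector of the same dimension, which is what makes the alignment-free comparison possible.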

  9. Investigation of Pharmaceutical Residues in Hospital Effluents, in Ground- and Drinking Water from Bundeswehr Facilities, and their Removal During Drinking Water Purification (Arzneimittelrueckstaende in Trinkwasser(versorgungsanlagen) und Krankenhausabwaessern der Bundeswehr: Methodenentwicklung - Verkommen - Wasseraufbereitung)

    DTIC Science & Technology

    1999-11-01

    Keywords: drinking water processing plant, analysis, calculation model, field experiment. ...sewage effluents and from the sewer of the municipal sewage treatment plant in Berlin-Ruhleben. In the field trials, the MDWPUs that both apply reverse... waste water samples, along the municipal sewer system and in the influents and effluents of the receiving sewage treatment plants. To estimate the...

  10. Pattern-recognition techniques applied to performance monitoring of the DSS 13 34-meter antenna control assembly

    NASA Technical Reports Server (NTRS)

    Mellstrom, J. A.; Smyth, P.

    1991-01-01

    The results of applying pattern recognition techniques to diagnose fault conditions in the pointing system of one of the Deep Space Network's large antennas, the DSS 13 34-meter structure, are discussed. A previous article described an experiment whereby a neural network technique was used to identify fault classes by using data obtained from a simulation model of the Deep Space Network (DSN) 70-meter antenna system. Described here is the extension of these classification techniques to the analysis of real data from the field. The general architecture and philosophy of an autonomous monitoring paradigm are described, and classification results are discussed and analyzed in this context. Key features of this approach include a probabilistic time-varying context model, the effective integration of signal processing and system identification techniques with pattern recognition algorithms, and the ability to calibrate the system given limited amounts of training data. Reported here are recognition accuracies in the 97 to 98 percent range for the particular fault classes included in the experiments.

  11. Penalized gaussian process regression and classification for high-dimensional nonlinear data.

    PubMed

    Yi, G; Shi, J Q; Choi, T

    2011-12-01

    The model based on a Gaussian process (GP) prior and a kernel covariance function can be used to fit nonlinear data with multidimensional covariates. It has been used as a flexible nonparametric approach for curve fitting, classification, clustering, and other statistical problems, and has been widely applied to complex nonlinear systems in many different areas, particularly in machine learning. However, using the model for large-scale and high-dimensional data sets is challenging; for example, the meat data discussed in this article have 100 highly correlated covariates. For such data, the model suffers from large variance in parameter estimation and high predictive errors, and, numerically, from unstable computation. In this article, a penalized likelihood framework will be applied to the GP-based model. Different penalties will be investigated, and their suitability for the characteristics of GP models will be discussed. The asymptotic properties will also be discussed, with the relevant proofs. Several applications to real biomechanical and bioinformatics data sets will be reported. © 2011, The International Biometric Society. No claim to original US government works.
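A minimal GP regression sketch shows where a regularising term enters: the noise variance added to the kernel matrix acts as a ridge-style penalty on the fit. This is standard GP prediction with an assumed squared-exponential kernel, not the article's penalized likelihood framework:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    # Squared-exponential (RBF) kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_predict(Xtr, ytr, Xte, ls=1.0, noise=0.1):
    # Standard GP posterior mean; the noise term added to the diagonal
    # plays the role of a ridge-style regulariser on the kernel matrix.
    K = rbf(Xtr, Xtr, ls) + noise * np.eye(len(Xtr))
    alpha = np.linalg.solve(K, ytr)
    return rbf(Xte, Xtr, ls) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=50)
Xte = np.array([[0.0], [1.5]])
print(gp_predict(X, y, Xte).round(2))  # close to [sin(0), sin(1.5)] = [0, ~1]
```

With 100 highly correlated covariates, as in the meat data, the kernel matrix becomes ill-conditioned, which is the instability the penalized framework targets.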

  12. Single-trial EEG RSVP classification using convolutional neural networks

    NASA Astrophysics Data System (ADS)

    Shamwell, Jared; Lee, Hyungtae; Kwon, Heesung; Marathe, Amar R.; Lawhern, Vernon; Nothwang, William

    2016-05-01

    Traditionally, Brain-Computer Interfaces (BCI) have been explored as a means to return function to paralyzed or otherwise debilitated individuals. An emerging use for BCIs is in human-autonomy sensor fusion where physiological data from healthy subjects is combined with machine-generated information to enhance the capabilities of artificial systems. While human-autonomy fusion of physiological data and computer vision have been shown to improve classification during visual search tasks, to date these approaches have relied on separately trained classification models for each modality. We aim to improve human-autonomy classification performance by developing a single framework that builds codependent models of human electroencephalograph (EEG) and image data to generate fused target estimates. As a first step, we developed a novel convolutional neural network (CNN) architecture and applied it to EEG recordings of subjects classifying target and non-target image presentations during a rapid serial visual presentation (RSVP) image triage task. The low signal-to-noise ratio (SNR) of EEG inherently limits the accuracy of single-trial classification and when combined with the high dimensionality of EEG recordings, extremely large training sets are needed to prevent overfitting and achieve accurate classification from raw EEG data. This paper explores a new deep CNN architecture for generalized multi-class, single-trial EEG classification across subjects. We compare classification performance from the generalized CNN architecture trained across all subjects to the individualized XDAWN, HDCA, and CSP neural classifiers which are trained and tested on single subjects. Preliminary results show that our CNN meets and slightly exceeds the performance of the other classifiers despite being trained across subjects.

  13. Adaptive phase k-means algorithm for waveform classification

    NASA Astrophysics Data System (ADS)

    Song, Chengyun; Liu, Zhining; Wang, Yaojun; Xu, Feng; Li, Xingming; Hu, Guangmin

    2018-01-01

    Waveform classification is a powerful technique for seismic facies analysis that describes the heterogeneity and compartments within a reservoir. Horizon interpretation is a critical step in waveform classification. However, the horizon often produces inconsistent waveform phase, and thus results in an unsatisfactory classification. To alleviate this problem, an adaptive phase waveform classification method, called adaptive phase k-means, is introduced in this paper. Our method improves the traditional k-means algorithm by using an adaptive phase distance as the waveform similarity measure. The proposed distance is a measure with variable phases as it moves from sample to sample along the traces. Model traces are also updated with the best phase interference in the iterative process. Therefore, our method is robust to phase variations caused by the interpretation horizon. We tested the effectiveness of our algorithm by applying it to synthetic and real data. The satisfactory results reveal that the proposed method tolerates certain waveform phase variations and is a good tool for seismic facies analysis.
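The idea of a phase-tolerant waveform distance can be illustrated with a simplified stand-in: take the smallest Euclidean distance over a range of circular phase shifts. The authors' adaptive phase distance is more refined, so treat this only as a sketch:

```python
import numpy as np

def phase_aware_dist(x, y, max_shift=5):
    # Simplified stand-in for an adaptive phase distance: the smallest
    # Euclidean distance over a range of circular shifts of y.
    return min(np.linalg.norm(x - np.roll(y, s))
               for s in range(-max_shift, max_shift + 1))

t = np.linspace(0, 2 * np.pi, 32, endpoint=False)
w1 = np.sin(t)
w2 = np.roll(w1, 3)   # same waveform, shifted phase (a "horizon error")
w3 = np.sin(2 * t)    # genuinely different waveform

print(phase_aware_dist(w1, w2) < 1e-9)  # True: the phase shift is absorbed
print(phase_aware_dist(w1, w3) > 1.0)   # True: the real difference is kept
```

Plugging such a distance into the assignment step of k-means yields clustering that tolerates phase shifts between traces while still separating different waveform shapes.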

  14. A fingerprint classification algorithm based on combination of local and global information

    NASA Astrophysics Data System (ADS)

    Liu, Chongjin; Fu, Xiang; Bian, Junjie; Feng, Jufu

    2011-12-01

    Fingerprint recognition is one of the most important technologies in biometric identification and has been widely applied in commercial and forensic areas. Fingerprint classification, as the fundamental procedure in fingerprint recognition, can sharply decrease the number of candidates for fingerprint matching and improve the efficiency of fingerprint recognition. Most fingerprint classification algorithms are based on the number and position of singular points. Because singular point detection commonly considers only local information, such classification algorithms are sensitive to noise. In this paper, we propose a novel fingerprint classification algorithm combining the local and global information of the fingerprint. First, we use local information to detect singular points and measure their quality, considering orientation structure and image texture in adjacent areas. Furthermore, a global orientation model is adopted to measure the reliability of the singular point group. Finally, the local quality and global reliability are weighted to classify the fingerprint. Experiments demonstrate the accuracy and effectiveness of our algorithm, especially for poor-quality fingerprint images.

  15. Computational approaches for the classification of seed storage proteins.

    PubMed

    Radhika, V; Rao, V Sree Hari

    2015-07-01

    Seed storage proteins comprise a major part of the protein content of the seed and have an important role in seed quality. These storage proteins are important because they determine the total protein content and have an effect on the nutritional quality and functional properties for food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs, and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we have applied machine learning algorithms for the classification of seed storage proteins. We present an algorithm based on a nearest-neighbor approach for the classification of seed storage proteins and compare its performance with the decision tree J48, a multilayer perceptron (MLP) neural network, and the support vector machine (SVM) implementation libSVM. The model based on our algorithm has been able to give higher classification accuracy in comparison to the other methods.

  16. A Just-in-Time Learning based Monitoring and Classification Method for Hyper/Hypocalcemia Diagnosis.

    PubMed

    Peng, Xin; Tang, Yang; He, Wangli; Du, Wenli; Qian, Feng

    2017-01-20

    This study focuses on the classification and pathological status monitoring of hyper/hypocalcemia in the calcium regulatory system. By utilizing an Independent Component Analysis (ICA) mixture model, samples from healthy patients are collected, diagnosed, and subsequently classified according to their underlying behaviors, characteristics, and mechanisms. Then, a Just-in-Time Learning (JITL) approach is employed to estimate the disease status dynamically. Within JITL, to construct an appropriate similarity index for identifying relevant datasets, a novel similarity index based on the ICA mixture model is proposed in this paper to improve online model quality. The validity and effectiveness of the proposed approach have been demonstrated by applying it to the calcium regulatory system under various hypocalcemic and hypercalcemic diseased conditions.

  17. Predicting students' happiness from physiology, phone, mobility, and behavioral data.

    PubMed

    Jaques, Natasha; Taylor, Sara; Azaria, Asaph; Ghandeharioun, Asma; Sano, Akane; Picard, Rosalind

    2015-09-01

    In order to model students' happiness, we apply machine learning methods to data collected from undergrad students monitored over the course of one month each. The data collected include physiological signals, location, smartphone logs, and survey responses to behavioral questions. Each day, participants reported their wellbeing on measures including stress, health, and happiness. Because of the relationship between happiness and depression, modeling happiness may help us to detect individuals who are at risk of depression and guide interventions to help them. We are also interested in how behavioral factors (such as sleep and social activity) affect happiness positively and negatively. A variety of machine learning and feature selection techniques are compared, including Gaussian Mixture Models and ensemble classification. We achieve 70% classification accuracy of self-reported happiness on held-out test data.

  18. SAR-based change detection using hypothesis testing and Markov random field modelling

    NASA Astrophysics Data System (ADS)

    Cao, W.; Martinis, S.

    2015-04-01

    The objective of this study is to automatically detect areas changed by natural disasters from bi-temporal co-registered and calibrated TerraSAR-X data. The technique in this paper consists of two steps. Firstly, an automatic coarse detection step is applied based on a statistical hypothesis test to initialize the classification. The original analytical formula proposed in the constant false alarm rate (CFAR) edge detector is reviewed and rewritten in a compact form using the incomplete beta function, which is a built-in routine in commercial scientific software such as MATLAB and IDL. Secondly, a post-classification step is introduced to optimize the noisy classification result from the previous step. Generally, such an optimization problem can be formulated as a Markov random field (MRF) on which the quality of a classification is measured by an energy function; the optimal classification under the MRF is the one with the lowest energy value. Previous studies provide methods for this optimization problem, such as the iterated conditional modes (ICM) algorithm. Recently, a novel algorithm was presented based on graph-cut theory. This method transforms an MRF to an equivalent graph and solves the optimization problem with a max-flow/min-cut algorithm on the graph. In this study the graph-cut algorithm is applied iteratively to improve the coarse classification. At each iteration the parameters of the energy function for the current classification are set using the logarithmic probability density function (PDF), with the relevant parameters estimated by the method of logarithmic cumulants (MoLC). Experiments are performed on two flood events in Germany and Australia in 2011 and a forest fire on La Palma in 2009, using pre- and post-event TerraSAR-X data. The results show convincing coarse classifications and considerable improvement from the graph-cut post-classification step.
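The ICM algorithm mentioned as a classical alternative to the graph-cut solver can be sketched directly: each pixel is greedily moved to the label minimising its data cost plus a Potts smoothness cost over the 4-neighbourhood. The unary costs and noise model below are illustrative, not the paper's MoLC-estimated PDFs:

```python
import numpy as np

def icm(labels, unary_cost, beta=1.0, iters=5):
    # Iterated conditional modes on a binary Potts MRF: repeatedly set
    # each pixel to the label minimising data cost + beta per
    # disagreeing 4-neighbour.
    lab = labels.copy()
    h, w = lab.shape
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                best, best_e = lab[i, j], np.inf
                for c in (0, 1):
                    e = unary_cost[i, j, c]
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w and lab[ni, nj] != c:
                            e += beta
                    if e < best_e:
                        best, best_e = c, e
                lab[i, j] = best
    return lab

rng = np.random.default_rng(2)
truth = np.zeros((20, 20), dtype=int)
truth[5:15, 5:15] = 1                                  # "changed" region
flips = (rng.random(truth.shape) < 0.15).astype(int)
noisy = truth ^ flips                                  # 15% label noise
# Unary costs simply favour the noisy initial classification.
unary = np.stack([noisy == 1, noisy == 0], axis=-1).astype(float)
clean = icm(noisy, unary, beta=0.8)
print((clean != truth).mean(), "<", (noisy != truth).mean())
```

The smoothness term removes isolated misclassified pixels, which is the same role the iterative graph-cut step plays in the paper, with the graph-cut solver finding the global rather than a local energy minimum.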

  19. Integrated approach using data mining-based decision tree and object-based image analysis for high-resolution urban mapping of WorldView-2 satellite sensor data

    NASA Astrophysics Data System (ADS)

    Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd

    2016-04-01

    This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on OBIA in recent years based on DM for different applications. However, less attention has been paid to the generation of a transferable model for characterizing detailed urban land cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored by using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classifications in the first study area. After this process, we validated the capability and transferability of the classification rules into second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracies, respectively. Results from the investigation indicate that DM was an efficient method to provide the optimal and transferable classification rules for OBIA, which accelerates the rule-sets creation stage in the OBIA classification domain.

  20. Automated morphological analysis of bone marrow cells in microscopic images for diagnosis of leukemia: nucleus-plasma separation and cell classification using a hierarchical tree model of hematopoiesis

    NASA Astrophysics Data System (ADS)

    Krappe, Sebastian; Wittenberg, Thomas; Haferlach, Torsten; Münzenmayer, Christian

    2016-03-01

    The morphological differentiation of bone marrow is fundamental for the diagnosis of leukemia. Currently, the counting and classification of the different types of bone marrow cells is done manually using bright-field microscopy. This is a time-consuming, subjective, tedious and error-prone process. Furthermore, repeated examinations of a slide may yield intra- and inter-observer variances. For that reason, a computer-assisted diagnosis system for bone marrow differentiation is pursued. In this work we focus (a) on a new method for the separation of nucleus and plasma parts and (b) on a knowledge-based hierarchical tree classifier for the differentiation of bone marrow cells into 16 different classes. Classification trees are easily interpretable and understandable and provide a classification together with an explanation. Using classification trees, expert knowledge (i.e. knowledge about similar classes and cell lines in the tree model of hematopoiesis) is integrated in the structure of the tree. The proposed segmentation method is evaluated with more than 10,000 manually segmented cells. For the evaluation of the proposed hierarchical classifier, more than 140,000 automatically segmented bone marrow cells are used. Future automated solutions for the morphological analysis of bone marrow smears could potentially apply such an approach for the pre-classification of bone marrow cells, thereby shortening the examination time.

  1. Joint Feature Selection and Classification for Multilabel Learning.

    PubMed

    Huang, Jun; Li, Guorong; Huang, Qingming; Wu, Xindong

    2018-03-01

    Multilabel learning deals with examples having multiple class labels simultaneously. It has been applied to a variety of applications, such as text categorization and image annotation. A large number of algorithms have been proposed for multilabel learning, most of which concentrate on multilabel classification problems and only a few of them are feature selection algorithms. Current multilabel classification models are mainly built on a single data representation composed of all the features which are shared by all the class labels. Since each class label might be decided by some specific features of its own, and the problems of classification and feature selection are often addressed independently, in this paper, we propose a novel method which can perform joint feature selection and classification for multilabel learning, named JFSC. Different from many existing methods, JFSC learns both shared features and label-specific features by considering pairwise label correlations, and builds the multilabel classifier on the learned low-dimensional data representations simultaneously. A comparative study with state-of-the-art approaches manifests a competitive performance of our proposed method both in classification and feature selection for multilabel learning.

  2. An application to pulmonary emphysema classification based on model of texton learning by sparse representation

    NASA Astrophysics Data System (ADS)

    Zhang, Min; Zhou, Xiangrong; Goshima, Satoshi; Chen, Huayue; Muramatsu, Chisako; Hara, Takeshi; Yokoyama, Ryojiro; Kanematsu, Masayuki; Fujita, Hiroshi

    2012-03-01

    We aim to use a new texton-based texture classification method for the classification of pulmonary emphysema in computed tomography (CT) images of the lungs. Different from conventional computer-aided diagnosis (CAD) methods for pulmonary emphysema classification, in this paper, firstly, a dictionary of textons is learned by applying sparse representation (SR) to image patches in the training dataset. Then the SR coefficients of the test images over the dictionary are used to construct histograms for texture representation. Finally, classification is performed by using a nearest neighbor classifier with a histogram dissimilarity measure as distance. The proposed approach is tested on 3840 annotated regions of interest consisting of normal tissue and mild, moderate and severe pulmonary emphysema of three subtypes. The performance of the proposed system, with an accuracy of about 88%, is higher than that of the state-of-the-art method based on basic rotation-invariant local binary pattern histograms and that of the texture classification method based on texton learning by k-means, which performs almost the best among other approaches in the literature.
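The final nearest-neighbor step over texton histograms can be sketched with a chi-squared dissimilarity, one common histogram measure; the abstract does not name the exact measure used, and the three-bin histograms below are purely illustrative:

```python
import numpy as np

def chi2(h, g, eps=1e-12):
    # Chi-squared histogram dissimilarity, a common choice for
    # comparing texton histograms.
    return 0.5 * np.sum((h - g) ** 2 / (h + g + eps))

def nn_classify(hist, train_hists, train_labels):
    # Nearest-neighbour classification in histogram space.
    d = [chi2(hist, t) for t in train_hists]
    return train_labels[int(np.argmin(d))]

train = np.array([[0.7, 0.2, 0.1],    # hypothetical "normal" histogram
                  [0.1, 0.3, 0.6]])   # hypothetical "emphysema" histogram
labels = ["normal", "emphysema"]
print(nn_classify(np.array([0.6, 0.3, 0.1]), train, labels))  # normal
```

In the paper's pipeline, the histograms would be built from SR coefficients over the learned texton dictionary rather than written out by hand.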

  3. Model selection for anomaly detection

    NASA Astrophysics Data System (ADS)

    Burnaev, E.; Erofeev, P.; Smolyakov, D.

    2015-12-01

    Anomaly detection based on one-class classification algorithms is broadly used in many applied domains, such as image processing (e.g. deciding whether a patient is "cancerous" or "healthy" from a mammography image) and network intrusion detection. The performance of an anomaly detection algorithm crucially depends on the kernel used to measure similarity in the feature space. The standard approaches to kernel selection used in two-class classification problems (e.g. cross-validation) cannot be applied directly because of the specific nature of the data: there is no data from a second, abnormal class. In this paper we generalize several kernel selection methods from the binary-class case to the one-class case and perform an extensive comparison of these approaches on both synthetic and real-world data.
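
With no abnormal class to validate against, one generic stand-in for kernel-width selection is to score each candidate bandwidth by the leave-one-out log-likelihood of a Gaussian-kernel density model of the normal class. This is an illustrative surrogate of our own, not one of the paper's specific methods:

```python
import numpy as np

def loo_log_likelihood(X, h):
    """Leave-one-out log-likelihood of a Gaussian kernel density model
    of the normal class, used as a one-class selection criterion."""
    n, d = X.shape
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h * h))
    np.fill_diagonal(K, 0.0)                      # leave each point out
    dens = K.sum(1) / ((n - 1) * (2 * np.pi * h * h) ** (d / 2))
    return np.log(dens + 1e-300).sum()

def select_bandwidth(X, grid):
    """Pick the kernel width maximizing the LOO criterion."""
    return max(grid, key=lambda h: loo_log_likelihood(X, h))
```

Too small a width overfits (near-zero LOO density), too large a width over-smooths, so an intermediate value wins.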

  4. Classification of wheat: Badhwar profile similarity technique

    NASA Technical Reports Server (NTRS)

    Austin, W. W.

    1980-01-01

    The Badhwar profile similarity classification technique, used successfully for the classification of corn, was applied to spring wheat classifications. The software programs and procedures used to generate full-scene classifications are presented, and numerical results of the acreage estimations are given.

  5. Classification and recognition of dynamical models: the role of phase, independent components, kernels and optimal transport.

    PubMed

    Bissacco, Alessandro; Chiuso, Alessandro; Soatto, Stefano

    2007-11-01

    We address the problem of performing decision tasks, and in particular classification and recognition, in the space of dynamical models in order to compare time series of data. Motivated by the application of recognition of human motion in image sequences, we consider a class of models that include linear dynamics, both stable and marginally stable (periodic), both minimum and non-minimum phase, driven by non-Gaussian processes. This requires extending existing learning and system identification algorithms to handle periodic modes and nonminimum phase behavior, while taking into account higher-order statistics of the data. Once a model is identified, we define a kernel-based cord distance between models that includes their dynamics, their initial conditions as well as input distribution. This is made possible by a novel kernel defined between two arbitrary (non-Gaussian) distributions, which is computed by efficiently solving an optimal transport problem. We validate our choice of models, inference algorithm, and distance on the tasks of human motion synthesis (sample paths of the learned models), and recognition (nearest-neighbor classification in the computed distance). However, our work can be applied more broadly where one needs to compare historical data while taking into account periodic trends, non-minimum phase behavior, and non-Gaussian input distributions.
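
The optimal transport computation at the heart of the kernel has a simple closed form in one dimension, where the optimal coupling is the monotone (sorted) matching; a sketch for equal-sized empirical samples:

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-D optimal transport (Wasserstein-1) distance between
    two equally sized samples: sort both and average the pairwise gaps.
    In 1-D the optimal coupling is the monotone (sorted) matching."""
    return np.abs(np.sort(a) - np.sort(b)).mean()
```

A kernel between the two distributions can then be built as, e.g., `exp(-wasserstein_1d(a, b))`; the general multivariate case requires solving the transport problem numerically.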

  6. Classification model based on Raman spectra of selected morphological and biochemical tissue constituents for identification of atherosclerosis in human coronary arteries.

    PubMed

    Peres, Marines Bertolo; Silveira, Landulfo; Zângaro, Renato Amaro; Pacheco, Marcos Tadeu Tavares; Pasqualucci, Carlos Augusto

    2011-09-01

    This study presents the results of Raman spectroscopy applied to the classification of arterial tissue, based on a simplified model using basal morphological and biochemical information extracted from the Raman spectra of arteries. The Raman system uses an 830-nm diode laser, an imaging spectrograph, and a CCD camera. A total of 111 Raman spectra from arterial fragments were used to develop the model; these spectra were compared to the spectra of collagen, fat cells, smooth muscle cells, calcification, and cholesterol in a linear fit model. Non-atherosclerotic (NA), fatty and fibrous-fatty atherosclerotic plaque (A), and calcified (C) arteries exhibited different spectral signatures related to the different morphological structures present in each tissue type. Discriminant analysis based on the Mahalanobis distance was employed to classify the tissue type with respect to the relative intensity of each compound. The model was subsequently tested prospectively on a set of 55 spectra. The simplified diagnostic model showed that cholesterol, collagen, and adipocytes were the tissue constituents that gave the best classification capability, and that these changes correlated with histopathology. The simplified model, using spectra obtained from a few tissue morphological and biochemical constituents, proved feasible with a small number of variables easily extracted from gross samples.
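
The Mahalanobis-distance discriminant rule used for tissue classification can be sketched as follows (the pooled covariance and the feature layout are illustrative assumptions):

```python
import numpy as np

def mahalanobis_classify(x, class_means, cov):
    """Assign the feature vector x (e.g. relative intensities of the
    fitted constituents) to the class whose mean is nearest in
    Mahalanobis distance under a pooled covariance."""
    icov = np.linalg.inv(cov)
    d = [np.sqrt((x - m) @ icov @ (x - m)) for m in class_means]
    return int(np.argmin(d))
```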

  7. A thyroid nodule classification method based on TI-RADS

    NASA Astrophysics Data System (ADS)

    Wang, Hao; Yang, Yang; Peng, Bo; Chen, Qin

    2017-07-01

    The Thyroid Imaging Reporting and Data System (TI-RADS) is a valuable tool for differentiating benign from malignant thyroid nodules. In clinical practice, doctors use TI-RADS to grade the degree of benignity or malignancy of a nodule in terms of discrete classes. As a classification standard, TI-RADS can guide the ultrasound examiner in assessing thyroid nodules more accurately and reliably. In this paper, we aim to classify thyroid nodules with the help of TI-RADS. To this end, four ultrasound signs, i.e., cystic/solid composition, echo pattern, boundary features, and calcification of thyroid nodules, are extracted and converted into feature vectors. A semi-supervised fuzzy C-means ensemble (SS-FCME) model is then applied to obtain the classification results. The experimental results demonstrate that the proposed method can help doctors diagnose thyroid nodules effectively.
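
The plain, unsupervised fuzzy C-means update at the core of the SS-FCME ensemble can be sketched as follows (the semi-supervision and ensembling layers are omitted; all names are ours):

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means: alternate between fuzzily weighted centers
    and the standard membership update. Returns soft memberships U and
    the cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1))
        U = inv / inv.sum(1, keepdims=True)
    return U, centers
```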

  8. A modified method for MRF segmentation and bias correction of MR image with intensity inhomogeneity.

    PubMed

    Xie, Mei; Gao, Jingjing; Zhu, Chongjin; Zhou, Yan

    2015-01-01

    The Markov random field (MRF) model is an effective method for brain tissue classification and has been applied in MR image segmentation for decades. However, it falls short of the expected classification accuracy in MR images with intensity inhomogeneity, because the bias field is not considered in the formulation. In this paper, we propose an interleaved method that joins a modified MRF classification and bias field estimation in an energy minimization framework, with an initial estimate based on the k-means algorithm in view of prior information on MRI. The proposed method has the salient advantage of overcoming the misclassifications produced by non-interleaved MRF classification on MR images with intensity inhomogeneity. Experimental results on real and synthetic MR images demonstrate the effectiveness and advantages of our algorithm relative to baseline methods.
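
The k-means initialization the interleaved scheme starts from can be illustrated on raw voxel intensities (a 1-D sketch; the actual method operates on images with an MRF prior and a bias-field term):

```python
import numpy as np

def kmeans_1d(intensities, k=3, iters=20):
    """Plain k-means on voxel intensities: the initial tissue labelling
    from which the interleaved MRF / bias-field estimation proceeds."""
    c = np.linspace(intensities.min(), intensities.max(), k)
    for _ in range(iters):
        lab = np.abs(intensities[:, None] - c[None, :]).argmin(1)
        for j in range(k):
            if (lab == j).any():
                c[j] = intensities[lab == j].mean()
    return lab, c
```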

  9. Rank preserving sparse learning for Kinect based scene classification.

    PubMed

    Tao, Dapeng; Jin, Lianwen; Yang, Zhao; Li, Xuelong

    2013-10-01

    With the rapid development of RGB-D sensors and the promptly growing population of the low-cost Microsoft Kinect sensor, scene classification, a hard yet important problem in computer vision, has recently gained a resurgence of interest. That is because the depth information provided by the Kinect sensor opens an effective and innovative way for scene classification. In this paper, we propose a new scheme for scene classification, which applies locality-constrained linear coding (LLC) to local SIFT features for representing the RGB-D samples and classifies scenes through the cooperation between a new rank preserving sparse learning (RPSL) based dimension reduction and a simple classification method. RPSL considers four aspects: 1) it preserves the rank-order information of the within-class samples in a local patch; 2) it maximizes the margin between the between-class samples on the local patch; 3) an L1-norm penalty is introduced to obtain the parsimony property; and 4) it models classification error minimization via least-squares error minimization. Experiments conducted on the NYU Depth V1 dataset demonstrate the robustness and effectiveness of RPSL for scene classification.

  10. Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework.

    PubMed

    Yang, Lingjian; Ainali, Chrysanthi; Tsoka, Sophia; Papageorgiou, Lazaros G

    2014-12-05

    Applying machine learning methods on microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches where expression of genes were treated independently suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in literature are either unsupervised or applied on two-class datasets, there is good scope to address such limitations by proposing novel methodologies. A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of expression of its constituent genes. Gene weights are determined by the optimisation model, in a way that the resulting pathway activity has the optimal discriminative power with regards to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. 
Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
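
As a simple stand-in for the mathematical-programming weights, the "weighted linear summation" idea can be illustrated with Fisher's linear discriminant direction, which likewise yields one discriminative activity score per sample (the function below is our sketch, not the paper's optimisation model, and handles only two phenotypes):

```python
import numpy as np

def pathway_activity(X, y):
    """Collapse a pathway's gene-expression block X (samples x genes)
    into one activity score per sample via a weighted linear summation.
    Weights here are Fisher's discriminant direction; the paper instead
    learns them with a mathematical programming model."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)
    return X @ w
```

A downstream classifier then operates on the low-dimensional activity profile instead of the full gene set.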

  11. Bayes classification of interferometric TOPSAR data

    NASA Technical Reports Server (NTRS)

    Michel, T. R.; Rodriguez, E.; Houshmand, B.; Carande, R.

    1995-01-01

    We report the Bayes classification of terrain types at different sites using airborne interferometric synthetic aperture radar (INSAR) data. A Gaussian maximum likelihood classifier was applied to multidimensional observations derived from the SAR intensity, the terrain elevation model, and the magnitude of the interferometric correlation. Training sets for forested, urban, agricultural, or bare areas were obtained either by selecting samples with known ground truth, or by k-means clustering of random sets of samples uniformly distributed across all sites and subsequent assignment of these clusters using ground truth. The accuracy of the classifier was used to optimize the discriminating efficiency of the chosen feature set. The most important features include the SAR intensity, a canopy penetration depth model, and the terrain slope. We demonstrate the classifier's performance across sites using a unique set of training classes for the four main terrain categories. The scenes examined include San Francisco (CA) (predominantly urban and water), Mount Adams (WA) (forested with clear cuts), Pasadena (CA) (urban with mountains), and Antioch Hills (CA) (water, swamps, fields). Issues related to the effects of image calibration and the robustness of the classification to calibration errors are explored. The relative performance of single-polarization interferometric data classification is contrasted with classification schemes based on polarimetric SAR data.
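
The Gaussian maximum likelihood rule applied to the multidimensional observations can be sketched as follows (per-class means and covariances are assumed to have been fitted to the training pixels):

```python
import numpy as np

def gaussian_ml_classify(x, means, covs):
    """Gaussian maximum-likelihood rule: pick the class maximizing the
    log-density of the observation vector x (e.g. SAR intensity,
    elevation, correlation magnitude) under a per-class normal model."""
    scores = []
    for m, S in zip(means, covs):
        diff = x - m
        _, logdet = np.linalg.slogdet(S)
        scores.append(-0.5 * (diff @ np.linalg.solve(S, diff) + logdet))
    return int(np.argmax(scores))
```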

  12. Conditional Density Estimation with HMM Based Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Hu, Fasheng; Liu, Zhenqiu; Jia, Chunxin; Chen, Dechang

    Conditional density estimation is very important in financial engineer, risk management, and other engineering computing problem. However, most regression models have a latent assumption that the probability density is a Gaussian distribution, which is not necessarily true in many real life applications. In this paper, we give a framework to estimate or predict the conditional density mixture dynamically. Through combining the Input-Output HMM with SVM regression together and building a SVM model in each state of the HMM, we can estimate a conditional density mixture instead of a single gaussian. With each SVM in each node, this model can be applied for not only regression but classifications as well. We applied this model to denoise the ECG data. The proposed method has the potential to apply to other time series such as stock market return predictions.

  13. Differences in chewing sounds of dry-crisp snacks by multivariate data analysis

    NASA Astrophysics Data System (ADS)

    De Belie, N.; Sivertsvik, M.; De Baerdemaeker, J.

    2003-09-01

    Chewing sounds of different types of dry-crisp snacks (two types of potato chips, prawn crackers, cornflakes and low-calorie snacks from extruded starch) were analysed to assess differences in sound emission patterns. The emitted sounds were recorded by a microphone placed over the ear canal. The first bite and the first subsequent chew were selected from the time signal, and a fast Fourier transformation provided the power spectra. Different multivariate analysis techniques were used for classification of the snack groups, including principal component analysis (PCA) and unfold partial least-squares (PLS) algorithms, as well as multi-way techniques such as three-way PLS, three-way PCA (Tucker3), and parallel factor analysis (PARAFAC) on the first bite and subsequent chew. The models were evaluated by calculating the classification errors and the root mean square error of prediction (RMSEP) for independent validation sets. It appeared that the logarithm of the power spectra obtained from the chewing sounds could be used successfully to distinguish the different snack groups. When different chewers were used, recalibration of the models was necessary. Multi-way models distinguished between the chewing sounds of different snack groups better than PCA on the bite or chew separately and than unfold PLS. Of all the three-way models applied, N-PLS with three components showed the best classification capability, with classification errors of 14-18%. Most of the incorrect classifications were due to one type of potato chips with a very irregular shape, which caused a wide variation in the emitted sounds.
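
The feature extraction step described above (FFT of a bite or chew segment, then the logarithm of the power spectrum) can be sketched as:

```python
import numpy as np

def log_power_spectrum(signal):
    """Log power spectrum of one bite/chew segment via the FFT: the
    feature on which the multivariate models (PCA, N-PLS, ...) are built."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    return np.log10(spec + 1e-12)   # small offset avoids log(0)
```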

  14. Pattern recognition analysis and classification modeling of selenium-producing areas

    USGS Publications Warehouse

    Naftz, D.L.

    1996-01-01

    Established chemometric and geochemical techniques were applied to water quality data from 23 National Irrigation Water Quality Program (NIWQP) study areas in the Western United States. These techniques were applied to the NIWQP data set to identify common geochemical processes responsible for mobilization of selenium and to develop a classification model that uses major-ion concentrations to identify areas containing elevated selenium concentrations in water that could pose a hazard to waterfowl. Pattern recognition modeling of the simple-salt data computed with the SNORM geochemical program indicates three principal components that explain 95% of the total variance. A three-dimensional plot of PC 1, 2 and 3 scores shows three distinct clusters that correspond to distinct hydrochemical facies, denoted facies 1, 2 and 3. Facies 1 samples are distinguished by water samples without the CaCO3 simple salt and elevated concentrations of NaCl, CaSO4, MgSO4 and Na2SO4 simple salts relative to water samples in facies 2 and 3. Water samples in facies 2 are distinguished from facies 1 by the absence of the MgSO4 simple salt and the presence of the CaCO3 simple salt. Water samples in facies 3 are similar to samples in facies 2, with the absence of both MgSO4 and CaSO4 simple salts. Water samples in facies 1 have the largest selenium concentrations (10 µg l-1), compared to a median concentration of 2.0 µg l-1 and less than 1.0 µg l-1 for samples in facies 2 and 3. A classification model using the soft independent modeling by class analogy (SIMCA) algorithm was constructed with data from the NIWQP study areas. The classification model was successful in identifying water samples with a selenium concentration hazardous to some species of waterfowl from a test data set comprising 2,060 water samples from throughout Utah and Wyoming. 
Application of chemometric and geochemical techniques during data synthesis and analysis of multivariate environmental databases from other national-scale environmental programs such as the NIWQP could also provide useful insights for addressing 'real world' environmental problems.
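
The pattern-recognition step (principal components and the fraction of variance they explain, used above to reveal the three facies clusters) can be sketched via the SVD:

```python
import numpy as np

def pca_scores(X, ncomp=3):
    """PC scores and the fraction of total variance each component
    explains, computed from the SVD of the mean-centred data."""
    Xc = X - X.mean(0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (s ** 2).sum()
    return Xc @ Vt[:ncomp].T, var[:ncomp]
```

Plotting the first three score columns against each other is what produces the cluster plot described in the abstract.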

  15. Classifying clinical decision making: a unifying approach.

    PubMed

    Buckingham, C D; Adams, A

    2000-10-01

    This is the first of two linked papers exploring decision making in nursing which integrate research evidence from different clinical and academic disciplines. Currently there are many decision-making theories, each with their own distinctive concepts and terminology, and there is a tendency for separate disciplines to view their own decision-making processes as unique. Identifying good nursing decisions and where improvements can be made is therefore problematic, and this can undermine clinical and organizational effectiveness, as well as nurses' professional status. Within the unifying framework of psychological classification, the overall aim of the two papers is to clarify and compare terms, concepts and processes identified in a diversity of decision-making theories, and to demonstrate their underlying similarities. It is argued that the range of explanations used across disciplines can usefully be re-conceptualized as classification behaviour. This paper explores problems arising from multiple theories of decision making being applied to separate clinical disciplines. Attention is given to detrimental effects on nursing practice within the context of multidisciplinary health-care organizations and the changing role of nurses. The different theories are outlined and difficulties in applying them to nursing decisions highlighted. An alternative approach based on a general model of classification is then presented in detail to introduce its terminology and the unifying framework for interpreting all types of decisions. The classification model is used to provide the context for relating alternative philosophical approaches and to define decision-making activities common to all clinical domains. This may benefit nurses by improving multidisciplinary collaboration and weakening clinical elitism.

  16. Solid phase excitation-emission fluorescence method for the classification of complex substances: Cortex Phellodendri and other traditional Chinese medicines as examples.

    PubMed

    Gu, Yao; Ni, Yongnian; Kokot, Serge

    2012-09-13

    A novel, simple and direct fluorescence method for analysis of complex substances and their potential substitutes has been researched and developed. Measurements involved excitation and emission (EEM) fluorescence spectra of powdered, complex, medicinal herbs, Cortex Phellodendri Chinensis (CPC) and the similar Cortex Phellodendri Amurensis (CPA); these substances were compared and discriminated from each other and the potentially adulterated samples (Caulis mahoniae (CM) and David poplar bark (DPB)). Different chemometrics methods were applied for resolution of the complex spectra, and the excitation spectra were found to be the most informative; only the rank-ordering PROMETHEE method was able to classify the samples with single ingredients (CPA, CPC, CM) or those with binary mixtures (CPA/CPC, CPA/CM, CPC/CM). Interestingly, it was essential to use the geometrical analysis for interactive aid (GAIA) display for a full understanding of the classification results. However, these two methods, like the other chemometrics models, were unable to classify composite spectral matrices consisting of data from samples of single ingredients and binary mixtures; this suggested that the excitation spectra of the different samples were very similar. However, the method is useful for classification of single-ingredient samples and, separately, their binary mixtures; it may also be applied for similar classification work with other complex substances.

  17. Using remote sensing in support of environmental management: A framework for selecting products, algorithms and methods.

    PubMed

    de Klerk, Helen M; Gilbertson, Jason; Lück-Vogel, Melanie; Kemp, Jaco; Munch, Zahn

    2016-11-01

    Traditionally, to map environmental features using remote sensing, practitioners will use training data to develop models on various satellite data sets using a number of classification approaches, and use test data to select a single 'best performer' from which the final map is made. We use an omission/commission plot to evaluate the various results and compile a probability map based on consistently strong-performing models across a range of standard accuracy measures. We suggest that this easy-to-use approach can be applied in any study using remote sensing to map natural features for management action. We demonstrate this approach using optical remote sensing products of different spatial and spectral resolution to map the endemic and threatened flora of quartz patches in the Knersvlakte, South Africa. Quartz patches can be mapped using either SPOT 5 (used for its relatively fine spatial resolution) or Landsat 8 imagery (used because it is freely accessible and has higher spectral resolution). Of the variety of classification algorithms available, we tested maximum likelihood and support vector machine classifiers, and applied these to raw spectral data, the first three PCA summaries of the data, and the standard normalised difference vegetation index. We found that there is no 'one size fits all' solution to the choice of a 'best fit' model (i.e. combination of classification algorithm and data set), which is in agreement with the literature that classifier performance varies with data properties. We feel this lends support to our suggestion that, rather than identifying a 'single best' model and basing the map on this result alone, a probability map based on the range of consistently top-performing models provides a rigorous solution to environmental mapping.
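
Of the inputs tested, the normalised difference vegetation index has a standard closed form (band names are generic; the small epsilon guarding against zero denominators is our addition):

```python
import numpy as np

def ndvi(nir, red):
    """Standard normalised difference vegetation index, one of the
    inputs tested alongside raw bands and PCA summaries."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red + 1e-12)
```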

  18. High-throughput screening of chemicals as functional ...

    EPA Pesticide Factsheets

    Identifying chemicals that provide a specific function within a product, yet have minimal impact on the human body or environment, is the goal of most formulation chemists and engineers practicing green chemistry. We present a methodology to identify potential chemical functional substitutes from large libraries of chemicals using machine learning based models. We collect and analyze publicly available information on the function of chemicals in consumer products or industrial processes to identify a suite of harmonized function categories suitable for modeling. We use structural and physicochemical descriptors for these chemicals to build 41 quantitative structure–use relationship (QSUR) models for harmonized function categories using random forest classification. We apply these models to screen a library of nearly 6400 chemicals with available structure information for potential functional substitutes. Using our Functional Use database (FUse), we could identify uses for 3121 chemicals; 4412 predicted functional uses had a probability of 80% or greater. We demonstrate the potential application of the models to high-throughput (HT) screening for “candidate alternatives” by merging the valid functional substitute classifications with hazard metrics developed from HT screening assays for bioactivity. A descriptor set could be obtained for 6356 Tox21 chemicals that have undergone a battery of HT in vitro bioactivity screening assays. By applying QSURs, we wer

  19. GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

    PubMed

    Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann

    2012-09-24

    Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is no single modeling approach that can be successfully applied to the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm on five data sets from the literature and found that it yields classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it provides highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple, since they consist of a weighted sum of the output of single-feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
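
The final models described above are "a weighted sum of the output of single feature classifiers"; a minimal Adaboost over single-feature threshold stumps shows that structure (brute-force stump search, illustrative only, with labels in {-1, +1}):

```python
import numpy as np

def train_adaboost(X, y, rounds=10):
    """Adaboost with single-feature threshold classifiers ('stumps').
    Each round picks the stump with the lowest weighted error, then
    reweights the samples toward the mistakes."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = min(max(err, 1e-12), 1 - 1e-12)      # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def adaboost_predict(ensemble, X):
    """Sign of the weighted sum of stump outputs."""
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1)
                for a, j, t, s in ensemble)
    return np.sign(score)
```

In GA(M)E-QSAR the Genetic Algorithm additionally searches over which features the stumps may use.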

  20. Classification without labels: learning from mixed samples in high energy physics

    NASA Astrophysics Data System (ADS)

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    2017-10-01

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.
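
The CWoLa idea can be demonstrated with any classifier: train on mixture labels, then check that the learned score still orders the pure classes correctly. Below, a tiny gradient-descent logistic regression stands in for the classifier (all data and names are illustrative):

```python
import numpy as np

def train_logreg(X, y, lr=0.1, iters=2000):
    """Tiny logistic regression by gradient descent. Crucially, y labels
    the *mixture* each event came from, not its true class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def score(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w
```

Trained only to tell a signal-poor mixture from a signal-rich one, the score nevertheless ranks pure signal above pure background, which is the content of the CWoLa optimality result.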

  1. Classification without labels: learning from mixed samples in high energy physics

    DOE PAGES

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    2017-10-25

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.

  2. Classification without labels: learning from mixed samples in high energy physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse

    Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.

  3. Sample classification for improved performance of PLS models applied to the quality control of deep-frying oils of different botanic origins analyzed using ATR-FTIR spectroscopy.

    PubMed

    Kuligowski, Julia; Carrión, David; Quintás, Guillermo; Garrigues, Salvador; de la Guardia, Miguel

    2011-01-01

    The selection of an appropriate calibration set is a critical step in multivariate method development. In this work, the effect of using different calibration sets, based on a previous classification of unknown samples, on partial least squares (PLS) regression model performance is discussed. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil), with increasing polymerized triacylglyceride (PTG) content induced by a deep-frying process, were employed. The use of a one-class-classifier partial least squares-discriminant analysis (PLS-DA) and a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independent of their PTG content; class separation of oil samples fried with foodstuff was less evident. The combined use of double cross-validation with permutation testing was used to validate the PLS-DA classification models, confirming the results. To assess the usefulness of selecting an appropriate PLS calibration set, the PTG content was determined with PLS models calculated on the previously selected classes. In comparison to a PLS model calculated on a pooled calibration set containing samples from all classes, the root mean square error of prediction improved significantly with the PLS-DA-selected calibration sets, ranging between 1.06 and 2.91% (w/w).
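
The PLS backbone shared by the PLS-DA classifier and the PTG calibration can be sketched with a bare-bones NIPALS PLS1 (mean-centring only; no preprocessing, validation, or class-based calibration-set selection):

```python
import numpy as np

def pls1(X, y, ncomp):
    """Bare-bones NIPALS PLS1 regression on mean-centred data.
    Returns a predict(Xnew) closure. Sketch only."""
    xm, ym = X.mean(0), y.mean()
    Xc, yc = X - xm, y - ym
    W, P, Q = [], [], []
    for _ in range(ncomp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)          # weight vector
        t = Xc @ w                         # scores
        tt = t @ t
        p = Xc.T @ t / tt                  # X loadings
        q = (yc @ t) / tt                  # y loading
        Xc = Xc - np.outer(t, p)           # deflate
        yc = yc - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)    # regression coefficients
    return lambda Xn: (Xn - xm) @ B + ym

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Fitting separate `pls1` models on class-specific calibration sets, as the paper does after PLS-DA, is what lowers the RMSEP relative to a pooled model.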

  4. Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification.

    PubMed

    Spinnato, J; Roubaud, M-C; Burle, B; Torrésani, B

    2015-06-01

    The main goal of this work is to develop a model for multisensor signals, such as magnetoencephalography or electroencephalography (EEG) signals, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle small and unbalanced datasets, as often encountered in BCI-type experiments. The method involves a linear mixed effects statistical model, the wavelet transform, and spatial filtering, and aims at the characterization of localized discriminant features in multisensor signals. After discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial channel subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e., discriminant) and background noise, using a very simple Gaussian linear mixed model. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class-covariance matrices are obtained from small sample sizes and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data in a very unbalanced situation (detection of rare events). Classification results prove the relevance of the proposed approach in such a context. The combination of the linear mixed model, wavelet transform, and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which is proven to be effective. This paper improves upon earlier results on similar problems, and the three main ingredients all play an important role.

  5. Classification of M1/M2-polarized human macrophages by label-free hyperspectral reflectance confocal microscopy and multivariate analysis.

    PubMed

    Bertani, Francesca R; Mozetic, Pamela; Fioramonti, Marco; Iuliani, Michele; Ribelli, Giulia; Pantano, Francesco; Santini, Daniele; Tonini, Giuseppe; Trombetta, Marcella; Businaro, Luca; Selci, Stefano; Rainer, Alberto

    2017-08-21

    The possibility of detecting and classifying living cells in a label-free and non-invasive manner holds significant theranostic potential. In this work, Hyperspectral Imaging (HSI) has been successfully applied to the analysis of macrophagic polarization, given its central role in several pathological settings, including the regulation of the tumour microenvironment. Human monocyte-derived macrophages have been investigated using hyperspectral reflectance confocal microscopy, and hyperspectral datasets have been analysed in terms of M1 vs. M2 polarization by Principal Components Analysis (PCA). Following PCA, Linear Discriminant Analysis has been implemented for semi-automatic classification of macrophagic polarization from HSI data. Our results confirm the possibility of performing single-cell-level in vitro classification of M1 vs. M2 macrophages in a non-invasive and label-free manner with high accuracy (above 98% for cells deriving from the same donor), supporting the idea of applying the technique to the study of complex interacting cellular systems, such as in the case of tumour-immunity in vitro models.
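
    The PCA-then-LDA pipeline described above can be sketched on synthetic "spectra" (hypothetical band values with an arbitrary polarization shift, not the authors' confocal data; scikit-learn assumed):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)

def cells(shift, n=200, bands=50):
    """Hypothetical per-cell reflectance spectra with a polarization shift."""
    return np.linspace(0.2, 0.8, bands) + shift + 0.1 * rng.normal(size=(n, bands))

X = np.vstack([cells(0.0), cells(0.3)])   # "M1" vs "M2" spectra
y = np.repeat([0, 1], 200)

# PCA for dimensionality reduction, then LDA for classification.
model = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis()).fit(X, y)
acc = model.score(X, y)
```

    PCA removes the redundant band-to-band correlation before LDA fits a single discriminant direction, which is the same division of labour used in the paper.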

  6. Refining Time-Activity Classification of Human Subjects Using the Global Positioning System

    PubMed Central

    Hu, Maogui; Li, Wei; Li, Lianfa; Houston, Douglas; Wu, Jun

    2016-01-01

    Background Detailed spatial location information is important in accurately estimating personal exposure to air pollution. The Global Positioning System (GPS) has been widely used in tracking personal paths and activities. Previous researchers have developed time-activity classification models based on GPS data, but most of them were developed for specific regions. An adaptive model for time-location classification could be widely applied to air pollution studies that use GPS to track individual-level time-activity patterns. Methods Time-activity data were collected for seven days using GPS loggers and accelerometers from thirteen adult participants from Southern California under free-living conditions. We developed an automated model based on random forests to classify major time-activity patterns (i.e., indoor, outdoor-static, outdoor-walking, and in-vehicle travel). Sensitivity analysis was conducted to examine the contribution of the accelerometer data and the supplemental spatial data (i.e., roadway and tax parcel data) to the accuracy of time-activity classification. Our model was evaluated using both leave-one-fold-out and leave-one-subject-out methods. Results Maximum speeds in averaging time intervals of 7 and 5 minutes, and distance to primary highways with limited access, were found to be the three most important variables in the classification model. Leave-one-fold-out cross-validation showed an overall accuracy of 99.71%. Sensitivities varied from 84.62% (outdoor walking) to 99.90% (indoor). Specificities varied from 96.33% (indoor) to 99.98% (outdoor static). The exclusion of accelerometer and ambient light sensor variables caused a slight loss in sensitivity for outdoor walking, but little loss in overall accuracy. However, leave-one-subject-out cross-validation showed considerable loss in sensitivity for the outdoor static and outdoor walking conditions. 
Conclusions The random forests classification model can achieve high accuracy for the four major time-activity categories. The model also performed well with just GPS, road and tax parcel data. However, caution is warranted when generalizing the model developed from a small number of subjects to other populations. PMID:26919723
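
    A minimal stand-in for the classification step looks like the sketch below. The feature values are hypothetical (the study's GPS/accelerometer data are not reproduced here); only the overall shape, speed- and road-distance features into a random forest, mirrors the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

def epochs(speed_mu, dist_mu, label, n=150):
    """Hypothetical per-epoch features: max speed (m/s), distance to highway (m)."""
    X = np.column_stack([np.abs(rng.normal(speed_mu, 0.5, n)),
                         np.abs(rng.normal(dist_mu, 50.0, n))])
    return X, np.full(n, label)

data = [epochs(0.1, 500, 0),    # 0 = indoor
        epochs(0.3, 400, 1),    # 1 = outdoor static
        epochs(1.4, 300, 2),    # 2 = outdoor walking
        epochs(15.0, 30, 3)]    # 3 = in-vehicle travel
X = np.vstack([d[0] for d in data])
y = np.concatenate([d[1] for d in data])

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

    As in the paper, the vehicle class separates on speed alone, while the static categories need the supplemental spatial features.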

  7. Application of Convolution Neural Network to the forecasts of flare classification and occurrence using SOHO MDI data

    NASA Astrophysics Data System (ADS)

    Park, Eunsu; Moon, Yong-Jae

    2017-08-01

    A Convolutional Neural Network (CNN) is one of the best-known deep-learning methods in the image processing and computer vision areas. In this study, we apply CNNs to two kinds of flare forecasting models: flare classification and occurrence. For this, we consider several pre-trained models (e.g., AlexNet, GoogLeNet, and ResNet) and customize them by changing several options such as the number of layers, activation function, and optimizer. Our inputs are the same number of SOHO/MDI images for each flare class (None, C, M, and X) at 00:00 UT from Jan 1996 to Dec 2010 (1600 images in total). Outputs are the results of daily flare forecasting for flare class and occurrence. We build, train, and test the models on TensorFlow, a well-known machine learning software library developed by Google. Our major results from this study are as follows. First, most of the models have accuracies greater than 0.7. Second, ResNet, developed by Microsoft, has the best accuracies: 0.86 for flare classification and 0.84 for flare occurrence. Third, the accuracies of these models vary greatly with changing parameters. We discuss several possibilities to improve the models.
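
    The authors' TensorFlow models are not reproduced here, but the two building blocks a CNN applies to each MDI image, convolution and pooling, can be sketched in plain numpy (illustrative 16x16 random image standing in for a magnetogram):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D convolution (single channel), the core CNN operation."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    H, W = x.shape
    return x[:H // s * s, :W // s * s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

img = np.random.default_rng(9).random((16, 16))   # stand-in for an MDI image
# conv -> ReLU -> pool: one feature-extraction stage of a CNN
feat = max_pool(np.maximum(conv2d(img, np.ones((3, 3)) / 9), 0))
```

    Stacking such stages and ending with a dense softmax layer over the four flare classes (None, C, M, X) gives the overall architecture the pre-trained models share.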

  8. Change Detection Analysis of Water Pollution in Coimbatore Region using Different Color Models

    NASA Astrophysics Data System (ADS)

    Jiji, G. Wiselin; Devi, R. Naveena

    2017-12-01

    The data acquired through remote sensing satellites furnish information about land and water at varying resolutions and have been widely used for change detection studies. Although many change detection methodologies and techniques already exist, new ones continue to emerge. Existing change detection techniques exploit images that are either in gray scale or in the RGB color model. In this paper we introduce additional color models for performing change detection of water pollution. The polluted lakes are classified, post-classification change detection techniques are applied to the RGB images, and the results are analysed to determine whether changes exist. Furthermore, RGB images obtained after classification, when converted to either of the two color models YCbCr and YIQ, are found to produce the same results as the RGB model images. It can therefore be concluded that other color models such as YCbCr and YIQ can be used as substitutes for the RGB color model when analysing change detection with regard to water pollution.
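
    The RGB-to-YCbCr conversion underlying the comparison is a fixed linear transform. A sketch using the full-range BT.601 coefficients (the common JPEG convention; the paper does not state which variant it used):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr (8-bit values), as used in JPEG/JFIF."""
    m = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.168736, -0.331264,  0.5     ],
                  [ 0.5,      -0.418688, -0.081312]])
    ycc = rgb.astype(float) @ m.T
    ycc[..., 1:] += 128.0          # offset the chroma channels
    return ycc

white = rgb_to_ycbcr(np.array([[255, 255, 255]]))[0]
```

    Because the transform is invertible and linear, a post-classification change map computed in YCbCr (or the similar YIQ space) carries the same information as one computed in RGB, consistent with the paper's finding.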

  9. A discrimination model in waste plastics sorting using NIR hyperspectral imaging system.

    PubMed

    Zheng, Yan; Bai, Jiarui; Xu, Jingna; Li, Xiayang; Zhang, Yimin

    2018-02-01

    Classification of plastics is important in the recycling industry. A plastic identification model in the near infrared spectroscopy wavelength range 1000-2500 nm is proposed for the characterization and sorting of waste plastics, using acrylonitrile butadiene styrene (ABS), polystyrene (PS), polypropylene (PP), polyethylene (PE), polyethylene terephthalate (PET), and polyvinyl chloride (PVC). The model is built from the feature wavelengths of standard samples by applying principal component analysis (PCA), and the accuracy, properties, and cross-validation of the model were analyzed. The model contains just a simple equation, center-of-mass coordinates, and a radial distance, with which it is easy to develop classification and sorting software. A hyperspectral imaging system (HSI) with the identification model verified its practical application using unknown plastics. Results showed that the identification accuracy on unknown samples is 100%. All results suggest that the discrimination model has potential for an on-line characterization and sorting platform for waste plastics based on HSI. Copyright © 2017 Elsevier Ltd. All rights reserved.
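
    The model's ingredients, PCA scores, per-class centers of mass, and a radial distance, can be sketched with hypothetical spectra (random templates standing in for the 1000-2500 nm feature wavelengths; scikit-learn assumed):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
polymers = ["ABS", "PS", "PP", "PE", "PET", "PVC"]
# Hypothetical NIR feature-wavelength spectra: one template per polymer.
templates = {name: rng.normal(size=120) for name in polymers}

X = np.vstack([t + 0.05 * rng.normal(size=(30, 120)) for t in templates.values()])
labels = np.repeat(polymers, 30)

pca = PCA(n_components=5).fit(X)
scores = pca.transform(X)
# Class "center of mass" in PCA score space; identification by radial distance.
centers = {c: scores[labels == c].mean(axis=0) for c in polymers}

def identify(spectrum):
    s = pca.transform(spectrum.reshape(1, -1))[0]
    return min(centers, key=lambda c: np.linalg.norm(s - centers[c]))
```

    The sort decision reduces to a nearest-center lookup in score space, which is why the paper describes the model as easy to embed in sorting software.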

  10. Guidelines for a priori grouping of species in hierarchical community models

    USGS Publications Warehouse

    Pacifici, Krishna; Zipkin, Elise; Collazo, Jaime; Irizarry, Julissa I.; DeWan, Amielle A.

    2014-01-01

    Recent methodological advances permit the estimation of species richness and occurrences for rare species by linking species-level occurrence models at the community level. The value of such methods is underscored by the ability to examine the influence of landscape heterogeneity on species assemblages at large spatial scales. A salient advantage of community-level approaches is that parameter estimates for data-poor species are more precise as the estimation process borrows from data-rich species. However, this analytical benefit raises a question about the degree to which inferences are dependent on the implicit assumption of relatedness among species. Here, we assess the sensitivity of community/group-level metrics, and individual-level species inferences given various classification schemes for grouping species assemblages using multispecies occurrence models. We explore the implications of these groupings on parameter estimates for avian communities in two ecosystems: tropical forests in Puerto Rico and temperate forests in northeastern United States. We report on the classification performance and extent of variability in occurrence probabilities and species richness estimates that can be observed depending on the classification scheme used. We found estimates of species richness to be most precise and to have the best predictive performance when all of the data were grouped at a single community level. Community/group-level parameters appear to be heavily influenced by the grouping criteria, but were not driven strictly by total number of detections for species. We found different grouping schemes can provide an opportunity to identify unique assemblage responses that would not have been found if all of the species were analyzed together. 
We suggest three guidelines: (1) classification schemes should be determined based on study objectives; (2) model selection should be used to quantitatively compare different classification approaches; and (3) sensitivity of results to different classification approaches should be assessed. These guidelines should help researchers apply hierarchical community models in the most effective manner.

  11. Dutch Special Education Schools for Children with Learning Disabilities in the Interwar Period

    ERIC Educational Resources Information Center

    van Drenth, Annemieke; van Essen, Mineke

    2011-01-01

    In this article Copeland's model of visualising the classification of children with learning disabilities is applied in examining the development of special education schools in the Netherlands during the interwar period. Central are three intertwined social practices: the teacher's professionalism (in pedagogic and practical concerns), the…

  12. Linear mixing model applied to coarse resolution satellite data

    NASA Technical Reports Server (NTRS)

    Holben, Brent N.; Shimabukuro, Yosio E.

    1992-01-01

    A linear mixing model typically applied to high resolution data such as Airborne Visible/Infrared Imaging Spectrometer, Thematic Mapper, and Multispectral Scanner System data is applied to NOAA Advanced Very High Resolution Radiometer coarse resolution satellite data. The reflective portion extracted from middle-IR channel 3 (3.55 - 3.93 microns) is used with channels 1 (0.58 - 0.68 microns) and 2 (0.725 - 1.1 microns) to run the constrained least squares model and generate fraction images for an area in the west central region of Brazil. The derived fraction images are compared with an unsupervised classification and with the fraction images derived from Landsat TM data acquired on the same day. In addition, the relationship between these fraction images and the well-known NDVI images is presented. The results show the great potential of unmixing techniques when applied to coarse resolution data for global studies.
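
    A sum-to-one constrained least squares unmixing step can be sketched as follows. The three-band endmember matrix is hypothetical (illustrative values, not the AVHRR reflectances); the constraint is imposed with a heavily weighted extra equation:

```python
import numpy as np

# Hypothetical endmember reflectances (rows: three AVHRR-like bands;
# columns: vegetation, soil, shade) -- illustrative values only.
E = np.array([[0.05, 0.25, 0.02],
              [0.45, 0.30, 0.02],
              [0.10, 0.20, 0.01]])

def unmix(pixel, endmembers, w=1e3):
    """Least squares with a heavily weighted sum-to-one constraint row."""
    A = np.vstack([endmembers, w * np.ones(endmembers.shape[1])])
    b = np.append(pixel, w)
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return f

true_f = np.array([0.6, 0.3, 0.1])      # fractions used to build a test pixel
f = unmix(E @ true_f, E)
```

    Applying `unmix` per pixel yields one fraction image per endmember, the products that the paper compares against the unsupervised classification and the NDVI images.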

  13. Structural classification of CDR-H3 revisited: a lesson in antibody modeling.

    PubMed

    Kuroda, Daisuke; Shirai, Hiroki; Kobori, Masato; Nakamura, Haruki

    2008-11-15

    Among the six complementarity-determining regions (CDRs) in the variable domains of an antibody, the third CDR of the heavy chain (CDR-H3), which lies in the center of the antigen-binding site, plays a particularly important role in antigen recognition. CDR-H3 shows significant variability in its length, sequence, and structure. Although difficult, model building of this segment is the most critical step in antibody modeling. Since our first proposal of the "H3-rules," which classify CDR-H3 structure based on amino acid sequence, the number of experimentally determined antibody structures has increased. Here, we revise these H3-rules and propose an improved classification scheme for CDR-H3 structure modeling. In addition, we determine the common features of CDR-H3 in antibody drugs as well as discuss the concept of "antibody druggability," which can be applied as an indicator of antibody evaluation during drug discovery.

  14. Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Ramirez, Andres; Rahnemoonfar, Maryam

    2017-04-01

    A hyperspectral image provides a multidimensional dataset rich in information, consisting of hundreds of spectral bands. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in high computational times. In order to overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that helps analyze a hyperspectral image through the usage of parallel hardware and a parallel programming model, which is simpler to handle compared to other low-level parallel programming models. Additionally, Hadoop was used as an open-source implementation of the MapReduce parallel programming model. This research compared classification accuracy and timing results between the Hadoop and GPU systems and tested them against the following test cases: a combined CPU and GPU test case, a CPU-only test case, and a test case where no dimensionality reduction was applied.
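
    The MapReduce programming model itself is simple to sketch. Below is a toy pure-Python mapper/reducer pair (no Hadoop) counting pixels per class, with a hypothetical threshold rule standing in for the actual classifier:

```python
from collections import Counter
from functools import reduce

# Stand-in "pixels" and a hypothetical threshold classifier; the point is
# the map/reduce shape, not the classification itself.
pixels = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7]

def mapper(p):                      # map: emit (class, 1) per pixel
    return ("vegetation" if p > 0.5 else "soil", 1)

def reducer(acc, kv):               # reduce: sum counts per class
    acc[kv[0]] += kv[1]
    return acc

counts = reduce(reducer, map(mapper, pixels), Counter())
```

    Hadoop distributes exactly these two phases across nodes, which is why the model is easier to handle than low-level parallel APIs: the programmer writes only `mapper` and `reducer`.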

  15. Detection of stress factors in crop and weed species using hyperspectral remote sensing reflectance

    NASA Astrophysics Data System (ADS)

    Henry, William Brien

    The primary objective of this work was to determine if stress factors such as moisture stress or herbicide injury stress limit the ability to distinguish between weeds and crops using remotely sensed data. Additional objectives included using hyperspectral reflectance data to measure moisture content within a species, and to measure crop injury in response to drift rates of non-selective herbicides. Moisture stress did not reduce the ability to discriminate between species. Regardless of analysis technique, the trend was that as moisture stress increased, so too did the ability to distinguish between species. Signature amplitudes (SA) of the top 5 bands, discrete wavelet transforms (DWT), and multiple indices were promising analysis techniques. Discriminant models created from one year's data set and validated on additional data sets provided, on average, approximately 80% accurate classification among weeds and crop. This suggests that these models are relatively robust and could potentially be used across environmental conditions in field scenarios. Distinguishing between leaves grown at high-moisture stress and no-stress was met with limited success, primarily because there was substantial variation among samples within the treatments. Leaf water potential (LWP) was measured, and these were classified into three categories using indices. Classification accuracies were as high as 68%. The 10 bands most highly correlated to LWP were selected; however, there were no obvious trends or patterns in these top 10 bands with respect to time, species or moisture level, suggesting that LWP is an elusive parameter to quantify spectrally. In order to address herbicide injury stress and its impact on species discrimination, discriminant models were created from combinations of multiple indices. 
The model created from the second experimental run's data set and validated on the first experimental run's data provided an average of 97% correct classification of soybean and an overall average classification accuracy of 65% for all species. This suggests that these models are relatively robust and could potentially be used across a wide range of herbicide applications in field scenarios. From the pooled data set, a single discriminant model was created with multiple indices that discriminated soybean from weeds 88%, on average, regardless of herbicide, rate or species. Several analysis techniques including multiple indices, signature amplitude with spectral bands as features, and wavelet analysis were employed to distinguish between herbicide-treated and nontreated plants. Classification accuracy using signature amplitude (SA) analysis of paraquat injury on soybean was better than 75% for both 1/2 and 1/8X rates at 1, 4, and 7 DAA. Classification accuracy of paraquat injury on corn was better than 72% for the 1/2X rate at 1, 4, and 7 DAA. These data suggest that hyperspectral reflectance may be used to distinguish between healthy plants and injured plants to which herbicides have been applied; however, the classification accuracies remained at 75% or higher only when the higher rates of herbicide were applied. (Abstract shortened by UMI.)

  16. High-Reproducibility and High-Accuracy Method for Automated Topic Classification

    NASA Astrophysics Data System (ADS)

    Lancichinetti, Andrea; Sirer, M. Irmak; Wang, Jane X.; Acuna, Daniel; Körding, Konrad; Amaral, Luís A. Nunes

    2015-01-01

    Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent searching, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.

  17. The use of multi-temporal Landsat Normalized Difference Vegetation Index (NDVI) data for mapping fuels in Yosemite National Park, USA

    USGS Publications Warehouse

    Van Wagtendonk, Jan W.; Root, Ralph R.

    2003-01-01

    The objective of this study was to test the applicability of using Normalized Difference Vegetation Index (NDVI) values derived from a temporal sequence of six Landsat Thematic Mapper (TM) scenes to map fuel models for Yosemite National Park, USA. An unsupervised classification algorithm was used to define 30 unique spectral-temporal classes of NDVI values. A combination of graphical, statistical and visual techniques was used to characterize the 30 classes and identify those that responded similarly and could be combined into fuel models. The final classification of fuel models included six different types: short annual and perennial grasses, tall perennial grasses, medium brush and evergreen hardwoods, short-needled conifers with no heavy fuels, long-needled conifers and deciduous hardwoods, and short-needled conifers with a component of heavy fuels. The NDVI, when analysed over a season of phenologically distinct periods along with ancillary data, can elicit information necessary to distinguish fuel model types. Fuels information derived from remote sensors has proven to be useful for initial classification of fuels and has been applied to fire management situations on the ground.
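
    The NDVI computation and the unsupervised spectral-temporal clustering can be sketched as follows (random stand-in reflectances, 5 clusters instead of the study's 30; scikit-learn assumed):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
# Random stand-in red/NIR reflectances for 1000 pixels over 6 scene dates.
red = rng.uniform(0.05, 0.3, size=(1000, 6))
nir = rng.uniform(0.3, 0.6, size=(1000, 6))
ndvi = (nir - red) / (nir + red)      # NDVI per pixel per date

# Unsupervised spectral-temporal classes from the NDVI time series
# (5 clusters here; the study used 30 before merging into fuel models).
classes = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(ndvi)
```

    Each pixel is clustered on its whole seasonal NDVI trajectory rather than a single date, which is what lets phenologically distinct fuel types separate.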

  18. Sparse kernel methods for high-dimensional survival data.

    PubMed

    Evers, Ludger; Messow, Claudia-Martina

    2008-07-15

    Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model depends on the covariates only through inner products, it can be 'kernelized'. The kernelized proportional hazards model, however, yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where, akin to support vector classification, the margin between the failed observation and the observations currently at risk is maximised. The other is based on obtaining a sparse model by adding observations one after another, akin to the Import Vector Machine (IVM). The data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.

  19. 5 CFR 1312.8 - Standard identification and markings.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... CLASSIFICATION, DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification.... (a) Original classification. At the time classified material is produced, the classifier shall apply...: (1) Classification authority. The name/personal identifier, and position title of the original...

  20. 5 CFR 1312.8 - Standard identification and markings.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... CLASSIFICATION, DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification.... (a) Original classification. At the time classified material is produced, the classifier shall apply...: (1) Classification authority. The name/personal identifier, and position title of the original...

  1. 5 CFR 1312.8 - Standard identification and markings.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... CLASSIFICATION, DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification.... (a) Original classification. At the time classified material is produced, the classifier shall apply...: (1) Classification authority. The name/personal identifier, and position title of the original...

  2. Big Data: A Parallel Particle Swarm Optimization-Back-Propagation Neural Network Algorithm Based on MapReduce

    PubMed Central

    Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan

    2016-01-01

    A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design. The PSO algorithm was used to optimize the BP neural network’s initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, which represents a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data. PMID:27304987
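
    The PSO half of the algorithm can be sketched on its own. Below, a minimal particle swarm minimizes a stand-in objective (a sphere function in place of the BP network's training error; all parameter values are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(8)

def pso(f, dim=2, particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimizer (sketch of the PSO half of PSO-BP)."""
    x = rng.uniform(-5, 5, size=(particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[pbest_f.argmin()].copy()          # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g

best = pso(lambda p: float(np.sum(p ** 2)))     # stands in for the BP error
```

    In the paper, the swarm's position vectors are the BP network's initial weights and thresholds, and each fitness evaluation is a (MapReduce-parallelized) training-error computation.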

  3. Multiple-rule bias in the comparison of classification rules

    PubMed Central

    Yousefi, Mohammadmahdi R.; Hua, Jianping; Dougherty, Edward R.

    2011-01-01

    Motivation: There is growing discussion in the bioinformatics community concerning overoptimism of reported results. Two approaches contributing to overoptimism in classification are (i) the reporting of results on datasets for which a proposed classification rule performs well and (ii) the comparison of multiple classification rules on a single dataset that purports to show the advantage of a certain rule. Results: This article provides a careful probabilistic analysis of the second issue and the ‘multiple-rule bias’, resulting from choosing a classification rule having minimum estimated error on the dataset. It quantifies this bias corresponding to estimating the expected true error of the classification rule possessing minimum estimated error and it characterizes the bias from estimating the true comparative advantage of the chosen classification rule relative to the others by the estimated comparative advantage on the dataset. The analysis is applied to both synthetic and real data using a number of classification rules and error estimators. Availability: We have implemented in C code the synthetic data distribution model, classification rules, feature selection routines and error estimation methods. The code for multiple-rule analysis is implemented in MATLAB. The source code is available at http://gsp.tamu.edu/Publications/supplementary/yousefi11a/. Supplementary simulation results are also included. Contact: edward@ece.tamu.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21546390
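
    The multiple-rule bias is easy to see in a small Monte Carlo sketch (idealized Gaussian error estimates, not the paper's classification-rule simulations): even when all candidate rules are equally good, reporting the minimum estimated error is optimistic.

```python
import numpy as np

rng = np.random.default_rng(6)
k, trials = 10, 5000          # k candidate classification rules
true_err = 0.25               # every rule has the same true error
est_sd = 0.05                 # sampling noise of the error estimator

# Estimated errors for each rule on each hypothetical dataset.
est = true_err + est_sd * rng.normal(size=(trials, k))
picked = est.min(axis=1)      # report the rule with minimum estimated error

bias = true_err - picked.mean()   # positive => optimistic multiple-rule bias
```

    The selected rule's true error is still 0.25, yet the reported estimate is systematically lower, which is the bias the paper quantifies for real error estimators.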

  4. The role of identity in the DSM-5 classification of personality disorders.

    PubMed

    Schmeck, Klaus; Schlüter-Müller, Susanne; Foelsch, Pamela A; Doering, Stephan

    2013-07-31

    In the revised Diagnostic and Statistical Manual DSM-5 the definition of personality disorder diagnoses has not been changed from that in the DSM-IV-TR. However, an alternative model for diagnosing personality disorders where the construct "identity" has been integrated as a central diagnostic criterion for personality disorders has been placed in section III of the manual. The alternative model's hybrid nature leads to the simultaneous use of diagnoses and the newly developed "Level of Personality Functioning-Scale" (a dimensional tool to define the severity of the disorder). Pathological personality traits are assessed in five broad domains which are divided into 25 trait facets. With this dimensional approach, the new classification system gives, both clinicians and researchers, the opportunity to describe the patient in much more detail than previously possible. The relevance of identity problems in assessing and understanding personality pathology is illustrated using the new classification system applied in two case examples of adolescents with a severe personality disorder.

  5. A hybrid technique for speech segregation and classification using a sophisticated deep neural network

    PubMed Central

    Nawaz, Tabassam; Mehmood, Zahid; Rashid, Muhammad; Habib, Hafiz Adnan

    2018-01-01

    Recent research on speech segregation and music fingerprinting has led to improvements in speech segregation and music identification algorithms. Speech and music segregation generally involves the identification of music followed by speech segregation. However, music segregation becomes a challenging task in the presence of noise. This paper proposes a novel method of speech segregation for unlabelled stationary noisy audio signals using the deep belief network (DBN) model. The proposed method successfully segregates a music signal from noisy audio streams. A recurrent neural network (RNN)-based hidden layer segregation model is applied to remove stationary noise. Dictionary-based Fisher algorithms are employed for speech classification. The proposed method is tested on three datasets (TIMIT, MIR-1K, and MusicBrainz), and the results indicate the robustness of the proposed method for speech segregation. The qualitative and quantitative analyses carried out on the three datasets demonstrate the efficiency of the proposed method compared to state-of-the-art speech segregation and classification-based methods. PMID:29558485

  6. Hierarchical classification method and its application in shape representation

    NASA Astrophysics Data System (ADS)

    Ireton, M. A.; Oakley, John P.; Xydeas, Costas S.

    1992-04-01

    In this paper we describe a technique for performing shape-based content retrieval of images from a large database. To enable users to formulate queries about visual objects, we have developed a hierarchical classification technique. This hierarchical classification technique enables similarity matching between objects, with the position in the hierarchy signifying the level of generality to be used in the query. The classification technique is unsupervised, robust, and general; it can be applied to any suitable parameter set. To establish the potential of this classifier for aiding visual querying, we have applied it to the classification of the 2-D outlines of leaves.
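
    The unsupervised hierarchy can be sketched with agglomerative clustering (hypothetical 3-D shape descriptors standing in for the leaf-outline parameters; scipy assumed; this names a standard technique, not necessarily the authors' exact classifier):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
# Hypothetical 3-D shape descriptors (e.g. coefficients of leaf outlines).
oak = rng.normal([1, 0, 0], 0.1, size=(10, 3))
maple = rng.normal([0, 1, 0], 0.1, size=(10, 3))
willow = rng.normal([0, 0, 1], 0.1, size=(10, 3))
X = np.vstack([oak, maple, willow])

Z = linkage(X, method="average")                  # the class hierarchy
coarse = fcluster(Z, t=3, criterion="maxclust")   # general query level
fine = fcluster(Z, t=6, criterion="maxclust")     # more specific level
```

    Cutting the same dendrogram at different depths yields the coarse-to-specific query levels described above: a general query matches anything in a coarse cluster, a specific one only items in a fine cluster.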

  7. Predicting students’ happiness from physiology, phone, mobility, and behavioral data

    PubMed Central

    Jaques, Natasha; Taylor, Sara; Azaria, Asaph; Ghandeharioun, Asma; Sano, Akane; Picard, Rosalind

    2017-01-01

    In order to model students’ happiness, we apply machine learning methods to data collected from undergrad students monitored over the course of one month each. The data collected include physiological signals, location, smartphone logs, and survey responses to behavioral questions. Each day, participants reported their wellbeing on measures including stress, health, and happiness. Because of the relationship between happiness and depression, modeling happiness may help us to detect individuals who are at risk of depression and guide interventions to help them. We are also interested in how behavioral factors (such as sleep and social activity) affect happiness positively and negatively. A variety of machine learning and feature selection techniques are compared, including Gaussian Mixture Models and ensemble classification. We achieve 70% classification accuracy of self-reported happiness on held-out test data. PMID:28515966

  8. Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa

    2018-07-01

    Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction. For the experiments, autopsy reports belonging to eight different causes of death were collected, preprocessed, and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures, i.e. overall accuracy, macro-precision, macro-recall, and macro-F-measure. From the experiments, it was found that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, among the feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency and normalized term frequency with inverse document frequency. The chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among the text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will serve as a state-of-the-art baseline against which future automated text classification proposals can be compared. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
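    The unigram and tf-idf representations that performed best in this comparison can be sketched in a few lines of pure Python. The tokenizer, the toy report snippets, and the weighting details below are simplified assumptions for illustration, not the study's implementation:

```python
import math
from collections import Counter

def unigram_tfidf(docs):
    """Unigram features weighted by term frequency x inverse document
    frequency -- one of the feature-representation schemes compared."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)              # raw term frequency
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

# hypothetical report snippets, not real autopsy text
docs = ["blunt force trauma to head",
        "gunshot wound to chest",
        "blunt trauma to chest"]
vecs = unigram_tfidf(docs)              # "to" occurs everywhere, so its idf is 0
```

    A term that appears in every report receives zero weight, which is why tf-idf often outperforms raw counts on boilerplate-heavy medical text.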

  9. 6 CFR 7.24 - Duration of classification.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 6 Domestic Security 1 2013-01-01 2013-01-01 false Duration of classification. 7.24 Section 7.24... INFORMATION Classified Information § 7.24 Duration of classification. (a) At the time of original classification, original classification authorities shall apply a date or event in which the information will be...

  10. 6 CFR 7.24 - Duration of classification.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 6 Domestic Security 1 2014-01-01 2014-01-01 false Duration of classification. 7.24 Section 7.24... INFORMATION Classified Information § 7.24 Duration of classification. (a) At the time of original classification, original classification authorities shall apply a date or event in which the information will be...

  11. 6 CFR 7.24 - Duration of classification.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 6 Domestic Security 1 2012-01-01 2012-01-01 false Duration of classification. 7.24 Section 7.24... INFORMATION Classified Information § 7.24 Duration of classification. (a) At the time of original classification, original classification authorities shall apply a date or event in which the information will be...

  12. 6 CFR 7.24 - Duration of classification.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 6 Domestic Security 1 2011-01-01 2011-01-01 false Duration of classification. 7.24 Section 7.24... INFORMATION Classified Information § 7.24 Duration of classification. (a) At the time of original classification, original classification authorities shall apply a date or event in which the information will be...

  13. Progressive Classification Using Support Vector Machines

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri; Kocurek, Michael

    2009-01-01

    An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) a slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified.
The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
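    The fast-then-refine loop can be sketched as follows. The two toy classifiers below stand in for the fast and slow SVMs and are illustrative assumptions, not the implementation described in the record:

```python
def progressive_classify(points, fast_clf, slow_clf, budget):
    """Label everything with the fast classifier, then re-label the
    least-confident items with the slow classifier, up to `budget`."""
    labels, conf = [], []
    for x in points:
        y, c = fast_clf(x)              # label plus confidence index
        labels.append(y)
        conf.append(c)
    order = sorted(range(len(points)), key=lambda i: conf[i])
    for i in order[:budget]:            # refine least-confident items first
        labels[i] = slow_clf(points[i])
    return labels

# toy stand-ins for the two SVMs
fast = lambda x: (1 if x > 0 else -1, abs(x))   # margin as confidence
slow = lambda x: 1 if x >= -0.1 else -1         # slower, "more accurate" rule
out = progressive_classify([-2.0, -0.05, 0.03, 1.5], fast, slow, budget=2)
```

    Raising `budget` corresponds to letting the refinement run longer before the user halts it; the points far from the decision boundary keep their cheap first-pass labels.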

  14. Variable Selection for Road Segmentation in Aerial Images

    NASA Astrophysics Data System (ADS)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For the extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classifying superpixels into road and non-road, after which the classification results were refined using minimum cost paths and non-local optimization methods. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in Photogrammetry and Remote Sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of the training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but extended them with additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even to reduce it, on a challenging dataset.
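    The build-evaluate-update loop described above resembles greedy forward selection. A minimal sketch, with a hypothetical scoring function standing in for a trained classifier's validation accuracy:

```python
def forward_select(features, evaluate, max_rounds=10):
    """Greedy wrapper selection: repeatedly add the feature that most
    improves the score; terminate when no candidate improves it."""
    selected, best = [], evaluate([])
    for _ in range(max_rounds):
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        score, f = max((evaluate(selected + [f]), f) for f in candidates)
        if score <= best:               # stopping criterion: no improvement
            break
        selected.append(f)
        best = score
    return selected, best

# hypothetical per-feature gains; a real evaluate() would retrain a classifier
gains = {"ndvi": 0.3, "height": 0.25, "texture": 0.1}
def evaluate(subset):
    return sum(gains.get(f, -0.05) for f in set(subset))

sel, best = forward_select(["ndvi", "height", "texture", "hue"], evaluate)
```

    Redundant or harmful variables (here, "hue") are never admitted, which mirrors how the approach prunes the variable set from the related work.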

  15. Evolutionary fuzzy ARTMAP neural networks for classification of semiconductor defects.

    PubMed

    Tan, Shing Chiang; Watada, Junzo; Ibrahim, Zuwairie; Khalid, Marzuki

    2015-05-01

    Wafer defect detection using an intelligent system is an approach to quality improvement in semiconductor manufacturing that aims to enhance process stability, increase production capacity, and improve yields. Occasionally, only a few records that indicate defective units are available, and they constitute a minority group in a large database. Such a situation leads to an imbalanced data set problem, which poses a great challenge for machine-learning techniques seeking an effective solution. In addition, the database may comprise overlapping samples of different classes. This paper introduces two models of evolutionary fuzzy ARTMAP (FAM) neural networks to deal with imbalanced data set problems in semiconductor manufacturing operations. In particular, FAM models and hybrid genetic algorithms are integrated in the proposed evolutionary artificial neural networks (EANNs) to classify an imbalanced data set. In addition, one of the proposed EANNs incorporates a facility to learn overlapping samples of different classes from the imbalanced data environment. The classification results of the proposed evolutionary FAM neural networks are presented, compared, and analyzed using several classification metrics. The outcomes positively indicate the effectiveness of the proposed networks in handling classification problems with imbalanced data sets.

  16. Quantification of urban structure on building block level utilizing multisensoral remote sensing data

    NASA Astrophysics Data System (ADS)

    Wurm, Michael; Taubenböck, Hannes; Dech, Stefan

    2010-10-01

    The dynamics of urban environments are a challenge to sustainable development. Urban areas promise wealth, realization of individual dreams, and power; hence, many cities are characterized by population growth as well as physical development. Traditional visual mapping and updating of urban structure information is a very laborious and cost-intensive task, especially for large urban areas. For this purpose, we developed a workflow for the extraction of the relevant information by means of object-based image classification. In this manner, multisensoral remote sensing data were analyzed: very high resolution optical satellite imagery together with height information from a digital surface model, to retrieve a detailed 3D city model with the relevant land-use / land-cover information. This information was aggregated at the level of the building block to describe the urban structure by physical indicators. A comparison between the indicators derived by the classification and a reference classification of urban structure types was carried out to show the correlation between the individual indicators and the reference. The indicators were then used in a cluster analysis to group the individual blocks into similar clusters.

  17. Gender classification under extended operating conditions

    NASA Astrophysics Data System (ADS)

    Rude, Howard N.; Rizki, Mateen

    2014-06-01

    Gender classification is a critical component of a robust image security system. Many techniques exist to perform gender classification using facial features. In contrast, this paper explores gender classification using body features extracted from clothed subjects. Several of the most effective types of features for gender classification identified in literature were implemented and applied to the newly developed Seasonal Weather And Gender (SWAG) dataset. SWAG contains video clips of approximately 2000 samples of human subjects captured over a period of several months. The subjects are wearing casual business attire and outer garments appropriate for the specific weather conditions observed in the Midwest. The results from a series of experiments are presented that compare the classification accuracy of systems that incorporate various types and combinations of features applied to multiple looks at subjects at different image resolutions to determine a baseline performance for gender classification.

  18. Artificial neural network for normal, hypertensive, and preeclamptic pregnancy classification using maternal heart rate variability indexes.

    PubMed

    Tejera, Eduardo; Jose Areias, Maria; Rodrigues, Ana; Ramõa, Ana; Manuel Nieto-Villar, Jose; Rebelo, Irene

    2011-09-01

    We constructed a model for the classification of women with normal, hypertensive, and preeclamptic pregnancies at different gestational ages using maternal heart rate variability (HRV) indexes. In the present work, we applied an artificial neural network to the classification problem, using the signal composed of the time intervals between consecutive RR peaks (n = 568) obtained from ECG records. Besides the HRV indexes, we also considered other factors such as maternal history and blood pressure measurements. The obtained results reveal a sensitivity for preeclampsia of around 80%, which increases for the hypertensive and normal pregnancy groups. On the other hand, specificity is around 85-90%. These results indicate that the combination of HRV indexes with artificial neural networks (ANN) could be helpful for pregnancy study and characterization.

  19. Motion data classification on the basis of dynamic time warping with a cloud point distance measure

    NASA Astrophysics Data System (ADS)

    Switonski, Adam; Josinski, Henryk; Zghidi, Hafedh; Wojciechowski, Konrad

    2016-06-01

    The paper deals with the problem of classification of model-free motion data. A nearest-neighbor classifier is proposed, based on comparisons performed by the Dynamic Time Warping (DTW) transform with a cloud point distance measure. The classification utilizes both specific gait features, reflected by the movements of subsequent skeleton joints, and anthropometric data. To validate the proposed approach, the human gait identification challenge problem is taken into consideration. The motion capture database containing data from 30 different humans, collected in the Human Motion Laboratory of the Polish-Japanese Academy of Information Technology, is used. The achieved results are satisfactory: the obtained accuracy of human recognition exceeds 90%. What is more, the applied cloud point distance measure does not depend on the calibration process of the motion capture system, which results in a reliable validation.
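    The DTW comparison with a cloud point distance can be sketched as follows. The symmetric mean nearest-point measure and the toy two-point "gaits" below are illustrative assumptions, not the paper's exact definitions:

```python
import math

def cloud_dist(a, b):
    """Symmetric mean nearest-point distance between two point clouds
    (a hypothetical stand-in for the paper's cloud point measure)."""
    def one_way(p, q):
        return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)
    return 0.5 * (one_way(a, b) + one_way(b, a))

def dtw(seq_a, seq_b, dist):
    """Dynamic Time Warping cost between two sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# two toy "gaits": sequences of tiny 2-point clouds at different tempos
walk_a = [[(0, 0), (1, 0)], [(0, 1), (1, 1)], [(0, 2), (1, 2)]]
walk_b = [[(0, 0), (1, 0)], [(0, 2), (1, 2)]]
cost = dtw(walk_a, walk_b, cloud_dist)
```

    Because the warping path aligns frames non-linearly, sequences of different lengths (e.g. faster and slower gaits) can still be compared directly.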

  20. A real-time heat strain risk classifier using heart rate and skin temperature.

    PubMed

    Buller, Mark J; Latzka, William A; Yokota, Miyo; Tharion, William J; Moran, Daniel S

    2008-12-01

    Heat injury is a real concern to workers engaged in physically demanding tasks in high heat strain environments. Several real-time physiological monitoring systems exist that can provide indices of heat strain, e.g. the physiological strain index (PSI), and provide alerts to medical personnel. However, these systems depend on core temperature measurement using expensive, ingestible thermometer pills. Seeking a better solution, we suggest the use of a model which can identify the probability that individuals are 'at risk' from heat injury using non-invasive measures. The intent is for the system to identify individuals who need closer monitoring or who should apply heat strain mitigation strategies. We generated a model that can identify 'at risk' (PSI ≥ 7.5) workers from measures of heart rate and chest skin temperature. The model was built using data from six previously published exercise studies in which some subjects wore chemical protective equipment. The model has an overall classification error rate of 10% with one false negative error (2.7%), and outperforms an earlier model and a least squares regression model with classification errors of 21% and 14%, respectively. Additionally, the model allows the classification criteria to be adjusted based on the task and the acceptable level of risk. We conclude that the model could be a valuable part of a multi-faceted heat strain management system.
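    For context, the PSI referenced above is commonly computed from core temperature and heart rate relative to resting baselines (Moran et al. 1998). The sketch below shows the 'at risk' target definition only; the paper's actual classifier predicts this label from heart rate and skin temperature, without core temperature, and the readings used here are illustrative:

```python
def psi(tcore, tcore0, hr, hr0):
    """Physiological Strain Index (Moran et al. 1998): core-temperature and
    heart-rate strain combined on a 0-10 scale."""
    return 5 * (tcore - tcore0) / (39.5 - tcore0) + 5 * (hr - hr0) / (180 - hr0)

def at_risk(tcore, tcore0, hr, hr0, threshold=7.5):
    """Binary 'at risk' label of the kind used as the classification target."""
    return psi(tcore, tcore0, hr, hr0) >= threshold

resting = at_risk(37.1, 37.0, 75, 70)    # near-baseline reading
stressed = at_risk(39.2, 37.0, 175, 70)  # high-strain reading
```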

  1. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails.
Secondly, the hierarchical classification structure allows easy evaluation and visualization of the resolution of FAME data for the discrimination of bacterial species. In summary, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.

  2. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails.
Secondly, the hierarchical classification structure allows easy evaluation and visualization of the resolution of FAME data for the discrimination of bacterial species. In summary, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515

  3. New activity-based funding model for Australian private sector overnight rehabilitation cases: the rehabilitation Australian National Sub-Acute and Non-Acute Patient (AN-SNAP) model.

    PubMed

    Hanning, Brian; Predl, Nicolle

    2015-09-01

    Traditional overnight rehabilitation payment models in the private sector are not based on a rigorous classification system and vary greatly between contracts with no consideration of patient complexity. The payment rates are not based on relative cost and the length-of-stay (LOS) point at which a reduced rate applies (step downs) varies markedly. The rehabilitation Australian National Sub-Acute and Non-Acute Patient (AN-SNAP) model (RAM), which has been in place for over 2 years in some private hospitals, bases payment on a rigorous classification system, relative cost and industry LOS. RAM is in the process of being rolled out more widely. This paper compares and contrasts RAM with traditional overnight rehabilitation payment models. It considers the advantages of RAM for hospitals and Australian Health Service Alliance. It also considers payment model changes in the context of maintaining industry consistency with Electronic Claims Lodgement and Information Processing System Environment (ECLIPSE) and health reform generally.

  4. Drift diffusion model of reward and punishment learning in schizophrenia: Modeling and experimental data.

    PubMed

    Moustafa, Ahmed A; Kéri, Szabolcs; Somlai, Zsuzsanna; Balsdon, Tarryn; Frydecka, Dorota; Misiak, Blazej; White, Corey

    2015-09-15

    In this study, we tested reward- and punishment learning performance using a probabilistic classification learning task in patients with schizophrenia (n=37) and healthy controls (n=48). We also fit subjects' data using a Drift Diffusion Model (DDM) of simple decisions to investigate which components of the decision process differ between patients and controls. Modeling results show between-group differences in multiple components of the decision process. Specifically, patients had slower motor/encoding time, higher response caution (favoring accuracy over speed), and a deficit in classification learning for punishment, but not reward, trials. The results suggest that patients with schizophrenia adopt a compensatory strategy of favoring accuracy over speed to improve performance, yet still show signs of a deficit in learning based on negative feedback. Our data highlight the importance of fitting models (particularly drift diffusion models) to behavioral data. The implications of these findings are discussed relative to theories of schizophrenia and cognitive processing. Copyright © 2015 Elsevier B.V. All rights reserved.
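    The DDM components discussed here (drift rate for learning, boundary separation for response caution, non-decision time for motor/encoding) can be illustrated with a generic simulation. This is a textbook-style sketch with illustrative parameter values, not the authors' model-fitting code:

```python
import random

def ddm_trial(drift, threshold, t0=0.3, dt=0.001, noise=1.0, seed=None):
    """One drift-diffusion trial: evidence accumulates with the drift rate
    plus Gaussian noise until it hits +threshold (correct) or -threshold
    (error); t0 is the non-decision (motor/encoding) time."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * rng.gauss(0.0, dt ** 0.5)
        t += dt
    return x > 0, t0 + t                # (correct?, reaction time in seconds)

# raising `threshold` models higher response caution: slower but more accurate
correct, rt = ddm_trial(drift=2.0, threshold=1.0, seed=0)
```

    In this framework, the patients' pattern corresponds to a larger t0, a wider threshold, and a lower drift rate on punishment trials.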

  5. Applying Active Learning to Assertion Classification of Concepts in Clinical Text

    PubMed Central

    Chen, Yukun; Mani, Subramani; Xu, Hua

    2012-01-01

    Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported in the global ALC score, based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC – 0.7715) than the passive learning method (random sampling) (ALC – 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort. PMID:22127105
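    A common active learning strategy of the kind compared in such studies is least-confidence (uncertainty) sampling. A minimal sketch with a hypothetical logistic scorer standing in for the trained model; the study's own algorithms are not reproduced here:

```python
import math

def uncertainty_sample(pool, predict_proba, k):
    """Least-confidence active learning: query the k pool items whose
    predicted positive-class probability is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))[:k]

# hypothetical logistic scorer standing in for a trained classifier
proba = lambda x: 1.0 / (1.0 + math.exp(-x))
pool = [-3.0, -0.2, 0.1, 2.5]
query = uncertainty_sample(pool, proba, k=2)   # items nearest the boundary
```

    Annotating the queried items and retraining closes the loop; the annotation savings reported above come from querying informative samples instead of random ones.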

  6. Determination of the ecological connectivity between landscape patches obtained using the knowledge engineer (expert) classification technique

    NASA Astrophysics Data System (ADS)

    Selim, Serdar; Sonmez, Namik Kemal; Onur, Isin; Coslu, Mesut

    2017-10-01

    Connecting similar landscape patches with ecological corridors supports the habitat quality of these patches, increases urban ecological quality, and constitutes an important living and expansion area for wildlife. Furthermore, habitat connectivity provided by urban green areas supports biodiversity in urban areas. In this study, possible ecological connections between landscape patches were identified using the knowledge engineer (Expert) classification technique and modeled with the probabilistic connection index. First, the reflection responses of plants in various bands are used as data in the hypotheses. One of the important features of this method is the ability to use more than one image at the same time in the formation of a hypothesis. For this reason, before starting the application of the Expert classification, the base images were prepared. In addition to the main image, the hypothesis conditions were also created for each class with the NDVI image, which is commonly used in vegetation research. The results of a previously conducted supervised classification were also taken into account. We applied this classification method to the raster imagery with user-defined variables. Then, to establish ecological connections for the tree cover obtained from the classification, we used the Probabilistic Connection (PC) index. This model, used in landscape planning and conservation studies for detecting and prioritizing critical areas for ecological connection, characterizes the possibility of a direct connection between habitats. As a result, we obtained over 90% overall accuracy in the accuracy assessment analysis. We established ecological connections with the PC index and created an interconnected green-space system, thus proposing and implementing a green infrastructure system model, a topic high on the agenda in recent years.
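    The NDVI image used in the hypothesis conditions is computed per pixel from the near-infrared and red bands. A minimal sketch with illustrative reflectance values (not measurements from the study area):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - RED) / (NIR + RED),
    in [-1, 1]; dense green vegetation pushes the value toward 1."""
    return (nir - red) / (nir + red) if (nir + red) else 0.0

# illustrative reflectances, not values from the study area
veg = ndvi(0.50, 0.08)    # vegetated pixel: strong NIR, low red
soil = ndvi(0.30, 0.25)   # bare-soil pixel: similar NIR and red
```

    Thresholding such values per class is one way an expert-system hypothesis can separate tree cover from non-vegetated surfaces.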

  7. Hidden Semi-Markov Models and Their Application

    NASA Astrophysics Data System (ADS)

    Beyreuther, M.; Wassermann, J.

    2008-12-01

    In the framework of detection and classification of seismic signals there are several different approaches. Our choice for a more robust detection and classification algorithm is to adopt Hidden Markov Models (HMM), a technique showing major success in speech recognition. HMM provide a powerful tool to describe highly variable time series based on a doubly stochastic model and therefore allow for a broader class description than, e.g., template-based pattern matching techniques. Being a fully probabilistic model, HMM directly provide a confidence measure for an estimated classification. Furthermore, and in contrast to classic artificial neural networks or support vector machines, HMM incorporate the time dependence explicitly in the models, thus providing an adequate representation of the seismic signal. Like the majority of detection algorithms, HMM are not based on the time- and amplitude-dependent seismogram itself but on features estimated from the seismogram which characterize the different classes. Features, or in other words characteristic functions, are e.g. the sonogram bands, instantaneous frequency, instantaneous bandwidth, or centroid time. In this study we apply continuous Hidden Semi-Markov Models (HSMM), an extension of continuous HMM. The duration probability of a state in an HMM is an exponentially decaying function of time, which is not a realistic representation of the duration of an earthquake. In contrast, HSMM use Gaussians as duration probabilities, which results in a more adequate model. The HSMM detection and classification system is running online as an EARTHWORM module at the Bavarian Earthquake Service. Here the signals to be classified differ simply in epicentral distance. This makes it possible to easily decide whether a classification is correct or wrong, and thus allows better evaluation of the advantages and disadvantages of the proposed algorithm. 
The evaluation is based on several months of continuous data, and the results are additionally compared to the previously published discrete HMM, continuous HMM, and a classic STA/LTA. The intermediate evaluation results are very promising.
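    The duration argument above can be made concrete: a standard HMM implies a geometric (exponentially decaying) state-duration distribution, whereas an HSMM models duration explicitly, e.g. with a Gaussian. A small sketch; the self-transition probability and duration parameters are illustrative, not values from the Bavarian system:

```python
import math

def hmm_duration_pmf(d, p_stay):
    """A standard HMM implies a geometric duration: staying d steps in a
    state has probability p_stay**(d-1) * (1 - p_stay), decaying with d."""
    return p_stay ** (d - 1) * (1 - p_stay)

def hsmm_duration_pdf(d, mean, sd):
    """An HSMM models duration explicitly, here with a Gaussian density."""
    return math.exp(-0.5 * ((d - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# the geometric law always peaks at d = 1, however p_stay is chosen; the
# Gaussian peaks at the typical event length, more realistic for earthquakes
hmm_peak = max(range(1, 100), key=lambda d: hmm_duration_pmf(d, 0.9))
hsmm_peak = max(range(1, 100), key=lambda d: hsmm_duration_pdf(d, 20, 5))
```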

  8. Rapid discrimination and quantification of alkaloids in Corydalis Tuber by near-infrared spectroscopy.

    PubMed

    Lu, Hai-yan; Wang, Shi-sheng; Cai, Rui; Meng, Yu; Xie, Xin; Zhao, Wei-jie

    2012-02-05

    With the application of near-infrared spectroscopy (NIRS), a convenient and rapid method for the determination of alkaloids in Corydalis Tuber extract and the classification of samples from different locations has been developed. Five different samples were collected according to their geographical origin. A second-derivative (2-Der) spectral pre-treatment with a smoothing point of 17 was applied, and the first-to-scaling range algorithm was adjusted as the optimal approach; a classification model was constructed over the wavelength ranges of 4582-4270 cm⁻¹, 5562-4976 cm⁻¹ and 7000-7467 cm⁻¹ with a high recognition rate. For the prediction model, the partial least squares (PLS) algorithm was utilized with HPLC-UV as the reference method, and the optimum models were obtained after adjustment. The pre-processing methods of the calibration models were COE for protopine, min-max normalization for palmatine, and MSC for tetrahydropalmatine, respectively. The root mean square errors of cross-validation (RMSECV) for protopine, palmatine, and tetrahydropalmatine were 0.884, 1.83, and 3.23 mg/g, and the correlation coefficients (R²) were 99.75, 98.41 and 97.34%. A t-test was applied; for the tetrahydropalmatine model, there is no significant difference between the NIR prediction and the HPLC reference method at the 95% confidence interval (t = 0.746).
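    The RMSECV figures quoted above are root mean square errors computed under cross-validation. The underlying error measure can be sketched as follows; the prediction and reference values below are hypothetical, not the study's data:

```python
import math

def rmse(predicted, reference):
    """Root mean square error; computed over cross-validation predictions
    this is the RMSECV reported for each calibration model."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)

# hypothetical NIR predictions vs HPLC reference values (mg/g)
pred = [10.1, 12.4, 9.8, 11.0]
ref = [10.0, 12.0, 10.0, 11.5]
err = rmse(pred, ref)
```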

  9. Application of multispectral imaging to determine quality attributes and ripeness stage in strawberry fruit.

    PubMed

    Liu, Changhong; Liu, Wei; Lu, Xuzhong; Ma, Fei; Chen, Wei; Yang, Jianbo; Zheng, Lei

    2014-01-01

    Multispectral imaging with 19 wavelengths in the range of 405-970 nm has been evaluated for nondestructive determination of firmness, total soluble solids (TSS) content and ripeness stage in strawberry fruit. Several analysis approaches, including partial least squares (PLS), support vector machine (SVM) and back propagation neural network (BPNN), were applied to develop theoretical models for predicting the firmness and TSS of intact strawberry fruit. Compared with PLS and SVM, BPNN considerably improved the performance of multispectral imaging for predicting firmness and total soluble solids content, with correlation coefficients (r) of 0.94 and 0.83, SEP of 0.375 and 0.573, and bias of 0.035 and 0.056, respectively. Subsequently, the ability of multispectral imaging technology to classify fruit based on ripeness stage was tested using SVM and principal component analysis-back propagation neural network (PCA-BPNN) models. A higher classification accuracy of 100% was achieved using the SVM model. Moreover, the results of all these models demonstrated that the VIS parts of the spectra were the main contributors to firmness determination, TSS content estimation and classification of ripeness stage in strawberry fruit. These results suggest that multispectral imaging, together with a suitable analysis model, is a promising technology for rapid estimation of quality attributes and classification of ripeness stage in strawberry fruit.

  10. QSAR classification models for the prediction of endocrine disrupting activity of brominated flame retardants.

    PubMed

    Kovarich, Simona; Papa, Ester; Gramatica, Paola

    2011-06-15

    The identification of potential endocrine disrupting (ED) chemicals is an important task for the scientific community due to their diffusion in the environment; the production and use of such compounds will be strictly regulated through the authorization process of the REACH regulation. To overcome the problem of insufficient experimental data, the quantitative structure-activity relationship (QSAR) approach is applied to predict the ED activity of new chemicals. In the present study QSAR classification models are developed, according to the OECD principles, to predict the ED potency for a class of emerging ubiquitous pollutants, viz. brominated flame retardants (BFRs). Different endpoints related to ED activity (i.e. aryl hydrocarbon receptor agonism and antagonism, estrogen receptor agonism and antagonism, androgen and progesterone receptor antagonism, T4-TTR competition, E2SULT inhibition) are modeled using the k-NN classification method. The best models are selected by maximizing the sensitivity and external predictive ability. We propose simple QSARs (based on few descriptors) characterized by internal stability, good predictive power and a verified applicability domain. These models are simple tools that are applicable to screen BFRs in relation to their ED activity, and also to design safer alternatives, in agreement with the requirements of the REACH regulation at the authorization step. Copyright © 2011 Elsevier B.V. All rights reserved.
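The k-NN classification used above assigns a compound the majority class among its k nearest neighbours in descriptor space, and sensitivity (the fraction of true actives recognised) is the selection criterion. A minimal sketch with hypothetical 2-D descriptors; the actual models use selected molecular descriptors:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # majority vote among the k nearest training compounds (Euclidean distance)
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    return int(np.bincount(y_train[nearest]).argmax())

def sensitivity(y_true, y_pred):
    # fraction of true actives (class 1) correctly recognised
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn)

# hypothetical descriptors: inactive compounds near (0,0), active near (1,1)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
```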

  11. Weakly Supervised Dictionary Learning

    NASA Astrophysics Data System (ADS)

    You, Zeyu; Raich, Raviv; Fern, Xiaoli Z.; Kim, Jinsub

    2018-05-01

    We present a probabilistic modeling and inference framework for discriminative analysis dictionary learning under a weak supervision setting. Dictionary learning approaches have been widely used for tasks such as low-level signal denoising and restoration as well as high-level classification tasks, and can be applied to audio and image analysis. Synthesis dictionary learning aims at jointly learning a dictionary and corresponding sparse coefficients to provide accurate data representation. This approach is useful for denoising and signal restoration, but may lead to sub-optimal classification performance. By contrast, analysis dictionary learning provides a transform that maps data to a sparse discriminative representation suitable for classification. We consider the problem of analysis dictionary learning for time-series data under a weak supervision setting in which signals are assigned a global label instead of an instantaneous label signal. We propose a discriminative probabilistic model that incorporates both label information and sparsity constraints on the underlying latent instantaneous label signal using cardinality control. We present the expectation maximization (EM) procedure for maximum likelihood estimation (MLE) of the proposed model. To facilitate a computationally efficient E-step, we propose both a chain and a novel tree graph reformulation of the graphical model. The performance of the proposed model is demonstrated on both synthetic and real-world data.

  12. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ...) Top Secret. This classification shall be applied only to information the unauthorized disclosure of... original classification authority is able to identify or describe. (2) Secret. This classification shall be...

  13. A fuzzy decision tree for fault classification.

    PubMed

    Zio, Enrico; Baraldi, Piero; Popescu, Irina C

    2008-02-01

    In plant accident management, the control room operators are required to identify the causes of the accident, based on the different patterns of evolution that the monitored process variables develop. This task is often quite challenging, given the large number of process parameters monitored and the intense emotional states under which it is performed. To aid the operators, various techniques of fault classification have been engineered. An important requirement for their practical application is the physical interpretability of the relationships among the process variables underpinning the fault classification. In this view, the present work propounds a fuzzy approach to fault classification, which relies on fuzzy if-then rules inferred from the clustering of available preclassified signal data, which are then organized in a logical and transparent decision tree structure. The advantages offered by the proposed approach are precisely that a transparent fault classification model is mined out of the signal data and that the underlying physical relationships among the process variables are easily interpretable as linguistic if-then rules that can be explicitly visualized in the decision tree structure. The approach is applied to a case study regarding the classification of simulated faults in the feedwater system of a boiling water reactor.
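The linguistic if-then rules described above can be made concrete with triangular membership functions and the min operator for fuzzy AND. A toy sketch with invented variables and rule parameters; the actual rules are inferred from clustered plant signal data:

```python
def tri(x, a, b, c):
    # triangular membership function: rises from a, peaks at b, falls to c
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# hypothetical rules: (membership of temperature, membership of pressure, class)
rules = [
    (lambda t: tri(t, 60, 80, 100), lambda p: tri(p, 0, 1, 2), "fault_A"),
    (lambda t: tri(t, 20, 40, 60),  lambda p: tri(p, 1, 2, 3), "normal"),
]

def classify_fault(t, p):
    # fire each rule with min (fuzzy AND) and return the strongest rule's class
    strengths = [min(mt(t), mp(p)) for mt, mp, _ in rules]
    return rules[max(range(len(rules)), key=strengths.__getitem__)][2]
```

The same rule firing can be read directly off a decision tree in which each branch tests one fuzzy condition, which is what makes the model transparent to operators.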

  14. Exploring the impact of wavelet-based denoising in the classification of remote sensing hyperspectral images

    NASA Astrophysics Data System (ADS)

    Quesada-Barriuso, Pablo; Heras, Dora B.; Argüello, Francisco

    2016-10-01

    The classification of remote sensing hyperspectral images for land cover applications is a very active topic. In the case of supervised classification, Support Vector Machines (SVMs) play a dominant role. Recently, the Extreme Learning Machine algorithm (ELM) has also been extensively used. The classification scheme previously published by the authors, called WT-EMP, introduces spatial information into the classification process by means of an Extended Morphological Profile (EMP) that is created from features extracted by wavelets. In addition, the hyperspectral image is denoised in the 2-D spatial domain, also using wavelets, and it is joined to the EMP via a stacked vector. In this paper, the scheme is improved by achieving two goals. The first one is to reduce the classification time while preserving the accuracy of the classification by using ELM instead of SVM. The second one is to improve the accuracy by performing not only a 2-D denoising for every spectral band, but also an additional prior 1-D spectral-signature denoising applied to each pixel vector of the image. For each denoising step, the image is transformed by applying a 1-D or 2-D wavelet transform, and then NeighShrink thresholding is applied. Improvements in terms of classification accuracy are obtained, especially for images with close regions in the classification reference map, because in these cases the accuracy of the classification in the edges between classes is more relevant.
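The denoising step (wavelet transform, threshold the detail coefficients, inverse transform) can be illustrated with a one-level Haar transform and plain soft thresholding. The paper uses NeighShrink, which additionally pools neighbouring coefficients, so this is only a simplified sketch:

```python
import numpy as np

def haar_denoise(x, thresh):
    # one-level Haar wavelet transform, soft-threshold the details, reconstruct
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)  # soft threshold
    y = np.empty_like(x)                   # inverse transform
    y[0::2] = (a + d) / np.sqrt(2)
    y[1::2] = (a - d) / np.sqrt(2)
    return y
```

Applied per spectral band this is the 2-D step's 1-D analogue; applied along each pixel's spectral signature it corresponds to the added 1-D denoising.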

  15. A Classification and Analysis of Contracting Literature

    DTIC Science & Technology

    1989-12-01

    ...Pricing Model (CAPM). This is a model designed by investment analysts to determine required rates of return given the systematic risk of a company. The... For the amount of risk they take, these profit margins were not excessively high. The author examined profitability in terms of the Capital Asset... taxonomy was applied was limited, the results were necessarily qualified. However, at the least this application provided areas for further research.

  16. Pattern recognition applied to seismic signals of Llaima volcano (Chile): An evaluation of station-dependent classifiers

    NASA Astrophysics Data System (ADS)

    Curilem, Millaray; Huenupan, Fernando; Beltrán, Daniel; San Martin, Cesar; Fuentealba, Gustavo; Franco, Luis; Cardona, Carlos; Acuña, Gonzalo; Chacón, Max; Khan, M. Salman; Becerra Yoma, Nestor

    2016-04-01

    Automatic pattern recognition applied to seismic signals from volcanoes may assist seismic monitoring by reducing the workload of analysts, allowing them to focus on more challenging activities, such as producing reports, implementing models, and understanding volcanic behaviour. In a previous work, we proposed a structure for automatic classification of seismic events in Llaima volcano, one of the most active volcanoes in the Southern Andes, located in the Araucanía Region of Chile. A database of events taken from three monitoring stations on the volcano was used to create a classification structure, independent of which station provided the signal. The database included three types of volcanic events: tremor, long period, and volcano-tectonic, and a contrast group which contains other types of seismic signals. In the present work, we maintain the same classification scheme, but we consider the station information separately in order to assess whether the complementary information provided by different stations improves the performance of the classifier in recognising seismic patterns. This paper proposes two strategies for combining the information from the stations: i) combining the features extracted from the signals from each station and ii) combining the classifiers of each station. In the first case, the features extracted from the signals from each station are combined to form the input for a single classification structure. In the second, a decision stage combines the results of the classifiers for each station to give a unique output. The results confirm that the station-dependent strategies that combine the features and the classifiers from several stations improve the classification performance, and that the combination of the features provides the best performance. The results show an average improvement of 9% in the classification accuracy when compared with the station-independent method.
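The two combination strategies can be sketched abstractly: feature-level fusion concatenates the per-station feature vectors before a single classifier, while decision-level fusion votes over per-station classifier outputs. A minimal sketch in which the station features and labels are placeholders:

```python
import numpy as np

def feature_fusion(features_per_station):
    # strategy (i): one input vector built from all stations' features
    return np.concatenate([np.asarray(f) for f in features_per_station])

def decision_fusion(labels_per_station):
    # strategy (ii): majority vote over the per-station classifier decisions
    vals, counts = np.unique(labels_per_station, return_counts=True)
    return vals[counts.argmax()]
```

In the paper's experiments the feature-level variant (i) gave the best performance.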

  17. 14 CFR 1203.412 - Classification guides.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... of the classification designations (i.e., Top Secret, Secret or Confidential) apply to the identified... writing by an official with original Top Secret classification authority; the identity of the official...

  18. Multivariate analysis of full-term neonatal polysomnographic data.

    PubMed

    Gerla, V; Paul, K; Lhotska, L; Krajca, V

    2009-01-01

    Polysomnography (PSG) is one of the most important noninvasive methods for studying maturation of the child brain. Sleep in infants is significantly different from sleep in adults. This paper addresses the problem of computer analysis of neonatal polygraphic signals. We applied methods designed for differentiating three important neonatal behavioral states: quiet sleep, active sleep, and wakefulness. The proportion of these states is a significant indicator of the maturity of the newborn brain in clinical practice. In this study, we used data provided by the Institute for Care of Mother and Child, Prague (12 newborn infants of similar postconceptional age). The data were scored by an experienced physician into four states (wake, quiet sleep, active sleep, movement artifact). For accurate classification, it was necessary to determine the most informative features. We used a method based on power spectral density (PSD) applied to each EEG channel. We also used features derived from electrooculogram (EOG), electromyogram (EMG), ECG, and respiration [pneumogram (PNG)] signals. The most informative feature was the measure of regularity of respiration from the PNG signal. We designed an algorithm for interpreting these characteristics. This algorithm was based on Markov models. The results of automatic detection of sleep states were compared to the "sleep profiles" determined visually. We evaluated both the success rate and the true positive rate of the classification, and statistically significant agreement of the two scorings was found. Two variants, for learning and for testing, were applied: learning from the data of all 12 newborns with tenfold cross-validation, and learning from the data of 11 newborns with testing on the data from the 12th newborn. We utilized information obtained from several biological signals (EEG, ECG, PNG, EMG, EOG) for our final classification. We reached a final success rate of 82.5%.
The true positive rate was 81.8% and the false positive rate was 6.1%. The most important step in the whole process is feature extraction and feature selection. In this process, we used visualization as an additional tool that helped us to decide which features to select. Proper selection of features may significantly influence the success rate of the classification. We made a visual comparison of the computed features with the manual scoring provided by the expert. A hidden Markov model was used for classification. The advantage of this model is that it determines the future behavior of the process by its present state. In this way, it preserves information about temporal development.
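The hidden-Markov-model classification of sleep states can be illustrated with Viterbi decoding: given state-transition and emission probabilities, recover the most likely state sequence from an observed feature sequence. A toy sketch with two states (quiet vs. active sleep) and a binarised respiration-regularity feature; all probabilities below are invented:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    # most likely hidden-state path for a discrete-emission HMM (log domain)
    T, K = len(obs), len(log_pi)
    delta = log_pi + log_B[:, obs[0]]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A        # scores[i, j]: from state i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]               # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# states: 0 = quiet sleep, 1 = active sleep; symbols: 0 = regular, 1 = irregular
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])   # sticky transitions: states persist
log_B = np.log([[0.9, 0.1], [0.2, 0.8]])   # quiet sleep -> regular respiration
```

The sticky transition matrix is what lets the model preserve information about temporal development, as noted above.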

  19. A practical approach to Sasang constitutional diagnosis using vocal features

    PubMed Central

    2013-01-01

    Background Sasang constitutional medicine (SCM) is a type of tailored medicine that divides human beings into four Sasang constitutional (SC) types. Diagnosis of SC types is crucial to proper treatment in SCM. Voice characteristics have been used as an essential clue for diagnosing SC types. In the past, many studies tried to extract quantitative vocal features to make diagnosis models; however, these studies were flawed by limited data collected from one or a few sites, long recording times, and low accuracy. We propose a practical diagnosis model having only a few variables, which decreases model complexity. This, in turn, makes our model appropriate for clinical applications. Methods A total of 2,341 participants’ voice recordings were used in making a SC classification model and to test the generalization ability of the model. Although the voice data consisted of five vowels and two repeated sentences per participant, we used only the sentence part for our study. A total of 21 features were extracted, and an advanced feature selection method—the least absolute shrinkage and selection operator (LASSO)—was applied to reduce the number of variables for classifier learning. A SC classification model was developed using multinomial logistic regression via LASSO. Results We compared the proposed classification model to the previous study, which used both sentences and five vowels from the same patient group. The classification accuracies for the test set were 47.9% and 40.4% for males and females, respectively. Our results showed that the proposed method was superior to the previous study in that it required shorter voice recordings, was more applicable to practical use, and had better generalization performance. Conclusions We proposed a practical SC classification method and showed that our model having fewer variables outperformed the model having many variables in the generalization test.
We attempted to reduce the number of variables in two ways: 1) the initial number of candidate features was decreased by considering shorter voice recording, and 2) LASSO was introduced for reducing model complexity. The proposed method is suitable for an actual clinical environment. Moreover, we expect it to yield more stable results because of the model’s simplicity. PMID:24200041
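LASSO reduces model complexity by driving the coefficients of uninformative features exactly to zero. The study's model is a multinomial logistic LASSO; a minimal sketch of the squared-error variant via iterative soft-thresholding (ISTA), with toy data in which only the first feature matters:

```python
import numpy as np

def lasso_ista(X, y, lam, steps=5000):
    # ISTA for min_w ||Xw - y||^2 / (2n) + lam * ||w||_1
    n, p = X.shape
    w = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the gradient
    for _ in range(steps):
        w = w - X.T @ (X @ w - y) / (n * L)                  # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0)  # soft threshold
    return w

# toy data: the response depends only on the first feature
X = np.array([[1.0, 0.1], [2.0, -0.2], [3.0, 0.3], [4.0, -0.1]])
y = 2.0 * X[:, 0]
w = lasso_ista(X, y, lam=0.1)
```

The second coefficient is shrunk to (essentially) zero, which is exactly the variable-selection behaviour exploited above.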

  20. 48 CFR 52.247-53 - Freight Classification Description.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 48 Federal Acquisition Regulations System 2 2012-10-01 2012-10-01 false Freight Classification....247-53 Freight Classification Description. As prescribed in 47.305-9(b)(1), insert the following... modifications of previously shipped items, and different freight classifications may apply: Freight...

  1. 48 CFR 52.247-53 - Freight Classification Description.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... modifications of previously shipped items, and different freight classifications may apply: Freight... 48 Federal Acquisition Regulations System 2 2014-10-01 2014-10-01 false Freight Classification....247-53 Freight Classification Description. As prescribed in 47.305-9(b)(1), insert the following...

  2. 48 CFR 52.247-53 - Freight Classification Description.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... modifications of previously shipped items, and different freight classifications may apply: Freight... 48 Federal Acquisition Regulations System 2 2013-10-01 2013-10-01 false Freight Classification....247-53 Freight Classification Description. As prescribed in 47.305-9(b)(1), insert the following...

  3. 48 CFR 52.247-53 - Freight Classification Description.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 48 Federal Acquisition Regulations System 2 2011-10-01 2011-10-01 false Freight Classification....247-53 Freight Classification Description. As prescribed in 47.305-9(b)(1), insert the following... modifications of previously shipped items, and different freight classifications may apply: Freight...

  4. 48 CFR 52.247-53 - Freight Classification Description.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 48 Federal Acquisition Regulations System 2 2010-10-01 2010-10-01 false Freight Classification....247-53 Freight Classification Description. As prescribed in 47.305-9(b)(1), insert the following... modifications of previously shipped items, and different freight classifications may apply: Freight...

  5. Role of heterogeneous research and development funds in the productivity of the US manufacturing industry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rhee, C.O.

    1987-01-01

    This paper investigates, in the framework of the firm's optimal behavior, the effect of company-funded and federally-funded R&D on productivity in selected US industries. In particular, the role of federally-funded R&D in productivity, through direct as well as indirect mechanisms, is analyzed. Using different model specifications, two types of R&D (federal and company), and industry-level data, no support can be found for the blanket statement that federally-funded R&D (FRD) crowds out or pulls in company-funded R&D in productivity growth. Whether crowding-out or pulling-in occurs is shown to be industry-specific as well as dependent on FRD's time dimension. Hence, the lag effect of heterogeneous R&D funds on productivity is emphasized. The classification of heterogeneous R&D funds into basic research, applied research, and development is adopted to look at the impact of each on productivity. The model of the firm's optimal behavior following such classification demonstrates that federally-funded basic research has a tremendous pulling-in impact on company-funded applied research and development.

  6. Hearing and Cognitive Impairment and the Role of the International Classification of Functioning, Disability and Health as a Rehabilitation Framework

    PubMed Central

    Lind, Christopher; Meyer, Carly; Young, Jessica

    2016-01-01

    The International Classification of Functioning, Disability and Health (ICF) has been applied widely in the literature to describe and differentiate the broad implications of hearing impairment (HI) and cognitive impairment (CI) on communication. As CI and HI are largely age-related conditions, the likelihood of comorbidity of these conditions is high. In the context of an aging population, the prevalence of comorbidity is likely to rise, yet much of the clinical assessment and intervention in HI and CI occur separately. The benefit of addressing the dual impact of these conditions is of increasing clinical importance for all clinicians working with older adults and for audiologists and speech pathologists in particular. In this article, the ICF model will be applied to explore the everyday implications of HI and CI. Furthermore, the clinical implications of the ICF model are explored with particular respect to communication assessment and intervention options. The potential benefit of combining activity- and participation-focused interventions currently offered for HI and CI independently is examined. PMID:27489399

  7. Authentication of fattening diet of Iberian pigs according to their volatile compounds profile from raw subcutaneous fat.

    PubMed

    Narváez-Rivas, M; Pablos, F; Jurado, J M; León-Camacho, M

    2011-02-01

    The composition of volatile components of subcutaneous fat from Iberian pig has been studied. Purge and trap gas chromatography-mass spectrometry has been used. The composition of the volatile fraction of subcutaneous fat has been used for authentication purposes of different types of Iberian pig fat. Three types of this product have been considered: montanera, extensive cebo and intensive cebo. For classification purposes, several pattern recognition techniques have been applied. In order to find out possible tendencies in the sample distribution as well as the discriminant power of the variables, principal component analysis was applied as a visualisation technique. Linear discriminant analysis (LDA) and soft independent modelling by class analogy (SIMCA) were used to obtain suitable classification models. LDA and SIMCA allowed the differentiation of the three fattening diets by using the contents in 2,2,4,6,6-pentamethyl-heptane, m-xylene, 2,4-dimethyl-heptane, 6-methyl-tridecane, 1-methoxy-2-propanol, isopropyl alcohol, o-xylene, 3-ethyl-2,2-dimethyl-oxirane, 2,6-dimethyl-undecane, 3-methyl-3-pentanol and limonene.

  8. Proteomics Versus Clinical Data and Stochastic Local Search Based Feature Selection for Acute Myeloid Leukemia Patients' Classification.

    PubMed

    Chebouba, Lokmane; Boughaci, Dalila; Guziolowski, Carito

    2018-06-04

    The use of data from high-throughput technologies in drug-target problems has become widespread during the last decades. This study proposes a meta-heuristic framework using stochastic local search (SLS) combined with random forest (RF), where the aim is to specify the most important genes and proteins leading to the best classification of Acute Myeloid Leukemia (AML) patients. First, we use a stochastic local search meta-heuristic as a feature selection technique to select the most significant proteins to be used in the classification step. Then we apply RF to classify new patients into their corresponding classes. The evaluation procedure is to run the RF classifier on the training data to obtain a model, and then to apply this model to the test data to find the appropriate class. We use as metrics the balanced accuracy (BAC) and the area under the receiver operating characteristic curve (AUROC) to measure the performance of our model. The proposed method is evaluated on the dataset issued from the DREAM 9 challenge. The comparison is done with a pure random forest (without feature selection), and with the two best ranked results of the DREAM 9 challenge. We used three types of data: only clinical data, only proteomics data, and finally clinical and proteomics data combined. The numerical results show that the highest scores are obtained when using clinical data alone, and the lowest when using proteomics data alone. Further, our method succeeds in finding promising results compared to the methods presented in the DREAM challenge.
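The SLS feature-selection step can be sketched generically: start from a random feature mask, flip one feature at a time, and keep the mask whenever the evaluation score (in the paper, RF classification performance) does not decrease. The evaluator below is a stand-in that counts agreement with a known-best subset, purely to make the sketch testable:

```python
import random

def sls_feature_selection(n_features, evaluate, iters=200, seed=0):
    # stochastic local search over feature masks: flip one bit, keep improvements
    rng = random.Random(seed)
    best = [rng.random() < 0.5 for _ in range(n_features)]
    best_score = evaluate(best)
    for _ in range(iters):
        cand = best[:]
        cand[rng.randrange(n_features)] ^= True   # flip one feature in or out
        s = evaluate(cand)
        if s >= best_score:                       # accept non-worsening moves
            best, best_score = cand, s
    return best, best_score

def toy_score(mask):
    # stand-in evaluator: agreement with the target feature subset {0, 2}
    return sum(b == (i in {0, 2}) for i, b in enumerate(mask))

best, score = sls_feature_selection(5, toy_score)
```

In the actual pipeline, `evaluate` would train and score an RF classifier on the masked protein features; that call is the expensive part the meta-heuristic tries to spend wisely.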

  9. Enhancing spatial resolution of (18)F positron imaging with the Timepix detector by classification of primary fired pixels using support vector machine.

    PubMed

    Wang, Qian; Liu, Zhen; Ziegler, Sibylle I; Shi, Kuangyu

    2015-07-07

    Position-sensitive positron cameras using silicon pixel detectors have been applied for some preclinical and intraoperative clinical applications. However, the spatial resolution of a positron camera is limited by positron multiple scattering in the detector. An incident positron may fire a number of successive pixels on the imaging plane. It is still impossible to capture the primary fired pixel along a particle trajectory by hardware or to perceive the pixel firing sequence by direct observation. Here, we propose a novel data-driven method to improve the spatial resolution by classifying the primary pixels within the detector using support vector machine. A classification model is constructed by learning the features of positron trajectories based on Monte-Carlo simulations using Geant4. Topological and energy features of pixels fired by (18)F positrons were considered for the training and classification. After applying the classification model on measurements, the primary fired pixels of the positron tracks in the silicon detector were estimated. The method was tested and assessed for [(18)F]FDG imaging of an absorbing edge protocol and a leaf sample. The proposed method improved the spatial resolution from 154.6 ± 4.2 µm (energy weighted centroid approximation) to 132.3 ± 3.5 µm in the absorbing edge measurements. For the positron imaging of a leaf sample, the proposed method achieved lower root mean square error relative to phosphor plate imaging, and higher similarity with the reference optical image. The improvements of the preliminary results support further investigation of the proposed algorithm for the enhancement of positron imaging in clinical and preclinical applications.

  10. Enhancing spatial resolution of 18F positron imaging with the Timepix detector by classification of primary fired pixels using support vector machine

    NASA Astrophysics Data System (ADS)

    Wang, Qian; Liu, Zhen; Ziegler, Sibylle I.; Shi, Kuangyu

    2015-07-01

    Position-sensitive positron cameras using silicon pixel detectors have been applied for some preclinical and intraoperative clinical applications. However, the spatial resolution of a positron camera is limited by positron multiple scattering in the detector. An incident positron may fire a number of successive pixels on the imaging plane. It is still impossible to capture the primary fired pixel along a particle trajectory by hardware or to perceive the pixel firing sequence by direct observation. Here, we propose a novel data-driven method to improve the spatial resolution by classifying the primary pixels within the detector using support vector machine. A classification model is constructed by learning the features of positron trajectories based on Monte-Carlo simulations using Geant4. Topological and energy features of pixels fired by 18F positrons were considered for the training and classification. After applying the classification model on measurements, the primary fired pixels of the positron tracks in the silicon detector were estimated. The method was tested and assessed for [18F]FDG imaging of an absorbing edge protocol and a leaf sample. The proposed method improved the spatial resolution from 154.6 ± 4.2 µm (energy weighted centroid approximation) to 132.3 ± 3.5 µm in the absorbing edge measurements. For the positron imaging of a leaf sample, the proposed method achieved lower root mean square error relative to phosphor plate imaging, and higher similarity with the reference optical image. The improvements of the preliminary results support further investigation of the proposed algorithm for the enhancement of positron imaging in clinical and preclinical applications.

  11. Use of Binary Partition Tree and energy minimization for object-based classification of urban land cover

    NASA Astrophysics Data System (ADS)

    Li, Mengmeng; Bijker, Wietske; Stein, Alfred

    2015-04-01

    Two main challenges are faced when classifying urban land cover from very high resolution satellite images: obtaining an optimal image segmentation and distinguishing buildings from other man-made objects. For optimal segmentation, this work proposes a hierarchical representation of an image by means of a Binary Partition Tree (BPT) and an unsupervised evaluation of image segmentations by energy minimization. For building extraction, we apply fuzzy sets to create a fuzzy landscape of shadows, which in turn involves a two-step procedure. The first step is a preliminary image classification at a fine segmentation level to generate vegetation and shadow information. The second step models the directional relationship between building and shadow objects to extract building information at the optimal segmentation level. We conducted the experiments on two datasets of Pléiades images from Wuhan City, China. To demonstrate its performance, the proposed classification is compared at the optimal segmentation level with Maximum Likelihood Classification and Support Vector Machine classification. The results show that the proposed classification produced the highest overall accuracies and kappa coefficients, and the smallest over-classification and under-classification geometric errors. We conclude first that integrating BPT with energy minimization offers an effective means for image segmentation. Second, we conclude that the directional relationship between building and shadow objects represented by a fuzzy landscape is important for building extraction.

  12. A model for simulating the grinding and classification cyclic system of waste PCBs recycling production line.

    PubMed

    Yang, Deming; Xu, Zhenming

    2011-09-15

    Crushing and separating technology is widely used in the waste printed circuit board (PCB) recycling process. An automatic line for recycling waste PCBs without negative environmental impact has been applied at industrial scale. The grinding and classification cyclic system for crushed waste PCB particles is the most important part of the automatic production line, and it determines the efficiency of the whole line. In this paper, a model for computing the process of the system was established, and a matrix analysis method was adopted. The results showed that good agreement can be achieved between the simulation model and the actual production line, and that the system is robust against disturbances. This model may provide a basis for the automatic process control of a waste PCB production line. With this model, many engineering problems can be reduced, such as insufficient dissociation of metals and nonmetals, over-pulverizing of particles, incomplete comminution, material plugging and equipment overheating. Copyright © 2011 Elsevier B.V. All rights reserved.
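The matrix description of a closed grinding-classification circuit can be sketched with two size classes: a breakage matrix redistributes mass in the mill, the classifier returns a fraction of each class to the mill feed, and iterating the mass balance gives the steady state. All matrix entries below are invented for illustration; they are not the paper's calibrated values:

```python
import numpy as np

# two size classes [coarse, fine]; columns of B sum to 1 (mass is conserved)
B = np.array([[0.2, 0.0],      # 20% of coarse mill feed stays coarse
              [0.8, 1.0]])     # 80% of coarse is broken to fine; fine stays fine
c = np.array([0.9, 0.0])       # classifier recycle: 90% of coarse back to mill

def steady_state_product(feed, B, c, iters=200):
    # fixed-point iteration on the mill feed: m = fresh feed + recycled part of B@m
    m = feed.copy()
    for _ in range(iters):
        m = feed + c * (B @ m)
    return (1.0 - c) * (B @ m)   # product stream leaving the circuit

product = steady_state_product(np.array([1.0, 0.0]), B, c)
```

At steady state the product mass equals the fresh feed mass, which is the kind of consistency check such a model makes cheap before touching the physical line.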

  13. Age-class separation of blue-winged ducks

    USGS Publications Warehouse

    Hohman, W.L.; Moore, J.L.; Twedt, D.J.; Mensik, John G.; Logerwell, E.

    1995-01-01

    Accurate determination of age is of fundamental importance to population and life history studies of waterfowl and their management. Therefore, we developed quantitative methods that separate adult and immature blue-winged teal (Anas discors), cinnamon teal (A. cyanoptera), and northern shovelers (A. clypeata) during spring and summer. To assess suitability of discriminant models using 9 remigial measurements, we compared model performance (% agreement between predicted age and age assigned to birds on the basis of definitive cloacal or rectral feather characteristics) in different flyways (Mississippi and Pacific) and between years (1990-91 and 1991-92). We also applied age-classification models to wings obtained from U.S. Fish and Wildlife Service harvest surveys in the Mississippi and Central-Pacific flyways (wing-bees) for which age had been determined using qualitative characteristics (i.e., remigial markings, shape, or wear). Except for male northern shovelers, models correctly aged <90% (range 70-86%) of blue-winged ducks. Model performance varied among species and differed between sexes and years. Proportions of individuals that were correctly aged were greater for males (range 63-86%) than females (range 39-69%). Models for northern shovelers performed better in flyway comparisons within year (1991-92, La. model applied to Calif. birds, and Calif. model applied to La. birds: 90 and 94% for M, and 89 and 76% for F, respectively) than in annual comparisons within the Mississippi Flyway (1991-92 model applied to 1990-91 data: 79% for M, 50% for F). Exclusion of measurements that varied by flyway or year did not improve model performance. Quantitative methods appear to be of limited value for age separation of female blue-winged ducks. Close agreement between predicted age and age assigned to wings from the wing-bees suggests that qualitative and quantitative methods may be equally accurate for age separation of male blue-winged ducks.
We interpret annual and flyway differences in remigial measurements and reduced performance of age classification models as evidence of high variability in size of blue-winged ducks' remiges. Variability in remigial size of these and other small-bodied waterfowl may be related to nutrition during molt.
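The workflow above — fit a discriminant on measurements, then score "% agreement between predicted age and assigned age" — can be sketched in its simplest univariate form. The remigial lengths below are invented, and a midpoint-of-means threshold stands in for the study's 9-measurement multivariate discriminant:

```python
# Univariate two-class discriminant (equal-variance threshold) plus the
# percent-agreement performance measure. All measurements are hypothetical.

def fit_threshold(adult_vals, immature_vals):
    """Midpoint-of-means threshold; also report which class is larger."""
    m_adult = sum(adult_vals) / len(adult_vals)
    m_imm = sum(immature_vals) / len(immature_vals)
    return (m_adult + m_imm) / 2.0, m_adult > m_imm

def classify(value, threshold, adult_is_larger):
    big = value > threshold
    return "adult" if big == adult_is_larger else "immature"

def percent_agreement(pred, truth):
    """Model performance: % of predicted ages matching assigned ages."""
    hits = sum(p == t for p, t in zip(pred, truth))
    return 100.0 * hits / len(truth)

adults = [192.0, 195.5, 198.0]      # hypothetical primary lengths (mm)
immatures = [185.0, 183.5, 188.0]
thr, adult_big = fit_threshold(adults, immatures)
preds = [classify(v, thr, adult_big) for v in adults + immatures]
truth = ["adult"] * 3 + ["immature"] * 3
acc = percent_agreement(preds, truth)
```

Cross-flyway and cross-year testing, as in the study, amounts to computing `percent_agreement` with a threshold fitted on one dataset and applied to another.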

  14. Interannual drought index variations in Central Europe related to large-scale atmospheric circulation

    NASA Astrophysics Data System (ADS)

    Beck, Christoph; Philipp, Andreas; Jacobeit, Jucundus

    2014-05-01

    This contribution investigates the relationship between large-scale atmospheric circulation and interannual variations of the standardized precipitation index (SPI) in central Europe. To this end, occurrence frequencies of circulation types (CT) derived from a variety of circulation type classifications (CTC) applied to daily sea level pressure (SLP) data, and mean circulation indices of vorticity (V), zonality (Z) and meridionality (M), have been utilized as predictors within multiple regression models (MRM) for the estimation of gridded 3-month SPI values over central Europe for the period 1950 to 2010. CTC-based MRMs used in the analyses comprise variants concerning the basic method for CT classification, the number of CTs, the size and location of the spatial domain used for CTCs, and the exclusive use of CT frequencies or the combined use of CT frequencies and mean circulation indices as predictors. Adequate MRM predictor combinations have been identified by applying stepwise multiple regression analyses within a resampling framework. The performance (robustness) of the resulting MRMs has been quantified based on a leave-one-out cross-validation procedure applying several skill scores. Furthermore, the relative importance of individual predictors has been estimated for each MRM. From these analyses it can be stated that (i) the consideration of vorticity characteristics within CTCs, (ii) a relatively small spatial domain to which CTCs are applied, and (iii) the inclusion of mean circulation indices appear to improve model skill. However, model skill exhibits distinct variations between seasons and regions. Whereas promising skill can be stated for the western and northwestern parts of the central European domain, only unsatisfactory skill is reached in the more continental regions, particularly during summer. 
Thus it can be concluded that the approaches presented here have the potential for downscaling central European drought index variations from large-scale circulation, at least for some regions. Further improvements of CTC-based approaches may be expected from optimizing the CTCs for explaining the SPI, e.g., via the inclusion of additional variables in the classification procedure.
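The leave-one-out cross-validation with a skill score described above can be sketched for a single-predictor regression. Data, predictor, and the reduction-of-error skill definition (1 minus the MSE ratio against the climatological mean) are illustrative assumptions; the study used multiple CT-frequency predictors and several skill scores:

```python
# Leave-one-out cross-validation of a one-predictor linear regression,
# scored against a climatological-mean reference forecast.

def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def loo_skill(xs, ys):
    """Reduction-of-error skill: 1 - MSE(model) / MSE(training mean)."""
    n = len(ys)
    errs, ref_errs = [], []
    for i in range(n):
        tx = [x for j, x in enumerate(xs) if j != i]
        ty = [y for j, y in enumerate(ys) if j != i]
        a, b = fit_ols(tx, ty)
        errs.append((ys[i] - (a + b * xs[i])) ** 2)
        ref_errs.append((ys[i] - sum(ty) / len(ty)) ** 2)
    return 1.0 - sum(errs) / sum(ref_errs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # e.g. a CT occurrence frequency
ys = [0.9, 2.1, 2.9, 4.2, 4.8, 6.1]   # e.g. 3-month SPI; near-linear here
skill = loo_skill(xs, ys)             # close to 1 for this synthetic pair
```

A skill near 0 would mean the circulation predictor adds nothing over climatology, the situation the abstract reports for the continental regions in summer.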

  15. 48 CFR 47.305-9 - Commodity description and freight classification.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... of previously shipped items, and different freight classifications may apply, the contracting officer... freight classification. 47.305-9 Section 47.305-9 Federal Acquisition Regulations System FEDERAL... Commodity description and freight classification. (a) Generally, the freight rate for supplies is based on...

  16. 48 CFR 47.305-9 - Commodity description and freight classification.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... of previously shipped items, and different freight classifications may apply, the contracting officer... freight classification. 47.305-9 Section 47.305-9 Federal Acquisition Regulations System FEDERAL... Commodity description and freight classification. (a) Generally, the freight rate for supplies is based on...

  17. 48 CFR 47.305-9 - Commodity description and freight classification.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... of previously shipped items, and different freight classifications may apply, the contracting officer... freight classification. 47.305-9 Section 47.305-9 Federal Acquisition Regulations System FEDERAL... Commodity description and freight classification. (a) Generally, the freight rate for supplies is based on...

  18. 48 CFR 47.305-9 - Commodity description and freight classification.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... of previously shipped items, and different freight classifications may apply, the contracting officer... freight classification. 47.305-9 Section 47.305-9 Federal Acquisition Regulations System FEDERAL... Commodity description and freight classification. (a) Generally, the freight rate for supplies is based on...

  19. Non-linear dynamical classification of short time series of the rössler system in high noise regimes.

    PubMed

    Lainscsek, Claudia; Weyhenmeyer, Jonathan; Hernandez, Manuel E; Poizner, Howard; Sejnowski, Terrence J

    2013-01-01

    Time series analysis with delay differential equations (DDEs) reveals non-linear properties of the underlying dynamical system and can serve as a non-linear time-domain classification tool. Here global DDE models were used to analyze short segments of simulated time series from a known dynamical system, the Rössler system, in high noise regimes. In a companion paper, we apply the DDE model developed here to classify short segments of encephalographic (EEG) data recorded from patients with Parkinson's disease and healthy subjects. Nine simulated subjects in each of two distinct classes were generated by varying the bifurcation parameter b and keeping the other two parameters (a and c) of the Rössler system fixed. All choices of b were in the chaotic parameter range. We diluted the simulated data using white noise ranging from 10 to -30 dB signal-to-noise ratios (SNR). Structure selection was supervised by selecting the number of terms, delays, and order of non-linearity of the DDE model that best linearly separated the two classes of data. The distance d from the linear dividing hyperplane was then used to assess the classification performance by computing the area A' under the ROC curve. The selected model was tested on untrained data using repeated random sub-sampling validation. DDEs were able to accurately distinguish the two dynamical conditions, and moreover, to quantify the changes in the dynamics. There was a significant correlation between the dynamical bifurcation parameter b of the simulated data and the classification parameter d from our analysis. This correlation still held for new simulated subjects with new dynamical parameters selected from each of the two dynamical regimes. Furthermore, the correlation was robust to added noise, being significant even when the noise was greater than the signal. We conclude that DDE models may be used as a generalizable and reliable classification tool for even small segments of noisy data.
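The area A' under the ROC curve, computed above from the hyperplane distances d, is equivalent to the Mann-Whitney probability that a randomly drawn sample of one class scores higher than one of the other. A sketch with invented distances:

```python
# A' (area under the ROC curve) from per-sample classifier scores,
# via the pairwise Mann-Whitney formulation (ties count half).

def auc(pos_scores, neg_scores):
    """P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    wins = ties = 0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1
            elif p == q:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

d_class1 = [1.2, 0.8, 0.1, 1.9]     # hypothetical distances, one regime
d_class2 = [-0.7, -0.1, 0.3, -1.4]  # hypothetical distances, other regime
a_prime = auc(d_class1, d_class2)   # 15 of 16 pairs ordered correctly
```

A' = 1 means perfect separation of the two Rössler regimes; A' = 0.5 means the DDE features carry no class information.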

  20. Non-Linear Dynamical Classification of Short Time Series of the Rössler System in High Noise Regimes

    PubMed Central

    Lainscsek, Claudia; Weyhenmeyer, Jonathan; Hernandez, Manuel E.; Poizner, Howard; Sejnowski, Terrence J.

    2013-01-01

    Time series analysis with delay differential equations (DDEs) reveals non-linear properties of the underlying dynamical system and can serve as a non-linear time-domain classification tool. Here global DDE models were used to analyze short segments of simulated time series from a known dynamical system, the Rössler system, in high noise regimes. In a companion paper, we apply the DDE model developed here to classify short segments of encephalographic (EEG) data recorded from patients with Parkinson’s disease and healthy subjects. Nine simulated subjects in each of two distinct classes were generated by varying the bifurcation parameter b and keeping the other two parameters (a and c) of the Rössler system fixed. All choices of b were in the chaotic parameter range. We diluted the simulated data using white noise ranging from 10 to −30 dB signal-to-noise ratios (SNR). Structure selection was supervised by selecting the number of terms, delays, and order of non-linearity of the DDE model that best linearly separated the two classes of data. The distance d from the linear dividing hyperplane was then used to assess the classification performance by computing the area A′ under the ROC curve. The selected model was tested on untrained data using repeated random sub-sampling validation. DDEs were able to accurately distinguish the two dynamical conditions, and moreover, to quantify the changes in the dynamics. There was a significant correlation between the dynamical bifurcation parameter b of the simulated data and the classification parameter d from our analysis. This correlation still held for new simulated subjects with new dynamical parameters selected from each of the two dynamical regimes. Furthermore, the correlation was robust to added noise, being significant even when the noise was greater than the signal. We conclude that DDE models may be used as a generalizable and reliable classification tool for even small segments of noisy data. PMID:24379798

  1. Predicting temperate forest stand types using only structural profiles from discrete return airborne lidar

    NASA Astrophysics Data System (ADS)

    Fedrigo, Melissa; Newnham, Glenn J.; Coops, Nicholas C.; Culvenor, Darius S.; Bolton, Douglas K.; Nitschke, Craig R.

    2018-02-01

    Light detection and ranging (lidar) data have been increasingly used for forest classification due to their ability to penetrate the forest canopy and provide detail about the structure of the lower strata. In this study we demonstrate forest classification approaches using airborne lidar data as inputs to random forest and linear unmixing classification algorithms. Our results demonstrated that both random forest and linear unmixing models identified a distribution of rainforest and eucalypt stands that was comparable to existing ecological vegetation class (EVC) maps based primarily on manual interpretation of high resolution aerial imagery. Rainforest stands not previously identified in the EVC maps were also detected in the region. The transition between stand types was better characterised by the random forest modelling approach. In contrast, the linear unmixing model placed greater emphasis on field plots selected as endmembers, which may not have captured the variability in stand structure within a single stand type. The random forest model had the highest overall accuracy (84%) and Cohen's kappa coefficient (0.62). However, the classification accuracy was only marginally better than linear unmixing. The random forest model was applied to a region in the Central Highlands of south-eastern Australia to produce maps of stand type probability, including areas of transition (the 'ecotone') between rainforest and eucalypt forest. The resulting map provided a detailed delineation of forest classes, which specifically recognised the coalescing of stand types at the landscape scale. This represents a key step towards mapping the structural and spatial complexity of these ecosystems, which is important for both their management and conservation.
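The linear unmixing step above treats each pixel's structural profile as a mixture of endmember profiles, so an 'ecotone' pixel gets fractional membership. A one-pixel sketch with two endmembers and sum-to-one fractions; the three-element "lidar strata" profiles are invented, and the closed-form least-squares solution is an assumption standing in for the study's implementation:

```python
# Two-endmember linear unmixing of a single pixel's profile:
# x ≈ f*e1 + (1-f)*e2, solved for f by least squares and clipped to [0, 1].

def unmix_two(x, e1, e2):
    """Least-squares fraction f of endmember e1 (fraction of e2 is 1-f)."""
    d = [a - b for a, b in zip(e1, e2)]
    num = sum((xi - bi) * di for xi, bi, di in zip(x, e2, d))
    den = sum(di * di for di in d)
    f = num / den
    return max(0.0, min(1.0, f))

rainforest = [0.9, 0.7, 0.2]   # hypothetical lidar return fractions by stratum
eucalypt   = [0.3, 0.1, 0.8]
pixel      = [0.6, 0.4, 0.5]   # an 'ecotone' pixel between the two types
f_rain = unmix_two(pixel, rainforest, eucalypt)   # exactly halfway here
```

With more than two endmembers this becomes a constrained least-squares problem, but the fractional-membership output is the same idea.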

  2. Towards the Optimal Pixel Size of dem for Automatic Mapping of Landslide Areas

    NASA Astrophysics Data System (ADS)

    Pawłuszek, K.; Borkowski, A.; Tarolli, P.

    2017-05-01

    Determining the appropriate spatial resolution of a digital elevation model (DEM) is a key step for effective landslide analysis based on remote sensing data. Several studies have demonstrated that choosing the finest DEM resolution is not always the best solution, and various DEM resolutions can be applicable for diverse landslide applications. Thus, this study aims to assess the influence of spatial resolution on automatic landslide mapping. A pixel-based approach using parametric and non-parametric classification methods, namely feed forward neural network (FFNN) and maximum likelihood classification (ML), was applied in this study. Additionally, this allowed us to determine the impact of the classification method on the selection of DEM resolution. Landslide-affected areas were mapped based on four DEMs generated at 1 m, 2 m, 5 m and 10 m spatial resolution from airborne laser scanning (ALS) data. The performance of the landslide mapping was then evaluated by applying a landslide inventory map and computing the confusion matrix. The results of this study suggest that the finest DEM resolution is not always the best fit, although working at 1 m resolution on a micro-topography scale can show different results. The best performance was found using a 5 m DEM resolution for FFNN and a 1 m DEM resolution for ML classification.
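The evaluation step above — comparing a mapped result against an inventory map via a confusion matrix — can be sketched directly. Labels are illustrative (1 = landslide, 0 = stable):

```python
# Confusion matrix and overall accuracy for a binary landslide map
# evaluated against a reference inventory map.

def confusion_matrix(truth, pred):
    """Return (TP, TN, FP, FN) counts for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    return tp, tn, fp, fn

truth = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]   # inventory map (per pixel)
pred  = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]   # classifier output
tp, tn, fp, fn = confusion_matrix(truth, pred)
overall_accuracy = (tp + tn) / len(truth)   # 8/10 here
```

Comparing `overall_accuracy` (or derived measures such as producer's/user's accuracy) across the 1 m, 2 m, 5 m and 10 m DEM runs is how the resolution effect is quantified.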

  3. UAS-SfM for coastal research: Geomorphic feature extraction and land cover classification from high-resolution elevation and optical imagery

    USGS Publications Warehouse

    Sturdivant, Emily; Lentz, Erika; Thieler, E. Robert; Farris, Amy; Weber, Kathryn; Remsen, David P.; Miner, Simon; Henderson, Rachel

    2017-01-01

    The vulnerability of coastal systems to hazards such as storms and sea-level rise is typically characterized using a combination of ground and manned airborne systems that have limited spatial or temporal scales. Structure-from-motion (SfM) photogrammetry applied to imagery acquired by unmanned aerial systems (UAS) offers a rapid and inexpensive means to produce high-resolution topographic and visual reflectance datasets that rival existing lidar and imagery standards. Here, we use SfM to produce an elevation point cloud, an orthomosaic, and a digital elevation model (DEM) from data collected by UAS at a beach and wetland site in Massachusetts, USA. We apply existing methods to (a) determine the position of shorelines and foredunes using a feature extraction routine developed for lidar point clouds and (b) map land cover from the rasterized surfaces using a supervised classification routine. In both analyses, we experimentally vary the input datasets to understand the benefits and limitations of UAS-SfM for coastal vulnerability assessment. We find that (a) geomorphic features are extracted from the SfM point cloud with near-continuous coverage and sub-meter precision, better than was possible from a recent lidar dataset covering the same area; and (b) land cover classification is greatly improved by including topographic data with visual reflectance, but changes to resolution (when <50 cm) have little influence on the classification accuracy.

  4. Inventory and comparative evaluation of seabed mapping, classification and modeling activities in the Northwest Atlantic, USA to support regional ocean planning

    NASA Astrophysics Data System (ADS)

    Shumchenia, Emily J.; Guarinello, Marisa L.; Carey, Drew A.; Lipsky, Andrew; Greene, Jennifer; Mayer, Larry; Nixon, Matthew E.; Weber, John

    2015-06-01

    Efforts are in motion globally to address coastal and marine management needs through spatial planning and concomitant seabed habitat mapping. Contrasting strategies are often evident in these processes among local, regional, national and international scientific approaches and policy needs. In answer to such contrasts among its member states, the United States Northeast Regional Ocean Council formed a Habitat Working Group to conduct a regional inventory and comparative evaluation of seabed characterization, classification, and modeling activities in New England. The goals of this effort were to advance regional understanding of ocean habitats and identify opportunities for collaboration. Working closely with the Habitat Working Group, we organized and led the inventory and comparative analysis with a focus on providing processes and tools that can be used by scientists and managers, updated and adapted for future use, and applied in other ocean management regions throughout the world. Visual schematics were a critical component of the comparative analysis and aided discussion among scientists and managers. Regional consensus was reached on a common habitat classification scheme (U.S. Coastal and Marine Ecological Classification Standard) for regional seabed maps. Results and schematics were presented at a region-wide workshop where further steps were taken to initiate collaboration among projects. The workshop culminated in an agreement on a set of future seabed mapping goals for the region. The work presented here may serve as an example to other ocean planning regions in the U.S., Europe or elsewhere seeking to integrate a variety of seabed characterization, classification and modeling activities.

  5. Fuzzy sets, rough sets, and modeling evidence: Theory and Application. A Dempster-Shafer based approach to compromise decision making with multiattributes applied to product selection

    NASA Technical Reports Server (NTRS)

    Dekorvin, Andre

    1992-01-01

    The Dempster-Shafer theory of evidence is applied to a multiattribute decision making problem whereby the decision maker (DM) must compromise with available alternatives, none of which exactly satisfies his ideal. The decision mechanism is constrained by the uncertainty inherent in the determination of the relative importance of each attribute element and the classification of existing alternatives. The classification of alternatives is addressed through expert evaluation of the degree to which each element is contained in each available alternative. The relative importance of each attribute element is determined through pairwise comparisons of the elements by the decision maker and implementation of a ratio scale quantification method. Then the 'belief' and 'plausibility' that an alternative will satisfy the decision maker's ideal are calculated and combined to rank order the available alternatives. Application to the problem of selecting computer software is given.
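The 'belief' and 'plausibility' quantities combined above have a direct computational form: given a basic mass assignment over subsets of the frame, Bel(A) sums masses of focal sets contained in A, and Pl(A) sums masses of focal sets intersecting A. The frame and masses below are invented for a toy software-selection problem:

```python
# Belief and plausibility from a Dempster-Shafer basic mass assignment.

def belief(masses, hypothesis):
    """Bel(A) = sum of masses of all focal sets contained in A."""
    return sum(m for s, m in masses.items() if s <= hypothesis)

def plausibility(masses, hypothesis):
    """Pl(A) = sum of masses of all focal sets intersecting A."""
    return sum(m for s, m in masses.items() if s & hypothesis)

frame = frozenset({"pkgA", "pkgB", "pkgC"})   # candidate software packages
masses = {                                    # basic probability assignment
    frozenset({"pkgA"}): 0.5,
    frozenset({"pkgA", "pkgB"}): 0.3,
    frame: 0.2,                               # mass on the whole frame = ignorance
}
bel_a = belief(masses, frozenset({"pkgA"}))        # 0.5
pl_a = plausibility(masses, frozenset({"pkgA"}))   # 0.5 + 0.3 + 0.2 = 1.0
```

The interval [Bel, Pl] brackets the support for each alternative; ranking alternatives by such intervals is the compromise mechanism the abstract describes.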

  6. A LANDSAT study of ephemeral and perennial rangeland vegetation and soils

    NASA Technical Reports Server (NTRS)

    Bentley, R. G., Jr. (Principal Investigator); Salmon-Drexler, B. C.; Bonner, W. J.; Vincent, R. K.

    1976-01-01

    The author has identified the following significant results. Several methods of computer processing were applied to LANDSAT data for mapping vegetation characteristics of perennial rangeland in Montana and ephemeral rangeland in Arizona. The choice of optimal processing technique was dependent on prescribed mapping and site condition. Single channel level slicing and ratioing of channels were used for simple enhancement. Predictive models for mapping percent vegetation cover based on data from field spectra and LANDSAT data were generated by multiple linear regression of six unique LANDSAT spectral ratios. Ratio gating logic and maximum likelihood classification were applied successfully to recognize plant communities in Montana. Maximum likelihood classification did little to improve recognition of terrain features when compared to a single channel density slice in sparsely vegetated Arizona. LANDSAT was found to be more sensitive to differences between plant communities based on percentages of vigorous vegetation than to actual physical or spectral differences among plant species.

  7. Unresolved Galaxy Classifier for ESA/Gaia mission: Support Vector Machines approach

    NASA Astrophysics Data System (ADS)

    Bellas-Velidis, Ioannis; Kontizas, Mary; Dapergolas, Anastasios; Livanou, Evdokia; Kontizas, Evangelos; Karampelas, Antonios

    A software package, Unresolved Galaxy Classifier (UGC), is being developed for the ground-based pipeline of ESA's Gaia mission. It aims to provide automated taxonomic classification and estimation of specific parameters by analyzing Gaia BP/RP instrument low-dispersion spectra of unresolved galaxies. The UGC algorithm is based on a supervised learning technique, Support Vector Machines (SVM). The software is implemented in Java as two separate modules. An offline learning module provides functions for SVM model training. Once trained, the set of models can be repeatedly applied to unknown galaxy spectra by the pipeline's application module. A library of synthetic galaxy model spectra, simulated for the BP/RP instrument, is used to train and test the modules. Science tests show a very good classification performance of UGC and relatively good regression performance, except for some of the parameters. Possible approaches to improve the performance are discussed.

  8. Classification of Chemicals Based On Structured Toxicity ...

    EPA Pesticide Factsheets

    Thirty years and millions of dollars worth of pesticide registration toxicity studies, historically stored as hardcopy and scanned documents, have been digitized into highly standardized and structured toxicity data within the Toxicity Reference Database (ToxRefDB). Toxicity-based classifications of chemicals were performed as a model application of ToxRefDB. These endpoints will ultimately provide the anchoring toxicity information for the development of predictive models and biological signatures utilizing in vitro assay data. Utilizing query and structured data mining approaches, toxicity profiles were uniformly generated for greater than 300 chemicals. Based on observation rate, species concordance and regulatory relevance, individual and aggregated effects have been selected to classify the chemicals providing a set of predictable endpoints. ToxRefDB exhibits the utility of transforming unstructured toxicity data into structured data and, furthermore, into computable outputs, and serves as a model for applying such data to address modern toxicological problems.

  9. Combating speckle in SAR images - Vector filtering and sequential classification based on a multiplicative noise model

    NASA Technical Reports Server (NTRS)

    Lin, Qian; Allebach, Jan P.

    1990-01-01

    An adaptive vector linear minimum mean-squared error (LMMSE) filter for multichannel images with multiplicative noise is presented. It is shown theoretically that the mean-squared error in the filter output is reduced by making use of the correlation between image bands. The vector and conventional scalar LMMSE filters are applied to a three-band SIR-B SAR image, and their performance is compared. Based on a multiplicative noise model, the per-pel maximum likelihood classifier was derived. The authors extend this to the design of sequential and robust classifiers. These classifiers are also applied to the three-band SIR-B SAR image.
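The scalar LMMSE baseline above has a compact per-pixel form under the unit-mean multiplicative model z = x·v: the estimate is the local mean plus a gain times the deviation, with the gain derived from local statistics. This sketch uses an invented 3x3 window and speckle variance; the vector filter of the paper additionally exploits inter-band correlation:

```python
# Scalar (per-pixel) LMMSE despeckling under multiplicative noise z = x * v,
# with E[v] = 1 and Var[v] = noise_var.

def lmmse_pixel(window, center, noise_var):
    """x_hat = m + k * (z - m); gain k = Var[x] / Var[z] from local stats."""
    n = len(window)
    m = sum(window) / n
    var_z = sum((w - m) ** 2 for w in window) / n
    # Var[x] recovered from Var[z] = Var[x]*(1 + noise_var) + m^2 * noise_var
    var_x = max(0.0, (var_z - m * m * noise_var) / (1.0 + noise_var))
    k = var_x / var_z if var_z > 0 else 0.0
    return m + k * (center - m)

window = [10.0, 12.0, 9.0, 11.0, 30.0, 10.0, 12.0, 11.0, 9.0]  # 3x3 values
est = lmmse_pixel(window, center=30.0, noise_var=0.05)
# the bright speckle outlier is pulled toward the local mean
```

In homogeneous areas var_x tends to 0 and the filter averages fully; near edges k approaches 1 and detail is preserved, which is the adaptive behavior the abstract relies on.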

  10. Blockmodels for connectome analysis

    NASA Astrophysics Data System (ADS)

    Moyer, Daniel; Gutman, Boris; Prasad, Gautam; Faskowitz, Joshua; Ver Steeg, Greg; Thompson, Paul

    2015-12-01

    In the present work we study a family of generative network models and their applications for modeling the human connectome. We introduce a minor but novel variant of the Mixed Membership Stochastic Blockmodel and apply it and two other related models to two human connectome datasets (ADNI and a Bipolar Disorder dataset) with both control and diseased subjects. We further provide a simple generative classifier that, alongside more discriminating methods, provides evidence that blockmodels accurately summarize tractography count networks with respect to a disease classification task.

  11. Probability of identification: adulteration of American Ginseng with Asian Ginseng.

    PubMed

    Harnly, James; Chen, Pei; Harrington, Peter De B

    2013-01-01

    The AOAC INTERNATIONAL guidelines for validation of botanical identification methods were applied to the detection of Asian Ginseng [Panax ginseng (PG)] as an adulterant for American Ginseng [P. quinquefolius (PQ)] using spectral fingerprints obtained by flow injection mass spectrometry (FIMS). Samples of 100% PQ and 100% PG were physically mixed to provide 90, 80, and 50% PQ. The multivariate FIMS fingerprint data were analyzed using soft independent modeling of class analogy (SIMCA) based on 100% PQ. The Q statistic, a measure of the degree of non-fit of the test samples with the calibration model, was used as the analytical parameter. FIMS was able to discriminate between 100% PQ and 100% PG, and between 100% PQ and 90, 80, and 50% PQ. The probability of identification (POI) curve was estimated based on the SD of 90% PQ. A digital model of adulteration, obtained by mathematically summing the experimentally acquired spectra of 100% PQ and 100% PG in the desired ratios, agreed well with the physical data and provided an easy and more accurate method for constructing the POI curve. Two chemometric modeling methods, SIMCA and fuzzy optimal associative memories, and two classification methods, partial least squares-discriminant analysis and fuzzy rule-building expert systems, were applied to the data. The modeling methods correctly identified the adulterated samples; the classification methods did not.
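The Q statistic used above measures a test sample's non-fit to the SIMCA class model: center the sample, project it onto the class model's principal components, and take the squared norm of what is left over. A one-component sketch; the class mean, loading vector, and "fingerprints" are invented stand-ins for the FIMS spectra:

```python
# SIMCA Q statistic (squared residual off a one-component class model).

def q_statistic(x, mean, loading):
    """Q = ||r||^2 where r = centered x minus its projection on the
    (unit-norm) loading vector of the class model."""
    c = [xi - mi for xi, mi in zip(x, mean)]
    score = sum(ci * pi for ci, pi in zip(c, loading))
    resid = [ci - score * pi for ci, pi in zip(c, loading)]
    return sum(ri * ri for ri in resid)

mean_pq = [1.0, 2.0, 3.0]            # class mean of 100% PQ fingerprints
p = [0.6, 0.8, 0.0]                  # unit-norm loading of the PQ model
sample_pq = [1.6, 2.8, 3.0]          # lies on the model: Q = 0
sample_adulterated = [1.0, 2.0, 4.0] # variation the model cannot explain
q_ok = q_statistic(sample_pq, mean_pq, p)
q_bad = q_statistic(sample_adulterated, mean_pq, p)   # large non-fit
```

Adulteration detection then reduces to thresholding Q against the distribution of Q values seen for authentic 100% PQ calibration samples.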

  12. A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

    PubMed

    Lin, Lei; Wang, Qian; Sadek, Adel W

    2016-06-01

    The duration of freeway traffic accidents is an important factor affecting traffic congestion, environmental pollution, and secondary accidents. In previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution for a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and given this, HBDMs have previously been applied to analyze and predict traffic accident durations. Previous research, however, has not applied HBDMs for accident duration prediction in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction that improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the incorrect assumption of normality for traffic accident durations. The proposed model was tested on two freeway accident datasets. For each dataset, the first 500 records were used to train three models: (1) an M5P tree; (2) an HBDM; and (3) the proposed M5P-HBDM; the remainder of the data were used for testing. The results show that the proposed M5P-HBDM identified more significant and meaningful variables than either the M5P or the HBDM.
Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
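The MAPE criterion used above to compare the three models is a short computation. The accident durations below are invented for illustration:

```python
# Mean absolute percentage error between actual and predicted durations.

def mape(actual, predicted):
    """MAPE = 100/n * sum(|a - p| / a); actual values must be nonzero."""
    return 100.0 / len(actual) * sum(
        abs(a - p) / a for a, p in zip(actual, predicted))

actual_min = [30.0, 45.0, 60.0, 120.0]      # accident durations (minutes)
predicted_min = [33.0, 41.0, 66.0, 100.0]   # a model's predictions
err = mape(actual_min, predicted_min)       # roughly 11.4% here
```

A lower MAPE on the held-out records (everything after the first 500) is the basis for preferring the M5P-HBDM.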

  13. 46 CFR 503.55 - Derivative classification.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 46 Shipping 9 2011-10-01 2011-10-01 false Derivative classification. 503.55 Section 503.55... Security Program § 503.55 Derivative classification. (a) In accordance with Part 2 of Executive Order 13526... developed material consistent with the classification markings that apply to the source information, is...

  14. 46 CFR 503.55 - Derivative classification.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 46 Shipping 9 2010-10-01 2010-10-01 false Derivative classification. 503.55 Section 503.55... Security Program § 503.55 Derivative classification. (a) In accordance with Part 2 of Executive Order 12958... developed material consistent with the classification markings that apply to the source information, is...

  15. 49 CFR 1105.6 - Classification of actions.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 49 Transportation 8 2011-10-01 2011-10-01 false Classification of actions. 1105.6 Section 1105.6... Classification of actions. (a) Environmental Impact Statements will normally be prepared for rail construction... classifications in this section apply without regard to whether the action is proposed by application, petition...

  16. 46 CFR 503.55 - Derivative classification.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 46 Shipping 9 2014-10-01 2014-10-01 false Derivative classification. 503.55 Section 503.55... Security Program § 503.55 Derivative classification. (a) In accordance with Part 2 of Executive Order 13526... developed material consistent with the classification markings that apply to the source information, is...

  17. 46 CFR 503.55 - Derivative classification.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 46 Shipping 9 2013-10-01 2013-10-01 false Derivative classification. 503.55 Section 503.55... Security Program § 503.55 Derivative classification. (a) In accordance with Part 2 of Executive Order 13526... developed material consistent with the classification markings that apply to the source information, is...

  18. 46 CFR 503.55 - Derivative classification.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 46 Shipping 9 2012-10-01 2012-10-01 false Derivative classification. 503.55 Section 503.55... Security Program § 503.55 Derivative classification. (a) In accordance with Part 2 of Executive Order 13526... developed material consistent with the classification markings that apply to the source information, is...

  19. 49 CFR 1105.6 - Classification of actions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 49 Transportation 8 2010-10-01 2010-10-01 false Classification of actions. 1105.6 Section 1105.6... Classification of actions. (a) Environmental Impact Statements will normally be prepared for rail construction... classifications in this section apply without regard to whether the action is proposed by application, petition...

  20. Model-based and Model-free Machine Learning Techniques for Diagnostic Prediction and Classification of Clinical Outcomes in Parkinson's Disease.

    PubMed

    Gao, Chao; Sun, Hanbo; Wang, Tuo; Tang, Ming; Bohnen, Nicolaas I; Müller, Martijn L T M; Herman, Talia; Giladi, Nir; Kalinin, Alexandr; Spino, Cathie; Dauer, William; Hausdorff, Jeffrey M; Dinov, Ivo D

    2018-05-08

In this study, we apply a multidisciplinary approach to investigate falls in PD patients using clinical, demographic and neuroimaging data from two independent initiatives (University of Michigan and Tel Aviv Sourasky Medical Center). Using machine learning techniques, we construct predictive models to discriminate fallers and non-fallers. Through controlled feature selection, we identified the most salient predictors of patient falls, including gait speed, Hoehn and Yahr stage, and postural instability and gait difficulty-related measurements. The model-based and model-free analytical methods we employed included logistic regression, random forests, support vector machines, and XGBoost. The reliability of the forecasts was assessed by internal statistical (5-fold) cross validation as well as by external out-of-bag validation. Four specific challenges were addressed in the study: Challenge 1, develop a protocol for harmonizing and aggregating complex, multisource, and multi-site Parkinson's disease data; Challenge 2, identify salient predictive features associated with specific clinical traits, e.g., patient falls; Challenge 3, forecast patient falls and evaluate the classification performance; and Challenge 4, predict tremor dominance (TD) vs. posture instability and gait difficulty (PIGD). Our findings suggest that, compared to other approaches, model-free machine learning techniques provide more reliable forecasting of falls in Parkinson's patients, with a classification accuracy of about 70-80%.
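
The model comparison described here can be sketched with off-the-shelf tools. Below is a minimal illustration on synthetic stand-in data (the features and effect sizes are invented, not the study's): a model-based classifier (logistic regression) against a model-free one (random forest), scored by 5-fold cross validation as in the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for predictors such as gait speed or Hoehn and Yahr stage
n = 400
X = rng.normal(size=(n, 4))
# Fallers (y=1) are more likely when the first feature ("gait speed") is low
y = (X[:, 0] + 0.5 * rng.normal(size=n) < 0).astype(int)

model_based = LogisticRegression()
model_free = RandomForestClassifier(n_estimators=100, random_state=0)

acc_lr = cross_val_score(model_based, X, y, cv=5).mean()
acc_rf = cross_val_score(model_free, X, y, cv=5).mean()
print(f"logistic regression: {acc_lr:.2f}  random forest: {acc_rf:.2f}")
```

On real data the same comparison would be run over harmonized clinical and neuroimaging predictors rather than random draws.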

  1. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... describe. (3) Confidential. This classification shall be applied only to information the unauthorized... 1312.4 Administrative Personnel OFFICE OF MANAGEMENT AND BUDGET OMB DIRECTIVES CLASSIFICATION, DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and...

  2. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... describe. (3) Confidential. This classification shall be applied only to information the unauthorized... 1312.4 Administrative Personnel OFFICE OF MANAGEMENT AND BUDGET OMB DIRECTIVES CLASSIFICATION, DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and...

  3. 5 CFR 1312.4 - Classified designations.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... describe. (3) Confidential. This classification shall be applied only to information the unauthorized... 1312.4 Administrative Personnel OFFICE OF MANAGEMENT AND BUDGET OMB DIRECTIVES CLASSIFICATION, DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and...

  4. Collagen morphology and texture analysis: from statistics to classification

    PubMed Central

    Mostaço-Guidolin, Leila B.; Ko, Alex C.-T.; Wang, Fei; Xiang, Bo; Hewko, Mark; Tian, Ganghong; Major, Arkady; Shiomi, Masashi; Sowa, Michael G.

    2013-01-01

In this study we present an image analysis methodology capable of quantifying morphological changes in tissue collagen fibril organization caused by pathological conditions. Texture analysis based on first-order statistics (FOS) and second-order statistics such as the gray level co-occurrence matrix (GLCM) was explored to extract second-harmonic generation (SHG) image features that are associated with the structural and biochemical changes of tissue collagen networks. Based on these extracted quantitative parameters, multi-group classification of SHG images was performed. With combined FOS and GLCM texture values, we achieved reliable classification of SHG collagen images acquired from atherosclerotic arteries with >90% accuracy, sensitivity and specificity. The proposed methodology can be applied to a wide range of conditions involving collagen remodeling, such as skin disorders, different types of fibrosis, and musculoskeletal diseases affecting ligaments and cartilage. PMID:23846580
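
The second-order statistics mentioned above are straightforward to compute. The sketch below (a generic illustration, not the paper's exact offsets, quantisation or feature set) builds a symmetric, normalised gray level co-occurrence matrix for one pixel offset and derives three common GLCM texture features:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Gray level co-occurrence matrix for one pixel offset, symmetric and normalised."""
    m = np.zeros((levels, levels))
    h, w = img.shape
    for i in range(h - dy):
        for j in range(w - dx):
            m[img[i, j], img[i + dy, j + dx]] += 1
    m = m + m.T                      # make the matrix symmetric
    return m / m.sum()

def texture_features(img, levels=8):
    p = glcm(img, levels)
    i, j = np.indices(p.shape)
    contrast = ((i - j) ** 2 * p).sum()
    energy = (p ** 2).sum()
    homogeneity = (p / (1 + np.abs(i - j))).sum()
    return contrast, energy, homogeneity

img = np.random.default_rng(1).integers(0, 8, size=(64, 64))
print(texture_features(img))
```

A perfectly uniform image yields zero contrast and unit energy; disorganized fibril textures move both measures in the opposite direction, which is what makes them usable as classification features.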

  5. A Bayesian state-space approach for damage detection and classification

    NASA Astrophysics Data System (ADS)

    Dzunic, Zoran; Chen, Justin G.; Mobahi, Hossein; Büyüköztürk, Oral; Fisher, John W.

    2017-11-01

    The problem of automatic damage detection in civil structures is complex and requires a system that can interpret collected sensor data into meaningful information. We apply our recently developed switching Bayesian model for dependency analysis to the problems of damage detection and classification. The model relies on a state-space approach that accounts for noisy measurement processes and missing data, which also infers the statistical temporal dependency between measurement locations signifying the potential flow of information within the structure. A Gibbs sampling algorithm is used to simultaneously infer the latent states, parameters of the state dynamics, the dependence graph, and any changes in behavior. By employing a fully Bayesian approach, we are able to characterize uncertainty in these variables via their posterior distribution and provide probabilistic estimates of the occurrence of damage or a specific damage scenario. We also implement a single class classification method which is more realistic for most real world situations where training data for a damaged structure is not available. We demonstrate the methodology with experimental test data from a laboratory model structure and accelerometer data from a real world structure during different environmental and excitation conditions.
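
As a minimal illustration of the single-class idea (training only on the undamaged condition and flagging departures from it), the sketch below swaps in scikit-learn's OneClassSVM on invented features; this is a stand-in for, not a reproduction of, the Bayesian state-space model described above:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical feature vectors extracted from the undamaged structure only
healthy = rng.normal(0.0, 1.0, size=(200, 3))
clf = OneClassSVM(nu=0.05, gamma="scale").fit(healthy)

# New measurements: same condition vs. a shifted response after "damage"
new_healthy = rng.normal(0.0, 1.0, size=(50, 3))
damaged = rng.normal(4.0, 1.0, size=(50, 3))
print("healthy accepted:", (clf.predict(new_healthy) == 1).mean(),
      "damaged flagged:", (clf.predict(damaged) == -1).mean())
```

The appeal of the single-class setup is exactly the one the abstract names: no labelled damaged-structure data is needed for training.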

  6. A Standard-Driven Data Dictionary for Data Harmonization of Heterogeneous Datasets in Urban Geological Information Systems

    NASA Astrophysics Data System (ADS)

    Liu, G.; Wu, C.; Li, X.; Song, P.

    2013-12-01

The 3D urban geological information system has been a major part of the national urban geological survey project of the China Geological Survey in recent years. Large amounts of multi-source and multi-subject data are to be stored in urban geological databases. Various models and vocabularies have been drafted and applied by industrial companies for urban geological data. Issues such as duplicate and ambiguous term definitions and differing coding structures increase the difficulty of information sharing and data integration. To solve this problem, we proposed a national standard-driven information classification and coding method to effectively store and integrate urban geological data, and we applied data dictionary technology to achieve structured and standard data storage. The overall purpose of this work is to set up a common data platform to provide an information sharing service. Research progress is as follows: (1) A unified classification and coding method for multi-source data based on national standards. The underlying national standards include GB 9649-88 for geology and GB/T 13923-2006 for geography. Current industrial models are compared with the national standards to build a mapping table. The attributes of the various urban geological data entity models are reduced to several categories according to their application phases and domains. A logical data model is then set up as a standard format to design data file structures for a relational database. (2) A multi-level data dictionary for data standardization constraints. Four levels of data dictionary are designed: the model data dictionary is used to manage system database files and support maintenance of the whole database system; the attribute dictionary organizes the fields used in database tables; the term and code dictionary provides a standard for the urban information system by adopting appropriate classification and coding methods; and the comprehensive data dictionary manages system operation and security. (3) An extension to the system data management functions based on the data dictionary. The data item constraint input function makes use of the standard term and code dictionary to obtain standardized input. The attribute dictionary organizes all the fields of the urban geological information database to ensure consistent term use for fields. The model dictionary is used to automatically generate a database operation interface with standard semantic content via the term and code dictionary. The above method and technology have been applied to the construction of the Fuzhou Urban Geological Information System, South-East China, with satisfactory results.

  7. Differentiation of tea varieties using UV-Vis spectra and pattern recognition techniques

    NASA Astrophysics Data System (ADS)

Palacios-Morillo, Ana; Alcázar, Ángela; de Pablos, Fernando; Jurado, José Marcos

    2013-02-01

Tea, one of the most consumed beverages all over the world, is of great importance in the economies of a number of countries. Several methods have been developed to classify tea varieties or origins based on pattern recognition techniques applied to chemical data, such as metal profiles, amino acids, catechins and volatile compounds. Some of these analytical methods are too tedious and expensive for routine work. The use of UV-Vis spectral data as discriminant variables, highly influenced by the chemical composition, can be an alternative to these methods. UV-Vis spectra of methanol-water extracts of tea have been obtained in the interval 250-800 nm. Absorbances were used as input variables. Principal component analysis was used to reduce the number of variables, and several pattern recognition methods, such as linear discriminant analysis, support vector machines and artificial neural networks, were applied in order to differentiate the most common tea varieties. A successful classification model was built by combining principal component analysis and multilayer perceptron artificial neural networks, allowing the differentiation between tea varieties. This rapid and simple methodology can be applied to solve classification problems in the food industry, saving economic resources.
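
The winning combination reported here (PCA for dimensionality reduction feeding a multilayer perceptron) can be sketched as a pipeline; the spectra below are simulated Gaussians standing in for real absorbance curves, and the component and layer sizes are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Simulated absorbance spectra over 250-800 nm for three hypothetical varieties
wavelengths = np.linspace(250, 800, 276)
X, y = [], []
for variety in range(3):
    centre = 400 + 80 * variety        # each variety absorbs around a different peak
    for _ in range(40):
        spectrum = np.exp(-((wavelengths - centre) / 60.0) ** 2)
        X.append(spectrum + 0.05 * rng.normal(size=wavelengths.size))
        y.append(variety)
X, y = np.array(X), np.array(y)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = make_pipeline(PCA(n_components=5),
                    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0))
clf.fit(Xtr, ytr)
print(f"hold-out accuracy: {clf.score(Xte, yte):.2f}")
```

The PCA step plays the role the abstract describes: the few hundred correlated absorbance channels are compressed into a handful of uncorrelated scores before the network sees them.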

  8. Mouse Model for Aerosol Infection of Influenza (Postprint)

    DTIC Science & Technology

    2011-12-01

AFRL-RX-TY-TP-2012-0010: Mouse Model for Aerosol Infection of Influenza (postprint). Rashelle S. McDonald, Brian K. Heimbuch; Applied Research Associates. Cites: Min, J.-Y., Lamirande, E.W., Santos, C., Jin, H., Kemble, G. and Subbarao, K. (2011) Comparison of a live attenuated 2009 H1N1 vaccine with...

  9. Mapping urban impervious surface using object-based image analysis with WorldView-3 satellite imagery

    NASA Astrophysics Data System (ADS)

    Iabchoon, Sanwit; Wongsai, Sangdao; Chankon, Kanoksuk

    2017-10-01

Land use and land cover (LULC) data are important to monitor and assess environmental change. LULC classification using satellite images is a method widely used on global and local scales. In particular, urban areas that contain various LULC types are important components of the urban landscape and ecosystem. This study aims to classify urban LULC using WorldView-3 (WV-3) very high spatial resolution satellite imagery and the object-based image analysis method. A decision ruleset was applied to classify the WV-3 images of Kathu subdistrict, Phuket province, Thailand. The main steps were as follows: (1) the image was ortho-rectified using ground control points and a digital elevation model; (2) multiscale image segmentation was applied to group pixels into image objects; (3) a decision ruleset for LULC classification was developed using spectral bands, spectral indices, and spatial and contextual information; and (4) accuracy was assessed using testing data obtained by statistical random sampling. The results show that seven LULC classes (water, vegetation, open space, road, residential, building, and bare soil) were successfully classified, with an overall classification accuracy of 94.14% and a kappa coefficient of 92.91%.
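
Step (3), the decision ruleset, is essentially a cascade of threshold tests on per-object features; a toy sketch (the features and thresholds are invented for illustration and are not the paper's rules) might look like:

```python
def classify_object(mean_ndvi, mean_ndwi, elongation):
    """Toy object-based decision rules: each segmented image object carries
    mean spectral indices and a shape attribute; rules fire in order."""
    if mean_ndwi > 0.3:          # strongly water-like spectral response
        return "water"
    if mean_ndvi > 0.4:          # strongly vegetated
        return "vegetation"
    if elongation > 3.0:         # long, narrow objects are likely roads
        return "road"
    return "built-up / bare soil"

for obj in [(0.1, 0.5, 1.0), (0.6, 0.0, 1.2), (0.1, 0.0, 4.5)]:
    print(obj, "->", classify_object(*obj))
```

The point of the object-based approach is visible even in this toy: shape attributes such as elongation are only meaningful per object, not per pixel.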

  10. Classification of Parkinsonian syndromes from FDG-PET brain data using decision trees with SSM/PCA features.

    PubMed

    Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M

    2015-01-01

Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) against healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to the FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance, and we carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy for humans to understand. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested, based on enlarging the training data set, enhancing the decision tree method by bagging, and adding features based on (f)MRI data.
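
The pipeline (SSM/PCA-style subject scores as features, a decision tree as classifier, leave-one-out cross validation) can be sketched as follows on synthetic data; scikit-learn's CART-style DecisionTreeClassifier stands in for C4.5, and the group sizes and effect are invented:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for voxel data: 40 subjects, a disease-related pattern
n_subjects, n_voxels = 40, 200
group = np.repeat([0, 1], n_subjects // 2)          # 0 = control, 1 = patient
data = rng.normal(size=(n_subjects, n_voxels))
data[group == 1, :20] += 2.0                        # pattern in the first 20 voxels

scores = PCA(n_components=5).fit_transform(data)    # SSM/PCA-style subject scores
tree = DecisionTreeClassifier(random_state=0)
acc = cross_val_score(tree, scores, group, cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy: {acc:.2f}")
```

The fitted tree can then be inspected directly, which is the interpretability advantage the abstract emphasises.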

  11. Weight-elimination neural networks applied to coronary surgery mortality prediction.

    PubMed

    Ennett, Colleen M; Frize, Monique

    2003-06-01

The objective was to assess the effectiveness of the weight-elimination cost function in improving the classification performance of artificial neural networks (ANNs) and to observe how changing the a priori distribution of the training set affects network performance. Backpropagation feedforward ANNs with and without weight-elimination estimated mortality for coronary artery surgery patients. The ANNs were trained and tested on cases with 32 input variables describing the patient's medical history; the output variable was in-hospital mortality (mortality rates: training 3.7%, test 3.8%). Artificial training sets with mortality rates of 20%, 50%, and 80% were created to observe the impact of training with a higher-than-normal prevalence. When the results were averaged, weight-elimination networks achieved higher sensitivity rates than those without weight-elimination. Networks trained on higher-than-normal prevalence achieved higher sensitivity rates at the cost of lower specificity and correct classification. The weight-elimination cost function can improve classification performance when the network is trained with a higher-than-normal prevalence. A network trained with a moderately high artificial mortality rate (20%) can improve the sensitivity of the model without significantly affecting other aspects of its performance. The ANN mortality model achieved performance comparable to that of additive and statistical models for coronary surgery mortality estimation in the literature.
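
The weight-elimination cost function adds to the data error a penalty of the form Σᵢ (wᵢ/w₀)² / (1 + (wᵢ/w₀)²): roughly quadratic for small weights, so they are pushed toward zero, but saturating for large ones, so genuinely useful weights survive. A minimal sketch on invented data, using a single-layer logistic model and plain gradient descent rather than the paper's full backpropagation network:

```python
import numpy as np

def weight_elimination_loss(w, X, y, lam=0.01, w0=1.0):
    """Cross-entropy plus the weight-elimination penalty."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    r = (w / w0) ** 2
    return ce + lam * np.sum(r / (1.0 + r))

def gradient(w, X, y, lam=0.01, w0=1.0):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    g_ce = X.T @ (p - y) / len(y)
    r = (w / w0) ** 2
    g_pen = lam * (2.0 * w / w0 ** 2) / (1.0 + r) ** 2   # derivative of the penalty
    return g_ce + g_pen

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)  # only input 0 informs y
w = np.zeros(5)
for _ in range(2000):                    # plain gradient descent
    w -= 0.5 * gradient(w, X, y)
print(np.round(w, 2))                    # uninformative weights are driven near zero
```

The pruning effect is the point: the four uninformative inputs end up with weights close to zero while the informative one keeps a large weight.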

  12. Mixtures of GAMs for habitat suitability analysis with overdispersed presence / absence data

    PubMed Central

    Pleydell, David R.J.; Chrétien, Stéphane

    2009-01-01

    A new approach to species distribution modelling based on unsupervised classification via a finite mixture of GAMs incorporating habitat suitability curves is proposed. A tailored EM algorithm is outlined for computing maximum likelihood estimates. Several submodels incorporating various parameter constraints are explored. Simulation studies confirm, that under certain constraints, the habitat suitability curves are recovered with good precision. The method is also applied to a set of real data concerning presence/absence of observable small mammal indices collected on the Tibetan plateau. The resulting classification was found to correspond to species-level differences in habitat preference described in previous ecological work. PMID:20401331

  13. Predictive models to assess risk of type 2 diabetes, hypertension and comorbidity: machine-learning algorithms and validation using national health data from Kuwait--a cohort study.

    PubMed

    Farran, Bassam; Channanath, Arshad Mohamed; Behbehani, Kazem; Thanaraj, Thangavel Alphonse

    2013-05-14

We build classification models and risk assessment tools for diabetes, hypertension and comorbidity using machine-learning algorithms on data from Kuwait. We model the increased proneness of diabetic patients to develop hypertension and vice versa. We ascertain the importance of ethnicity (natives vs expatriate migrants) and of using regional data in risk assessment. Retrospective cohort study. Four machine-learning techniques were used: logistic regression, k-nearest neighbours (k-NN), multifactor dimensionality reduction and support vector machines. The study uses fivefold cross validation to obtain generalisation accuracies and errors. Kuwait Health Network (KHN), which integrates data from primary health centres and hospitals in Kuwait. 270 172 hospital visitors (of which 89 858 are diabetic, 58 745 hypertensive and 30 522 comorbid) comprising Kuwaiti natives and Asian and Arab expatriates. Incident type 2 diabetes, hypertension and comorbidity. Classification accuracies of >85% (for diabetes) and >90% (for hypertension) are achieved using only simple non-laboratory-based parameters. Risk assessment tools based on k-NN classification models are able to assign 'high' risk to 75% of diabetic patients and to 94% of hypertensive patients. Only 5% of diabetic patients are assigned 'low' risk. Asian-specific models and assessments perform even better. Pathological conditions of diabetes (in the general or in the hypertensive population) and of hypertension are modelled. Two-stage aggregate classification models and risk assessment tools, built by combining the component models for diabetes (or for hypertension), perform better than the individual models. Data on diabetes, hypertension and comorbidity from the cosmopolitan State of Kuwait are available for the first time. This enabled us to apply four different case-control models to assess risks. These tools aid in the preliminary non-intrusive assessment of the population. Ethnicity is significant in the predictive models. Risk assessments need to be developed using regional data, as we demonstrate by applying the American Diabetes Association online calculator to data from Kuwait.
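
The k-NN risk banding reported above can be sketched as follows; the non-laboratory features, their weights, and the 0.7 'high-risk' probability cut-off are all invented for illustration, with cross-validated probabilities standing in for the study's assessment tool:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
# Hypothetical non-laboratory parameters: age, BMI, family-history flag
n = 600
age = rng.normal(50, 12, n)
bmi = rng.normal(28, 5, n)
family = rng.integers(0, 2, n).astype(float)
X = np.column_stack([age, bmi, family])
score = 0.05 * (age - 50) + 0.1 * (bmi - 28) + family     # invented risk structure
y = (score + rng.normal(0, 0.5, n) > 0.5).astype(int)

# Scale before k-NN so no single feature dominates the distance metric
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))
proba = cross_val_predict(knn, X, y, cv=5, method="predict_proba")[:, 1]
high = proba > 0.7                      # illustrative 'high-risk' band
print(f"high-risk flag rate among cases: {high[y == 1].mean():.0%}")
```

Banding cross-validated probabilities, rather than hard predictions, is what turns a classifier into a graded risk assessment tool.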

  14. Comparison of Pixel-Based and Object-Based Classification Using Parameters and Non-Parameters Approach for the Pattern Consistency of Multi Scale Landcover

    NASA Astrophysics Data System (ADS)

    Juniati, E.; Arrofiqoh, E. N.

    2017-09-01

Information extraction from remote sensing data, especially land cover, can be achieved by digital classification. In practice, some people are more comfortable using visual interpretation to retrieve land cover information; however, it is highly influenced by the subjectivity and knowledge of the interpreter, and it is time-consuming. Digital classification can be done in several ways, depending on the mapping approach and the assumptions made about the data distribution. This study compared several classification methods across different data types at the same location. The data used were Landsat 8 satellite imagery, SPOT 6 imagery and orthophotos. In practice, these data are used to produce land cover maps at 1:50,000 scale for Landsat, 1:25,000 for SPOT and 1:5,000 for orthophotos, but using visual interpretation to retrieve the information. Maximum likelihood classification (MLC), a pixel-based parametric approach, was applied to these data, as was an artificial neural network classifier, a pixel-based non-parametric approach. Moreover, the study applied object-based classifiers to the data. The classification system implemented is the land cover classification of the Indonesian topographic map. The classification was applied to each data source, with the aim of recognizing patterns and assessing the consistency of the land cover maps produced from each data set. Furthermore, the study analyses the benefits and limitations of each method.

  15. Accurate crop classification using hierarchical genetic fuzzy rule-based systems

    NASA Astrophysics Data System (ADS)

    Topaloglou, Charalampos A.; Mylonas, Stelios K.; Stavrakoudis, Dimitris G.; Mastorocostas, Paris A.; Theocharis, John B.

    2014-10-01

This paper investigates the effectiveness of an advanced classification system for accurate crop classification using very high resolution (VHR) satellite imagery. Specifically, a recently proposed genetic fuzzy rule-based classification system (GFRBCS) is employed, namely, the Hierarchical Rule-based Linguistic Classifier (HiRLiC). HiRLiC's model comprises a small set of simple IF-THEN fuzzy rules, easily interpretable by humans. One of its most important attributes is that its learning algorithm requires minimal user interaction, since the learning parameters that most affect classification accuracy are determined automatically. HiRLiC is applied to a challenging crop classification task, using a SPOT5 satellite image over an intensively cultivated area in a lake-wetland ecosystem in northern Greece. A rich set of higher-order spectral and textural features is derived from the initial bands of the (pan-sharpened) image, resulting in an input space comprising 119 features. The experimental analysis shows that HiRLiC compares favorably to other interpretable classifiers in the literature, both in terms of structural complexity and classification accuracy. Its testing accuracy was very close to that obtained by complex state-of-the-art classification systems, such as support vector machine (SVM) and random forest (RF) classifiers. Nevertheless, visual inspection of the derived classification maps shows that HiRLiC has better generalization properties, providing more homogeneous classifications than the competitors. Moreover, the runtime required to produce the thematic map was orders of magnitude lower than that of the competitors.

  16. A neural network approach for enhancing information extraction from multispectral image data

    USGS Publications Warehouse

    Liu, J.; Shao, G.; Zhu, H.; Liu, S.

    2005-01-01

A back-propagation artificial neural network (ANN) was applied to classify multispectral remote sensing imagery data. The classification procedure included four steps: (i) noisy training, which adds minor random variations to the sampling data to make the data more representative and to reduce the required training sample size; (ii) iterative or multi-tier classification, which reclassifies the unclassified pixels by making a subset of training samples from the original training set so that the neural model can focus on fewer classes; (iii) spectral channel selection based on neural network weights, which can distinguish the relative importance of each channel in the classification process to simplify the ANN model; and (iv) voting rules that adjust the accuracy of classification and produce outputs at different confidence levels. The Purdue Forest, located west of Purdue University, West Lafayette, Indiana, was chosen as the test site. The 1992 Landsat thematic mapper imagery was used as the input data. High-quality airborne photographs of the same time period were used for the ground truth. A total of 11 land use and land cover classes were defined, including water, broadleaved forest, coniferous forest, young forest, urban and road, and six types of cropland-grassland. The experiment indicated that the back-propagation neural network application was satisfactory in distinguishing different land cover types at US Geological Survey levels II-III. The single-tier classification reached an overall accuracy of 85%, and the multi-tier classification an overall accuracy of 95%. For the whole test region, the final output of this study reached an overall accuracy of 87%. © 2005 CASI.
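
Step (i), noisy training, can be sketched as a small augmentation routine that replicates each training signature with minor random perturbations (the band values and noise level below are invented, not the study's):

```python
import numpy as np

def noisy_augment(X, y, copies=5, sigma=0.02, seed=0):
    """Replicate each training sample with small random perturbations so a
    small sample set covers more of the spectral variation of its class."""
    rng = np.random.default_rng(seed)
    Xa = np.repeat(X, copies, axis=0)
    Xa = Xa + rng.normal(0.0, sigma, size=Xa.shape)
    ya = np.repeat(y, copies)
    return Xa, ya

# Hypothetical 6-band TM training signatures for two classes
X = np.array([[0.1, 0.2, 0.1, 0.5, 0.3, 0.2],
              [0.3, 0.3, 0.4, 0.2, 0.4, 0.5]])
y = np.array([0, 1])
Xa, ya = noisy_augment(X, y, copies=10)
print(Xa.shape, ya.shape)
```

The augmented set can then be fed to any classifier; the noise level should stay well below the between-class spectral differences.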

  17. Support Vector Machine Model for Automatic Detection and Classification of Seismic Events

    NASA Astrophysics Data System (ADS)

    Barros, Vesna; Barros, Lucas

    2016-04-01

The automated processing of multiple seismic signals to detect, localize and classify seismic events is a central tool in both natural hazards monitoring and nuclear treaty verification. However, false and missed detections caused by station noise, and incorrect classification of arrivals, are still an issue, and events are often unclassified or poorly classified. Machine learning techniques can therefore be used in automatic processing to classify the huge database of seismic recordings and provide more confidence in the final output. Working in the context of the International Monitoring System (IMS), a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT), we propose a fully automatic method for seismic event detection and classification based on a supervised pattern recognition technique, the Support Vector Machine (SVM). According to Kortström et al. (2015), the advantages of using an SVM include its ability to handle a large number of features and its effectiveness in high-dimensional spaces. Our objective is to detect seismic events from one IMS seismic station located in an area of high seismicity and mining activity and classify them as earthquakes or quarry blasts. We expect to create a flexible and easily adjustable SVM method that can be applied to different regions and datasets. Taken a step further, accurate results for seismic stations could lead to a modification of the model and its parameters to make it applicable to other waveform technologies used to monitor nuclear explosions, such as infrasound and hydroacoustic waveforms. As authorized users, we have direct access to all IMS data and bulletins through a secure signatory account. A set of significant seismic waveforms containing different types of events (e.g. earthquakes, quarry blasts) and noise is being analysed to train the model and learn the typical signal pattern of these events.
Moreover, comparing the performance of the support-vector network to various classical learning algorithms used before in seismic detection and classification is an essential final step to analyze the advantages and disadvantages of the model.
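
A two-class SVM of the kind proposed here can be sketched with scikit-learn; the two waveform features and the class offsets below are invented stand-ins for real discriminants such as spectral or amplitude ratios:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical features per event; quarry blasts (1) offset from earthquakes (0)
n = 300
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)) + np.column_stack([2.0 * y, -1.5 * y])

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
acc = cross_val_score(svm, X, y, cv=5).mean()
print(f"5-fold accuracy: {acc:.2f}")
```

Standardising the features before the RBF kernel matters here for the same reason it does with real waveform features: the kernel width is shared across dimensions.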

  18. Linear mixing model applied to coarse spatial resolution data from multispectral satellite sensors

    NASA Technical Reports Server (NTRS)

    Holben, Brent N.; Shimabukuro, Yosio E.

    1993-01-01

    A linear mixing model was applied to coarse spatial resolution data from the NOAA Advanced Very High Resolution Radiometer. The reflective component of the 3.55-3.95 micron channel was used with the two reflective channels 0.58-0.68 micron and 0.725-1.1 micron to run a constrained least squares model to generate fraction images for an area in the west central region of Brazil. The fraction images were compared with an unsupervised classification derived from Landsat TM data acquired on the same day. The relationship between the fraction images and normalized difference vegetation index images show the potential of the unmixing techniques when using coarse spatial resolution data for global studies.
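
A constrained least squares unmixing step of this kind can be sketched with non-negative least squares, enforcing the sum-to-one constraint softly through an appended, heavily weighted row; the endmember spectra below are invented placeholders, not measured AVHRR values:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical endmember reflectances for three channels
# (rows = channels, columns = vegetation, soil, shade)
E = np.array([[0.05, 0.25, 0.02],
              [0.45, 0.30, 0.03],
              [0.10, 0.20, 0.01]])

def unmix(pixel, E):
    """Constrained least squares: non-negative fractions, soft sum-to-one
    constraint imposed by appending a weighted row of ones to the system."""
    w = 100.0
    A = np.vstack([E, w * np.ones(E.shape[1])])
    b = np.append(pixel, w)
    f, _ = nnls(A, b)
    return f

pixel = 0.6 * E[:, 0] + 0.3 * E[:, 1] + 0.1 * E[:, 2]   # synthetic mixed pixel
print(np.round(unmix(pixel, E), 3))
```

Applied per pixel over an image, the recovered fractions form exactly the kind of fraction images the abstract compares against the Landsat TM classification.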

  19. A higher order conditional random field model for simultaneous classification of land cover and land use

    NASA Astrophysics Data System (ADS)

    Albert, Lena; Rottensteiner, Franz; Heipke, Christian

    2017-08-01

We propose a new approach for the simultaneous classification of land cover and land use considering spatial as well as semantic context. We apply a Conditional Random Field (CRF) consisting of a land cover and a land use layer. In the land cover layer of the CRF, the nodes represent super-pixels; in the land use layer, the nodes correspond to objects from a geospatial database. Intra-layer edges of the CRF model spatial dependencies between neighbouring image sites. All spatially overlapping sites in both layers are connected by inter-layer edges, which leads to higher order cliques modelling the semantic relation between all land cover and land use sites in the clique. A generic formulation of the higher order potential is proposed. In order to enable efficient inference in the two-layer higher order CRF, we propose an iterative inference procedure in which the two classification tasks mutually influence each other. We integrate contextual relations between land cover and land use in the classification process by using contextual features describing the complex dependencies of all nodes in a higher order clique. These features are incorporated in a discriminative classifier, which approximates the higher order potentials during the inference procedure. The approach is designed for input data based on aerial images. Experiments are carried out on two test sites to evaluate the performance of the proposed method. The experiments show that the classification results are improved compared to the results of a non-contextual classifier. For land cover classification, the result is much more homogeneous and the delineation of land cover segments is improved. For the land use classification, an improvement is mainly achieved for land use objects showing non-typical characteristics or similarities to other land use classes.
Furthermore, we have shown that the size of the super-pixels has an influence on the level of detail of the classification result, but also on the degree of smoothing induced by the segmentation method, which is especially beneficial for land cover classes covering large, homogeneous areas.

  20. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  2. Applying a Qualitative Modeling Shell to Process Diagnosis: The Caster System. ONR Technical Report #16.

    ERIC Educational Resources Information Center

    Thompson, Timothy F.; Clancey, William J.

    This report describes the application of an expert system shell derived from the medical diagnostic system Neomycin to Caster, a diagnostic system for malfunctions in industrial sand casting. This system was developed to test the hypothesis that starting with a well-developed classification procedure and a relational language for stating the…

  3. Classification of time-of-flight secondary ion mass spectrometry spectra from complex Cu-Fe sulphides by principal component analysis and artificial neural networks.

    PubMed

    Kalegowda, Yogesh; Harmer, Sarah L

    2013-01-08

    Artificial neural network (ANN) and hybrid principal component analysis-artificial neural network (PCA-ANN) classifiers have been successfully implemented for classification of static time-of-flight secondary ion mass spectrometry (ToF-SIMS) mass spectra collected from complex Cu-Fe sulphides (chalcopyrite, bornite, chalcocite and pyrite) at different flotation conditions. ANNs are very good pattern classifiers because of their ability to learn and generalise patterns that are not linearly separable, their fault and noise tolerance, and their high parallelism. In the first approach, fragments from the whole ToF-SIMS spectrum were used as input to the ANN; the model yielded high overall correct classification rates of 100% for feed samples, 88% for conditioned feed samples and 91% for Eh-modified samples. In the second approach, the hybrid pattern classifier PCA-ANN was applied. PCA is a very effective multivariate data analysis tool used to enhance species features and reduce data dimensionality. Principal component (PC) scores, which accounted for 95% of the raw spectral data variance, were used as input to the ANN; the model yielded high overall correct classification rates of 88% for conditioned feed samples and 95% for Eh-modified samples. Copyright © 2012 Elsevier B.V. All rights reserved.

  4. Scalable clustering algorithms for continuous environmental flow cytometry.

    PubMed

    Hyrkas, Jeremy; Clayton, Sophie; Ribalet, Francois; Halperin, Daniel; Armbrust, E Virginia; Howe, Bill

    2016-02-01

    Recent technological innovations in flow cytometry now allow oceanographers to collect high-frequency flow cytometry data from particles in aquatic environments on a scale far surpassing conventional flow cytometers. The SeaFlow cytometer continuously profiles microbial phytoplankton populations across thousands of kilometers of the surface ocean. The data streams produced by instruments such as SeaFlow challenge the traditional sample-by-sample approach in cytometric analysis and highlight the need for scalable clustering algorithms to extract population information from these large-scale, high-frequency flow cytometers. We explore how available algorithms commonly used for medical applications perform at classifying such large-scale environmental flow cytometry data. We apply large-scale Gaussian mixture models to massive datasets using Hadoop. This approach outperforms current state-of-the-art cytometry classification algorithms in accuracy and can be coupled with manual or automatic partitioning of data into homogeneous sections for further classification gains. We propose the Gaussian mixture model with partitioning approach for classification of large-scale, high-frequency flow cytometry data. Source code available for download at https://github.com/jhyrkas/seaflow_cluster, implemented in Java for use with Hadoop. hyrkas@cs.washington.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
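    The Hadoop-scale implementation described above is not reproduced here, but the expectation-maximization (EM) loop at the heart of Gaussian mixture model clustering can be sketched on a single machine. This is a minimal one-dimensional, two-component version on synthetic data, purely for illustration:

    ```python
    import math
    import random

    # Illustrative single-machine sketch of the EM loop behind Gaussian
    # mixture model clustering (the paper's distributed Hadoop version is
    # not shown). Toy 1-D data with two well-separated components.

    def normal_pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def fit_gmm_1d(data, iters=50):
        mu = [min(data), max(data)]          # crude initialisation
        var = [1.0, 1.0]
        weight = [0.5, 0.5]
        for _ in range(iters):
            # E-step: responsibility of each component for each point
            resp = []
            for x in data:
                p = [weight[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
                s = sum(p)
                resp.append([pk / s for pk in p])
            # M-step: re-estimate parameters from the responsibilities
            for k in range(2):
                nk = sum(r[k] for r in resp)
                mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
                var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
                var[k] = max(var[k], 1e-6)   # guard against collapse
                weight[k] = nk / len(data)
        return mu, var, weight

    random.seed(0)
    data = ([random.gauss(0, 1) for _ in range(200)]
            + [random.gauss(8, 1) for _ in range(200)])
    mu, var, weight = fit_gmm_1d(data)
    ```

    The partitioning idea in the abstract amounts to running such a fit separately on homogeneous sections of the data stream rather than on the whole stream at once.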

  5. Identification of immune correlates of protection in Shigella infection by application of machine learning.

    PubMed

    Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K

    2017-10-01

    Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
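    The confidence intervals above come from bootstrapping fitted models. A minimal sketch of the bootstrap idea, applied here to a simple protection proportion on hypothetical 0/1 outcomes rather than to the study's logistic-model probabilities:

    ```python
    import random

    # Minimal sketch of a percentile bootstrap confidence interval, as
    # used (on logistic-model probabilities) in the study. Outcomes below
    # are hypothetical: 30 protected subjects out of 40.

    random.seed(42)
    outcomes = [1] * 30 + [0] * 10

    def bootstrap_ci(data, n_boot=2000, alpha=0.05):
        estimates = []
        for _ in range(n_boot):
            sample = [random.choice(data) for _ in data]   # resample with replacement
            estimates.append(sum(sample) / len(sample))
        estimates.sort()
        lo = estimates[int(n_boot * alpha / 2)]
        hi = estimates[int(n_boot * (1 - alpha / 2)) - 1]
        return lo, hi

    lo, hi = bootstrap_ci(outcomes)
    ```

    In the paper the resampled quantity is the logistic regression's predicted probability of protection rather than a raw proportion, but the resample-refit-collect-percentiles loop is the same.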

  6. Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

    NASA Technical Reports Server (NTRS)

    Stolzer, Alan J.; Halford, Carl

    2007-01-01

    In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
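    The comparison above is made via correlation coefficients between predicted and observed fuel flow. For reference, the Pearson r being reported is just this computation, shown here on toy numbers (not FOQA data):

    ```python
    import math

    # Plain Pearson correlation coefficient between observed and predicted
    # values, the metric quoted in the abstract. Numbers are toy values.

    def pearson_r(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    observed = [2.0, 2.5, 3.1, 3.8, 4.6]
    predicted = [2.1, 2.4, 3.0, 3.9, 4.5]
    r = pearson_r(observed, predicted)
    ```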

  7. Improving the timeliness and accuracy of injury severity data in road traffic accidents in an emerging economy setting.

    PubMed

    Lam, Carlos; Chen, Chang-I; Chuang, Chia-Chang; Wu, Chia-Chieh; Yu, Shih-Hsiang; Chang, Kai-Kuo; Chiu, Wen-Ta

    2018-05-18

    Road traffic injuries (RTIs) are among the leading causes of injury and fatality worldwide. RTI casualties are continually increasing in Taiwan; however, because of a lack of an advanced method for classifying RTI severity data, as well as the fragmentation of data sources, road traffic safety and health agencies encounter difficulties in analyzing RTIs and their burden on the healthcare system and national resources. These difficulties lead to blind spots during policy-making for RTI prevention and control. After compiling classifications applied in various countries, we summarized data sources for RTI severity in Taiwan, through which we identified data fragmentation. Accordingly, we proposed a practical classification for RTI severity, as well as a feasible model for collecting and integrating these data nationwide. This model can provide timely relevant data recorded by medical professionals and is valuable to healthcare providers. The proposed model's pros and cons are also compared to those of other current models.

  8. A novel end-to-end classifier using domain transferred deep convolutional neural networks for biomedical images.

    PubMed

    Pang, Shuchao; Yu, Zhezhou; Orgun, Mehmet A

    2017-03-01

    Highly accurate classification of biomedical images is an essential task in the clinical diagnosis of numerous medical diseases identified from those images. Traditional image classification methods combined with hand-crafted image feature descriptors and various classifiers are not able to effectively improve the accuracy rate and meet the high requirements of classification of biomedical images. The same also holds true for artificial neural network models directly trained with limited biomedical images used as training data or directly used as a black box to extract the deep features based on another distant dataset. In this study, we propose a highly reliable and accurate end-to-end classifier for all kinds of biomedical images via deep learning and transfer learning. We first apply a domain-transferred deep convolutional neural network to build a deep model, and then develop an overall deep learning architecture based on the raw pixels of original biomedical images using supervised training. In our model, we do not need to manually design the feature space, seek an effective feature-vector classifier, or segment specific detection objects and image patches, which are the main technological difficulties in the adoption of traditional image classification methods. Moreover, we do not need to be concerned with whether there are large training sets of annotated biomedical images, affordable parallel computing resources featuring GPUs, or long waits to train a perfect deep model, which are the main obstacles to training deep neural networks for biomedical image classification as observed in recent works. With a simple data augmentation method and fast convergence speed, our algorithm can achieve the best accuracy rate and outstanding classification ability for biomedical images. We have evaluated our classifier on several well-known public biomedical datasets and compared it with several state-of-the-art approaches. 
We propose a robust automated end-to-end classifier for biomedical images based on a domain transferred deep convolutional neural network model that shows a highly reliable and accurate performance which has been confirmed on several public biomedical image datasets. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  9. Fault classification method for the driving safety of electrified vehicles

    NASA Astrophysics Data System (ADS)

    Wanner, Daniel; Drugge, Lars; Stensson Trigell, Annika

    2014-05-01

    A fault classification method is proposed which has been applied to an electric vehicle. Potential faults in the different subsystems that can affect the vehicle directional stability were collected in a failure mode and effect analysis. Similar driveline faults were grouped together if they resembled each other with respect to their influence on the vehicle dynamic behaviour. The faults were physically modelled in a simulation environment before they were induced in a detailed vehicle model under normal driving conditions. A special focus was placed on faults in the driveline of electric vehicles employing in-wheel motors of the permanent magnet type. Several failures caused by mechanical and other faults were analysed as well. The fault classification method consists of a controllability ranking developed according to the functional safety standard ISO 26262. The controllability of a fault was determined with three parameters covering the influence of the longitudinal, lateral and yaw motion of the vehicle. The simulation results were analysed and the faults were classified according to their controllability using the proposed method. It was shown that the controllability decreased specifically with increasing lateral acceleration and increasing speed. The results for the electric driveline faults show that this trend cannot be generalised for all the faults, as the controllability deteriorated for some faults during manoeuvres with low lateral acceleration and low speed. The proposed method is generic and can be applied to various other types of road vehicles and faults.

  10. Supervised Learning Applied to Air Traffic Trajectory Classification

    NASA Technical Reports Server (NTRS)

    Bosson, Christabelle S.; Nikoleris, Tasos

    2018-01-01

    Given the recent increase of interest in introducing new vehicle types and missions into the National Airspace System, a transition towards a more autonomous air traffic control system is required in order to enable and handle increased density and complexity. This paper presents an exploratory effort of the needed autonomous capabilities by exploring supervised learning techniques in the context of aircraft trajectories. In particular, it focuses on the application of machine learning algorithms and neural network models to a runway recognition trajectory-classification study. It investigates the applicability and effectiveness of various classifiers using datasets containing trajectory records for a month of air traffic. Feature importance and sensitivity analyses are conducted to challenge the chosen time-based datasets and the ten selected features. The study demonstrates that classification accuracy levels of 90% and above can be reached in less than 40 seconds of training for most machine learning classifiers when one track data point per trajectory, described by the ten selected features at a particular time step, is used as input. It also shows that neural network models can achieve similar accuracy levels but at higher training time costs.

  11. Quality classification of Spanish olive oils by untargeted gas chromatography coupled to hybrid quadrupole-time of flight mass spectrometry with atmospheric pressure chemical ionization and metabolomics-based statistical approach.

    PubMed

    Sales, C; Cervera, M I; Gil, R; Portolés, T; Pitarch, E; Beltran, J

    2017-02-01

    The novel atmospheric pressure chemical ionization (APCI) source has been used in combination with gas chromatography (GC) coupled to hybrid quadrupole time-of-flight (QTOF) mass spectrometry (MS) for determination of volatile components of olive oil, enhancing its potential for classification of olive oil samples according to their quality using a metabolomics-based approach. Full-spectrum acquisition has allowed the detection of volatile organic compounds (VOCs) in olive oil samples, including Extra Virgin, Virgin and Lampante qualities. A dynamic headspace extraction with cartridge solvent elution was applied. The metabolomics strategy consisted of three steps: full mass-spectral alignment of GC-MS data using MzMine 2.0, multivariate analysis using Ez-Info, and creation of the statistical model with combinations of responses for molecular fragments. The model was finally validated using blind samples, obtaining an accuracy in oil classification of 70%, taking the officially established "PANEL TEST" method as reference. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Fault detection and diagnosis of induction motors using motor current signature analysis and a hybrid FMM-CART model.

    PubMed

    Seera, Manjeevan; Lim, Chee Peng; Ishak, Dahaman; Singh, Harapajan

    2012-01-01

    In this paper, a novel approach to detect and classify comprehensive fault conditions of induction motors using a hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART) is proposed. The hybrid model, known as FMM-CART, exploits the advantages of both FMM and CART for undertaking data classification and rule extraction problems. A series of real experiments is conducted, whereby the motor current signature analysis method is applied to form a database comprising stator current signatures under different motor conditions. The signal harmonics from the power spectral density are extracted as discriminative input features for fault detection and classification with FMM-CART. A comprehensive list of induction motor fault conditions, viz., broken rotor bars, unbalanced voltages, stator winding faults, and eccentricity problems, has been successfully classified using FMM-CART with good accuracy rates. The results are comparable, if not better, than those reported in the literature. Useful explanatory rules in the form of a decision tree are also elicited from FMM-CART to analyze and understand different fault conditions of induction motors.

  13. 12 CFR 605.502 - Program and procedures.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... procedures. (a) The Farm Credit Administration has no authority for the original classification of... classify information. (b) Derivative classification. “Derivative classification” means the incorporating... developed material consistent with the classification markings that apply to the source information...

  14. 12 CFR 605.502 - Program and procedures.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... procedures. (a) The Farm Credit Administration has no authority for the original classification of... classify information. (b) Derivative classification. “Derivative classification” means the incorporating... developed material consistent with the classification markings that apply to the source information...

  15. 12 CFR 605.502 - Program and procedures.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... procedures. (a) The Farm Credit Administration has no authority for the original classification of... classify information. (b) Derivative classification. “Derivative classification” means the incorporating... developed material consistent with the classification markings that apply to the source information...

  16. 12 CFR 605.502 - Program and procedures.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... procedures. (a) The Farm Credit Administration has no authority for the original classification of... classify information. (b) Derivative classification. “Derivative classification” means the incorporating... developed material consistent with the classification markings that apply to the source information...

  17. MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis.

    PubMed

    Kim, SungHwan; Lin, Chien-Wei; Tseng, George C

    2016-07-01

    Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is an example that applies a simple rank-based algorithm to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation of a single expression profile, performance is usually greatly reduced in cross-study validation (i.e. the prediction model is established in the training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical values of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies. We proposed two frameworks, by averaging TSP scores or by combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods in simulated data sets and three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The result showed superior performance of cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases robustness and accuracy of the classification model that will ultimately improve disease understanding and clinical treatment decisions to benefit patients. An R package MetaKTSP is available online (http://tsenglab.biostat.pitt.edu/software.htm). ctseng@pitt.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. 
All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
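    The TSP score underlying this framework is simple enough to sketch directly: for a gene pair (i, j), it is the absolute between-class difference in the probability that gene i's expression ranks below gene j's. Averaging that score over studies captures the spirit of the score-averaging meta-analytic variant. Toy expression values below, not real transcriptomic data:

    ```python
    # Sketch of the top scoring pair (TSP) score and its study-averaged
    # meta version. Expression values and labels are toy data.

    def tsp_score(expr_i, expr_j, labels):
        """expr_i/expr_j: per-sample expression; labels: 0/1 class."""
        def frac_less(cls):
            pairs = [(a, b) for a, b, l in zip(expr_i, expr_j, labels) if l == cls]
            return sum(a < b for a, b in pairs) / len(pairs)
        return abs(frac_less(0) - frac_less(1))

    # Toy pair that flips rank order between classes -> maximal score.
    gene_i = [1, 2, 1, 9, 8, 9]
    gene_j = [5, 6, 7, 2, 1, 3]
    labels = [0, 0, 0, 1, 1, 1]
    score = tsp_score(gene_i, gene_j, labels)

    # Meta version: average the pair's score over several studies.
    studies = [([1, 9], [5, 2], [0, 1]), ([2, 8], [6, 1], [0, 1])]
    meta_score = sum(tsp_score(i, j, l) for i, j, l in studies) / len(studies)
    ```

    Because the score depends only on within-sample rank comparisons, it is invariant to per-study normalisation, which is what makes pair-based rules attractive for cross-study validation.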

  18. Automatic Classification of Time-variable X-Ray Sources

    NASA Astrophysics Data System (ADS)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources whose features are derived from time series, spectra, and other multi-wavelength contextual information. The 10-fold cross-validation accuracy of the training data is ~97% on a 7-class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectrum is suggestive of an ultraluminous X-ray source, but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.
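    One of the anomaly signals used above is the classification margin: the gap between the two largest class probabilities a classifier assigns to a source. A small margin means the classifier cannot decide between classes. A minimal sketch with hypothetical probability vectors (not 2XMM outputs):

    ```python
    # Sketch of the classification-margin idea used for anomaly flagging:
    # margin = (largest class probability) - (second largest). Probability
    # vectors below are hypothetical.

    def margin(probs):
        top_two = sorted(probs, reverse=True)[:2]
        return top_two[0] - top_two[1]

    sources = {
        "confident": [0.90, 0.05, 0.03, 0.02],
        "ambiguous": [0.35, 0.33, 0.20, 0.12],
    }
    margins = {name: margin(p) for name, p in sources.items()}
    flagged = [name for name, m in margins.items() if m < 0.1]
    ```

    In a Random Forest these probabilities are the fractions of trees voting for each class; the paper combines this margin with the forest's proximity-based outlier measure before flagging a source as anomalous.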

  19. The influence of different classification standards of age groups on prognosis in high-grade hemispheric glioma patients.

    PubMed

    Chen, Jian-Wu; Zhou, Chang-Fu; Lin, Zhi-Xiong

    2015-09-15

    Although age is thought to correlate with the prognosis of glioma patients, the most appropriate age-group classification standard to evaluate prognosis had not been fully studied. This study aimed to investigate the influence of age-group classification standards on the prognosis of patients with high-grade hemispheric glioma (HGG). This retrospective study of 125 HGG patients used three different classification standards of age-groups (≤ 50 and >50 years old, ≤ 60 and >60 years old, ≤ 45 and 45-65 and ≥ 65 years old) to evaluate the impact of age on prognosis. The primary end-point was overall survival (OS). The Kaplan-Meier method was applied for univariate analysis and Cox proportional hazards model for multivariate analysis. Univariate analysis showed a significant correlation between OS and all three classification standards of age-groups as well as between OS and pathological grade, gender, location of glioma, and regular chemotherapy and radiotherapy treatment. Multivariate analysis showed that the only independent predictors of OS were classification standard of age-groups ≤ 50 and > 50 years old, pathological grade and regular chemotherapy. In summary, the most appropriate classification standard of age-groups as an independent prognostic factor was ≤ 50 and > 50 years old. Pathological grade and chemotherapy were also independent predictors of OS in post-operative HGG patients. Copyright © 2015. Published by Elsevier B.V.

  20. Non-invasive classification of gas-liquid two-phase horizontal flow regimes using an ultrasonic Doppler sensor and a neural network

    NASA Astrophysics Data System (ADS)

    Musa Abbagoni, Baba; Yeung, Hoi

    2016-08-01

    The identification of flow pattern is a key issue in multiphase flow, which is encountered in the petrochemical industry. It is difficult to identify gas-liquid flow regimes objectively in gas-liquid two-phase flow. This paper presents the feasibility of a clamp-on instrument for objective flow regime classification of two-phase flow using an ultrasonic Doppler sensor and an artificial neural network, which records and processes the ultrasonic signals reflected from the two-phase flow. Experimental data is obtained on a horizontal test rig with a total pipe length of 21 m and 5.08 cm internal diameter carrying air-water two-phase flow under slug, elongated bubble, stratified-wavy and stratified flow regimes. Multilayer perceptron neural networks (MLPNNs) are used to develop the classification model. The classifier requires input features that are representative of the signals. Ultrasound signal features are extracted by applying both power spectral density (PSD) and discrete wavelet transform (DWT) methods to the flow signals. A ‘1-of-C’ coding scheme was adopted to classify the extracted features into one of four flow regime categories. To improve the performance of the flow regime classifier, a second-level neural network was incorporated, using the output of the first-level network as an input feature. The combined two-level model achieved higher accuracy than the single network models. Classification accuracies are evaluated for both the PSD and the DWT features. The success rates of the two models are: (1) using PSD features, the classifier missed 3 of 24 test datasets and scored 87.5% accuracy; (2) with DWT features, the network misclassified only one test dataset and classified the flow patterns with 95.8% accuracy. 
This approach demonstrates that flow regime classification with a clamp-on ultrasound sensor is feasible in industrial practice. It is considerably more promising than other techniques because the sensor is non-invasive and non-radioactive.
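    The '1-of-C' target coding mentioned above is ordinary one-hot encoding of the four regime labels, with the predicted regime read off as the strongest network output. A small sketch (the regime ordering here is illustrative):

    ```python
    # Sketch of '1-of-C' (one-hot) coding for the four flow regimes and of
    # decoding a network's output vector back to a regime label. Regime
    # order is illustrative.

    REGIMES = ["slug", "elongated bubble", "stratified-wavy", "stratified"]

    def one_of_c(label, classes=REGIMES):
        return [1 if c == label else 0 for c in classes]

    def decode(output, classes=REGIMES):
        """Pick the regime with the strongest network output."""
        return classes[max(range(len(output)), key=output.__getitem__)]

    target = one_of_c("slug")
    predicted = decode([0.1, 0.7, 0.15, 0.05])
    ```

    The second-level network in the paper simply takes such first-level output vectors as additional input features.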

  1. Event classification and optimization methods using artificial intelligence and other relevant techniques: Sharing the experiences

    NASA Astrophysics Data System (ADS)

    Mohamed, Abdul Aziz; Hasan, Abu Bakar; Ghazali, Abu Bakar Mhd.

    2017-01-01

    Classification of large data into respective classes or groups could be carried out with the help of artificial intelligence (AI) tools readily available in the market. To get the best results, an optimization tool could be applied to those data. Classification and optimization have been used by researchers throughout their works, and the outcomes were very encouraging. Here, the authors share what they have experienced in three different areas of applied research.

  2. Application of Multispectral Imaging to Determine Quality Attributes and Ripeness Stage in Strawberry Fruit

    PubMed Central

    Liu, Changhong; Liu, Wei; Lu, Xuzhong; Ma, Fei; Chen, Wei; Yang, Jianbo; Zheng, Lei

    2014-01-01

    Multispectral imaging with 19 wavelengths in the range of 405–970 nm has been evaluated for nondestructive determination of firmness, total soluble solids (TSS) content and ripeness stage in strawberry fruit. Several analysis approaches, including partial least squares (PLS), support vector machine (SVM) and back propagation neural network (BPNN), were applied to develop theoretical models for predicting the firmness and TSS of intact strawberry fruit. Compared with PLS and SVM, BPNN considerably improved the performance of multispectral imaging for predicting firmness and total soluble solids content, with correlation coefficients (r) of 0.94 and 0.83, SEP of 0.375 and 0.573, and bias of 0.035 and 0.056, respectively. Subsequently, the ability of multispectral imaging technology to classify fruit based on ripeness stage was tested using SVM and principal component analysis-back propagation neural network (PCA-BPNN) models. The highest classification accuracy of 100% was achieved using the SVM model. Moreover, the results of all these models demonstrated that the VIS parts of the spectra were the main contributor to the determination of firmness, TSS content estimation and classification of ripeness stage in strawberry fruit. These results suggest that multispectral imaging, together with a suitable analysis model, is a promising technology for rapid estimation of quality attributes and classification of ripeness stage in strawberry fruit. PMID:24505317

  3. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling.

    PubMed

    Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin

    2013-03-01

    Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context, amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied an SVM-based approach, namely SVM-based recursive feature elimination (SVM-RFE), to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of the SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in an overall accuracy of ~80% with 10-fold cross-validation, using the 40 top-ranked molecular descriptors selected out of 314. Moreover, the SVM-RFE method outperformed linear discriminant analysis (LDA)-based feature selection and classification. The model reduced the multidimensional descriptor space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding overfitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
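    The recursive feature elimination loop behind SVM-RFE is easy to sketch: fit a linear model, drop the feature with the smallest weight magnitude, and repeat until a ranking is produced. For brevity this sketch substitutes a feature-label covariance for the SVM weight vector (real SVM-RFE refits an SVM at every step), and the descriptor data are toy values:

    ```python
    # Sketch of the recursive feature elimination (RFE) loop behind
    # SVM-RFE. A feature-label covariance stands in for the SVM weight
    # vector; the real method refits an SVM each round. Toy data.

    def feature_weights(X, y, features):
        """Stand-in linear weights: covariance of each feature with the label."""
        n = len(y)
        my = sum(y) / n
        w = {}
        for f in features:
            col = [row[f] for row in X]
            mx = sum(col) / n
            w[f] = sum((a - mx) * (b - my) for a, b in zip(col, y)) / n
        return w

    def rfe_ranking(X, y):
        features = list(range(len(X[0])))
        eliminated = []
        while len(features) > 1:
            w = feature_weights(X, y, features)
            worst = min(features, key=lambda f: abs(w[f]))   # smallest |weight|
            features.remove(worst)
            eliminated.append(worst)
        return features + eliminated[::-1]   # best feature first

    # Feature 0 tracks the label; feature 1 is constant, uninformative noise.
    X = [[0.0, 5.0], [0.1, 5.0], [0.9, 5.0], [1.0, 5.0]]
    y = [0, 0, 1, 1]
    ranking = rfe_ranking(X, y)
    ```

    Keeping the top 40 of 314 descriptors, as in the abstract, corresponds to truncating such a ranking after 40 entries.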

  4. 15 CFR 748.7 - Applying electronically for a license or Classification request.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Applying electronically for a license or Classification request. 748.7 Section 748.7 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU OF INDUSTRY AND SECURITY, DEPARTMENT OF COMMERCE EXPORT...

  5. A vegetation classification system applied to southern California

    Treesearch

    Timothy E. Paysen; Jeanine A. Derby; Hugh Black; Vernon C. Bleich; John W. Mincks

    1980-01-01

    A classification system for use in describing vegetation has been developed and is being applied to southern California. It is based upon a hierarchical stratification of vegetation, using physiognomic and taxonomic criteria. The system categories are Formation, Subformation, Series, Association, and Phase. Formations, Subformations, and Series have been specified for...

  6. Toward improving fine needle aspiration cytology by applying Raman microspectroscopy

    NASA Astrophysics Data System (ADS)

    Becker-Putsche, Melanie; Bocklitz, Thomas; Clement, Joachim; Rösch, Petra; Popp, Jürgen

    2013-04-01

    Medical diagnosis of biopsies performed by fine needle aspiration has to be very reliable. Therefore, pathologists/cytologists need additional biochemical information on single cancer cells for an accurate diagnosis. Accordingly, we applied three different classification models for discriminating various features of six breast cancer cell lines by analyzing Raman microspectroscopic data. The statistical evaluations are implemented by linear discriminant analysis (LDA) and support vector machines (SVM). For the first model, a total of 61,580 Raman spectra from 110 single cells are discriminated at the cell-line level with an accuracy of 99.52% using an SVM. The LDA classification based on Raman data achieved an accuracy of 94.04% by discriminating cell lines by their origin (solid tumor versus pleural effusion). In the third model, Raman cell spectra are classified by their cancer subtypes. LDA results show an accuracy of 97.45% and specificities of 97.78%, 99.11%, and 98.97% for the subtypes basal-like, HER2+/ER-, and luminal, respectively. These subtypes are confirmed by gene expression patterns, which are important prognostic features in diagnosis. This work shows the applicability of Raman spectroscopy and statistical data handling in analyzing cancer-relevant biochemical information for advanced medical diagnosis on the single-cell level.
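
    LDA on spectra assigns each spectrum to the class whose (covariance-adjusted) mean it is closest to. A minimal sketch using its simplest relative, a nearest-centroid classifier with plain Euclidean distance; the two-band "spectra" and class names are synthetic stand-ins for Raman data:

```python
# Hedged sketch: nearest-centroid classification of spectra, a stripped-
# down analogue of the LDA used in the paper. Not real Raman data.

def fit_centroids(spectra, labels):
    sums, counts = {}, {}
    for s, c in zip(spectra, labels):
        acc = sums.setdefault(c, [0.0] * len(s))
        for i, v in enumerate(s):
            acc[i] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def predict(centroids, s):
    def dist(m):
        return sum((a - b) ** 2 for a, b in zip(s, m))
    return min(centroids, key=lambda c: dist(centroids[c]))

spectra = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ["tumor", "tumor", "effusion", "effusion"]
cents = fit_centroids(spectra, labels)
print(predict(cents, [0.95, 0.15]))  # → tumor
```

    Full LDA additionally whitens the data with the pooled within-class covariance before measuring distance, which matters when spectral bands are correlated.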

  7. Feature Fusion of ICP-AES, UV-Vis and FT-MIR for Origin Traceability of Boletus edulis Mushrooms in Combination with Chemometrics.

    PubMed

    Qi, Luming; Liu, Honggao; Li, Jieqing; Li, Tao; Wang, Yuanzhong

    2018-01-15

    Origin traceability is an important step in controlling the nutritional and pharmacological quality of food products. The Boletus edulis mushroom is a well-known food resource worldwide, and its nutritional and medicinal properties vary drastically depending on geographical origin. In this study, three sensor systems (inductively coupled plasma atomic emission spectroscopy (ICP-AES), ultraviolet-visible (UV-Vis) and Fourier transform mid-infrared (FT-MIR) spectroscopy) were applied to the origin traceability of 192 mushroom samples (caps and stipes) in combination with chemometrics. The difference between caps and stipes was clearly illustrated by each single-sensor technique. Feature variables from the three instruments were used for origin traceability. Two supervised classification methods, partial least squares discriminant analysis (PLS-DA) and grid-search support vector machine (GS-SVM), were applied to develop mathematical models. Two steps (internal cross-validation and external prediction for unknown samples) were used to evaluate the performance of a classification model. The results are satisfactory, with high accuracies ranging from 90.625% to 100%, and the models also show excellent generalization ability with the optimal parameters. Based on the combination of the three sensor systems, our study provides a multi-sensor, comprehensive origin traceability of B. edulis mushrooms.
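
    The "GS" in GS-SVM is an exhaustive grid search: try each candidate hyperparameter value, score it by cross-validated accuracy, and keep the best. A minimal sketch in which a one-dimensional threshold classifier stands in for the SVM and leave-one-out accuracy stands in for the inner cross-validation (all data illustrative):

```python
# Hedged sketch of the grid-search step behind GS-SVM. The "model" is a
# trivial threshold rule, so leave-one-out scoring needs no retraining.

def loo_accuracy(xs, ys, t):
    correct = 0
    for x, y in zip(xs, ys):
        pred = 1 if x > t else 0      # threshold classifier with parameter t
        correct += pred == y
    return correct / len(xs)

def grid_search(xs, ys, grid):
    # keep the hyperparameter value with the best cross-validated accuracy
    return max(grid, key=lambda t: loo_accuracy(xs, ys, t))

xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
grid = [0.0, 0.25, 0.5, 0.75]
print(grid_search(xs, ys, grid))  # → 0.5
```

    For a real SVM, the grid is two-dimensional (regularization C and kernel width gamma), but the selection logic is identical.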

  8. Feature Fusion of ICP-AES, UV-Vis and FT-MIR for Origin Traceability of Boletus edulis Mushrooms in Combination with Chemometrics

    PubMed Central

    Qi, Luming; Liu, Honggao; Li, Jieqing; Li, Tao

    2018-01-01

    Origin traceability is an important step in controlling the nutritional and pharmacological quality of food products. The Boletus edulis mushroom is a well-known food resource worldwide, and its nutritional and medicinal properties vary drastically depending on geographical origin. In this study, three sensor systems (inductively coupled plasma atomic emission spectroscopy (ICP-AES), ultraviolet-visible (UV-Vis) and Fourier transform mid-infrared (FT-MIR) spectroscopy) were applied to the origin traceability of 184 mushroom samples (caps and stipes) in combination with chemometrics. The difference between caps and stipes was clearly illustrated by each single-sensor technique. Feature variables from the three instruments were used for origin traceability. Two supervised classification methods, partial least squares discriminant analysis (PLS-DA) and grid-search support vector machine (GS-SVM), were applied to develop mathematical models. Two steps (internal cross-validation and external prediction for unknown samples) were used to evaluate the performance of a classification model. The results are satisfactory, with high accuracies ranging from 90.625% to 100%, and the models also show excellent generalization ability with the optimal parameters. Based on the combination of the three sensor systems, our study provides a multi-sensor, comprehensive origin traceability of B. edulis mushrooms. PMID:29342969

  9. Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases.

    PubMed

    Masías, Víctor Hugo; Valle, Mauricio; Morselli, Carlo; Crespo, Fernando; Vargas, Augusto; Laengle, Sigifredo

    2016-01-01

    Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three machine learning classifiers (Logistic Regression, Naïve Bayes and Random Forest), together with a range of social network measures and the necessary databases, to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970s and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important measure proved to be betweenness centrality, while for the Caviar Network it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only generates accurate classification models but also helps quantify the importance of social network variables in modelling verdict outcomes. We conclude with a discussion and some suggestions for future work in verdict modelling using social network measures.
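
    Random Forest importance scores can be approximated model-agnostically with permutation importance: scramble one feature column and measure how much accuracy drops. A minimal sketch using a deterministic cyclic shift in place of random shuffling, and a toy fixed predictor in place of a trained forest (all names illustrative):

```python
# Hedged sketch of permutation-style feature importance. A cyclic shift
# of the column stands in for random shuffling so the result is exact.

def shift_column(X, j):
    col = [row[j] for row in X]
    col = col[-1:] + col[:-1]
    return [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]

def accuracy(model, X, y):
    return sum(model(row) == t for row, t in zip(X, y)) / len(y)

def importance(model, X, y, j):
    # accuracy drop when feature j is decoupled from the labels
    return accuracy(model, X, y) - accuracy(model, shift_column(X, j), y)

model = lambda row: 1 if row[0] > 0.5 else 0   # uses only feature 0
X = [[0, 3], [0, 1], [0, 2], [1, 9], [1, 8], [1, 7]]
y = [0, 0, 0, 1, 1, 1]
print(importance(model, X, y, 0), importance(model, X, y, 1))
```

    Feature 0 (the only one the toy model consults) gets a positive importance; feature 1 scores zero, mirroring how the paper ranks betweenness centrality or effective size by their contribution to verdict prediction.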

  10. Lava Morphology Classification of a Fast-Spreading Ridge Using Deep-Towed Sonar Data: East Pacific Rise

    NASA Astrophysics Data System (ADS)

    Meyer, J.; White, S.

    2005-05-01

    Classification of lava morphology on a regional scale contributes to the understanding of the distribution and extent of lava flows at a mid-ocean ridge. Seafloor classification is essential for understanding the regional undersea environment at mid-ocean ridges. In this study, a classification scheme was developed to identify and extract the textural patterns of different lava morphologies along the East Pacific Rise using DSL-120 side-scan sonar and ARGO camera imagery. Applying an accurate image classification technique to side-scan sonar allows us to expand upon the locally available visual ground reference data to make the first comprehensive regional maps of small-scale lava morphology at a mid-ocean ridge. The submarine lava morphologies examined in this study (sheet flows, lobate flows, and pillow flows) have unique textures. Several algorithms were applied to the sonar backscatter intensity images to produce multiple textural image layers useful in distinguishing the different lava morphologies. The intensity and spatially enhanced images were then combined and fed into a hybrid classification technique that integrates two classifiers: a rule-based expert system and a machine learning classifier. The complementary capabilities of the two integrated classifiers provided higher regional seafloor classification accuracy than either classifier alone. Once trained, the hybrid classifier can be applied to classify neighboring images with relative ease. This classification technique has been used to map the lava morphology distribution and infer the spatial variability of lava effusion rates along two segments of the East Pacific Rise, 17° S and 9° N. The technique may also be useful for obtaining temporal information: repeated classification of morphology in this dynamic environment can be compared to detect regional seafloor change.
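
    A hybrid of a rule-based expert system and a learned classifier can be as simple as letting expert rules fire first and falling back to the machine learning model otherwise. A minimal sketch; the rule, feature names and class labels are invented for illustration, not taken from the paper:

```python
# Hedged sketch of a rule-first hybrid classifier: expert rules get first
# say, and a learned model handles everything the rules do not cover.

def hybrid_classify(pixel, rules, ml_predict):
    for condition, label in rules:
        if condition(pixel):
            return label              # an expert rule fired
    return ml_predict(pixel)          # fall back to the learned classifier

# Illustrative rule: very high backscatter is called "sheet" outright.
rules = [(lambda p: p["backscatter"] > 0.9, "sheet")]
# Illustrative stand-in for a trained texture classifier.
ml_predict = lambda p: "pillow" if p["texture"] > 0.5 else "lobate"

print(hybrid_classify({"backscatter": 0.95, "texture": 0.2}, rules, ml_predict))
```

    Other integration schemes (e.g., rules vetoing or re-weighting the learned output) are possible; the paper does not specify which variant it uses.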

  11. a Semi-Empirical Topographic Correction Model for Multi-Source Satellite Images

    NASA Astrophysics Data System (ADS)

    Xiao, Sa; Tian, Xinpeng; Liu, Qiang; Wen, Jianguang; Ma, Yushuang; Song, Zhenwei

    2018-04-01

    Topographic correction of surface reflectance in rugged terrain is a prerequisite for the quantitative application of remote sensing in mountainous areas. A physics-based radiative transfer model can be applied to correct the topographic effect and accurately retrieve the reflectance of the slope surface from high-quality satellite imagery such as Landsat 8 OLI. However, as more and more image data become available from a variety of sensors, the accurate sensor calibration parameters and atmospheric conditions required by physics-based topographic correction models are sometimes unavailable. This paper proposes a semi-empirical atmospheric and topographic correction model for multi-source satellite images that does not require accurate calibration parameters. Based on this model we can obtain topographically corrected surface reflectance directly from DN data; we tested and verified the model with imagery from the Chinese HJ and GF satellites. The results show that, for HJ, the correlation between reflectance and illumination was reduced by almost 85% for the near-infrared bands and the overall classification accuracy increased by 14% after correction. The reflectance difference between slopes facing toward and away from the sun was also reduced after correction.
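
    The paper's own model is not reproduced in the abstract, but the general shape of semi-empirical topographic correction is well illustrated by the classic C-correction, which rescales each pixel by the ratio of flat-terrain to local illumination plus an empirically fitted constant c. A hedged sketch of that standard form (not the authors' model):

```python
import math

def c_correction(value, cos_i, sun_zenith_deg, c):
    # Classic semi-empirical C-correction:
    #   corrected = value * (cos(theta_s) + c) / (cos(i) + c)
    # where theta_s is the solar zenith angle, i the local incidence
    # angle, and c an empirical band-specific constant fitted from data.
    cos_sz = math.cos(math.radians(sun_zenith_deg))
    return value * (cos_sz + c) / (cos_i + c)

# A slope facing the sun (cos_i > cos_sz) is brightened less than a flat
# pixel, i.e. its value is scaled down toward the flat-terrain case.
print(c_correction(100.0, 0.9, 30, 0.5))
```

    When the local incidence angle equals the solar zenith angle (flat terrain), the correction leaves the value unchanged, which is a useful sanity check for any implementation.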

  12. Deep convolutional neural network training enrichment using multi-view object-based analysis of Unmanned Aerial systems imagery for wetlands classification

    NASA Astrophysics Data System (ADS)

    Liu, Tao; Abd-Elrahman, Amr

    2018-05-01

    A deep convolutional neural network (DCNN) requires massive training datasets to trigger its image classification power, while collecting training samples for remote sensing applications is usually an expensive process. When a DCNN is simply combined with traditional object-based image analysis (OBIA) for classification of an Unmanned Aerial Systems (UAS) orthoimage, its power may be undermined if the number of training samples is relatively small. This research aims to develop a novel OBIA classification approach that takes advantage of the DCNN by enriching the training dataset automatically using multi-view data. Specifically, this study introduces a Multi-View Object-based classification using Deep convolutional neural network (MODe) method to process UAS images for land cover classification. MODe conducts the classification on multi-view UAS images instead of directly on the orthoimage, and obtains the final results via a voting procedure. 10-fold cross-validation results show the mean overall classification accuracy increasing substantially, from 65.32% when the DCNN was applied to the orthoimage to 82.08% when MODe was implemented. This study also compared the performance of support vector machine (SVM) and random forest (RF) classifiers with the DCNN under the traditional OBIA and the proposed multi-view OBIA frameworks. The results indicate that the accuracy advantage of the DCNN over traditional classifiers is more obvious within the proposed multi-view OBIA framework than within the traditional OBIA framework.
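
    The voting procedure that fuses per-view predictions into one label per ground object can be a plain majority vote. A minimal sketch (the class labels are illustrative):

```python
from collections import Counter

def vote(view_labels):
    # majority vote across the per-view predictions for one ground object;
    # Counter.most_common breaks ties by first-encountered label
    return Counter(view_labels).most_common(1)[0][0]

# Three UAS views classified the same object as follows:
print(vote(["marsh", "marsh", "water"]))  # → marsh
```

    In MODe-style pipelines the vote can also be weighted, e.g. by each view's classification confidence or viewing geometry; the abstract does not say which variant is used.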

  13. High Accuracy Human Activity Recognition Based on Sparse Locality Preserving Projections.

    PubMed

    Zhu, Xiangbin; Qiu, Huiling

    2016-01-01

    Human activity recognition (HAR) from temporal streams of sensory data has been applied to many fields, such as healthcare services, intelligent environments and cyber security. However, the classification accuracy of most existing methods is insufficient for some applications, especially healthcare services. To improve accuracy, it is necessary to develop a novel method that takes full account of the intrinsic sequential characteristics of time-series sensory data. Moreover, each human activity may have correlated feature relationships at different levels. Therefore, in this paper, we propose a three-stage continuous hidden Markov model (TSCHMM) approach to recognize human activities. The proposed method comprises coarse, fine and accurate classification stages. Feature reduction is an important step in classification processing; here, sparse locality preserving projections (SpLPP) is exploited to determine the optimal feature subsets for accurate classification of the stationary-activity data, as it extracts more discriminative activity features from the sensor data than locality preserving projections. Furthermore, all of the gyroscope-based features are used for accurate classification of the moving data. Compared with other methods, our method uses significantly fewer features, and the overall accuracy is markedly improved.
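
    The HMM machinery underlying such recognizers decodes the most likely hidden state sequence from observations with the Viterbi algorithm. A minimal sketch with two invented activity states ("still"/"moving") and a binary accelerometer-energy observation, not the paper's three-stage model:

```python
# Hedged sketch of Viterbi decoding for a discrete HMM; all probabilities
# and state names are illustrative toy values.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (best probability of any path ending in s at t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (V[-1][r][0] * trans_p[r][s] * emit_p[s][o], V[-1][r][1] + [s])
                for r in states
            )
            layer[s] = (prob, path)
        V.append(layer)
    return max(V[-1].values())[1]

states = ["still", "moving"]
start_p = {"still": 0.5, "moving": 0.5}
trans_p = {"still": {"still": 0.8, "moving": 0.2},
           "moving": {"still": 0.2, "moving": 0.8}}
emit_p = {"still": {"low": 0.9, "high": 0.1},
          "moving": {"low": 0.1, "high": 0.9}}
obs = ["low", "low", "high"]
print(viterbi(obs, states, start_p, trans_p, emit_p))  # → ['still', 'still', 'moving']
```

    A continuous HMM, as used in TSCHMM, replaces the discrete emission table with Gaussian mixture densities, but the decoding recursion is the same.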

  14. High Accuracy Human Activity Recognition Based on Sparse Locality Preserving Projections

    PubMed Central

    2016-01-01

    Human activity recognition (HAR) from temporal streams of sensory data has been applied to many fields, such as healthcare services, intelligent environments and cyber security. However, the classification accuracy of most existing methods is insufficient for some applications, especially healthcare services. To improve accuracy, it is necessary to develop a novel method that takes full account of the intrinsic sequential characteristics of time-series sensory data. Moreover, each human activity may have correlated feature relationships at different levels. Therefore, in this paper, we propose a three-stage continuous hidden Markov model (TSCHMM) approach to recognize human activities. The proposed method comprises coarse, fine and accurate classification stages. Feature reduction is an important step in classification processing; here, sparse locality preserving projections (SpLPP) is exploited to determine the optimal feature subsets for accurate classification of the stationary-activity data, as it extracts more discriminative activity features from the sensor data than locality preserving projections. Furthermore, all of the gyroscope-based features are used for accurate classification of the moving data. Compared with other methods, our method uses significantly fewer features, and the overall accuracy is markedly improved. PMID:27893761

  15. Generation of 2D Land Cover Maps for Urban Areas Using Decision Tree Classification

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-09-01

    A 2D land cover map can be generated automatically and efficiently from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is supplemented with attributes. A decision tree classification is applied to extract map objects such as buildings, roads, grassland, trees, hedges, and walls from such an "intelligent" point cloud. The decision tree is derived from training areas whose borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then refined using image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement are carried out with the open-source software "R"; the dense and accurate digital surface model is generated by the "Match-T DSM" program of the Trimble Company. A practical example of 2D land cover map generation is carried out using images of a multispectral medium-format aerial camera covering an urban area in Switzerland. The assessment of the produced land cover map is based on class-wise stratified sampling, where reference values of samples are determined by means of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced six-class land cover map, based on 91 points per class, reveals a high thematic accuracy for the classes "building" (99 %, 95 % CI: 95 %-100 %) and "road and parking lot" (90 %, 95 % CI: 83 %-95 %). Other accuracy measures (overall accuracy, kappa value) and their 95 % confidence intervals are derived as well. The proposed methodology has a high potential for automation and fast processing and may be applied to other scenes and sensors.
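
    Class-wise confidence intervals like those quoted above can be computed from the number of correctly classified sample points per class. One standard choice is the Wilson score interval; the abstract does not state which interval the authors used, so this is an illustrative sketch:

```python
import math

def wilson_ci(correct, n, z=1.96):
    # 95% Wilson score interval for a binomial proportion (z = 1.96)
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# e.g. 90 of 91 "building" sample points correctly classified:
lo, hi = wilson_ci(90, 91)
print(round(lo, 3), round(hi, 3))
```

    Unlike the naive normal-approximation interval, the Wilson interval stays inside [0, 1] even when the observed accuracy is at or near 100%, which matters for classes assessed with only 91 points.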

  16. Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy.

    PubMed

    Baldeck, Claire A; Asner, Gregory P; Martin, Robin E; Anderson, Christopher B; Knapp, David E; Kellner, James R; Wright, S Joseph

    2015-01-01

    Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods--binary support vector machine (SVM) and biased SVM--for their performance in identifying pixels of a single focal species. From this comparison we determined that biased SVM was more precise and created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer's accuracies of 94-97% for the three focal species, and field validation of the predicted crown objects indicated that these had user's accuracies of 94-100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems.

  17. Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy

    PubMed Central

    Baldeck, Claire A.; Asner, Gregory P.; Martin, Robin E.; Anderson, Christopher B.; Knapp, David E.; Kellner, James R.; Wright, S. Joseph

    2015-01-01

    Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods—binary support vector machine (SVM) and biased SVM—for their performance in identifying pixels of a single focal species. From this comparison we determined that biased SVM was more precise and created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer’s accuracies of 94–97% for the three focal species, and field validation of the predicted crown objects indicated that these had user’s accuracies of 94–100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems. PMID:26153693

  18. Development of a brain MRI-based hidden Markov model for dementia recognition.

    PubMed

    Chen, Ying; Pham, Tuan D

    2013-01-01

    Dementia is an age-related cognitive decline indicated by early degeneration of cortical and sub-cortical structures. Characterizing those morphological changes can help us understand disease development and contribute to early prediction and prevention, but a model that best captures brain structural variability while remaining valid for both disease classification and interpretation is extremely challenging to build. The current study aimed to establish a computational approach for modeling the magnetic resonance imaging (MRI)-based structural complexity of the brain using the framework of hidden Markov models (HMMs) for dementia recognition. Regularity dimension and semi-variogram were used to extract structural features of the brains, and a vector quantization (VQ) method was applied to convert the extracted feature vectors to prototype vectors. The output VQ indices were then used to estimate parameters for the HMMs. To validate its accuracy and robustness, experiments were carried out on individuals characterized as non-demented or as having mild Alzheimer's disease. Four HMMs were constructed, one each for the cohorts of non-demented young, middle-aged and elderly subjects and demented elderly subjects. Classification was carried out on a data set including both non-demented and demented individuals with a wide age range. The proposed HMMs succeeded in recognizing individuals with mild Alzheimer's disease and achieved better classification accuracy than related work using other classifiers. The results show the ability of the proposed models to recognize early dementia, and the findings will allow individual classification to support the early diagnosis and prediction of dementia. The brain MRI-based HMMs developed in this research are efficient and robust, and can easily be used by clinicians as a computer-aided tool for validating imaging biomarkers for early prediction of dementia.
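
    The vector quantization step maps each continuous feature vector to the index of its nearest codebook prototype, producing the discrete symbol sequence a discrete HMM needs. A minimal sketch with a toy two-dimensional codebook:

```python
def quantize(vec, codebook):
    # return the index of the nearest prototype (squared Euclidean distance)
    def d(proto):
        return sum((a - b) ** 2 for a, b in zip(vec, proto))
    return min(range(len(codebook)), key=lambda i: d(codebook[i]))

# Illustrative codebook of three prototype feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(quantize([0.9, 1.1], codebook))  # → 1
```

    Running each subject's sequence of extracted feature vectors through `quantize` yields the VQ index stream from which the HMM transition and emission parameters are estimated.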

  19. Biased visualization of hypoperfused tissue by computed tomography due to short imaging duration: improved classification by image down-sampling and vascular models.

    PubMed

    Mikkelsen, Irene Klærke; Jones, P Simon; Ribe, Lars Riisgaard; Alawneh, Josef; Puig, Josep; Bekke, Susanne Lise; Tietze, Anna; Gillard, Jonathan H; Warburton, Elisabeth A; Pedraza, Salva; Baron, Jean-Claude; Østergaard, Leif; Mouridsen, Kim

    2015-07-01

    Lesion detection in acute stroke by computed-tomography perfusion (CTP) can be affected by incomplete bolus coverage in veins and hypoperfused tissue, so-called bolus truncation (BT), and by low contrast-to-noise ratio (CNR). We examined the BT frequency and hypothesized that image down-sampling and a vascular model (VM) for perfusion calculation would improve the classification of normo- and hypoperfused tissue. CTP datasets from 40 acute stroke patients were retrospectively analysed for BT. In 16 patients with hypoperfused tissue but no BT, repeated 2-by-2 image down-sampling and uniform filtering were performed, comparing CNR to perfusion-MRI levels and tissue classification to that of unprocessed data. By simulating reduced scan duration, the minimum scan duration at which estimated lesion volumes came within 10% of their true volume was compared for VM and state-of-the-art algorithms. BT in veins and hypoperfused tissue was observed in 9/40 (22.5%) and 17/40 patients (42.5%), respectively. Down-sampling to 128 × 128 resolution yielded CNR comparable to MR data and improved tissue classification (p = 0.0069). VM reduced the minimum scan duration required for reliable maps of cerebral blood flow and mean transit time: 5 s (p = 0.03) and 7 s (p < 0.0001), respectively. BT is not uncommon in stroke CTP with 40-s scan duration. Applying image down-sampling and VM improves tissue classification. • Too-short imaging duration is common in clinical acute stroke CTP imaging. • The consequence is impaired identification of hypoperfused tissue in acute stroke patients. • The vascular model is less sensitive than current algorithms to imaging duration. • Noise reduction by image down-sampling improves identification of hypoperfused tissue by CTP.
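
    The repeated 2-by-2 down-sampling that raises CNR amounts to mean pooling: each output pixel is the average of a 2-by-2 input block, which averages out uncorrelated noise. A minimal sketch on a tiny synthetic image (real CTP maps are of course much larger):

```python
def downsample(img):
    # 2-by-2 mean pooling; assumes even image dimensions
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]

print(downsample([[1, 3], [5, 7]]))  # → [[4.0]]
```

    Averaging four pixels halves the standard deviation of independent noise (a factor of 1/sqrt(4)) while preserving the mean signal, which is why repeated application trades spatial resolution for CNR.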

  20. Application of classification tree and logistic regression for the management and health intervention plans in a community-based study.

    PubMed

    Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq

    2007-10-01

    A community-based aboriginal study was conducted and analysed to explore the application of classification trees and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables included demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family history of hereditary diseases. Risk factors for cardiovascular disease were selected as the dependent variables in further analysis. The completion rate for the health interview was 88.9%. The classification tree results show that if body mass index is higher than 25.72 kg m(-2) and age is above 51 years, the predicted probability of having > or =3 cardiovascular risk factors is 73.6% (population 322). If body mass index is higher than 26.35 kg m(-2) and the geographical latitude of the village is lower than 24 degrees 22.8', the predicted probability of having > or =4 cardiovascular risk factors is 60.8% (population 74). The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model explicitly shows the discrimination paths and interactions between the risk groups, while the logistic regression model identifies the statistically independent factors of cardiovascular risk. Applying both models to specific situations provides different angles for the design and management of future health intervention plans after a community-based study.
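
    A classification tree finds cut-points like "body mass index > 25.72" by scanning candidate thresholds and keeping the one that minimizes the weighted impurity of the two resulting groups. A minimal sketch of a single Gini-based split on one continuous variable, with invented toy values (not the study's data):

```python
# Hedged sketch of the split-selection step in a classification tree for
# binary (0/1) outcomes. Data values are illustrative.

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n               # fraction of positive labels
    return 2 * p * (1 - p)            # binary Gini impurity

def best_split(xs, ys):
    order = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(order, order[1:])]
    def weighted_impurity(t):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        n = len(ys)
        return len(left) / n * gini(left) + len(right) / n * gini(right)
    return min(candidates, key=weighted_impurity)

bmi = [20, 22, 24, 27, 29, 31]        # toy predictor values
risk = [0, 0, 0, 1, 1, 1]             # toy high-risk indicator
print(best_split(bmi, risk))  # → 25.5
```

    A full tree applies this search recursively to each resulting group and over all predictors, which is how interactions such as "high BMI and age above 51" emerge as paths.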

  1. Authentication of bee pollen grains in bright-field microscopy by combining one-class classification techniques and image processing.

    PubMed

    Chica, Manuel

    2012-11-01

    A novel method for authenticating pollen grains in bright-field microscopic images is presented in this work. The usefulness of this method is clear in many application fields, such as the bee-keeping sector, where laboratory experts need to identify fraudulent bee pollen samples against locally known pollen types. Our system is based on image processing and one-class classification to reject unknown pollen grain objects. The latter technique allows us to tackle the major difficulty of the problem: the existence of many possible fraudulent pollen types and the impossibility of modeling all of them. Different one-class classification paradigms are compared to find the most suitable technique for the problem. In addition, feature selection algorithms are applied to reduce the complexity and increase the accuracy of the models. For each local pollen type, a one-class classifier is trained and aggregated into a multiclassifier model. This multiclassification scheme combines the outputs of all the one-class classifiers into a single final response. The proposed method is validated by authenticating pollen grains belonging to different Spanish bee pollen types. The overall accuracy of the system in classifying fraudulent microscopic pollen grain objects is 92.3%. The system is able to rapidly reject pollen grains that belong to non-local pollen types, reducing laboratory work and effort. The number of possible applications of this authentication method in the microscopy research field is numerous. Copyright © 2012 Wiley Periodicals, Inc.
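
    Aggregating per-type one-class classifiers into one multiclassifier can work by accepting the best-scoring local type whose own classifier fires, and rejecting the grain otherwise. A minimal sketch; the pollen type names, scores and thresholds are invented for illustration:

```python
def authenticate(scores, thresholds):
    # keep only the local types whose one-class model accepts the grain
    accepted = {t: s for t, s in scores.items() if s >= thresholds[t]}
    if not accepted:
        return "unknown"              # reject: possibly fraudulent type
    return max(accepted, key=accepted.get)

# Illustrative per-type acceptance thresholds.
thresholds = {"cistus": 0.6, "echium": 0.6}
print(authenticate({"cistus": 0.8, "echium": 0.3}, thresholds))  # → cistus
```

    The key property, mirrored from the paper, is that no model of the fraudulent types is needed: a grain is rejected simply because no local type's one-class classifier claims it.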

  2. Environmental Monitoring Networks Optimization Using Advanced Active Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail; Volpi, Michele; Copa, Loris

    2010-05-01

    The problem of environmental monitoring networks optimization (MNO) is one of the basic and fundamental tasks in spatio-temporal data collection, analysis, and modeling. There are several approaches to this problem, which can be considered as the design or redesign of a monitoring network by applying some optimization criterion. The most developed and widespread methods are based on geostatistics (the family of kriging models, conditional stochastic simulations). In geostatistics the variance is mainly used as the optimization criterion, which has both advantages and drawbacks. In the present research we study the application of advanced techniques from statistical learning theory (SLT), namely support vector machines (SVM), and consider the optimization of monitoring networks for a classification problem (data are discrete values/classes: hydrogeological units, soil types, pollution decision levels, etc.). SVM is a universal nonlinear modeling tool for classification problems in high-dimensional spaces. The SVM solution maximizes the margin between classes and generalizes well for noisy data. The sparse solution of SVM is based on support vectors, i.e. the data points that contribute to the solution with nonzero weights. Fundamentally, MNO for classification problems can be considered as the task of selecting new measurement points that increase the quality of spatial classification and reduce the testing error (the error on new independent measurements). In SLT this is a typical active learning problem: selecting the new unlabelled points that most efficiently reduce the testing error. A classical active learning approach (margin sampling) is to sample the points closest to the classification boundary. This solution is suboptimal when points (or, more generally, the dataset) are redundant for the same class.
    In the present research we propose and study two new advanced methods of active learning adapted to the solution of the MNO problem: 1) hierarchical top-down clustering in input space, to remove redundancy when the data are clustered; and 2) a general method (independent of the classifier) which yields posterior probabilities that can be used to define the classifier confidence and corresponding proposals for new measurement points. The basic ideas and procedures are explained using simulated data sets. The real case study deals with the analysis and mapping of soil types, which is a multi-class classification problem. Maps of soil types are important for the analysis and 3D modeling of heavy-metal migration in soil and for risk prediction mapping. The results obtained demonstrate the high quality of SVM mapping and the efficiency of monitoring network optimization using active learning approaches. The research was partly supported by SNSF projects No. 200021-126505 and 200020-121835.
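
    The classical margin-sampling baseline mentioned above is one line of logic: among candidate locations, query the one whose decision value is closest to zero, i.e. closest to the classification boundary. A minimal sketch with a toy one-dimensional decision function (all values illustrative):

```python
def margin_sample(candidates, decision):
    # query the unlabeled point closest to the decision boundary
    return min(candidates, key=lambda x: abs(decision(x)))

decision = lambda x: x - 0.5          # toy decision function: boundary at 0.5
points = [0.1, 0.45, 0.9]             # candidate measurement locations
print(margin_sample(points, decision))  # → 0.45
```

    The paper's two proposed methods refine this baseline: clustering first removes near-duplicate candidates, and posterior probabilities replace the raw decision value as the confidence measure.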

  3. Predicting Chemically Induced Duodenal Ulcer and Adrenal Necrosis with Classification Trees

    NASA Astrophysics Data System (ADS)

    Giampaolo, Casimiro; Gray, Andrew T.; Olshen, Richard A.; Szabo, Sandor

    1991-07-01

    Binary tree-structured statistical classification algorithms and properties of 56 model alkyl nucleophiles were brought to bear on two problems of experimental pharmacology and toxicology. Each rat of a learning sample of 745 was administered one compound and autopsied to determine the presence of duodenal ulcer or adrenal hemorrhagic necrosis. The cited statistical classification schemes were then applied to these outcomes and 67 features of the compounds to ascertain those characteristics that are associated with biologic activity. For predicting duodenal ulceration, dipole moment, melting point, and solubility in octanol are particularly important, while for predicting adrenal necrosis, important features include the number of sulfhydryl groups and double bonds. These methods may constitute inexpensive but powerful ways to screen untested compounds for possible organ-specific toxicity. Mechanisms for the etiology and pathogenesis of the duodenal and adrenal lesions are suggested, as are additional avenues for drug design.
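
    A tree-structured classification of compounds along these lines can be sketched with synthetic descriptors; the feature set and the "ulcerogenic" rule below are invented stand-ins, not the study's 67-feature data:

```python
# Sketch of binary tree-structured classification for toxicity screening.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Hypothetical compound descriptors (e.g. dipole moment, melting point,
# octanol solubility), standardized.
X = rng.normal(size=(200, 3))
# Synthetic outcome rule standing in for observed duodenal ulceration.
y = ((X[:, 0] > 0) & (X[:, 1] < 0.5)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print("training accuracy:", tree.score(X, y))
```

    The fitted splits can then be inspected to see which descriptors drive the predicted outcome, mirroring how the study identified dipole moment and melting point as important.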

  4. Research on cardiovascular disease prediction based on distance metric learning

    NASA Astrophysics Data System (ADS)

    Ni, Zhuang; Liu, Kui; Kang, Guixia

    2018-04-01

    Distance metric learning algorithms have been widely applied to medical diagnosis and have shown their strengths in classification problems. The k-nearest neighbour (KNN) classifier is an efficient method, but it treats every feature equally. Large margin nearest neighbour classification (LMNN) improves the accuracy of KNN by learning a global distance metric, but it does not consider the locality of the data distribution. In this paper, we propose a new distance metric algorithm, COS-SUBLMNN, which combines a cosine metric with LMNN and attends to the local structure of the data, overcoming this shortcoming of LMNN and improving classification accuracy. The proposed methodology is verified on cardiovascular disease (CVD) patient vectors derived from real-world medical data. Experimental results show that our method achieves higher accuracy than KNN and LMNN, demonstrating the effectiveness of the COS-SUBLMNN-based CVD risk prediction model.
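
    The cosine-metric ingredient of the approach can be illustrated with a plain cosine-distance KNN; COS-SUBLMNN itself is not reimplemented here, and the "patient vectors" are synthetic:

```python
# KNN with a cosine distance, echoing the paper's use of a cosine metric.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
# Synthetic two-class "patient vectors" pointing in opposite directions.
X = np.vstack([rng.normal(size=(50, 5)) + 2.0,
               rng.normal(size=(50, 5)) - 2.0])
y = np.array([0] * 50 + [1] * 50)

# metric="cosine" compares directions of feature vectors rather than
# their Euclidean positions.
knn = KNeighborsClassifier(n_neighbors=3, metric="cosine").fit(X, y)
print("training accuracy:", knn.score(X, y))
```

    Metric learning methods such as LMNN go further by learning the metric from labeled data rather than fixing it in advance.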

  5. A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data

    PubMed Central

    Batal, Iyad; Valizadegan, Hamed; Cooper, Gregory F.; Hauskrecht, Milos

    2013-01-01

    We study the problem of learning classification models from the complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that represent the temporal aspect of the data well. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the Minimal Predictive Temporal Patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting which patients are at risk of developing heparin-induced thrombocytopenia. The results demonstrate the benefit of our approach in efficiently learning accurate classifiers, which is a key step in developing intelligent clinical monitoring systems. PMID:25309815

  6. Applying FastSLAM to Articulated Rovers

    NASA Astrophysics Data System (ADS)

    Hewitt, Robert Alexander

    This thesis presents the navigation algorithms designed for use on Kapvik, a 30 kg planetary micro-rover built for the Canadian Space Agency; the simulations used to test the algorithm; and novel techniques for terrain classification using Kapvik's LIDAR (Light Detection And Ranging) sensor. Kapvik implements a six-wheeled, skid-steered, rocker-bogie mobility system. This warrants a more complicated kinematic model for navigation than a typical 4-wheel differential drive system. The design of a 3D navigation algorithm is presented that includes nonlinear Kalman filtering and Simultaneous Localization and Mapping (SLAM). A neural network for terrain classification is used to improve navigation performance. Simulation is used to train the neural network and validate the navigation algorithms. Real world tests of the terrain classification algorithm validate the use of simulation for training and the improvement to SLAM through the reduction of extraneous LIDAR measurements in each scan.

  7. Handwritten digits recognition based on immune network

    NASA Astrophysics Data System (ADS)

    Li, Yangyang; Wu, Yunhui; Jiao, Lc; Wu, Jianshe

    2011-11-01

    With the development of society, handwritten digit recognition has been widely applied in production and daily life, yet it remains a difficult task in the field of pattern recognition. In this paper, a new method is presented for handwritten digit recognition. The digit samples are first preprocessed and their features extracted. Based on these features, a novel immune-network classification algorithm is designed and applied to handwritten digit recognition. The proposed algorithm builds on Jerne's immune network model for feature selection and the KNN method for classification; its distinguishing characteristic is a novel network with parallel communication and learning. The performance of the proposed method is evaluated on the MNIST handwritten digit dataset and compared with other recognition algorithms: KNN, ANN, and SVM. The results show that the novel immune-network classification algorithm gives promising performance and stable behavior for handwritten digit recognition.

  8. Integrated Change Detection and Classification in Urban Areas Based on Airborne Laser Scanning Point Clouds.

    PubMed

    Tran, Thi Huong Giang; Ressl, Camillo; Pfeifer, Norbert

    2018-02-03

    This paper suggests a new approach for change detection (CD) in 3D point clouds. It combines classification and CD in one step using machine learning. The point cloud data of both epochs are merged to compute features of four types: features describing the point distribution, a feature relating to relative terrain elevation, features specific to the multi-target capability of laser scanning, and features combining the point clouds of both epochs to identify change. These features are assigned to the points, and training samples are then acquired to build a supervised classification model, which is applied to the whole study area. The final results reach an overall accuracy of over 90% for both epochs across eight classes: lost tree, new tree, lost building, new building, changed ground, unchanged building, unchanged tree, and unchanged ground.

  9. Satellite image analysis using neural networks

    NASA Technical Reports Server (NTRS)

    Sheldon, Roger A.

    1990-01-01

    The tremendous backlog of unanalyzed satellite data necessitates the development of improved methods for data cataloging and analysis. Ford Aerospace has developed an image analysis system, SIANN (Satellite Image Analysis using Neural Networks), that integrates the technologies necessary to satisfy NASA's science data analysis requirements for the next generation of satellites. SIANN will enable scientists to train a neural network to recognize image data containing scenes of interest and then rapidly search data archives for all such images. The approach combines conventional image processing technology with recent advances in neural networks to provide improved classification capabilities. SIANN allows users to proceed through a four-step process of image classification: filtering and enhancement, creation of neural network training data via application of feature extraction algorithms, configuring and training a neural network model, and classification of images by application of the trained neural network. A prototype experimentation testbed was completed and applied to climatological data.

  10. Gas Chromatography Data Classification Based on Complex Coefficients of an Autoregressive Model

    DOE PAGES

    Zhao, Weixiang; Morgan, Joshua T.; Davis, Cristina E.

    2008-01-01

    This paper introduces autoregressive (AR) modeling as a novel method to classify outputs from gas chromatography (GC). The inverse Fourier transformation was applied to the original sensor data, and an AR model was then fitted to the transformed data to generate complex AR model coefficients. This series of coefficients effectively contains a compressed version of all of the information in the original GC signal output. We applied this method to chromatograms resulting from proliferating bacteria species grown in culture. Three types of neural networks were used to classify the AR coefficients: a back-propagation neural network (BPNN), a radial basis function-principal component analysis (RBF-PCA) approach, and a radial basis function-partial least squares regression (RBF-PLSR) approach. This exploratory study demonstrates the feasibility of using complex root coefficient patterns to distinguish various classes of experimental data, such as those from the different bacteria species. The approach also proved to be robust and potentially useful for freeing us from time alignment of GC signals.
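
    The AR-coefficient idea can be sketched generically: fit an AR model to a signal by least squares and take the complex roots of its characteristic polynomial as a compact feature vector. This is an illustration of the technique, not the paper's GC pipeline:

```python
import numpy as np

def ar_roots(x, order):
    """Fit x[t] = a1*x[t-1] + ... + ap*x[t-p] by least squares and return
    the complex roots of the characteristic polynomial."""
    p, n = order, len(x)
    # Lagged design matrix: column k-1 holds x shifted by k samples.
    X = np.column_stack([x[p - k: n - k] for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    # Roots of z^p - a1*z^(p-1) - ... - ap.
    return np.roots(np.concatenate(([1.0], -a)))

# A pure tone satisfies an exact AR(2) recurrence; its roots lie on the
# unit circle at angles equal to +/- the tone frequency.
t = np.arange(200)
roots = ar_roots(np.cos(0.3 * t), order=2)
print(np.abs(roots), np.angle(roots))
```

    Feature vectors built from such roots could then be passed to any classifier, as the neural networks in the study were.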

  11. Center for Neural Engineering: applications of pulse-coupled neural networks

    NASA Astrophysics Data System (ADS)

    Malkani, Mohan; Bodruzzaman, Mohammad; Johnson, John L.; Davis, Joel

    1999-03-01

    The pulse-coupled neural network (PCNN) is an oscillatory neural network model in which, based on the synchronicity of oscillations, cells form groups and groups form larger groupings, producing an output time series (the number of cells that fire at each input presentation, also called an 'icon'). Recent work by Johnson and others demonstrated the functional capabilities of networks containing such elements for invariant feature extraction using intensity maps. The PCNN thus presents itself as a biologically plausible model with solid functional potential. This paper presents a summary of several projects, and their results, in which we successfully applied the PCNN. In project one, the PCNN was applied to object recognition and classification in a robotic vision system; the features (icons) generated by the PCNN were fed into a feedforward neural network for classification. In project two, we developed techniques for sensory data fusion. The PCNN algorithm was implemented and tested on a B14 mobile robot: PCNN-based features were extracted from images taken by the robot's vision system and used, in conjunction with the map generated by fusing sonar and wheel-encoder data, for navigation of the mobile robot. In our third project, we applied the PCNN to speaker recognition. Spectrogram images of speech signals are fed into the PCNN to produce invariant feature icons, which are then fed into a feedforward neural network for speaker identification.

  12. Classification

    NASA Astrophysics Data System (ADS)

    Oza, Nikunj

    2012-03-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examples— examples with known output values—is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping. More formally, a classification learning algorithm L takes a training set T as its input. The training set consists of |T| examples or instances. It is assumed that there is a probability distribution D from which all training examples are drawn independently—that is, all the training examples are independently and identically distributed (i.i.d.). 
The ith training example is of the form (x_i, y_i), where x_i is a vector of values of several features and y_i represents the class to be predicted. In the sunspot classification example given above, each training example would represent one sunspot’s classification (y_i) and the corresponding set of measurements (x_i). The output of a supervised learning algorithm is a model h that approximates the unknown mapping from the inputs to the outputs. In our example, h would map from the sunspot measurements to the type of sunspot. We may have a test set S—a set of examples not used in training that we use to test how well the model h predicts the outputs on new examples. Just as with the examples in T, the examples in S are assumed to be independent and identically distributed (i.i.d.) draws from the distribution D. We measure the error of h on the test set as the proportion of test cases that h misclassifies: (1/|S|) Σ_{(x,y) ∈ S} I(h(x) ≠ y), where I(v) is the indicator function—it returns 1 if v is true and 0 otherwise. In our sunspot classification example, we would identify additional examples of sunspots that were not used in generating the model, and use these to determine how accurate the model is—the fraction of the test samples that the model classifies correctly. An example of a classification model is the decision tree shown in Figure 23.1. We will discuss the decision tree learning algorithm in more detail later—for now, we assume that, given a training set with examples of sunspots, this decision tree is derived. This can be used to classify previously unseen examples of sunspots. For example, if a new sunspot’s inputs indicate that its "Group Length" is in the range 10-15, then the decision tree would classify the sunspot as being of type “E,” whereas if the "Group Length" is "NULL," the "Magnetic Type" is "bipolar," and the "Penumbra" is "rudimentary," then it would be classified as type "C." 
In this chapter, we will add to the above description of classification problems. We will discuss decision trees and several other classification models. In particular, we will discuss the learning algorithms that generate these classification models, how to use them to classify new examples, and the strengths and weaknesses of these models. We will end with pointers to further reading on classification methods applied to astronomy data.

  13. A multi-characteristic based algorithm for classifying vegetation in a plateau area: Qinghai Lake watershed, northwestern China

    NASA Astrophysics Data System (ADS)

    Ma, Weiwei; Gong, Cailan; Hu, Yong; Li, Long; Meng, Peng

    2015-10-01

    Remote sensing technology has been broadly recognized for its convenience and efficiency in mapping vegetation, particularly in high-altitude and inaccessible areas that lack in-situ observations. In this study, Landsat Thematic Mapper (TM) images and Chinese environmental mitigation satellite CCD sensor (HJ-1 CCD) images, both at 30 m spatial resolution, were employed to identify and monitor vegetation types in an area of western China, the Qinghai Lake Watershed (QHLW). A decision classification tree (DCT) algorithm using multiple characteristics, including seasonal TM/HJ-1 CCD time series data combined with a digital elevation model (DEM) dataset, and a supervised maximum likelihood classification (MLC) algorithm using a single-date TM image were applied to vegetation classification. The accuracy of the two algorithms was assessed using field observation data. Based on the resulting vegetation classification maps, the DCT using multi-season data and geomorphologic parameters was superior to the MLC algorithm using a single-date image, improving the overall accuracy by 11.86% at the second class level and significantly reducing the "salt and pepper" noise. The DCT algorithm applied to TM/HJ-1 CCD time series data and geomorphologic parameters proved a valuable and reliable tool for monitoring vegetation at the first class level (5 vegetation classes) and the second class level (8 vegetation subclasses). The DCT algorithm using multiple characteristics may provide a theoretical basis and a general approach to automatic extraction of vegetation types from remote sensing imagery over plateau areas.

  14. Visual Recognition Software for Binary Classification and Its Application to Spruce Pollen Identification

    PubMed Central

    Tcheng, David K.; Nayak, Ashwin K.; Fowlkes, Charless C.; Punyasena, Surangi W.

    2016-01-01

    Discriminating between black and white spruce (Picea mariana and Picea glauca) is a difficult palynological classification problem that, if solved, would provide valuable data for paleoclimate reconstructions. We developed an open-source visual recognition software (ARLO, Automated Recognition with Layered Optimization) capable of differentiating between these two species at an accuracy on par with human experts. The system applies pattern recognition and machine learning to the analysis of pollen images and discovers general-purpose image features, defined by simple features of lines and grids of pixels taken at different dimensions, size, spacing, and resolution. It adapts to a given problem by searching for the most effective combination of both feature representation and learning strategy. This results in a powerful and flexible framework for image classification. We worked with images acquired using an automated slide scanner. We first applied a hash-based “pollen spotting” model to segment pollen grains from the slide background. We next tested ARLO’s ability to reconstruct black to white spruce pollen ratios using artificially constructed slides of known ratios. We then developed a more scalable hash-based method of image analysis that was able to distinguish between the pollen of black and white spruce with an estimated accuracy of 83.61%, comparable to human expert performance. Our results demonstrate the capability of machine learning systems to automate challenging taxonomic classifications in pollen analysis, and our success with simple image representations suggests that our approach is generalizable to many other object recognition problems. PMID:26867017

  15. Landsat 8 Multispectral and Pansharpened Imagery Processing on the Study of Civil Engineering Issues

    NASA Astrophysics Data System (ADS)

    Lazaridou, M. A.; Karagianni, A. Ch.

    2016-06-01

    Scientific and professional interests in civil engineering mainly include structures, hydraulics, geotechnical engineering, the environment, and transportation. Topics in this context may concern urban environment issues, urban planning, hydrological modelling, the study of hazards, and road construction. Land cover information contributes significantly to the study of these subjects. It can be acquired effectively by visual interpretation of satellite imagery, after applying enhancement routines, or by image classification. The Landsat Data Continuity Mission (LDCM - Landsat 8) is the latest satellite in the Landsat series, launched in February 2013. Landsat 8 medium-spatial-resolution multispectral imagery is of particular interest for extracting land cover because of its fine spectral resolution, its radiometric quantization of 12 bits, the capability of merging the 15 m panchromatic band with the 30 m multispectral imagery, and the free data policy. In this paper, Landsat 8 multispectral and panchromatic imagery of the surroundings of a lake in north-western Greece is used. Land cover information is extracted using suitable digital image processing software. The rich spectral content of the multispectral image is combined with the high spatial resolution of the panchromatic image by applying image fusion (pansharpening), facilitating visual image interpretation to delineate land cover. Further processing concerns supervised image classification: classification of the pansharpened image preceded classification of the multispectral image, and corresponding comparative considerations are also presented.

  16. A New Tool for Climatic Analysis Using the Koppen Climate Classification

    ERIC Educational Resources Information Center

    Larson, Paul R.; Lohrengel, C. Frederick, II

    2011-01-01

    The purpose of climate classification is to help make order of the seemingly endless spatial distribution of climates. The Koppen classification system in a modified format is the most widely applied system in use today. This system may not be the best nor most complete climate classification that can be conceived, but it has gained widespread…

  17. Land cover change of watersheds in Southern Guam from 1973 to 2001.

    PubMed

    Wen, Yuming; Khosrowpanah, Shahram; Heitz, Leroy

    2011-08-01

    Land cover change can be caused by human activities and natural forces. Land cover change at the watershed level has long been a major concern worldwide, since watersheds play an important role in our lives and environment. This paper focuses on applying a 1973 Landsat Multi-Spectral Scanner (MSS) satellite image and a 2001 Landsat Thematic Mapper (TM) satellite image to determine the land cover changes of coastal watersheds from 1973 to 2001. GIS and remote sensing are integrated to derive land cover information from the two Landsat images. The land cover classification is based on the supervised classification method in the remote sensing software ERDAS IMAGINE. Historical GIS data are used to replace the areas covered by clouds or shadows in the 1973 image to improve classification accuracy. Temporal land cover is then used to determine the land cover change of coastal watersheds in southern Guam. The overall classification accuracies for the 1973 Landsat MSS image and the 2001 Landsat TM image are 82.74% and 90.42%, respectively. The overall classification of the Landsat MSS image is particularly satisfactory considering its coarse spatial resolution and relatively poor data quality, given the many clouds and shadows in the image. Watershed land cover change in southern Guam is affected greatly by anthropogenic activities, although natural forces also affect land cover in space and time. Land cover information and change in watersheds can be applied to watershed management and planning, and to environmental modeling and assessment. Based on spatio-temporal land cover information, the interaction between humans and the environment may be evaluated. The findings of this research will be useful to similar research on other tropical islands.

  18. Detection of Aspens Using High Resolution Aerial Laser Scanning Data and Digital Aerial Images

    PubMed Central

    Säynäjoki, Raita; Packalén, Petteri; Maltamo, Matti; Vehmas, Mikko; Eerikäinen, Kalle

    2008-01-01

    The aim was to use high resolution Aerial Laser Scanning (ALS) data and aerial images to detect European aspen (Populus tremula L.) from among other deciduous trees. The field data consisted of 14 sample plots of 30 m × 30 m size located in the Koli National Park in North Karelia, Eastern Finland. A Canopy Height Model (CHM) was interpolated from the ALS data with a pulse density of 3.86/m², low-pass filtered using Height-Based Filtering (HBF) and binarized to create the mask needed to separate the ground pixels from the canopy pixels within individual areas. Watershed segmentation was applied to the low-pass filtered CHM in order to create preliminary canopy segments, from which the non-canopy elements were extracted to obtain the final canopy segmentation, i.e. the ground mask was analysed against the canopy mask. A manual classification of aerial images was employed to separate the canopy segments of deciduous trees from those of coniferous trees. Finally, linear discriminant analysis was applied to the correctly classified canopy segments of deciduous trees to classify them into segments belonging to aspen and those belonging to other deciduous trees. The independent variables used in the classification were obtained from the first pulse ALS point data. The accuracy of discrimination between aspen and other deciduous trees was 78.6%. The independent variables in the classification function were the proportion of vegetation hits, the standard deviation of pulse heights, accumulated intensity at the 90th percentile and the proportion of laser points reflected at the 60th height percentile. The accuracy of classification corresponded to the validation results of earlier ALS-based studies on the classification of individual deciduous trees into tree species. PMID:27873799

  19. A system of vegetation classification applied to Hawaii

    Treesearch

    Michael G. Buck; Timothy E. Paysen

    1984-01-01

    A classification system for use in describing vegetation has been developed for Hawaii. Physiognomic and taxonomic criteria are used for a hierarchical stratification of vegetation in which the system categories are Formation, Subformation, Series, Association, and Phase. The System applies to local resource management activities and serves as a framework for resource...

  20. Using an Ecological Land Hierarchy to Predict Seasonal-Wetland Abundance in Upland Forests

    Treesearch

    Brian J. Palik; Richard Buech; Leanne Egeland

    2003-01-01

    Hierarchy theory, when applied to landscapes, predicts that broader-scale ecosystems constrain the development of finer-scale, nested ecosystems. This prediction finds application in hierarchical land classifications. Such classifications typically apply to physiognomically similar ecosystems, or ecological land units, e.g., a set of multi-scale forest ecosystems. We...

  1. Human Engineering Principles Applied to a Laboratory Development Model: A Demonstration.

    DTIC Science & Technology

    1979-05-22

    20. ABSTRACT ... of a cathode-ray-tube (CRT) display device and an eight-button function key control device. These are shown in Figure 8. Design of the keyboard for

  2. Odds Ratio, Delta, ETS Classification, and Standardization Measures of DIF Magnitude for Binary Logistic Regression

    ERIC Educational Resources Information Center

    Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.

    2007-01-01

    Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…

  3. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem in the 'large p, small n' setting is over-fitting: just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences, and shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods; both are simple, intuitive, and have proved useful in empirical studies. Recently Wu proposed penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data, formally derive the ad hoc shrunken centroid used by Tibshirani et al. using L1-penalized regression models, and show that penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
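
    The shrinkage idea can be illustrated with an off-the-shelf L1-penalized logistic regression on synthetic 'large p, small n' data; this is a generic sketch of L1 shrinkage, not the paper's penalized t/F-statistics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 40, 500                        # few samples, many "genes"
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 3.0                        # only 5 genes carry real signal
y = (X @ beta + rng.normal(size=n) > 0).astype(int)

# The L1 penalty shrinks most coefficients exactly to zero, guarding
# against the chance discriminators that plague p >> n data.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
n_selected = int(np.count_nonzero(clf.coef_))
print("genes kept by the L1 penalty:", n_selected, "of", p)
```

    Without the penalty, a linear model in this setting would interpolate the training labels using noise genes.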

  4. Development Of Polarimetric Decomposition Techniques For Indian Forest Resource Assessment Using Radar Imaging Satellite (Risat-1) Images

    NASA Astrophysics Data System (ADS)

    Sridhar, J.

    2015-12-01

    The focus of this work is to examine polarimetric decomposition techniques, primarily Pauli decomposition and Sphere Di-Plane Helix (SDH) decomposition, for forest resource assessment. The data processing steps adopted are pre-processing (geometric correction and radiometric calibration), speckle reduction, image decomposition, and image classification. Initially, to classify forest regions, unsupervised classification was applied to determine the unknown classes; the K-means clustering method was observed to give better results than the ISODATA method. Using the algorithms developed for Radar Tools, the decomposition and classification code was written in Interactive Data Language (IDL) and applied to a RISAT-1 image of the Mysore-Mandya region of Karnataka, India. This region was chosen for studying forest vegetation and consists of agricultural lands, water, and hilly regions. Polarimetric SAR data possess high potential for classification of the Earth's surface. After applying the decomposition techniques, classification was done by selecting regions of interest; post-classification, the overall accuracy was observed to be higher for the SDH-decomposed image, as SDH operates on individual pixels on a coherent basis and utilises the complete intrinsic coherent nature of polarimetric SAR data, making it particularly suited to the analysis of high-resolution SAR data. The Pauli decomposition represents all the polarimetric information in a single SAR image, but interpretation of the resulting image is difficult. The SDH decomposition technique appears to produce better results and easier interpretation than the Pauli decomposition, though more quantification and further analysis are in progress. The comparison of polarimetric decomposition techniques and evolutionary classification techniques will be the scope of future work.
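
    The Pauli decomposition itself has a simple closed form: for a monostatic 2x2 scattering matrix S (with S_hv = S_vh), the Pauli target vector is k = (1/sqrt(2)) [S_hh + S_vv, S_hh - S_vv, 2*S_hv], whose three channels correspond to odd-bounce, even-bounce, and 45-degree-rotated even-bounce scattering. A minimal numerical sketch with illustrative scatterers, not RISAT-1 data:

```python
import numpy as np

def pauli_vector(S):
    """Pauli target vector of a 2x2 complex scattering matrix (monostatic)."""
    s_hh, s_hv = S[0, 0], S[0, 1]
    s_vv = S[1, 1]
    return np.array([s_hh + s_vv, s_hh - s_vv, 2 * s_hv]) / np.sqrt(2)

# Trihedral (odd-bounce) scatterer: S = identity -> energy in channel 1.
print(pauli_vector(np.eye(2, dtype=complex)))
# Dihedral (even-bounce) scatterer: S = diag(1, -1) -> energy in channel 2.
print(pauli_vector(np.diag([1.0, -1.0]).astype(complex)))
```

    Mapping |k1|, |k2|, |k3| to RGB per pixel yields the familiar Pauli composite image that the abstract describes as hard to interpret directly.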

  5. A deep learning-based multi-model ensemble method for cancer prediction.

    PubMed

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.
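The ensembling strategy described above, base classifiers whose outputs feed a learned combiner, can be sketched with a toy stand-in: a single logistic unit trained by gradient descent replaces the paper's deep ensembling network, and the base-classifier probabilities below are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical base-classifier outputs: each row holds the probability of
# "cancer" assigned to one sample by five different first-stage classifiers.
base_outputs = [
    [0.9, 0.8, 0.7, 0.9, 0.6],  # true label 1
    [0.2, 0.1, 0.3, 0.2, 0.4],  # true label 0
    [0.8, 0.9, 0.6, 0.7, 0.8],  # true label 1
    [0.1, 0.3, 0.2, 0.1, 0.2],  # true label 0
]
labels = [1, 0, 1, 0]

# Meta-learner: one logistic unit trained by stochastic gradient descent
# stands in for the paper's deep ensembling network.
w = [0.0] * 5
b = 0.0
lr = 0.5
for _ in range(200):
    for x, y in zip(base_outputs, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = p - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err

preds = [int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5)
         for x in base_outputs]
print(preds)  # recovers the training labels on this separable toy data
```

Unlike majority voting, the learned combiner can weight a consistently reliable base classifier more heavily than an erratic one.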

  6. Advanced soft computing diagnosis method for tumour grading.

    PubMed

    Papageorgiou, E I; Spyridonos, P P; Stylios, C D; Ravazoula, P; Groumpos, P P; Nikiforidis, G N

    2006-01-01

    To develop an advanced diagnostic method for urinary bladder tumour grading. A novel soft computing modelling methodology based on the augmentation of fuzzy cognitive maps (FCMs) with the unsupervised active Hebbian learning (AHL) algorithm is applied. One hundred and twenty-eight cases of urinary bladder cancer were retrieved from the archives of the Department of Histopathology, University Hospital of Patras, Greece. All tumours had been characterized according to the classical World Health Organization (WHO) grading system. To design the FCM model for tumour grading, three expert histopathologists defined the main histopathological features (concepts) and their impact on grade characterization. The resulting FCM model consisted of nine concepts. Eight concepts represented the main histopathological features for tumour grading. The ninth concept represented the tumour grade. To increase the classification ability of the FCM model, the AHL algorithm was applied to adjust the weights of the FCM. The proposed FCM grading model achieved a classification accuracy of 72.5%, 74.42% and 95.55% for tumours of grades I, II and III, respectively. An advanced computerized method to support tumour grade diagnosis decision was proposed and developed. The novelty of the method is based on employing the soft computing method of FCMs to represent specialized knowledge on histopathology and on augmenting the FCMs' ability using an unsupervised learning algorithm, the AHL. The proposed method performs with reasonably high accuracy compared to other existing methods and at the same time meets the physicians' requirements for transparency and explicability.
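The FCM-plus-Hebbian scheme can be illustrated schematically. The map below is a hypothetical three-concept toy (two histopathological features driving a grade concept), and the weight update is a simplified Hebbian drift standing in for the AHL algorithm, not the authors' exact formulation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fcm_step(values, weights):
    """One synchronous FCM update: each concept adds the weighted influence
    of the other concepts to its own value and squashes through a sigmoid."""
    n = len(values)
    return [
        sigmoid(values[j] + sum(values[i] * weights[i][j]
                                for i in range(n) if i != j))
        for j in range(n)
    ]

def hebbian_update(values, weights, eta=0.05, gamma=0.98):
    """Unsupervised Hebbian-style adjustment (a stand-in for AHL): each
    nonzero weight decays slightly and drifts toward the correlation of the
    two concepts it connects."""
    n = len(values)
    for i in range(n):
        for j in range(n):
            if i != j and weights[i][j] != 0.0:
                weights[i][j] = gamma * weights[i][j] + eta * values[i] * values[j]
    return weights

# Hypothetical 3-concept map: two features (concepts 0, 1) drive the grade
# concept (concept 2); only those two causal links are nonzero.
w = [[0.0, 0.0, 0.6],
     [0.0, 0.0, 0.4],
     [0.0, 0.0, 0.0]]
v = [0.8, 0.6, 0.5]
for _ in range(10):
    v = fcm_step(v, w)
    w = hebbian_update(v, w)
print(round(v[2], 3))  # settled activation of the "grade" concept
```

The sigmoid keeps every concept activation in (0, 1), so the grade concept can be read as a normalized score and thresholded into grade classes.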

  7. Lightness computation by the human visual system

    NASA Astrophysics Data System (ADS)

    Rudd, Michael E.

    2017-05-01

    A model of achromatic color computation by the human visual system is presented, which is shown to account in an exact quantitative way for a large body of appearance matching data collected with simple visual displays. The model equations are closely related to those of the original Retinex model of Land and McCann. However, the present model differs in important ways from Land and McCann's theory in that it invokes additional biological and perceptual mechanisms, including contrast gain control, different inherent neural gains for incremental and decremental luminance steps, and two types of top-down influence on the perceptual weights applied to local luminance steps in the display: edge classification and spatial integration windowing. Arguments are presented to support the claim that these various visual processes must be instantiated by a particular underlying neural architecture. By pointing to correspondences between the architecture of the model and findings from visual neurophysiology, this paper suggests that edge classification involves a top-down gating of neural edge responses in early visual cortex (cortical areas V1 and/or V2) while spatial integration windowing occurs in cortical area V4 or beyond.

  8. Village Building Identification Based on Ensemble Convolutional Neural Networks

    PubMed Central

    Guo, Zhiling; Chen, Qi; Xu, Yongwei; Shibasaki, Ryosuke; Shao, Xiaowei

    2017-01-01

    In this study, we present the Ensemble Convolutional Neural Network (ECNN), an elaborate CNN frame formulated based on ensembling state-of-the-art CNN models, to identify village buildings from open high-resolution remote sensing (HRRS) images. First, to optimize and mine the capability of CNN for village mapping and to ensure compatibility with our classification targets, a few state-of-the-art models were carefully optimized and enhanced based on a series of rigorous analyses and evaluations. Second, rather than directly implementing building identification by using these models, we exploited most of their advantages by ensembling their feature extractor parts into a stronger model called ECNN based on the multiscale feature learning method. Finally, the generated ECNN was applied to a pixel-level classification frame to implement object identification. The proposed method can serve as a viable tool for village building identification with high accuracy and efficiency. The experimental results obtained from the test area in Savannakhet province, Laos, prove that the proposed ECNN model significantly outperforms existing methods, improving overall accuracy from 96.64% to 99.26%, and kappa from 0.57 to 0.86. PMID:29084154
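The kappa figures reported above measure chance-corrected agreement and are computed directly from a confusion matrix; a minimal sketch with an invented 2-class matrix:

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference,
    columns: prediction): observed agreement corrected for the agreement
    expected by chance from the marginal totals."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(len(confusion))
    )
    return (observed - expected) / (1 - expected)

# Toy 2-class example (building vs non-building pixels, values invented).
cm = [[45, 5],
      [10, 40]]
print(round(cohens_kappa(cm), 3))  # prints 0.7
```

Because kappa discounts chance agreement from skewed class marginals, it can move much more than overall accuracy, which is consistent with the paper reporting a large kappa gain alongside a small accuracy gain.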

  9. Progressive intervention strategy for the gait of sub-acute stroke patient using the International Classification of Functioning, Disability, and Health tool.

    PubMed

    Kang, Tae-Woo; Cynn, Heon-Seock

    2017-01-01

    The International Classification of Functioning, Disability, and Health (ICF) provides models for functions and disabilities. The ICF is presented as a frame that enables organizing physical therapists' clinical practice for application. The purpose of the present study was to describe processes through which stroke patients are assessed and treated based on the ICF model. The patient was a 65-year-old female diagnosed with right cerebral artery infarction with left hemiparesis. Progressive interventions were applied, such as those aiming at sitting and standing for the first two weeks, gait intervention for the third and fourth weeks, and those aiming at sitting from a standing position for the fifth and sixth weeks. The ICF model provides rehabilitation experts with a frame that enables them to accurately identify and understand their patients' problems. The ICF model helps the experts understand not only their patients' body structure, function, activity, and participation, but also their problems related to personal and environmental factors. The experts could efficiently make decisions and provide optimum treatment at clinics using the ICF model.

  10. 76 FR 76896 - International Anti-Fouling System Certificate

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-09

    ...-fouling System (IAFS) Certificate to the list of certificates a recognized classification society may..., 2001. This final rule will enable recognized classification societies to apply to the Coast Guard for... the Coast Guard to authorize recognized classification societies to issue IAFS Certificates...

  11. Object-Based Classification as an Alternative Approach to the Traditional Pixel-Based Classification to Identify Potential Habitat of the Grasshopper Sparrow

    NASA Astrophysics Data System (ADS)

    Jobin, Benoît; Labrecque, Sandra; Grenier, Marcelle; Falardeau, Gilles

    2008-01-01

    The traditional method of identifying wildlife habitat distribution over large regions consists of pixel-based classification of satellite images into a suite of habitat classes used to select suitable habitat patches. Object-based classification is a new method that can achieve the same objective based on the segmentation of spectral bands of the image creating homogeneous polygons with regard to spatial or spectral characteristics. The segmentation algorithm does not solely rely on the single pixel value, but also on shape, texture, and pixel spatial continuity. Object-based classification is a knowledge-based process where an interpretation key is developed using ground control points and objects are assigned to specific classes according to threshold values of determined spectral and/or spatial attributes. We developed a model using the eCognition software to identify suitable habitats for the Grasshopper Sparrow, a rare and declining species found in southwestern Québec. The model was developed in a region with known breeding sites and applied to other images covering adjacent regions where potential breeding habitats may be present. We were successful in locating potential habitats in areas where dairy farming prevailed but failed in an adjacent region covered by a distinct Landsat scene and dominated by annual crops. We discuss the added value of this method, such as the possibility of using the contextual information associated with objects and the ability to eliminate unsuitable areas in the segmentation and land cover classification processes, as well as technical and logistical constraints. A series of recommendations on the use of this method and on conservation issues of Grasshopper Sparrow habitat is also provided.
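The knowledge-based assignment step can be sketched as an interpretation key: each segmented object, represented by its spectral and spatial attributes, goes to the first class whose thresholds it satisfies. Class names, attributes, and thresholds below are hypothetical, not the study's actual key:

```python
def classify_object(obj, rules):
    """Assign a segmented image object to the first class whose attribute
    thresholds it satisfies; fall through to 'unclassified'."""
    for cls, test in rules:
        if test(obj):
            return cls
    return "unclassified"

# Hypothetical interpretation key (NDVI, area, texture thresholds invented).
rules = [
    ("hayfield", lambda o: o["ndvi"] > 0.5 and o["area_ha"] > 5 and o["texture"] < 0.2),
    ("forest",   lambda o: o["ndvi"] > 0.5 and o["texture"] >= 0.2),
    ("water",    lambda o: o["ndvi"] < 0.0),
]

print(classify_object({"ndvi": 0.7, "area_ha": 12.0, "texture": 0.1}, rules))
# prints "hayfield": a large, smooth-textured, vegetated object
```

Rule order matters in this first-match scheme, which mirrors how an analyst eliminates unsuitable classes before assigning the remainder.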

  12. Bayesian classification theory

    NASA Technical Reports Server (NTRS)

    Hanson, Robin; Stutz, John; Cheeseman, Peter

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit or share model parameters through a class hierarchy. We summarize the mathematical foundations of AutoClass.
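The Bayesian class-membership computation at the heart of such systems can be sketched for a single continuous attribute: posterior class probabilities follow from class priors and per-class Gaussian descriptions. The numbers below are illustrative, not AutoClass's actual model:

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def class_posteriors(x, priors, params):
    """Soft class membership for one observation: prior times class
    likelihood, normalized over classes (Bayes' rule)."""
    joint = [p * gaussian_pdf(x, mu, s) for p, (mu, s) in zip(priors, params)]
    z = sum(joint)
    return [j / z for j in joint]

# Two hypothetical classes with equal priors, centered at 0 and 2.
post = class_posteriors(1.9, [0.5, 0.5], [(0.0, 1.0), (2.0, 1.0)])
print([round(p, 3) for p in post])  # second class dominates
```

Fitting the priors and per-class parameters themselves (and choosing the number of classes) is the search problem the abstract describes; this sketch shows only the membership step given fixed class descriptions.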

  13. Prioritizing CD4 Count Monitoring in Response to ART in Resource-Constrained Settings: A Retrospective Application of Prediction-Based Classification

    PubMed Central

    Liu, Yan; Li, Xiaohong; Johnson, Margaret; Smith, Collette; Kamarulzaman, Adeeba bte; Montaner, Julio; Mounzer, Karam; Saag, Michael; Cahn, Pedro; Cesar, Carina; Krolewiecki, Alejandro; Sanne, Ian; Montaner, Luis J.

    2012-01-01

    Background Global programs of anti-HIV treatment depend on sustained laboratory capacity to assess treatment initiation thresholds and treatment response over time. Currently, there is no valid alternative to CD4 count testing for monitoring immunologic responses to treatment, but laboratory cost and capacity limit access to CD4 testing in resource-constrained settings. Thus, methods to prioritize patients for CD4 count testing could improve treatment monitoring by optimizing resource allocation. Methods and Findings Using a prospective cohort of HIV-infected patients (n = 1,956) monitored upon antiretroviral therapy initiation in seven clinical sites with distinct geographical and socio-economic settings, we retrospectively apply a novel prediction-based classification (PBC) modeling method. The model uses repeatedly measured biomarkers (white blood cell count and lymphocyte percent) to predict CD4+ T cell outcome through first-stage modeling and subsequent classification based on clinically relevant thresholds (CD4+ T cell count of 200 or 350 cells/µl). The algorithm correctly classified 90% (cross-validation estimate = 91.5%, standard deviation [SD] = 4.5%) of CD4 count measurements <200 cells/µl in the first year of follow-up; if laboratory testing is applied only to patients predicted to be below the 200-cells/µl threshold, we estimate a potential savings of 54.3% (SD = 4.2%) in CD4 testing capacity. A capacity savings of 34% (SD = 3.9%) is predicted using a CD4 threshold of 350 cells/µl. Similar results were obtained over the 3 y of follow-up available (n = 619). Limitations include a need for future economic healthcare outcome analysis, a need for assessment of extensibility beyond the 3-y observation time, and the need to assign a false positive threshold. 
Conclusions Our results support the use of PBC modeling as a triage point at the laboratory, lessening the need for laboratory-based CD4+ T cell count testing; implementation of this tool could help optimize the use of laboratory resources, directing CD4 testing towards higher-risk patients. However, further prospective studies and economic analyses are needed to demonstrate that the PBC model can be effectively applied in clinical settings. Please see later in the article for the Editors' Summary PMID:22529752
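The two-stage PBC idea, first predict the CD4 outcome from inexpensive biomarkers and then classify against a clinical threshold, can be sketched with a toy first-stage model. The sketch uses ordinary least squares on one invented predictor; the actual method models repeated measures of white blood cell count and lymphocyte percent:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for a single predictor (stand-in for the
    first-stage prediction model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical training data: lymphocyte percent vs measured CD4 count.
lymph = [10, 15, 20, 25, 30, 35]
cd4 = [150, 220, 300, 380, 450, 520]
b0, b1 = fit_linear(lymph, cd4)

def needs_lab_test(lymph_pct, threshold=200):
    """Triage rule: order a confirmatory CD4 lab test only when the
    predicted count falls below the clinically relevant threshold."""
    return b0 + b1 * lymph_pct < threshold

print(needs_lab_test(12), needs_lab_test(28))  # prints True False
```

The capacity saving in the paper comes exactly from the second branch: patients predicted to be safely above the threshold skip the laboratory test.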

  14. Bayesian learning for spatial filtering in an EEG-based brain-computer interface.

    PubMed

    Zhang, Haihong; Yang, Huijuan; Guan, Cuntai

    2013-07-01

    Spatial filtering for EEG feature extraction and classification is an important tool in brain-computer interface. However, there is generally no established theory that links spatial filtering directly to Bayes classification error. To address this issue, this paper proposes and studies a Bayesian analysis theory for spatial filtering in relation to Bayes error. Following the maximum entropy principle, we introduce a gamma probability model for describing single-trial EEG power features. We then formulate and analyze the theoretical relationship between Bayes classification error and the so-called Rayleigh quotient, which is a function of spatial filters and basically measures the ratio in power features between two classes. This paper also reports our extensive study that examines the theory and its use in classification, using three publicly available EEG data sets and state-of-the-art spatial filtering techniques and various classifiers. Specifically, we validate the positive relationship between Bayes error and Rayleigh quotient in real EEG power features. Finally, we demonstrate that the Bayes error can be practically reduced by applying a new spatial filter with lower Rayleigh quotient.
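The Rayleigh quotient central to the analysis can be computed directly from the two classes' spatial covariance matrices; a minimal sketch with hypothetical 2-channel covariances:

```python
def rayleigh_quotient(w, c1, c2):
    """Ratio of band power captured by spatial filter w under class 1
    versus class 2: (w' C1 w) / (w' C2 w)."""
    def quad(c):
        return sum(w[i] * sum(c[i][j] * w[j] for j in range(len(w)))
                   for i in range(len(w)))
    return quad(c1) / quad(c2)

# Toy 2-channel covariances: class 1 power on channel 0, class 2 on channel 1.
c1 = [[4.0, 0.0], [0.0, 1.0]]
c2 = [[1.0, 0.0], [0.0, 4.0]]
print(rayleigh_quotient([1.0, 0.0], c1, c2))  # 4.0: discriminative filter
print(rayleigh_quotient([1.0, 1.0], c1, c2))  # 1.0: uninformative filter
```

A quotient far from 1 means the filtered power feature separates the classes well, which is the sense in which the paper links lower (or more extreme) Rayleigh quotients to lower Bayes error.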

  15. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

    PubMed

    Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

    2015-01-01

    The upstream region of coding genes is important for several reasons, for instance locating transcription factor binding sites and start site initiation in genomic DNA. Motivated by a recently conducted study, where a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequences from background upstream sequences. The upstream sequences of conserved coding genes across genomes were considered in the analysis, where conserved coding genes were found using the pan-genome concept for each prokaryotic species considered. PLS uses a position specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS based method were compared with the Gini importance of random forest (RF) and with the support vector machine (SVM), a widely used method for sequence classification. The upstream sequence classification performance was evaluated using cross validation, and the suggested approach identified the prokaryotic upstream region significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
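A PSSM of the kind used here can be sketched in a few lines: per-position log-odds of each base against a uniform background, with pseudocounts to avoid zero frequencies. The sequences below are invented motif-like examples, not the study's data:

```python
import math

def pssm(sequences, alphabet="ACGT", pseudo=1.0):
    """Position-specific scoring matrix: log2-odds of each base at each
    position relative to a uniform background, with pseudocounts."""
    length = len(sequences[0])
    background = 1.0 / len(alphabet)
    matrix = []
    for pos in range(length):
        column = [seq[pos] for seq in sequences]
        row = {}
        for base in alphabet:
            freq = ((column.count(base) + pseudo)
                    / (len(sequences) + pseudo * len(alphabet)))
            row[base] = math.log2(freq / background)
        matrix.append(row)
    return matrix

def score(seq, matrix):
    """Sum the per-position log-odds for a candidate sequence."""
    return sum(matrix[i][base] for i, base in enumerate(seq))

# Invented upstream-like training sequences (resembling a -10 box motif).
upstream = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
m = pssm(upstream)
print(score("TATAAT", m) > score("GGCCGG", m))  # motif-like sequence scores higher
```

In the paper these PSSM-derived position scores become the feature matrix on which PLS components are computed.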

  16. Exploring objective climate classification for the Himalayan arc and adjacent regions using gridded data sources

    NASA Astrophysics Data System (ADS)

    Forsythe, N.; Blenkinsop, S.; Fowler, H. J.

    2015-05-01

    A three-step climate classification was applied to a spatial domain covering the Himalayan arc and adjacent plains regions using input data from four global meteorological reanalyses. Input variables were selected based on an understanding of the climatic drivers of regional water resource variability and crop yields. Principal component analysis (PCA) of those variables and k-means clustering on the PCA outputs revealed a reanalysis ensemble consensus for eight macro-climate zones. Spatial statistics of input variables for each zone revealed consistent, distinct climatologies. This climate classification approach has potential for enhancing assessment of climatic influences on water resources and food security as well as for characterising the skill and bias of gridded data sets, both meteorological reanalyses and climate models, for reproducing subregional climatologies. Through their spatial descriptors (area, geographic centroid, elevation mean range), climate classifications also provide metrics, beyond simple changes in individual variables, with which to assess the magnitude of projected climate change. Such sophisticated metrics are of particular interest for regions, including mountainous areas, where natural and anthropogenic systems are expected to be sensitive to incremental climate shifts.
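The clustering step (k-means on PCA-reduced variables) can be sketched as follows. The points stand in for grid cells in a 2-D principal-component space, the values are invented, and the seeding is deliberately simple:

```python
def kmeans(points, k, iters=20):
    """Plain k-means on rows of (already PCA-reduced) data; returns labels.
    Seeds from the first k points for determinism in this sketch."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Two well-separated hypothetical "climate zones" in PC space.
pts = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = kmeans(pts, 2)
print(labels)  # first three points share one zone, last three the other
```

In the study this assignment is run per reanalysis, and zones that agree across the ensemble become the consensus macro-climate zones.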

  17. Real alerts and artifact classification in archived multi-signal vital sign monitoring data: implications for mining big data.

    PubMed

    Hravnak, Marilyn; Chen, Lujie; Dubrawski, Artur; Bose, Eliezer; Clermont, Gilles; Pinsky, Michael R

    2016-12-01

    Huge hospital information system databases can be mined for knowledge discovery and decision support, but artifact in stored non-invasive vital sign (VS) high-frequency data streams limits its use. We used machine-learning (ML) algorithms trained on expert-labeled VS data streams to automatically classify VS alerts as real or artifact, thereby "cleaning" such data for future modeling. 634 admissions to a step-down unit had recorded continuous noninvasive VS monitoring data [heart rate (HR), respiratory rate (RR), peripheral arterial oxygen saturation (SpO2) at 1/20 Hz, and noninvasive oscillometric blood pressure (BP)]. Periods when VS data crossed stability thresholds defined VS event epochs. Data were divided into Block 1, the ML training/cross-validation set, and Block 2, the test set. Expert clinicians annotated Block 1 events as perceived real or artifact. After feature extraction, ML algorithms were trained to create and validate models automatically classifying events as real or artifact. The models were then tested on Block 2. Block 1 yielded 812 VS events, with 214 (26 %) judged by experts as artifact (RR 43 %, SpO2 40 %, BP 15 %, HR 2 %). ML algorithms applied to the Block 1 training/cross-validation set (tenfold cross-validation) gave area under the curve (AUC) scores of 0.97 RR, 0.91 BP and 0.76 SpO2. Performance when applied to Block 2 test data was AUC 0.94 RR, 0.84 BP and 0.72 SpO2. ML-defined algorithms applied to archived multi-signal continuous VS monitoring data allowed accurate automated classification of VS alerts as real or artifact, and could support data mining for future model building.
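The AUC scores reported above can be computed without plotting an ROC curve, via the rank-sum formulation: the probability that a randomly chosen real alert outranks a randomly chosen artifact. Scores and labels below are invented:

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise rank comparison, counting
    ties between a positive and a negative as half a win."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores; label 1 = real alert, 0 = artifact.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))  # equals 8/9 here
```

An AUC of 0.5 would mean the classifier ranks real alerts and artifacts no better than chance, which is why the paper's 0.72 SpO2 result is the weakest of the three signals.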

  18. Real Alerts and Artifact Classification in Archived Multi-signal Vital Sign Monitoring Data—Implications for Mining Big Data

    PubMed Central

    Hravnak, Marilyn; Chen, Lujie; Dubrawski, Artur; Bose, Eliezer; Clermont, Gilles; Pinsky, Michael R.

    2015-01-01

    PURPOSE Huge hospital information system databases can be mined for knowledge discovery and decision support, but artifact in stored non-invasive vital sign (VS) high-frequency data streams limits its use. We used machine-learning (ML) algorithms trained on expert-labeled VS data streams to automatically classify VS alerts as real or artifact, thereby “cleaning” such data for future modeling. METHODS 634 admissions to a step-down unit had recorded continuous noninvasive VS monitoring data (heart rate [HR], respiratory rate [RR], peripheral arterial oxygen saturation [SpO2] at 1/20 Hz, and noninvasive oscillometric blood pressure [BP]). Periods when VS data crossed stability thresholds defined VS event epochs. Data were divided into Block 1, the ML training/cross-validation set, and Block 2, the test set. Expert clinicians annotated Block 1 events as perceived real or artifact. After feature extraction, ML algorithms were trained to create and validate models automatically classifying events as real or artifact. The models were then tested on Block 2. RESULTS Block 1 yielded 812 VS events, with 214 (26%) judged by experts as artifact (RR 43%, SpO2 40%, BP 15%, HR 2%). ML algorithms applied to the Block 1 training/cross-validation set (10-fold cross-validation) gave area under the curve (AUC) scores of 0.97 RR, 0.91 BP and 0.76 SpO2. Performance when applied to Block 2 test data was AUC 0.94 RR, 0.84 BP and 0.72 SpO2. CONCLUSIONS ML-defined algorithms applied to archived multi-signal continuous VS monitoring data allowed accurate automated classification of VS alerts as real or artifact, and could support data mining for future model building. PMID:26438655

  19. Raman spectroscopy detection of platelet for Alzheimer’s disease with predictive probabilities

    NASA Astrophysics Data System (ADS)

    Wang, L. J.; Du, X. Q.; Du, Z. W.; Yang, Y. Y.; Chen, P.; Tian, Q.; Shang, X. L.; Liu, Z. C.; Yao, X. Q.; Wang, J. Z.; Wang, X. H.; Cheng, Y.; Peng, J.; Shen, A. G.; Hu, J. M.

    2014-08-01

    Alzheimer’s disease (AD) is a common form of dementia. Early and differential diagnosis of AD has always been an arduous task for the medical expert due to the unapparent early symptoms and the currently imperfect imaging examination methods. Therefore, obtaining reliable markers with clinical diagnostic value in easily assembled samples is worthy and significant. Our previous work with laser Raman spectroscopy (LRS), in which we detected platelet samples of different ages of AD transgenic mice and non-transgenic controls, showed great effect in the diagnosis of AD. In addition, a multilayer perception network (MLP) classification method was adopted to discriminate the spectral data. However, the data set contained disturbances induced by noise from the machines and other sources; thus the MLP method had to be trained with large-scale data. In this paper, we aim to re-establish the classification models of early and advanced AD and the control group with fewer features, and apply some mechanism of noise reduction to improve the accuracy of the models. An adaptive classification method based on the Gaussian process (GP), featuring predictive probabilities, is proposed, which can indicate when a data set is related to some kind of disease. Compared with MLP on the same feature set, GP showed much better performance in the experimental results. What is more, since the spectra of platelets are isolated from AD, GP has good expansibility and can be applied in diagnosis of many other similar diseases, such as Parkinson’s disease (PD). Spectral data of 4-month and 12-month AD platelets, as well as control data, were collected. With predictive probabilities, the proposed GP classification method improved the diagnostic sensitivity to nearly 100%. Samples were also collected from PD platelets for classification and comparison with the 12-month AD samples.
The presented approach and our experiments indicate that utilization of GP with predictive probabilities in platelet LRS detection analysis turns out to be more accurate for early and differential diagnosis of AD and has a wide application prospect.
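GP prediction with an RBF kernel has a closed form; the sketch below writes it out for exactly two training points so the kernel matrix can be inverted by hand. GP classification as used in the paper additionally squashes this latent prediction into a class probability, which is omitted here; all numbers are invented:

```python
import math

def rbf(x1, x2, length=1.0):
    """Squared-exponential (RBF) kernel."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def gp_predict(xs, ys, x_star, noise=1e-6):
    """GP regression predictive mean and variance at x_star, written out
    for exactly two training points (explicit 2x2 matrix inverse)."""
    k11 = rbf(xs[0], xs[0]) + noise
    k22 = rbf(xs[1], xs[1]) + noise
    k12 = rbf(xs[0], xs[1])
    det = k11 * k22 - k12 * k12
    inv = [[k22 / det, -k12 / det], [-k12 / det, k11 / det]]
    k_star = [rbf(x_star, xs[0]), rbf(x_star, xs[1])]
    mean = sum(k_star[i] * sum(inv[i][j] * ys[j] for j in range(2))
               for i in range(2))
    var = rbf(x_star, x_star) - sum(
        k_star[i] * inv[i][j] * k_star[j] for i in range(2) for j in range(2))
    return mean, var

# Two invented training observations; predict midway between them.
mean, var = gp_predict([0.0, 2.0], [1.0, -1.0], 1.0)
print(round(mean, 6), var > 0)  # symmetric data: mean near zero, nonzero variance
```

The predictive variance is what gives the method its "predictive probabilities": the model reports not just a label but how uncertain it is at each test point.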

  20. Verification of the Accountability Method as a Means to Classify Radioactive Wastes Processed Using THOR Fluidized Bed Steam Reforming at the Studsvik Processing Facility in Erwin, Tennessee, USA - 13087

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Olander, Jonathan; Myers, Corey

    2013-07-01

    Studsvik's Processing Facility Erwin (SPFE) has been treating Low-Level Radioactive Waste using its patented THOR process for over 13 years. Studsvik has been mixing and processing wastes of the same waste classification but different chemical and isotopic characteristics for the full extent of this period as a general matter of operations. Studsvik utilizes the accountability method to track the movement of radionuclides from acceptance of waste, through processing, and finally in the classification of waste for disposal. Recently the NRC has proposed to revise the 1995 Branch Technical Position on Concentration Averaging and Encapsulation (1995 BTP on CA) with additional clarification (draft BTP on CA). The draft BTP on CA has paved the way for large scale blending of higher activity and lower activity waste to produce a single waste for the purpose of classification. With the onset of blending in the waste treatment industry, there is concern from the public and state regulators as to the robustness of the accountability method and the ability of processors to prevent the inclusion of hot spots in waste. To address these concerns and verify the accountability method as applied by the SPFE, as well as the SPFE's ability to control waste package classification, testing of actual waste packages was performed. Testing consisted of a comprehensive dose rate survey of a container of processed waste. Separately, the waste package was modeled chemically and radiologically. Comparing the observed and theoretical data demonstrated that actual dose rates were lower than, but consistent with, modeled dose rates. Moreover, the distribution of radioactivity confirms that the SPFE can produce a radiologically homogeneous waste form. 
The results of the study demonstrate: 1) the accountability method as applied by the SPFE is valid and produces expected results; 2) the SPFE can produce a radiologically homogeneous waste; and 3) the SPFE can effectively control the waste package classification. (authors)

  1. Alternative ways of representing Zapotec and Cuicatec folk classification of birds: a multidimensional model and its implications for culturally-informed conservation in Oaxaca, México.

    PubMed

    Alcántara-Salinas, Graciela; Ellen, Roy F; Valiñas-Coalla, Leopoldo; Caballero, Javier; Argueta-Villamar, Arturo

    2013-12-09

    We report on a comparative ethno-ornithological study of Zapotec and Cuicatec communities in Northern Oaxaca, Mexico that provided a challenge to some existing descriptions of folk classification. Our default model was the taxonomic system of ranks developed by Brent Berlin. Fieldwork was conducted in the Zapotec village of San Miguel Tiltepec and in the Cuicatec village of San Juan Teponaxtla, using a combination of ethnographic interviews and pile-sorting tests. Post-fieldwork, Principal Component Analysis using NTSYSpc V. 2.11f was applied to obtain pattern variation for the answers from different participants. Using language and pile-sorting data analysed through Principal Component Analysis, we show how both Zapotec and Cuicatec subjects place a particular emphasis on an intermediate level of classification. These categories group birds with non-birds using ecological and behavioral criteria, and violate a strict distinction between symbolic and mundane (or ‘natural’), and between ‘general-purpose’ and ‘single-purpose’ schemes. We suggest that shared classificatory knowledge embodying everyday schemes for apprehending the world of birds might be better reflected in a multidimensional model that would also provide a more realistic basis for developing culturally-informed conservation strategies.

  2. Study design requirements for RNA sequencing-based breast cancer diagnostics.

    PubMed

    Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias

    2016-02-01

    Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic.

  3. Classification of Multiple Seizure-Like States in Three Different Rodent Models of Epileptogenesis.

    PubMed

    Guirgis, Mirna; Serletis, Demitre; Zhang, Jane; Florez, Carlos; Dian, Joshua A; Carlen, Peter L; Bardakjian, Berj L

    2014-01-01

    Epilepsy is a dynamical disease and its effects are evident in over fifty million people worldwide. This study focused on objective classification of the multiple states involved in the brain's epileptiform activity. Four datasets from three different rodent hippocampal preparations were explored, wherein seizure-like-events (SLE) were induced by the perfusion of a low-Mg(2+)/high-K(+) solution or 4-Aminopyridine. Local field potentials were recorded from CA3 pyramidal neurons and interneurons and modeled as Markov processes. Specifically, hidden Markov models (HMM) were used to determine the nature of the states present. Properties of the Hilbert transform were used to construct the feature spaces for HMM training. By sequentially applying the HMM training algorithm, multiple states were identified both in episodes of SLE and nonSLE activity. Specifically, preSLE and postSLE states were differentiated and multiple inner SLE states were identified. This was accomplished using features extracted from the lower frequencies (1-4 Hz, 4-8 Hz) alongside those of both the low- (40-100 Hz) and high-gamma (100-200 Hz) bands of the recorded electrical activity. The learning paradigm of this HMM-based system eliminates the inherent bias associated with other learning algorithms that depend on predetermined state segmentation and renders it an appropriate candidate for SLE classification.
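The likelihood evaluation underlying HMM training can be sketched with the forward algorithm. The two-state model below (hypothetical "interictal" vs "seizure-like" states emitting a binary feature) is illustrative, not the trained model from the study:

```python
def forward(obs, start, trans, emit):
    """HMM forward algorithm: likelihood of an observation sequence,
    summing over all hidden state paths."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical 2-state model: state 0 = interictal, state 1 = seizure-like;
# observations: 0 = low-power epoch, 1 = high-gamma burst.
start = [0.9, 0.1]
trans = [[0.95, 0.05],   # interictal is sticky
         [0.10, 0.90]]   # seizure-like persists once entered
emit = [[0.8, 0.2],      # interictal mostly emits low-power epochs
        [0.1, 0.9]]      # seizure-like mostly emits high-gamma bursts
print(forward([0, 0, 1, 1], start, trans, emit))
```

Training adjusts `start`, `trans`, and `emit` to maximize this likelihood over recorded feature sequences, and the fitted hidden states are then read off as the preSLE, SLE, and postSLE regimes.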

  4. Alternative ways of representing Zapotec and Cuicatec folk classification of birds: a multidimensional model and its implications for culturally-informed conservation in Oaxaca, México

    PubMed Central

    2013-01-01

    Background We report on a comparative ethno-ornithological study of Zapotec and Cuicatec communities in Northern Oaxaca, Mexico that provided a challenge to some existing descriptions of folk classification. Our default model was the taxonomic system of ranks developed by Brent Berlin. Methods Fieldwork was conducted in the Zapotec village of San Miguel Tiltepec and in the Cuicatec village of San Juan Teponaxtla, using a combination of ethnographic interviews and pile-sorting tests. Post-fieldwork, Principal Component Analysis using NTSYSpc V. 2.11f was applied to obtain pattern variation for the answers from different participants. Results and conclusion Using language and pile-sorting data analysed through Principal Component Analysis, we show how both Zapotec and Cuicatec subjects place a particular emphasis on an intermediate level of classification. These categories group birds with non-birds using ecological and behavioral criteria, and violate a strict distinction between symbolic and mundane (or ‘natural’), and between ‘general-purpose’ and ‘single-purpose’ schemes. We suggest that shared classificatory knowledge embodying everyday schemes for apprehending the world of birds might be better reflected in a multidimensional model that would also provide a more realistic basis for developing culturally-informed conservation strategies. PMID:24321280
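    The Principal Component Analysis step can be illustrated with plain NumPy; the pile-sort matrix below is entirely hypothetical and merely stands in for the participants' answers analysed with NTSYSpc.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pile-sort data: rows = participants, columns = binary
# indicators of whether a given pair of taxa was sorted into the same pile.
X = rng.integers(0, 2, size=(20, 15)).astype(float)

# PCA via eigendecomposition of the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]           # sort components by variance
scores = Xc @ eigvecs[:, order]             # participant coordinates
explained = eigvals[order] / eigvals.sum()  # variance explained per component
print(explained[:2])
```

Plotting participants on the first two components is what reveals pattern variation between respondents, as described above.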

  5. Impact of atmospheric correction and image filtering on hyperspectral classification of tree species using support vector machine

    NASA Astrophysics Data System (ADS)

    Shahriari Nia, Morteza; Wang, Daisy Zhe; Bohlman, Stephanie Ann; Gader, Paul; Graves, Sarah J.; Petrovic, Milenko

    2015-01-01

    Hyperspectral images can be used to identify savannah tree species at the landscape scale, which is a key step in measuring biomass and carbon, and in tracking changes in species distributions, including invasive species, in these ecosystems. Before automated species mapping can be performed, image processing and atmospheric correction are often applied, which can potentially affect the performance of classification algorithms. We determine how three processing and correction techniques (atmospheric correction, Gaussian filters, and shade/green vegetation filters) affect the accuracy of pixel-level tree species classification from airborne visible/infrared imaging spectrometer imagery of longleaf pine savanna in Central Florida, United States. Species classification using fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH) atmospheric correction outperformed ATCOR in the majority of cases. Green vegetation (normalized difference vegetation index) and shade (near-infrared) filters did not increase classification accuracy when applied to large and continuous patches of specific species. Finally, applying a Gaussian filter reduced interband noise and increased species classification accuracy. Using the optimal preprocessing steps, our classification accuracy for six species classes is about 75%.
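    A toy sketch of the Gaussian-filter-plus-SVM pipeline on a synthetic "hyperspectral" cube; the data, band count, and class layout are invented for illustration, not taken from the study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Toy cube: 40x40 pixels x 5 bands, two species classes arranged as
# left/right patches with a small per-band spectral offset.
labels = np.zeros((40, 40), dtype=int)
labels[:, 20:] = 1
cube = rng.normal(size=(40, 40, 5)) + 0.6 * labels[..., None]

# Gaussian filter applied band-by-band to reduce interband noise.
smoothed = np.stack(
    [gaussian_filter(cube[..., b], sigma=1.5) for b in range(5)], axis=-1
)

# Pixel-level SVM classification: train on half the pixels, test on the rest.
X = smoothed.reshape(-1, 5)
y = labels.ravel()
clf = SVC(kernel="rbf").fit(X[::2], y[::2])
acc = clf.score(X[1::2], y[1::2])
print(round(acc, 3))
```

On smooth, continuous patches the filter raises accuracy by suppressing noise; near patch boundaries it blurs classes together, which is the trade-off the abstract alludes to.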

  6. Weather patterns as a downscaling tool - evaluating their skill in stratifying local climate variables

    NASA Astrophysics Data System (ADS)

    Murawski, Aline; Bürger, Gerd; Vorogushyn, Sergiy; Merz, Bruno

    2016-04-01

    The use of a weather-pattern-based approach for downscaling coarse, gridded atmospheric data, as usually obtained from the output of general circulation models (GCM), allows for investigating the impact of anthropogenic greenhouse gas emissions on fluxes and state variables of the hydrological cycle, such as runoff in large river catchments. Here we aim to attribute changes in high flows in the Rhine catchment to anthropogenic climate change. To this end, we run an objective classification scheme (simulated annealing and diversified randomisation - SANDRA, available from the cost733 classification software) on ERA20C reanalysis data and apply the established classification to GCMs from the CMIP5 project. After deriving weather pattern time series from GCM runs using forcing from all greenhouse gases (All-Hist) and using natural greenhouse gas forcing only (Nat-Hist), a weather generator will be employed to obtain climate data time series for the hydrological model. The parameters of the weather pattern classification (i.e. spatial extent, number of patterns, classification variables) need to be selected in a way that allows for good stratification of the meteorological variables that are of interest for the hydrological modelling. We evaluate the skill of the classification in stratifying meteorological data using a multi-variable approach. This allows the stratification skill to be estimated for all meteorological variables jointly, rather than separately as is usually done in similar work. The advantage of the multi-variable approach is to properly account for situations where, for example, two patterns are associated with similar mean daily temperature, but one pattern is dry while the other is related to considerable amounts of precipitation. Thus, the separation of these two patterns would not be justified when considering temperature only, but is perfectly reasonable when accounting for precipitation as well. 
In addition, the weather patterns derived from reanalysis data should be well represented in the All-Hist GCM runs in terms of, for example, frequency, seasonality, and persistence. In this contribution we show how to select the most appropriate weather pattern classification and how the classes derived from it are reflected in the GCMs.

  7. Model-based approach to the detection and classification of mines in sidescan sonar.

    PubMed

    Reed, Scott; Petillot, Yvan; Bell, Judith

    2004-01-10

    This paper presents a model-based approach to mine detection and classification by use of sidescan sonar. Advances in autonomous underwater vehicle technology have increased the interest in automatic target recognition systems in an effort to automate a process that is currently carried out by a human operator. Current automated systems generally require training and thus produce poor results when the test data set is different from the training set. This has led to research into unsupervised systems, which are able to cope with the large variability in conditions and terrains seen in sidescan imagery. The system presented in this paper first detects possible minelike objects using a Markov random field model, which operates well on noisy images, such as sidescan, and allows a priori information to be included through the use of priors. The highlight and shadow regions of the object are then extracted with a cooperating statistical snake, which assumes these regions are statistically separate from the background. Finally, a classification decision is made using Dempster-Shafer theory, where the extracted features are compared with synthetic realizations generated with a sidescan sonar simulator model. Results for the entire process are shown on real sidescan sonar data. Similarities between the sidescan sonar and synthetic aperture radar (SAR) imaging processes ensure that the approach outlined here could also be applied to SAR image analysis.

  8. Noninvasive Classification of Hepatic Fibrosis Based on Texture Parameters From Double Contrast-Enhanced Magnetic Resonance Images

    PubMed Central

    Bahl, Gautam; Cruite, Irene; Wolfson, Tanya; Gamst, Anthony C.; Collins, Julie M.; Chavez, Alyssa D.; Barakat, Fatma; Hassanein, Tarek; Sirlin, Claude B.

    2016-01-01

    Purpose To demonstrate a proof of concept that quantitative texture feature analysis of double contrast-enhanced magnetic resonance imaging (MRI) can classify fibrosis noninvasively, using histology as a reference standard. Materials and Methods A Health Insurance Portability and Accountability Act (HIPAA)-compliant Institutional Review Board (IRB)-approved retrospective study of 68 patients with diffuse liver disease was performed at a tertiary liver center. All patients underwent double contrast-enhanced MRI, with histopathology-based staging of fibrosis obtained within 12 months of imaging. The MaZda software program was used to compute 279 texture parameters for each image. A statistical regularization technique, generalized linear model (GLM)-path, was used to develop a model based on texture features for dichotomous classification of fibrosis category (F ≤2 vs. F ≥3) of the 68 patients, with histology as the reference standard. The model's performance was assessed and cross-validated. There was no additional validation performed on an independent cohort. Results Cross-validated sensitivity, specificity, and total accuracy of the texture feature model in classifying fibrosis were 91.9%, 83.9%, and 88.2%, respectively. Conclusion This study shows proof of concept that accurate, noninvasive classification of liver fibrosis is possible by applying quantitative texture analysis to double contrast-enhanced MRI. Further studies are needed in independent cohorts of subjects. PMID:22851409
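    GLM-path fits an L1-regularized generalized linear model along a path of penalty values, shrinking most coefficients to zero. The stand-in sketch below uses scikit-learn's cross-validated L1 logistic regression on synthetic "texture features"; the dimensions mirror the study (68 patients, 279 parameters) but the data are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Hypothetical stand-in: 68 patients x 279 texture parameters, where only a
# handful of features carry fibrosis signal (F<=2 vs. F>=3).
n, p, informative = 68, 279, 8
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p))
X[:, :informative] += 1.2 * y[:, None]

# The L1 penalty mimics GLM-path regularization: a path of penalty strengths
# is searched by inner cross-validation, and most coefficients end up zero.
clf = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=5, cv=5)
acc = cross_val_score(clf, X, y, cv=5).mean()   # cross-validated accuracy
clf.fit(X, y)
n_selected = int((clf.coef_ != 0).sum())        # features kept by the model
print(round(acc, 3), n_selected)
```

Cross-validating the whole procedure, as done here and in the study, guards against the overfitting risk of selecting from 279 features with only 68 patients.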

  9. OBJECTIVE METEOROLOGICAL CLASSIFICATION SCHEME DESIGNED TO ELUCIDATE OZONE'S DEPENDENCE ON METEOROLOGY

    EPA Science Inventory

    This paper utilizes a two-stage clustering approach as part of an objective classification scheme designed to elucidate O3's dependence on meteorology. When applied to ten years (1981-1990) of meteorological data for Birmingham, Alabama, the classification scheme identified seven ...

  10. 46 CFR 8.240 - Application for recognition.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... ALTERNATIVES Recognition of a Classification Society § 8.240 Application for recognition. (a) A classification society must apply for recognition in writing to the Commandant (CG-521). (b) An application must indicate which specific authority the classification society seeks to have delegated. (c) Upon verification from...

  11. 46 CFR 8.240 - Application for recognition.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... ALTERNATIVES Recognition of a Classification Society § 8.240 Application for recognition. (a) A classification society must apply for recognition in writing to the Commandant (CG-521). (b) An application must indicate which specific authority the classification society seeks to have delegated. (c) Upon verification from...

  12. Magnesium-binding architectures in RNA crystal structures: validation, binding preferences, classification and motif detection

    PubMed Central

    Zheng, Heping; Shabalin, Ivan G.; Handing, Katarzyna B.; Bujnicki, Janusz M.; Minor, Wladek

    2015-01-01

    The ubiquitous presence of magnesium ions in RNA has long been recognized as a key factor governing RNA folding, and is crucial for many diverse functions of RNA molecules. In this work, Mg2+-binding architectures in RNA were systematically studied using a database of RNA crystal structures from the Protein Data Bank (PDB). Due to the abundance of poorly modeled or incorrectly identified Mg2+ ions, the set of all sites was comprehensively validated and filtered to identify a benchmark dataset of 15 334 ‘reliable’ RNA-bound Mg2+ sites. The normalized frequencies by which specific RNA atoms coordinate Mg2+ were derived for both the inner and outer coordination spheres. A hierarchical classification system of Mg2+ sites in RNA structures was designed and applied to the benchmark dataset, yielding a set of 41 types of inner-sphere and 95 types of outer-sphere coordinating patterns. This classification system has also been applied to describe six previously reported Mg2+-binding motifs and detect them in new RNA structures. Investigation of the most populous site types resulted in the identification of seven novel Mg2+-binding motifs, and all RNA structures in the PDB were screened for the presence of these motifs. PMID:25800744

  13. Applicability of Hydrologic Landscapes for Model Calibration ...

    EPA Pesticide Factsheets

    The Pacific Northwest Hydrologic Landscapes (PNW HL) at the assessment unit scale has provided a solid conceptual classification framework to relate and transfer hydrologically meaningful information between watersheds without access to streamflow time series. A collection of techniques were applied to the HL assessment unit composition in watersheds across the Pacific Northwest to aggregate the hydrologic behavior of the Hydrologic Landscapes from the assessment unit scale to the watershed scale. This non-trivial solution both emphasizes HL classifications within the watershed that provide that majority of moisture surplus/deficit and considers the relative position (upstream vs. downstream) of these HL classifications. A clustering algorithm was applied to the HL-based characterization of assessment units within 185 watersheds to help organize watersheds into nine classes hypothesized to have similar hydrologic behavior. The HL-based classes were used to organize and describe hydrologic behavior information about watershed classes and both predictions and validations were independently performed with regard to the general magnitude of six hydroclimatic signature values. A second cluster analysis was then performed using the independently calculated signature values as similarity metrics, and it was found that the six signature clusters showed substantial overlap in watershed class membership to those in the HL-based classes. One hypothesis set forward from thi
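    The clustering step can be sketched with a standard k-means algorithm; the watershed descriptors below are synthetic stand-ins for the HL-based characterization and hydroclimatic signature values used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Hypothetical watershed descriptors: 185 watersheds x 6 hydroclimatic
# signature values (e.g., runoff ratio, baseflow index), standardized.
X = rng.normal(size=(185, 6))
X[:60] += 2.0   # give one group of watersheds a distinct signature

# Cluster watersheds into nine classes hypothesized to behave similarly.
km = KMeans(n_clusters=9, n_init=10, random_state=0).fit(X)
classes = km.labels_
print(np.bincount(classes))
```

Comparing class memberships from two such clusterings (HL-based vs. signature-based), as the study does, amounts to cross-tabulating two label vectors like `classes`.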

  14. How to apply the ICF and ICF core sets for low back pain.

    PubMed

    Stier-Jarmer, Marita; Cieza, Alarcos; Borchers, Michael; Stucki, Gerold

    2009-01-01

    To introduce the International Classification of Functioning, Disability and Health (ICF) as a conceptual model and classification and the ICF Core Sets as a way to specify functioning for a specific health condition such as Low Back Pain, and to illustrate the application of the ICF and ICF Core Sets in the context of clinical practice, the planning and reporting of studies, and the comparison of health-status measures. A decision-making and consensus process was performed to develop the ICF Core Sets for Low Back Pain, the linking procedure was applied as the basis for the content comparison of health-status measures, and the Rehab-Cycle was used to exemplify the application of the ICF and ICF Core Sets in clinical practice. Two different ICF Core Sets, namely a comprehensive and a brief version, are presented; three different health-status measures were linked to the ICF and compared; and a case example of a patient with Low Back Pain was described based on the Rehab-Cycle. The ICF is a promising new framework and classification to assess the impact of Low Back Pain. The ICF and practical tools, such as the ICF Core Sets for Low Back Pain, are useful for clinical practice, outcome and rehabilitation research, education, health statistics, and regulation.

  15. Voltammetric Electronic Tongue and Support Vector Machines for Identification of Selected Features in Mexican Coffee

    PubMed Central

    Domínguez, Rocio Berenice; Moreno-Barón, Laura; Muñoz, Roberto; Gutiérrez, Juan Manuel

    2014-01-01

    This paper describes a new method based on a voltammetric electronic tongue (ET) for the recognition of distinctive features in coffee samples. An ET was directly applied to different samples from the main Mexican coffee regions without any pretreatment before the analysis. The resulting electrochemical information was modeled with two different mathematical tools, namely Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). Growing conditions (i.e., organic or non-organic practices and altitude of crops) were considered for a first classification. LDA results showed an average discrimination rate of 88% ± 6.53% while SVM successfully accomplished an overall accuracy of 96.4% ± 3.50% for the same task. A second classification based on geographical origin of samples was carried out. Results showed an overall accuracy of 87.5% ± 7.79% for LDA and a superior performance of 97.5% ± 3.22% for SVM. Given the complexity of coffee samples, the high accuracy percentages achieved by ET coupled with SVM in both classification problems suggested a potential applicability of ET in the assessment of selected coffee features with a simpler and faster methodology along with a null sample pretreatment. In addition, the proposed method can be applied to authentication assessment while improving cost, time and accuracy of the general procedure. PMID:25254303
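    The LDA-versus-SVM comparison can be sketched as cross-validated accuracies reported as mean ± standard deviation; the "voltammograms" below are synthetic and the class structure is invented for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Hypothetical voltammograms: 60 coffee samples x 120 current readings,
# three growing-condition classes with a per-class spectral offset.
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 120)) + 0.4 * y[:, None]

# Compare the two classifiers used in the study under the same CV split.
results = {}
for name, model in [
    ("LDA", LinearDiscriminantAnalysis()),
    ("SVM", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
]:
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

This mean ± std reporting matches the form of the accuracies quoted in the abstract (e.g., 96.4% ± 3.50%).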

  16. Voltammetric electronic tongue and support vector machines for identification of selected features in Mexican coffee.

    PubMed

    Domínguez, Rocio Berenice; Moreno-Barón, Laura; Muñoz, Roberto; Gutiérrez, Juan Manuel

    2014-09-24

    This paper describes a new method based on a voltammetric electronic tongue (ET) for the recognition of distinctive features in coffee samples. An ET was directly applied to different samples from the main Mexican coffee regions without any pretreatment before the analysis. The resulting electrochemical information was modeled with two different mathematical tools, namely Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). Growing conditions (i.e., organic or non-organic practices and altitude of crops) were considered for a first classification. LDA results showed an average discrimination rate of 88% ± 6.53% while SVM successfully accomplished an overall accuracy of 96.4% ± 3.50% for the same task. A second classification based on geographical origin of samples was carried out. Results showed an overall accuracy of 87.5% ± 7.79% for LDA and a superior performance of 97.5% ± 3.22% for SVM. Given the complexity of coffee samples, the high accuracy percentages achieved by ET coupled with SVM in both classification problems suggested a potential applicability of ET in the assessment of selected coffee features with a simpler and faster methodology along with a null sample pretreatment. In addition, the proposed method can be applied to authentication assessment while improving cost, time and accuracy of the general procedure.

  17. Applying Neural Networks to Hyperspectral and Multispectral Field Data for Discrimination of Cruciferous Weeds in Winter Crops

    PubMed Central

    de Castro, Ana-Isabel; Jurado-Expósito, Montserrat; Gómez-Casero, María-Teresa; López-Granados, Francisca

    2012-01-01

    In the context of detection of weeds in crops for site-specific weed control, on-ground spectral reflectance measurements are the first step to determine the potential of remote spectral data to classify weeds and crops. Field studies were conducted for four years at different locations in Spain. We aimed to distinguish cruciferous weeds in wheat and broad bean crops, using hyperspectral and multispectral readings in the visible and near-infrared spectrum. To identify differences in reflectance between cruciferous weeds, we applied three classification methods: stepwise discriminant (STEPDISC) analysis and two neural networks, specifically, multilayer perceptron (MLP) and radial basis function (RBF). Hyperspectral and multispectral signatures of cruciferous weeds, and wheat and broad bean crops, can be classified using STEPDISC analysis and MLP and RBF neural networks with varying success, with the MLP model being the most accurate, achieving classification performance of 100%, or at least 98.1%, in all years. Classification accuracy from hyperspectral signatures was similar to that from multispectral signatures and spectral indices, suggesting that little advantage would be obtained by using more expensive airborne hyperspectral imagery. Therefore, for future investigations, we recommend using multispectral remote imagery to explore whether it can discriminate these weeds and crops. PMID:22629171
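    A minimal sketch of a multilayer perceptron applied to spectral signatures; the reflectance data below are synthetic, and the MLP configuration is an illustrative assumption rather than the study's actual setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)

# Hypothetical reflectance readings: weed vs. crop classes across 25
# wavebands, each class with its own spectral shape plus noise.
n_per, bands = 80, 25
wl = np.linspace(0, 1, bands)
weeds = np.sin(2 * np.pi * wl) + rng.normal(scale=0.3, size=(n_per, bands))
crops = np.cos(2 * np.pi * wl) + rng.normal(scale=0.3, size=(n_per, bands))
X = np.vstack([weeds, crops])
y = np.array([0] * n_per + [1] * n_per)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One hidden layer of 16 units, standing in for the MLP models compared above.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
acc = mlp.fit(X_tr, y_tr).score(X_te, y_te)
print(round(acc, 3))
```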

  18. Applying neural networks to hyperspectral and multispectral field data for discrimination of cruciferous weeds in winter crops.

    PubMed

    de Castro, Ana-Isabel; Jurado-Expósito, Montserrat; Gómez-Casero, María-Teresa; López-Granados, Francisca

    2012-01-01

    In the context of detection of weeds in crops for site-specific weed control, on-ground spectral reflectance measurements are the first step to determine the potential of remote spectral data to classify weeds and crops. Field studies were conducted for four years at different locations in Spain. We aimed to distinguish cruciferous weeds in wheat and broad bean crops, using hyperspectral and multispectral readings in the visible and near-infrared spectrum. To identify differences in reflectance between cruciferous weeds, we applied three classification methods: stepwise discriminant (STEPDISC) analysis and two neural networks, specifically, multilayer perceptron (MLP) and radial basis function (RBF). Hyperspectral and multispectral signatures of cruciferous weeds, and wheat and broad bean crops, can be classified using STEPDISC analysis and MLP and RBF neural networks with varying success, with the MLP model being the most accurate, achieving classification performance of 100%, or at least 98.1%, in all years. Classification accuracy from hyperspectral signatures was similar to that from multispectral signatures and spectral indices, suggesting that little advantage would be obtained by using more expensive airborne hyperspectral imagery. Therefore, for future investigations, we recommend using multispectral remote imagery to explore whether it can discriminate these weeds and crops.

  19. Predicting carcinogenicity of diverse chemicals using probabilistic neural network modeling approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001; Gupta, Shikha

    Robust global models capable of discriminating positive and non-positive carcinogens and predicting carcinogenic potency of chemicals in rodents were developed. The dataset of 834 structurally diverse chemicals extracted from the Carcinogenic Potency Database (CPDB) was used, which contained 466 positive and 368 non-positive carcinogens. Twelve non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals and nonlinearity in the data were evaluated using the Tanimoto similarity index and Brock–Dechert–Scheinkman statistics. Probabilistic neural network (PNN) and generalized regression neural network (GRNN) models were constructed for classification and function optimization problems using the carcinogenicity end point in rat. Validation of the models was performed using the internal and external procedures employing a wide series of statistical checks. PNN constructed using five descriptors rendered classification accuracy of 92.09% in complete rat data. The PNN model rendered classification accuracies of 91.77%, 80.70% and 92.08% in mouse, hamster and pesticide data, respectively. The GRNN constructed with nine descriptors yielded a correlation coefficient of 0.896 between the measured and predicted carcinogenic potency with mean squared error (MSE) of 0.44 in complete rat data. The rat carcinogenicity model (GRNN) applied to the mouse and hamster data yielded correlation coefficients and MSE of 0.758, 0.71 and 0.760, 0.46, respectively. The results suggest wide applicability of the inter-species models in predicting carcinogenic potency of chemicals. Both the PNN and GRNN (inter-species) models constructed here can be useful tools in predicting the carcinogenicity of new chemicals for regulatory purposes. - Graphical abstract: Figure (a) shows classification accuracies (positive and non-positive carcinogens) in rat, mouse, hamster, and pesticide data yielded by the optimal PNN model. 
Figure (b) shows generalization and predictive abilities of the interspecies GRNN model to predict the carcinogenic potency of diverse chemicals. - Highlights: • Global robust models constructed for carcinogenicity prediction of diverse chemicals. • Tanimoto/BDS test revealed structural diversity of chemicals and nonlinearity in data. • PNN/GRNN successfully predicted carcinogenicity/carcinogenic potency of chemicals. • Developed interspecies PNN/GRNN models for carcinogenicity prediction. • Proposed models can be used as tool to predict carcinogenicity of new chemicals.
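    A GRNN is equivalent to Nadaraya-Watson kernel regression: each prediction is a Gaussian-kernel-weighted average of the training targets. A minimal NumPy sketch, with synthetic descriptors and potencies standing in for the study's data:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """Generalized regression neural network: predictions are kernel-weighted
    averages of training targets (Nadaraya-Watson regression)."""
    # Squared distances between every query point and every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))          # Gaussian pattern-layer weights
    return (w @ y_train) / w.sum(axis=1)        # normalized weighted average

rng = np.random.default_rng(7)
# Toy regression stand-in: predict "potency" from 3 molecular descriptors.
X = rng.uniform(-1, 1, size=(200, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=200)

pred = grnn_predict(X[:150], y[:150], X[150:])
corr = np.corrcoef(pred, y[150:])[0, 1]
print(round(corr, 3))
```

The smoothing width `sigma` is the single tunable parameter of a GRNN; the value here is an arbitrary choice for the toy data.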

  20. Uncovering state-dependent relationships in shallow lakes using Bayesian latent variable regression.

    PubMed

    Vitense, Kelsey; Hanson, Mark A; Herwig, Brian R; Zimmer, Kyle D; Fieberg, John

    2018-03-01

    Ecosystems sometimes undergo dramatic shifts between contrasting regimes. Shallow lakes, for instance, can transition between two alternative stable states: a clear state dominated by submerged aquatic vegetation and a turbid state dominated by phytoplankton. Theoretical models suggest that critical nutrient thresholds differentiate three lake types: highly resilient clear lakes, lakes that may switch between clear and turbid states following perturbations, and highly resilient turbid lakes. For effective and efficient management of shallow lakes and other systems, managers need tools to identify critical thresholds and state-dependent relationships between driving variables and key system features. Using shallow lakes as a model system for which alternative stable states have been demonstrated, we developed an integrated framework using Bayesian latent variable regression (BLR) to classify lake states, identify critical total phosphorus (TP) thresholds, and estimate steady state relationships between TP and chlorophyll a (chl a) using cross-sectional data. We evaluated the method using data simulated from a stochastic differential equation model and compared its performance to k-means clustering with regression (KMR). We also applied the framework to data comprising 130 shallow lakes. For simulated data sets, BLR had high state classification rates (median/mean accuracy >97%) and accurately estimated TP thresholds and state-dependent TP-chl a relationships. Classification and estimation improved with increasing sample size and decreasing noise levels. Compared to KMR, BLR had higher classification rates and better approximated the TP-chl a steady state relationships and TP thresholds. We fit the BLR model to three different years of empirical shallow lake data, and managers can use the estimated bifurcation diagrams to prioritize lakes for management according to their proximity to thresholds and chance of successful rehabilitation. 
Our model improves upon previous methods for shallow lakes because it allows classification and regression to occur simultaneously and inform one another, directly estimates TP thresholds and the uncertainty associated with thresholds and state classifications, and enables meaningful constraints to be built into models. The BLR framework is broadly applicable to other ecosystems known to exhibit alternative stable states in which regression can be used to establish relationships between driving variables and state variables. © 2017 by the Ecological Society of America.

  1. Exhaustive Classification of the Invariant Solutions for a Specific Nonlinear Model Describing Near Planar and Marginally Long-Wave Unstable Interfaces for Phase Transition

    NASA Astrophysics Data System (ADS)

    Ahangari, Fatemeh

    2018-05-01

    Problems of thermodynamic phase transition originate inherently in solidification, combustion and various other significant fields. If the transition region between two locally stable phases is adequately narrow, the dynamics can be modeled by an interface motion. This paper is devoted to an exhaustive analysis of the invariant solutions of a modified Kuramoto-Sivashinsky equation in two spatial and one temporal dimensions. This nonlinear partial differential equation asymptotically characterizes near-planar interfaces that are marginally long-wave unstable. For this purpose, by applying the classical Lie symmetry method to this model, the classical symmetry operators are obtained. Moreover, the structure of the Lie algebra of symmetries is discussed and the optimal system of subalgebras, which yields the preliminary classification of group-invariant solutions, is constructed. In particular, the Lie invariants corresponding to the infinitesimal symmetry generators, as well as the associated similarity reduced equations, are pointed out. Furthermore, the nonclassical symmetries of this nonlinear PDE are also comprehensively investigated.
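    The abstract does not state the exact modified equation. For orientation only, the standard (unmodified) Kuramoto-Sivashinsky equation for a near-planar interface height h(x, y, t) in two spatial dimensions is commonly written as (sign conventions vary between references):

```latex
\partial_t h = -\nabla^{2} h - \nabla^{4} h + \tfrac{1}{2}\,\lvert \nabla h \rvert^{2}
```

Here the Laplacian term drives the long-wave instability, the biharmonic term stabilizes short wavelengths, and the quadratic gradient term is the nonlinearity; the paper's "modified" variant presumably alters one or more of these terms.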

  2. Identification of consensus biomarkers for predicting non-genotoxic hepatocarcinogens

    PubMed Central

    Huang, Shan-Han; Tung, Chun-Wei

    2017-01-01

    The assessment of non-genotoxic hepatocarcinogens (NGHCs) currently relies on two-year rodent bioassays. Toxicogenomics biomarkers provide a potential alternative method for the prioritization of NGHCs that could be useful for risk assessment. However, previous studies using inconsistently classified chemicals as the training set and a single microarray dataset concluded that no consensus biomarkers could be identified. In this study, four consensus biomarkers, A2m, Ca3, Cxcl1, and Cyp8b1, were identified from four large-scale microarray datasets of the one-day single maximum tolerated dose and a large set of chemicals without inconsistent classifications. Machine learning techniques were subsequently applied to develop prediction models for NGHCs. The final bagging decision tree models were constructed with an average AUC performance of 0.803 for an independent test. A set of 16 chemicals with controversial classifications were reclassified according to the consensus biomarkers. The developed prediction models and identified consensus biomarkers are expected to be potential alternative methods for prioritization of NGHCs for further experimental validation. PMID:28117354
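    The final model family, bagged decision trees evaluated by AUC, can be sketched as follows; the four-biomarker feature matrix is synthetic and only mirrors the dimensionality of the study.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(8)

# Hypothetical stand-in: chemicals x 4 biomarker expression changes
# (playing the role of A2m, Ca3, Cxcl1, Cyp8b1), labeled NGHC vs. not.
n = 240
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 4)) + 0.9 * y[:, None]

# Bagging: train many decision trees on bootstrap resamples and average
# their votes; score by cross-validated AUC as in the abstract.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0)
auc = cross_val_score(bag, X, y, cv=5, scoring="roc_auc").mean()
print(round(auc, 3))
```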

  3. Prediction of chemical biodegradability using support vector classifier optimized with differential evolution.

    PubMed

    Cao, Qi; Leung, K M

    2014-09-22

    Reliable computer models for the prediction of chemical biodegradability from molecular descriptors and fingerprints are very important for making health and environmental decisions. Coupling of the differential evolution (DE) algorithm with the support vector classifier (SVC) in order to optimize the main parameters of the classifier resulted in an improved classifier called the DE-SVC, which is introduced in this paper for use in chemical biodegradability studies. The DE-SVC was applied to predict the biodegradation of chemicals on the basis of extensive sample data sets and known structural features of molecules. Our optimization experiments showed that DE can efficiently find the proper parameters of the SVC. The resulting classifier possesses strong robustness and reliability compared with grid search, genetic algorithm, and particle swarm optimization methods. The classification experiments conducted here showed that the DE-SVC exhibits better classification performance than models previously used for such studies. It is a more effective and efficient prediction model for chemical biodegradability.
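    The DE-SVC idea, using differential evolution to tune the SVC's main parameters (C and gamma) against cross-validated accuracy, can be sketched with SciPy and scikit-learn. The descriptor data are synthetic, and the search bounds and DE settings are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(9)

# Hypothetical stand-in for molecular descriptors: biodegradable vs. not.
n, p = 150, 10
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p)) + 0.7 * y[:, None]

def neg_cv_accuracy(params):
    """DE objective: negative CV accuracy over (log10 C, log10 gamma)."""
    C, gamma = 10.0 ** params[0], 10.0 ** params[1]
    clf = SVC(C=C, gamma=gamma)
    return -cross_val_score(clf, X, y, cv=3).mean()

# Differential evolution searches the (C, gamma) space of the classifier.
result = differential_evolution(neg_cv_accuracy,
                                bounds=[(-2, 2), (-4, 0)],
                                maxiter=10, popsize=8, seed=0, tol=1e-3)
best_acc = -result.fun
print(round(best_acc, 3))
```

Searching in log space is the usual choice for C and gamma, since both parameters act multiplicatively.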

  4. Preprocessing and meta-classification for brain-computer interfaces.

    PubMed

    Hammon, Paul S; de Sa, Virginia R

    2007-03-01

    A brain-computer interface (BCI) is a system which allows direct translation of brain states into actions, bypassing the usual muscular pathways. A BCI system works by extracting user brain signals, applying machine learning algorithms to classify the user's brain state, and performing a computer-controlled action. Our goal is to improve brain state classification. Perhaps the most obvious way to improve classification performance is the selection of an advanced learning algorithm. However, it is now well known in the BCI community that careful selection of preprocessing steps is crucial to the success of any classification scheme. Furthermore, recent work indicates that combining the output of multiple classifiers (meta-classification) leads to improved classification rates relative to single classifiers (Dornhege et al., 2004). In this paper, we develop an automated approach which systematically analyzes the relative contributions of different preprocessing and meta-classification approaches. We apply this procedure to three data sets drawn from BCI Competition 2003 (Blankertz et al., 2004) and BCI Competition III (Blankertz et al., 2006), each of which exhibit very different characteristics. Our final classification results compare favorably with those from past BCI competitions. Additionally, we analyze the relative contributions of individual preprocessing and meta-classification choices and discuss which types of BCI data benefit most from specific algorithms.
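    Meta-classification, combining the outputs of several heterogeneous base classifiers, can be sketched with soft voting; the "EEG features" below are synthetic, and the particular base-classifier set is an illustrative choice, not the paper's.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(10)

# Hypothetical stand-in for preprocessed EEG features, two brain states.
n, p = 200, 12
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p)) + 0.6 * y[:, None]

# Meta-classifier: average the predicted class probabilities of three
# heterogeneous base classifiers ("soft" voting).
meta = VotingClassifier(
    estimators=[("lda", LinearDiscriminantAnalysis()),
                ("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True))],
    voting="soft",
)
acc = cross_val_score(meta, X, y, cv=5).mean()
print(round(acc, 3))
```

Comparing this cross-validated score against each base classifier alone is the kind of systematic contribution analysis the paper automates.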

  5. 40 CFR 51.902 - Which classification and nonattainment area planning provisions of the CAA shall apply to areas...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... area planning provisions of the CAA shall apply to areas designated nonattainment for the 8-hour NAAQS? 51.902 Section 51.902 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR... Implementation of 8-hour Ozone National Ambient Air Quality Standard § 51.902 Which classification and...

  6. 40 CFR 51.902 - Which classification and nonattainment area planning provisions of the CAA shall apply to areas...

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... area planning provisions of the CAA shall apply to areas designated nonattainment for the 1997 8-hour NAAQS? 51.902 Section 51.902 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR... Implementation of 8-hour Ozone National Ambient Air Quality Standard § 51.902 Which classification and...

  7. 40 CFR 51.902 - Which classification and nonattainment area planning provisions of the CAA shall apply to areas...

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... area planning provisions of the CAA shall apply to areas designated nonattainment for the 1997 8-hour NAAQS? 51.902 Section 51.902 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR... Implementation of 8-hour Ozone National Ambient Air Quality Standard § 51.902 Which classification and...

  8. 40 CFR 51.902 - Which classification and nonattainment area planning provisions of the CAA shall apply to areas...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... area planning provisions of the CAA shall apply to areas designated nonattainment for the 1997 8-hour NAAQS? 51.902 Section 51.902 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR... Implementation of 8-hour Ozone National Ambient Air Quality Standard § 51.902 Which classification and...

  9. 40 CFR 51.902 - Which classification and nonattainment area planning provisions of the CAA shall apply to areas...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... area planning provisions of the CAA shall apply to areas designated nonattainment for the 8-hour NAAQS? 51.902 Section 51.902 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR... Implementation of 8-hour Ozone National Ambient Air Quality Standard § 51.902 Which classification and...

  10. Applying a Hidden Markov Model-Based Event Detection and Classification Algorithm to Apollo Lunar Seismic Data

    NASA Astrophysics Data System (ADS)

    Knapmeyer-Endrun, B.; Hammer, C.

    2014-12-01

    The seismometers that the Apollo astronauts deployed on the Moon provide the only recordings of seismic events from any extra-terrestrial body so far. These lunar events differ significantly from those recorded on Earth, in terms of both signal shape and source processes, and are thus a valuable test case for any experiment in planetary seismology. In this study, we analyze Apollo 16 data with a single-station event detection and classification algorithm in view of NASA's upcoming InSight mission to Mars. InSight, scheduled for launch in early 2016, aims to investigate Mars' internal structure by deploying a seismometer on the planet's surface. As the mission does not feature any orbiter, continuous data will be relayed to Earth at a reduced rate. Full-range data will only be available by requesting specific time windows within a few days of the original transmission. We apply a recently introduced algorithm based on hidden Markov models that requires only a single example waveform of each event class for training appropriate models. After constructing the prototypes, we detect and classify impacts and deep and shallow moonquakes. Initial results for 1972 (the year of station installation, with 8 months of data) indicate a high detection rate of over 95% for impacts, of which more than 80% are classified correctly. Deep moonquakes, which occur in large numbers but often show only very weak signals, are detected with less certainty (~70%). As only one weak shallow moonquake is covered, results for this event class are not statistically significant. Daily adjustments of the background noise model help to reduce false alarms, which are mainly erroneous deep moonquake detections, by about 25%. The algorithm enables us to classify events that were previously listed in the catalog without classification and, through the combined use of long-period and short-period data, to identify some unlisted local impacts as well as at least two as-yet-unreported deep moonquakes.
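    The core idea (train a statistical model of each event class from a single example, score windows of the continuous trace against an adjustable background-noise model, and flag windows the event model explains better) can be reduced to a toy detector. The sketch below uses plain Gaussian amplitude models rather than full hidden Markov models, and all signals, window sizes, and thresholds are synthetic assumptions, not the paper's.

```python
# Toy single-station detector in the spirit of the approach above:
# one example waveform trains an "event" amplitude model, a quiet
# stretch trains a "noise" model, and windows are flagged where the
# event model's log-likelihood wins. Gaussians stand in for HMMs.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, 2000)
trace[800:900] += rng.normal(0.0, 4.0, 100)   # synthetic "moonquake"

# Train models from one example of each class (amplitude statistics)
noise_model = norm(0.0, np.std(trace[:500]))      # daily-adjustable
event_model = norm(0.0, np.std(trace[800:900]))   # single prototype

win = 50
detections = []
for start in range(0, len(trace) - win, win):
    seg = trace[start:start + win]
    if event_model.logpdf(seg).sum() > noise_model.logpdf(seg).sum():
        detections.append(start)
print("detected windows starting at:", detections)
```

Re-estimating `noise_model` from each day's quiet intervals is the analogue of the daily background-model adjustment that reduces false alarms in the study.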

  11. Common and Distant Structural Characteristics of Feruloyl Esterase Families from Aspergillus oryzae

    PubMed Central

    Udatha, D. B. R. K. Gupta; Mapelli, Valeria; Panagiotou, Gianni; Olsson, Lisbeth

    2012-01-01

    Background Feruloyl esterases (FAEs) are important biomass-degrading accessory enzymes due to their capability of cleaving the ester links that join hemicellulose and pectin to the aromatic compounds of lignin, thus enhancing the accessibility of plant tissues to cellulolytic and hemicellulolytic enzymes. FAEs have gained increased attention in the area of biocatalytic transformations for the synthesis of value-added compounds with medicinal and nutritional applications. Following the increasing attention on these enzymes, a novel descriptor-based classification system has been proposed for FAEs, resulting in 12 distinct families, and pharmacophore models for three FAE sub-families have been developed. Methodology/Principal Findings The feruloylome of Aspergillus oryzae contains 13 predicted FAEs belonging to six sub-families based on our recently developed descriptor-based classification system. The three-dimensional structures of the 13 FAEs were modeled for structural analysis of the feruloylome. The three genes coding for three enzymes, viz., A.O.2, A.O.8 and A.O.10 from the feruloylome of A. oryzae, representing sub-families with unknown functional features, were heterologously expressed in Pichia pastoris, characterized for substrate specificity, and structurally characterized through CD spectroscopy. Common feature-based pharmacophore models were developed according to the substrate specificity characteristics of the three enzymes. The active site residues were identified for the three expressed FAEs by determining the titration curves of amino acid residues as a function of pH by applying molecular simulations. Conclusions/Significance Our findings on the structure-function relationships and substrate specificity of the FAEs of A. oryzae will be instrumental for further understanding of the FAE families in the novel classification system. The developed pharmacophore models could be applied for virtual screening of compound databases to shortlist putative substrates prior to docking studies, or for post-processing docking results to remove false positives. Our study exemplifies how computational predictions can complement the information obtained through experimental methods. PMID:22745763

  12. Common and distant structural characteristics of feruloyl esterase families from Aspergillus oryzae.

    PubMed

    Udatha, D B R K Gupta; Mapelli, Valeria; Panagiotou, Gianni; Olsson, Lisbeth

    2012-01-01

    Feruloyl esterases (FAEs) are important biomass-degrading accessory enzymes due to their capability of cleaving the ester links that join hemicellulose and pectin to the aromatic compounds of lignin, thus enhancing the accessibility of plant tissues to cellulolytic and hemicellulolytic enzymes. FAEs have gained increased attention in the area of biocatalytic transformations for the synthesis of value-added compounds with medicinal and nutritional applications. Following the increasing attention on these enzymes, a novel descriptor-based classification system has been proposed for FAEs, resulting in 12 distinct families, and pharmacophore models for three FAE sub-families have been developed. The feruloylome of Aspergillus oryzae contains 13 predicted FAEs belonging to six sub-families based on our recently developed descriptor-based classification system. The three-dimensional structures of the 13 FAEs were modeled for structural analysis of the feruloylome. The three genes coding for three enzymes, viz., A.O.2, A.O.8 and A.O.10 from the feruloylome of A. oryzae, representing sub-families with unknown functional features, were heterologously expressed in Pichia pastoris, characterized for substrate specificity, and structurally characterized through CD spectroscopy. Common feature-based pharmacophore models were developed according to the substrate specificity characteristics of the three enzymes. The active site residues were identified for the three expressed FAEs by determining the titration curves of amino acid residues as a function of pH by applying molecular simulations. Our findings on the structure-function relationships and substrate specificity of the FAEs of A. oryzae will be instrumental for further understanding of the FAE families in the novel classification system. The developed pharmacophore models could be applied for virtual screening of compound databases to shortlist putative substrates prior to docking studies, or for post-processing docking results to remove false positives. Our study exemplifies how computational predictions can complement the information obtained through experimental methods.

  13. SU-E-J-107: Supervised Learning Model of Aligned Collagen for Human Breast Carcinoma Prognosis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bredfeldt, J; Liu, Y; Conklin, M

    Purpose: Our goal is to develop and apply a set of optical and computational tools to enable large-scale investigations of the interaction between collagen and tumor cells. Methods: We have built a novel imaging system for automating the capture of whole-slide second harmonic generation (SHG) images of collagen in registry with bright field (BF) images of hematoxylin and eosin stained tissue. To analyze our images, we have integrated a suite of supervised learning tools that semi-automatically model and score collagen interactions with tumor cells via a variety of metrics, a method we call Electronic Tumor Associated Collagen Signatures (eTACS). This group of tools first segments regions of epithelial cells and collagen fibers from BF and SHG images, respectively. We then associate fibers with groups of epithelial cells and finally compute features based on the angle of interaction and the density of the collagen surrounding the epithelial cell clusters. These features are then processed with a support vector machine to separate cancer patients into high- and low-risk groups. Results: We validated our model by showing that eTACS produces classifications that have statistically significant correlation with manual classifications. In addition, our system generated classification scores that accurately predicted breast cancer patient survival in a cohort of 196 patients. Feature rank analysis revealed that TACS-positive fibers are better aligned with each other, generally of lower density, and terminate within or near groups of epithelial cells. Conclusion: We are working to apply our model to predict survival in larger cohorts of breast cancer patients with a diversity of breast cancer types, to predict response to treatments such as COX2 inhibitors, and to study collagen architecture changes in other cancer types. In the future, our system may be used to provide metastatic potential information to cancer patients to augment existing clinical assays.

  14. Learning to Predict Combinatorial Structures

    NASA Astrophysics Data System (ADS)

    Vembu, Shankar

    2009-12-01

    The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.

  15. Authentication of Trappist beers by LC-MS fingerprints and multivariate data analysis.

    PubMed

    Mattarucchi, Elia; Stocchero, Matteo; Moreno-Rojas, José Manuel; Giordano, Giuseppe; Reniero, Fabiano; Guillou, Claude

    2010-12-08

    The aim of this study was to assess the applicability of LC-MS profiling to authenticate a selected Trappist beer as part of a program on traceability funded by the European Commission. A total of 232 beers were fingerprinted and classified through multivariate data analysis. The selected beer was clearly distinguished from beers of different brands, while only 3 samples (3.5% of the test set) were wrongly classified when compared with other types of beer of the same Trappist brewery. The fingerprints were further analyzed to extract the most discriminating variables, which proved to be sufficient for classification, even using a simplified unsupervised model. This reduced fingerprint allowed us to study the influence of batch-to-batch variability on the classification model. Our results can easily be applied to different matrices, and they confirmed the effectiveness of LC-MS profiling in combination with multivariate data analysis for the characterization of food products.
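    The fingerprint-then-reduce workflow above has a standard multivariate shape: each sample becomes a vector of LC-MS intensities, the most discriminating variables are selected, and a classifier separates the target brand. The sketch below uses ANOVA-based feature selection and logistic regression on synthetic stand-in data; the real study's chemometric method and variable counts are not specified in the abstract, so these choices are assumptions.

```python
# Sketch of the fingerprint workflow: 232 "beers" x 500 "m/z
# features"; the most discriminating variables are selected and a
# simple classifier separates the target brand. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=232, n_features=500,
                           n_informative=10, random_state=2)
model = make_pipeline(SelectKBest(f_classif, k=20),
                      LogisticRegression(max_iter=1000))
acc = cross_val_score(model, X, y, cv=5).mean()
print("cross-validated accuracy on reduced fingerprint: %.3f" % acc)
```

Putting the selector inside the pipeline matters: selecting variables before cross-validation would leak test information into the "reduced fingerprint" and inflate the accuracy.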

  16. An online BCI game based on the decoding of users' attention to color stimulus.

    PubMed

    Yang, Lingling; Leung, Howard

    2013-01-01

    Studies have shown that statistically there are differences in theta, alpha, and beta band powers when people look at blue and red colors. In this paper, a game has been developed to test whether these statistical differences are good enough for online Brain-Computer Interface (BCI) applications. We implemented a two-choice BCI game in which the subject makes a choice by looking at a color option and our system decodes the subject's intention by analyzing the EEG signal. In our system, band power features of the EEG data were used to train a support vector machine (SVM) classification model. An online mechanism was adopted to update the classification model during the training stage to account for individual differences. Our results showed that an accuracy of 70%-80% could be achieved, providing evidence for the possibility of applying color stimuli to BCI applications.
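    The decoding step above (band power features feeding an SVM) can be sketched directly: Welch power spectra are summed over the theta, alpha, and beta bands and the resulting three-dimensional feature vectors train a linear SVM. The synthetic epochs, sampling rate, and band limits below are illustrative assumptions; the real system's EEG montage and online update rule are not reproduced.

```python
# Sketch of the game's decoding step: theta/alpha/beta band powers
# from an EEG epoch feed an SVM. Signals here are synthetic; class 0
# epochs are given stronger alpha than class 1 epochs.
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC

fs = 250  # Hz (assumed)
rng = np.random.default_rng(3)

def band_powers(epoch):
    f, pxx = welch(epoch, fs=fs, nperseg=fs)
    bands = [(4, 8), (8, 13), (13, 30)]  # theta, alpha, beta
    return [pxx[(f >= lo) & (f < hi)].sum() for lo, hi in bands]

def synth_epoch(alpha_gain):
    t = np.arange(fs * 2) / fs  # 2-second epoch
    return alpha_gain * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)

X = [band_powers(synth_epoch(g)) for g in [2.0] * 20 + [0.5] * 20]
y = [0] * 20 + [1] * 20  # 0 = "blue", 1 = "red" (labels assumed)
clf = SVC(kernel="linear").fit(X, y)
print("training accuracy: %.2f" % clf.score(X, y))
```

An online variant would refit (or incrementally update) `clf` as labeled epochs arrive during the training stage, which is the adaptation mechanism the abstract describes.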

  17. Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model

    PubMed Central

    Guan, Li; Hao, Bibo; Cheng, Qijin; Yip, Paul SF

    2015-01-01

    Background Traditional offline assessment of suicide probability is time consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media has an advantage in its efficiency and potential to reach out to hidden individuals, yet little research has been focused on this specific field. Objective The objective of this study was to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying high-suicide-probability microblog users in China through profile and linguistic features extracted from Internet-based data. Methods Nine hundred and nine Chinese microblog users completed an Internet survey; those scoring one SD above the mean of the total Suicide Probability Scale (SPS) score, as well as one SD above the mean in each of the four subscale scores in the participant sample, were labeled as high-risk individuals, respectively. Profile and linguistic features were fed into two machine learning algorithms (SLR and RF) to train models that aim to identify high-risk individuals in general suicide probability and in its four dimensions. Models were trained and then tested by 5-fold cross-validation, in which both the training set and the test set were generated under the stratified random sampling rule from the whole sample. Three classic performance metrics (Precision, Recall, F1 measure) and a specifically defined metric, “Screening Efficiency”, were adopted to evaluate model effectiveness. Results Classification performance was generally matched between SLR and RF. Given the best performance of the classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. Precision of the models was generally below 30%. Conclusions Individuals in China with high suicide probability are recognizable by profile and text-based information from microblogs. Although there is still much room to improve the performance of the classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side-by-side with expert scrutiny to increase efficiency in large-scale surveillance of suicide probability from online social media. PMID:26543921

  18. Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model.

    PubMed

    Guan, Li; Hao, Bibo; Cheng, Qijin; Yip, Paul Sf; Zhu, Tingshao

    2015-01-01

    Traditional offline assessment of suicide probability is time consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media has an advantage in its efficiency and potential to reach out to hidden individuals, yet little research has been focused on this specific field. The objective of this study was to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying high-suicide-probability microblog users in China through profile and linguistic features extracted from Internet-based data. Nine hundred and nine Chinese microblog users completed an Internet survey; those scoring one SD above the mean of the total Suicide Probability Scale (SPS) score, as well as one SD above the mean in each of the four subscale scores in the participant sample, were labeled as high-risk individuals, respectively. Profile and linguistic features were fed into two machine learning algorithms (SLR and RF) to train models that aim to identify high-risk individuals in general suicide probability and in its four dimensions. Models were trained and then tested by 5-fold cross-validation, in which both the training set and the test set were generated under the stratified random sampling rule from the whole sample. Three classic performance metrics (Precision, Recall, F1 measure) and a specifically defined metric, "Screening Efficiency", were adopted to evaluate model effectiveness. Classification performance was generally matched between SLR and RF. Given the best performance of the classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. Precision of the models was generally below 30%. Individuals in China with high suicide probability are recognizable by profile and text-based information from microblogs. Although there is still much room to improve the performance of the classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side-by-side with expert scrutiny to increase efficiency in large-scale surveillance of suicide probability from online social media.
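    The evaluation protocol in the record above (stratified 5-fold cross-validation reporting precision, recall, and F1 on an imbalanced "high-risk" class) can be sketched as follows. The profile and linguistic features are replaced by synthetic data, and the class balance is an assumption loosely matching the one-SD-above-the-mean labeling rule.

```python
# Sketch of the evaluation protocol: a Random Forest assessed by
# stratified 5-fold cross-validation with precision, recall, and F1.
# Synthetic features; ~16% minority class stands in for "high-risk".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=909, n_features=30,
                           n_informative=5, weights=[0.84],
                           random_state=4)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=4)
precs, recs, f1s = [], [], []
for tr, te in skf.split(X, y):
    clf = RandomForestClassifier(n_estimators=100, random_state=4)
    clf.fit(X[tr], y[tr])
    p, r, f, _ = precision_recall_fscore_support(
        y[te], clf.predict(X[te]), average="binary", zero_division=0)
    precs.append(p); recs.append(r); f1s.append(f)
print("precision=%.2f recall=%.2f F1=%.2f"
      % (np.mean(precs), np.mean(recs), np.mean(f1s)))
```

Stratification keeps the rare high-risk class proportionally represented in every fold, which is why the study's metrics are comparable across folds despite the imbalance.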

  19. Identifying the optimal segmentors for mass classification in mammograms

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Tomuro, Noriko; Furst, Jacob; Raicu, Daniela S.

    2015-03-01

    In this paper, we present the results of our investigation on identifying the optimal segmentor(s) from an ensemble of weak segmentors, used in a Computer-Aided Diagnosis (CADx) system which classifies suspicious masses in mammograms as benign or malignant. This is an extension of our previous work, where we applied various parameter settings of image enhancement techniques to each suspicious mass (region of interest (ROI)) to obtain several enhanced images, then applied segmentation to each image to obtain several contours of a given mass. Each segmentation in this ensemble is essentially a "weak segmentor" because no single segmentation can produce the optimal result for all images. After shape features were computed from the segmented contours, the final classification model was built using logistic regression. The work in this paper focuses on identifying the optimal segmentor(s) from the ensemble mix. For our purpose, optimal segmentors are those in the ensemble mix which contribute the most to the overall classification, rather than those that produced high-precision segmentations. To measure the segmentors' contribution, we examined the weights on the features in the derived logistic regression model and computed the average feature weight for each segmentor. The results showed that, while in general the segmentors with higher segmentation success rates had higher feature weights, some segmentors with lower segmentation rates had high classification feature weights as well.
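    The weight-based ranking described above reduces to a simple computation: fit a logistic regression on features pooled from all segmentors, then average the absolute coefficients within each segmentor's feature group. The grouping, counts, and data below are hypothetical; the paper's actual shape features are not reproduced.

```python
# Sketch of the feature-weight analysis: mean absolute logistic
# regression coefficient per segmentor ranks each segmentor's
# contribution to classification. Feature grouping is hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

n_segmentors, feats_per_seg = 4, 5  # assumed ensemble layout
X, y = make_classification(n_samples=300,
                           n_features=n_segmentors * feats_per_seg,
                           random_state=5)
clf = LogisticRegression(max_iter=2000).fit(X, y)

# Columns [0:5] belong to segmentor 0, [5:10] to segmentor 1, etc.
weights = np.abs(clf.coef_[0]).reshape(n_segmentors, feats_per_seg)
avg_weight = weights.mean(axis=1)
best = int(np.argmax(avg_weight))
print("average |weight| per segmentor:", np.round(avg_weight, 3))
print("optimal segmentor by this measure: #%d" % best)
```

Note this measures contribution to the classifier, not segmentation precision, which is exactly the distinction the paper draws: a segmentor with mediocre contours can still supply highly discriminative shape features.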

  20. Geometry-based ensembles: toward a structural characterization of the classification boundary.

    PubMed

    Pujol, Oriol; Masip, David

    2009-06-01

    This paper introduces a novel binary discriminative learning technique based on the approximation of the nonlinear decision boundary by a piecewise linear smooth additive model. The decision border is geometrically defined by means of the characterizing boundary points-points that belong to the optimal boundary under a certain notion of robustness. Based on these points, a set of locally robust linear classifiers is defined and assembled by means of a Tikhonov regularized optimization procedure in an additive model to create a final lambda-smooth decision rule. As a result, a very simple and robust classifier with a strong geometrical meaning and nonlinear behavior is obtained. The simplicity of the method allows its extension to cope with some of today's machine learning challenges, such as online learning, large-scale learning or parallelization, with linear computational complexity. We validate our approach on the UCI database, comparing with several state-of-the-art classification techniques. Finally, we apply our technique in online and large-scale scenarios and in six real-life computer vision and pattern recognition problems: gender recognition based on face images, intravascular ultrasound tissue classification, speed traffic sign detection, Chagas' disease myocardial damage severity detection, old musical scores clef classification, and action recognition using 3D accelerometer data from a wearable device. The results are promising and this paper opens a line of research that deserves further attention.
