Community detection in complex networks using proximate support vector clustering
NASA Astrophysics Data System (ADS)
Wang, Feifan; Zhang, Baihai; Chai, Senchun; Xia, Yuanqing
2018-03-01
Community structure, one of the most attention attracting properties in complex networks, has been a cornerstone in advances of various scientific branches. A number of tools have been involved in recent studies concentrating on the community detection algorithms. In this paper, we propose a support vector clustering method based on a proximity graph, owing to which the introduced algorithm surpasses the traditional support vector approach both in accuracy and complexity. Results of extensive experiments undertaken on computer generated networks and real world data sets illustrate competent performances in comparison with the other counterparts.
Phylogeny of the Genus Flavivirus
Kuno, Goro; Chang, Gwong-Jen J.; Tsuchiya, K. Richard; Karabatsos, Nick; Cropp, C. Bruce
1998-01-01
We undertook a comprehensive phylogenetic study to establish the genetic relationship among the viruses of the genus Flavivirus and to compare the classification based on molecular phylogeny with the existing serologic method. By using a combination of quantitative definitions (bootstrap support level and the pairwise nucleotide sequence identity), the viruses could be classified into clusters, clades, and species. Our phylogenetic study revealed for the first time that from the putative ancestor two branches, non-vector and vector-borne virus clusters, evolved and from the latter cluster emerged tick-borne and mosquito-borne virus clusters. Provided that the theory of arthropod association being an acquired trait was correct, pairwise nucleotide sequence identity among these three clusters provided supporting data for a possibility that the non-vector cluster evolved first, followed by the separation of tick-borne and mosquito-borne virus clusters in that order. Clades established in our study correlated significantly with existing antigenic complexes. We also resolved many of the past taxonomic problems by establishing phylogenetic relationships of the antigenically unclassified viruses with the well-established viruses and by identifying synonymous viruses. PMID:9420202
Phylogeny of the genus Flavivirus.
Kuno, G; Chang, G J; Tsuchiya, K R; Karabatsos, N; Cropp, C B
1998-01-01
We undertook a comprehensive phylogenetic study to establish the genetic relationship among the viruses of the genus Flavivirus and to compare the classification based on molecular phylogeny with the existing serologic method. By using a combination of quantitative definitions (bootstrap support level and the pairwise nucleotide sequence identity), the viruses could be classified into clusters, clades, and species. Our phylogenetic study revealed for the first time that from the putative ancestor two branches, non-vector and vector-borne virus clusters, evolved and from the latter cluster emerged tick-borne and mosquito-borne virus clusters. Provided that the theory of arthropod association being an acquired trait was correct, pairwise nucleotide sequence identity among these three clusters provided supporting data for a possibility that the non-vector cluster evolved first, followed by the separation of tick-borne and mosquito-borne virus clusters in that order. Clades established in our study correlated significantly with existing antigenic complexes. We also resolved many of the past taxonomic problems by establishing phylogenetic relationships of the antigenically unclassified viruses with the well-established viruses and by identifying synonymous viruses.
Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach
Kudisthalert, Wasu
2018-01-01
Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network in a conjunction of 16 different similarity coefficients as activation function in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets–Maximum Unbiased Validation Dataset–which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6. PMID:29652912
Automated Creation of Labeled Pointcloud Datasets in Support of Machine-Learning Based Perception
2017-12-01
computationally intensive 3D vector math and took more than ten seconds to segment a single LIDAR frame from the HDL-32e with the Dell XPS15 9650’s Intel...Core i7 CPU. Depth Clustering avoids the computationally intensive 3D vector math of Euclidean Clustering-based DON segmentation and, instead
Hybrid approach of selecting hyperparameters of support vector machine for regression.
Jeng, Jin-Tsong
2006-06-01
To select the hyperparameters of the support vector machine for regression (SVR), a hybrid approach is proposed to determine the kernel parameter of the Gaussian kernel function and the epsilon value of Vapnik's epsilon-insensitive loss function. The proposed hybrid approach includes a competitive agglomeration (CA) clustering algorithm and a repeated SVR (RSVR) approach. Since the CA clustering algorithm is used to find the nearly "optimal" number of clusters and the centers of clusters in the clustering process, the CA clustering algorithm is applied to select the Gaussian kernel parameter. Additionally, an RSVR approach that relies on the standard deviation of a training error is proposed to obtain an epsilon in the loss function. Finally, two functions, one real data set (i.e., a time series of quarterly unemployment rate for West Germany) and an identification of nonlinear plant are used to verify the usefulness of the hybrid approach.
Kamarudin, Nur Diyana; Ooi, Chia Yee; Kawanabe, Tadaaki; Odaguchi, Hiroshi; Kobayashi, Fuminori
2017-01-01
In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye's ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue's multicolour classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k -means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k -means clustering is used to cluster a tongue image into four clusters of image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds.
Ooi, Chia Yee; Kawanabe, Tadaaki; Odaguchi, Hiroshi; Kobayashi, Fuminori
2017-01-01
In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye's ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue's multicolour classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k-means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k-means clustering is used to cluster a tongue image into four clusters of image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds. PMID:29065640
Guo, Lei; Abbosh, Amin
2018-05-01
For any chance for stroke patients to survive, the stroke type should be classified to enable giving medication within a few hours of the onset of symptoms. In this paper, a microwave-based stroke localization and classification framework is proposed. It is based on microwave tomography, k-means clustering, and a support vector machine (SVM) method. The dielectric profile of the brain is first calculated using the Born iterative method, whereas the amplitude of the dielectric profile is then taken as the input to k-means clustering. The cluster is selected as the feature vector for constructing and testing the SVM. A database of MRI-derived realistic head phantoms at different signal-to-noise ratios is used in the classification procedure. The performance of the proposed framework is evaluated using the receiver operating characteristic (ROC) curve. The results based on a two-dimensional framework show that 88% classification accuracy, with a sensitivity of 91% and a specificity of 87%, can be achieved. Bioelectromagnetics. 39:312-324, 2018. © 2018 Wiley Periodicals, Inc. © 2018 Wiley Periodicals, Inc.
Support Vector Data Descriptions and k-Means Clustering: One Class?
Gornitz, Nico; Lima, Luiz Alberto; Muller, Klaus-Robert; Kloft, Marius; Nakajima, Shinichi
2017-09-27
We present ClusterSVDD, a methodology that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation. This allows both methods to benefit from one another, i.e., by adding flexibility using multiple spheres for SVDDs and increasing anomaly resistance and flexibility through kernels to k-means. In particular, our approach leads to a new interpretation of k-means as a regularized mode seeking algorithm. The unifying formulation further allows for deriving new algorithms by transferring knowledge from one-class learning settings to clustering settings and vice versa. As a showcase, we derive a clustering method for structured data based on a one-class learning scenario. Additionally, our formulation can be solved via a particularly simple optimization scheme. We evaluate our approach empirically to highlight some of the proposed benefits on artificially generated data, as well as on real-world problems, and provide a Python software package comprising various implementations of primal and dual SVDD as well as our proposed ClusterSVDD.
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine is a powerful algorithm, useful in classifying data in to species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SMV as a method for classification. From trial to trial, SVM produces consistent results
Yin, Zhong; Zhang, Jianhua
2014-07-01
Identifying the abnormal changes of mental workload (MWL) over time is quite crucial for preventing the accidents due to cognitive overload and inattention of human operators in safety-critical human-machine systems. It is known that various neuroimaging technologies can be used to identify the MWL variations. In order to classify MWL into a few discrete levels using representative MWL indicators and small-sized training samples, a novel EEG-based approach by combining locally linear embedding (LLE), support vector clustering (SVC) and support vector data description (SVDD) techniques is proposed and evaluated by using the experimentally measured data. The MWL indicators from different cortical regions are first elicited by using the LLE technique. Then, the SVC approach is used to find the clusters of these MWL indicators and thereby to detect MWL variations. It is shown that the clusters can be interpreted as the binary class MWL. Furthermore, a trained binary SVDD classifier is shown to be capable of detecting slight variations of those indicators. By combining the two schemes, a SVC-SVDD framework is proposed, where the clear-cut (smaller) cluster is detected by SVC first and then a subsequent SVDD model is utilized to divide the overlapped (larger) cluster into two classes. Finally, three-class MWL levels (low, normal and high) can be identified automatically. The experimental data analysis results are compared with those of several existing methods. It has been demonstrated that the proposed framework can lead to acceptable computational accuracy and has the advantages of both unsupervised and supervised training strategies. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Lu, Chi-Jie; Chang, Chi-Chang
2014-01-01
Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to K-means algorithm for clustering the sales data into several disjoined clusters. Finally, the SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting.
2014-01-01
Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to K-means algorithm for clustering the sales data into several disjoined clusters. Finally, the SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting. PMID:25045738
Analysis of ground-motion simulation big data
NASA Astrophysics Data System (ADS)
Maeda, T.; Fujiwara, H.
2016-12-01
We developed a parallel distributed processing system which applies a big data analysis to the large-scale ground motion simulation data. The system uses ground-motion index values and earthquake scenario parameters as input. We used peak ground velocity value and velocity response spectra as the ground-motion index. The ground-motion index values are calculated from our simulation data. We used simulated long-period ground motion waveforms at about 80,000 meshes calculated by a three dimensional finite difference method based on 369 earthquake scenarios of a great earthquake in the Nankai Trough. These scenarios were constructed by considering the uncertainty of source model parameters such as source area, rupture starting point, asperity location, rupture velocity, fmax and slip function. We used these parameters as the earthquake scenario parameter. The system firstly carries out the clustering of the earthquake scenario in each mesh by the k-means method. The number of clusters is determined in advance using a hierarchical clustering by the Ward's method. The scenario clustering results are converted to the 1-D feature vector. The dimension of the feature vector is the number of scenario combination. If two scenarios belong to the same cluster the component of the feature vector is 1, and otherwise the component is 0. The feature vector shows a `response' of mesh to the assumed earthquake scenario group. Next, the system performs the clustering of the mesh by k-means method using the feature vector of each mesh previously obtained. Here the number of clusters is arbitrarily given. The clustering of scenarios and meshes are performed by parallel distributed processing with Hadoop and Spark, respectively. In this study, we divided the meshes into 20 clusters. The meshes in each cluster are geometrically concentrated. Thus this system can extract regions, in which the meshes have similar `response', as clusters. For each cluster, it is possible to determine particular scenario parameters which characterize the cluster. In other word, by utilizing this system, we can obtain critical scenario parameters of the ground-motion simulation for each evaluation point objectively. This research was supported by CREST, JST.
NASA Astrophysics Data System (ADS)
Su, Lihong
In remote sensing communities, support vector machine (SVM) learning has recently received increasing attention. SVM learning usually requires large memory and enormous amounts of computation time on large training sets. According to SVM algorithms, the SVM classification decision function is fully determined by support vectors, which compose a subset of the training sets. In this regard, a solution to optimize SVM learning is to efficiently reduce training sets. In this paper, a data reduction method based on agglomerative hierarchical clustering is proposed to obtain smaller training sets for SVM learning. Using a multiple angle remote sensing dataset of a semi-arid region, the effectiveness of the proposed method is evaluated by classification experiments with a series of reduced training sets. The experiments show that there is no loss of SVM accuracy when the original training set is reduced to 34% using the proposed approach. Maximum likelihood classification (MLC) also is applied on the reduced training sets. The results show that MLC can also maintain the classification accuracy. This implies that the most informative data instances can be retained by this approach.
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms
Support Vector Machine-Based Endmember Extraction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Filippi, Anthony M; Archibald, Richard K
Introduced in this paper is the utilization of Support Vector Machines (SVMs) to automatically perform endmember extraction from hyperspectral data. The strengths of SVM are exploited to provide a fast and accurate calculated representation of high-dimensional data sets that may consist of multiple distributions. Once this representation is computed, the number of distributions can be determined without prior knowledge. For each distribution, an optimal transform can be determined that preserves informational content while reducing the data dimensionality, and hence, the computational cost. Finally, endmember extraction for the whole data set is accomplished. Results indicate that this Support Vector Machine-Based Endmembermore » Extraction (SVM-BEE) algorithm has the capability of autonomously determining endmembers from multiple clusters with computational speed and accuracy, while maintaining a robust tolerance to noise.« less
Gidwani, Kamlesh; Picado, Albert; Rijal, Suman; Singh, Shri Prakash; Roy, Lalita; Volfova, Vera; Andersen, Elisabeth Wreford; Uranw, Surendra; Ostyn, Bart; Sudarshan, Medhavi; Chakravarty, Jaya; Volf, Petr; Sundar, Shyam; Boelaert, Marleen; Rogers, Matthew Edward
2011-01-01
Background Visceral leishmaniasis is the world' second largest vector-borne parasitic killer and a neglected tropical disease, prevalent in poor communities. Long-lasting insecticidal nets (LNs) are a low cost proven vector intervention method for malaria control; however, their effectiveness against visceral leishmaniasis (VL) is unknown. This study quantified the effect of LNs on exposure to the sand fly vector of VL in India and Nepal during a two year community intervention trial. Methods As part of a paired-cluster randomized controlled clinical trial in VL-endemic regions of India and Nepal we tested the effect of LNs on sand fly biting by measuring the antibody response of subjects to the saliva of Leishmania donovani vector Phlebotomus argentipes and the sympatric (non-vector) Phlebotomus papatasi. Fifteen to 20 individuals above 15 years of age from 26 VL endemic clusters were asked to provide a blood sample at baseline, 12 and 24 months post-intervention. Results A total of 305 individuals were included in the study, 68 participants provided two blood samples and 237 gave three samples. A random effect linear regression model showed that cluster-wide distribution of LNs reduced exposure to P. argentipes by 12% at 12 months (effect 0.88; 95% CI 0.83–0.94) and 9% at 24 months (effect 0.91; 95% CI 0.80–1.02) in the intervention group compared to control adjusting for baseline values and pair. Similar results were obtained for P. papatasi. Conclusions This trial provides evidence that LNs have a limited effect on sand fly exposure in VL endemic communities in India and Nepal and supports the use of sand fly saliva antibodies as a marker to evaluate vector control interventions. PMID:21931871
Gidwani, Kamlesh; Picado, Albert; Rijal, Suman; Singh, Shri Prakash; Roy, Lalita; Volfova, Vera; Andersen, Elisabeth Wreford; Uranw, Surendra; Ostyn, Bart; Sudarshan, Medhavi; Chakravarty, Jaya; Volf, Petr; Sundar, Shyam; Boelaert, Marleen; Rogers, Matthew Edward
2011-09-01
Visceral leishmaniasis is the world' second largest vector-borne parasitic killer and a neglected tropical disease, prevalent in poor communities. Long-lasting insecticidal nets (LNs) are a low cost proven vector intervention method for malaria control; however, their effectiveness against visceral leishmaniasis (VL) is unknown. This study quantified the effect of LNs on exposure to the sand fly vector of VL in India and Nepal during a two year community intervention trial. As part of a paired-cluster randomized controlled clinical trial in VL-endemic regions of India and Nepal we tested the effect of LNs on sand fly biting by measuring the antibody response of subjects to the saliva of Leishmania donovani vector Phlebotomus argentipes and the sympatric (non-vector) Phlebotomus papatasi. Fifteen to 20 individuals above 15 years of age from 26 VL endemic clusters were asked to provide a blood sample at baseline, 12 and 24 months post-intervention. A total of 305 individuals were included in the study, 68 participants provided two blood samples and 237 gave three samples. A random effect linear regression model showed that cluster-wide distribution of LNs reduced exposure to P. argentipes by 12% at 12 months (effect 0.88; 95% CI 0.83-0.94) and 9% at 24 months (effect 0.91; 95% CI 0.80-1.02) in the intervention group compared to control adjusting for baseline values and pair. Similar results were obtained for P. papatasi. This trial provides evidence that LNs have a limited effect on sand fly exposure in VL endemic communities in India and Nepal and supports the use of sand fly saliva antibodies as a marker to evaluate vector control interventions.
Microsatellites Reveal a High Population Structure in Triatoma infestans from Chuquisaca, Bolivia
Pizarro, Juan Carlos; Gilligan, Lauren M.; Stevens, Lori
2008-01-01
Background For Chagas disease, the most serious infectious disease in the Americas, effective disease control depends on elimination of vectors through spraying with insecticides. Molecular genetic research can help vector control programs by identifying and characterizing vector populations and then developing effective intervention strategies. Methods and Findings The population genetic structure of Triatoma infestans (Hemiptera: Reduviidae), the main vector of Chagas disease in Bolivia, was investigated using a hierarchical sampling strategy. A total of 230 adults and nymphs from 23 localities throughout the department of Chuquisaca in Southern Bolivia were analyzed at ten microsatellite loci. Population structure, estimated using analysis of molecular variance (AMOVA) to estimate FST (infinite alleles model) and RST (stepwise mutation model), was significant between western and eastern regions within Chuquisaca and between insects collected in domestic and peri-domestic habitats. Genetic differentiation at three different hierarchical geographic levels was significant, even in the case of adjacent households within a single locality (R ST = 0.14, F ST = 0.07). On the largest geographic scale, among five communities up to 100 km apart, R ST = 0.12 and F ST = 0.06. Cluster analysis combined with assignment tests identified five clusters within the five communities. Conclusions Some houses are colonized by insects from several genetic clusters after spraying, whereas other households are colonized predominately by insects from a single cluster. Significant population structure, measured by both R ST and F ST, supports the hypothesis of poor dispersal ability and/or reduced migration of T. infestans. The high degree of genetic structure at small geographic scales, inferences from cluster analysis and assignment tests, and demographic data suggest reinfesting vectors are coming from nearby and from recrudescence (hatching of eggs that were laid before insecticide spraying). Suggestions for using these results in vector control strategies are made. PMID:18365033
fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.
Hung, Ling-Hong; Samudrala, Ram
2014-06-15
fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster) © The Author 2014. Published by Oxford University Press.
Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun
2017-12-01
Identifying biologically meaningful gene expression patterns from time series gene expression data is important to understand the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of time and sample dimensions. Thus, the analysis of such time series data seeks to search gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting the three-dimensional data, i.e. gene-time-condition. Computational complexity for analyzing such data is very high, compared to the already difficult NP-hard two dimensional biclustering algorithms. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression pattern in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools and only TimesVector detected clusters with differential expression patterns across conditions successfully. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. sunkim.bioinfo@snu.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
NASA Astrophysics Data System (ADS)
Liu, Jianjun; Kan, Jianquan
2018-04-01
In this paper, based on the terahertz spectrum, a new identification method of genetically modified material by support vector machine (SVM) based on affinity propagation clustering is proposed. This algorithm mainly uses affinity propagation clustering algorithm to make cluster analysis and labeling on unlabeled training samples, and in the iterative process, the existing SVM training data are continuously updated, when establishing the identification model, it does not need to manually label the training samples, thus, the error caused by the human labeled samples is reduced, and the identification accuracy of the model is greatly improved.
Support vector machine multiuser receiver for DS-CDMA signals in multipath channels.
Chen, S; Samingan, A K; Hanzo, L
2001-01-01
The problem of constructing an adaptive multiuser detector (MUD) is considered for direct sequence code division multiple access (DS-CDMA) signals transmitted through multipath channels. The emerging learning technique, called support vector machines (SVM), is proposed as a method of obtaining a nonlinear MUD from a relatively small training data block. Computer simulation is used to study this SVM MUD, and the results show that it can closely match the performance of the optimal Bayesian one-shot detector. Comparisons with an adaptive radial basis function (RBF) MUD trained by an unsupervised clustering algorithm are discussed.
de Melo, Diogo Portella Ornelas; Scherrer, Luciano Rios; Eiras, Álvaro Eduardo
2012-01-01
The use of vector surveillance tools for preventing dengue disease requires fine assessment of risk, in order to improve vector control activities. Nevertheless, the thresholds between vector detection and dengue fever occurrence are currently not well established. In Belo Horizonte (Minas Gerais, Brazil), dengue has been endemic for several years. From January 2007 to June 2008, the dengue vector Aedes (Stegomyia) aegypti was monitored by ovitrap, the sticky-trap MosquiTRAP™ and larval surveys in an study area in Belo Horizonte. Using a space-time scan for clusters detection implemented in SaTScan software, the vector presence recorded by the different monitoring methods was evaluated. Clusters of vectors and dengue fever were detected. It was verified that ovitrap and MosquiTRAP vector detection methods predicted dengue occurrence better than larval survey, both spatially and temporally. MosquiTRAP and ovitrap presented similar results of space-time intersections to dengue fever clusters. Nevertheless ovitrap clusters presented longer duration periods than MosquiTRAP ones, less acuratelly signalizing the dengue risk areas, since the detection of vector clusters during most of the study period was not necessarily correlated to dengue fever occurrence. It was verified that ovitrap clusters occurred more than 200 days (values ranged from 97.0±35.35 to 283.0±168.4 days) before dengue fever clusters, whereas MosquiTRAP clusters preceded dengue fever clusters by approximately 80 days (values ranged from 65.5±58.7 to 94.0±14. 3 days), the former showing to be more temporally precise. Thus, in the present cluster analysis study MosquiTRAP presented superior results for signaling dengue transmission risks both geographically and temporally. Since early detection is crucial for planning and deploying effective preventions, MosquiTRAP showed to be a reliable tool and this method provides groundwork for the development of even more precise tools. PMID:22848729
NASA Astrophysics Data System (ADS)
Pasquato, Mario; Chung, Chul
2016-05-01
Context. Machine-learning (ML) solves problems by learning patterns from data with limited or no human guidance. In astronomy, ML is mainly applied to large observational datasets, e.g. for morphological galaxy classification. Aims: We apply ML to gravitational N-body simulations of star clusters that are either formed by merging two progenitors or evolved in isolation, planning to later identify globular clusters (GCs) that may have a history of merging from observational data. Methods: We create mock-observations from simulated GCs, from which we measure a set of parameters (also called features in the machine-learning field). After carrying out dimensionality reduction on the feature space, the resulting datapoints are fed in to various classification algorithms. Using repeated random subsampling validation, we check whether the groups identified by the algorithms correspond to the underlying physical distinction between mergers and monolithically evolved simulations. Results: The three algorithms we considered (C5.0 trees, k-nearest neighbour, and support-vector machines) all achieve a test misclassification rate of about 10% without parameter tuning, with support-vector machines slightly outperforming the others. The first principal component of feature space correlates with cluster concentration. If we exclude it from the regression, the performance of the algorithms is only slightly reduced.
2013-01-01
Background Anopheles sinensis is a principal vector for Plasmodium vivax malaria in most parts of China. Understanding of genetic structure and genetic differentiation of the mosquito should contribute to the vector control and malaria elimination in China. Methods The present study investigated the genetic structure of An. sinensis populations using a 729 bp fragment of mtDNA ND5 among 10 populations collected from seven provinces in China. Results ND5 was polymorphic by single mutations within three groups of An. sinensis that were collected from 10 different geographic populations in China. Out of 140 specimens collected from 10 representative sites, 84 haplotypes and 71 variable positions were determined. The overall level of genetic differentiation of An. sinensis varied from low to moderate across China and with a FST range of 0.00065 – 0.341. Genealogy analysis clustered the populations of An. sinensis into three main clusters. Each cluster shared one main haplotype. Pairwise variations within populations were higher (68.68%) than among populations (31.32%) and with high fixation index (FST = 0.313). The results of the present study support population growth and expansion in the An. sinensis populations from China. Three clusters of An. sinensis populations were detected in this study with each displaying different proportion patterns over seven Chinese provinces. No correlation between genetic and geographic distance was detected in overall populations of An. sinensis (R2 = 0.058; P = 0.301). Conclusions The results indicate that the ND5 gene of mtDNA is highly polymorphic in An. sinensis and has moderate genetic variability in the populations of this mosquito in China. Demographic and spatial results support evidence of expansion in An. sinensis populations. PMID:24192424
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Amirul Abdullah, Muhammad; Hasnun Arif Hassan, Mohd; Khalil, Zubair
2018-04-01
The present study employs a machine learning algorithm namely support vector machine (SVM) to classify high and low potential archers from a collection of bio-physiological variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. The bio-physiological variables namely resting heart rate, resting respiratory rate, resting diastolic blood pressure, resting systolic blood pressure, as well as calories intake, were measured prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed. SVM models i.e. linear, quadratic and cubic kernel functions, were trained on the aforementioned variables. The k-means clustered the archers into high (HPA) and low potential archers (LPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy with a classification accuracy of 94% in comparison the other tested models. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected bio-physiological variables examined.
Optimization of Support Vector Machine (SVM) for Object Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin
2012-01-01
The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as a SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
Prediction of hourly PM2.5 using a space-time support vector regression model
NASA Astrophysics Data System (ADS)
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
Preclinical Evaluation of An Anti-HCV miRNA Cluster for Treatment of HCV Infection
Yang, Xiao; Marcucci, Katherine; Anguela, Xavier; Couto, Linda B.
2013-01-01
We developed a strategy to treat hepatitis C virus (HCV) infection by replacing five endogenous microRNA (miRNA) sequences of a natural miRNA cluster (miR-17–92) with sequences that are complementary to the HCV genome. This miRNA cluster (HCV-miR-Cluster 5) is delivered to cells using adeno-associated virus (AAV) vectors and the miRNAs are expressed in the liver, the site of HCV replication and assembly. AAV-HCV-miR-Cluster 5 inhibited bona fide HCV replication in vitro by up to 95% within 2 days, and the spread of HCV to uninfected cells was prevented by continuous expression of the anti-HCV miRNAs. Furthermore, the number of cells harboring HCV RNA replicons decreased dramatically by sustained expression of the anti-HCV miRNAs, suggesting that the vector is capable of curing cells of HCV. Delivery of AAV-HCV-miR-Cluster 5 to mice resulted in efficient transfer of the miRNA gene cluster and expression of all five miRNAs in liver tissue, at levels up to 1,300 copies/cell. These levels achieved up to 98% gene silencing of cognate HCV sequences, and no liver toxicity was observed, supporting the safety of this approach. Therefore, AAV-HCV-miR-Cluster 5 represents a different paradigm for the treatment of HCV infection. PMID:23295950
Modified Gravity and its test on galaxy clusters
NASA Astrophysics Data System (ADS)
Nieuwenhuizen, Theodorus M.; Morandi, Andrea; Limousin, Marceau
2018-05-01
The MOdified Gravity (MOG) theory of J. Moffat assumes a massive vector particle which causes a repulsive contribution to the tensor gravitation. For the galaxy cluster A1689 new data for the X-ray gas and the strong lensing properties are presented. Fits to MOG are possible by adjusting the galaxy density profile. However, this appears to work as an effective dark matter component, posing a serious problem for MOG. New gas and strong lensing data for the cluster A1835 support these conclusions and point at a tendency of the gas alone to overestimate the lensing effects in MOG theory.
Monitoring by Use of Clusters of Sensor-Data Vectors
NASA Technical Reports Server (NTRS)
Iverson, David L.
2007-01-01
The inductive monitoring system (IMS) is a system of computer hardware and software for automated monitoring of the performance, operational condition, physical integrity, and other aspects of the health of a complex engineering system (e.g., an industrial process line or a spacecraft). The input to the IMS consists of streams of digitized readings from sensors in the monitored system. The IMS determines the type and amount of any deviation of the monitored system from a nominal or normal ( healthy ) condition on the basis of a comparison between (1) vectors constructed from the incoming sensor data and (2) corresponding vectors in a database of nominal or normal behavior. The term inductive reflects the use of a process reminiscent of traditional mathematical induction to learn about normal operation and build the nominal-condition database. The IMS offers two major advantages over prior computational monitoring systems: The computational burden of the IMS is significantly smaller, and there is no need for abnormal-condition sensor data for training the IMS to recognize abnormal conditions. The figure schematically depicts the relationships among the computational processes effected by the IMS. Training sensor data are gathered during normal operation of the monitored system, detailed computational simulation of operation of the monitored system, or both. The training data are formed into vectors that are used to generate the database. The vectors in the database are clustered into regions that represent normal or nominal operation. Once the database has been generated, the IMS compares the vectors of incoming sensor data with vectors representative of the clusters. The monitored system is deemed to be operating normally or abnormally, depending on whether the vector of incoming sensor data is or is not, respectively, sufficiently close to one of the clusters. For this purpose, a distance between two vectors is calculated by a suitable metric (e.g., Euclidean distance) and "sufficiently close" signifies lying at a distance less than a specified threshold value. It must be emphasized that although the IMS is intended to detect off-nominal or abnormal performance or health, it is not necessarily capable of performing a thorough or detailed diagnosis. Limited diagnostic information may be available under some circumstances. For example, the distance of a vector of incoming sensor data from the nearest cluster could serve as an indication of the severity of a malfunction. The identity of the nearest cluster may be a clue as to the identity of the malfunctioning component or subsystem. It is possible to decrease the IMS computation time by use of a combination of cluster-indexing and -retrieval methods. For example, in one method, the distances between each cluster and two or more reference vectors can be used for the purpose of indexing and retrieval. The clusters are sorted into a list according to these distance values, typically in ascending order of distance. When a set of input data arrives and is to be tested, the data are first arranged as an ordered set (that is, a vector). The distances from the input vector to the reference points are computed. The search of clusters from the list can then be limited to those clusters lying within a certain distance range from the input vector; the computation time is reduced by not searching the clusters at a greater distance.
An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images.
Chin Neoh, Siew; Srisukkham, Worawut; Zhang, Li; Todryk, Stephen; Greystoke, Brigit; Peng Lim, Chee; Alamgir Hossain, Mohammed; Aslam, Nauman
2015-10-09
This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated based on the trade-off of several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjuction with Genetic Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition rates of 96.72% and 96.67% accuracies using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method.
An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images
Chin Neoh, Siew; Srisukkham, Worawut; Zhang, Li; Todryk, Stephen; Greystoke, Brigit; Peng Lim, Chee; Alamgir Hossain, Mohammed; Aslam, Nauman
2015-01-01
This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated based on the trade-off of several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjuction with Genetic Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition rates of 96.72% and 96.67% accuracies using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method. PMID:26450665
Harischandra, Iresha Nilmini; Dassanayake, Ranil Samantha; De Silva, Bambaranda Gammacharige Don Nissanka Kolitha
2016-01-04
The disease re-emergence threat from the major malaria vector in Sri Lanka, Anopheles culicifacies, is currently increasing. To predict malaria vector dynamics, knowledge of population genetics and gene flow is required, but this information is unavailable for Sri Lanka. This study was carried out to determine the population structure of An. culicifacies E in Sri Lanka. Eight microsatellite markers were used to examine An. culicifacies E collected from six sites in Sri Lanka during 2010-2012. Standard population genetic tests and analyses, genetic differentiation, Hardy-Weinberg equilibrium, linkage disequilibrium, Bayesian cluster analysis, AMOVA, SAMOVA and isolation-by-distance were conducted using five polymorphic loci. Five microsatellite loci were highly polymorphic with high allelic richness. Hardy-Weinberg Equilibrium (HWE) was significantly rejected for four loci with positive F(IS) values in the pooled population (p < 0.0100). Three loci showed high deviations in all sites except Kataragama, which was in agreement with HWE for all loci except one locus (p < 0.0016). Observed heterozygosity was less than the expected values for all sites except Kataragama, where reported negative F(IS) values indicated a heterozygosity excess. Genetic differentiation was observed for all sampling site pairs and was not supported by the isolation by distance model. Bayesian clustering analysis identified the presence of three sympatric clusters (gene pools) in the studied population. Significant genetic differentiation was detected in cluster pairs with low gene flow and isolation by distance was not detected between clusters. Furthermore, the results suggested the presence of a barrier to gene flow that divided the populations into two parts with the central hill region of Sri Lanka as the dividing line. Three sympatric clusters were detected among An. culicifacies E specimens isolated in Sri Lanka. There was no effect of geographic distance on genetic differentiation and the central mountain ranges in Sri Lanka appeared to be a barrier to gene flow.
Gimbal-Angle Vectors of the Nonredundant CMG Cluster
NASA Astrophysics Data System (ADS)
Lee, Donghun; Bang, Hyochoong
2018-05-01
This paper deals with the method using the preferred gimbal angles of a control moment gyro (CMG) cluster for controlling spacecraft attitude. To apply the method to the nonredundant CMG cluster, analytical gimbal-angle solutions for the zero angular momentum state are derived, and the gimbal-angle vectors for the nonzero angular momentum states are studied by a numerical method. It will be shown that the number of the gimbal-angle vectors is determined from the given skew angle and the angular momentum state of the CMG cluster. Through numerical examples, it is shown that the method using the preferred gimbal-angle is an efficient approach to avoid internal singularities for the nonredundant CMG cluster.
Clustering, climate and dengue transmission.
Junxiong, Pang; Yee-Sin, Leo
2015-06-01
Dengue is currently the most rapidly spreading vector-borne disease, with an increasing burden over recent decades. Currently, neither a licensed vaccine nor an effective anti-viral therapy is available, and treatment largely remains supportive. Current vector control strategies to prevent and reduce dengue transmission are neither efficient nor sustainable as long-term interventions. Increased globalization and climate change have been reported to influence dengue transmission. In this article, we reviewed the non-climatic and climatic risk factors which facilitate dengue transmission. Sustainable and effective interventions to reduce the increasing threat from dengue would require the integration of these risk factors into current and future prevention strategies, including dengue vaccination, as well as the continuous support and commitment from the political and environmental stakeholders.
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.
Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai
2016-03-01
Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
Topic detection using paragraph vectors to support active learning in systematic reviews.
Hashimoto, Kazuma; Kontonatsios, Georgios; Miwa, Makoto; Ananiadou, Sophia
2016-08-01
Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all relevant articles to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies, to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We firstly represent documents within the vector space, and cluster the documents into a predefined number of clusters. The centroids of the clusters are treated as latent topics. We then represent each document as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). Results obtained demonstrate that our method is able to achieve a high sensitivity of eligible studies and a significantly reduced manual annotation cost when compared to the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Kumar, Surendra; Ghosh, Subhojit; Tetarway, Suhash; Sinha, Rakesh Kumar
2015-07-01
In this study, the magnitude and spatial distribution of frequency spectrum in the resting electroencephalogram (EEG) were examined to address the problem of detecting alcoholism in the cerebral motor cortex. The EEG signals were recorded from chronic alcoholic conditions (n = 20) and the control group (n = 20). Data were taken from motor cortex region and divided into five sub-bands (delta, theta, alpha, beta-1 and beta-2). Three methodologies were adopted for feature extraction: (1) absolute power, (2) relative power and (3) peak power frequency. The dimension of the extracted features is reduced by linear discrimination analysis and classified by support vector machine (SVM) and fuzzy C-mean clustering. The maximum classification accuracy (88 %) with SVM clustering was achieved with the EEG spectral features with absolute power frequency on F4 channel. Among the bands, relatively higher classification accuracy was found over theta band and beta-2 band in most of the channels when computed with the EEG features of relative power. Electrodes wise CZ, C3 and P4 were having more alteration. Considering the good classification accuracy obtained by SVM with relative band power features in most of the EEG channels of motor cortex, it can be suggested that the noninvasive automated online diagnostic system for the chronic alcoholic condition can be developed with the help of EEG signals.
Che-Mendoza, Azael; Guillermo-May, Guillermo; Herrera-Bojórquez, Josué; Barrera-Pérez, Mario; Dzul-Manzanilla, Felipe; Gutierrez-Castro, Cipriano; Arredondo-Jiménez, Juan I.; Sánchez-Tejeda, Gustavo; Vazquez-Prokopec, Gonzalo; Ranson, Hilary; Lenhart, Audrey; Sommerfeld, Johannes; McCall, Philip J.; Kroeger, Axel; Manrique-Saide, Pablo
2015-01-01
Background Long-lasting insecticidal net screens (LLIS) fitted to domestic windows and doors in combination with targeted treatment (TT) of the most productive Aedes aegypti breeding sites were evaluated for their impact on dengue vector indices in a cluster-randomised trial in Mexico between 2011 and 2013. Methods Sequentially over 2 years, LLIS and TT were deployed in 10 treatment clusters (100 houses/cluster) and followed up over 24 months. Cross-sectional surveys quantified infestations of adult mosquitoes, immature stages at baseline (pre-intervention) and in four post-intervention samples at 6-monthly intervals. Identical surveys were carried out in 10 control clusters that received no treatment. Results LLIS clusters had significantly lower infestations compared to control clusters at 5 and 12 months after installation, as measured by adult (male and female) and pupal-based vector indices. After addition of TT to the intervention houses in intervention clusters, indices remained significantly lower in the treated clusters until 18 (immature and adult stage indices) and 24 months (adult indices only) post-intervention. Conclusions These safe, simple affordable vector control tools were well-accepted by study participants and are potentially suitable in many regions at risk from dengue worldwide. PMID:25604761
Caprara, Andrea; De Oliveira Lima, José Wellington; Rocha Peixoto, Ana Carolina; Vasconcelos Motta, Cyntia Monteiro; Soares Nobre, Joana Mary; Sommerfeld, Johannes; Kroeger, Axel
2015-01-01
Background This study intended to implement a novel intervention strategy, in Brazil, using an ecohealth approach and analyse its effectiveness and costs in reducing Aedes aegypti vector density as well as its acceptance, feasibility and sustainability. The intervention was conducted from 2012 to 2013 in the municipality of Fortaleza, northeast Brazil. Methodology A cluster randomized controlled trial was designed by comparing ten intervention clusters with ten control clusters where routine vector control activities were conducted. The intervention included: community workshops; community involvement in clean-up campaigns; covering the elevated containers and in-house rubbish disposal without larviciding; mobilization of schoolchildren and senior inhabitants; and distribution of information, education and communication (IEC) materials in the community. Results Differences in terms of social participation, commitment and leadership were present in the clusters. The results showed the effectiveness of the intervention package in comparison with the routine control programme. Differences regarding the costs of the intervention were reasonable and could be adopted by public health services. Conclusions Embedding social participation and environmental management for improved dengue vector control was feasible and significantly reduced vector densities. Such a participatory ecohealth approach offers a promising alternative to routine vector control measures. PMID:25604760
On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.
ERIC Educational Resources Information Center
Carter, Randy L.; And Others
1989-01-01
The partitioning of squared Euclidean--E(sup 2)--distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
Software tool for data mining and its applications
NASA Astrophysics Data System (ADS)
Yang, Jie; Ye, Chenzhou; Chen, Nianyi
2002-03-01
A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, visualization. The principle and knowledge representation of some function models of data mining are described. The software tool of data mining is realized by Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool of data mining has satisfactorily applied in the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems, and diagnosis of brain glioma.
Agent-based method for distributed clustering of textual information
Potok, Thomas E [Oak Ridge, TN; Reed, Joel W [Knoxville, TN; Elmore, Mark T [Oak Ridge, TN; Treadwell, Jim N [Louisville, TN
2010-09-28
A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
Vector dissimilarity and clustering.
Lefkovitch, L P
1991-04-01
Based on the description of objects by m attributes, an m-element vector dissimilarity function is defined that, unlike scalar functions, retains the distinction among attributes. This function, which satisfies the conditions for a metric, allows the definition of betweenness, which can then be used for clustering. Applications to the subset-generation phase of conditional clustering and to nearest-neighbor-type algorithms are described.
Marek K. Jakubowksi; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly
2013-01-01
We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m
An Algorithm for Converting Static Earth Sensor Measurements into Earth Observation Vectors
NASA Technical Reports Server (NTRS)
Harman, R.; Hashmall, Joseph A.; Sedlak, Joseph
2004-01-01
An algorithm has been developed that converts penetration angles reported by Static Earth Sensors (SESs) into Earth observation vectors. This algorithm allows compensation for variation in the horizon height including that caused by Earth oblateness. It also allows pitch and roll to be computed using any number (greater than 1) of simultaneous sensor penetration angles simplifying processing during periods of Sun and Moon interference. The algorithm computes body frame unit vectors through each SES cluster. It also computes GCI vectors from the spacecraft to the position on the Earth's limb where each cluster detects the Earth's limb. These body frame vectors are used as sensor observation vectors and the GCI vectors are used as reference vectors in an attitude solution. The attitude, with the unobservable yaw discarded, is iteratively refined to provide the Earth observation vector solution.
Che-Mendoza, Azael; Guillermo-May, Guillermo; Herrera-Bojórquez, Josué; Barrera-Pérez, Mario; Dzul-Manzanilla, Felipe; Gutierrez-Castro, Cipriano; Arredondo-Jiménez, Juan I; Sánchez-Tejeda, Gustavo; Vazquez-Prokopec, Gonzalo; Ranson, Hilary; Lenhart, Audrey; Sommerfeld, Johannes; McCall, Philip J; Kroeger, Axel; Manrique-Saide, Pablo
2015-02-01
Long-lasting insecticidal net screens (LLIS) fitted to domestic windows and doors in combination with targeted treatment (TT) of the most productive Aedes aegypti breeding sites were evaluated for their impact on dengue vector indices in a cluster-randomised trial in Mexico between 2011 and 2013. Sequentially over 2 years, LLIS and TT were deployed in 10 treatment clusters (100 houses/cluster) and followed up over 24 months. Cross-sectional surveys quantified infestations of adult mosquitoes, immature stages at baseline (pre-intervention) and in four post-intervention samples at 6-monthly intervals. Identical surveys were carried out in 10 control clusters that received no treatment. LLIS clusters had significantly lower infestations compared to control clusters at 5 and 12 months after installation, as measured by adult (male and female) and pupal-based vector indices. After addition of TT to the intervention houses in intervention clusters, indices remained significantly lower in the treated clusters until 18 (immature and adult stage indices) and 24 months (adult indices only) post-intervention. These safe, simple affordable vector control tools were well-accepted by study participants and are potentially suitable in many regions at risk from dengue worldwide. © The author 2015. The World Health Organization has granted Oxford University Press permission for the reproduction of this article.
T-wave end detection using neural networks and Support Vector Machines.
Suárez-León, Alexander Alexeis; Varon, Carolina; Willems, Rik; Van Huffel, Sabine; Vázquez-Seisdedos, Carlos Román
2018-05-01
In this paper we propose a new approach for detecting the end of the T-wave in the electrocardiogram (ECG) using Neural Networks and Support Vector Machines. Both, Multilayer Perceptron (MLP) neural networks and Fixed-Size Least-Squares Support Vector Machines (FS-LSSVM) were used as regression algorithms to determine the end of the T-wave. Different strategies for selecting the training set such as random selection, k-means, robust clustering and maximum quadratic (Rényi) entropy were evaluated. Individual parameters were tuned for each method during training and the results are given for the evaluation set. A comparison between MLP and FS-LSSVM approaches was performed. Finally, a fair comparison of the FS-LSSVM method with other state-of-the-art algorithms for detecting the end of the T-wave was included. The experimental results show that FS-LSSVM approaches are more suitable as regression algorithms than MLP neural networks. Despite the small training sets used, the FS-LSSVM methods outperformed the state-of-the-art techniques. FS-LSSVM can be successfully used as a T-wave end detection algorithm in ECG even with small training set sizes. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Iverson, David L. (Inventor)
2008-01-01
The present invention relates to an Inductive Monitoring System (IMS), its software implementations, hardware embodiments and applications. Training data is received, typically nominal system data acquired from sensors in normally operating systems or from detailed system simulations. The training data is formed into vectors that are used to generate a knowledge database having clusters of nominal operating regions therein. IMS monitors a system's performance or health by comparing cluster parameters in the knowledge database with incoming sensor data from a monitored-system formed into vectors. Nominal performance is concluded when a monitored-system vector is determined to lie within a nominal operating region cluster or lies sufficiently close to a such a cluster as determined by a threshold value and a distance metric. Some embodiments of IMS include cluster indexing and retrieval methods that increase the execution speed of IMS.
A Cluster-Randomized Trial of Insecticide-Treated Curtains for Dengue Vector Control in Thailand
Lenhart, Audrey; Trongtokit, Yuwadee; Alexander, Neal; Apiwathnasorn, Chamnarn; Satimai, Wichai; Vanlerberghe, Veerle; Van der Stuyft, Patrick; McCall, Philip J.
2013-01-01
The efficacy of insecticide-treated window curtains (ITCs) for dengue vector control was evaluated in Thailand in a cluster-randomized controlled trial. A total of 2,037 houses in 26 clusters was randomized to receive the intervention or act as control (no treatment). Entomological surveys measured Aedes infestations (Breteau index, house index, container index, and pupae per person index) and oviposition indices (mean numbers of eggs laid in oviposition traps) immediately before and after intervention, and at 3-month intervals over 12 months. There were no consistent statistically significant differences in entomological indices between intervention and control clusters, although oviposition indices were lower (P < 0.01) in ITC clusters during the wet season. It is possible that the open housing structures in the study reduced the likelihood of mosquitoes making contact with ITCs. ITCs deployed in a region where this house design is common may be unsuitable for dengue vector control. PMID:23166195
Thin-layer chromatographic identification of Chinese propolis using chemometric fingerprinting.
Tang, Tie-xin; Guo, Wei-yan; Xu, Ye; Zhang, Si-ming; Xu, Xin-jun; Wang, Dong-mei; Zhao, Zhi-min; Zhu, Long-ping; Yang, De-po
2014-01-01
Poplar tree gum has a similar chemical composition and appearance to Chinese propolis (bee glue) and has been widely used as a counterfeit propolis because Chinese propolis is typically the poplar-type propolis, the chemical composition of which is determined mainly by the resin of poplar trees. The discrimination of Chinese propolis from poplar tree gum is a challenging task. To develop a rapid thin-layer chromatographic (TLC) identification method using chemometric fingerprinting to discriminate Chinese propolis from poplar tree gum. A new TLC method using a combination of ammonia and hydrogen peroxide vapours as the visualisation reagent was developed to characterise the chemical profile of Chinese propolis. Three separate people performed TLC on eight Chinese propolis samples and three poplar tree gum samples of varying origins. Five chemometric methods, including similarity analysis, hierarchical clustering, k-means clustering, neural network and support vector machine, were compared for use in classifying the samples based on their densitograms obtained from the TLC chromatograms via image analysis. Hierarchical clustering, neural network and support vector machine analyses achieved a correct classification rate of 100% in classifying the samples. A strategy for TLC identification of Chinese propolis using chemometric fingerprinting was proposed and it provided accurate sample classification. The study has shown that the TLC identification method using chemometric fingerprinting is a rapid, low-cost method for the discrimination of Chinese propolis from poplar tree gum and may be used for the quality control of Chinese propolis. Copyright © 2014 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, A. P. P. Abdul; Razali Abdullah, Mohamad; Aizzat Zakaria, Muhammad; Muaz Alim, Muhammad; Arif Mat Jizat, Jessnor; Fauzi Ibrahim, Mohamad
2018-03-01
Support Vector Machine (SVM) has been revealed to be a powerful learning algorithm for classification and prediction. However, the use of SVM for prediction and classification in sport is at its inception. The present study classified and predicted high and low potential archers from a collection of psychological coping skills variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. Psychological coping skills inventory which evaluates the archers level of related coping skills were filled out by the archers prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed. SVM models, i.e. linear and fine radial basis function (RBF) kernel functions, were trained on the psychological variables. The k-means clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy and precision throughout the exercise with an accuracy of 92% and considerably fewer error rate for the prediction of the HPPA and the LPPA as compared to the fine RBF SVM. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected psychological coping skills variables examined which would consequently save time and energy during talent identification and development programme.
Active Learning Using Hint Information.
Li, Chun-Liang; Ferng, Chun-Sung; Lin, Hsuan-Tien
2015-08-01
The abundance of real-world data and limited labeling budget calls for active learning, an important learning paradigm for reducing human labeling efforts. Many recently developed active learning algorithms consider both uncertainty and representativeness when making querying decisions. However, exploiting representativeness with uncertainty concurrently usually requires tackling sophisticated and challenging learning tasks, such as clustering. In this letter, we propose a new active learning framework, called hinted sampling, which takes both uncertainty and representativeness into account in a simpler way. We design a novel active learning algorithm within the hinted sampling framework with an extended support vector machine. Experimental results validate that the novel active learning algorithm can result in a better and more stable performance than that achieved by state-of-the-art algorithms. We also show that the hinted sampling framework allows improving another active learning algorithm designed from the transductive support vector machine.
Machine Learning for Biological Trajectory Classification Applications
NASA Technical Reports Server (NTRS)
Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros
2002-01-01
Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms axe compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.
Euler-Vector Clustering of GPS Velocities Defines Microplate Geometry in Southwest Japan
NASA Astrophysics Data System (ADS)
Savage, J. C.
2018-02-01
I have used Euler-vector clustering to assign 469 GEONET stations in southwest Japan to k clusters (k = 2, 3,..., 9) so that, for any k, the velocities of stations within each cluster are most consistent with rigid-block motion on a sphere. That is, I attempt to explain the raw (i.e., uncorrected for strain accumulation), 1996-2006 velocities of those 469 Global Positioning System stations by rigid motion of k clusters on the surface of a spherical Earth. Because block geometry is maintained as strain accumulates, Euler-vector clustering may better approximate the block geometry than the values of the associated Euler vectors. The microplate solution for each k is constructed by merging contiguous clusters that have closely similar Euler vectors. The best solution consists of three microplates arranged along the Nankaido Trough-Ryukyu Trench between the Amurian and Philippine Sea Plates. One of these microplates, the South Kyushu Microplate (an extension of the Ryukyu forearc into the southeast corner of Kyushu), had previously been identified from paleomagnetic rotations. Relative to ITRF2000 the three microplates rotate at different rates about neighboring poles located close to the northwest corner of Shikoku. The microplate model is identical to that proposed in the block model of Wallace et al. (2009, https://doi.org/10.1130/G2522A.1) except in southernmost Kyushu. On Shikoku and Honshu, but not Kyushu, the microplate model is consistent with that proposed in the block models of Nishimura and Hashimoto (2006, https://doi.org/10.1016/j.tecto.2006.04.017) and Loveless and Meade (2010, https://doi.org/10.1029/2008JB006248) without the low-slip-rate boundaries proposed in the latter.
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjoisness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Automated detection of microcalcification clusters in mammograms
NASA Astrophysics Data System (ADS)
Karale, Vikrant A.; Mukhopadhyay, Sudipta; Singh, Tulika; Khandelwal, Niranjan; Sadhu, Anup
2017-03-01
Mammography is the most efficient modality for detection of breast cancer at early stage. Microcalcifications are tiny bright spots in mammograms and can often get missed by the radiologist during diagnosis. The presence of microcalcification clusters in mammograms can act as an early sign of breast cancer. This paper presents a completely automated computer-aided detection (CAD) system for detection of microcalcification clusters in mammograms. Unsharp masking is used as a preprocessing step which enhances the contrast between microcalcifications and the background. The preprocessed image is thresholded and various shape and intensity based features are extracted. Support vector machine (SVM) classifier is used to reduce the false positives while preserving the true microcalcification clusters. The proposed technique is applied on two different databases i.e DDSM and private database. The proposed technique shows good sensitivity with moderate false positives (FPs) per image on both databases.
A fast learning method for large scale and multi-class samples of SVM
NASA Astrophysics Data System (ADS)
Fan, Yu; Guo, Huiming
2017-06-01
A multi-class classification SVM(Support Vector Machine) fast learning method based on binary tree is presented to solve its low learning efficiency when SVM processing large scale multi-class samples. This paper adopts bottom-up method to set up binary tree hierarchy structure, according to achieved hierarchy structure, sub-classifier learns from corresponding samples of each node. During the learning, several class clusters are generated after the first clustering of the training samples. Firstly, central points are extracted from those class clusters which just have one type of samples. For those which have two types of samples, cluster numbers of their positive and negative samples are set respectively according to their mixture degree, secondary clustering undertaken afterwards, after which, central points are extracted from achieved sub-class clusters. By learning from the reduced samples formed by the integration of extracted central points above, sub-classifiers are obtained. Simulation experiment shows that, this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce sample numbers and effectively improve learning efficiency.
Coquillettidia (Culicidae, Diptera) mosquitoes are natural vectors of avian malaria in Africa
2009-01-01
Background The mosquito vectors of Plasmodium spp. have largely been overlooked in studies of ecology and evolution of avian malaria and other vertebrates in wildlife. Methods Plasmodium DNA from wild-caught Coquillettidia spp. collected from lowland forests in Cameroon was isolated and sequenced using nested PCR. Female Coquillettidia aurites were also dissected and salivary glands were isolated and microscopically examined for the presence of sporozoites. Results In total, 33% (85/256) of mosquito pools tested positive for avian Plasmodium spp., harbouring at least eight distinct parasite lineages. Sporozoites of Plasmodium spp. were recorded in salivary glands of C. aurites supporting the PCR data that the parasites complete development in these mosquitoes. Results suggest C. aurites, Coquillettidia pseudoconopas and Coquillettidia metallica as new and important vectors of avian malaria in Africa. All parasite lineages recovered clustered with parasites formerly identified from several bird species and suggest the vectors capability of infecting birds from different families. Conclusion Identifying the major vectors of avian Plasmodium spp. will assist in understanding the epizootiology of avian malaria, including differences in this disease distribution between pristine and disturbed landscapes. PMID:19664282
The magnetic field investigation on Cluster
NASA Technical Reports Server (NTRS)
Balogh, A.; Cowley, S. W. H.; Southwood, D. J.; Musmann, G.; Luhr, H.; Neubauer, F. M.; Glassmeier, K.-H.; Riedler, W.; Heyn, M. F.; Acuna, M. H.
1988-01-01
The magnetic field investigation of the Cluster four-spacecraft mission is designed to provide intercalibrated measurements of the B magnetic field vector. The instrumentation and data processing of the mission are discussed. The instrumentation is identical on the four spacecraft. It consists of two triaxial fluxgate sensors and of a failure tolerant data processing unit. The combined analysis of the four spacecraft data will yield such parameters as the current density vector, wave vectors, and the geometry and structure of discontinuities.
Community-based control of Aedes aegypti by adoption of eco-health methods in Chennai City, India
Arunachalam, Natarajan; Tyagi, Brij Kishore; Samuel, Miriam; Krishnamoorthi, R; Manavalan, R; Tewari, Satish Chandra; Ashokkumar, V; Kroeger, Axel; Sommerfeld, Johannes; Petzold, Max
2012-01-01
Background Dengue is highly endemic in Chennai city, South India, in spite of continuous vector control efforts. This intervention study was aimed at establishing the efficacy as well as the favouring and limiting factors relating to a community-based environmental intervention package to control the dengue vector Aedes aegypti. Methods A cluster randomized controlled trial was designed to measure the outcome of a new vector control package and process analysis; different data collection tools were used to determine the performance. Ten randomly selected intervention clusters (neighbourhoods with 100 houses each) were paired with ten control clusters on the basis of ecological/entomological indices and sociological parameters collected during baseline studies. In the intervention clusters, Aedes control was carried out using a community-based environmental management approach like provision of water container covers through community actors, clean-up campaigns, and dissemination of dengue information through schoolchildren. The main outcome measure was reduction in pupal indices (pupae per person index), used as a proxy measure of adult vectors, in the intervention clusters compared to the control clusters. Results At baseline, almost half the respondents did not know that dengue is serious but preventable, or that it is transmitted by mosquitoes. The stakeholder analysis showed that dengue vector control is carried out by vertically structured programmes of national, state, and local administrative bodies through fogging and larval control with temephos, without any involvement of community-based organizations, and that vector control efforts were conducted in an isolated and irregular way. The most productive container types for Aedes pupae were cement tanks, drums, and discarded containers. All ten intervention clusters with a total of 1000 houses and 4639 inhabitants received the intervention while the ten control clusters with a total of 1000 houses and 4439 inhabitants received only the routine government services and some of the information education and communication project materials. The follow-up studies showed that there was a substantial increase in dengue understanding in the intervention group with only minor knowledge changes in the control group. Community involvement and the partnership among stakeholders (particularly women’s self-help groups) worked well. After 10 months of intervention, the pupae per person index was significantly reduced to 0.004 pupae per person from 1.075 (P = 0.020) in the intervention clusters compared to control clusters. There were also significant reductions in the Stegomyia indices: the house index was reduced to 4.2%, the container index to 1.05%, and the Breteau index to 4.3 from the baseline values of 19.6, 8.91, and 30.8 in the intervention arm. Conclusion A community-based approach together with other stakeholders that promoted interventions to prevent dengue vector breeding led to a substantial reduction in dengue vector density. PMID:23318241
Community-based control of Aedes aegypti by adoption of eco-health methods in Chennai City, India.
Arunachalam, Natarajan; Tyagi, Brij Kishore; Samuel, Miriam; Krishnamoorthi, R; Manavalan, R; Tewari, Satish Chandra; Ashokkumar, V; Kroeger, Axel; Sommerfeld, Johannes; Petzold, Max
2012-12-01
Dengue is highly endemic in Chennai city, South India, in spite of continuous vector control efforts. This intervention study was aimed at establishing the efficacy as well as the favouring and limiting factors relating to a community-based environmental intervention package to control the dengue vector Aedes aegypti. A cluster randomized controlled trial was designed to measure the outcome of a new vector control package and process analysis; different data collection tools were used to determine the performance. Ten randomly selected intervention clusters (neighbourhoods with 100 houses each) were paired with ten control clusters on the basis of ecological/entomological indices and sociological parameters collected during baseline studies. In the intervention clusters, Aedes control was carried out using a community-based environmental management approach like provision of water container covers through community actors, clean-up campaigns, and dissemination of dengue information through schoolchildren. The main outcome measure was reduction in pupal indices (pupae per person index), used as a proxy measure of adult vectors, in the intervention clusters compared to the control clusters. At baseline, almost half the respondents did not know that dengue is serious but preventable, or that it is transmitted by mosquitoes. The stakeholder analysis showed that dengue vector control is carried out by vertically structured programmes of national, state, and local administrative bodies through fogging and larval control with temephos, without any involvement of community-based organizations, and that vector control efforts were conducted in an isolated and irregular way. The most productive container types for Aedes pupae were cement tanks, drums, and discarded containers. All ten intervention clusters with a total of 1000 houses and 4639 inhabitants received the intervention while the ten control clusters with a total of 1000 houses and 4439 inhabitants received only the routine government services and some of the information education and communication project materials. The follow-up studies showed that there was a substantial increase in dengue understanding in the intervention group with only minor knowledge changes in the control group. Community involvement and the partnership among stakeholders (particularly women's self-help groups) worked well. After 10 months of intervention, the pupae per person index was significantly reduced to 0·004 pupae per person from 1·075 (P = 0·020) in the intervention clusters compared to control clusters. There were also significant reductions in the Stegomyia indices: the house index was reduced to 4·2%, the container index to 1·05%, and the Breteau index to 4·3 from the baseline values of 19·6, 8·91, and 30·8 in the intervention arm. A community-based approach together with other stakeholders that promoted interventions to prevent dengue vector breeding led to a substantial reduction in dengue vector density.
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often leads ideas to clustering algorithms is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
NASA Astrophysics Data System (ADS)
Taha, Z.; Razman, M. A. M.; Adnan, F. A.; Ghani, A. S. Abdul; Majeed, A. P. P. Abdul; Musa, R. M.; Sallehudin, M. F.; Mukai, Y.
2018-03-01
Fish Hunger behaviour is one of the important element in determining the fish feeding routine, especially for farmed fishes. Inaccurate feeding routines (under-feeding or over-feeding) lead the fishes to die and thus, reduces the total production of fishes. The excessive food which is not eaten by fish will be dissolved in the water and thus, reduce the water quality (oxygen quantity in the water will be reduced). The reduction of oxygen (water quality) leads the fish to die and in some cases, may lead to fish diseases. This study correlates Barramundi fish-school behaviour with hunger condition through the hybrid data integration of image processing technique. The behaviour is clustered with respect to the position of the centre of gravity of the school of fish prior feeding, during feeding and after feeding. The clustered fish behaviour is then classified by means of a machine learning technique namely Support vector machine (SVM). It has been shown from the study that the Fine Gaussian variation of SVM is able to provide a reasonably accurate classification of fish feeding behaviour with a classification accuracy of 79.7%. The proposed integration technique may increase the usefulness of the captured data and thus better differentiates the various behaviour of farmed fishes.
Mining the National Career Assessment Examination Result Using Clustering Algorithm
NASA Astrophysics Data System (ADS)
Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.
2018-03-01
Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.
Predicting Flavonoid UGT Regioselectivity
Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip
2011-01-01
Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of avonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
Alonso, Sergi; Zulliger, Rose; Wagman, Joe; Saifodine, Abuchahama; Candrinho, Baltazar; Macete, Eusébio; Brew, Joe; Fornadel, Christen; Kassim, Hidayat; Loch, Lourdes; Sacoor, Charfudin; Varela, Kenyssony; Carty, Cara L; Robertson, Molly; Saute, Francisco
2018-01-01
Background Most of the reduction in malaria prevalence seen in Africa since 2000 has been attributed to vector control interventions. Yet increases in the distribution and intensity of insecticide resistance and higher costs of newer insecticides pose a challenge to sustaining these gains. Thus, endemic countries face challenging decisions regarding the choice of vector control interventions. Methods A cluster randomised trial is being carried out in Mopeia District in the Zambezia Province of Mozambique, where malaria prevalence in children under 5 is high (68% in 2015), despite continuous and campaign distribution of long-lasting insecticide-treated nets (LLINs). Study arm 1 will continue to use the standard, LLIN-based National Malaria Control Programme vector control strategy (LLINs only), while study arm 2 will receive indoor residual spraying (IRS) once a year for 2 years with a microencapsulated formulation of pirimiphos-methyl (Actellic 300 CS), in addition to the standard LLIN strategy (LLINs+IRS). Prior to the 2016 IRS implementation (the first of two IRS campaigns in this study), 146 clusters were defined and stratified per number of households. Clusters were then randomised 1:1 into the two study arms. The public health impact and cost-effectiveness of IRS intervention will be evaluated over 2 years using multiple methods: (1) monthly active malaria case detection in a cohort of 1548 total children aged 6–59 months; (2) enhanced passive surveillance at health facilities and with community health workers; (3) annual cross-sectional surveys; and (4) entomological surveillance. Prospective microcosting of the intervention and provider and societal costs will be conducted. Insecticide resistance status pattern and changes in local Anopheline populations will be included as important supportive outcomes. Discussion By evaluating the public health impact and cost-effectiveness of IRS with a non-pyrethroid insecticide in a high-transmission setting with high LLIN ownership, it is expected that this study will provide programmatic and policy-relevant data to guide national and global vector control strategies. Trial registration number NCT02910934. PMID:29564161
Chaccour, Carlos J; Alonso, Sergi; Zulliger, Rose; Wagman, Joe; Saifodine, Abuchahama; Candrinho, Baltazar; Macete, Eusébio; Brew, Joe; Fornadel, Christen; Kassim, Hidayat; Loch, Lourdes; Sacoor, Charfudin; Varela, Kenyssony; Carty, Cara L; Robertson, Molly; Saute, Francisco
2018-01-01
Most of the reduction in malaria prevalence seen in Africa since 2000 has been attributed to vector control interventions. Yet increases in the distribution and intensity of insecticide resistance and higher costs of newer insecticides pose a challenge to sustaining these gains. Thus, endemic countries face challenging decisions regarding the choice of vector control interventions. A cluster randomised trial is being carried out in Mopeia District in the Zambezia Province of Mozambique, where malaria prevalence in children under 5 is high (68% in 2015), despite continuous and campaign distribution of long-lasting insecticide-treated nets (LLINs). Study arm 1 will continue to use the standard, LLIN-based National Malaria Control Programme vector control strategy (LLINs only), while study arm 2 will receive indoor residual spraying (IRS) once a year for 2 years with a microencapsulated formulation of pirimiphos-methyl (Actellic 300 CS), in addition to the standard LLIN strategy (LLINs+IRS). Prior to the 2016 IRS implementation (the first of two IRS campaigns in this study), 146 clusters were defined and stratified per number of households. Clusters were then randomised 1:1 into the two study arms. The public health impact and cost-effectiveness of IRS intervention will be evaluated over 2 years using multiple methods: (1) monthly active malaria case detection in a cohort of 1548 total children aged 6-59 months; (2) enhanced passive surveillance at health facilities and with community health workers; (3) annual cross-sectional surveys; and (4) entomological surveillance. Prospective microcosting of the intervention and provider and societal costs will be conducted. Insecticide resistance status pattern and changes in local Anopheline populations will be included as important supportive outcomes. By evaluating the public health impact and cost-effectiveness of IRS with a non-pyrethroid insecticide in a high-transmission setting with high LLIN ownership, it is expected that this study will provide programmatic and policy-relevant data to guide national and global vector control strategies. NCT02910934.
Karayiannis, N B
2000-01-01
This paper presents the development and investigates the properties of ordered weighted learning vector quantization (LVQ) and clustering algorithms. These algorithms are developed by using gradient descent to minimize reformulation functions based on aggregation operators. An axiomatic approach provides conditions for selecting aggregation operators that lead to admissible reformulation functions. Minimization of admissible reformulation functions based on ordered weighted aggregation operators produces a family of soft LVQ and clustering algorithms, which includes fuzzy LVQ and clustering algorithms as special cases. The proposed LVQ and clustering algorithms are used to perform segmentation of magnetic resonance (MR) images of the brain. The diagnostic value of the segmented MR images provides the basis for evaluating a variety of ordered weighted LVQ and clustering algorithms.
Wagner, T.; Benbow, M.E.; Brenden, T.O.; Qi, J.; Johnson, R.C.
2008-01-01
Background: Buruli ulcer (BU) disease, caused by infection with the environmental mycobacterium M. ulcerans, is an emerging infectious disease in many tropical and sub-tropical countries. Although vectors and modes of transmission remain unknown, it is hypothesized that the transmission of BU disease is associated with human activities in or around aquatic environments, and that characteristics of the landscape (e.g., land use/cover) play a role in mediating BU disease. Several studies performed at relatively small spatial scales (e.g., within a single village or region of a country) support these hypotheses; however, if BU disease is associated with land use/cover characteristics, either through spatial constraints on vector-host dynamics or by mediating human activities, then large-scale (i.e., country-wide) associations should also emerge. The objectives of this study were to (1) investigate associations between BU disease prevalence in villages in Benin, West Africa and surrounding land use/cover patterns and other map-based characteristics, and (2) identify areas with greater and lower than expected prevalence rates (i.e., disease clusters) to assist with the development of prevention and control programs. Results: Our landscape-based models identified low elevation, rural villages surrounded by forest land cover, and located in drainage basins with variable wetness patterns as being associated with higher BU disease prevalence rates. We also identified five spatial disease clusters. Three of the five clusters contained villages with greater than expected prevalence rates and two clusters contained villages with lower than expected prevalence rates. Those villages with greater than expected BU disease prevalence rates spanned a fairly narrow region of south-central Benin. Conclusion: Our analyses suggest that interactions between natural land cover and human alterations to the landscape likely play a role in the dynamics of BU disease. For example, urbanization, potentially by providing access to protected water sources, may reduce the likelihood of becoming infected with BU disease. Villages located at low elevations may have higher BU disease prevalence rates due to their close spatial proximity to high risk environments. In addition, forest land cover and drainage basins with variable wetness patterns may be important for providing suitable growth conditions for M. ulcerans, influencing the distribution and abundance of vectors, or mediating vector-human interactions. The identification of disease clusters in this study provides direction for future research aimed at better understanding these and other environmental and social determinants involved in BU disease outbreaks. ?? 2008 Wagner et al; licensee BioMed Central Ltd.
Wagner, Tyler; Benbow, M Eric; Brenden, Travis O; Qi, Jiaguo; Johnson, R Christian
2008-01-01
Background Buruli ulcer (BU) disease, caused by infection with the environmental mycobacterium M. ulcerans, is an emerging infectious disease in many tropical and sub-tropical countries. Although vectors and modes of transmission remain unknown, it is hypothesized that the transmission of BU disease is associated with human activities in or around aquatic environments, and that characteristics of the landscape (e.g., land use/cover) play a role in mediating BU disease. Several studies performed at relatively small spatial scales (e.g., within a single village or region of a country) support these hypotheses; however, if BU disease is associated with land use/cover characteristics, either through spatial constraints on vector-host dynamics or by mediating human activities, then large-scale (i.e., country-wide) associations should also emerge. The objectives of this study were to (1) investigate associations between BU disease prevalence in villages in Benin, West Africa and surrounding land use/cover patterns and other map-based characteristics, and (2) identify areas with greater and lower than expected prevalence rates (i.e., disease clusters) to assist with the development of prevention and control programs. Results Our landscape-based models identified low elevation, rural villages surrounded by forest land cover, and located in drainage basins with variable wetness patterns as being associated with higher BU disease prevalence rates. We also identified five spatial disease clusters. Three of the five clusters contained villages with greater than expected prevalence rates and two clusters contained villages with lower than expected prevalence rates. Those villages with greater than expected BU disease prevalence rates spanned a fairly narrow region of south-central Benin. Conclusion Our analyses suggest that interactions between natural land cover and human alterations to the landscape likely play a role in the dynamics of BU disease. For example, urbanization, potentially by providing access to protected water sources, may reduce the likelihood of becoming infected with BU disease. Villages located at low elevations may have higher BU disease prevalence rates due to their close spatial proximity to high risk environments. In addition, forest land cover and drainage basins with variable wetness patterns may be important for providing suitable growth conditions for M. ulcerans, influencing the distribution and abundance of vectors, or mediating vector-human interactions. The identification of disease clusters in this study provides direction for future research aimed at better understanding these and other environmental and social determinants involved in BU disease outbreaks. PMID:18505567
NASA Astrophysics Data System (ADS)
Gandomkar, Ziba; Tay, Kevin; Ryder, Will; Brennan, Patrick C.; Mello-Thoms, Claudia
2016-03-01
Radiologists' gaze-related parameters combined with image-based features were utilized to classify suspicious mammographic areas ultimately scored as True Positives (TP) and False Positives (FP). Eight breast radiologists read 120 two-view digital mammograms of which 59 had biopsy proven cancer. Eye tracking data was collected and nearby fixations were clustered together. Suspicious areas on mammograms were independently identified based on thresholding an intensity saliency map followed by automatic segmentation and pruning steps. For each radiologist reported area, radiologist's fixation clusters in the area, as well as neighboring suspicious areas within 2.5° of the center of fixation, were found. A 45-dimensional feature vector containing gaze parameters of the corresponding cluster along with image-based characteristics was constructed. Gaze parameters included total number of fixations in the cluster, dwell time, time to hit the cluster for the first time, maximum number of consecutive fixations, and saccade magnitude of the first fixation in the cluster. Image-based features consisted of intensity, shape, and texture descriptors extracted from the region around the suspicious area, its surrounding tissue, and the entire breast. For each radiologist, a userspecific Support Vector Machine (SVM) model was built to classify the reported areas as TPs or FPs. Leave-one-out cross validation was utilized to avoid over-fitting. A feature selection step was embedded in the SVM training procedure by allowing radial basis function kernels to have 45 scaling factors. The proposed method was compared with the radiologists' performance using the jackknife alternative free-response receiver operating characteristic (JAFROC). The JAFROC figure of merit increased significantly for six radiologists.
Kavitha, Muthu Subash; Asano, Akira; Taguchi, Akira; Heo, Min-Suk
2013-09-01
To prevent low bone mineral density (BMD), that is, osteoporosis, in postmenopausal women, it is essential to diagnose osteoporosis more precisely. This study presented an automatic approach utilizing a histogram-based automatic clustering (HAC) algorithm with a support vector machine (SVM) to analyse dental panoramic radiographs (DPRs) and thus improve diagnostic accuracy by identifying postmenopausal women with low BMD or osteoporosis. We integrated our newly-proposed histogram-based automatic clustering (HAC) algorithm with our previously-designed computer-aided diagnosis system. The extracted moment-based features (mean, variance, skewness, and kurtosis) of the mandibular cortical width for the radial basis function (RBF) SVM classifier were employed. We also compared the diagnostic efficacy of the SVM model with the back propagation (BP) neural network model. In this study, DPRs and BMD measurements of 100 postmenopausal women patients (aged >50 years), with no previous record of osteoporosis, were randomly selected for inclusion. The accuracy, sensitivity, and specificity of the BMD measurements using our HAC-SVM model to identify women with low BMD were 93.0% (88.0%-98.0%), 95.8% (91.9%-99.7%) and 86.6% (79.9%-93.3%), respectively, at the lumbar spine; and 89.0% (82.9%-95.1%), 96.0% (92.2%-99.8%) and 84.0% (76.8%-91.2%), respectively, at the femoral neck. Our experimental results predict that the proposed HAC-SVM model combination applied on DPRs could be useful to assist dentists in early diagnosis and help to reduce the morbidity and mortality associated with low BMD and osteoporosis.
Kernel spectral clustering with memory effect
NASA Astrophysics Data System (ADS)
Langone, Rocco; Alzate, Carlos; Suykens, Johan A. K.
2013-05-01
Evolving graphs describe many natural phenomena changing over time, such as social relationships, trade markets, metabolic networks etc. In this framework, performing community detection and analyzing the cluster evolution represents a critical task. Here we propose a new model for this purpose, where the smoothness of the clustering results over time can be considered as a valid prior knowledge. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness. The latter allows the model to cluster the current data well and to be consistent with the recent history. We also propose new model selection criteria in order to carefully choose the hyper-parameters of our model, which is a crucial issue to achieve good performances. We successfully test the model on four toy problems and on a real world network. We also compare our model with Evolutionary Spectral Clustering, which is a state-of-the-art algorithm for community detection of evolving networks, illustrating that the kernel spectral clustering with memory effect can achieve better or equal performances.
Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification
NASA Astrophysics Data System (ADS)
Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang
2017-12-01
To promptly process the massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on intelligent fault diagnosis of rolling bearing. Among these studies, such as artificial neural networks, support vector machines, decision trees and other supervised learning methods are used commonly. These methods can detect the failure of rolling bearing effectively, but to achieve better detection results, it often requires a lot of training samples. Based on above, a novel clustering method is proposed in this paper. This novel method is able to find the correct number of clusters automatically the effectiveness of the proposed method is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types of small samples. Meanwhile, the diagnosis results are also relative high accuracy even for massive samples.
NASA Astrophysics Data System (ADS)
Cannata, A.; Montalto, P.; Aliotta, M.; Cassisi, C.; Pulvirenti, A.; Privitera, E.; Patanè, D.
2011-04-01
Active volcanoes generate sonic and infrasonic signals, whose investigation provides useful information for both monitoring purposes and the study of the dynamics of explosive phenomena. At Mt. Etna volcano (Italy), a pattern recognition system based on infrasonic waveform features has been developed. First, by a parametric power spectrum method, the features describing and characterizing the infrasound events were extracted: peak frequency and quality factor. Then, together with the peak-to-peak amplitude, these features constituted a 3-D ‘feature space’; by Density-Based Spatial Clustering of Applications with Noise algorithm (DBSCAN) three clusters were recognized inside it. After the clustering process, by using a common location method (semblance method) and additional volcanological information concerning the intensity of the explosive activity, we were able to associate each cluster to a particular source vent and/or a kind of volcanic activity. Finally, for automatic event location, clusters were used to train a model based on Support Vector Machine, calculating optimal hyperplanes able to maximize the margins of separation among the clusters. After the training phase this system automatically allows recognizing the active vent with no location algorithm and by using only a single station.
Chen, Chao; Zhao, Xinqing; Jin, Yingyu; Zhao, Zongbao Kent; Suh, Joo-Won
2014-11-01
Bacterial artificial chromosomal (BAC) vectors are increasingly being used in cloning large DNA fragments containing complex biosynthetic pathways to facilitate heterologous production of microbial metabolites for drug development. To express inserted genes using Streptomyces species as the production hosts, an integration expression cassette is required to be inserted into the BAC vector, which includes genetic elements encoding a phage-specific attachment site, an integrase, an origin of transfer, a selection marker and a promoter. Due to the large sizes of DNA inserted into the BAC vectors, it is normally inefficient and time-consuming to assemble these fragments by routine PCR amplifications and restriction-ligations. Here we present a rapid method to insert fragments to construct BAC-based expression vectors. A DNA fragment of about 130 bp was designed, which contains upstream and downstream homologous sequences of both BAC vector and pIB139 plasmid carrying the whole integration expression cassette. In-Fusion cloning was performed using the designer DNA fragment to modify pIB139, followed by λ-RED-mediated recombination to obtain the BAC-based expression vector. We demonstrated the effectiveness of this method by rapid construction of a BAC-based expression vector with an insert of about 120 kb that contains the entire gene cluster for biosynthesis of immunosuppressant FK506. The empty BAC-based expression vector constructed in this study can be conveniently used for construction of BAC libraries using either microbial pure culture or environmental DNA, and the selected BAC clones can be directly used for heterologous expression. Alternatively, if a BAC library has already been constructed using a commercial BAC vector, the selected BAC vectors can be manipulated using the method described here to get the BAC-based expression vectors with desired gene clusters for heterologous expression. The rapid construction of a BAC-based expression vector facilitates heterologous expression of large gene clusters for drug discovery. Copyright © 2014 Elsevier Inc. All rights reserved.
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
Image segmentation using fuzzy LVQ clustering networks
NASA Technical Reports Server (NTRS)
Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.
1992-01-01
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
Ajamma, Yvonne Ukamaka; Villinger, Jandouwe; Omondi, David; Salifu, Daisy; Onchuru, Thomas Ogao; Njoroge, Laban; Muigai, Anne W. T.; Masiga, Daniel K.
2016-01-01
The Lake Baringo and Lake Victoria regions of Kenya are associated with high seroprevalence of mosquito-transmitted arboviruses. However, molecular identification of potential mosquito vector species, including morphologically identified ones, remains scarce. To estimate the diversity, abundance, and distribution of mosquito vectors on the mainland shores and adjacent inhabited islands in these regions, we collected and morphologically identified adult and immature mosquitoes and obtained the corresponding sequence variation at cytochrome c oxidase 1 (COI) and internal transcribed spacer region 2 (ITS2) gene regions. A total of 63 species (including five subspecies) were collected from both study areas, 47 of which have previously been implicated as disease vectors. Fourteen species were found only on island sites, which are rarely included in mosquito diversity surveys. We collected more mosquitoes, yet with lower species composition, at Lake Baringo (40,229 mosquitoes, 32 species) than at Lake Victoria (22,393 mosquitoes, 54 species). Phylogenetic analysis of COI gene sequences revealed Culex perexiguus and Cx. tenagius that could not be distinguished morphologically. Most Culex species clustered into a heterogeneous clade with closely related sequences, while Culex pipiens clustered into two distinct COI and ITS2 clades. These data suggest limitations in current morphological identification keys. This is the first DNA barcode report of Kenyan mosquitoes. To improve mosquito species identification, morphological identifications should be supported by their molecular data, while diversity surveys should target both adults and immatures. The diversity of native mosquito disease vectors identified in this study impacts disease transmission risks to humans and livestock. PMID:27402888
Multi-hop routing mechanism for reliable sensor computing.
Chen, Jiann-Liang; Ma, Yi-Wei; Lai, Chia-Ping; Hu, Chia-Cheng; Huang, Yueh-Min
2009-01-01
Current research on routing in wireless sensor computing concentrates on increasing the service lifetime, enabling scalability for large number of sensors and supporting fault tolerance for battery exhaustion and broken nodes. A sensor node is naturally exposed to various sources of unreliable communication channels and node failures. Sensor nodes have many failure modes, and each failure degrades the network performance. This work develops a novel mechanism, called Reliable Routing Mechanism (RRM), based on a hybrid cluster-based routing protocol to specify the best reliable routing path for sensor computing. Table-driven intra-cluster routing and on-demand inter-cluster routing are combined by changing the relationship between clusters for sensor computing. Applying a reliable routing mechanism in sensor computing can improve routing reliability, maintain low packet loss, minimize management overhead and save energy consumption. Simulation results indicate that the reliability of the proposed RRM mechanism is around 25% higher than that of the Dynamic Source Routing (DSR) and ad hoc On-demand Distance Vector routing (AODV) mechanisms.
Parallel and Scalable Clustering and Classification for Big Data in Geosciences
NASA Astrophysics Data System (ADS)
Riedel, M.
2015-12-01
Machine learning, data mining, and statistical computing are common techniques to perform analysis in earth sciences. This contribution will focus on two concrete and widely used data analytics methods suitable to analyse 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of appliactions with noise (DBSCAN) algorithm that enables the identification of outliers or interesting anomalies. A new open source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus set on the support vector machines algorithm (SVMs), as one of the best out-of-the-box classification algorithm. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
Problems and Mitigation Strategies for Developing and Validating Statistical Cyber Defenses
2014-04-01
Clustering Support Vector Machine (SVM) Classification Netflow Twitter Training Datasets Trained SVMs Enriched Feature State...requests. • Netflow data for TCP connections • E-mail data from SMTP logs • Chat data from XMPP logs • Microtext data (from Twitter message archives...summary data from Bro and Netflow data captured on the BBN network over the period of 1 month, plus simulated attacks WHOIS Domain name record
Incremental Support Vector Machine Framework for Visual Sensor Networks
NASA Astrophysics Data System (ADS)
Awad, Mariette; Jiang, Xianhua; Motai, Yuichi
2006-12-01
Motivated by the emerging requirements of surveillance networks, we present in this paper an incremental multiclassification support vector machine (SVM) technique as a new framework for action classification based on real-time multivideo collected by homogeneous sites. The technique is based on an adaptation of least square SVM (LS-SVM) formulation but extends beyond the static image-based learning of current SVM methodologies. In applying the technique, an initial supervised offline learning phase is followed by a visual behavior data acquisition and an online learning phase during which the cluster head performs an ensemble of model aggregations based on the sensor nodes inputs. The cluster head then selectively switches on designated sensor nodes for future incremental learning. Combining sensor data offers an improvement over single camera sensing especially when the latter has an occluded view of the target object. The optimization involved alleviates the burdens of power consumption and communication bandwidth requirements. The resulting misclassification error rate, the iterative error reduction rate of the proposed incremental learning, and the decision fusion technique prove its validity when applied to visual sensor networks. Furthermore, the enabled online learning allows an adaptive domain knowledge insertion and offers the advantage of reducing both the model training time and the information storage requirements of the overall system which makes it even more attractive for distributed sensor networks communication.
NASA Astrophysics Data System (ADS)
Ragozzine, Brett
The invocation of dark matter in the universe is predicated upon gravitational observations that cannot be explained by the amount of luminous matter that we detect. There is an ongoing debate over which gravitational model is correct. The work herein tests a prescription of gravity theory known as Tensor-Vector-Scalar and is based upon the work of Angus et al. (2007). We add upon this work by extending the sample of galaxy clusters to five and testing the accepted Navarro, Frenk & White (NFW) dark matter potential (Navarro et al., 1996). Our independent implementation of this method includes weak gravitational lensing analysis to determine the amount of dark matter in these galaxy clusters by calculating the gas fraction ƒgas = Mgas=Mtot. The ability of the Tensor-Vector-Scalar theory to predict a consistent ƒgas across all galaxy clusters is a measure of its liklihood of being the correct gravity model.
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method.
Hasman, Henrik; Aarestrup, Frank M; Dalsgaard, Anders; Guardabassi, Luca
2006-04-01
The aim of the study was to determine whether glycopeptide resistance gene clusters from soil bacteria could be heterologously expressed in Enterococcus faecalis and adapt to the new host following exposure to vancomycin. The vanHAX clusters from Paenibacillus thiaminolyticus PT-2B1, Paenibacillus apiarius PA-B2B and Amycolatopsis coloradensis DSM 44225 were separately cloned in an appropriately constructed shuttle vector containing the two-component regulatory system (vanRS) of Tn1546. The complete vanA(PT) operon (vanRSHAXY) from P. thiaminolyticus PT-2B1 was cloned in the same shuttle vector lacking enterococcal vanRS. All plasmid constructs were electroporated into E. faecalis JH2-2 and the MICs of vancomycin and teicoplanin were determined for each recombinant strain before and following exposure to sublethal concentrations of vancomycin. The vanHAX clusters from P. thiaminolyticus and P. apiarius conferred high-level vancomycin resistance (MIC > or = 125 mg/L) in E. faecalis JH2-2. In contrast, cloning of the vanHAX cluster from A. coloradensis did not result in a significant increase of vancomycin resistance (MIC = 0.7 mg/L). Resistance to vancomycin was not observed after cloning the complete vanA(PT) operon from P. thiaminolyticus (MIC = 2 mg/L), but this recombinant rapidly adapted to high concentrations of vancomycin (MIC = 500 mg/L) following exposure to sub-lethal concentrations of this antibiotic. The results showed that vanA(PT) in P. thiaminolyticus is a possible ancestor of vanA-mediated glycopeptide resistance in enterococci. Experimental evidence supported the hypothesis that enterococci did not acquire glycopeptide resistance directly from glycopeptide-producing organisms such as A. coloradensis.
Gatos, Ilias; Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitris; Theotokas, Ioannis; Zoumpoulis, Pavlos; Loupas, Thanasis; Hazle, John D; Kagadis, George C
2017-09-01
The purpose of the present study was to employ a computer-aided diagnosis system that classifies chronic liver disease (CLD) using ultrasound shear wave elastography (SWE) imaging, with a stiffness value-clustering and machine-learning algorithm. A clinical data set of 126 patients (56 healthy controls, 70 with CLD) was analyzed. First, an RGB-to-stiffness inverse mapping technique was employed. A five-cluster segmentation was then performed associating corresponding different-color regions with certain stiffness value ranges acquired from the SWE manufacturer-provided color bar. Subsequently, 35 features (7 for each cluster), indicative of physical characteristics existing within the SWE image, were extracted. A stepwise regression analysis toward feature reduction was used to derive a reduced feature subset that was fed into the support vector machine classification algorithm to classify CLD from healthy cases. The highest accuracy in classification of healthy to CLD subject discrimination from the support vector machine model was 87.3% with sensitivity and specificity values of 93.5% and 81.2%, respectively. Receiver operating characteristic curve analysis gave an area under the curve value of 0.87 (confidence interval: 0.77-0.92). A machine-learning algorithm that quantifies color information in terms of stiffness values from SWE images and discriminates CLD from healthy cases is introduced. New objective parameters and criteria for CLD diagnosis employing SWE images provided by the present study can be considered an important step toward color-based interpretation, and could assist radiologists' diagnostic performance on a daily basis after being installed in a PC and employed retrospectively, immediately after the examination. Copyright © 2017 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Taha, Zahari; Musa, Rabiu Muazu; P P Abdul Majeed, Anwar; Alim, Muhammad Muaz; Abdullah, Mohamad Razali
2018-02-01
Support Vector Machine (SVM) has been shown to be an effective learning algorithm for classification and prediction. However, the application of SVM for prediction and classification in specific sport has rarely been used to quantify/discriminate low and high-performance athletes. The present study classified and predicted high and low-potential archers from a set of fitness and motor ability variables trained on different SVMs kernel algorithms. 50 youth archers with the mean age and standard deviation of 17.0 ± 0.6 years drawn from various archery programmes completed a six arrows shooting score test. Standard fitness and ability measurements namely hand grip, vertical jump, standing broad jump, static balance, upper muscle strength and the core muscle strength were also recorded. Hierarchical agglomerative cluster analysis (HACA) was used to cluster the archers based on the performance variables tested. SVM models with linear, quadratic, cubic, fine RBF, medium RBF, as well as the coarse RBF kernel functions, were trained based on the measured performance variables. The HACA clustered the archers into high-potential archers (HPA) and low-potential archers (LPA), respectively. The linear, quadratic, cubic, as well as the medium RBF kernel functions models, demonstrated reasonably excellent classification accuracy of 97.5% and 2.5% error rate for the prediction of the HPA and the LPA. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from a combination of the selected few measured fitness and motor ability performance variables examined which would consequently save cost, time and effort during talent identification programme. Copyright © 2017 Elsevier B.V. All rights reserved.
Amin, Morteza Moradi; Kermani, Saeed; Talebi, Ardeshir; Oghli, Mostafa Ghelich
2015-01-01
Acute lymphoblastic leukemia is the most common form of pediatric cancer which is categorized into three L1, L2, and L3 and could be detected through screening of blood and bone marrow smears by pathologists. Due to being time-consuming and tediousness of the procedure, a computer-based system is acquired for convenient detection of Acute lymphoblastic leukemia. Microscopic images are acquired from blood and bone marrow smears of patients with Acute lymphoblastic leukemia and normal cases. After applying image preprocessing, cells nuclei are segmented by k-means algorithm. Then geometric and statistical features are extracted from nuclei and finally these cells are classified to cancerous and noncancerous cells by means of support vector machine classifier with 10-fold cross validation. These cells are also classified into their sub-types by multi-Support vector machine classifier. Classifier is evaluated by these parameters: Sensitivity, specificity, and accuracy which values for cancerous and noncancerous cells 98%, 95%, and 97%, respectively. These parameters are also used for evaluation of cell sub-types which values in mean 84.3%, 97.3%, and 95.6%, respectively. The results show that proposed algorithm could achieve an acceptable performance for the diagnosis of Acute lymphoblastic leukemia and its sub-types and can be used as an assistant diagnostic tool for pathologists.
On three-dimensional misorientation spaces.
Krakow, Robert; Bennett, Robbie J; Johnstone, Duncan N; Vukmanovic, Zoja; Solano-Alvarez, Wilberth; Lainé, Steven J; Einsle, Joshua F; Midgley, Paul A; Rae, Catherine M F; Hielscher, Ralf
2017-10-01
Determining the local orientation of crystals in engineering and geological materials has become routine with the advent of modern crystallographic mapping techniques. These techniques enable many thousands of orientation measurements to be made, directing attention towards how such orientation data are best studied. Here, we provide a guide to the visualization of misorientation data in three-dimensional vector spaces, reduced by crystal symmetry, to reveal crystallographic orientation relationships. Domains for all point group symmetries are presented and an analysis methodology is developed and applied to identify crystallographic relationships, indicated by clusters in the misorientation space, in examples from materials science and geology. This analysis aids the determination of active deformation mechanisms and evaluation of cluster centres and spread enables more accurate description of transformation processes supporting arguments regarding provenance.
On three-dimensional misorientation spaces
NASA Astrophysics Data System (ADS)
Krakow, Robert; Bennett, Robbie J.; Johnstone, Duncan N.; Vukmanovic, Zoja; Solano-Alvarez, Wilberth; Lainé, Steven J.; Einsle, Joshua F.; Midgley, Paul A.; Rae, Catherine M. F.; Hielscher, Ralf
2017-10-01
Determining the local orientation of crystals in engineering and geological materials has become routine with the advent of modern crystallographic mapping techniques. These techniques enable many thousands of orientation measurements to be made, directing attention towards how such orientation data are best studied. Here, we provide a guide to the visualization of misorientation data in three-dimensional vector spaces, reduced by crystal symmetry, to reveal crystallographic orientation relationships. Domains for all point group symmetries are presented and an analysis methodology is developed and applied to identify crystallographic relationships, indicated by clusters in the misorientation space, in examples from materials science and geology. This analysis aids the determination of active deformation mechanisms and evaluation of cluster centres and spread enables more accurate description of transformation processes supporting arguments regarding provenance.
Spin vectors in the Koronis family: III. (832) Karin
NASA Astrophysics Data System (ADS)
Slivan, Stephen M.; Molnar, Lawrence A.
2012-08-01
Studies of asteroid families constrain models of asteroid collisions and evolution processes, and the Karin cluster within the Koronis family is among the youngest families known (Nesvorný, D., Bottke, Jr., W.F., Dones, L., Levison, H.F. [2002]. Nature 417, 720-722). (832) Karin itself is by far the largest member of the Karin cluster, thus knowledge of Karin's spin vector is important to constrain family formation and evolution models that include spin, and to test whether its spin properties are consistent with the Karin cluster being a very young family. We observed rotation lightcurves of Karin during its four consecutive apparitions in 2006-2009, and combined the new observations with previously published lightcurves to determine its spin vector orientation and preliminary model shape. Karin is a prograde rotator with a period of (18.352 ± 0.003) h, spin obliquity near (42 ± 5)°, and pole ecliptic longitude near either (52 ± 5)° or (230 ± 5)°. The spin vector and shape results for Karin will constrain models of family formation that include spin properties; in the meantime we briefly discuss Karin's own spin in the context of those of other members of the Karin cluster and the parent body's siblings in the Koronis family.
Statistical analysis of dispersion relations in turbulent solar wind fluctuations using Cluster data
NASA Astrophysics Data System (ADS)
Perschke, C.; Narita, Y.
2012-12-01
Multi-spacecraft measurements enable us to resolve three-dimensional spatial structures without assuming Taylor's frozen-in-flow hypothesis. This is very useful to study frequency-wave vector diagram in solar wind turbulence through direct determination of three-dimensional wave vectors. The existence and evolution of dispersion relation and its role in fully-developed plasma turbulence have been drawing attention of physicists, in particular, if solar wind turbulence represents kinetic Alfvén or whistler mode as the carrier of spectral energy among different scales through wave-wave interactions. We investigate solar wind intervals of Cluster data for various flow velocities with a high-resolution wave vector analysis method, Multi-point Signal Resonator technique, at the tetrahedral separation about 100 km. Magnetic field data and ion data are used to determine the frequency- wave vector diagrams in the co-moving frame of the solar wind. We find primarily perpendicular wave vectors in solar wind turbulence which justify the earlier discussions about kinetic Alfvén or whistler wave. The frequency- wave vector diagrams confirm (a) wave vector anisotropy and (b) scattering in frequencies.
A cost-function approach to rival penalized competitive learning (RPCL).
Ma, Jinwen; Wang, Taijun
2006-08-01
Rival penalized competitive learning (RPCL) has been shown to be a useful tool for clustering on a set of sample data in which the number of clusters is unknown. However, the RPCL algorithm was proposed heuristically and is still in lack of a mathematical theory to describe its convergence behavior. In order to solve the convergence problem, we investigate it via a cost-function approach. By theoretical analysis, we prove that a general form of RPCL, called distance-sensitive RPCL (DSRPCL), is associated with the minimization of a cost function on the weight vectors of a competitive learning network. As a DSRPCL process decreases the cost to a local minimum, a number of weight vectors eventually fall into a hypersphere surrounding the sample data, while the other weight vectors diverge to infinity. Moreover, it is shown by the theoretical analysis and simulation experiments that if the cost reduces into the global minimum, a correct number of weight vectors is automatically selected and located around the centers of the actual clusters, respectively. Finally, we apply the DSRPCL algorithms to unsupervised color image segmentation and classification of the wine data.
Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao
2014-01-01
Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins, therefore clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Besides, outage of access points could result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability in the asymmetric matching problem aspect. PMID:24451470
Ajamma, Yvonne Ukamaka; Villinger, Jandouwe; Omondi, David; Salifu, Daisy; Onchuru, Thomas Ogao; Njoroge, Laban; Muigai, Anne W T; Masiga, Daniel K
2016-11-01
The Lake Baringo and Lake Victoria regions of Kenya are associated with high seroprevalence of mosquito-transmitted arboviruses. However, molecular identification of potential mosquito vector species, including morphologically identified ones, remains scarce. To estimate the diversity, abundance, and distribution of mosquito vectors on the mainland shores and adjacent inhabited islands in these regions, we collected and morphologically identified adult and immature mosquitoes and obtained the corresponding sequence variation at cytochrome c oxidase 1 (COI) and internal transcribed spacer region 2 (ITS2) gene regions. A total of 63 species (including five subspecies) were collected from both study areas, 47 of which have previously been implicated as disease vectors. Fourteen species were found only on island sites, which are rarely included in mosquito diversity surveys. We collected more mosquitoes, yet with lower species composition, at Lake Baringo (40,229 mosquitoes, 32 species) than at Lake Victoria (22,393 mosquitoes, 54 species). Phylogenetic analysis of COI gene sequences revealed Culex perexiguus and Cx tenagius that could not be distinguished morphologically. Most Culex species clustered into a heterogeneous clade with closely related sequences, while Culex pipiens clustered into two distinct COI and ITS2 clades. These data suggest limitations in current morphological identification keys. This is the first DNA barcode report of Kenyan mosquitoes. To improve mosquito species identification, morphological identifications should be supported by their molecular data, while diversity surveys should target both adults and immatures. The diversity of native mosquito disease vectors identified in this study impacts disease transmission risks to humans and livestock. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America.
Kittayapong, Pattamaporn; Thongyuan, Suporn; Olanratmanee, Phanthip; Aumchareoun, Worawit; Koyadun, Surachart; Kittayapong, Rungrith; Butraporn, Piyarat
2012-01-01
Background Dengue is considered one of the most important vector-borne diseases in Thailand. Its incidence is increasing despite routine implementation of national dengue control programmes. This study, conducted during 2010, aimed to demonstrate an application of integrated, community-based, eco-bio-social strategies in combination with locally-produced eco-friendly vector control tools in the dengue control programme, emphasizing urban and peri-urban settings in eastern Thailand. Methodology Three different community settings were selected and were randomly assigned to intervention and control clusters. Key community leaders and relevant governmental authorities were approached to participate in this intervention programme. Ecohealth volunteers were identified and trained in each study community. They were selected among active community health volunteers and were trained by public health experts to conduct vector control activities in their own communities using environmental management in combination with eco-friendly vector control tools. These trained ecohealth volunteers carried out outreach health education and vector control during household visits. Management of public spaces and public properties, especially solid waste management, was efficiently carried out by local municipalities. Significant reduction in the pupae per person index in the intervention clusters when compared to the control ones was used as a proxy to determine the impact of this programme. Results Our community-based dengue vector control programme demonstrated a significant reduction in the pupae per person index during entomological surveys which were conducted at two-month intervals from May 2010 for the total of six months in the intervention and control clusters. The programme also raised awareness in applying eco-friendly vector control approaches and increased intersectoral and household participation in dengue control activities. Conclusion An eco-friendly dengue vector control programme was successfully implemented in urban and peri-urban settings in Thailand, through intersectoral collaboration and practical action at household level, with a significant reduction in vector densities. PMID:23318236
Quintero, Juliana; Brochero, Helena; Manrique-Saide, Pablo; Barrera-Pérez, Mario; Basso, César; Romero, Sonnia; Caprara, Andrea; De Lima Cunha, Jane Cris; Beltrán-Ayala, Efraín; Mitchell-Foster, Kendra; Kroeger, Axel; Sommerfeld, Johannnes; Petzold, Max
2014-01-21
Dengue is an increasingly important public health problem in most Latin American countries and more cost-effective ways of reducing dengue vector densities to prevent transmission are in demand by vector control programs. This multi-centre study attempted to identify key factors associated with vector breeding and development as a basis for improving targeted intervention strategies. In each of 5 participant cities in Mexico, Colombia, Ecuador, Brazil and Uruguay, 20 clusters were randomly selected by grid sampling to incorporate 100 contiguous households, non-residential private buildings (businesses) and public spaces. Standardized household surveys, cluster background surveys and entomological surveys specifically targeted to obtain pupal indices for Aedes aegypti, were conducted in the dry and wet seasons. The study clusters included mainly urban low-middle class populations with satisfactory infrastructure and -except for Uruguay- favourable climatic conditions for dengue vector development. Household knowledge about dengue and "dengue mosquitoes" was widespread, mainly through mass media, but there was less awareness around interventions to reduce vector densities. Vector production (measured through pupal indices) was favoured when water containers were outdoor, uncovered, unused (even in Colombia and Ecuador where the large tanks used for household water storage and washing were predominantly productive) and -particularly during the dry season- rainwater filled. Larval infestation did not reflect productive container types. All productive container types, including those important in the dry season, were identified by pupal surveys executed during the rainy season. A number of findings are relevant for improving vector control: 1) there is a need for complementing larval surveys with occasional pupal surveys (to be conducted during the wet season) for identifying and subsequently targeting productive container types; 2) the need to raise public awareness about useful and effective interventions in productive container types specific to their area; and 3) the motivation for control services that-according to this and similar studies in Asia- dedicated, targeted vector management can make a difference in terms of reducing vector abundance.
Kittayapong, Pattamaporn; Thongyuan, Suporn; Olanratmanee, Phanthip; Aumchareoun, Worawit; Koyadun, Surachart; Kittayapong, Rungrith; Butraporn, Piyarat
2012-12-01
Dengue is considered one of the most important vector-borne diseases in Thailand. Its incidence is increasing despite routine implementation of national dengue control programmes. This study, conducted during 2010, aimed to demonstrate an application of integrated, community-based, eco-bio-social strategies in combination with locally-produced eco-friendly vector control tools in the dengue control programme, emphasizing urban and peri-urban settings in eastern Thailand. Three different community settings were selected and were randomly assigned to intervention and control clusters. Key community leaders and relevant governmental authorities were approached to participate in this intervention programme. Ecohealth volunteers were identified and trained in each study community. They were selected among active community health volunteers and were trained by public health experts to conduct vector control activities in their own communities using environmental management in combination with eco-friendly vector control tools. These trained ecohealth volunteers carried out outreach health education and vector control during household visits. Management of public spaces and public properties, especially solid waste management, was efficiently carried out by local municipalities. Significant reduction in the pupae per person index in the intervention clusters when compared to the control ones was used as a proxy to determine the impact of this programme. Our community-based dengue vector control programme demonstrated a significant reduction in the pupae per person index during entomological surveys which were conducted at two-month intervals from May 2010 for the total of six months in the intervention and control clusters. The programme also raised awareness in applying eco-friendly vector control approaches and increased intersectoral and household participation in dengue control activities. An eco-friendly dengue vector control programme was successfully implemented in urban and peri-urban settings in Thailand, through intersectoral collaboration and practical action at household level, with a significant reduction in vector densities.
Diametrical clustering for identifying anti-correlated gene clusters.
Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman
2003-09-01
Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive-genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i). re-partitioning the genes and (ii). computing the dominant singular vector of each gene cluster; each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteosome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
Zhang, Lu; Pang, Xiaodan; Ozolins, Oskars; Udalcovs, Aleksejs; Popov, Sergei; Xiao, Shilin; Hu, Weisheng; Chen, Jiajia
2018-04-01
We propose a spectrally efficient digitized radio-over-fiber (D-RoF) system by grouping highly correlated neighboring samples of the analog signals into multidimensional vectors, where the k-means clustering algorithm is adopted for adaptive quantization. A 30 Gbit/s D-RoF system is experimentally demonstrated to validate the proposed scheme, reporting a carrier aggregation of up to 40 100 MHz orthogonal frequency division multiplexing (OFDM) channels with quadrate amplitude modulation (QAM) order of 4 and an aggregation of 10 100 MHz OFDM channels with a QAM order of 16384. The equivalent common public radio interface rates from 37 to 150 Gbit/s are supported. Besides, the error vector magnitude (EVM) of 8% is achieved with the number of quantization bits of 4, and the EVM can be further reduced to 1% by increasing the number of quantization bits to 7. Compared with conventional pulse coding modulation-based D-RoF systems, the proposed D-RoF system improves the signal-to-noise-ratio up to ∼9 dB and greatly reduces the EVM, given the same number of quantization bits.
Inference of the oxidative stress network in Anopheles stephensi upon Plasmodium infection.
Shrinet, Jatin; Nandal, Umesh Kumar; Adak, Tridibes; Bhatnagar, Raj K; Sunil, Sujatha
2014-01-01
Ookinete invasion of Anopheles midgut is a critical step for malaria transmission; the parasite numbers drop drastically and practically reach a minimum during the parasite's whole life cycle. At this stage, the parasite as well as the vector undergoes immense oxidative stress. Thereafter, the vector undergoes oxidative stress at different time points as the parasite invades its tissues during the parasite development. The present study was undertaken to reconstruct the network of differentially expressed genes involved in oxidative stress in Anopheles stephensi during Plasmodium development and maturation in the midgut. Using high throughput next generation sequencing methods, we generated the transcriptome of the An. stephensi midgut during Plasmodium vinckei petteri oocyst invasion of the midgut epithelium. Further, we utilized large datasets available on public domain on Anopheles during Plasmodium ookinete invasion and Drosophila datasets and arrived upon clusters of genes that may play a role in oxidative stress. Finally, we used support vector machines for the functional prediction of the un-annotated genes of An. stephensi. Integrating the results from all the different data analyses, we identified a total of 516 genes that were involved in oxidative stress in An. stephensi during Plasmodium development. The significantly regulated genes were further extracted from this gene cluster and used to infer an oxidative stress network of An. stephensi. Using system biology approaches, we have been able to ascertain the role of several putative genes in An. stephensi with respect to oxidative stress. Further experimental validations of these genes are underway.
2011-01-01
Background Existing methods of predicting DNA-binding proteins used valuable features of physicochemical properties to design support vector machine (SVM) based classifiers. Generally, selection of physicochemical properties and determination of their corresponding feature vectors rely mainly on known properties of binding mechanism and experience of designers. However, there exists a troublesome problem for designers that some different physicochemical properties have similar vectors of representing 20 amino acids and some closely related physicochemical properties have dissimilar vectors. Results This study proposes a systematic approach (named Auto-IDPCPs) to automatically identify a set of physicochemical and biochemical properties in the AAindex database to design SVM-based classifiers for predicting and analyzing DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) utilizing an efficient genetic algorithm based optimization method IBCGA to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties which may affect the binding mechanism of DNA-binding domains/proteins. The proposed Auto-IDPCPs identified m=22 features of properties belonging to five clusters for predicting DNA-binding domains with a five-fold cross-validation accuracy of 87.12%, which is promising compared with the accuracy of 86.62% of the existing method PSSM-400. For predicting DNA-binding sequences, the accuracy of 75.50% was obtained using m=28 features, where PSSM-400 has an accuracy of 74.22%. Auto-IDPCPs and PSSM-400 have accuracies of 80.73% and 82.81%, respectively, applied to an independent test data set of DNA-binding domains. Some typical physicochemical properties discovered are hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized Van Der Waals volume, pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)), etc. Conclusions The proposed approach Auto-IDPCPs would help designers to investigate informative physicochemical and biochemical properties by considering both prediction accuracy and analysis of binding mechanism simultaneously. The approach Auto-IDPCPs can be also applicable to predict and analyze other protein functions from sequences. PMID:21342579
2013-01-01
Background Knowledge of the interactions between mosquitoes and humans, and how vector control interventions affect them, is sparse. A study exploring host-seeking behaviour at a human-occupied bed net, a key event in such interactions, is reported here. Methods Host-seeking female Anopheles gambiae activity was studied using a human-baited ‘sticky-net’ (a bed net without insecticide, coated with non-setting adhesive) to trap mosquitoes. The numbers and distribution of mosquitoes captured on each surface of the bed net were recorded and analysed using non-parametric statistical methods and random effects regression analysis. To confirm sticky-net reliability, the experiment was repeated using a pitched sticky-net (tilted sides converging at apex, i.e., neither horizontal nor vertical). The capture efficiency of horizontal and vertical sticky surfaces were compared, and the potential repellency of the adhesive was investigated. Results In a semi-field experiment, more mosquitoes were caught on the top (74-87%) than on the sides of the net (p < 0.001). In laboratory experiments, more mosquitoes were caught on the top than on the sides in human-baited tests (p < 0.001), significantly different to unbaited controls (p < 0.001) where most mosquitoes were on the sides (p = 0.047). In both experiments, approximately 70% of mosquitoes captured on the top surface were clustered within a 90 × 90 cm (or lesser) area directly above the head and chest (p < 0.001). In pitched net tests, similar clustering occurred over the sleeper’s head and chest in baited tests only (p < 0.001). Capture rates at horizontal and vertical surfaces were not significantly different and the sticky-net was not repellent. Conclusion This study demonstrated that An. gambiae activity occurs predominantly within a limited area of the top surface of bed nets. The results provide support for the two-in-one bed net design for managing pyrethroid-resistant vector populations. Further exploration of vector behaviour at the bed net interface could contribute to additional improvements in insecticide-treated bed net design or the development of novel vector control tools. PMID:23902661
NASA Astrophysics Data System (ADS)
Yang, L.; Shi, L.; Li, P.; Yang, J.; Zhao, L.; Zhao, B.
2018-04-01
Due to the forward scattering and block of radar signal, the water, bare soil, shadow, named low backscattering objects (LBOs), often present low backscattering intensity in polarimetric synthetic aperture radar (PolSAR) image. Because the LBOs rise similar backscattering intensity and polarimetric responses, the spectral-based classifiers are inefficient to deal with LBO classification, such as Wishart method. Although some polarimetric features had been exploited to relieve the confusion phenomenon, the backscattering features are still found unstable when the system noise floor varies in the range direction. This paper will introduce a simple but effective scene classification method based on Bag of Words (BoW) model using Support Vector Machine (SVM) to discriminate the LBOs, without relying on any polarimetric features. In the proposed approach, square windows are firstly opened around the LBOs adaptively to determine the scene images, and then the Scale-Invariant Feature Transform (SIFT) points are detected in training and test scenes. The several SIFT features detected are clustered using K-means to obtain certain cluster centers as the visual word lists and scene images are represented using word frequency. At last, the SVM is selected for training and predicting new scenes as some kind of LBOs. The proposed method is executed over two AIRSAR data sets at C band and L band, including water, bare soil and shadow scenes. The experimental results illustrate the effectiveness of the scene method in distinguishing LBOs.
Koch, Stefan P.; Hägele, Claudia; Haynes, John-Dylan; Heinz, Andreas; Schlagenhauf, Florian; Sterzer, Philipp
2015-01-01
Functional neuroimaging has provided evidence for altered function of mesolimbic circuits implicated in reward processing, first and foremost the ventral striatum, in patients with schizophrenia. While such findings based on significant group differences in brain activations can provide important insights into the pathomechanisms of mental disorders, the use of neuroimaging results from standard univariate statistical analysis for individual diagnosis has proven difficult. In this proof of concept study, we tested whether the predictive accuracy for the diagnostic classification of schizophrenia patients vs. healthy controls could be improved using multivariate pattern analysis (MVPA) of regional functional magnetic resonance imaging (fMRI) activation patterns for the anticipation of monetary reward. With a searchlight MVPA approach using support vector machine classification, we found that the diagnostic category could be predicted from local activation patterns in frontal, temporal, occipital and midbrain regions, with a maximal cluster peak classification accuracy of 93% for the right pallidum. Region-of-interest based MVPA for the ventral striatum achieved a maximal cluster peak accuracy of 88%, whereas the classification accuracy on the basis of standard univariate analysis reached only 75%. Moreover, using support vector regression we could additionally predict the severity of negative symptoms from ventral striatal activation patterns. These results show that MVPA can be used to substantially increase the accuracy of diagnostic classification on the basis of task-related fMRI signal patterns in a regionally specific way. PMID:25799236
Zare, Marzieh; Rezvani, Zahra; Benasich, April A
2016-07-01
This study assesses the ability of a novel, "automatic classification" approach to facilitate identification of infants at highest familial risk for language-learning disorders (LLD) and to provide converging assessments to enable earlier detection of developmental disorders that disrupt language acquisition. Network connectivity measures derived from 62-channel electroencephalogram (EEG) recording were used to identify selected features within two infant groups who differed on LLD risk: infants with a family history of LLD (FH+) and typically-developing infants without such a history (FH-). A support vector machine was deployed; global efficiency and global and local clustering coefficients were computed. A novel minimum spanning tree (MST) approach was also applied. Cross-validation was employed to assess the resultant classification. Infants were classified with about 80% accuracy into FH+ and FH- groups with 89% specificity and precision of 92%. Clustering patterns differed by risk group and MST network analysis suggests that FH+ infants' EEG complexity patterns were significantly different from FH- infants. The automatic classification techniques used here were shown to be both robust and reliable and should provide valuable information when applied to early identification of risk or clinical groups. The ability to identify infants at highest risk for LLD using "automatic classification" strategies is a novel convergent approach that may facilitate earlier diagnosis and remediation. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gittens, Alex; Kottalam, Jey; Yang, Jiyan
We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with themore » fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.« less
Unsupervised active learning based on hierarchical graph-theoretic clustering.
Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve
2009-10-01
Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.
On three-dimensional misorientation spaces
Bennett, Robbie J.; Vukmanovic, Zoja; Solano-Alvarez, Wilberth; Lainé, Steven J.; Einsle, Joshua F.; Midgley, Paul A.; Rae, Catherine M. F.; Hielscher, Ralf
2017-01-01
Determining the local orientation of crystals in engineering and geological materials has become routine with the advent of modern crystallographic mapping techniques. These techniques enable many thousands of orientation measurements to be made, directing attention towards how such orientation data are best studied. Here, we provide a guide to the visualization of misorientation data in three-dimensional vector spaces, reduced by crystal symmetry, to reveal crystallographic orientation relationships. Domains for all point group symmetries are presented and an analysis methodology is developed and applied to identify crystallographic relationships, indicated by clusters in the misorientation space, in examples from materials science and geology. This analysis aids the determination of active deformation mechanisms and evaluation of cluster centres and spread enables more accurate description of transformation processes supporting arguments regarding provenance. PMID:29118660
NASA Astrophysics Data System (ADS)
Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie
2015-08-01
The business failure of numerous companies results in financial crises. The high social costs associated with such crises have made people to search for effective tools for business risk prediction, among which, support vector machine is very effective. Several modelling means, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested in forecasting business risk with support vector machine. However, existing literature seldom focuses on the general modelling frame for business risk prediction, and seldom investigates performance differences among different modelling means. We reviewed researches on forecasting business risk with support vector machine, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally, investigated the use of principal components analysis, support vector machine, random sampling, and group decision, under the general frame in forecasting business risk. Under the APMF-WHAE frame with support vector machine as the base predictive model, four specific predictive models were produced, namely, pure support vector machine, a hybrid support vector machine involved with principal components analysis, a support vector machine ensemble involved with random sampling and group decision, and an ensemble of hybrid support vector machine using group decision to integrate various hybrid support vector machines on variables produced from principle components analysis and samples from random sampling. The experimental results indicate that hybrid support vector machine and ensemble of hybrid support vector machines were able to produce dominating performance than pure support vector machine and support vector machine ensemble.
WEBGIS based CropWatch online agriculture monitoring system
NASA Astrophysics Data System (ADS)
Zhang, X.; Wu, B.; Zeng, H.; Zhang, M.; Yan, N.
2015-12-01
CropWatch, which was developed by the Institute of Remote Sensing and Digital Earth (RADI), Chinese Academy of Sciences (CAS), has achieved breakthrough results in the integration of methods, independence of the assessments and support to emergency response by periodically releasing global agricultural information. Taking advantages of the multi-source remote sensing data and the openness of the data sharing policies, CropWatch group reported their monitoring results by publishing four bulletins one year. In order to better analysis and generate the bulletin and provide an alternative way to access agricultural monitoring indicators and results in CropWatch, The CropWatch online system based on the WEBGIS techniques has been developed. Figure 1 shows the CropWatch online system structure and the system UI in Clustering mode. Data visualization is sorted into three different modes: Vector mode, Raster mode and Clustering mode. Vector mode provides the statistic value for all the indicators over each monitoring units which allows users to compare current situation with historical values (average, maximum, etc.). Users can compare the profiles of each indicator over the current growing season with the historical data in a chart by selecting the region of interest (ROI). Raster mode provides pixel based anomaly of CropWatch indicators globally. In this mode, users are able to zoom in to the regions where the notable anomaly was identified from statistic values in vector mode. Data from remote sensing image series at high temporal and low spatial resolution provide key information in agriculture monitoring. Clustering mode provides integrated information on different classes in maps, the corresponding profiles for each class and the percentage of area of each class to the total area of all classes. The time series data is categorized into limited types by the ISODATA algorithm. For each clustering type, pixels on the map, profiles, and percentage legend are all linked together. All the three visualization methods are applied to four scales including 65 monitoring and reporting units (MRUs), 7 major production zones (MPZs), 173 countries and sub-countries for 9 large countries. Agro-Climatic information, Agronomic information and indicators related with crop area, crop yield and crop production are provided.
The peculiar velocities of rich clusters in the hot and cold dark matter scenarios
NASA Technical Reports Server (NTRS)
Rhee, George F.; West, Michael J.; Villumsen, Jens V.
1993-01-01
We present the results of a study of the peculiar velocities of rich clusters of galaxies. The peculiar motion of rich clusters in various cosmological scenarios is of interest for a number of reasons. Observationally, one can measure the peculiar motion of clusters to greater distances than galaxies because cluster peculiar motions can be determined to greater accuracy. One can also test the slope of distance indicator relations using clusters to see if galaxy properties vary with environment. We have used N-body simulations to measure the amplitude and rms cluster peculiar velocity as a function of bias parameter in the hot and cold dark matter scenarios. In addition to measuring the mean and rms peculiar velocity of clusters in the two models, we determined whether the peculiar velocity vector of a given cluster is well aligned with the gravity vector due to all the particles in the simulation and the gravity vector due to the particles present only in the clusters. We have investigated the peculiar velocities of rich clusters of galaxies in the cold dark matter and hot dark matter galaxy formation scenarios. We have derived peculiar velocities and associated errors for the scenarios using four values of the bias parameter ranging from b = 1 to b = 2.5. The growth of the mean peculiar velocity with scale factor has been determined and compared to that predicted by linear theory. In addition, we have compared the orientation of force and velocity in these simulations to see if a program such as that proposed by Bertschinger and Dekel (1989) for elliptical galaxy peculiar motions can be applied to clusters. The method they describe enables one to recover the density field from large scale redshift distance samples. The method makes it possible to do this when only radial velocities are known by assuming that the velocity field is curl free. Our analysis suggests that this program if applied to clusters is only realizable for models with a low value of the bias parameter, i.e., models in which the peculiar velocities of clusters are large enough that the errors do not render the analysis impracticable.
Targeted Screening Strategies to Detect Trypanosoma cruzi Infection in Children
Levy, Michael Z.; Kawai, Vivian; Bowman, Natalie M.; Waller, Lance A.; Cabrera, Lilia; Pinedo-Cancino, Viviana V.; Seitz, Amy E.; Steurer, Frank J.; Cornejo del Carpio, Juan G.; Cordova-Benzaquen, Eleazar; Maguire, James H.; Gilman, Robert H.; Bern, Caryn
2007-01-01
Background Millions of people are infected with Trypanosoma cruzi, the causative agent of Chagas disease in Latin America. Anti-trypanosomal drug therapy can cure infected individuals, but treatment efficacy is highest early in infection. Vector control campaigns disrupt transmission of T. cruzi, but without timely diagnosis, children infected prior to vector control often miss the window of opportunity for effective chemotherapy. Methods and Findings We performed a serological survey in children 2–18 years old living in a peri-urban community of Arequipa, Peru, and linked the results to entomologic, spatial and census data gathered during a vector control campaign. 23 of 433 (5.3% [95% CI 3.4–7.9]) children were confirmed seropositive for T. cruzi infection by two methods. Spatial analysis revealed that households with infected children were very tightly clustered within looser clusters of households with parasite-infected vectors. Bayesian hierarchical mixed models, which controlled for clustering of infection, showed that a child's risk of being seropositive increased by 20% per year of age and 4% per vector captured within the child's house. Receiver operator characteristic (ROC) plots of best-fit models suggest that more than 83% of infected children could be identified while testing only 22% of eligible children. Conclusions We found evidence of spatially-focal vector-borne T. cruzi transmission in peri-urban Arequipa. Ongoing vector control campaigns, in addition to preventing further parasite transmission, facilitate the collection of data essential to identifying children at high risk of T. cruzi infection. Targeted screening strategies could make integration of diagnosis and treatment of children into Chagas disease control programs feasible in lower-resource settings. PMID:18160979
Abeyewickreme, W; Wickremasinghe, A R; Karunatilake, K; Sommerfeld, J; Axel, Kroeger
2012-12-01
Waste management through community mobilization to reduce breeding places at household level could be an effective and sustainable dengue vector control strategy in areas where vector breeding takes place in small discarded water containers. The objective of this study was to assess the validity of this assumption. An intervention study was conducted from February 2009 to February 2010 in the populous Gampaha District of Sri Lanka. Eight neighborhoods (clusters) with roughly 200 houses each were selected randomly from high and low dengue endemic areas; 4 of them were allocated to the intervention arm (2 in the high and 2 in the low endemicity areas) and in the same way 4 clusters to the control arm. A baseline household survey was conducted and entomological and sociological surveys were carried out simultaneously at baseline, at 3 months, at 9 months and at 15 months after the start of the intervention. The intervention programme in the treatment clusters consisted of building partnerships of local stakeholders, waste management at household level, the promotion of composting biodegradable household waste, raising awareness on the importance of solid waste management in dengue control and improving garbage collection with the assistance of local government authorities. The intervention and control clusters were very similar and there were no significant differences in pupal and larval indices of Aedes mosquitoes. The establishment of partnerships among local authorities was well accepted and sustainable; the involvement of communities and households was successful. Waste management with the elimination of the most productive water container types (bowls, tins, bottles) led to a significant reduction of pupal indices as a proxy for adult vector densities. The coordination of local authorities along with increased household responsibility for targeted vector interventions (in our case solid waste management due to the type of preferred vector breeding places) is vital for effective and sustained dengue control.
Abeyewickreme, W; Wickremasinghe, A R; Karunatilake, K; Sommerfeld, Johannes; Kroeger, Axel
2012-01-01
Introduction Waste management through community mobilization to reduce breeding places at household level could be an effective and sustainable dengue vector control strategy in areas where vector breeding takes place in small discarded water containers. The objective of this study was to assess the validity of this assumption. Methods An intervention study was conducted from February 2009 to February 2010 in the populous Gampaha District of Sri Lanka. Eight neighborhoods (clusters) with roughly 200 houses each were selected randomly from high and low dengue endemic areas; 4 of them were allocated to the intervention arm (2 in the high and 2 in the low endemicity areas) and in the same way 4 clusters to the control arm. A baseline household survey was conducted and entomological and sociological surveys were carried out simultaneously at baseline, at 3 months, at 9 months and at 15 months after the start of the intervention. The intervention programme in the treatment clusters consisted of building partnerships of local stakeholders, waste management at household level, the promotion of composting biodegradable household waste, raising awareness on the importance of solid waste management in dengue control and improving garbage collection with the assistance of local government authorities. Results The intervention and control clusters were very similar and there were no significant differences in pupal and larval indices of Aedes mosquitoes. The establishment of partnerships among local authorities was well accepted and sustainable; the involvement of communities and households was successful. Waste management with the elimination of the most productive water container types (bowls, tins, bottles) led to a significant reduction of pupal indices as a proxy for adult vector densities. Conclusion The coordination of local authorities along with increased household responsibility for targeted vector interventions (in our case solid waste management due to the type of preferred vector breeding places) is vital for effective and sustained dengue control. PMID:23318240
NASA Astrophysics Data System (ADS)
Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S.; Qian, Pei-Yuan
2015-03-01
Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning ``plug-and-play'' approach with surfactin, we genetically interrogated amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct cloning based heterologous expression of natural products in the model organism B. subtilis and paves the way to the development of future genome mining efforts in this genus.
Li, Yongxin; Li, Zhongrui; Yamanaka, Kazuya; Xu, Ying; Zhang, Weipeng; Vlamakis, Hera; Kolter, Roberto; Moore, Bradley S; Qian, Pei-Yuan
2015-03-24
Bacilli are ubiquitous low G+C environmental Gram-positive bacteria that produce a wide assortment of specialized small molecules. Although their natural product biosynthetic potential is high, robust molecular tools to support the heterologous expression of large biosynthetic gene clusters in Bacillus hosts are rare. Herein we adapt transformation-associated recombination (TAR) in yeast to design a single genomic capture and expression vector for antibiotic production in Bacillus subtilis. After validating this direct cloning "plug-and-play" approach with surfactin, we genetically interrogated amicoumacin biosynthetic gene cluster from the marine isolate Bacillus subtilis 1779. Its heterologous expression allowed us to explore an unusual maturation process involving the N-acyl-asparagine pro-drug intermediates preamicoumacins, which are hydrolyzed by the asparagine-specific peptidase into the active component amicoumacin A. This work represents the first direct cloning based heterologous expression of natural products in the model organism B. subtilis and paves the way to the development of future genome mining efforts in this genus.
Iris recognition using image moments and k-means algorithm.
Khan, Yaser Daanial; Khan, Sher Afzal; Ahmad, Farooq; Islam, Saeed
2014-01-01
This paper presents a biometric technique for identification of a person using the iris image. The iris is first segmented from the acquired image of an eye using an edge detection algorithm. The disk shaped area of the iris is transformed into a rectangular form. Described moments are extracted from the grayscale image which yields a feature vector containing scale, rotation, and translation invariant moments. Images are clustered using the k-means algorithm and centroids for each cluster are computed. An arbitrary image is assumed to belong to the cluster whose centroid is the nearest to the feature vector in terms of Euclidean distance computed. The described model exhibits an accuracy of 98.5%.
Iris Recognition Using Image Moments and k-Means Algorithm
Khan, Yaser Daanial; Khan, Sher Afzal; Ahmad, Farooq; Islam, Saeed
2014-01-01
This paper presents a biometric technique for identification of a person using the iris image. The iris is first segmented from the acquired image of an eye using an edge detection algorithm. The disk shaped area of the iris is transformed into a rectangular form. Described moments are extracted from the grayscale image which yields a feature vector containing scale, rotation, and translation invariant moments. Images are clustered using the k-means algorithm and centroids for each cluster are computed. An arbitrary image is assumed to belong to the cluster whose centroid is the nearest to the feature vector in terms of Euclidean distance computed. The described model exhibits an accuracy of 98.5%. PMID:24977221
Community detection using Kernel Spectral Clustering with memory
NASA Astrophysics Data System (ADS)
Langone, Rocco; Suykens, Johan A. K.
2013-02-01
This work is related to the problem of community detection in dynamic scenarios, which for instance arises in the segmentation of moving objects, clustering of telephone traffic data, time-series micro-array data etc. A desirable feature of a clustering model which has to capture the evolution of communities over time is the temporal smoothness between clusters in successive time-steps. In this way the model is able to track the long-term trend and in the same time it smooths out short-term variation due to noise. We use the Kernel Spectral Clustering with Memory effect (MKSC) which allows to predict cluster memberships of new nodes via out-of-sample extension and has a proper model selection scheme. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness as a valid prior knowledge. The latter, in fact, allows the model to cluster the current data well and to be consistent with the recent history. Here we propose a generalization of the MKSC model with an arbitrary memory, not only one time-step in the past. The experiments conducted on toy problems confirm our expectations: the more memory we add to the model, the smoother over time are the clustering results. We also compare with the Evolutionary Spectral Clustering (ESC) algorithm which is a state-of-the art method, and we obtain comparable or better results.
Data Mining Methods for Recommender Systems
NASA Astrophysics Data System (ADS)
Amatriain, Xavier; Jaimes*, Alejandro; Oliver, Nuria; Pujol, Josep M.
In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.
Key-Node-Separated Graph Clustering and Layouts for Human Relationship Graph Visualization.
Itoh, Takayuki; Klein, Karsten
2015-01-01
Many graph-drawing methods apply node-clustering techniques based on the density of edges to find tightly connected subgraphs and then hierarchically visualize the clustered graphs. However, users may want to focus on important nodes and their connections to groups of other nodes for some applications. For this purpose, it is effective to separately visualize the key nodes detected based on adjacency and attributes of the nodes. This article presents a graph visualization technique for attribute-embedded graphs that applies a graph-clustering algorithm that accounts for the combination of connections and attributes. The graph clustering step divides the nodes according to the commonality of connected nodes and similarity of feature value vectors. It then calculates the distances between arbitrary pairs of clusters according to the number of connecting edges and the similarity of feature value vectors and finally places the clusters based on the distances. Consequently, the technique separates important nodes that have connections to multiple large clusters and improves the visibility of such nodes' connections. To test this technique, this article presents examples with human relationship graph datasets, including a coauthorship and Twitter communication network dataset.
Exotic vector charmonium and its leptonic decay width
NASA Astrophysics Data System (ADS)
Chen, Ying; Chiu, Wei-Feng; Gong, Ming; Gui, Long-Cheng; Liu, Zhao-Feng
2016-08-01
We propose a novel type of interpolating field operator, which manifests the hybrid-like configuration that the charm quark-antiquark pair recoils against gluonic degrees of freedom. A heavy vector charmonium-like state with a mass of 4.33(2),GeV is disentangled from the conventional charmonium states in the quenched approximation. This state has affinity for the hybrid-like operators but couples less to the relevant quark bilinear operator. We also try to extract its leptonic decay constant and give a tentative upper limit that it is less than one tenth of that of J/ψ, which corresponds to a leptonic decay width about dozens of eV. The connection of this state with X(4260) is also discussed. The numerical calculations were carried out on Tianhe-1A at the National Supercomputer Center (NSCC) in Tianjin and the GPU cluster at Hunan Normal University. This work is supported in part by the National Science Foundation of China (NSFC) (11575196, 11575197, 11335001, 11405053), Y.C. and Z.L. also acknowledge the support of NSFC (11261130311) (CRC 110 by DFG and NSFC)
A cross-species bi-clustering approach to identifying conserved co-regulated genes.
Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo
2016-06-15
A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regularized genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective to deal with the noise in the expression data introduced by the complicated procedures in quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, thus difficult to detect conserved co-regulated genes. We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages that the clustered genes co-regulate. Efficient optimization algorithm has been developed with convergence analysis. This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. The R package containing the implementation of the proposed method in C ++ is available at: https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/ jinbo@engr.uconn.edu. © The Author 2016. Published by Oxford University Press.
Quintero, Juliana; García-Betancourt, Tatiana; Cortés, Sebastian; García, Diana; Alcalá, Lucas; González-Uribe, Catalina; Brochero, Helena; Carrasquilla, Gabriel
2015-01-01
Background Long-lasting insecticide-treated net (LLIN) window and door curtains alone or in combination with LLIN water container covers were analysed regarding effectiveness in reducing dengue vector density, and feasibility of the intervention. Methods A cluster randomised trial was conducted in an urban area of Colombia comparing 10 randomly selected control and 10 intervention clusters. In control clusters, routine vector control activities were performed. The intervention delivered first, LLIN curtains (from July to August 2013) and secondly, water container covers (from October to March 2014). Cross-sectional entomological surveys were carried out at baseline (February 2013 to June 2013), 9 weeks after the first intervention (August to October 2013), and 4–6 weeks after the second intervention (March to April 2014). Results Curtains were installed in 922 households and water container covers in 303 households. The Breteau index (BI) fell from 14 to 6 in the intervention group and from 8 to 5 in the control group. The additional intervention with LLIN covers for water containers showed a significant reduction in pupae per person index (PPI) (p=0.01). In the intervention group, the PPI index showed a clear decline of 71% compared with 25% in the control group. Costs were high but options for cost savings were identified. Conclusions Short term impact evaluation indicates that the intervention package can reduce dengue vector density but sustained effect will depend on multiple factors. PMID:25604762
2014-01-01
Background Dengue is an increasingly important public health problem in most Latin American countries and more cost-effective ways of reducing dengue vector densities to prevent transmission are in demand by vector control programs. This multi-centre study attempted to identify key factors associated with vector breeding and development as a basis for improving targeted intervention strategies. Methods In each of 5 participant cities in Mexico, Colombia, Ecuador, Brazil and Uruguay, 20 clusters were randomly selected by grid sampling to incorporate 100 contiguous households, non-residential private buildings (businesses) and public spaces. Standardized household surveys, cluster background surveys and entomological surveys specifically targeted to obtain pupal indices for Aedes aegypti, were conducted in the dry and wet seasons. Results The study clusters included mainly urban low-middle class populations with satisfactory infrastructure and –except for Uruguay- favourable climatic conditions for dengue vector development. Household knowledge about dengue and “dengue mosquitoes” was widespread, mainly through mass media, but there was less awareness around interventions to reduce vector densities. Vector production (measured through pupal indices) was favoured when water containers were outdoor, uncovered, unused (even in Colombia and Ecuador where the large tanks used for household water storage and washing were predominantly productive) and –particularly during the dry season- rainwater filled. Larval infestation did not reflect productive container types. All productive container types, including those important in the dry season, were identified by pupal surveys executed during the rainy season. Conclusions A number of findings are relevant for improving vector control: 1) there is a need for complementing larval surveys with occasional pupal surveys (to be conducted during the wet season) for identifying and subsequently targeting productive container types; 2) the need to raise public awareness about useful and effective interventions in productive container types specific to their area; and 3) the motivation for control services that-according to this and similar studies in Asia- dedicated, targeted vector management can make a difference in terms of reducing vector abundance. PMID:24447796
Thomas, Stephen J.; Aldstadt, Jared; Jarman, Richard G.; Buddhari, Darunee; Yoon, In-Kyu; Richardson, Jason H.; Ponlawat, Alongkot; Iamsirithaworn, Sopon; Scott, Thomas W.; Rothman, Alan L.; Gibbons, Robert V.; Lambrechts, Louis; Endy, Timothy P.
2015-01-01
Dengue is of public health importance in tropical and sub-tropical regions. Dengue virus (DENV) transmission dynamics was studied in Kamphaeng Phet Province, Thailand, using an enhanced spatiotemporal surveillance of 93 hospitalized subjects with confirmed dengue (initiates) and associated cluster individuals (associates) with entomologic sampling. A total of 438 associates were enrolled from 208 houses with household members with a history of fever, located within a 200-m radius of an initiate case. Of 409 associates, 86 (21%) had laboratory-confirmed DENV infection. A total of 63 (1.8%) of the 3,565 mosquitoes collected were dengue polymerase chain reaction positive (PCR+). There was a significant relationship between spatial proximity to the initiate case and likelihood of detecting DENV from associate cases and Aedes mosquitoes. The viral detection rate from human hosts and mosquito vectors in this study was higher than previously observed by the study team in the same geographic area using different methodologies. We propose that the sampling strategy used in this study could support surveillance of DENV transmission and vector interactions. PMID:25986580
Adaptive h -refinement for reduced-order models: ADAPTIVE h -refinement for reduced-order models
Carlberg, Kevin T.
2014-11-05
Our work presents a method to adaptively refine reduced-order models a posteriori without requiring additional full-order-model solves. The technique is analogous to mesh-adaptive h-refinement: it enriches the reduced-basis space online by ‘splitting’ a given basis vector into several vectors with disjoint support. The splitting scheme is defined by a tree structure constructed offline via recursive k-means clustering of the state variables using snapshot data. This method identifies the vectors to split online using a dual-weighted-residual approach that aims to reduce error in an output quantity of interest. The resulting method generates a hierarchy of subspaces online without requiring large-scale operationsmore » or full-order-model solves. Furthermore, it enables the reduced-order model to satisfy any prescribed error tolerance regardless of its original fidelity, as a completely refined reduced-order model is mathematically equivalent to the original full-order model. Experiments on a parameterized inviscid Burgers equation highlight the ability of the method to capture phenomena (e.g., moving shocks) not contained in the span of the original reduced basis.« less
A new feature constituting approach to detection of vocal fold pathology
NASA Astrophysics Data System (ADS)
Hariharan, M.; Polat, Kemal; Yaacob, Sazali
2014-08-01
In the last two decades, non-invasive methods through acoustic analysis of voice signal have been proved to be excellent and reliable tool to diagnose vocal fold pathologies. This paper proposes a new feature vector based on the wavelet packet transform and singular value decomposition for the detection of vocal fold pathology. k-means clustering based feature weighting is proposed to increase the distinguishing performance of the proposed features. In this work, two databases Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database and MAPACI speech pathology database are used. Four different supervised classifiers such as k-nearest neighbour (k-NN), least-square support vector machine, probabilistic neural network and general regression neural network are employed for testing the proposed features. The experimental results uncover that the proposed features give very promising classification accuracy of 100% for both MEEI database and MAPACI speech pathology database.
Support vector machine learning-based fMRI data group analysis.
Wang, Ze; Childress, Anna R; Wang, Jiongjiong; Detre, John A
2007-07-15
To explore the multivariate nature of fMRI data and to consider the inter-subject brain response discrepancies, a multivariate and brain response model-free method is fundamentally required. Two such methods are presented in this paper by integrating a machine learning algorithm, the support vector machine (SVM), and the random effect model. Without any brain response modeling, SVM was used to extract a whole brain spatial discriminance map (SDM), representing the brain response difference between the contrasted experimental conditions. Population inference was then obtained through the random effect analysis (RFX) or permutation testing (PMU) on the individual subjects' SDMs. Applied to arterial spin labeling (ASL) perfusion fMRI data, SDM RFX yielded lower false-positive rates in the null hypothesis test and higher detection sensitivity for synthetic activations with varying cluster size and activation strengths, compared to the univariate general linear model (GLM)-based RFX. For a sensory-motor ASL fMRI study, both SDM RFX and SDM PMU yielded similar activation patterns to GLM RFX and GLM PMU, respectively, but with higher t values and cluster extensions at the same significance level. Capitalizing on the absence of temporal noise correlation in ASL data, this study also incorporated PMU in the individual-level GLM and SVM analyses accompanied by group-level analysis through RFX or group-level PMU. Providing inferences on the probability of being activated or deactivated at each voxel, these individual-level PMU-based group analysis methods can be used to threshold the analysis results of GLM RFX, SDM RFX or SDM PMU.
Unsupervised color image segmentation using a lattice algebra clustering technique
NASA Astrophysics Data System (ADS)
Urcid, Gonzalo; Ritter, Gerhard X.
2011-08-01
In this paper we introduce a lattice algebra clustering technique for segmenting digital images in the Red-Green- Blue (RGB) color space. The proposed technique is a two step procedure. Given an input color image, the first step determines the finite set of its extreme pixel vectors within the color cube by means of the scaled min-W and max-M lattice auto-associative memory matrices, including the minimum and maximum vector bounds. In the second step, maximal rectangular boxes enclosing each extreme color pixel are found using the Chebychev distance between color pixels; afterwards, clustering is performed by assigning each image pixel to its corresponding maximal box. The two steps in our proposed method are completely unsupervised or autonomous. Illustrative examples are provided to demonstrate the color segmentation results including a brief numerical comparison with two other non-maximal variations of the same clustering technique.
Consensus-Based Sorting of Neuronal Spike Waveforms
Fournier, Julien; Mueller, Christian M.; Shein-Idelson, Mark; Hemberger, Mike
2016-01-01
Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained “ground truth” data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data. PMID:27536990
Consensus-Based Sorting of Neuronal Spike Waveforms.
Fournier, Julien; Mueller, Christian M; Shein-Idelson, Mark; Hemberger, Mike; Laurent, Gilles
2016-01-01
Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained "ground truth" data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data.
Generalized Analysis Tools for Multi-Spacecraft Missions
NASA Astrophysics Data System (ADS)
Chanteur, G. M.
2011-12-01
Analysis tools for multi-spacecraft missions like CLUSTER or MMS have been designed since the end of the 90's to estimate gradients of fields or to characterize discontinuities crossed by a cluster of spacecraft. Different approaches have been presented and discussed in the book "Analysis Methods for Multi-Spacecraft Data" published as Scientific Report 001 of the International Space Science Institute in Bern, Switzerland (G. Paschmann and P. Daly Eds., 1998). On one hand the approach using methods of least squares has the advantage to apply to any number of spacecraft [1] but is not convenient to perform analytical computation especially when considering the error analysis. On the other hand the barycentric approach is powerful as it provides simple analytical formulas involving the reciprocal vectors of the tetrahedron [2] but appears limited to clusters of four spacecraft. Moreover the barycentric approach allows to derive theoretical formulas for errors affecting the estimators built from the reciprocal vectors [2,3,4]. Following a first generalization of reciprocal vectors proposed by Vogt et al [4] and despite the present lack of projects with more than four spacecraft we present generalized reciprocal vectors for a cluster made of any number of spacecraft : each spacecraft is given a positive or nul weight. The non-coplanarity of at least four spacecraft with strictly positive weights is a necessary and sufficient condition for this analysis to be enabled. Weights given to spacecraft allow to minimize the influence of some spacecraft if its location or the quality of its data are not appropriate, or simply to extract subsets of spacecraft from the cluster. Estimators presented in [2] are generalized within this new frame except for the error analysis which is still under investigation. References [1] Harvey, C. C.: Spatial Gradients and the Volumetric Tensor, in: Analysis Methods for Multi-Spacecraft Data, G. Paschmann and P. Daly (eds.), pp. 307-322, ISSI SR-001, 1998. [2] Chanteur, G.: Spatial Interpolation for Four Spacecraft: Theory, in: Analysis Methods for Multi-Spacecraft Data, G. Paschmann and P. Daly (eds.), pp. 371-393, ISSI SR-001, 1998. [3] Chanteur, G.: Accuracy of field gradient estimations by Cluster: Explanation of its dependency upon elongation and planarity of the tetrahedron, pp. 265-268, ESA SP-449, 2000. [4] Vogt, J., Paschmann, G., and Chanteur, G.: Reciprocal Vectors, pp. 33-46, ISSI SR-008, 2008.
Real-data comparison of data mining methods in prediction of diabetes in iran.
Tapak, Lily; Mahjub, Hossein; Hamidi, Omid; Poorolajal, Jalal
2013-09-01
Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-mean, and random forests) to classify persons with and without diabetes. The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance obtained through a cross-sectional survey. The obtained sample was based on cluster sampling of the Iran population which was conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors that are commonly associated with diabetes were selected to compare the performance of six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve criteria. Support vector machines showed the highest total accuracy (0.986) as well as area under the ROC (0.979). Also, this method showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy of more than 85%, but for all methods, the sensitivity values were very low (less than 0.350). The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.
Enhancers Are Major Targets for Murine Leukemia Virus Vector Integration
De Ravin, Suk See; Su, Ling; Theobald, Narda; Choi, Uimook; Macpherson, Janet L.; Poidinger, Michael; Symonds, Geoff; Pond, Susan M.; Ferris, Andrea L.; Hughes, Stephen H.
2014-01-01
ABSTRACT Retroviral vectors have been used in successful gene therapies. However, in some patients, insertional mutagenesis led to leukemia or myelodysplasia. Both the strong promoter/enhancer elements in the long terminal repeats (LTRs) of murine leukemia virus (MLV)-based vectors and the vector-specific integration site preferences played an important role in these adverse clinical events. MLV integration is known to prefer regions in or near transcription start sites (TSS). Recently, BET family proteins were shown to be the major cellular proteins responsible for targeting MLV integration. Although MLV integration sites are significantly enriched at TSS, only a small fraction of the MLV integration sites (<15%) occur in this region. To resolve this apparent discrepancy, we created a high-resolution genome-wide integration map of more than one million integration sites from CD34+ hematopoietic stem cells transduced with a clinically relevant MLV-based vector. The integration sites form ∼60,000 tight clusters. These clusters comprise ∼1.9% of the genome. The vast majority (87%) of the integration sites are located within histone H3K4me1 islands, a hallmark of enhancers. The majority of these clusters also have H3K27ac histone modifications, which mark active enhancers. The enhancers of some oncogenes, including LMO2, are highly preferred targets for integration without in vivo selection. IMPORTANCE We show that active enhancer regions are the major targets for MLV integration; this means that MLV preferentially integrates in regions that are favorable for viral gene expression in a variety of cell types. The results provide insights for MLV integration target site selection and also explain the high risk of insertional mutagenesis that is associated with gene therapy trials using MLV vectors. PMID:24501411
Webs on surfaces, rings of invariants, and clusters.
Fomin, Sergey; Pylyavskyy, Pavlo
2014-07-08
We construct and study cluster algebra structures in rings of invariants of the special linear group action on collections of 3D vectors, covectors, and matrices. The construction uses Kuperberg's calculus of webs on marked surfaces with boundary.
Improved Correction of Atmospheric Pressure Data Obtained by Smartphones through Machine Learning
Kim, Yong-Hyuk; Ha, Ji-Hun; Kim, Na-Young; Im, Hyo-Hyuc; Sim, Sangjin; Choi, Reno K. Y.
2016-01-01
A correction method using machine learning aims to improve the conventional linear regression (LR) based method for correction of atmospheric pressure data obtained by smartphones. The method proposed in this study conducts clustering and regression analysis with time domain classification. Data obtained in Gyeonggi-do, one of the most populous provinces in South Korea surrounding Seoul with the size of 10,000 km2, from July 2014 through December 2014, using smartphones were classified with respect to time of day (daytime or nighttime) as well as day of the week (weekday or weekend) and the user's mobility, prior to the expectation-maximization (EM) clustering. Subsequently, the results were analyzed for comparison by applying machine learning methods such as multilayer perceptron (MLP) and support vector regression (SVR). The results showed a mean absolute error (MAE) 26% lower on average when regression analysis was performed through EM clustering compared to that obtained without EM clustering. For machine learning methods, the MAE for SVR was around 31% lower for LR and about 19% lower for MLP. It is concluded that pressure data from smartphones are as good as the ones from national automatic weather station (AWS) network. PMID:27524999
Quintero, Juliana; García-Betancourt, Tatiana; Cortés, Sebastian; García, Diana; Alcalá, Lucas; González-Uribe, Catalina; Brochero, Helena; Carrasquilla, Gabriel
2015-02-01
Long-lasting insecticide-treated net (LLIN) window and door curtains alone or in combination with LLIN water container covers were analysed regarding effectiveness in reducing dengue vector density, and feasibility of the intervention. A cluster randomised trial was conducted in an urban area of Colombia comparing 10 randomly selected control and 10 intervention clusters. In control clusters, routine vector control activities were performed. The intervention delivered first, LLIN curtains (from July to August 2013) and secondly, water container covers (from October to March 2014). Cross-sectional entomological surveys were carried out at baseline (February 2013 to June 2013), 9 weeks after the first intervention (August to October 2013), and 4-6 weeks after the second intervention (March to April 2014). Curtains were installed in 922 households and water container covers in 303 households. The Breteau index (BI) fell from 14 to 6 in the intervention group and from 8 to 5 in the control group. The additional intervention with LLIN covers for water containers showed a significant reduction in pupae per person index (PPI) (p=0.01). In the intervention group, the PPI index showed a clear decline of 71% compared with 25% in the control group. Costs were high but options for cost savings were identified. Short term impact evaluation indicates that the intervention package can reduce dengue vector density but sustained effect will depend on multiple factors. © The author 2015. The World Health Organization has granted Oxford University Press permission for the reproduction of this article.
The Spin Vector of (832) Karin
NASA Astrophysics Data System (ADS)
Slivan, Stephen M.; Molnar, L. A.
2010-10-01
We observed rotation lightcurves of Koronis family and Karin cluster member (832) Karin during its four consecutive apparitions in 2006-2009, and combined the new observations with previously published lightcurves to determine its spin vector orientation and preliminary model shape. Karin is a prograde rotator with a period of 18.352 h, spin obliquity near 41°, and pole ecliptic longitude near either 51° or 228°. Although the two ambiguous pole solutions are near the clustered pole solutions of four Koronis family members whose spins are thought to be trapped in a spin-orbit resonance (Vokrouhlický et al., 2003), Karin does not seem to be trapped in the resonance; this is consistent with the expectation that the 6 My age of Karin (Nesvorný et al., 2002) is too young for YORP torques to have modified its spin since its formation. The spin vector and shape results for Karin will constrain family formation models that include spin properties, and we discuss the Karin results in the context of the other members of the Karin cluster, the Karin parent body, and the parent body's siblings in the Koronis family.
Emerson, Paul M; Lindsay, Steve W; Walraven, Gijs E L; Dibba, Sheikh Mafuji; Lowe, Kebba O; Bailey, Robin L
2002-04-01
The Flies and Eyes project is a community-based, cluster-randomised, intervention trial based in a rural area of The Gambia. It was designed to prove whether flies are mechanical vectors of trachoma; to quantify the relative importance of flies as vectors of trachoma and to test the effectiveness of insecticide spraying and the provision of latrines in trachoma control. A total of 21 clusters, each composed of 300-550 people, are to be recruited in groups of three. One cluster from each group is randomly allocated to receive insecticide spraying, one to receive pit latrines and the remaining to act as a control. The seven groups of clusters are recruited on a step-wise basis separated by two months to aid logistics and allow all seasons to be covered. Standardised, validated trachoma surveys are conducted for people of all ages and both sexes at baseline and six months post intervention. The Muscid fly population is monitored using standard traps and fly-eye contact is measured with catches of flies direct from children's faces. The Flies and Eyes project has been designed to strengthen the evidence base for the 'E' component of the SAFE strategy for trachoma control. The results will assist programme planners and country co-ordinators to make informed decisions on the environmental aspects of trachoma control.
Is it worth changing pattern recognition methods for structural health monitoring?
NASA Astrophysics Data System (ADS)
Bull, L. A.; Worden, K.; Cross, E. J.; Dervilis, N.
2017-05-01
The key element of this work is to demonstrate alternative strategies for using pattern recognition algorithms whilst investigating structural health monitoring. This paper looks to determine if it makes any difference in choosing from a range of established classification techniques: from decision trees and support vector machines, to Gaussian processes. Classification algorithms are tested on adjustable synthetic data to establish performance metrics, then all techniques are applied to real SHM data. To aid the selection of training data, an informative chain of artificial intelligence tools is used to explore an active learning interaction between meaningful clusters of data.
Research on bearing fault diagnosis of large machinery based on mathematical morphology
NASA Astrophysics Data System (ADS)
Wang, Yu
2018-04-01
To study the automatic diagnosis of large machinery fault based on support vector machine, combining the four common faults of the large machinery, the support vector machine is used to classify and identify the fault. The extracted feature vectors are entered. The feature vector is trained and identified by multi - classification method. The optimal parameters of the support vector machine are searched by trial and error method and cross validation method. Then, the support vector machine is compared with BP neural network. The results show that the support vector machines are short in time and high in classification accuracy. It is more suitable for the research of fault diagnosis in large machinery. Therefore, it can be concluded that the training speed of support vector machines (SVM) is fast and the performance is good.
Vector nature of multi-soliton patterns in a passively mode-locked figure-eight fiber laser.
Ning, Qiu-Yi; Liu, Hao; Zheng, Xu-Wu; Yu, Wei; Luo, Ai-Ping; Huang, Xu-Guang; Luo, Zhi-Chao; Xu, Wen-Cheng; Xu, Shan-Hui; Yang, Zhong-Min
2014-05-19
The vector nature of multi-soliton dynamic patterns was investigated in a passively mode-locked figure-eight fiber laser based on the nonlinear amplifying loop mirror (NALM). By properly adjusting the cavity parameters such as the pump power level and intra-cavity polarization controllers (PCs), in addition to the fundamental vector soliton, various vector multi-soliton regimes were observed, such as the random static distribution of vector multiple solitons, vector soliton cluster, vector soliton flow, and the state of vector multiple solitons occupying the whole cavity. Both the polarization-locked vector solitons (PLVSs) and the polarization-rotating vector solitons (PRVSs) were observed for fundamental soliton and each type of multi-soliton patterns. The obtained results further reveal the fundamental physics of multi-soliton patterns and demonstrate that the figure-eight fiber lasers are indeed a good platform for investigating the vector nature of different soliton types.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Constructing storyboards based on hierarchical clustering analysis
NASA Astrophysics Data System (ADS)
Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu
2005-07-01
There are growing needs for quick preview of video contents for the purpose of improving accessibility of video archives as well as reducing network traffics. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoid a repetition of computationally-intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.
NASA Astrophysics Data System (ADS)
Eberle, Detlef G.; Daudi, Elias X. F.; Muiuane, Elônio A.; Nyabeze, Peter; Pontavida, Alfredo M.
2012-01-01
The National Geology Directorate of Mozambique (DNG) and Maputo-based Eduardo-Mondlane University (UEM) entered a joint venture with the South African Council for Geoscience (CGS) to conduct a case study over the meso-Proterozoic Alto Ligonha pegmatite field in the Zambézia Province of northeastern Mozambique to support the local exploration and mining sectors. Rare-metal minerals, i.e. tantalum and niobium, as well as rare-earth minerals have been mined in the Alto Ligonha pegmatite field since decades, but due to the civil war (1977-1992) production nearly ceased. The Government now strives to promote mining in the region as contribution to poverty alleviation. This study was undertaken to facilitate the extraction of geological information from the high resolution airborne magnetic and radiometric data sets recently acquired through a World Bank funded survey and mapping project. The aim was to generate a value-added map from the airborne geophysical data that is easier to read and use by the exploration and mining industries than mere airborne geophysical grid data or maps. As a first step towards clustering, thorium (Th) and potassium (K) concentrations were determined from the airborne geophysical data as well as apparent magnetic susceptibility and first vertical magnetic gradient data. These four datasets were projected onto a 100 m spaced regular grid to assemble 850,000 four-element (multivariate) sample vectors over the study area. Classification of the sample vectors using crisp clustering based upon the Euclidian distance between sample and class centre provided a (pseudo-) geology map or value-added map, respectively, displaying the spatial distribution of six different classes in the study area. To learn the quality of sample allocation, the degree of membership of each sample vector was determined using a-posterior discriminant analysis. Geophysical ground truth control was essential to allocate geology/geophysical attributes to the six classes. The highest probability to meet pegmatite bodies is in close vicinity to (magnetic) amphibole schist occurring in areas where depletion of potassium as indication of metasomatic processes is evident from the airborne radiometric data. Clustering has proven to be a fast and effective method to compile value-added maps from multivariate geophysical datasets. Experience made in the Alto Ligonha pegmatite field encourages adopting this new methodology for mapping other parts of the Mozambique Fold Belt.
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Yoshioka, Kota; Tercero, Doribel; Pérez, Byron; Nakamura, Jiro; Pérez, Lenin
2017-03-06
Chagas disease is one of the neglected tropical diseases (NTDs). International goals for its control involve elimination of vector-borne transmission. Central American countries face challenges in establishing sustainable vector control programmes, since the main vector, Triatoma dimidiata, cannot be eliminated. In 2012, the Ministry of Health in Nicaragua started a field test of a vector surveillance-response system to control domestic vector infestation. This paper reports the main findings from this pilot study. This study was carried out from 2012 to 2015 in the Municipality of Totogalpa. The Japan International Cooperation Agency provided technical cooperation in designing and monitoring the surveillance-response system until 2014. This system involved 1) vector reports by householders to health facilities, 2) data analysis and planning of responses at the municipal health centre and 3) house visits or insecticide spraying by health personnel as a response. We registered all vector reports and responses in a digital database. The collected data were used to describe and analyse the system performance in terms of amount of vector reports as well as rates and timeliness of responses. During the study period, T. dimidiata was reported 396 times. Spatiotemporal analysis identified some high-risk clusters. All houses reported to be infested were visited by health personnel in 2013 and this response rate dropped to 39% in 2015. Rates of insecticide spraying rose above 80% in 2013 but no spraying was carried out in the following 2 years. The timeliness of house visits improved significantly after the responsibility was transferred from a vector control technician to primary health care staff. We argue that the proposed vector surveillance-response system is workable within the resource-constrained health system in Nicaragua. Integration to the primary health care services was a key to improve the system performance. Continual efforts are necessary to keep adapting the surveillance-response system to the dynamic health systems. We also discuss that the goal of eliminating vector-borne transmission remains unachievable. This paper provides lessons not only for Chagas disease control in Central America, but also for control efforts for other NTDs that need a sustainable surveillance-response system to support elimination.
Unsupervised Anomaly Detection Based on Clustering and Multiple One-Class SVM
NASA Astrophysics Data System (ADS)
Song, Jungsuk; Takakura, Hiroki; Okabe, Yasuo; Kwon, Yongjin
Intrusion detection system (IDS) has played an important role as a device to defend our networks from cyber attacks. However, since it is unable to detect unknown attacks, i.e., 0-day attacks, the ultimate challenge in intrusion detection field is how we can exactly identify such an attack by an automated manner. Over the past few years, several studies on solving these problems have been made on anomaly detection using unsupervised learning techniques such as clustering, one-class support vector machine (SVM), etc. Although they enable one to construct intrusion detection models at low cost and effort, and have capability to detect unforeseen attacks, they still have mainly two problems in intrusion detection: a low detection rate and a high false positive rate. In this paper, we propose a new anomaly detection method based on clustering and multiple one-class SVM in order to improve the detection rate while maintaining a low false positive rate. We evaluated our method using KDD Cup 1999 data set. Evaluation results show that our approach outperforms the existing algorithms reported in the literature; especially in detection of unknown attacks.
Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun
2016-01-01
The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found. It also provides an effective tool of study and analysis of intrusion detection in large networks. PMID:27754380
Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun
2016-10-13
The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found. It also provides an effective tool of study and analysis of intrusion detection in large networks.
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low accuracy of prediction in the positive samples and poor overall classification effects caused by unbalanced sample data of MicroRNA (miRNA) target, we proposes a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm in this paper, an under-sampling based on the ensemble learning algorithm. The algorithm adopts SVM as learning algorithm and AdaBoost as integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates the abnormal ones in negative samples with robust sample weights smoothing mechanism so as to avoid over-learning. Finally, the prediction of miRNA target integrated classifier is achieved with the combination of multiple weak classifiers through the voting mechanism. The experiment revealed that the SVM-IUSW, compared with other algorithms on unbalanced dataset collection, could not only improve the accuracy of positive targets and the overall effect of classification, but also enhance the generalization ability of miRNA target classifier.
Harz, M; Rösch, P; Peschke, K-D; Ronneberger, O; Burkhardt, H; Popp, J
2005-11-01
Microbial contamination is not only a medical problem, but also plays a large role in pharmaceutical clean room production and food processing technology. Therefore many techniques were developed to achieve differentiation and identification of microorganisms. Among these methods vibrational spectroscopic techniques (IR, Raman and SERS) are useful tools because of their rapidity and sensitivity. Recently we have shown that micro-Raman spectroscopy in combination with a support vector machine is an extremely capable approach for a fast and reliable, non-destructive online identification of single bacteria belonging to different genera. In order to simulate different environmental conditions we analyzed in this contribution different Staphylococcus strains with varying cultivation conditions in order to evaluate our method with a reliable dataset. First, micro-Raman spectra of the bulk material and single bacterial cells that were grown under the same conditions were recorded and used separately for a distinct chemotaxonomic classification of the strains. Furthermore Raman spectra were recorded from single bacterial cells that were cultured under various conditions to study the influence of cultivation on the discrimination ability. This dataset was analyzed both with a hierarchical cluster analysis (HCA) and a support vector machine (SVM).
Zhang, Hong-Guang; Yang, Qin-Min; Lu, Jian-Gang
2014-04-01
In this paper, a novel discriminant methodology based on near infrared spectroscopic analysis technique and least square support vector machine was proposed for rapid and nondestructive discrimination of different types of Polyacrylamide. The diffuse reflectance spectra of samples of Non-ionic Polyacrylamide, Anionic Polyacrylamide and Cationic Polyacrylamide were measured. Then principal component analysis method was applied to reduce the dimension of the spectral data and extract of the principal compnents. The first three principal components were used for cluster analysis of the three different types of Polyacrylamide. Then those principal components were also used as inputs of least square support vector machine model. The optimization of the parameters and the number of principal components used as inputs of least square support vector machine model was performed through cross validation based on grid search. 60 samples of each type of Polyacrylamide were collected. Thus a total of 180 samples were obtained. 135 samples, 45 samples for each type of Polyacrylamide, were randomly split into a training set to build calibration model and the rest 45 samples were used as test set to evaluate the performance of the developed model. In addition, 5 Cationic Polyacrylamide samples and 5 Anionic Polyacrylamide samples adulterated with different proportion of Non-ionic Polyacrylamide were also prepared to show the feasibilty of the proposed method to discriminate the adulterated Polyacrylamide samples. The prediction error threshold for each type of Polyacrylamide was determined by F statistical significance test method based on the prediction error of the training set of corresponding type of Polyacrylamide in cross validation. The discrimination accuracy of the built model was 100% for prediction of the test set. The prediction of the model for the 10 mixing samples was also presented, and all mixing samples were accurately discriminated as adulterated samples. The overall results demonstrate that the discrimination method proposed in the present paper can rapidly and nondestructively discriminate the different types of Polyacrylamide and the adulterated Polyacrylamide samples, and offered a new approach to discriminate the types of Polyacrylamide.
Statistical downscaling of GCM simulations to streamflow using relevance vector machine
NASA Astrophysics Data System (ADS)
Ghosh, Subimal; Mujumdar, P. P.
2008-01-01
General circulation models (GCMs), the climate models often used in assessing the impact of climate change, operate on a coarse scale and thus the simulation results obtained from GCMs are not particularly useful in a comparatively smaller river basin scale hydrology. The article presents a methodology of statistical downscaling based on sparse Bayesian learning and Relevance Vector Machine (RVM) to model streamflow at river basin scale for monsoon period (June, July, August, September) using GCM simulated climatic variables. NCEP/NCAR reanalysis data have been used for training the model to establish a statistical relationship between streamflow and climatic variables. The relationship thus obtained is used to project the future streamflow from GCM simulations. The statistical methodology involves principal component analysis, fuzzy clustering and RVM. Different kernel functions are used for comparison purpose. The model is applied to Mahanadi river basin in India. The results obtained using RVM are compared with those of state-of-the-art Support Vector Machine (SVM) to present the advantages of RVMs over SVMs. A decreasing trend is observed for monsoon streamflow of Mahanadi due to high surface warming in future, with the CCSR/NIES GCM and B2 scenario.
Hustedt, John; Doum, Dyna; Keo, Vanney; Ly, Sokha; Sam, BunLeng; Chan, Vibol; Alexander, Neal; Bradley, John; Prasetyo, Didot Budi; Rachmat, Agus; Muhammad, Shafique; Lopes, Sergio; Leang, Rithea; Hii, Jeffrey
2017-08-04
Evidence on the effectiveness of low-cost, sustainable, biological vector-control tools for the Aedes mosquitoes is limited. Therefore, the purpose of this trial is to estimate the impact of guppy fish (guppies), in combination with the use of the larvicide pyriproxyfen (Sumilarv® 2MR), and Communication for Behavioral Impact (COMBI) activities to reduce entomological indices in Cambodia. In this cluster randomized controlled, superiority trial, 30 clusters comprising one or more villages each (with approximately 170 households) will be allocated, in a 1:1:1 ratio, to receive either (1) three interventions (guppies, Sumilarv® 2MR, and COMBI activities), (2) two interventions (guppies and COMBI activities), or (3) control (standard vector control). Households will be invited to participate, and entomology surveys among 40 randomly selected households per cluster will be carried out quarterly. The primary outcome will be the population density of adult female Aedes mosquitoes (i.e., number per house) trapped using adult resting collections. Secondary outcome measures will include the House Index, Container Index, Breteau Index, Pupae Per House, Pupae Per Person, mosquito infection rate, guppy fish coverage, Sumilarv® 2MR coverage, and percentage of respondents with knowledge about Aedes mosquitoes causing dengue. In the primary analysis, adult female Aedes density and mosquito infection rates will be aggregated over follow-up time points to give a single rate per cluster. This will be analyzed by negative binomial regression, yielding density ratios. This trial is expected to provide robust estimates of the intervention effect. A rigorous evaluation of these vector-control interventions is vital to developing an evidence-based dengue control strategy and to help direct government resources. Current Controlled Trials, ID: ISRCTN85307778 . Registered on 25 October 2015.
Construction and Evaluation of Novel Rhesus Monkey Adenovirus Vaccine Vectors
Abbink, Peter; Maxfield, Lori F.; Ng'ang'a, David; ...
2014-11-19
Adenovirus vectors are widely used as vaccine candidates for a variety of pathogens, including HIV-1. To date, human and chimpanzee adenoviruses have been explored in detail as vaccine vectors. Furthermore, the phylogeny of human and chimpanzee adenoviruses is overlapping, and preexisting humoral and cellular immunity to both are exhibited in human populations worldwide. More distantly related adenoviruses may therefore offer advantages as vaccine vectors. We describe the primary isolation and vectorization of three novel adenoviruses from rhesus monkeys. The seroprevalence of these novel rhesus monkey adenovirus vectors was extremely low in sub-Saharan Africa human populations, and these vectors proved tomore » have immunogenicity comparable to that of human and chimpanzee adenovirus vaccine vectors in mice. These rhesus monkey adenoviruses phylogenetically clustered with the poorly described adenovirus species G and robustly stimulated innate immune responses. These novel adenoviruses represent a new class of candidate vaccine vectors.« less
Construction and Evaluation of Novel Rhesus Monkey Adenovirus Vaccine Vectors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abbink, Peter; Maxfield, Lori F.; Ng'ang'a, David
Adenovirus vectors are widely used as vaccine candidates for a variety of pathogens, including HIV-1. To date, human and chimpanzee adenoviruses have been explored in detail as vaccine vectors. Furthermore, the phylogeny of human and chimpanzee adenoviruses is overlapping, and preexisting humoral and cellular immunity to both are exhibited in human populations worldwide. More distantly related adenoviruses may therefore offer advantages as vaccine vectors. We describe the primary isolation and vectorization of three novel adenoviruses from rhesus monkeys. The seroprevalence of these novel rhesus monkey adenovirus vectors was extremely low in sub-Saharan Africa human populations, and these vectors proved tomore » have immunogenicity comparable to that of human and chimpanzee adenovirus vaccine vectors in mice. These rhesus monkey adenoviruses phylogenetically clustered with the poorly described adenovirus species G and robustly stimulated innate immune responses. These novel adenoviruses represent a new class of candidate vaccine vectors.« less
2013-01-01
Background Interruption of vector-borne transmission of Trypanosoma cruzi remains an unrealized objective in many Latin American countries. The task of vector control is complicated by the emergence of vector insects in urban areas. Methods Utilizing data from a large-scale vector control program in Arequipa, Peru, we explored the spatial patterns of infestation by Triatoma infestans in an urban and peri-urban landscape. Multilevel logistic regression was utilized to assess the associations between household infestation and household- and locality-level socio-environmental measures. Results Of 37,229 households inspected for infestation, 6,982 (18.8%; 95% CI: 18.4 – 19.2%) were infested by T. infestans. Eighty clusters of infestation were identified, ranging in area from 0.1 to 68.7 hectares and containing as few as one and as many as 1,139 infested households. Spatial dependence between infested households was significant at distances up to 2,000 meters. Household T. infestans infestation was associated with household- and locality-level factors, including housing density, elevation, land surface temperature, and locality type. Conclusions High levels of T. infestans infestation, characterized by spatial heterogeneity, were found across extensive urban and peri-urban areas prior to vector control. Several environmental and social factors, which may directly or indirectly influence the biology and behavior of T. infestans, were associated with infestation. Spatial clustering of infestation in the urban context may both challenge and inform surveillance and control of vector reemergence after insecticide intervention. PMID:24171704
Invariant-feature-based adaptive automatic target recognition in obscured 3D point clouds
NASA Astrophysics Data System (ADS)
Khuon, Timothy; Kershner, Charles; Mattei, Enrico; Alverio, Arnel; Rand, Robert
2014-06-01
Target recognition and classification in a 3D point cloud is a non-trivial process due to the nature of the data collected from a sensor system. The signal can be corrupted by noise from the environment, electronic system, A/D converter, etc. Therefore, an adaptive system with a desired tolerance is required to perform classification and recognition optimally. The feature-based pattern recognition algorithm architecture as described below is particularly devised for solving a single-sensor classification non-parametrically. Feature set is extracted from an input point cloud, normalized, and classifier a neural network classifier. For instance, automatic target recognition in an urban area would require different feature sets from one in a dense foliage area. The figure above (see manuscript) illustrates the architecture of the feature based adaptive signature extraction of 3D point cloud including LIDAR, RADAR, and electro-optical data. This network takes a 3D cluster and classifies it into a specific class. The algorithm is a supervised and adaptive classifier with two modes: the training mode and the performing mode. For the training mode, a number of novel patterns are selected from actual or artificial data. A particular 3D cluster is input to the network as shown above for the decision class output. The network consists of three sequential functional modules. The first module is for feature extraction that extracts the input cluster into a set of singular value features or feature vector. Then the feature vector is input into the feature normalization module to normalize and balance it before being fed to the neural net classifier for the classification. The neural net can be trained by actual or artificial novel data until each trained output reaches the declared output within the defined tolerance. In case new novel data is added after the neural net has been learned, the training is then resumed until the neural net has incrementally learned with the new novel data. The associative memory capability of the neural net enables the incremental learning. The back propagation algorithm or support vector machine can be utilized for the classification and recognition.
The role of research in molecular entomology in the fight against malaria vectors.
della Torre, A; Arca, B; Favia, G; Petrarca, V; Coluzzi, M
2008-06-01
The text summarizes the principal current fields of investigation and the recent achievements of the research groups presently contributing to the Molecular Entomology Cluster of the Italian Malaria Network. Particular emphasis is given to the researches with a more direct impact on the fight against malaria vectors.
Vector dark energy and high-z massive clusters
NASA Astrophysics Data System (ADS)
Carlesi, Edoardo; Knebe, Alexander; Yepes, Gustavo; Gottlöber, Stefan; Jiménez, Jose Beltrán.; Maroto, Antonio L.
2011-12-01
The detection of extremely massive clusters at z > 1 such as SPT-CL J0546-5345, SPT-CL J2106-5844 and XMMU J2235.3-2557 has been considered by some authors as a challenge to the standard Λ cold dark matter cosmology. In fact, assuming Gaussian initial conditions, the theoretical expectation of detecting such objects is as low as ≤1 per cent. In this paper we discuss the probability of the existence of such objects in the light of the vector dark energy paradigm, showing by means of a series of N-body simulations that chances of detection are substantially enhanced in this non-standard framework.
An adaptive data-driven method for accurate prediction of remaining useful life of rolling bearings
NASA Astrophysics Data System (ADS)
Peng, Yanfeng; Cheng, Junsheng; Liu, Yanfei; Li, Xuejun; Peng, Zhihua
2018-06-01
A novel data-driven method based on Gaussian mixture model (GMM) and distance evaluation technique (DET) is proposed to predict the remaining useful life (RUL) of rolling bearings. The data sets are clustered by GMM to divide all data sets into several health states adaptively and reasonably. The number of clusters is determined by the minimum description length principle. Thus, either the health state of the data sets or the number of the states is obtained automatically. Meanwhile, the abnormal data sets can be recognized during the clustering process and removed from the training data sets. After obtaining the health states, appropriate features are selected by DET for increasing the classification and prediction accuracy. In the prediction process, each vibration signal is decomposed into several components by empirical mode decomposition. Some common statistical parameters of the components are calculated first and then the features are clustered using GMM to divide the data sets into several health states and remove the abnormal data sets. Thereafter, appropriate statistical parameters of the generated components are selected using DET. Finally, least squares support vector machine is utilized to predict the RUL of rolling bearings. Experimental results indicate that the proposed method reliably predicts the RUL of rolling bearings.
A Human Activity Recognition System Using Skeleton Data from RGBD Sensors.
Cippitelli, Enea; Gasparrini, Samuele; Gambi, Ennio; Spinsante, Susanna
2016-01-01
The aim of Active and Assisted Living is to develop tools to promote the ageing in place of elderly people, and human activity recognition algorithms can help to monitor aged people in home environments. Different types of sensors can be used to address this task and the RGBD sensors, especially the ones used for gaming, are cost-effective and provide much information about the environment. This work aims to propose an activity recognition algorithm exploiting skeleton data extracted by RGBD sensors. The system is based on the extraction of key poses to compose a feature vector, and a multiclass Support Vector Machine to perform classification. Computation and association of key poses are carried out using a clustering algorithm, without the need of a learning algorithm. The proposed approach is evaluated on five publicly available datasets for activity recognition, showing promising results especially when applied for the recognition of AAL related actions. Finally, the current applicability of this solution in AAL scenarios and the future improvements needed are discussed.
Phylogeny of triatomine vectors of Trypanosoma cruzi suggested by mitochondrial DNA sequences.
Sainz, Andrés C; Mauro, Laura V; Moriyama, Etsuko N; García, Beatriz A
2004-07-01
The subfamily Triatominae (Hemiptera: Reduviidae) comprises hematophagous insects, most of which are actual or potential vectors of Trypanosoma cruzi, the protozoan agent of Chagas' disease (American trypanosomiasis). DNA sequence comparisons of mitochondrial DNA (mtDNA) genes were used to infer phylogenetic relationships among 32 species of the subfamily Triatominae, 26 belonging to the genus Triatoma and six species of different genera. We analyzed mtDNA fragments of the 12S and 16S ribosomal RNA genes (totaling 848-851 bp) from each of the 32 species, as well as of the cytochrome oxidase I (COI, 1447 bp) gene from nine. The phylogenetic analyses unambiguously supported several clusters within the genus Triatoma. In the morphological classification, T. costalimai was placed tentatively within the infestans complex while T. guazu was not included in any Triatoma complex. The placement of these species in the molecular phylogeny indicated that both belong to the infestans complex. We confirmed with a strong support the inclusion of T. circummaculata, a member of a different complex based on morphology, within the infestans complex. On the other hand, the present phylogenetics analysis did not support the monophyly of the infestans complex species as it was suggested in our previous studies. While no strong inference of polyphyly of the genus Triatoma was provided by the bootstrap analyses, the other species belonging to Triatomini analyzed could not be distinguished from the species of Triatoma.
Load forecast method of electric vehicle charging station using SVR based on GA-PSO
NASA Astrophysics Data System (ADS)
Lu, Kuan; Sun, Wenxue; Ma, Changhui; Yang, Shenquan; Zhu, Zijian; Zhao, Pengfei; Zhao, Xin; Xu, Nan
2017-06-01
This paper presents a Support Vector Regression (SVR) method for electric vehicle (EV) charging station load forecast based on genetic algorithm (GA) and particle swarm optimization (PSO). Fuzzy C-Means (FCM) clustering is used to establish similar day samples. GA is used for global parameter searching and PSO is used for a more accurately local searching. Load forecast is then regressed using SVR. The practical load data of an EV charging station were taken to illustrate the proposed method. The result indicates an obvious improvement in the forecasting accuracy compared with SVRs based on PSO and GA exclusively.
Basso, César; García da Rosa, Elsa; Romero, Sonnia; González, Cristina; Lairihoy, Rosario; Roche, Ingrid; Caffera, Ruben M; da Rosa, Ricardo; Calfani, Marisel; Alfonso-Sierra, Eduardo; Petzold, Max; Kroeger, Axel; Sommerfeld, Johannes
2015-02-01
Uruguay is located at the southern border of Aedes aegypti distribution on the South American sub-continent. The reported dengue cases in the country are all imported from surrounding countries. One of the cities at higher risk of local dengue transmission is Salto, a border city with heavy traffic from dengue endemic areas. We completed an intervention study using a cluster randomized trial design in 20 randomly selected 'clusters' in Salto. The clusters were located in neighborhoods of differing geography and economic, cultural and social aspects. Entomological surveys were carried out to measure the impact of the intervention on vector densities. Through participatory processes of all stakeholders, an appropriate ecosystem management intervention was defined. Residents collected the abundant small water holding containers and the Ministry of Public Health and the Municipality of Salto were responsible for collecting and eliminating them. Additional vector breeding places were large water tanks; they were either altered so that they could not hold water any more or covered so that oviposition by mosquitoes could not take place. The response from the community and national programme managers was encouraging. The intervention evidenced opportunities for cost savings and reducing dengue vector densities (although not to statistically significant levels). The observed low vector density limits the potential reduction due to the intervention. A larger sample size is needed to obtain a statistically significant difference. © The author 2015. The World Health Organization has granted Oxford University Press permission for the reproduction of this article.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.
Riemannian multi-manifold modeling and clustering in brain networks
NASA Astrophysics Data System (ADS)
Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.
2017-08-01
This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brainnetwork time-series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points into the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positivedefinite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.
A hybrid clustering and classification approach for predicting crash injury severity on rural roads.
Hasheminejad, Seyed Hessam-Allah; Zahedi, Mohsen; Hasheminejad, Seyed Mohammad Hossein
2018-03-01
As a threat for transportation system, traffic crashes have a wide range of social consequences for governments. Traffic crashes are increasing in developing countries and Iran as a developing country is not immune from this risk. There are several researches in the literature to predict traffic crash severity based on artificial neural networks (ANNs), support vector machines and decision trees. This paper attempts to investigate the crash injury severity of rural roads by using a hybrid clustering and classification approach to compare the performance of classification algorithms before and after applying the clustering. In this paper, a novel rule-based genetic algorithm (GA) is proposed to predict crash injury severity, which is evaluated by performance criteria in comparison with classification algorithms like ANN. The results obtained from analysis of 13,673 crashes (5600 property damage, 778 fatal crashes, 4690 slight injuries and 2605 severe injuries) on rural roads in Tehran Province of Iran during 2011-2013 revealed that the proposed GA method outperforms other classification algorithms based on classification metrics like precision (86%), recall (88%) and accuracy (87%). Moreover, the proposed GA method has the highest level of interpretation, is easy to understand and provides feedback to analysts.
A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Liu, Jingli; Li, Jianping; Xu, Weixuan; Shi, Yong
Least squares support vector machine (LS-SVM) is a revised version of support vector machine (SVM) and has been proved to be a useful tool for pattern recognition. LS-SVM had excellent generalization performance and low computational cost. In this paper, we propose a new method called two-layer least squares support vector machine which combines kernel principle component analysis (KPCA) and linear programming form of least square support vector machine. With this method sparseness and robustness is obtained while solving large dimensional and large scale database. A U.S. commercial credit card database is used to test the efficiency of our method and the result proved to be a satisfactory one.
Nanthagopal, A Padma; Rajamony, R Sukanesh
2012-07-01
The proposed system provides new textural information for segmenting tumours, efficiently and accurately and with less computational time, from benign and malignant tumour images, especially in smaller dimensions of tumour regions of computed tomography (CT) images. Region-based segmentation of tumour from brain CT image data is an important but time-consuming task performed manually by medical experts. The objective of this work is to segment brain tumour from CT images using combined grey and texture features with new edge features and nonlinear support vector machine (SVM) classifier. The selected optimal features are used to model and train the nonlinear SVM classifier to segment the tumour from computed tomography images and the segmentation accuracies are evaluated for each slice of the tumour image. The method is applied on real data of 80 benign, malignant tumour images. The results are compared with the radiologist labelled ground truth. Quantitative analysis between ground truth and the segmented tumour is presented in terms of segmentation accuracy and the overlap similarity measure dice metric. From the analysis and performance measures such as segmentation accuracy and dice metric, it is inferred that better segmentation accuracy and higher dice metric are achieved with the normalized cut segmentation method than with the fuzzy c-means clustering method.
NASA Technical Reports Server (NTRS)
Kocurek, Michael J.
2005-01-01
The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.
Encoding the local connectivity patterns of fMRI for cognitive task and state classification.
Onal Ertugrul, Itir; Ozay, Mete; Yarman Vural, Fatos T
2018-06-15
In this work, we propose a novel framework to encode the local connectivity patterns of brain, using Fisher vectors (FV), vector of locally aggregated descriptors (VLAD) and bag-of-words (BoW) methods. We first obtain local descriptors, called mesh arc descriptors (MADs) from fMRI data, by forming local meshes around anatomical regions, and estimating their relationship within a neighborhood. Then, we extract a dictionary of relationships, called brain connectivity dictionary by fitting a generative Gaussian mixture model (GMM) to a set of MADs, and selecting codewords at the mean of each component of the mixture. Codewords represent connectivity patterns among anatomical regions. We also encode MADs by VLAD and BoW methods using k-Means clustering. We classify cognitive tasks using the Human Connectome Project (HCP) task fMRI dataset and cognitive states using the Emotional Memory Retrieval (EMR). We train support vector machines (SVMs) using the encoded MADs. Results demonstrate that, FV encoding of MADs can be successfully employed for classification of cognitive tasks, and outperform VLAD and BoW representations. Moreover, we identify the significant Gaussians in mixture models by computing energy of their corresponding FV parts, and analyze their effect on classification accuracy. Finally, we suggest a new method to visualize the codewords of the learned brain connectivity dictionary.
A new method to cluster genomes based on cumulative Fourier power spectrum.
Dong, Rui; Zhu, Ziyue; Yin, Changchuan; He, Rong L; Yau, Stephen S-T
2018-06-20
Analyzing phylogenetic relationships using mathematical methods has always been of importance in bioinformatics. Quantitative research may interpret the raw biological data in a precise way. Multiple Sequence Alignment (MSA) is used frequently to analyze biological evolutions, but is very time-consuming. When the scale of data is large, alignment methods cannot finish calculation in reasonable time. Therefore, we present a new method using moments of cumulative Fourier power spectrum in clustering the DNA sequences. Each sequence is translated into a vector in Euclidean space. Distances between the vectors can reflect the relationships between sequences. The mapping between the spectra and moment vector is one-to-one, which means that no information is lost in the power spectra during the calculation. We cluster and classify several datasets including Influenza A, primates, and human rhinovirus (HRV) datasets to build up the phylogenetic trees. Results show that the new proposed cumulative Fourier power spectrum is much faster and more accurately than MSA and another alignment-free method known as k-mer. The research provides us new insights in the study of phylogeny, evolution, and efficient DNA comparison algorithms for large genomes. The computer programs of the cumulative Fourier power spectrum are available at GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum). Copyright © 2018. Published by Elsevier B.V.
Comparison of organs' shapes with geometric and Zernike 3D moments.
Broggio, D; Moignier, A; Ben Brahim, K; Gardumi, A; Grandgirard, N; Pierrat, N; Chea, M; Derreumaux, S; Desbrée, A; Boisserie, G; Aubert, B; Mazeron, J-J; Franck, D
2013-09-01
The morphological similarity of organs is studied with feature vectors based on geometric and Zernike 3D moments. It is particularly investigated if outliers and average models can be identified. For this purpose, the relative proximity to the mean feature vector is defined, principal coordinate and clustering analyses are also performed. To study the consistency and usefulness of this approach, 17 livers and 76 hearts voxel models from several sources are considered. In the liver case, models with similar morphological feature are identified. For the limited amount of studied cases, the liver of the ICRP male voxel model is identified as a better surrogate than the female one. For hearts, the clustering analysis shows that three heart shapes represent about 80% of the morphological variations. The relative proximity and clustering analysis rather consistently identify outliers and average models. For the two cases, identification of outliers and surrogate of average models is rather robust. However, deeper classification of morphological feature is subject to caution and can only be performed after cross analysis of at least two kinds of feature vectors. Finally, the Zernike moments contain all the information needed to re-construct the studied objects and thus appear as a promising tool to derive statistical organ shapes. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Elastic K-means using posterior probability.
Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris
2017-01-01
The widely used K-means clustering is a hard clustering algorithm. Here we propose a Elastic K-means clustering model (EKM) using posterior probability with soft capability where each data point can belong to multiple clusters fractionally and show the benefit of proposed Elastic K-means. Furthermore, in many applications, besides vector attributes information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several useful matrix inequalities which are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of proposed EKM and its integrated model.
Classifying epileptic EEG signals with delay permutation entropy and Multi-Scale K-means.
Zhu, Guohun; Li, Yan; Wen, Peng Paul; Wang, Shuaifang
2015-01-01
Most epileptic EEG classification algorithms are supervised and require large training datasets, that hinder their use in real time applications. This chapter proposes an unsupervised Multi-Scale K-means (MSK-means) MSK-means algorithm to distinguish epileptic EEG signals and identify epileptic zones. The random initialization of the K-means algorithm can lead to wrong clusters. Based on the characteristics of EEGs, the MSK-means MSK-means algorithm initializes the coarse-scale centroid of a cluster with a suitable scale factor. In this chapter, the MSK-means algorithm is proved theoretically superior to the K-means algorithm on efficiency. In addition, three classifiers: the K-means, MSK-means MSK-means and support vector machine (SVM), are used to identify seizure and localize epileptogenic zone using delay permutation entropy features. The experimental results demonstrate that identifying seizure with the MSK-means algorithm and delay permutation entropy achieves 4. 7 % higher accuracy than that of K-means, and 0. 7 % higher accuracy than that of the SVM.
Joint Spatial-Spectral Feature Space Clustering for Speech Activity Detection from ECoG Signals
Kanas, Vasileios G.; Mporas, Iosif; Benz, Heather L.; Sgarbas, Kyriakos N.; Bezerianos, Anastasios; Crone, Nathan E.
2014-01-01
Brain machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines (SVM) as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and non-speech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllable repetition tasks and may contribute to the development of portable ECoG-based communication. PMID:24658248
Entomologic and molecular investigation into Plasmodium vivax transmission in Singapore, 2009.
Ng, Lee-Ching; Lee, Kim-Sung; Tan, Cheong-Huat; Ooi, Peng-Lim; Lam-Phua, Sai-Gek; Lin, Raymond; Pang, Sook-Cheng; Lai, Yee-Ling; Solhan, Suhana; Chan, Pei-Pei; Wong, Kit-Yin; Ho, Swee-Tuan; Vythilingam, Indra
2010-10-29
Singapore has been certified malaria free since November 1982 by the World Health Organization and despite occasional local transmission, the country has maintained the standing. In 2009, three clusters of malaria cases were reported in Singapore. Epidemiological, entomological and molecular studies were carried out to investigate the three clusters, namely Mandai-Sungei Kadut, Jurong Island and Sembawang. A total of 29 malaria patients, with no recent travel history, were reported in the three clusters. Molecular analysis based on the msp3α and msp1 genes showed two independent local transmissions: one in Mandai-Sungei Kadut and another in Sembawang. Almost all cases within each cluster were epidemiologically linked. In Jurong Island cluster, epidemiological link remains uncertain, as almost all cases had a unique genetic profile. Only two cases shared a common profile and were found to be linked to the Mandai-Sungei Kadut cluster. Entomological investigation found Anopheles sinensis to be the predominant Anopheline in the two areas where local transmission of P. vivax was confirmed. Anopheles sinensis was found to be attracted to human bait and bites as early as 19:45 hrs. However, all Anopheles mosquitoes caught were negative for sporozoites and oocysts by dissection. Investigation of P. vivax cases from the three cluster areas confirmed the occurrence of local transmission in two areas. Although An. sinensis was the predominant Anopheline found in areas with confirmed transmission, the vector/s responsible for the outbreaks still remains cryptic.
Automatic document classification of biological literature
Chen, David; Müller, Hans-Michael; Sternberg, Paul W
2006-01-01
Background Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test-set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusion We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. PMID:16893465
Ground states of larger nuclei
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pieper, S.C.; Wiringa, R.B.; Pandharipande, V.R.
1995-08-01
The methods used for the few-body nuclei require operations on the complete spin-isospin vector; the size of this vector makes such methods impractical for nuclei with A > 8. During the last few years we developed cluster expansion methods that do not require operations on the complete vector. We use the same Hamiltonians as for the few-body nuclei and variational wave functions of form similar to the few-body wave functions. The cluster expansions are made for the noncentral parts of the wave functions and for the operators whose expectation values are being evaluated. The central pair correlations in the wavemore » functions are treated exactly and this requires the evaluation of 3A-dimensional integrals which are done with Monte Carlo techniques. Most of our effort was on {sup 16}O, other p-shell nuclei, and {sup 40}Ca. In 1993 the Mathematics and Computer Science Division acquired a 128-processor IBM SP which has a theoretical peak speed of 16 Gigaflops (GFLOPS). We converted our program to run on this machine. Because of the large memory on each node of the SP, it was easy to convert the program to parallel form with very low communication overhead. Considerably more effort was needed to restructure the program from one oriented towards long vectors for the Cray computers at NERSC to one that makes efficient use of the cache of the RS6000 architecture. The SP made possible complete five-body cluster calculations of {sup 16}O for the first time; previously we could only do four-body cluster calculations. These calculations show that the expectation value of the two-body potential is converging less rapidly than we had thought, while that of the three-body potential is more rapidly convergent; the net result is no significant change to our predicted binding energy for {sup 16}O using the new Argonne v{sub 18} potential and the Urbana IX three-nucleon potential. This result is in good agreement with experiment.« less
Removal of impulse noise clusters from color images with local order statistics
NASA Astrophysics Data System (ADS)
Ruchay, Alexey; Kober, Vitaly
2017-09-01
This paper proposes a novel algorithm for restoring images corrupted with clusters of impulse noise. The noise clusters often occur when the probability of impulse noise is very high. The proposed noise removal algorithm consists of detection of bulky impulse noise in three color channels with local order statistics followed by removal of the detected clusters by means of vector median filtering. With the help of computer simulation we show that the proposed algorithm is able to effectively remove clustered impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of common successful algorithms.
MCMC Sampling for a Multilevel Model with Nonindependent Residuals within and between Cluster Units
ERIC Educational Resources Information Center
Browne, William; Goldstein, Harvey
2010-01-01
In this article, we discuss the effect of removing the independence assumptions between the residuals in two-level random effect models. We first consider removing the independence between the Level 2 residuals and instead assume that the vector of all residuals at the cluster level follows a general multivariate normal distribution. We…
Zhang, Bo; Zhang, Lin; Dai, Ruixue; Yu, Meiying; Zhao, Guoping; Ding, Xiaoming
2013-01-01
Streptomyces bacteria are known for producing important natural compounds by secondary metabolism, especially antibiotics with novel biological activities. Functional studies of antibiotic-biosynthesizing gene clusters are generally through homologous genomic recombination by gene-targeting vectors. Here, we present a rapid and efficient method for construction of gene-targeting vectors. This approach is based on Streptomyces phage φBT1 integrase-mediated multisite in vitro site-specific recombination. Four 'entry clones' were assembled into a circular plasmid to generate the destination gene-targeting vector by a one-step reaction. The four 'entry clones' contained two clones of the upstream and downstream flanks of the target gene, a selectable marker and an E. coli-Streptomyces shuttle vector. After targeted modification of the genome, the selectable markers were removed by φC31 integrase-mediated in vivo site-specific recombination between pre-placed attB and attP sites. Using this method, part of the calcium-dependent antibiotic (CDA) and actinorhodin (Act) biosynthetic gene clusters were deleted, and the rrdA encoding RrdA, a negative regulator of Red production, was also deleted. The final prodiginine production of the engineered strain was over five times that of the wild-type strain. This straightforward φBT1 and φC31 integrase-based strategy provides an alternative approach for rapid gene-targeting vector construction and marker removal in streptomycetes.
Espinosa, Manuel O; Polop, Francisco; Rotela, Camilo H; Abril, Marcelo; Scavuzzo, Carlos M
2016-11-21
The main objective of this study was to obtain and analyse the space-time dynamics of Aedes aegypti breeding sites in Clorinda City, Formosa Province, Argentina coupled with landscape analysis using the maximum entropy approach in order to generate a dengue vector niche model. In urban areas, without vector control activities, 12 entomologic (larval) samplings were performed during three years (October 2011 to October 2014). The entomologic surveillance area represented 16,511 houses. Predictive models for Aedes distribution were developed using vector breeding abundance data, density analysis, clustering and geoprocessing techniques coupled with Earth observation satellite data. The spatial analysis showed a vector spatial distribution pattern with clusters of high density in the central region of Clorinda with a well-defined high-risk area in the western part of the city. It also showed a differential temporal behaviour among different areas, which could have implications for risk models and control strategies at the urban scale. The niche model obtained for Ae. aegypti, based on only one year of field data, showed that 85.8% of the distribution of breeding sites is explained by the percentage of water supply (48.2%), urban distribution (33.2%), and the percentage of urban coverage (4.4%). The consequences for the development of control strategies are discussed with reference to the results obtained using distribution maps based on environmental variables.
Robust support vector regression networks for function approximation with outliers.
Chuang, Chen-Chia; Su, Shun-Feng; Jeng, Jin-Tsong; Hsiao, Chih-Ching
2002-01-01
Support vector regression (SVR) employs the support vector machine (SVM) to tackle problems of function approximation and regression estimation. SVR has been shown to have good robust properties against noise. When the parameters used in SVR are improperly selected, overfitting phenomena may still occur. However, the selection of various parameters is not straightforward. Besides, in SVR, outliers may also possibly be taken as support vectors. Such an inclusion of outliers in support vectors may lead to seriously overfitting phenomena. In this paper, a novel regression approach, termed as the robust support vector regression (RSVR) network, is proposed to enhance the robust capability of SVR. In the approach, traditional robust learning approaches are employed to improve the learning performance for any selected parameters. From the simulation results, our RSVR can always improve the performance of the learned systems for all cases. Besides, it can be found that even the training lasted for a long period, the testing errors would not go up. In other words, the overfitting phenomenon is indeed suppressed.
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.
Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi
2013-01-01
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
Currency crisis indication by using ensembles of support vector machine classifiers
NASA Astrophysics Data System (ADS)
Ramli, Nor Azuana; Ismail, Mohd Tahir; Wooi, Hooy Chee
2014-07-01
There are many methods that had been experimented in the analysis of currency crisis. However, not all methods could provide accurate indications. This paper introduces an ensemble of classifiers by using Support Vector Machine that's never been applied in analyses involving currency crisis before with the aim of increasing the indication accuracy. The proposed ensemble classifiers' performances are measured using percentage of accuracy, root mean squared error (RMSE), area under the Receiver Operating Characteristics (ROC) curve and Type II error. The performances of an ensemble of Support Vector Machine classifiers are compared with the single Support Vector Machine classifier and both of classifiers are tested on the data set from 27 countries with 12 macroeconomic indicators for each country. From our analyses, the results show that the ensemble of Support Vector Machine classifiers outperforms single Support Vector Machine classifier on the problem involving indicating a currency crisis in terms of a range of standard measures for comparing the performance of classifiers.
A Human Activity Recognition System Based on Dynamic Clustering of Skeleton Data.
Manzi, Alessandro; Dario, Paolo; Cavallo, Filippo
2017-05-11
Human activity recognition is an important area in computer vision, with its wide range of applications including ambient assisted living. In this paper, an activity recognition system based on skeleton data extracted from a depth camera is presented. The system makes use of machine learning techniques to classify the actions that are described with a set of a few basic postures. The training phase creates several models related to the number of clustered postures by means of a multiclass Support Vector Machine (SVM), trained with Sequential Minimal Optimization (SMO). The classification phase adopts the X-means algorithm to find the optimal number of clusters dynamically. The contribution of the paper is twofold. The first aim is to perform activity recognition employing features based on a small number of informative postures, extracted independently from each activity instance; secondly, it aims to assess the minimum number of frames needed for an adequate classification. The system is evaluated on two publicly available datasets, the Cornell Activity Dataset (CAD-60) and the Telecommunication Systems Team (TST) Fall detection dataset. The number of clusters needed to model each instance ranges from two to four elements. The proposed approach reaches excellent performances using only about 4 s of input data (~100 frames) and outperforms the state of the art when it uses approximately 500 frames on the CAD-60 dataset. The results are promising for the test in real context.
Co-clustering phenome–genome for phenotype classification and disease gene discovery
Hwang, TaeHyun; Atluri, Gowtham; Xie, MaoQiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui
2012-01-01
Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways. PMID:22735708
A novel approach to internal crown characterization for coniferous tree species classification
NASA Astrophysics Data System (ADS)
Harikumar, A.; Bovolo, F.; Bruzzone, L.
2016-10-01
The knowledge about individual trees in forest is highly beneficial in forest management. High density small foot- print multi-return airborne Light Detection and Ranging (LiDAR) data can provide a very accurate information about the structural properties of individual trees in forests. Every tree species has a unique set of crown structural characteristics that can be used for tree species classification. In this paper, we use both the internal and external crown structural information of a conifer tree crown, derived from a high density small foot-print multi-return LiDAR data acquisition for species classification. Considering the fact that branches are the major building blocks of a conifer tree crown, we obtain the internal crown structural information using a branch level analysis. The structure of each conifer branch is represented using clusters in the LiDAR point cloud. We propose the joint use of the k-means clustering and geometric shape fitting, on the LiDAR data projected onto a novel 3-dimensional space, to identify branch clusters. After mapping the identified clusters back to the original space, six internal geometric features are estimated using a branch-level analysis. The external crown characteristics are modeled by using six least correlated features based on cone fitting and convex hull. Species classification is performed using a sparse Support Vector Machines (sparse SVM) classifier.
NASA Astrophysics Data System (ADS)
Alagha, Jawad S.; Seyam, Mohammed; Md Said, Md Azlin; Mogheir, Yunes
2017-12-01
Artificial intelligence (AI) techniques have increasingly become efficient alternative modeling tools in the water resources field, particularly when the modeled process is influenced by complex and interrelated variables. In this study, two AI techniques—artificial neural networks (ANNs) and support vector machine (SVM)—were employed to achieve deeper understanding of the salinization process (represented by chloride concentration) in complex coastal aquifers influenced by various salinity sources. Both models were trained using 11 years of groundwater quality data from 22 municipal wells in Khan Younis Governorate, Gaza, Palestine. Both techniques showed satisfactory prediction performance, where the mean absolute percentage error (MAPE) and correlation coefficient ( R) for the test data set were, respectively, about 4.5 and 99.8% for the ANNs model, and 4.6 and 99.7% for SVM model. The performances of the developed models were further noticeably improved through preprocessing the wells data set using a k-means clustering method, then conducting AI techniques separately for each cluster. The developed models with clustered data were associated with higher performance, easiness and simplicity. They can be employed as an analytical tool to investigate the influence of input variables on coastal aquifer salinity, which is of great importance for understanding salinization processes, leading to more effective water-resources-related planning and decision making.
Vadivalagan, Chithravel; Karthika, Pushparaj; Murugan, Kadarkarai; Panneerselvam, Chellasamy; Paulpandi, Manickam; Madhiyazhagan, Pari; Wei, Hui; Aziz, Al Thabiani; Alsalhi, Mohamad Saleh; Devanesan, Sandhanasamy; Nicoletti, Marcello; Paramasivan, Rajaiah; Dinesh, Devakumar; Benelli, Giovanni
2016-03-01
Mosquitoes are vectors of devastating pathogens and parasites, causing millions of deaths every year. Dengue is a mosquito-borne viral infection found in tropical and subtropical regions around the world. Recently, dengue transmission has strongly increased in urban and semiurban areas, becoming a major international public health concern. Aedes aegypti (Diptera: Culicidae) is a primary vector of dengue. Shedding light on genetic deviation in A. aegypti populations is of crucial importance to fully understand their molecular ecology and evolution. In this research, haplotype and genetic analyses were conducted using individuals of A. aegypti from 31 localities in the north, southeast, northeast and central regions of Tamil Nadu (South India). The mitochondrial DNA region of cytochrome c oxidase 1 (CO1) gene was used as marker for the analyses. Thirty-one haplotypes sequences were submitted to GenBank and authenticated. The complete haplotype set included 64 haplotypes from various geographical regions clustered into three groups (lineages) separated by three fixed mutational steps, suggesting that the South Indian Ae. aegypti populations were pooled and are linked with West Africa, Columbian and Southeast Asian lineages. The genetic and haplotype diversity was low, indicating reduced gene flow among close populations of the vector, due to geographical barriers such as water bodies. Lastly, the negative values for neutrality tests indicated a bottle-neck effect and supported for low frequency of polymorphism among the haplotypes. Overall, our results add basic knowledge to molecular ecology of the dengue vector A. aegypti, providing the first evidence for multiple introductions of Ae. aegypti populations from Columbia and West Africa in South India.
Application of neuroanatomical features to tractography clustering.
Wang, Qian; Yap, Pew-Thian; Wu, Guorong; Shen, Dinggang
2013-09-01
Diffusion tensor imaging allows unprecedented insight into brain neural connectivity in vivo by allowing reconstruction of neuronal tracts via captured patterns of water diffusion in white matter microstructures. However, tractography algorithms often output hundreds of thousands of fibers, rendering subsequent data analysis intractable. As a remedy, fiber clustering techniques are able to group fibers into dozens of bundles and thus facilitate analyses. Most existing fiber clustering methods rely on geometrical information of fibers, by viewing them as curves in 3D Euclidean space. The important neuroanatomical aspect of fibers, however, is ignored. In this article, the neuroanatomical information of each fiber is encapsulated in the associativity vector, which functions as the unique "fingerprint" of the fiber. Specifically, each entry in the associativity vector describes the relationship between the fiber and a certain anatomical ROI in a fuzzy manner. The value of the entry approaches 1 if the fiber is spatially related to the ROI at high confidence; on the contrary, the value drops closer to 0. The confidence of the ROI is calculated by diffusing the ROI according to the underlying fibers from tractography. In particular, we have adopted the fast marching method for simulation of ROI diffusion. Using the associativity vectors of fibers, we further model fibers as observations sampled from multivariate Gaussian mixtures in the feature space. To group all fibers into relevant major bundles, an expectation-maximization clustering approach is employed. Experimental results indicate that our method results in anatomically meaningful bundles that are highly consistent across subjects. Copyright © 2012 Wiley Periodicals, Inc., a Wiley company.
Effective traffic features selection algorithm for cyber-attacks samples
NASA Astrophysics Data System (ADS)
Li, Yihong; Liu, Fangzheng; Du, Zhenyu
2018-05-01
By studying the defense scheme of Network attacks, this paper propose an effective traffic features selection algorithm based on k-means++ clustering to deal with the problem of high dimensionality of traffic features which extracted from cyber-attacks samples. Firstly, this algorithm divide the original feature set into attack traffic feature set and background traffic feature set by the clustering. Then, we calculates the variation of clustering performance after removing a certain feature. Finally, evaluating the degree of distinctiveness of the feature vector according to the result. Among them, the effective feature vector is whose degree of distinctiveness exceeds the set threshold. The purpose of this paper is to select out the effective features from the extracted original feature set. In this way, it can reduce the dimensionality of the features so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and it has some advantages over other selection algorithms.
TWSVR: Regression via Twin Support Vector Machine.
Khemchandani, Reshma; Goyal, Keshav; Chandra, Suresh
2016-02-01
Taking motivation from Twin Support Vector Machine (TWSVM) formulation, Peng (2010) attempted to propose Twin Support Vector Regression (TSVR) where the regressor is obtained via solving a pair of quadratic programming problems (QPPs). In this paper we argue that TSVR formulation is not in the true spirit of TWSVM. Further, taking motivation from Bi and Bennett (2003), we propose an alternative approach to find a formulation for Twin Support Vector Regression (TWSVR) which is in the true spirit of TWSVM. We show that our proposed TWSVR can be derived from TWSVM for an appropriately constructed classification problem. To check the efficacy of our proposed TWSVR we compare its performance with TSVR and classical Support Vector Regression(SVR) on various regression datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.
Lu, Zhao; Sun, Jing; Butts, Kenneth
2014-05-01
Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
Elastic K-means using posterior probability
Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris
2017-01-01
The widely used K-means clustering is a hard clustering algorithm. Here we propose a Elastic K-means clustering model (EKM) using posterior probability with soft capability where each data point can belong to multiple clusters fractionally and show the benefit of proposed Elastic K-means. Furthermore, in many applications, besides vector attributes information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several useful matrix inequalities which are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of proposed EKM and its integrated model. PMID:29240756
Applications of Some Artificial Intelligence Methods to Satellite Soundings
NASA Technical Reports Server (NTRS)
Munteanu, M. J.; Jakubowicz, O.
1985-01-01
Hard clustering of temperature profiles and regression temperature retrievals were used to refine the method using the probabilities of membership of each pattern vector in each of the clusters derived with discriminant analysis. In hard clustering the maximum probability is taken and the corresponding cluster as the correct cluster are considered discarding the rest of the probabilities. In fuzzy partitioned clustering these probabilities are kept and the final regression retrieval is a weighted regression retrieval of several clusters. This method was used in the clustering of brightness temperatures where the purpose was to predict tropopause height. A further refinement is the division of temperature profiles into three major regions for classification purposes. The results are summarized in the tables total r.m.s. errors are displayed. An approach based on fuzzy logic which is intimately related to artificial intelligence methods is recommended.
Phylogenetics of the phlebotomine sand fly group Verrucarum (Diptera: Psychodidae: Lutzomyia).
Cohnstaedt, Lee W; Beati, Lorenza; Caceres, Abraham G; Ferro, Cristina; Munstermann, Leonard E
2011-06-01
Within the sand fly genus Lutzomyia, the Verrucarum species group contains several of the principal vectors of American cutaneous leishmaniasis and human bartonellosis in the Andean region of South America. The group encompasses 40 species for which the taxonomic status, phylogenetic relationships, and role of each species in disease transmission remain unresolved. Mitochondrial cytochrome c oxidase I (COI) phylogenetic analysis of a 667-bp fragment supported the morphological classification of the Verrucarum group into series. Genetic sequences from seven species were grouped in well-supported monophyletic lineages. Four species, however, clustered in two paraphyletic lineages that indicate conspecificity--the Lutzomyia longiflocosa-Lutzomyia sauroida pair and the Lutzomyia quasitownsendi-Lutzomyia torvida pair. COI sequences were also evaluated as a taxonomic tool based on interspecific genetic variability within the Verrucarum group and the intraspecific variability of one of its members, Lutzomyia verrucarum, across its known distribution.
Phylogenetics of the Phlebotomine Sand Fly Group Verrucarum (Diptera: Psychodidae: Lutzomyia)
Cohnstaedt, Lee W.; Beati, Lorenza; Caceres, Abraham G.; Ferro, Cristina; Munstermann, Leonard E.
2011-01-01
Within the sand fly genus Lutzomyia, the Verrucarum species group contains several of the principal vectors of American cutaneous leishmaniasis and human bartonellosis in the Andean region of South America. The group encompasses 40 species for which the taxonomic status, phylogenetic relationships, and role of each species in disease transmission remain unresolved. Mitochondrial cytochrome c oxidase I (COI) phylogenetic analysis of a 667-bp fragment supported the morphological classification of the Verrucarum group into series. Genetic sequences from seven species were grouped in well-supported monophyletic lineages. Four species, however, clustered in two paraphyletic lineages that indicate conspecificity—the Lutzomyia longiflocosa–Lutzomyia sauroida pair and the Lutzomyia quasitownsendi–Lutzomyia torvida pair. COI sequences were also evaluated as a taxonomic tool based on interspecific genetic variability within the Verrucarum group and the intraspecific variability of one of its members, Lutzomyia verrucarum, across its known distribution. PMID:21633028
Echodu, Richard; Opiyo, Elizabeth A.; Dion, Kirstin; Halyard, Alexis; Dunn, Augustine W.; Aksoy, Serap; Caccone, Adalgisa
2017-01-01
Uganda is the only country where the chronic and acute forms of human African Trypanosomiasis (HAT) or sleeping sickness both occur and are separated by < 100 km in areas north of Lake Kyoga. In Uganda, Glossina fuscipes fuscipes is the main vector of the Trypanosoma parasites responsible for these diseases as well for the animal African Trypanosomiasis (AAT), or Nagana. We used highly polymorphic microsatellite loci and a mitochondrial DNA (mtDNA) marker to provide fine scale spatial resolution of genetic structure of G. f. fuscipes from 42 sampling sites from the northern region of Uganda where a merger of the two disease belts is feared. Based on microsatellite analyses, we found that G. f. fuscipes in northern Uganda are structured into three distinct genetic clusters with varying degrees of interconnectivity among them. Based on genetic assignment and spatial location, we grouped the sampling sites into four genetic units corresponding to northwestern Uganda in the Albert Nile drainage, northeastern Uganda in the Lake Kyoga drainage, western Uganda in the Victoria Nile drainage, and a transition zone between the two northern genetic clusters characterized by high level of genetic admixture. An analysis using HYBRIDLAB supported a hybrid swarm model as most consistent with tsetse genotypes in these admixed samples. Results of mtDNA analyses revealed the presence of 30 haplotypes representing three main haplogroups, whose location broadly overlaps with the microsatellite defined clusters. Migration analyses based on microsatellites point to moderate migration among the northern units located in the Albert Nile, Achwa River, Okole River, and Lake Kyoga drainages, but not between the northern units and the Victoria Nile drainage in the west. Effective population size estimates were variable with low to moderate sizes in most populations and with evidence of recent population bottlenecks, especially in the northeast unit of the Lake Kyoga drainage. Our microsatellite and mtDNA based analyses indicate that G. f. fuscipes movement along the Achwa and Okole rivers may facilitate northwest expansion of the Rhodesiense disease belt in Uganda. We identified tsetse migration corridors and recommend a rolling carpet approach from south of Lake Kyoga northward to minimize disease dispersal and prevent vector re-colonization. Additionally, our findings highlight the need for continuing tsetse monitoring efforts during and after control. PMID:28453513
Fast large-scale clustering of protein structures using Gauss integrals.
Harder, Tim; Borg, Mikael; Boomsma, Wouter; Røgen, Peter; Hamelryck, Thomas
2012-02-15
Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors--which were introduced by Røgen and co-workers--and subsequently performing K-means clustering. Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50,000 structures, can be clustered within seconds to minutes.
A Code Generation Approach for Auto-Vectorization in the Spade Compiler
NASA Astrophysics Data System (ADS)
Wang, Huayong; Andrade, Henrique; Gedik, Buğra; Wu, Kun-Lung
We describe an auto-vectorization approach for the Spade stream processing programming language, comprising two ideas. First, we provide support for vectors as a primitive data type. Second, we provide a C++ library with architecture-specific implementations of a large number of pre-vectorized operations as the means to support language extensions. We evaluate our approach with several stream processing operators, contrasting Spade's auto-vectorization with the native auto-vectorization provided by the GNU gcc and Intel icc compilers.
Construction and Evaluation of Novel Rhesus Monkey Adenovirus Vaccine Vectors
Abbink, Peter; Maxfield, Lori F.; Ng'ang'a, David; Borducchi, Erica N.; Iampietro, M. Justin; Bricault, Christine A.; Teigler, Jeffrey E.; Blackmore, Stephen; Parenteau, Lily; Wagh, Kshitij; Handley, Scott A.; Zhao, Guoyan; Virgin, Herbert W.; Korber, Bette
2014-01-01
ABSTRACT Adenovirus vectors are widely used as vaccine candidates for a variety of pathogens, including HIV-1. To date, human and chimpanzee adenoviruses have been explored in detail as vaccine vectors. The phylogeny of human and chimpanzee adenoviruses is overlapping, and preexisting humoral and cellular immunity to both are exhibited in human populations worldwide. More distantly related adenoviruses may therefore offer advantages as vaccine vectors. Here we describe the primary isolation and vectorization of three novel adenoviruses from rhesus monkeys. The seroprevalence of these novel rhesus monkey adenovirus vectors was extremely low in sub-Saharan Africa human populations, and these vectors proved to have immunogenicity comparable to that of human and chimpanzee adenovirus vaccine vectors in mice. These rhesus monkey adenoviruses phylogenetically clustered with the poorly described adenovirus species G and robustly stimulated innate immune responses. These novel adenoviruses represent a new class of candidate vaccine vectors. IMPORTANCE Although there have been substantial efforts in the development of vaccine vectors from human and chimpanzee adenoviruses, far less is known about rhesus monkey adenoviruses. In this report, we describe the isolation and vectorization of three novel rhesus monkey adenoviruses. These vectors exhibit virologic and immunologic characteristics that make them attractive as potential candidate vaccine vectors for both HIV-1 and other pathogens. PMID:25410856
Construction and evaluation of novel rhesus monkey adenovirus vaccine vectors.
Abbink, Peter; Maxfield, Lori F; Ng'ang'a, David; Borducchi, Erica N; Iampietro, M Justin; Bricault, Christine A; Teigler, Jeffrey E; Blackmore, Stephen; Parenteau, Lily; Wagh, Kshitij; Handley, Scott A; Zhao, Guoyan; Virgin, Herbert W; Korber, Bette; Barouch, Dan H
2015-02-01
Adenovirus vectors are widely used as vaccine candidates for a variety of pathogens, including HIV-1. To date, human and chimpanzee adenoviruses have been explored in detail as vaccine vectors. The phylogeny of human and chimpanzee adenoviruses is overlapping, and preexisting humoral and cellular immunity to both are exhibited in human populations worldwide. More distantly related adenoviruses may therefore offer advantages as vaccine vectors. Here we describe the primary isolation and vectorization of three novel adenoviruses from rhesus monkeys. The seroprevalence of these novel rhesus monkey adenovirus vectors was extremely low in sub-Saharan Africa human populations, and these vectors proved to have immunogenicity comparable to that of human and chimpanzee adenovirus vaccine vectors in mice. These rhesus monkey adenoviruses phylogenetically clustered with the poorly described adenovirus species G and robustly stimulated innate immune responses. These novel adenoviruses represent a new class of candidate vaccine vectors. Although there have been substantial efforts in the development of vaccine vectors from human and chimpanzee adenoviruses, far less is known about rhesus monkey adenoviruses. In this report, we describe the isolation and vectorization of three novel rhesus monkey adenoviruses. These vectors exhibit virologic and immunologic characteristics that make them attractive as potential candidate vaccine vectors for both HIV-1 and other pathogens. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Picado, Albert; Das, Murari L; Kumar, Vijay; Kesari, Shreekant; Dinesh, Diwakar S; Roy, Lalita; Rijal, Suman; Das, Pradeep; Rowland, Mark; Sundar, Shyam; Coosemans, Marc; Boelaert, Marleen; Davies, Clive R
2010-01-26
Visceral leishmaniasis (VL) control in the Indian subcontinent is currently based on case detection and treatment, and on vector control using indoor residual spraying (IRS). The use of long-lasting insecticidal nets (LN) has been postulated as an alternative or complement to IRS. Here we tested the impact of comprehensive distribution of LN on the density of Phlebotomus argentipes in VL-endemic villages. A cluster-randomized controlled trial with household P. argentipes density as outcome was designed. Twelve clusters from an ongoing LN clinical trial--three intervention and three control clusters in both India and Nepal--were selected on the basis of accessibility and VL incidence. Ten houses per cluster selected on the basis of high pre-intervention P. argentipes density were monitored monthly for 12 months after distribution of LN using CDC light traps (LT) and mouth aspiration methods. Ten cattle sheds per cluster were also monitored by aspiration. A random effect linear regression model showed that the cluster-wide distribution of LNs significantly reduced the P. argentipes density/house by 24.9% (95% CI 1.80%-42.5%) as measured by means of LTs. The ongoing clinical trial, designed to measure the impact of LNs on VL incidence, will confirm whether LNs should be adopted as a control strategy in the regional VL elimination programs. The entomological evidence described here provides some evidence that LNs could be usefully deployed as part of the VL control program. ClinicalTrials.gov CT-2005-015374.
Community involvement in dengue vector control: cluster randomised trial.
Vanlerberghe, V; Toledo, M E; Rodríguez, M; Gómez, D; Baly, A; Benítez, J R; Van der Stuyft, P
2010-01-01
To assess the effectiveness of an integrated community based environmental management strategy to control Aedes aegypti, the vector of dengue, compared with a routine strategy. Design Cluster randomised trial. Setting Guantanamo, Cuba. Participants 32 circumscriptions (around 2000 inhabitants each). Interventions The circumscriptions were randomly allocated to control clusters (n=16) comprising routine Aedes control programme (entomological surveillance, source reduction, selective adulticiding, and health education) and to intervention clusters (n=16) comprising the routine Aedes control programme combined with a community based environmental management approach. The primary outcome was levels of Aedes infestation: house index (number of houses positive for at least one container with immature stages of Ae aegypti per 100 inspected houses), Breteau index (number of containers positive for immature stages of Ae aegypti per 100 inspected houses), and the pupae per inhabitant statistic (number of Ae aegypti pupae per inhabitant). All clusters were subjected to the intended intervention; all completed the study protocol up to February 2006 and all were included in the analysis. At baseline the Aedes infestation levels were comparable between intervention and control clusters: house index 0.25% v 0.20%, pupae per inhabitant 0.44 x 10(-3) v 0.29 x 10(-3). At the end of the intervention these indices were significantly lower in the intervention clusters: rate ratio for house indices 0.49 (95% confidence interval 0.27 to 0.88) and rate ratio for pupae per inhabitant 0.27 (0.09 to 0.76). A community based environmental management embedded in a routine control programme was effective at reducing levels of Aedes infestation. Trial Registration Current Controlled Trials ISRCTN88405796.
Community involvement in dengue vector control: cluster randomised trial.
Vanlerberghe, V; Toledo, M E; Rodríguez, M; Gomez, D; Baly, A; Benitez, J R; Van der Stuyft, P
2009-06-09
To assess the effectiveness of an integrated community based environmental management strategy to control Aedes aegypti, the vector of dengue, compared with a routine strategy. Cluster randomised trial. Guantanamo, Cuba. 32 circumscriptions (around 2000 inhabitants each). The circumscriptions were randomly allocated to control clusters (n=16) comprising routine Aedes control programme (entomological surveillance, source reduction, selective adulticiding, and health education) and to intervention clusters (n=16) comprising the routine Aedes control programme combined with a community based environmental management approach. The primary outcome was levels of Aedes infestation: house index (number of houses positive for at least one container with immature stages of Ae aegypti per 100 inspected houses), Breteau index (number of containers positive for immature stages of Ae aegypti per 100 inspected houses), and the pupae per inhabitant statistic (number of Ae aegypti pupae per inhabitant). All clusters were subjected to the intended intervention; all completed the study protocol up to February 2006 and all were included in the analysis. At baseline the Aedes infestation levels were comparable between intervention and control clusters: house index 0.25% v 0.20%, pupae per inhabitant 0.44x10(-3) v 0.29x10(-3). At the end of the intervention these indices were significantly lower in the intervention clusters: rate ratio for house indices 0.49 (95% confidence interval 0.27 to 0.88) and rate ratio for pupae per inhabitant 0.27 (0.09 to 0.76). A community based environmental management embedded in a routine control programme was effective at reducing levels of Aedes infestation. Current Controlled Trials ISRCTN88405796.
Topological side-chain classification of beta-turns: ideal motifs for peptidomimetic development.
Tran, Tran Trung; McKie, Jim; Meutermans, Wim D F; Bourne, Gregory T; Andrews, Peter R; Smythe, Mark L
2005-08-01
Beta-turns are important topological motifs for biological recognition of proteins and peptides. Organic molecules that sample the side chain positions of beta-turns have shown broad binding capacity to multiple different receptors, for example benzodiazepines. Beta-turns have traditionally been classified into various types based on the backbone dihedral angles (phi2, psi2, phi3 and psi3). Indeed, 57-68% of beta-turns are currently classified into 8 different backbone families (Type I, Type II, Type I', Type II', Type VIII, Type VIa1, Type VIa2 and Type VIb and Type IV which represents unclassified beta-turns). Although this classification of beta-turns has been useful, the resulting beta-turn types are not ideal for the design of beta-turn mimetics as they do not reflect topological features of the recognition elements, the side chains. To overcome this, we have extracted beta-turns from a data set of non-homologous and high-resolution protein crystal structures. The side chain positions, as defined by C(alpha)-C(beta) vectors, of these turns have been clustered using the kth nearest neighbor clustering and filtered nearest centroid sorting algorithms. Nine clusters were obtained that cluster 90% of the data, and the average intra-cluster RMSD of the four C(alpha)-C(beta) vectors is 0.36. The nine clusters therefore represent the topology of the side chain scaffold architecture of the vast majority of beta-turns. The mean structures of the nine clusters are useful for the development of beta-turn mimetics and as biological descriptors for focusing combinatorial chemistry towards biologically relevant topological space.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, M. A.; Strelchenko, Alexei; Vaquero, Alejandro
Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations.more » Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.« less
Pérez-Ramos, Adrian; Werning, Maria L.; Prieto, Alicia; Russo, Pasquale; Spano, Giuseppe; Mohedano, Mari L.; López, Paloma
2017-01-01
Pediococcus parvulus 2.6 secretes a 2-substituted (1,3)-β-D-glucan with prebiotic and immunomodulatory properties. It is synthesized by the GTF glycosyltransferase using UDP-glucose as substrate. Analysis of the P. parvulus 2.6 draft genome revealed the existence of a sorbitol utilization cluster of six genes (gutFRMCBA), whose products should be involved in sorbitol utilization and could generate substrates for UDP-glucose synthesis. Southern blot hybridization analysis showed that the cluster is located in a plasmid. Analysis of metabolic fluxes and production of the exopolysaccharide revealed that: (i) P. parvulus 2.6 is able to metabolize sorbitol, (ii) sorbitol utilization is repressed in the presence of glucose and (iii) sorbitol supports the synthesis of 2-substituted (1,3)-β-D-glucan. The sorbitol cluster encodes two putative regulators, GutR and GutM, in addition to a phosphoenolpyruvate-dependent phosphotransferase transport system and sorbitol-6-phosphate dehydrogenase. Therefore, we investigated the involvement of GutR and GutM in the expression of gutFRMCBA. The promoter-probe vector pRCR based on the mrfp gene, which encodes the fluorescence protein mCherry, was used to test the potential promoter of the cluster (Pgut) and the genes encoding the regulators. This was performed by transferring by electrotransformation the recombinant plasmids into two hosts, which metabolize sorbitol: Lactobacillus plantarum and Lactobacillus casei. Upon growth in the presence of sorbitol, but not of glucose, only the presence of Pgut was required to support expression of mrfp in L. plantarum. In L. casei the presence of sorbitol in the growth medium and the pediococcal gutR or gutR plus gutM in the genome was required for Pgut functionality. This demonstrates that: (i) Pgut is required for expression of the gut cluster, (ii) Pgut is subjected to catabolic repression in lactobacilli, (iii) GutR is an activator, and (iv) in the presence of sorbitol, trans-complementation for activation of Pgut exists in L. plantarum but not in L. casei. PMID:29259592
Community detection in sequence similarity networks based on attribute clustering
Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.
2017-07-24
Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here in this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs,more » for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments« less
Community detection in sequence similarity networks based on attribute clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.
Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here in this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs,more » for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments« less
Xue, Ling; Scoglio, Caterina; McVey, D Scott; Boone, Rebecca; Cohnstaedt, Lee W
2015-09-01
Lyme disease has become the most prevalent vector-borne disease in the United States and results in morbidity in humans, especially children. We used historical case distributions to explain vector-borne disease introductions and subsequent geographic expansion in the absence of disease vector data. We used geographic information system analysis of publicly available Connecticut Department of Public Health case data from 1984, 1985, and 1991 to 2012 for the 169 towns in Connecticut to identify the yearly clusters of Lyme disease cases. Our analysis identified the spatial and temporal origins of two separate introductions of Lyme disease into Connecticut and identified the subsequent direction and rate of spread. We defined both epidemic clusters of cases using significant long-term spatial autocorrelation. The incidence-weighted geographic mean analysis indicates a northern trend of geographic expansion for both epidemic clusters. In eastern Connecticut, as the epidemic progressed, the yearly shift in the geographic mean (rate of epidemic expansion) decreased each year until spatial equilibrium was reached in 2007. The equilibrium indicates a transition from epidemic Lyme disease spread to stable endemic transmission, and we associate this with a reduction in incidence. In western Connecticut, the parabolic distribution of the yearly geographic mean indicates that following the establishment of Lyme disease (1988) the epidemic quickly expanded northward and established equilibrium in 2009.
Panzera, Francisco; Ferreiro, María J; Pita, Sebastián; Calleros, Lucía; Pérez, Ruben; Basmadjián, Yester; Guevara, Yenny; Brenière, Simone Frédérique; Panzera, Yanina
2014-10-01
Chagas disease, one of the most important vector-borne diseases in the Americas, is caused by Trypanosoma cruzi and transmitted to humans by insects of the subfamily Triatominae. An effective control of this disease depends on elimination of vectors through spraying with insecticides. Genetic research can help insect control programs by identifying and characterizing vector populations. In southern Latin America, Triatoma infestans is the main vector and presents two distinct lineages, known as Andean and non-Andean chromosomal groups, that are highly differentiated by the amount of heterochromatin and genome size. Analyses with nuclear and mitochondrial sequences are not conclusive about resolving the origin and spread of T. infestans. The present paper includes the analyses of karyotypes, heterochromatin distribution and chromosomal mapping of the major ribosomal cluster (45S rDNA) to specimens throughout the distribution range of this species, including pyrethroid-resistant populations. A total of 417 specimens from seven different countries were analyzed. We show an unusual wide rDNA variability related to number and chromosomal position of the ribosomal genes, never before reported in species with holocentric chromosomes. Considering the chromosomal groups previously described, the ribosomal patterns are associated with a particular geographic distribution. Our results reveal that the differentiation process between both T. infestans chromosomal groups has involved significant genomic reorganization of essential coding sequences, besides the changes in heterochromatin and genomic size previously reported. The chromosomal markers also allowed us to detect the existence of a hybrid zone occupied by individuals derived from crosses between both chromosomal groups. Our genetic studies support the hypothesis of an Andean origin for T. infestans, and suggest that pyrethroid-resistant populations from the Argentinean-Bolivian border are most likely the result of recent secondary contact between both lineages. We suggest that vector control programs should make a greater effort in the entomological surveillance of those regions with both chromosomal groups to avoid rapid emergence of resistant individuals. Copyright © 2014 Elsevier B.V. All rights reserved.
Dinkel, Philipp Johannes; Willmes, Klaus; Krinzinger, Helga; Konrad, Kerstin; Koten Jr, Jan Willem
2013-01-01
FMRI-studies are mostly based on a group study approach, either analyzing one group or comparing multiple groups, or on approaches that correlate brain activation with clinically relevant criteria or behavioral measures. In this study we investigate the potential of fMRI-techniques focusing on individual differences in brain activation within a test-retest reliability context. We employ a single-case analysis approach, which contrasts dyscalculic children with a control group of typically developing children. In a second step, a support-vector machine analysis and cluster analysis techniques served to investigate similarities in multivariate brain activation patterns. Children were confronted with a non-symbolic number comparison and a non-symbolic exact calculation task during fMRI acquisition. Conventional second level group comparison analysis only showed small differences around the angular gyrus bilaterally and the left parieto-occipital sulcus. Analyses based on single-case statistical procedures revealed that developmental dyscalculia is characterized by individual differences predominantly in visual processing areas. Dyscalculic children seemed to compensate for relative under-activation in the primary visual cortex through an upregulation in higher visual areas. However, overlap in deviant activation was low for the dyscalculic children, indicating that developmental dyscalculia is a disorder characterized by heterogeneous brain activation differences. Using support vector machine analysis and cluster analysis, we tried to group dyscalculic and typically developing children according to brain activation. Fronto-parietal systems seem to qualify for a distinction between the two groups. However, this was only effective when reliable brain activations of both tasks were employed simultaneously. Results suggest that deficits in number representation in the visual-parietal cortex get compensated for through finger related aspects of number representation in fronto-parietal cortex. We conclude that dyscalculic children show large individual differences in brain activation patterns. Nonetheless, the majority of dyscalculic children can be differentiated from controls employing brain activation patterns when appropriate methods are used. PMID:24349547
Dinkel, Philipp Johannes; Willmes, Klaus; Krinzinger, Helga; Konrad, Kerstin; Koten, Jan Willem
2013-01-01
FMRI-studies are mostly based on a group study approach, either analyzing one group or comparing multiple groups, or on approaches that correlate brain activation with clinically relevant criteria or behavioral measures. In this study we investigate the potential of fMRI-techniques focusing on individual differences in brain activation within a test-retest reliability context. We employ a single-case analysis approach, which contrasts dyscalculic children with a control group of typically developing children. In a second step, a support-vector machine analysis and cluster analysis techniques served to investigate similarities in multivariate brain activation patterns. Children were confronted with a non-symbolic number comparison and a non-symbolic exact calculation task during fMRI acquisition. Conventional second level group comparison analysis only showed small differences around the angular gyrus bilaterally and the left parieto-occipital sulcus. Analyses based on single-case statistical procedures revealed that developmental dyscalculia is characterized by individual differences predominantly in visual processing areas. Dyscalculic children seemed to compensate for relative under-activation in the primary visual cortex through an upregulation in higher visual areas. However, overlap in deviant activation was low for the dyscalculic children, indicating that developmental dyscalculia is a disorder characterized by heterogeneous brain activation differences. Using support vector machine analysis and cluster analysis, we tried to group dyscalculic and typically developing children according to brain activation. Fronto-parietal systems seem to qualify for a distinction between the two groups. However, this was only effective when reliable brain activations of both tasks were employed simultaneously. Results suggest that deficits in number representation in the visual-parietal cortex get compensated for through finger related aspects of number representation in fronto-parietal cortex. We conclude that dyscalculic children show large individual differences in brain activation patterns. Nonetheless, the majority of dyscalculic children can be differentiated from controls employing brain activation patterns when appropriate methods are used.
[A research on real-time ventricular QRS classification methods for single-chip-microcomputers].
Peng, L; Yang, Z; Li, L; Chen, H; Chen, E; Lin, J
1997-05-01
Ventricular QRS classification is key technique of ventricular arrhythmias detection in single-chip-microcomputer based dynamic electrocardiogram real-time analyser. This paper adopts morphological feature vector including QRS amplitude, interval information to reveal QRS morphology. After studying the distribution of QRS morphology feature vector of MIT/BIH DB ventricular arrhythmia files, we use morphological feature vector cluster to classify multi-morphology QRS. Based on the method, morphological feature parameters changing method which is suitable to catch occasional ventricular arrhythmias is presented. Clinical experiments verify missed ventricular arrhythmia is less than 1% by this method.
Hypercluster - Parallel processing for computational mechanics
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1988-01-01
An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
Triatomine Infestation in Guatemala: Spatial Assessment after Two Rounds of Vector Control
Manne, Jennifer; Nakagawa, Jun; Yamagata, Yoichi; Goehler, Alexander; Brownstein, John S.; Castro, Marcia C.
2012-01-01
In 2000, the Guatemalan Ministry of Health initiated a Chagas disease program to control Rhodnius prolixus and Triatoma dimidiata by periodic house spraying with pyrethroid insecticides to characterize infestation patterns and analyze the contribution of programmatic practices to these patterns. Spatial infestation patterns at three time points were identified using the Getis-Ord Gi*(d) test. Logistic regression was used to assess predictors of reinfestation after pyrethroid insecticide administration. Spatial analysis showed high and low clusters of infestation at three time points. After two rounds of spray, 178 communities persistently fell in high infestation clusters. A time lapse between rounds of vector control greater than 6 months was associated with 1.54 (95% confidence interval = 1.07–2.23) times increased odds of reinfestation after first spray, whereas a time lapse of greater than 1 year was associated with 2.66 (95% confidence interval = 1.85–3.83) times increased odds of reinfestation after first spray compared with localities where the time lapse was less than 180 days. The time lapse between rounds of vector control should remain under 1 year. Spatial analysis can guide targeted vector control efforts by enabling tracking of reinfestation hotspots and improved targeting of resources. PMID:22403315
Signal detection using support vector machines in the presence of ultrasonic speckle
NASA Astrophysics Data System (ADS)
Kotropoulos, Constantine L.; Pitas, Ioannis
2002-04-01
Support Vector Machines are a general algorithm based on guaranteed risk bounds of statistical learning theory. They have found numerous applications, such as in classification of brain PET images, optical character recognition, object detection, face verification, text categorization and so on. In this paper we propose the use of support vector machines to segment lesions in ultrasound images and we assess thoroughly their lesion detection ability. We demonstrate that trained support vector machines with a Radial Basis Function kernel segment satisfactorily (unseen) ultrasound B-mode images as well as clinical ultrasonic images.
Content based image retrieval using local binary pattern operator and data mining techniques.
Vatamanu, Oana Astrid; Frandeş, Mirela; Lungeanu, Diana; Mihalaş, Gheorghe-Ioan
2015-01-01
Content based image retrieval (CBIR) concerns the retrieval of similar images from image databases, using feature vectors extracted from images. These feature vectors globally define the visual content present in an image, defined by e.g., texture, colour, shape, and spatial relations between vectors. Herein, we propose the definition of feature vectors using the Local Binary Pattern (LBP) operator. A study was performed in order to determine the optimum LBP variant for the general definition of image feature vectors. The chosen LBP variant is then subsequently used to build an ultrasound image database, and a database with images obtained from Wireless Capsule Endoscopy. The image indexing process is optimized using data clustering techniques for images belonging to the same class. Finally, the proposed indexing method is compared to the classical indexing technique, which is nowadays widely used.
NASA Astrophysics Data System (ADS)
Lau, Yun-Fai; Kan, Yuet Wai
1983-09-01
We have developed a series of cosmids that can be used as vectors for genomic recombinant DNA library preparations, as expression vectors in mammalian cells for both transient and stable transformations, and as shuttle vectors between bacteria and mammalian cells. These cosmids were constructed by inserting one of the SV2-derived selectable gene markers-SV2-gpt, SV2-DHFR, and SV2-neo-in cosmid pJB8. High efficiency of genomic cloning was obtained with these cosmids and the size of the inserts was 30-42 kilobases. We isolated recombinant cosmids containing the human α -globin gene cluster from these genomic libraries. The simian virus 40 DNA in these selectable gene markers provides the origin of replication and enhancer sequences necessary for replication in permissive cells such as COS 7 cells and thereby allows transient expression of α -globin genes in these cells. These cosmids and their recombinants could also be stably transformed into mammalian cells by using the respective selection systems. Both of the adult α -globin genes were more actively expressed than the embryonic zeta -globin genes in these transformed cell lines. Because of the presence of the cohesive ends of the Charon 4A phage in the cosmids, the transforming DNA sequences could readily be rescued from these stably transformed cells into bacteria by in vitro packaging of total cellular DNA. Thus, these cosmid vectors are potentially useful for direct isolation of structural genes.
Andersson, Neil; Arostegui, Jorge; Nava-Aguilera, Elizabeth; Harris, Eva; Ledogar, Robert J
2017-05-30
Since the Aedes aegypti mosquitoes that transmit dengue virus can breed in clean water, WHO-endorsed vector control strategies place sachets of organophosphate pesticide, temephos (Abate), in household water storage containers. These and other pesticide-dependent approaches have failed to curb the spread of dengue and multiple dengue virus serotypes continue to spread throughout tropical and subtropical regions worldwide. A feasibility study in Managua, Nicaragua, generated instruments, intervention protocols, training schedules and impact assessment tools for a cluster randomised controlled trial of community-based approaches to vector control comprising an alternative strategy for dengue prevention and control in Nicaragua and Mexico. The Camino Verde (Green Way) is a pragmatic parallel group trial of pesticide-free dengue vector control, adding effectiveness to the standard government dengue control. A random sample from the most recent census in three coastal regions of Guerrero state in Mexico will generate 90 study clusters and the equivalent sampling frame in Managua, Nicaragua will generate 60 clusters, making a total of 150 clusters each of 137-140 households. After a baseline study, computer-driven randomisation will allocate to intervention one half of the sites, stratified by country, evidence of recent dengue virus infection in children aged 3-9 years and, in Nicaragua, level of community organisation. Following a common evidence-based education protocol, each cluster will develop and implement its own collective interventions including house-to-house visits, school-based programmes and inter-community visits. After 18 months, a follow-up study will compare dengue history, serological evidence of recent dengue virus infection (via measurement of anti-dengue virus antibodies in saliva samples) and entomological indices between intervention and control sites. Our hypothesis is that informed community mobilisation adds effectiveness in controlling dengue. ISRCTN27581154 .
Gehringer, Heike; Schacht, Erik; Maylaender, Nicole; Zeman, Ella; Kaysser, Philipp; Oehme, Rainer; Pluta, Silvia; Splettstoesser, Wolf D
2013-02-01
The zoonotic disease tularaemia is caused by the bacterial pathogen Francisella tularensis. Although the causative agent is known for 100 years, knowledge of its enzootic cycles is still rudimentary. Apart from tabanids and mosquitoes, hard ticks have been described as important vectors and potential reservoirs for F. tularensis. Available data on the incidence of human tularaemia indicate an increase in cases in the federal state of Baden-Wuerttemberg. To determine whether ticks are involved in the reported increase in F. tularensis infections in humans and wildlife in this south-western part of Germany, 916 Ixodes ricinus and 211 adult Dermacentor marginatus and D. reticulatus ticks were collected in two different locations. Screening for the presence of F. tularensis was performed by real-time PCR of the 16S rRNA gene. Of the 95 pools of I. ricinus ticks (representing 916 individual ticks), 8 tick pools (8.4%) were positive in this PCR. 30-bp deletion PCR confirmed that the F. tularensis subspecies holarctica was present. FtM24 VNTR analysis revealed that they belong to the emerging Franco-Iberian subclone group of F. tularensis holarctica. Of the 211 ticks of the genus Dermacentor, 35 randomly chosen DNAs were subjected to 16S rRNA gene screening PCR; 20 of these (57%) gave positive signals. For cluster analysis, the lpnA gene region of all Francisella-positive I. ricinus pools and 6 Dermacentor ticks with a positive reaction in the screening PCR was amplified and sequenced. In the resulting neighbour-joining tree, all Francisella-positive I. ricinus samples clustered with sequences of F. tularensis, whilst all Dermacentor tick samples clustered with FLE (Francisella-like endosymbiont) sequences. This study shows that I. ricinus ticks may serve as vectors and/or reservoirs of F. tularensis in Germany and supports the hypothesis that the state of Baden-Wuerttemberg represents an emerging endemic focus of tularaemia. Copyright © 2012 Elsevier GmbH. All rights reserved.
Molecular phylogeny of Anopheles hyrcanus group (Diptera: Culicidae) based on mtDNA COI.
Fang, Yuan; Shi, Wen-Qi; Zhang, Yi
2017-05-08
The Anopheles hyrcanus group, which includes at least 25 species, is widely distributed in the Oriental and Palearctic regions. Some group members have been incriminated as vectors of malaria and other mosquito-borne diseases. It is difficult to identify Hyrcanus Group members by morphological features. Thus, molecular phylogeny has been proposed as an important complementary method to traditional morphological taxonomy. Based on the GenBank database and our original study data, we used 466 mitochondrial DNA COI sequences belonging to 18 species to reconstruct the molecular phylogeny of the Hyrcanus Group across its worldwide geographic range. The results are as follows. 1) The average conspecific K2P divergence was 0.008 (range 0.002-0.017), whereas sequence divergence between congroup species averaged 0.064 (range 0.026-0.108). 2) The topology of COI tree of the Hyrcanus Group was generally consistent with classical morphological taxonomy in terms of species classification, but disagreed in subgroup division. In the COI tree, the group was divided into at least three main clusters. The first cluster contained An. nimpe; the second was composed of the Nigerrimus Subgroup and An. argyropus; and the third cluster was comprised of the Lesteri Subgroup and other unassociated species. 3) Phylogenetic analysis of COI indicated that ancient hybridizations probably occurred among the three closely related species, An. sinensis, An. belenrae, and An. kleini. 4) The results supported An. paraliae as a probable synonym of An. lesteri, and it was possible that An. pseudopictus and An. hyrcanus were the same species, as evident from their extremely low interspecific genetic divergence (0.020 and 0.007, respectively) and their phylogenetic positions. In summary, we reconstructed the molecular phylogeny and analysed genetic divergence of the Hyrcanus Group using mitochondrial COI sequences. Our results suggest that in the future of malaria surveillance, we should not only pay much attention to those known vectors of malaria, but also their closely related species.
A Novel Clustering Method Curbing the Number of States in Reinforcement Learning
NASA Astrophysics Data System (ADS)
Kotani, Naoki; Nunobiki, Masayuki; Taniguchi, Kenji
We propose an efficient state-space construction method for a reinforcement learning. Our method controls the number of categories with improving the clustering method of Fuzzy ART which is an autonomous state-space construction method. The proposed method represents weight vector as the mean value of input vectors in order to curb the number of new categories and eliminates categories whose state values are low to curb the total number of categories. As the state value is updated, the size of category becomes small to learn policy strictly. We verified the effectiveness of the proposed method with simulations of a reaching problem for a two-link robot arm. We confirmed that the number of categories was reduced and the agent achieved the complex task quickly.
NASA Astrophysics Data System (ADS)
Hernawati, Kuswari; Insani, Nur; Bambang S. H., M.; Nur Hadi, W.; Sahid
2017-08-01
This research aims to mapping the 33 (thirty-three) provinces in Indonesia, based on the data on air, water and soil pollution, as well as social demography and geography data, into a clustered model. The method used in this study was unsupervised method that combines the basic concept of Kohonen or Self-Organizing Feature Maps (SOFM). The method is done by providing the design parameters for the model based on data related directly/ indirectly to pollution, which are the demographic and social data, pollution levels of air, water and soil, as well as the geographical situation of each province. The parameters used consists of 19 features/characteristics, including the human development index, the number of vehicles, the availability of the plant's water absorption and flood prevention, as well as geographic and demographic situation. The data used were secondary data from the Central Statistics Agency (BPS), Indonesia. The data are mapped into SOFM from a high-dimensional vector space into two-dimensional vector space according to the closeness of location in term of Euclidean distance. The resulting outputs are represented in clustered grouping. Thirty-three provinces are grouped into five clusters, where each cluster has different features/characteristics and level of pollution. The result can used to help the efforts on prevention and resolution of pollution problems on each cluster in an effective and efficient way.
Identification of Wolbachia Strains in Mosquito Disease Vectors
Osei-Poku, Jewelna; Han, Calvin; Mbogo, Charles M.; Jiggins, Francis M.
2012-01-01
Wolbachia bacteria are common endosymbionts of insects, and some strains are known to protect their hosts against RNA viruses and other parasites. This has led to the suggestion that releasing Wolbachia-infected mosquitoes could prevent the transmission of arboviruses and other human parasites. We have identified Wolbachia in Kenyan populations of the yellow fever vector Aedes bromeliae and its relative Aedes metallicus, and in Mansonia uniformis and Mansonia africana, which are vectors of lymphatic filariasis. These Wolbachia strains cluster together on the bacterial phylogeny, and belong to bacterial clades that have recombined with other unrelated strains. These new Wolbachia strains may be affecting disease transmission rates of infected mosquito species, and could be transferred into other mosquito vectors as part of control programs. PMID:23185484
Yoon, In-Kyu; Getis, Arthur; Aldstadt, Jared; Rothman, Alan L.; Tannitisupawong, Darunee; Koenraadt, Constantianus J. M.; Fansiri, Thanyalak; Jones, James W.; Morrison, Amy C.; Jarman, Richard G.; Nisalak, Ananda; Mammen, Mammen P.; Thammapalo, Suwich; Srikiatkhachorn, Anon; Green, Sharone; Libraty, Daniel H.; Gibbons, Robert V.; Endy, Timothy; Pimgate, Chusak; Scott, Thomas W.
2012-01-01
Background Based on spatiotemporal clustering of human dengue virus (DENV) infections, transmission is thought to occur at fine spatiotemporal scales by horizontal transfer of virus between humans and mosquito vectors. To define the dimensions of local transmission and quantify the factors that support it, we examined relationships between infected humans and Aedes aegypti in Thai villages. Methodology/Principal Findings Geographic cluster investigations of 100-meter radius were conducted around DENV-positive and DENV-negative febrile “index” cases (positive and negative clusters, respectively) from a longitudinal cohort study in rural Thailand. Child contacts and Ae. aegypti from cluster houses were assessed for DENV infection. Spatiotemporal, demographic, and entomological parameters were evaluated. In positive clusters, the DENV infection rate among child contacts was 35.3% in index houses, 29.9% in houses within 20 meters, and decreased with distance from the index house to 6.2% in houses 80–100 meters away (p<0.001). Significantly more Ae. aegypti were DENV-infectious (i.e., DENV-positive in head/thorax) in positive clusters (23/1755; 1.3%) than negative clusters (1/1548; 0.1%). In positive clusters, 8.2% of mosquitoes were DENV-infectious in index houses, 4.2% in other houses with DENV-infected children, and 0.4% in houses without infected children (p<0.001). The DENV infection rate in contacts was 47.4% in houses with infectious mosquitoes, 28.7% in other houses in the same cluster, and 10.8% in positive clusters without infectious mosquitoes (p<0.001). Ae. aegypti pupae and adult females were more numerous only in houses containing infectious mosquitoes. Conclusions/Significance Human and mosquito infections are positively associated at the level of individual houses and neighboring residences. Certain houses with high transmission risk contribute disproportionately to DENV spread to neighboring houses. Small groups of houses with elevated transmission risk are consistent with over-dispersion of transmission (i.e., at a given point in time, people/mosquitoes from a small portion of houses are responsible for the majority of transmission). PMID:22816001
NASA Astrophysics Data System (ADS)
Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie
2014-05-01
Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting, which performs well in prediction and holds the ability of giving explanations for the results. In business failure prediction (BFP), the number of failed enterprises is relatively small, compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) which forecast well for this small proportion of failed enterprises and performs accurately on total accuracy meanwhile. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples on imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both minority and majority in CBR. In CBCBR, various case classes are firstly generated through hierarchical clustering inside stored experienced cases, and class centres are calculated out by integrating cases information in the same clustered class. When predicting the label of a target case, its nearest clustered case class is firstly retrieved by ranking similarities between the target case and each clustered case class centre. Then, nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with the classical CBR, a support vector machine, a logistic regression and a multi-variant discriminate analysis. The results show that compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the minority samples and generated high total accuracy meanwhile. The proposed approach makes CBR useful in imbalanced forecasting.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637
Role of radial nonuniformities in the interaction of an intense laser with atomic clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holkundkar, Amol R.; Gupta, N. K.
A model for the interaction of an intense laser with atomic clusters is presented. The model takes into account the spatial nonuniformities of the cluster as it evolves in time. The cluster is treated as a stratified sphere having an arbitrary number of layers. Electric and magnetic fields are obtained by solving the vector Helmholtz equation coupled with one-dimensional Lagrangian hydrodynamics. Results are compared with the uniform density nanoplasma model. Enhancement in the amount of energy absorbed is seen over the uniform density model. In some cases the absorbed energy increases by as much as a factor of 40.
Wai, Khin Thet; Htun, Pe Than; Oo, Tin; Myint, Hla; Lin, Zaw; Kroeger, Axel; Sommerfeld, Johannes; Petzold, Max
2012-01-01
Objectives To build up and analyse the feasibility, process, and effectiveness of a partnership-driven ecosystem management intervention in reducing dengue vector breeding and constructing sustainable partnerships among multiple stakeholders. Methods A community-based intervention study was conducted from May 2009 to January 2010 in Yangon city. Six high-risk and six low-risk clusters were randomized and allocated as intervention and routine service areas, respectively. For each cluster, 100 households were covered. Bi-monthly entomological evaluations (i.e. larval and pupal surveys) and household acceptability surveys at the end of 6-month intervention period were conducted, supplemented by qualitative evaluations. Intervention description The strategies included eco-friendly multi-stakeholder partner groups (Thingaha) and ward-based volunteers, informed decision-making of householders, followed by integrated vector management approach. Findings Pupae per person index (PPI) decreased at the last evaluation by 5.7% (0.35–0.33) in high-risk clusters. But in low-risk clusters, PPI remarkably decreased by 63.6% (0.33–0.12). In routine service area, PPI also decreased due to availability of Temephos after Cyclone Nargis. As for total number of pupae in all containers, when compared to evaluation 1, there was a reduction of 18.6% in evaluation 2 and 44.1% in evaluation 3 in intervention area. However, in routine service area, more reduction was observed. All intervention tools were found as acceptable, being feasible to implement by multi-stakeholder partner groups. Conclusions The efficacy of community-controlled partnership-driven interventions was found to be superior to the vertical approach in terms of sustainability and community empowerment. PMID:23318238
Wai, Khin Thet; Htun, Pe Than; Oo, Tin; Myint, Hla; Lin, Zaw; Kroeger, Axel; Sommerfeld, Johannes; Petzold, Max
2012-12-01
To build up and analyse the feasibility, process, and effectiveness of a partnership-driven ecosystem management intervention in reducing dengue vector breeding and constructing sustainable partnerships among multiple stakeholders. A community-based intervention study was conducted from May 2009 to January 2010 in Yangon city. Six high-risk and six low-risk clusters were randomized and allocated as intervention and routine service areas, respectively. For each cluster, 100 households were covered. Bi-monthly entomological evaluations (i.e. larval and pupal surveys) and household acceptability surveys at the end of 6-month intervention period were conducted, supplemented by qualitative evaluations. Intervention description: The strategies included eco-friendly multi-stakeholder partner groups (Thingaha) and ward-based volunteers, informed decision-making of householders, followed by integrated vector management approach. Pupae per person index (PPI) decreased at the last evaluation by 5·7% (0·35-0·33) in high-risk clusters. But in low-risk clusters, PPI remarkably decreased by 63·6% (0·33-0·12). In routine service area, PPI also decreased due to availability of Temephos after Cyclone Nargis. As for total number of pupae in all containers, when compared to evaluation 1, there was a reduction of 18·6% in evaluation 2 and 44·1% in evaluation 3 in intervention area. However, in routine service area, more reduction was observed. All intervention tools were found as acceptable, being feasible to implement by multi-stakeholder partner groups. The efficacy of community-controlled partnership-driven interventions was found to be superior to the vertical approach in terms of sustainability and community empowerment.
Scalable NIC-based reduction on large-scale clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A.; Fernández, J. C.; Petrini, F.
2003-01-01
Many parallel algorithms require effiaent support for reduction mllectives. Over the years, researchers have developed optimal reduction algonduns by taking inm account system size, dam size, and complexities of reduction operations. However, all of these algorithm have assumed the faa that the reduction precessing takes place on the host CPU. Modem Network Interface Cards (NICs) sport programmable processors with substantial memory and thus introduce a fresh variable into the equation This raises the following intersting challenge: Can we take advantage of modern NICs to implementJost redudion operations? In this paper, we take on this challenge in the context of large-scalemore » clusters. Through experiments on the 960-node, 1920-processor or ASCI Linux Cluster (ALC) located at the Lawrence Livermore National Laboratory, we show that NIC-based reductions indeed perform with reduced latency and immed consistency over host-based aleorithms for the wmmon case and that these benefits scale as the system grows. In the largest configuration tested--1812 processors-- our NIC-based algorithm can sum a single element vector in 73 ps with 32-bi integers and in 118 with Mbit floating-point numnbers. These results represent an improvement, respeaively, of 121% and 39% with resvect w the {approx}roductionle vel MPI library« less
Estimation of Rank Correlation for Clustered Data
Rosner, Bernard; Glynn, Robert
2017-01-01
It is well known that the sample correlation coefficient (Rxy) is the maximum likelihood estimator (MLE) of the Pearson correlation (ρxy) for i.i.d. bivariate normal data. However, this is not true for ophthalmologic data where X (e.g., visual acuity) and Y (e.g., visual field) are available for each eye and there is positive intraclass correlation for both X and Y in fellow eyes. In this paper, we provide a regression-based approach for obtaining the MLE of ρxy for clustered data, which can be implemented using standard mixed effects model software. This method is also extended to allow for estimation of partial correlation by controlling both X and Y for a vector U of other covariates. In addition, these methods can be extended to allow for estimation of rank correlation for clustered data by (a) converting ranks of both X and Y to the probit scale, (b) estimating the Pearson correlation between probit scores for X and Y, and (c) using the relationship between Pearson and rank correlation for bivariate normally distributed data. The validity of the methods in finite-sized samples is supported by simulation studies. Finally, two examples from ophthalmology and analgesic abuse are used to illustrate the methods. PMID:28399615
Wang, Huiya; Feng, Jun; Wang, Hongyu
2017-07-20
Detection of clustered microcalcification (MC) from mammograms plays essential roles in computer-aided diagnosis for early stage breast cancer. To tackle problems associated with the diversity of data structures of MC lesions and the variability of normal breast tissues, multi-pattern sample space learning is required. In this paper, a novel grouped fuzzy Support Vector Machine (SVM) algorithm with sample space partition based on Expectation-Maximization (EM) (called G-FSVM) is proposed for clustered MC detection. The diversified pattern of training data is partitioned into several groups based on EM algorithm. Then a series of fuzzy SVM are integrated for classification with each group of samples from the MC lesions and normal breast tissues. From DDSM database, a total of 1,064 suspicious regions are selected from 239 mammography, and the measurement of Accuracy, True Positive Rate (TPR), False Positive Rate (FPR) and EVL = TPR* 1-FPR are 0.82, 0.78, 0.14 and 0.72, respectively. The proposed method incorporates the merits of fuzzy SVM and multi-pattern sample space learning, decomposing the MC detection problem into serial simple two-class classification. Experimental results from synthetic data and DDSM database demonstrate that our integrated classification framework reduces the false positive rate significantly while maintaining the true positive rate.
Maximum Margin Clustering of Hyperspectral Data
NASA Astrophysics Data System (ADS)
Niazmardi, S.; Safari, A.; Homayouni, S.
2013-09-01
In recent decades, large margin methods such as Support Vector Machines (SVMs) are supposed to be the state-of-the-art of supervised learning methods for classification of hyperspectral data. However, the results of these algorithms mainly depend on the quality and quantity of available training data. To tackle down the problems associated with the training data, the researcher put effort into extending the capability of large margin algorithms for unsupervised learning. One of the recent proposed algorithms is Maximum Margin Clustering (MMC). The MMC is an unsupervised SVMs algorithm that simultaneously estimates both the labels and the hyperplane parameters. Nevertheless, the optimization of the MMC algorithm is a non-convex problem. Most of the existing MMC methods rely on the reformulating and the relaxing of the non-convex optimization problem as semi-definite programs (SDP), which are computationally very expensive and only can handle small data sets. Moreover, most of these algorithms are two-class classification, which cannot be used for classification of remotely sensed data. In this paper, a new MMC algorithm is used that solve the original non-convex problem using Alternative Optimization method. This algorithm is also extended for multi-class classification and its performance is evaluated. The results of the proposed algorithm show that the algorithm has acceptable results for hyperspectral data clustering.
Wang, Zhi-Long; Zhou, Zhi-Guo; Chen, Ying; Li, Xiao-Ting; Sun, Ying-Shi
The aim of this study was to diagnose lymph node metastasis of esophageal cancer by support vector machines model based on computed tomography. A total of 131 esophageal cancer patients with preoperative chemotherapy and radical surgery were included. Various indicators (tumor thickness, tumor length, tumor CT value, total number of lymph nodes, and long axis and short axis sizes of largest lymph node) on CT images before and after neoadjuvant chemotherapy were recorded. A support vector machines model based on these CT indicators was built to predict lymph node metastasis. Support vector machines model diagnosed lymph node metastasis better than preoperative short axis size of largest lymph node on CT. The area under the receiver operating characteristic curves were 0.887 and 0.705, respectively. The support vector machine model of CT images can help diagnose lymph node metastasis in esophageal cancer with preoperative chemotherapy.
Two-component vector solitons in defocusing Kerr-type media with spatially modulated nonlinearity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhong, Wei-Ping, E-mail: zhongwp6@126.com; Texas A and M University at Qatar, P.O. Box 23874 Doha; Belić, Milivoj
2014-12-15
We present a class of exact solutions to the coupled (2+1)-dimensional nonlinear Schrödinger equation with spatially modulated nonlinearity and a special external potential, which describe the evolution of two-component vector solitons in defocusing Kerr-type media. We find a robust soliton solution, constructed with the help of Whittaker functions. For specific choices of the topological charge, the radial mode number and the modulation depth, the solitons may exist in various forms, such as the half-moon, necklace-ring, and sawtooth vortex-ring patterns. Our results show that the profile of such solitons can be effectively controlled by the topological charge, the radial mode number,more » and the modulation depth. - Highlights: • Two-component vector soliton clusters in defocusing Kerr-type media are reported. • These soliton clusters are constructed with the help of Whittaker functions. • The half-moon, necklace-ring and vortex-ring patterns are found. • The profile of these solitons can be effectively controlled by three soliton parameters.« less
Asymptotic stability of spectral-based PDF modeling for homogeneous turbulent flows
NASA Astrophysics Data System (ADS)
Campos, Alejandro; Duraisamy, Karthik; Iaccarino, Gianluca
2015-11-01
Engineering models of turbulence, based on one-point statistics, neglect spectral information inherent in a turbulence field. It is well known, however, that the evolution of turbulence is dictated by a complex interplay between the spectral modes of velocity. For example, for homogeneous turbulence, the pressure-rate-of-strain depends on the integrated energy spectrum weighted by components of the wave vectors. The Interacting Particle Representation Model (IPRM) (Kassinos & Reynolds, 1996) and the Velocity/Wave-Vector PDF model (Van Slooten & Pope, 1997) emulate spectral information in an attempt to improve the modeling of turbulence. We investigate the evolution and asymptotic stability of the IPRM using three different approaches. The first approach considers the Lagrangian evolution of individual realizations (idealized as particles) of the stochastic process defined by the IPRM. The second solves Lagrangian evolution equations for clusters of realizations conditional on a given wave vector. The third evolves the solution of the Eulerian conditional PDF corresponding to the aforementioned clusters. This last method avoids issues related to discrete particle noise and slow convergence associated with Lagrangian particle-based simulations.
Three learning phases for radial-basis-function networks.
Schwenker, F; Kestler, H A; Palm, G
2001-05-01
In this paper, learning algorithms for radial basis function (RBF) networks are discussed. Whereas multilayer perceptrons (MLP) are typically trained with backpropagation algorithms, starting the training procedure with a random initialization of the MLP's parameters, an RBF network may be trained in many different ways. We categorize these RBF training methods into one-, two-, and three-phase learning schemes. Two-phase RBF learning is a very common learning scheme. The two layers of an RBF network are learnt separately; first the RBF layer is trained, including the adaptation of centers and scaling parameters, and then the weights of the output layer are adapted. RBF centers may be trained by clustering, vector quantization and classification tree algorithms, and the output layer by supervised learning (through gradient descent or pseudo inverse solution). Results from numerical experiments of RBF classifiers trained by two-phase learning are presented in three completely different pattern recognition applications: (a) the classification of 3D visual objects; (b) the recognition hand-written digits (2D objects); and (c) the categorization of high-resolution electrocardiograms given as a time series (ID objects) and as a set of features extracted from these time series. In these applications, it can be observed that the performance of RBF classifiers trained with two-phase learning can be improved through a third backpropagation-like training phase of the RBF network, adapting the whole set of parameters (RBF centers, scaling parameters, and output layer weights) simultaneously. This, we call three-phase learning in RBF networks. A practical advantage of two- and three-phase learning in RBF networks is the possibility to use unlabeled training data for the first training phase. Support vector (SV) learning in RBF networks is a different learning approach. SV learning can be considered, in this context of learning, as a special type of one-phase learning, where only the output layer weights of the RBF network are calculated, and the RBF centers are restricted to be a subset of the training data. Numerical experiments with several classifier schemes including k-nearest-neighbor, learning vector quantization and RBF classifiers trained through two-phase, three-phase and support vector learning are given. The performance of the RBF classifiers trained through SV learning and three-phase learning are superior to the results of two-phase learning, but SV learning often leads to complex network structures, since the number of support vectors is not a small fraction of the total number of data points.
Hightower, Jake; Kracalik, Ian T; Vydayko, Nataliya; Goodin, Douglas; Glass, Gregory; Blackburn, Jason K
2014-10-16
Francisella tularensis, the causative agent of tularemia, is a zoonotic agent that remains across much of the northern hemisphere, where it exists in enzootic cycles. In Ukraine, tularemia has a long history that suggests a need for sustained surveillance in natural foci. To better characterize the host-vector diversity and spatial distribution of tularemia, we analyzed historical data from field collections carried out from 1941 to 2008. We analyzed the spatial-temporal distribution of bacterial isolates collected from field samples. Isolates were characterized by source and dominant land cover type. To identify environmental persistence and spatial variation in the source of isolation, we used the space-time permutation and multinomial models in SaTScan. A total of 3,086 positive isolates were taken from 1,084 geographic locations. Isolation of F. tularensis was more frequent among arthropods [n = 2,045 (66.3%)] followed by mammals [n = 619 (20.1%)], water [n = 393 (12.7%)], and farm produce [n = 29 (0.94%)], respectively. Four areas of persistent bacterial isolation were identified. Water and farm produce as sources of bacterial isolation were clustered. Our findings confirm the presence of long-standing natural foci of F. tularensis in Ukraine. Given the history of tularemia as well as its environmental persistence there exists a possibility of (re)emergence in human populations. Heterogeneity in the distribution of tularemia isolate recovery related to land cover type supports the theory of natural nidality and clusters identify areas to target potential sources of the pathogen and improve surveillance.
Giraldo-Calderón, Gloria I.; Emrich, Scott J.; MacCallum, Robert M.; Maslen, Gareth; Dialynas, Emmanuel; Topalis, Pantelis; Ho, Nicholas; Gesing, Sandra; Madey, Gregory; Collins, Frank H.; Lawson, Daniel
2015-01-01
VectorBase is a National Institute of Allergy and Infectious Diseases supported Bioinformatics Resource Center (BRC) for invertebrate vectors of human pathogens. Now in its 11th year, VectorBase currently hosts the genomes of 35 organisms including a number of non-vectors for comparative analysis. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes. Here we describe improvements to our resource and the set of tools available for interrogating and accessing BRC data including the integration of Web Apollo to facilitate community annotation and providing Galaxy to support user-based workflows. VectorBase also actively supports our community through hands-on workshops and online tutorials. All information and data are freely available from our website at https://www.vectorbase.org/. PMID:25510499
2015-05-18
THOMAS AND OTHERS ENHANCED SURVEILLANCE FOR DENGUE Improving Dengue Virus Capture Rates in Humans and Vectors in Kamphaeng Phet Province...of Medical Sciences, Bangkok, Thailand. Abstract. Dengue is of public health importance in tropical and sub-tropical regions. Dengue virus (DENV...with confirmed dengue (initiates) and associated cluster individuals (associates) with entomologic sampling. A total of 438 associates were enrolled
Colorimetric Recognition of Aldehydes and Ketones.
Li, Zheng; Fang, Ming; LaGasse, Maria K; Askim, Jon R; Suslick, Kenneth S
2017-08-07
A colorimetric sensor array has been designed for the identification of and discrimination among aldehydes and ketones in vapor phase. Due to rapid chemical reactions between the solid-state sensor elements and gaseous analytes, distinct color difference patterns were produced and digitally imaged for chemometric analysis. The sensor array was developed from classical spot tests using aniline and phenylhydrazine dyes that enable molecular recognition of a wide variety of aliphatic or aromatic aldehydes and ketones, as demonstrated by hierarchical cluster, principal component, and support vector machine analyses. The aldehyde/ketone-specific sensors were further employed for differentiation among and identification of ten liquor samples (whiskies, brandy, vodka) and ethanol controls, showing its potential applications in the beverage industry. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Balancing aggregation and smoothing errors in inverse models
Turner, A. J.; Jacob, D. J.
2015-06-30
Inverse models use observations of a system (observation vector) to quantify the variables driving that system (state vector) by statistical optimization. When the observation vector is large, such as with satellite data, selecting a suitable dimension for the state vector is a challenge. A state vector that is too large cannot be effectively constrained by the observations, leading to smoothing error. However, reducing the dimension of the state vector leads to aggregation error as prior relationships between state vector elements are imposed rather than optimized. Here we present a method for quantifying aggregation and smoothing errors as a function ofmore » state vector dimension, so that a suitable dimension can be selected by minimizing the combined error. Reducing the state vector within the aggregation error constraints can have the added advantage of enabling analytical solution to the inverse problem with full error characterization. We compare three methods for reducing the dimension of the state vector from its native resolution: (1) merging adjacent elements (grid coarsening), (2) clustering with principal component analysis (PCA), and (3) applying a Gaussian mixture model (GMM) with Gaussian pdfs as state vector elements on which the native-resolution state vector elements are projected using radial basis functions (RBFs). The GMM method leads to somewhat lower aggregation error than the other methods, but more importantly it retains resolution of major local features in the state vector while smoothing weak and broad features.« less
Balancing aggregation and smoothing errors in inverse models
NASA Astrophysics Data System (ADS)
Turner, A. J.; Jacob, D. J.
2015-01-01
Inverse models use observations of a system (observation vector) to quantify the variables driving that system (state vector) by statistical optimization. When the observation vector is large, such as with satellite data, selecting a suitable dimension for the state vector is a challenge. A state vector that is too large cannot be effectively constrained by the observations, leading to smoothing error. However, reducing the dimension of the state vector leads to aggregation error as prior relationships between state vector elements are imposed rather than optimized. Here we present a method for quantifying aggregation and smoothing errors as a function of state vector dimension, so that a suitable dimension can be selected by minimizing the combined error. Reducing the state vector within the aggregation error constraints can have the added advantage of enabling analytical solution to the inverse problem with full error characterization. We compare three methods for reducing the dimension of the state vector from its native resolution: (1) merging adjacent elements (grid coarsening), (2) clustering with principal component analysis (PCA), and (3) applying a Gaussian mixture model (GMM) with Gaussian pdfs as state vector elements on which the native-resolution state vector elements are projected using radial basis functions (RBFs). The GMM method leads to somewhat lower aggregation error than the other methods, but more importantly it retains resolution of major local features in the state vector while smoothing weak and broad features.
Balancing aggregation and smoothing errors in inverse models
NASA Astrophysics Data System (ADS)
Turner, A. J.; Jacob, D. J.
2015-06-01
Inverse models use observations of a system (observation vector) to quantify the variables driving that system (state vector) by statistical optimization. When the observation vector is large, such as with satellite data, selecting a suitable dimension for the state vector is a challenge. A state vector that is too large cannot be effectively constrained by the observations, leading to smoothing error. However, reducing the dimension of the state vector leads to aggregation error as prior relationships between state vector elements are imposed rather than optimized. Here we present a method for quantifying aggregation and smoothing errors as a function of state vector dimension, so that a suitable dimension can be selected by minimizing the combined error. Reducing the state vector within the aggregation error constraints can have the added advantage of enabling analytical solution to the inverse problem with full error characterization. We compare three methods for reducing the dimension of the state vector from its native resolution: (1) merging adjacent elements (grid coarsening), (2) clustering with principal component analysis (PCA), and (3) applying a Gaussian mixture model (GMM) with Gaussian pdfs as state vector elements on which the native-resolution state vector elements are projected using radial basis functions (RBFs). The GMM method leads to somewhat lower aggregation error than the other methods, but more importantly it retains resolution of major local features in the state vector while smoothing weak and broad features.
Community involvement in dengue vector control: cluster randomised trial
Toledo, M E; Rodríguez, M; Gomez, D; Baly, A; Benitez, J R; Van der Stuyft, P
2009-01-01
Objective To assess the effectiveness of an integrated community based environmental management strategy to control Aedes aegypti, the vector of dengue, compared with a routine strategy. Design Cluster randomised trial. Setting Guantanamo, Cuba. Participants 32 circumscriptions (around 2000 inhabitants each). Interventions The circumscriptions were randomly allocated to control clusters (n=16) comprising routine Aedes control programme (entomological surveillance, source reduction, selective adulticiding, and health education) and to intervention clusters (n=16) comprising the routine Aedes control programme combined with a community based environmental management approach. Main outcome measures The primary outcome was levels of Aedes infestation: house index (number of houses positive for at least one container with immature stages of Ae aegypti per 100 inspected houses), Breteau index (number of containers positive for immature stages of Ae aegypti per 100 inspected houses), and the pupae per inhabitant statistic (number of Ae aegypti pupae per inhabitant). Results All clusters were subjected to the intended intervention; all completed the study protocol up to February 2006 and all were included in the analysis. At baseline the Aedes infestation levels were comparable between intervention and control clusters: house index 0.25% v 0.20%, pupae per inhabitant 0.44×10−3 v 0.29×10−3. At the end of the intervention these indices were significantly lower in the intervention clusters: rate ratio for house indices 0.49 (95% confidence interval 0.27 to 0.88) and rate ratio for pupae per inhabitant 0.27 (0.09 to 0.76). Conclusion A community based environmental management embedded in a routine control programme was effective at reducing levels of Aedes infestation. Trial registration Current Controlled Trials ISRCTN88405796. PMID:19509031
PROPER MOTIONS AND ORIGINS OF SGR 1806-20 AND SGR 1900+14
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tendulkar, Shriharsh P.; Kulkarni, Shrinivas R.; Cameron, P. Brian, E-mail: spt@astro.caltech.edu
2012-12-10
We present results from high-resolution infrared observations of magnetars SGR 1806-20 and SGR 1900+14 over 5 years using laser-supported adaptive optics at the 10 m Keck Observatory. Our measurements of the proper motions of these magnetars provide robust links between magnetars and their progenitors and provide age estimates for magnetars. At the measured distances of their putative associations, we measure the linear transverse velocity of SGR 1806-20 to be 350 {+-} 100 km s{sup -1} and of SGR 1900+14 to be 130 {+-} 30 km s{sup -1}. The transverse velocity vectors for both magnetars point away from the clusters ofmore » massive stars, solidifying their proposed associations. Assuming that the magnetars were born in the clusters, we can estimate the braking index to be {approx}1.8 for SGR 1806-20 and {approx}1.2 for SGR 1900+14. This is significantly lower than the canonical value of n = 3 predicted by the magnetic dipole spin-down suggesting an alternative source of dissipation such as twisted magnetospheres or particle winds.« less
NASA Astrophysics Data System (ADS)
Bustamam, A.; Ulul, E. D.; Hura, H. F. A.; Siswantining, T.
2017-07-01
Hierarchical clustering is one of effective methods in creating a phylogenetic tree based on the distance matrix between DNA (deoxyribonucleic acid) sequences. One of the well-known methods to calculate the distance matrix is k-mer method. Generally, k-mer is more efficient than some distance matrix calculation techniques. The steps of k-mer method are started from creating k-mer sparse matrix, and followed by creating k-mer singular value vectors. The last step is computing the distance amongst vectors. In this paper, we analyze the sequences of MERS-CoV (Middle East Respiratory Syndrome - Coronavirus) DNA by implementing hierarchical clustering using k-mer sparse matrix in order to perform the phylogenetic analysis. Our results show that the ancestor of our MERS-CoV is coming from Egypt. Moreover, we found that the MERS-CoV infection that occurs in one country may not necessarily come from the same country of origin. This suggests that the process of MERS-CoV mutation might not only be influenced by geographical factor.
Cinco, Roehl M.; Robblee, John H.; Messinger, Johannes; Fernandez, Carmen; Holman, Karen L. McFarlane; Sauer, Kenneth; Yachandra, Vittal K.
2014-01-01
The oxygen-evolving complex of photosystem II (PS II) in green plants and algae contains a cluster of four Mn atoms in the active site, which catalyzes the photoinduced oxidation of water to dioxygen. Along with Mn, calcium and chloride ions are necessary cofactors for proper functioning of the complex. The current study using polarized Sr EXAFS on oriented Sr-reactivated samples shows that Fourier peak II, which fits best to Mn at 3.5 Å rather than lighter atoms (C, N, O, or Cl), is dichroic, with a larger magnitude at 10° (angle between the PS II membrane normal and the X-ray electric field vector) and a smaller magnitude at 80°. Analysis of the dichroism of the Sr EXAFS yields a lower and upper limit of 0° and 23° for the average angle between the Sr–Mn vectors and the membrane normal and an isotropic coordination number (number of Mn neighbors to Sr) of 1 or 2 for these layered PS II samples. The results confirm the contention that Ca (Sr) is proximal to the Mn cluster and lead to refined working models of the heteronuclear Mn4Ca cluster of the oxygen-evolving complex in PS II. PMID:15491134
Transformation to equivalent dimensions—a new methodology to study earthquake clustering
NASA Astrophysics Data System (ADS)
Lasocki, Stanislaw
2014-05-01
A seismic event is represented by a point in a parameter space, quantified by the vector of parameter values. Studies of earthquake clustering involve considering distances between such points in multidimensional spaces. However, the metrics of earthquake parameters are different, hence the metric in a multidimensional parameter space cannot be readily defined. The present paper proposes a solution of this metric problem based on a concept of probabilistic equivalence of earthquake parameters. Under this concept the lengths of parameter intervals are equivalent if the probability for earthquakes to take values from either interval is the same. Earthquake clustering is studied in an equivalent rather than the original dimensions space, where the equivalent dimension (ED) of a parameter is its cumulative distribution function. All transformed parameters are of linear scale in [0, 1] interval and the distance between earthquakes represented by vectors in any ED space is Euclidean. The unknown, in general, cumulative distributions of earthquake parameters are estimated from earthquake catalogues by means of the model-free non-parametric kernel estimation method. Potential of the transformation to EDs is illustrated by two examples of use: to find hierarchically closest neighbours in time-space and to assess temporal variations of earthquake clustering in a specific 4-D phase space.
NASA Astrophysics Data System (ADS)
Zhao, Tongtiegang; Liu, Pan; Zhang, Yongyong; Ruan, Chengqing
2017-09-01
Global climate model (GCM) forecasts are an integral part of long-range hydroclimatic forecasting. We propose to use clustering to explore anomaly correlation, which indicates the performance of raw GCM forecasts, in the three-dimensional space of latitude, longitude, and initialization time. Focusing on a certain period of the year, correlations for forecasts initialized at different preceding periods form a vector. The vectors of anomaly correlation across different GCM grid cells are clustered to reveal how GCM forecasts perform as time progresses. Through the case study of Climate Forecast System Version 2 (CFSv2) forecasts of summer precipitation in China, we observe that the correlation at a certain cell oscillates with lead time and can become negative. The use of clustering reveals two meaningful patterns that characterize the relationship between anomaly correlation and lead time. For some grid cells in Central and Southwest China, CFSv2 forecasts exhibit positive correlations with observations and they tend to improve as time progresses. This result suggests that CFSv2 forecasts tend to capture the summer precipitation induced by the East Asian monsoon and the South Asian monsoon. It also indicates that CFSv2 forecasts can potentially be applied to improving hydrological forecasts in these regions. For some other cells, the correlations are generally close to zero at different lead times. This outcome implies that CFSv2 forecasts still have plenty of room for further improvement. The robustness of the patterns has been tested using both hierarchical clustering and k-means clustering and examined with the Silhouette score.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, Andrew; Haass, Michael; Rintoul, Mark Daniel
GazeAppraise advances the state of the art of gaze pattern analysis using methods that simultaneously analyze spatial and temporal characteristics of gaze patterns. GazeAppraise enables novel research in visual perception and cognition; for example, using shape features as distinguishing elements to assess individual differences in visual search strategy. Given a set of point-to-point gaze sequences, hereafter referred to as scanpaths, the method constructs multiple descriptive features for each scanpath. Once the scanpath features have been calculated, they are used to form a multidimensional vector representing each scanpath and cluster analysis is performed on the set of vectors from all scanpaths.more » An additional benefit of this method is the identification of causal or correlated characteristics of the stimuli, subjects, and visual task through statistical analysis of descriptive metadata distributions within and across clusters.« less
The magnetic universe through vector potential SPMHD simulations
NASA Astrophysics Data System (ADS)
Stasyszyn, F. A.
2017-10-01
The use of Smoothed Particle Magneto Hydrodynamics (SPMHD) is getting nowadays more and more common in Astrophysics. From galaxy clusters to neutron starts, there are multiple applications already existing in the literature. I will review some of the common methods used and highlight the successful approach of using vector potentials to describe the evolution of the magnetic fields. The latter have some interesting advantages, and their results challenge previous findings, being the magnetic divergence problem naturally vanished. We select a few examples to discuss some areas of interest. First, we show some Galaxy Clusters from the MUSIC project. These cosmological simulations are done with the usual sub-grid recipes, as radiative cooling and star formation, being the first ones obtained with an SPH code in a self consistent way. This demonstrates the robustness of the new method in a variety of astrophysical scenarios.
Jara, Rocio F; Wydeven, Adrian P; Samuel, Michael D
2016-01-01
World-wide concern over emerging vector-borne diseases has increased in recent years for both animal and human health. In the United Sates, concern about vector-borne diseases in canines has focused on Lyme disease, anaplasmosis, ehrlichiosis, and heartworm which infect domestic and wild canids. Of these diseases, Lyme and anaplasmosis are also frequently diagnosed in humans. Gray wolves (Canis lupus) recolonized Wisconsin in the 1970s, and we evaluated their temporal and geographic patterns of exposure to these four vector-borne diseases in Wisconsin as the population expanded between 1985 and 2011. A high proportion of the Wisconsin wolves were exposed to the agents that cause Lyme (65.6%) and anaplasma (47.7%), and a smaller proportion to ehrlichiosis (5.7%) and infected with heartworm (9.2%). Wolf exposure to tick borne diseases was consistently higher in older animals. Wolf exposure was markedly higher than domestic dog (Canis familiaris) exposure for all 4 disease agents during 2001-2013. We found a cluster of wolf exposure to Borrelia burgdorferi in northwestern Wisconsin, which overlaps human and domestic dog clusters for the same pathogen. In addition, wolf exposure to Lyme disease in Wisconsin has increased, corresponding with the increasing human incidence of Lyme disease in a similar time period. Despite generally high prevalence of exposure none of these diseases appear to have slowed the growth of the Wisconsin wolf population.
Basso, César; García da Rosa, Elsa; Romero, Sonnia; González, Cristina; Lairihoy, Rosario; Roche, Ingrid; Caffera, Ruben M.; da Rosa, Ricardo; Calfani, Marisel; Alfonso-Sierra, Eduardo; Petzold, Max; Kroeger, Axel; Sommerfeld, Johannes
2015-01-01
Background Uruguay is located at the southern border of Aedes aegypti distribution on the South American sub-continent. The reported dengue cases in the country are all imported from surrounding countries. One of the cities at higher risk of local dengue transmission is Salto, a border city with heavy traffic from dengue endemic areas. Methods We completed an intervention study using a cluster randomized trial design in 20 randomly selected ‘clusters’ in Salto. The clusters were located in neighborhoods of differing geography and economic, cultural and social aspects. Results Entomological surveys were carried out to measure the impact of the intervention on vector densities. Through participatory processes of all stakeholders, an appropriate ecosystem management intervention was defined. Residents collected the abundant small water holding containers and the Ministry of Public Health and the Municipality of Salto were responsible for collecting and eliminating them. Additional vector breeding places were large water tanks; they were either altered so that they could not hold water any more or covered so that oviposition by mosquitoes could not take place. Conclusions The response from the community and national programme managers was encouraging. The intervention evidenced opportunities for cost savings and reducing dengue vector densities (although not to statistically significant levels). The observed low vector density limits the potential reduction due to the intervention. A larger sample size is needed to obtain a statistically significant difference. PMID:25604764
Jara, Rocio F.; Wydeven, Adrian P.; Samuel, Michael D.
2016-01-01
World-wide concern over emerging vector-borne diseases has increased in recent years for both animal and human health. In the United Sates, concern about vector-borne diseases in canines has focused on Lyme disease, anaplasmosis, ehrlichiosis, and heartworm which infect domestic and wild canids. Of these diseases, Lyme and anaplasmosis are also frequently diagnosed in humans. Gray wolves (Canis lupus) recolonized Wisconsin in the 1970s, and we evaluated their temporal and geographic patterns of exposure to these four vector-borne diseases in Wisconsin as the population expanded between 1985 and 2011. A high proportion of the Wisconsin wolves were exposed to the agents that cause Lyme (65.6%) and anaplasma (47.7%), and a smaller proportion to ehrlichiosis (5.7%) and infected with heartworm (9.2%). Wolf exposure to tick borne diseases was consistently higher in older animals. Wolf exposure was markedly higher than domestic dog (Canis familiaris) exposure for all 4 disease agents during 2001–2013. We found a cluster of wolf exposure to Borrelia burgdorferi in northwestern Wisconsin, which overlaps human and domestic dog clusters for the same pathogen. In addition, wolf exposure to Lyme disease in Wisconsin has increased, corresponding with the increasing human incidence of Lyme disease in a similar time period. Despite generally high prevalence of exposure none of these diseases appear to have slowed the growth of the Wisconsin wolf population.
Spatial Variations in Dengue Transmission in Schools in Thailand
Ratanawong, Pitcha; Kittayapong, Pattamaporn; Olanratmanee, Phanthip; Wilder-Smith, Annelies; Byass, Peter; Tozan, Yesim; Dambach, Peter; Quiñonez, Carlos Alberto Montenegro; Louis, Valérie R.
2016-01-01
Background Dengue is an important neglected tropical disease, with more than half of the world’s population living in dengue endemic areas. Good understanding of dengue transmission sites is a critical factor to implement effective vector control measures. Methods A cohort of 1,811 students from 10 schools in rural, semi-rural and semi-urban Thailand participated in this study. Seroconversion data and location of participants’ residences and schools were recorded to determine spatial patterns of dengue infections. Blood samples were taken to confirm dengue infections in participants at the beginning and the end of school term. Entomological factors included a survey of adult mosquito density using a portable vacuum aspirator during the school term and a follow up survey of breeding sites of Aedes vectors in schools after the school term. Clustering analyses were performed to detect spatial aggregation of dengue infections among participants. Results A total of 57 dengue seroconversions were detected among the 1,655 participants who provided paired blood samples. Of the 57 confirmed dengue infections, 23 (40.0%) occurred in students from 6 (6.8%) of the 88 classrooms in 10 schools. Dengue infections did not show significant clustering by residential location in the study area. During the school term, a total of 66 Aedes aegypti mosquitoes were identified from the 278 mosquitoes caught in 50 classrooms of the 10 schools. In a follow-up survey of breeding sites, 484 out of 2,399 water containers surveyed (20.2%) were identified as active mosquito breeding sites. Discussion and Conclusion Our findings suggest that dengue infections were clustered among schools and among classrooms within schools. The schools studied were found to contain a large number of different types of breeding sites. Aedes vector densities in schools were correlated with dengue infections and breeding sites in those schools. Given that only a small proportion of breeding sites in the schools were subjected to vector control measures (11%), this study emphasizes the urgent need to implement vector control strategies at schools, while maintaining efforts at the household level. PMID:27669170
Design of trials for interrupting the transmission of endemic pathogens.
Silkey, Mariabeth; Homan, Tobias; Maire, Nicolas; Hiscox, Alexandra; Mukabana, Richard; Takken, Willem; Smith, Thomas A
2016-06-06
Many interventions against infectious diseases have geographically diffuse effects. This leads to contamination between arms in cluster-randomized trials (CRTs). Pathogen elimination is the goal of many intervention programs against infectious agents, but contamination means that standard CRT designs and analyses do not provide inferences about the potential of interventions to interrupt pathogen transmission at maximum scale-up. A generic model of disease transmission was used to simulate infections in stepped wedge cluster-randomized trials (SWCRTs) of a transmission-reducing intervention, where the intervention has spatially diffuse effects. Simulations of such trials were then used to examine the potential of such designs for providing generalizable causal inferences about the impact of such interventions, including measurements of the contamination effects. The simulations were applied to the geography of Rusinga Island, Lake Victoria, Kenya, the site of the SolarMal trial on the use of odor-baited mosquito traps to eliminate Plasmodium falciparum malaria. These were used to compare variants in the proposed SWCRT designs for the SolarMal trial. Measures of contamination effects were found that could be assessed in the simulated trials. Inspired by analyses of trials of insecticide-treated nets against malaria when applied to the geography of the SolarMal trial, these measures were found to be robust to different variants of SWCRT design. Analyses of the likely extent of contamination effects supported the choice of cluster size for the trial. The SWCRT is an appropriate design for trials that assess the feasibility of local elimination of a pathogen. The effects of incomplete coverage can be estimated by analyzing the extent of contamination between arms in such trials, and the estimates also support inferences about causality. The SolarMal example illustrates how generic transmission models incorporating spatial smoothing can be used to simulate such trials for a power calculation and optimization of cluster size and randomization strategies. The approach is applicable to a range of infectious diseases transmitted via environmental reservoirs or via arthropod vectors.
Multiclass Reduced-Set Support Vector Machines
NASA Technical Reports Server (NTRS)
Tang, Benyang; Mazzoni, Dominic
2006-01-01
There are well-established methods for reducing the number of support vectors in a trained binary support vector machine, often with minimal impact on accuracy. We show how reduced-set methods can be applied to multiclass SVMs made up of several binary SVMs, with significantly better results than reducing each binary SVM independently. Our approach is based on Burges' approach that constructs each reduced-set vector as the pre-image of a vector in kernel space, but we extend this by recomputing the SVM weights and bias optimally using the original SVM objective function. This leads to greater accuracy for a binary reduced-set SVM, and also allows vectors to be 'shared' between multiple binary SVMs for greater multiclass accuracy with fewer reduced-set vectors. We also propose computing pre-images using differential evolution, which we have found to be more robust than gradient descent alone. We show experimental results on a variety of problems and find that this new approach is consistently better than previous multiclass reduced-set methods, sometimes with a dramatic difference.
A Subdivision-Based Representation for Vector Image Editing.
Liao, Zicheng; Hoppe, Hugues; Forsyth, David; Yu, Yizhou
2012-11-01
Vector graphics has been employed in a wide variety of applications due to its scalability and editability. Editability is a high priority for artists and designers who wish to produce vector-based graphical content with user interaction. In this paper, we introduce a new vector image representation based on piecewise smooth subdivision surfaces, which is a simple, unified and flexible framework that supports a variety of operations, including shape editing, color editing, image stylization, and vector image processing. These operations effectively create novel vector graphics by reusing and altering existing image vectorization results. Because image vectorization yields an abstraction of the original raster image, controlling the level of detail of this abstraction is highly desirable. To this end, we design a feature-oriented vector image pyramid that offers multiple levels of abstraction simultaneously. Our new vector image representation can be rasterized efficiently using GPU-accelerated subdivision. Experiments indicate that our vector image representation achieves high visual quality and better supports editing operations than existing representations.
A possibilistic approach to clustering
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Keller, James M.
1993-01-01
Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each image pattern recognition iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
Performance analysis of clustering techniques over microarray data: A case study
NASA Astrophysics Data System (ADS)
Dash, Rasmita; Misra, Bijan Bihari
2018-03-01
Handling big data is one of the major issues in the field of statistical data analysis. In such investigation cluster analysis plays a vital role to deal with the large scale data. There are many clustering techniques with different cluster analysis approach. But which approach suits a particular dataset is difficult to predict. To deal with this problem a grading approach is introduced over many clustering techniques to identify a stable technique. But the grading approach depends on the characteristic of dataset as well as on the validity indices. So a two stage grading approach is implemented. In this study the grading approach is implemented over five clustering techniques like hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of grading approach that a cluster technique is significant is also established by Nemenyi post-hoc hypothetical test.
Pyeon, Hye-Rim; Nah, Hee-Ju; Kang, Seung-Hoon; Choi, Si-Sun; Kim, Eung-Soo
2017-05-31
Heterologous expression of biosynthetic gene clusters of natural microbial products has become an essential strategy for titer improvement and pathway engineering of various potentially-valuable natural products. A Streptomyces artificial chromosomal conjugation vector, pSBAC, was previously successfully applied for precise cloning and tandem integration of a large polyketide tautomycetin (TMC) biosynthetic gene cluster (Nah et al. in Microb Cell Fact 14(1):1, 2015), implying that this strategy could be employed to develop a custom overexpression scheme of natural product pathway clusters present in actinomycetes. To validate the pSBAC system as a generally-applicable heterologous overexpression system for a large-sized polyketide biosynthetic gene cluster in Streptomyces, another model polyketide compound, the pikromycin biosynthetic gene cluster, was preciously cloned and heterologously expressed using the pSBAC system. A unique HindIII restriction site was precisely inserted at one of the border regions of the pikromycin biosynthetic gene cluster within the chromosome of Streptomyces venezuelae, followed by site-specific recombination of pSBAC into the flanking region of the pikromycin gene cluster. Unlike the previous cloning process, one HindIII site integration step was skipped through pSBAC modification. pPik001, a pSBAC containing the pikromycin biosynthetic gene cluster, was directly introduced into two heterologous hosts, Streptomyces lividans and Streptomyces coelicolor, resulting in the production of 10-deoxymethynolide, a major pikromycin derivative. When two entire pikromycin biosynthetic gene clusters were tandemly introduced into the S. lividans chromosome, overproduction of 10-deoxymethynolide and the presence of pikromycin, which was previously not detected, were both confirmed. Moreover, comparative qRT-PCR results confirmed that the transcription of pikromycin biosynthetic genes was significantly upregulated in S. lividans containing tandem clusters of pikromycin biosynthetic gene clusters. The 60 kb pikromycin biosynthetic gene cluster was isolated in a single integration pSBAC vector. Introduction of the pikromycin biosynthetic gene cluster into the pikromycin non-producing strains resulted in higher pikromycin production. The utility of the pSBAC system as a precise cloning tool for large-sized biosynthetic gene clusters was verified through heterologous expression of the pikromycin biosynthetic gene cluster. Moreover, this pSBAC-driven heterologous expression strategy was confirmed to be an ideal approach for production of low and inconsistent natural products such as pikromycin in S. venezuelae, implying that this strategy could be employed for development of a custom overexpression scheme of natural product biosynthetic gene clusters in actinomycetes.
Atomically precise cluster catalysis towards quantum controlled catalysts
Watanabe, Yoshihide
2014-01-01
Catalysis of atomically precise clusters supported on a substrate is reviewed in relation to the type of reactions. The catalytic activity of supported clusters has generally been discussed in terms of electronic structure. Several lines of evidence have indicated that the electronic structure of clusters and the geometry of clusters on a support, including the accompanying cluster-support interaction, are strongly correlated with catalytic activity. The electronic states of small clusters would be easily affected by cluster–support interactions. Several studies have suggested that it is possible to tune the electronic structure through atomic control of the cluster size. It is promising to tune not only the number of cluster atoms, but also the hybridization between the electronic states of the adsorbed reactant molecules and clusters in order to realize a quantum-controlled catalyst. PMID:27877723
A Numerical Model of Hercules A by Magnetic Tower
NASA Astrophysics Data System (ADS)
Nakamura, Masanori; Tregillis, I. L.; Li, H.; Li, S.
2009-01-01
We apply magnetohydrodynamic (MHD) modeling to the radio galaxy Hercules A for investigating the jet-driven shock, jet/lobe transition, wiggling, and magnetic field distribution associated with this source. The model consists of magnetic tower jets in a galaxy cluster environment. The profile of underlying ambient gas plays an important role in jet-lobe morphology. The balance between the magnetic pressure generated by axial current and the ambient gas pressure can determine the lobe radius. The jet body is confined jointly by the external pressure and gravity inside the cluster core radius, while outside this radius it expands radially to form fat lobes in a steeply decreasing ambient thermal pressure gradient. The current-carrying jets are responsible for generating a strong, tightly wound helical magnetic field. This magnetic configuration will be unstable against the current-driven kink mode and it visibly grows beyond the cluster core radius where a separation between the jet forward and return currents occurs. The reversed pinch profile of global magnetic field associated with the jet and lobes produces projected magnetic-vector distributions aligned with the jet flow and the lobe edge. AGN-driven shock powered by the expanding magnetic tower jet surrounds the jet/lobe structure and heats the ambient ICM. The lobes expand subsonically; no obvious hot spots are produced at the heads of lobes. Several key features in our MHD modeling may be qualitatively supported by the observations of Hercules A. This work was carried out under the auspices of the National Nuclear Security Administration of the U.S. Department of Energy at Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396. It was supported by the Laboratory Directed Research and Development Program at LANL and by IGPP at LANL.
USDA-ARS?s Scientific Manuscript database
A somatic transformation vector, pDP9, was constructed that provides a simplified means of producing permanently transformed cultured insect cells that support high levels of protein expression of foreign genes. The pDP9 plasmid vector incorporates DNA sequences from the Junonia coenia densovirus th...
Rizzo, Nidia; Gramajo, Rodrigo; Escobar, Maria Cabrera; Arana, Byron; Kroeger, Axel; Manrique-Saide, Pablo; Petzold, Max
2012-10-30
In view of the epidemiological expansion of dengue worldwide and the availability of new tools and strategies particularly for controlling the primary dengue vector Aedes aegypti, an intervention study was set up to test the efficacy, cost and feasibility of a combined approach of insecticide treated materials (ITMs) alone and in combination with appropriate targeted interventions of the most productive vector breeding-sites. The study was conducted as a cluster randomized community trial using "reduction of the vector population" as the main outcome variable. The trial had two arms: 10 intervention clusters (neighborhoods) and 10 control clusters in the town of Poptun Guatemala. Activities included entomological assessments (characteristics of breeding-sites, pupal productivity, Stegomyia indices) at baseline, 6 weeks after the first intervention (coverage of window and exterior doorways made of PermaNet 2.0 netting, factory treated with deltamethrin at 55 mg/m2, and of 200 L drums with similar treated material) and 6 weeks after the second intervention (combination of treated materials and other suitable interventions targeting productive breeding-sites i.e larviciding with Temephos, elimination etc.). The second intervention took place 17 months after the first intervention. The insecticide residual activity and the insecticidal content were also studied at different intervals. Additionally, information about demographic characteristics, cost of the intervention, coverage of houses protected and satisfaction in the population with the interventions was collected. At baseline (during the dry season) a variety of productive container types for Aedes pupae were identified: various container types holding >20 L, 200 L drums, washbasins and buckets (producing 83.7% of all pupae). After covering 100% of windows and exterior doorways and a small number of drums (where the commercial cover could be fixed) in 970 study households, tropical rains occurred in the area and lead to an increase of the vector population, more pronounced (but statistically not significant) in the control arm than in the intervention arm. In the second intervention (17 months later and six weeks after implementing the second intervention) the combined approach of ITMs and a combination of appropriate interventions against productive containers (Temephos in >200 L water drums, elimination of small discarded tins and bottles) lead to significant differences on reductions of the total number of pupae (P = 0.04) and the House index (P = 0.01) between intervention and control clusters, and to borderline differences on reductions of the Pupae per Person and Breteau indices (P = 0.05). The insecticide residual activity on treated curtains was high until month 18 but the chemical concentration showed a high variability. The cost per house protected with treated curtains and drum covers and targeting productive breeding-sites of the dengue vector was $ 5.31 USD. The acceptance of the measure was generally high, particularly in families who had experienced dengue. Even under difficult environmental conditions (open houses, tropical rainfall, challenging container types mainly in the peridomestic environment) the combination of insecticide treated curtains and to a less extent drum covers and interventions targeting the productive container types can reduce the dengue vector population significantly.
2012-01-01
Background In view of the epidemiological expansion of dengue worldwide and the availability of new tools and strategies particularly for controlling the primary dengue vector Aedes aegypti, an intervention study was set up to test the efficacy, cost and feasibility of a combined approach of insecticide treated materials (ITMs) alone and in combination with appropriate targeted interventions of the most productive vector breeding-sites. Methods The study was conducted as a cluster randomized community trial using “reduction of the vector population” as the main outcome variable. The trial had two arms: 10 intervention clusters (neighborhoods) and 10 control clusters in the town of Poptun Guatemala. Activities included entomological assessments (characteristics of breeding-sites, pupal productivity, Stegomyia indices) at baseline, 6 weeks after the first intervention (coverage of window and exterior doorways made of PermaNet 2.0 netting, factory treated with deltamethrin at 55 mg/m2, and of 200 L drums with similar treated material) and 6 weeks after the second intervention (combination of treated materials and other suitable interventions targeting productive breeding-sites i.e larviciding with Temephos, elimination etc.). The second intervention took place 17 months after the first intervention. The insecticide residual activity and the insecticidal content were also studied at different intervals. Additionally, information about demographic characteristics, cost of the intervention, coverage of houses protected and satisfaction in the population with the interventions was collected. Results At baseline (during the dry season) a variety of productive container types for Aedes pupae were identified: various container types holding >20 L, 200 L drums, washbasins and buckets (producing 83.7% of all pupae). After covering 100% of windows and exterior doorways and a small number of drums (where the commercial cover could be fixed) in 970 study households, tropical rains occurred in the area and lead to an increase of the vector population, more pronounced (but statistically not significant) in the control arm than in the intervention arm. In the second intervention (17 months later and six weeks after implementing the second intervention) the combined approach of ITMs and a combination of appropriate interventions against productive containers (Temephos in >200 L water drums, elimination of small discarded tins and bottles) lead to significant differences on reductions of the total number of pupae (P = 0.04) and the House index (P = 0.01) between intervention and control clusters, and to borderline differences on reductions of the Pupae per Person and Breteau indices (P = 0.05). The insecticide residual activity on treated curtains was high until month 18 but the chemical concentration showed a high variability. The cost per house protected with treated curtains and drum covers and targeting productive breeding-sites of the dengue vector was $ 5.31 USD. The acceptance of the measure was generally high, particularly in families who had experienced dengue. Conclusion Even under difficult environmental conditions (open houses, tropical rainfall, challenging container types mainly in the peridomestic environment) the combination of insecticide treated curtains and to a less extent drum covers and interventions targeting the productive container types can reduce the dengue vector population significantly. PMID:23110515
NASA Astrophysics Data System (ADS)
Costache, G. N.; Gavat, I.
2004-09-01
Along with the aggressive growing of the amount of digital data available (text, audio samples, digital photos and digital movies joined all in the multimedia domain) the need for classification, recognition and retrieval of this kind of data became very important. In this paper will be presented a system structure to handle multimedia data based on a recognition perspective. The main processing steps realized for the interesting multimedia objects are: first, the parameterization, by analysis, in order to obtain a description based on features, forming the parameter vector; second, a classification, generally with a hierarchical structure to make the necessary decisions. For audio signals, both speech and music, the derived perceptual features are the melcepstral (MFCC) and the perceptual linear predictive (PLP) coefficients. For images, the derived features are the geometric parameters of the speaker mouth. The hierarchical classifier consists generally in a clustering stage, based on the Kohonnen Self-Organizing Maps (SOM) and a final stage, based on a powerful classification algorithm called Support Vector Machines (SVM). The system, in specific variants, is applied with good results in two tasks: the first, is a bimodal speech recognition which uses features obtained from speech signal fused to features obtained from speaker's image and the second is a music retrieval from large music database.
Signature extraction of ocean pollutants by eigenvector transformation of remote spectra
NASA Technical Reports Server (NTRS)
Grew, G. W.
1978-01-01
Spectral signatures of suspended matter in the ocean are being extracted through characteristic vector analysis of remote ocean color data collected with MOCS (Multichannel Ocean Color Sensor). Spectral signatures appear to be obtainable through analyses of 'linear' clusters that appear on scatter diagrams associated with eigenvectors. Signatures associated with acid waste, sewage sludge, oil, and algae are presented. The application of vector analysis to two acid waste dumps overflown two years apart is examined in some detail. The relationships between eigenvectors and spectral signatures for these examples are analyzed. These cases demonstrate the value of characteristic vector analysis in remotely identifying pollutants in the ocean and in determining the consistency of their spectral signatures.
Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors.
Husain, Syed Sameed; Bober, Miroslaw
2017-09-01
Visual search and image retrieval underpin numerous applications, however the task is still challenging predominantly due to the variability of object appearance and ever increasing size of the databases, often exceeding billions of images. Prior art methods rely on aggregation of local scale-invariant descriptors, such as SIFT, via mechanisms including Bag of Visual Words (BoW), Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vectors (FV). However, their performance is still short of what is required. This paper presents a novel method for deriving a compact and distinctive representation of image content called Robust Visual Descriptor with Whitening (RVD-W). It significantly advances the state of the art and delivers world-class performance. In our approach local descriptors are rank-assigned to multiple clusters. Residual vectors are then computed in each cluster, normalized using a direction-preserving normalization function and aggregated based on the neighborhood rank. Importantly, the residual vectors are de-correlated and whitened in each cluster before aggregation, leading to a balanced energy distribution in each dimension and significantly improved performance. We also propose a new post-PCA normalization approach which improves separability between the matching and non-matching global descriptors. This new normalization benefits not only our RVD-W descriptor but also improves existing approaches based on FV and VLAD aggregation. Furthermore, we show that the aggregation framework developed using hand-crafted SIFT features also performs exceptionally well with Convolutional Neural Network (CNN) based features. The RVD-W pipeline outperforms state-of-the-art global descriptors on both the Holidays and Oxford datasets. On the large scale datasets, Holidays1M and Oxford1M, SIFT-based RVD-W representation obtains a mAP of 45.1 and 35.1 percent, while CNN-based RVD-W achieve a mAP of 63.5 and 44.8 percent, all yielding superior performance to the state-of-the-art.
NASA Astrophysics Data System (ADS)
Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Thomas
To prevent blindness from diabetic retinopathy, periodic screening and early diagnosis are neccessary. Due to lack of expert ophthalmologists in rural area, automated early exudate (one of visible sign of diabetic retinopathy) detection could help to reduce the number of blindness in diabetic patients. Traditional automatic exudate detection methods are based on specific parameter configuration, while the machine learning approaches which seems more flexible may be computationally high cost. A comparative analysis of traditional and machine learning of exudates detection, namely, mathematical morphology, fuzzy c-means clustering, naive Bayesian classifier, Support Vector Machine and Nearest Neighbor classifier are presented. Detected exudates are validated with expert ophthalmologists' hand-drawn ground-truths. The sensitivity, specificity, precision, accuracy and time complexity of each method are also compared.
STAMPS: Software Tool for Automated MRI Post-processing on a supercomputer.
Bigler, Don C; Aksu, Yaman; Miller, David J; Yang, Qing X
2009-08-01
This paper describes a Software Tool for Automated MRI Post-processing (STAMP) of multiple types of brain MRIs on a workstation and for parallel processing on a supercomputer (STAMPS). This software tool enables the automation of nonlinear registration for a large image set and for multiple MR image types. The tool uses standard brain MRI post-processing tools (such as SPM, FSL, and HAMMER) for multiple MR image types in a pipeline fashion. It also contains novel MRI post-processing features. The STAMP image outputs can be used to perform brain analysis using Statistical Parametric Mapping (SPM) or single-/multi-image modality brain analysis using Support Vector Machines (SVMs). Since STAMPS is PBS-based, the supercomputer may be a multi-node computer cluster or one of the latest multi-core computers.
Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction.
Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi
2017-08-08
Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges mainly because the collected data are nonlinear, non-Gaussian, and uneven distributed. A just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content in this work. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with the soft sensor development and outlier detection simultaneously. Moreover, with a continuous updating database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. Better prediction performance of JCSVR is validated on the online silicon content prediction, compared with traditional soft sensors.
Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction
Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi
2017-01-01
Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges mainly because the collected data are nonlinear, non-Gaussian, and uneven distributed. A just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content in this work. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with the soft sensor development and outlier detection simultaneously. Moreover, with a continuous updating database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. Better prediction performance of JCSVR is validated on the online silicon content prediction, compared with traditional soft sensors. PMID:28786957
Scarpassa, Vera Margarete; Conn, Jan E.
2011-01-01
Cryptic species and lineages characterize Anopheles nuneztovari s.l. Gabaldón, an important malaria vector in South America. We investigated the phylogeographic structure across the range of this species with cytochrome oxidase subunit I (COI) mitochondrial DNA sequences to estimate the number of clades and levels of divergence. Bayesian and maximum-likelihood phylogenetic analyses detected four groups distributed in two major monophyletic clades (I and II). Samples from the Amazon Basin were clustered in clade I, as were subclades II-A and II-B, whereas those from Bolivia/Colombia/Venezuela were restricted to one basal subclade (II-C). These data, together with a statistical parsimony network, confirm results of previous studies that An. nuneztovari is a species complex consisting of at least two cryptic taxa, one occurring in Colombia and Venezuela and the another occurring in the Amazon Basin. These data also suggest that additional incipient species may exist in the Amazon Basin. Divergence time and expansion tests suggested that these groups separated and expanded in the Pleistocene Epoch. In addition, the COI sequences clearly separated An. nuneztovari s.l. from the closely related species An. dunhami Causey, and three new records are reported for An. dunhami in Amazonian Brazil. These findings are relevant for vector control programs in areas where both species occur. Our analyses support dynamic geologic and landscape changes in northern South America, and infer particularly active divergence during the Pleistocene Epoch for New World anophelines. PMID:22049039
Ensemble based on static classifier selection for automated diagnosis of Mild Cognitive Impairment.
Nanni, Loris; Lumini, Alessandra; Zaffonato, Nicolò
2018-05-15
Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Scientific research is very active in the challenge of designing automated approaches to achieve an early and certain diagnosis. Recently an international competition among AD predictors has been organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified in four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD and healthy controls. In this work, we propose a method to perform early diagnosis of AD, which is evaluated on MLNeCh dataset. Since the automatic classification of AD is based on the use of feature vectors of high dimensionality, different techniques of feature selection/reduction are compared in order to avoid the curse-of-dimensionality problem, then the classification method is obtained as the combination of Support Vector Machines trained using different clusters of data extracted from the whole training set. The multi-classifier approach proposed in this work outperforms all the stand-alone method tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well using a very reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be publicly available 1 to other researchers for future comparisons. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Khehra, Baljit Singh; Pharwaha, Amar Partap Singh
2017-04-01
Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of robust features vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain after the feature extraction and before any classification scheme. Feature selection reduces the feature space that improves the performance of classifier and decreases the computational burden imposed by using many features on classifier. Selection of an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem. For n features, the total numbers of possible subsets of features are 2n. Thus, selection of an optimal subset of features problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCCs features from all possible subsets of features using genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCCs samples have been selected from mammogram images of DDSM database. A total of 50 features extracted from benign and malignant MCCs samples are used in this study. In these algorithms, fitness function is correct classification rate of classifier. Support vector machine is used as a classifier. From experimental results, it is also observed that the performance of PSO-based and BBO-based algorithms to select an optimal subset of features for classifying MCCs as benign or malignant is better as compared to GA-based algorithm.
Spectral analysis of two-signed microarray expression data.
Higham, Desmond J; Kalna, Gabriela; Vass, J Keith
2007-06-01
We give a simple and informative derivation of a spectral algorithm for clustering and reordering complementary DNA microarray expression data. Here, expression levels of a set of genes are recorded simultaneously across a number of samples, with a positive weight reflecting up-regulation and a negative weight reflecting down-regulation. We give theoretical support for the algorithm based on a biologically justified hypothesis about the structure of the data, and illustrate its use on public domain data in the context of unsupervised tumour classification. The algorithm is derived by considering a discrete optimization problem and then relaxing to the continuous realm. We prove that in the case where the data have an inherent 'checkerboard' sign pattern, the algorithm will automatically reveal that pattern. Further, our derivation shows that the algorithm may be regarded as imposing a random graph model on the expression levels and then clustering from a maximum likelihood perspective. This indicates that the output will be tolerant to perturbations and will reveal 'near-checkerboard' patterns when these are present in the data. It is interesting to note that the checkerboard structure is revealed by the first (dominant) singular vectors--previous work on spectral methods has focussed on the case of nonnegative edge weights, where only the second and higher singular vectors are relevant. We illustrate the algorithm on real and synthetic data, and then use it in a tumour classification context on three different cancer data sets. Our results show that respecting the two-signed nature of the data (thereby distinguishing between up-regulation and down-regulation) reveals structures that cannot be gleaned from the absolute value data (where up- and down-regulation are both regarded as 'changes').
Zhang, Sa; Li, Zhou; Xin, Xue-Gang
2017-12-20
To achieve differential diagnosis of normal and malignant gastric tissues based on discrepancies in their dielectric properties using support vector machine. The dielectric properties of normal and malignant gastric tissues at the frequency ranging from 42.58 to 500 MHz were measured by coaxial probe method, and the Cole?Cole model was used to fit the measured data. Receiver?operating characteristic (ROC) curve analysis was used to evaluate the discrimination capability with respect to permittivity, conductivity, and Cole?Cole fitting parameters. Support vector machine was used for discriminating normal and malignant gastric tissues, and the discrimination accuracy was calculated using k?fold cross? The area under the ROC curve was above 0.8 for permittivity at the 5 frequencies at the lower end of the measured frequency range. The combination of the support vector machine with the permittivity at all these 5 frequencies combined achieved the highest discrimination accuracy of 84.38% with a MATLAB runtime of 3.40 s. The support vector machine?assisted diagnosis is feasible for human malignant gastric tissues based on the dielectric properties.
Research on intrusion detection based on Kohonen network and support vector machine
NASA Astrophysics Data System (ADS)
Shuai, Chunyan; Yang, Hengcheng; Gong, Zeweiyi
2018-05-01
In view of the problem of low detection accuracy and the long detection time of support vector machine, which directly applied to the network intrusion detection system. Optimization of SVM parameters can greatly improve the detection accuracy, but it can not be applied to high-speed network because of the long detection time. a method based on Kohonen neural network feature selection is proposed to reduce the optimization time of support vector machine parameters. Firstly, this paper is to calculate the weights of the KDD99 network intrusion data by Kohonen network and select feature by weight. Then, after the feature selection is completed, genetic algorithm (GA) and grid search method are used for parameter optimization to find the appropriate parameters and classify them by support vector machines. By comparing experiments, it is concluded that feature selection can reduce the time of parameter optimization, which has little influence on the accuracy of classification. The experiments suggest that the support vector machine can be used in the network intrusion detection system and reduce the missing rate.
Clark, David J; Fondrie, William E; Liao, Zhongping; Hanson, Phyllis I; Fulton, Amy; Mao, Li; Yang, Austin J
2015-10-20
Exosomes are microvesicles of endocytic origin constitutively released by multiple cell types into the extracellular environment. With evidence that exosomes can be detected in the blood of patients with various malignancies, the development of a platform that uses exosomes as a diagnostic tool has been proposed. However, it has been difficult to truly define the exosome proteome due to the challenge of discerning contaminant proteins that may be identified via mass spectrometry using various exosome enrichment strategies. To better define the exosome proteome in breast cancer, we incorporated a combination of Tandem-Mass-Tag (TMT) quantitative proteomics approach and Support Vector Machine (SVM) cluster analysis of three conditioned media derived fractions corresponding to a 10 000g cellular debris pellet, a 100 000g crude exosome pellet, and an Optiprep enriched exosome pellet. The quantitative analysis identified 2 179 proteins in all three fractions, with known exosomal cargo proteins displaying at least a 2-fold enrichment in the exosome fraction based on the TMT protein ratios. Employing SVM cluster analysis allowed for the classification 251 proteins as "true" exosomal cargo proteins. This study provides a robust and vigorous framework for the future development of using exosomes as a potential multiprotein marker phenotyping tool that could be useful in breast cancer diagnosis and monitoring disease progression.
Estimation of rank correlation for clustered data.
Rosner, Bernard; Glynn, Robert J
2017-06-30
It is well known that the sample correlation coefficient (R xy ) is the maximum likelihood estimator of the Pearson correlation (ρ xy ) for independent and identically distributed (i.i.d.) bivariate normal data. However, this is not true for ophthalmologic data where X (e.g., visual acuity) and Y (e.g., visual field) are available for each eye and there is positive intraclass correlation for both X and Y in fellow eyes. In this paper, we provide a regression-based approach for obtaining the maximum likelihood estimator of ρ xy for clustered data, which can be implemented using standard mixed effects model software. This method is also extended to allow for estimation of partial correlation by controlling both X and Y for a vector U_ of other covariates. In addition, these methods can be extended to allow for estimation of rank correlation for clustered data by (i) converting ranks of both X and Y to the probit scale, (ii) estimating the Pearson correlation between probit scores for X and Y, and (iii) using the relationship between Pearson and rank correlation for bivariate normally distributed data. The validity of the methods in finite-sized samples is supported by simulation studies. Finally, two examples from ophthalmology and analgesic abuse are used to illustrate the methods. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Efficient active waveguiding properties of Mo6 nano-cluster-doped polymer nanotubes
NASA Astrophysics Data System (ADS)
Bigeon, J.; Huby, N.; Amela-Cortes, M.; Molard, Y.; Garreau, A.; Cordier, S.; Bêche, B.; Duvail, J.-L.
2016-06-01
We investigate 1D nanostructures based on a Mo6@SU8 hybrid nanocomposite in which photoluminescent Mo6 clusters are embedded in the photosensitive SU8 resist. Tens of micrometers long Mo6@SU8-based tubular nanostructures were fabricated by the wetting template method, enabling the control of the inner and outer diameter to about 190 nm and 240 nm respectively, as supported by structural and optical characterizations. The image plane optical study of these nanotubes under optical pumping highlights the efficient waveguiding phenomenon of the red luminescence emitted by the clusters. Moreover, the wave vector distribution in the Fourier plane determined by leakage radiation microscopy gives additional features of the emission and waveguiding. First, the anisotropic red luminescence of the whole system can be attributed to the guided mode along the nanotube. Then, a low-loss propagation behavior is evidenced in the Mo6@SU8-based nanotubes. This result contrasts with the weaker waveguiding signature in the case of UV210-based nanotubes embedding PFO (poly(9,9-di-n-octylfluorenyl-2,7-diyl)). It is attributed to the strong reabsorption phenomenon, owing to overlapping between absorption and emission bands in the semi-conducting conjugated polymer PFO. These results make this Mo6@SU8 original class of nanocomposite a promising candidate as nanosources for submicronic photonic integration.
Efficient active waveguiding properties of Mo6 nano-cluster-doped polymer nanotubes.
Bigeon, J; Huby, N; Amela-Cortes, M; Molard, Y; Garreau, A; Cordier, S; Bêche, B; Duvail, J-L
2016-06-24
We investigate 1D nanostructures based on a Mo6@SU8 hybrid nanocomposite in which photoluminescent Mo6 clusters are embedded in the photosensitive SU8 resist. Tens of micrometers long Mo6@SU8-based tubular nanostructures were fabricated by the wetting template method, enabling the control of the inner and outer diameter to about 190 nm and 240 nm respectively, as supported by structural and optical characterizations. The image plane optical study of these nanotubes under optical pumping highlights the efficient waveguiding phenomenon of the red luminescence emitted by the clusters. Moreover, the wave vector distribution in the Fourier plane determined by leakage radiation microscopy gives additional features of the emission and waveguiding. First, the anisotropic red luminescence of the whole system can be attributed to the guided mode along the nanotube. Then, a low-loss propagation behavior is evidenced in the Mo6@SU8-based nanotubes. This result contrasts with the weaker waveguiding signature in the case of UV210-based nanotubes embedding PFO (poly(9,9-di-n-octylfluorenyl-2,7-diyl)). It is attributed to the strong reabsorption phenomenon, owing to overlapping between absorption and emission bands in the semi-conducting conjugated polymer PFO. These results make this Mo6@SU8 original class of nanocomposite a promising candidate as nanosources for submicronic photonic integration.
A two-stage method for microcalcification cluster segmentation in mammography by deformable models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.
Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods aremore » applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists’ segmentations quantitatively by two distance metrics (Hausdorff distance—HDIST{sub cluster}, average of minimum distance—AMINDIST{sub cluster}) and the area overlap measure (AOM{sub cluster}). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± Standard Error) utilizing tenfold cross-validation methodology. A previously developed B-spline active rays segmentation method was also considered for comparison purposes. Results: Interobserver and intraobserver segmentation agreements (median and [25%, 75%] quartile range) were substantial with respect to the distance metrics HDIST{sub cluster} (2.3 [1.8, 2.9] and 2.5 [2.1, 3.2] pixels) and AMINDIST{sub cluster} (0.8 [0.6, 1.0] and 1.0 [0.8, 1.2] pixels), while moderate with respect to AOM{sub cluster} (0.64 [0.55, 0.71] and 0.59 [0.52, 0.66]). The proposed segmentation method outperformed (0.80 ± 0.04) statistically significantly (Mann-Whitney U-test, p < 0.05) the B-spline active rays segmentation method (0.69 ± 0.04), suggesting the significance of the proposed semiautomated method. Conclusions: Results indicate a reliable semiautomated segmentation method for MC clusters offered by deformable models, which could be utilized in MC cluster quantitative image analysis.« less
McCann, Robert S; van den Berg, Henk; Diggle, Peter J; van Vugt, Michèle; Terlouw, Dianne J; Phiri, Kamija S; Di Pasquale, Aurelio; Maire, Nicolas; Gowelo, Steven; Mburu, Monicah M; Kabaghe, Alinune N; Mzilahowa, Themba; Chipeta, Michael G; Takken, Willem
2017-09-22
Due to outdoor and residual transmission and insecticide resistance, long-lasting insecticidal nets (LLINs) and indoor residual spraying (IRS) will be insufficient as stand-alone malaria vector control interventions in many settings as programmes shift toward malaria elimination. Combining additional vector control interventions as part of an integrated strategy would potentially overcome these challenges. Larval source management (LSM) and structural house improvements (HI) are appealing as additional components of an integrated vector management plan because of their long histories of use, evidence on effectiveness in appropriate settings, and unique modes of action compared to LLINs and IRS. Implementation of LSM and HI through a community-based approach could provide a path for rolling-out these interventions sustainably and on a large scale. We will implement community-based LSM and HI, as additional interventions to the current national malaria control strategies, using a randomised block, 2 × 2 factorial, cluster-randomised design in rural, southern Malawi. These interventions will be continued for two years. The trial catchment area covers about 25,000 people living in 65 villages. Community participation is encouraged by training community volunteers as health animators, and supporting the organisation of village-level committees in collaboration with The Hunger Project, a non-governmental organisation. Household-level cross-sectional surveys, including parasitological and entomological sampling, will be conducted on a rolling, 2-monthly schedule to measure outcomes over two years (2016 to 2018). Coverage of LSM and HI will also be assessed throughout the trial area. Combining LSM and/or HI together with the interventions currently implemented by the Malawi National Malaria Control Programme is anticipated to reduce malaria transmission below the level reached by current interventions alone. Implementation of LSM and HI through a community-based approach provides an opportunity for optimum adaptation to the local ecological and social setting, and enhances the potential for sustainability. Registered with The Pan African Clinical Trials Registry on 3 March 2016, trial number PACTR201604001501493.
A Power Transformers Fault Diagnosis Model Based on Three DGA Ratios and PSO Optimization SVM
NASA Astrophysics Data System (ADS)
Ma, Hongzhe; Zhang, Wei; Wu, Rongrong; Yang, Chunyan
2018-03-01
In order to make up for the shortcomings of existing transformer fault diagnosis methods in dissolved gas-in-oil analysis (DGA) feature selection and parameter optimization, a transformer fault diagnosis model based on the three DGA ratios and particle swarm optimization (PSO) optimize support vector machine (SVM) is proposed. Using transforming support vector machine to the nonlinear and multi-classification SVM, establishing the particle swarm optimization to optimize the SVM multi classification model, and conducting transformer fault diagnosis combined with the cross validation principle. The fault diagnosis results show that the average accuracy of test method is better than the standard support vector machine and genetic algorithm support vector machine, and the proposed method can effectively improve the accuracy of transformer fault diagnosis is proved.
Global Positioning Systems (GPS) Technology to Study Vector-Pathogen-Host Interactions
2014-10-26
surveyed and GPS mapping completed with 35 cluster contacts enrolled all who did not have dengue infection. There were 91 Aedes aegypti mosquitoes...4. 3,455 female Aedes aegypti mosquitoes were collected within the clusters with 5 DENV-1; 36 DENV-2; 22 DENV-3 and 0 DENV-4 isolated from...isolated. 47 female Aedes aegypti were collected and no viruses were isolated. Successful 11 completion of one full dengue season. On June 2011 start
Estimated Satellite Cluster Elements in Near Circular Orbit
1988-12-01
cluster is investigated. TheAon-board estimator is the U-D covariance factor’xzatiion’filter with dynamics based on the Clohessy - Wiltshire equations...Appropriate values for the velocity vector vi can be found irom the Clohessy - Wiltshire equations [9] (these equations will be explained in detail in the...explained in this text is the f matrix. The state transition matrix was developed from the Clohessy - Wiltshire equations of motion [9:page 3] as i - 2qý
Stevens, Lori; Monroy, M. Carlota; Rodas, Antonieta Guadalupe; Hicks, Robin M.; Lucero, David E.; Lyons, Leslie A.; Dorn, Patricia L.
2015-01-01
Triatoma dimidiata (Latreille, 1811) is the most abundant and significant insect vector of the parasite Trypanosoma cruzi in Central America, and particularly in Guatemala. Tr. cruzi is the causative agent of Chagas disease, and successful disease control requires understanding the geographic distribution and degree of migration of vectors such as T. dimidiata that frequently re-infest houses within months following insecticide application. The population genetic structure of T. dimidiata collected from six villages in southern Guatemala was studied to gain insight into the migration patterns of the insects in this region where populations are largely domestic. This study provided insight into the likelihood of eliminating T. dimidiata by pesticide application as has been observed in some areas for other domestic triatomines such as Triatoma infestans. Genotypes of microsatellite loci for 178 insects from six villages were found to represent five genetic clusters using a Bayesian Markov Chain Monte Carlo method. Individual clusters were found in multiple villages, with multiple clusters in the same house. Although migration occurred, there was statistically significant genetic differentiation among villages (FRT = 0.05) and high genetic differentiation among houses within villages (FSR = 0.11). Relatedness of insects within houses varied from 0 to 0.25, i.e., from unrelated to half-sibs. The results suggest that T. dimidiata in southern Guatemala moves between houses and villages often enough that recolonization is likely, implying the use of insecticides alone is not sufficient for effective control of Chagas disease in this region and more sustainable solutions are required. PMID:26334816
The Design of a Templated C++ Small Vector Class for Numerical Computing
NASA Technical Reports Server (NTRS)
Moran, Patrick J.
2000-01-01
We describe the design and implementation of a templated C++ class for vectors. The vector class is templated both for vector length and vector component type; the vector length is fixed at template instantiation time. The vector implementation is such that for a vector of N components of type T, the total number of bytes required by the vector is equal to N * size of (T), where size of is the built-in C operator. The property of having a size no bigger than that required by the components themselves is key in many numerical computing applications, where one may allocate very large arrays of small, fixed-length vectors. In addition to the design trade-offs motivating our fixed-length vector design choice, we review some of the C++ template features essential to an efficient, succinct implementation. In particular, we highlight some of the standard C++ features, such as partial template specialization, that are not supported by all compilers currently. This report provides an inventory listing the relevant support currently provided by some key compilers, as well as test code one can use to verify compiler capabilities.
Adaptive vector validation in image velocimetry to minimise the influence of outlier clusters
NASA Astrophysics Data System (ADS)
Masullo, Alessandro; Theunissen, Raf
2016-03-01
The universal outlier detection scheme (Westerweel and Scarano in Exp Fluids 39:1096-1100, 2005) and the distance-weighted universal outlier detection scheme for unstructured data (Duncan et al. in Meas Sci Technol 21:057002, 2010) are the most common PIV data validation routines. However, such techniques rely on a spatial comparison of each vector with those in a fixed-size neighbourhood and their performance subsequently suffers in the presence of clusters of outliers. This paper proposes an advancement to render outlier detection more robust while reducing the probability of mistakenly invalidating correct vectors. Velocity fields undergo a preliminary evaluation in terms of local coherency, which parametrises the extent of the neighbourhood with which each vector will be compared subsequently. Such adaptivity is shown to reduce the number of undetected outliers, even when implemented in the afore validation schemes. In addition, the authors present an alternative residual definition considering vector magnitude and angle adopting a modified Gaussian-weighted distance-based averaging median. This procedure is able to adapt the degree of acceptable background fluctuations in velocity to the local displacement magnitude. The traditional, extended and recommended validation methods are numerically assessed on the basis of flow fields from an isolated vortex, a turbulent channel flow and a DNS simulation of forced isotropic turbulence. The resulting validation method is adaptive, requires no user-defined parameters and is demonstrated to yield the best performances in terms of outlier under- and over-detection. Finally, the novel validation routine is applied to the PIV analysis of experimental studies focused on the near wake behind a porous disc and on a supersonic jet, illustrating the potential gains in spatial resolution and accuracy.
Feldman, Steven; Valera-Leon, Carlos; Dechev, Damian
2016-03-01
The vector is a fundamental data structure, which provides constant-time access to a dynamically-resizable range of elements. Currently, there exist no wait-free vectors. The only non-blocking version supports only a subset of the sequential vector API and exhibits significant synchronization overhead caused by supporting opposing operations. Since many applications operate in phases of execution, wherein each phase only a subset of operations are used, this overhead is unnecessary for the majority of the application. To address the limitations of the non-blocking version, we present a new design that is wait-free, supports more of the operations provided by the sequential vector,more » and provides alternative implementations of key operations. These alternatives allow the developer to balance the performance and functionality of the vector as requirements change throughout execution. Compared to the known non-blocking version and the concurrent vector found in Intel’s TBB library, our design outperforms or provides comparable performance in the majority of tested scenarios. Over all tested scenarios, the presented design performs an average of 4.97 times more operations per second than the non-blocking vector and 1.54 more than the TBB vector. In a scenario designed to simulate the filling of a vector, performance improvement increases to 13.38 and 1.16 times. This work presents the first ABA-free non-blocking vector. Finally, unlike the other non-blocking approach, all operations are wait-free and bounds-checked and elements are stored contiguously in memory.« less
Recent Advances in Preclinical Developments Using Adenovirus Hybrid Vectors.
Ehrke-Schulz, Eric; Zhang, Wenli; Gao, Jian; Ehrhardt, Anja
2017-10-01
Adenovirus (Ad)-based vectors are efficient gene-transfer vehicles to deliver foreign DNA into living organisms, offering large cargo capacity and low immunogenicity and genotoxicity. As Ad shows low integration rates of their genomes into host chromosomes, vector-derived gene expression decreases due to continuous cell cycling in regenerating tissues and dividing cell populations. To overcome this hurdle, adenoviral delivery can be combined with mechanisms leading to maintenance of therapeutic DNA and long-term effects of the desired treatment. Several hybrid Ad vectors (AdV) exploiting various strategies for long-term treatment have been developed and characterized. This review summarizes recent developments of preclinical approaches using hybrid AdVs utilizing either the Sleeping Beauty transposase system for somatic integration into host chromosomes or designer nucleases, including transcription activator-like effector nucleases and clustered regularly interspaced short palindromic repeats/CRISPR-associated protein-9 nuclease for permanent gene editing. Further options on how to optimize these vectors further are discussed, which may lead to future clinical applications of these versatile gene-therapy tools.
Ecological characteristics of Simulium breeding sites in West Africa.
Cheke, Robert A; Young, Stephen; Garms, Rolf
2017-03-01
Twenty-nine taxa of Simulium were identified amongst 527 collections of larvae and pupae from untreated rivers and streams in Liberia (362 collections in 1967-71 & 1989), Togo (125 in 1979-81), Benin (35 in 1979-81) and Ghana (5 in 1980-81). Presence or absence of associations between different taxa were used to group them into six clusters using Ward agglomerative hierarchical cluster analysis. Environmental data associated with the pre-imaginal habitats were then analysed in relation to the six clusters by one way ANOVA. The results revealed significant effects in determining the clusters of maximum river width (all P<0.001 unless stated otherwise), water temperature, dry bulb air temperature, relative humidity, altitude, type of water (on a range from trickle to large river), water level, slope, current, vegetation, light conditions, discharge, length of breeding area, environs, terrain, river bed type (P<0.01), and the supports to which the insects were attached (P<0.01). When four non-significant contributors (wet bulb temperature, river features, height of waterfall and depth) were excluded and the reduced data-set analysed by principal components analysis (PCA), the first two principal components (PCs) accounted for 87% of the variance, with geographical features dominant in PC1 and hydrological characteristics in PC2. The analyses also revealed the ecological characteristics of each taxon's pre-imaginal habitats, which are discussed with particular reference to members of the Simulium damnosum species complex, whose breeding site distributions were further analysed by canonical correspondence analysis (CCA), a method also applied to the data on non-vector species. Copyright © 2016 Elsevier B.V. All rights reserved.
Nagaoka, Tomoaki; Watanabe, Soichi
2012-01-01
Electromagnetic simulation with anatomically realistic computational human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the computational human model, we adapt three-dimensional FDTD code to a multi-GPU cluster environment with Compute Unified Device Architecture and Message Passing Interface. Our multi-GPU cluster system consists of three nodes. The seven GPU boards (NVIDIA Tesla C2070) are mounted on each node. We examined the performance of the FDTD calculation on multi-GPU cluster environment. We confirmed that the FDTD calculation on the multi-GPU clusters is faster than that on a multi-GPU (a single workstation), and we also found that the GPU cluster system calculate faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform the large-scale FDTD calculation because were able to use GPU memory of over 100 GB.
Data-driven cluster reinforcement and visualization in sparsely-matched self-organizing maps.
Manukyan, Narine; Eppstein, Margaret J; Rizzo, Donna M
2012-05-01
A self-organizing map (SOM) is a self-organized projection of high-dimensional data onto a typically 2-dimensional (2-D) feature map, wherein vector similarity is implicitly translated into topological closeness in the 2-D projection. However, when there are more neurons than input patterns, it can be challenging to interpret the results, due to diffuse cluster boundaries and limitations of current methods for displaying interneuron distances. In this brief, we introduce a new cluster reinforcement (CR) phase for sparsely-matched SOMs. The CR phase amplifies within-cluster similarity in an unsupervised, data-driven manner. Discontinuities in the resulting map correspond to between-cluster distances and are stored in a boundary (B) matrix. We describe a new hierarchical visualization of cluster boundaries displayed directly on feature maps, which requires no further clustering beyond what was implicitly accomplished during self-organization in SOM training. We use a synthetic benchmark problem and previously published microbial community profile data to demonstrate the benefits of the proposed methods.
Slope angle estimation method based on sparse subspace clustering for probe safe landing
NASA Astrophysics Data System (ADS)
Li, Haibo; Cao, Yunfeng; Ding, Meng; Zhuang, Likui
2018-06-01
To avoid planetary probes landing on steep slopes where they may slip or tip over, a new method of slope angle estimation based on sparse subspace clustering is proposed to improve accuracy. First, a coordinate system is defined and established to describe the measured data of light detection and ranging (LIDAR). Second, this data is processed and expressed with a sparse representation. Third, on this basis, the data is made to cluster to determine which subspace it belongs to. Fourth, eliminating outliers in subspace, the correct data points are used for the fitting planes. Finally, the vectors normal to the planes are obtained using the plane model, and the angle between the normal vectors is obtained through calculation. Based on the geometric relationship, this angle is equal in value to the slope angle. The proposed method was tested in a series of experiments. The experimental results show that this method can effectively estimate the slope angle, can overcome the influence of noise and obtain an exact slope angle. Compared with other methods, this method can minimize the measuring errors and further improve the estimation accuracy of the slope angle.
Li, Wu; Hu, Bing; Wang, Ming-wei
2014-12-01
In the present paper, the terahertz time-domain spectroscopy (THz-TDS) identification model of borneol based on principal component analysis (PCA) and support vector machine (SVM) was established. As one Chinese common agent, borneol needs a rapid, simple and accurate detection and identification method for its different source and being easily confused in the pharmaceutical and trade links. In order to assure the quality of borneol product and guard the consumer's right, quickly, efficiently and correctly identifying borneol has significant meaning to the production and transaction of borneol. Terahertz time-domain spectroscopy is a new spectroscopy approach to characterize material using terahertz pulse. The absorption terahertz spectra of blumea camphor, borneol camphor and synthetic borneol were measured in the range of 0.2 to 2 THz with the transmission THz-TDS. The PCA scores of 2D plots (PC1 X PC2) and 3D plots (PC1 X PC2 X PC3) of three kinds of borneol samples were obtained through PCA analysis, and both of them have good clustering effect on the 3 different kinds of borneol. The value matrix of the first 10 principal components (PCs) was used to replace the original spectrum data, and the 60 samples of the three kinds of borneol were trained and then the unknown 60 samples were identified. Four kinds of support vector machine model of different kernel functions were set up in this way. Results show that the accuracy of identification and classification of SVM RBF kernel function for three kinds of borneol is 100%, and we selected the SVM with the radial basis kernel function to establish the borneol identification model, in addition, in the noisy case, the classification accuracy rates of four SVM kernel function are above 85%, and this indicates that SVM has strong generalization ability. This study shows that PCA with SVM method of borneol terahertz spectroscopy has good classification and identification effects, and provides a new method for species identification of borneol in Chinese medicine.
Jara, Rocio F.; Wydeven, Adrian P.; Samuel, Michael D.
2016-01-01
World-wide concern over emerging vector-borne diseases has increased in recent years for both animal and human health. In the United Sates, concern about vector-borne diseases in canines has focused on Lyme disease, anaplasmosis, ehrlichiosis, and heartworm which infect domestic and wild canids. Of these diseases, Lyme and anaplasmosis are also frequently diagnosed in humans. Gray wolves (Canis lupus) recolonized Wisconsin in the 1970s, and we evaluated their temporal and geographic patterns of exposure to these four vector-borne diseases in Wisconsin as the population expanded between 1985 and 2011. A high proportion of the Wisconsin wolves were exposed to the agents that cause Lyme (65.6%) and anaplasma (47.7%), and a smaller proportion to ehrlichiosis (5.7%) and infected with heartworm (9.2%). Wolf exposure to tick borne diseases was consistently higher in older animals. Wolf exposure was markedly higher than domestic dog (Canis familiaris) exposure for all 4 disease agents during 2001–2013. We found a cluster of wolf exposure to Borrelia burgdorferi in northwestern Wisconsin, which overlaps human and domestic dog clusters for the same pathogen. In addition, wolf exposure to Lyme disease in Wisconsin has increased, corresponding with the increasing human incidence of Lyme disease in a similar time period. Despite generally high prevalence of exposure none of these diseases appear to have slowed the growth of the Wisconsin wolf population. PMID:27898670
Chemical reaction vector embeddings: towards predicting drug metabolism in the human gut microbiome.
Mallory, Emily K; Acharya, Ambika; Rensi, Stefano E; Turnbaugh, Peter J; Bright, Roselie A; Altman, Russ B
2018-01-01
Bacteria in the human gut have the ability to activate, inactivate, and reactivate drugs with both intended and unintended effects. For example, the drug digoxin is reduced to the inactive metabolite dihydrodigoxin by the gut Actinobacterium E. lenta, and patients colonized with high levels of drug metabolizing strains may have limited response to the drug. Understanding the complete space of drugs that are metabolized by the human gut microbiome is critical for predicting bacteria-drug relationships and their effects on individual patient response. Discovery and validation of drug metabolism via bacterial enzymes has yielded >50 drugs after nearly a century of experimental research. However, there are limited computational tools for screening drugs for potential metabolism by the gut microbiome. We developed a pipeline for comparing and characterizing chemical transformations using continuous vector representations of molecular structure learned using unsupervised representation learning. We applied this pipeline to chemical reaction data from MetaCyc to characterize the utility of vector representations for chemical reaction transformations. After clustering molecular and reaction vectors, we performed enrichment analyses and queries to characterize the space. We detected enriched enzyme names, Gene Ontology terms, and Enzyme Consortium (EC) classes within reaction clusters. In addition, we queried reactions against drug-metabolite transformations known to be metabolized by the human gut microbiome. The top results for these known drug transformations contained similar substructure modifications to the original drug pair. This work enables high throughput screening of drugs and their resulting metabolites against chemical reactions common to gut bacteria.
Method for indexing and retrieving manufacturing-specific digital imagery based on image content
Ferrell, Regina K.; Karnowski, Thomas P.; Tobin, Jr., Kenneth W.
2004-06-15
A method for indexing and retrieving manufacturing-specific digital images based on image content comprises three steps. First, at least one feature vector can be extracted from a manufacturing-specific digital image stored in an image database. In particular, each extracted feature vector corresponds to a particular characteristic of the manufacturing-specific digital image, for instance, a digital image modality and overall characteristic, a substrate/background characteristic, and an anomaly/defect characteristic. Notably, the extracting step includes generating a defect mask using a detection process. Second, using an unsupervised clustering method, each extracted feature vector can be indexed in a hierarchical search tree. Third, a manufacturing-specific digital image associated with a feature vector stored in the hierarchicial search tree can be retrieved, wherein the manufacturing-specific digital image has image content comparably related to the image content of the query image. More particularly, can include two data reductions, the first performed based upon a query vector extracted from a query image. Subsequently, a user can select relevant images resulting from the first data reduction. From the selection, a prototype vector can be calculated, from which a second-level data reduction can be performed. The second-level data reduction can result in a subset of feature vectors comparable to the prototype vector, and further comparable to the query vector. An additional fourth step can include managing the hierarchical search tree by substituting a vector average for several redundant feature vectors encapsulated by nodes in the hierarchical search tree.
Synchronized changes to relative neuron populations in postnatal human neocortical development
Cooper, David L.; Gentle, James E.; Barreto, Ernest
2010-01-01
Mammalian prenatal neocortical development is dominated by the synchronized formation of the laminae and migration of neurons. Postnatal development likewise contains “sensitive periods” during which functions such as ocular dominance emerge. Here we introduce a novel neuroinformatics approach to identify and study these periods of active development. Although many aspects of the approach can be used in other studies, some specific techniques were chosen because of a legacy dataset of human histological data (Conel in The postnatal development of the human cerebral cortex, vol 1–8. Harvard University Press, Cambridge, 1939–1967). Our method calculates normalized change vectors from the raw histological data, and then employs k-means cluster analysis of the change vectors to explore the population dynamics of neurons from 37 neocortical areas across eight postnatal developmental stages from birth to 72 months in 54 subjects. We show that the cortical “address” (Brodmann area/sub-area and layer) provides the necessary resolution to segregate neuron population changes into seven correlated “k-clusters” in k-means cluster analysis. The members in each k-cluster share a single change interval where the relative share of the cortex by the members undergoes its maximum change. The maximum change occurs in a different change interval for each k-cluster. Each k-cluster has at least one totally connected maximal “clique” which appears to correspond to cortical function. Electronic supplementary material The online version of this article (doi:10.1007/s11571-010-9103-3) contains supplementary material, which is available to authorized users. PMID:21629587
NASA Astrophysics Data System (ADS)
Li, Tao
2018-06-01
The complexity of aluminum electrolysis process leads the temperature for aluminum reduction cells hard to measure directly. However, temperature is the control center of aluminum production. To solve this problem, combining some aluminum plant's practice data, this paper presents a Soft-sensing model of temperature for aluminum electrolysis process on Improved Twin Support Vector Regression (ITSVR). ITSVR eliminates the slow learning speed of Support Vector Regression (SVR) and the over-fit risk of Twin Support Vector Regression (TSVR) by introducing a regularization term into the objective function of TSVR, which ensures the structural risk minimization principle and lower computational complexity. Finally, the model with some other parameters as auxiliary variable, predicts the temperature by ITSVR. The simulation result shows Soft-sensing model based on ITSVR has short time-consuming and better generalization.
Spin Vector Distribution in the Koronis Family for a Sample Complete to IAU H=10.88
NASA Astrophysics Data System (ADS)
Slivan, Stephen M.; Hosek, Matt; Sokol, Alyssa; Maynard, Sarah; Payne, Anna; Radford, Arden; Springmann, Alessondra; Mailhot, Emily; Midkiff, Alan; Russell, April; Stephens, Robert D.
2016-10-01
Because they share the same formation age, asteroid family members have experienced similar evolution for similar lengths of time, offering valuable information to help understand spin evolution processes. Clustered distributions of spin vectors determined from observations of ten of the largest Koronis family members (Slivan 2002) revealed evidence of spin modification by YORP thermal radiation torques (Vokrouhlický et al. 2003). The currently known spin vector sample in the Koronis family (Slivan et al., 2003; Slivan et al., 2009, Hanuš et al., 2011; Hanuš et al., 2013; Durech et al., 2016) clearly shows the two spin groupings observed among the large members: (1) the larger group with low-obliquity retrograde spin and periods between about 3 h and 30 h, and (2) a smaller group with prograde spin obliquity near 45° and periods near 8 h, characteristic of trapping in the s6 spin-orbit resonance (Vokrouhlický et al. 2003). There's also one "stray" longer-period prograde object with smaller obliquity, perhaps trapped in some other resonance.A limitation of the existing spin vector sample, which (using IAU H as a proxy for size) includes 16 of the brightest 27 members of the family, is that selection biases render it complete only to the brightest 12 members. Slivan et al. (2008) began a lightcurve observing program to increase the sample of Koronis family spin vectors down to about 20 km diameter.We report pole solutions that were determined for fourteen survey objects using lightcurves recorded from 2005-2016, which complete the Koronis spin vector sample to the brightest 22 members, now including 24 of the brightest 27 members. The larger sample adds several objects to the existing group of low-obliquity retrograde rotators, increasing the period range upward to almost 60 h, and also identifies two companions for the stray longer-period prograde spin object, strengthening the case for the presence of a second cluster of objects trapped in a spin-orbit resonance. The more complete distribution also reveals two new "strays" of its own - one lone fast prograde rotator, and one spin vector of atypical high obliquity, close to the ecliptic plane.
Dynamic competitive probabilistic principal components analysis.
López-Rubio, Ezequiel; Ortiz-DE-Lazcano-Lobato, Juan Miguel
2009-04-01
We present a new neural model which extends the classical competitive learning (CL) by performing a Probabilistic Principal Components Analysis (PPCA) at each neuron. The model also has the ability to learn the number of basis vectors required to represent the principal directions of each cluster, so it overcomes a drawback of most local PCA models, where the dimensionality of a cluster must be fixed a priori. Experimental results are presented to show the performance of the network with multispectral image data.
Community-led trials: Intervention co-design in a cluster randomised controlled trial.
Andersson, Neil
2017-05-30
In conventional randomised controlled trials (RCTs), researchers design the interventions. In the Camino Verde trial, each intervention community designed its own programmes to prevent dengue. Instead of fixed actions or menus of activities to choose from, the trial randomised clusters to a participatory research protocol that began with sharing and discussing evidence from a local survey, going on to local authorship of the action plan for vector control.Adding equitable stakeholder engagement to RCT infrastructure anchors the research culturally, making it more meaningful to stakeholders. Replicability in other conditions is straightforward, since all intervention clusters used the same engagement protocol to discuss and to mobilize for dengue prevention. The ethical codes associated with RCTs play out differently in community-led pragmatic trials, where communities essentially choose what they want to do. Several discussion groups in each intervention community produced multiple plans for prevention, recognising different time lines. Some chose fast turnarounds, like elimination of breeding sites, and some chose longer term actions like garbage disposal and improving water supplies.A big part of the skill set for community-led trials is being able to stand back and simply support communities in what they want to do and how they want to do it, something that does not come naturally to many vector control programs or to RCT researchers. Unexpected negative outcomes can come from the turbulence implicit in participatory research. One example was the gender dynamic in the Mexican arm of the Camino Verde trial. Strong involvement of women in dengue control activities seems to have discouraged men in settings where activity in public spaces or outside of the home would ordinarily be considered a "male competence".Community-led trials address the tension between one-size-fits-all programme interventions and local needs. Whatever the conventional wisdom about how prevention works at a system level, programmes have to be perceived as locally relevant and they must engage stakeholders who make them work. Locally, each participating community has to know the intervention is relevant to them; they have to want to do it. That happens much more easily if they design the programme themselves.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feldman, Steven; Valera-Leon, Carlos; Dechev, Damian
The vector is a fundamental data structure, which provides constant-time access to a dynamically-resizable range of elements. Currently, there exist no wait-free vectors. The only non-blocking version supports only a subset of the sequential vector API and exhibits significant synchronization overhead caused by supporting opposing operations. Since many applications operate in phases of execution, wherein each phase only a subset of operations are used, this overhead is unnecessary for the majority of the application. To address the limitations of the non-blocking version, we present a new design that is wait-free, supports more of the operations provided by the sequential vector,more » and provides alternative implementations of key operations. These alternatives allow the developer to balance the performance and functionality of the vector as requirements change throughout execution. Compared to the known non-blocking version and the concurrent vector found in Intel’s TBB library, our design outperforms or provides comparable performance in the majority of tested scenarios. Over all tested scenarios, the presented design performs an average of 4.97 times more operations per second than the non-blocking vector and 1.54 more than the TBB vector. In a scenario designed to simulate the filling of a vector, performance improvement increases to 13.38 and 1.16 times. This work presents the first ABA-free non-blocking vector. Finally, unlike the other non-blocking approach, all operations are wait-free and bounds-checked and elements are stored contiguously in memory.« less
Xie, Hong-Bo; Huang, Hu; Wu, Jianhua; Liu, Lei
2015-02-01
We present a multiclass fuzzy relevance vector machine (FRVM) learning mechanism and evaluate its performance to classify multiple hand motions using surface electromyographic (sEMG) signals. The relevance vector machine (RVM) is a sparse Bayesian kernel method which avoids some limitations of the support vector machine (SVM). However, RVM still suffers the difficulty of possible unclassifiable regions in multiclass problems. We propose two fuzzy membership function-based FRVM algorithms to solve such problems, based on experiments conducted on seven healthy subjects and two amputees with six hand motions. Two feature sets, namely, AR model coefficients and room mean square value (AR-RMS), and wavelet transform (WT) features, are extracted from the recorded sEMG signals. Fuzzy support vector machine (FSVM) analysis was also conducted for wide comparison in terms of accuracy, sparsity, training and testing time, as well as the effect of training sample sizes. FRVM yielded comparable classification accuracy with dramatically fewer support vectors in comparison with FSVM. Furthermore, the processing delay of FRVM was much less than that of FSVM, whilst training time of FSVM much faster than FRVM. The results indicate that FRVM classifier trained using sufficient samples can achieve comparable generalization capability as FSVM with significant sparsity in multi-channel sEMG classification, which is more suitable for sEMG-based real-time control applications.
NASA Astrophysics Data System (ADS)
Peng, Chong; Wang, Lun; Liao, T. Warren
2015-10-01
Currently, chatter has become the critical factor in hindering machining quality and productivity in machining processes. To avoid cutting chatter, a new method based on dynamic cutting force simulation model and support vector machine (SVM) is presented for the prediction of chatter stability lobes. The cutting force is selected as the monitoring signal, and the wavelet energy entropy theory is used to extract the feature vectors. A support vector machine is constructed using the MATLAB LIBSVM toolbox for pattern classification based on the feature vectors derived from the experimental cutting data. Then combining with the dynamic cutting force simulation model, the stability lobes diagram (SLD) can be estimated. Finally, the predicted results are compared with existing methods such as zero-order analytical (ZOA) and semi-discretization (SD) method as well as actual cutting experimental results to confirm the validity of this new method.
NASA Technical Reports Server (NTRS)
Chen, D. W.; Sengupta, S. K.; Welch, R. M.
1989-01-01
This paper compares the results of cloud-field classification derived from two simplified vector approaches, the Sum and Difference Histogram (SADH) and the Gray Level Difference Vector (GLDV), with the results produced by the Gray Level Cooccurrence Matrix (GLCM) approach described by Welch et al. (1988). It is shown that the SADH method produces accuracies equivalent to those obtained using the GLCM method, while the GLDV method fails to resolve error clusters. Compared to the GLCM method, the SADH method leads to a 31 percent saving in run time and a 50 percent saving in storage requirements, while the GLVD approach leads to a 40 percent saving in run time and an 87 percent saving in storage requirements.
Buried landmine detection using multivariate normal clustering
NASA Astrophysics Data System (ADS)
Duston, Brian M.
2001-10-01
A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criteria (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.
Coherent clusters of inertial particles in homogeneous turbulence
NASA Astrophysics Data System (ADS)
Baker, Lucia; Frankel, Ari; Mani, Ali; Coletti, Filippo
2016-11-01
Clustering of heavy particles in turbulent flows manifests itself in a broad spectrum of physical phenomena, including sediment transport, cloud formation, and spray combustion. However, a clear topological definition of particle cluster has been lacking, limiting our ability to describe their features and dynamics. Here we introduce a definition of coherent cluster based on self-similarity, and apply it to the distribution of heavy particles in direct numerical simulations of homogeneous isotropic turbulence. We consider a range of particle Stokes numbers, with and without the effect of gravity. Clusters show self-similarity at length scales larger than twice the Kolmogorov length, with a specific fractal dimension. In the absence of gravity, clusters demonstrate a tendency to sample regions of the flow where strain is dominant over vorticity, and to align themselves with the local vorticity vector; when gravity is present, the clusters tend to align themselves with gravity, and their fall speed is different from the average settling velocity. This approach yields observations which are consistent with findings obtained from previous studies while opening new avenues for analysis of the topology and evolution of particle clusters in a wealth of applications.
Scarpassa, Vera Margarete; Cunha-Machado, Antonio Saulo; Saraiva, José Ferreira
2016-04-12
Anopheles nuneztovari sensu lato comprises cryptic species in northern South America, and the Brazilian populations encompass distinct genetic lineages within the Brazilian Amazon region. This study investigated, based on two molecular markers, whether these lineages might actually deserve species status. Specimens were collected in five localities of the Brazilian Amazon, including Manaus, Careiro Castanho and Autazes, in the State of Amazonas; Tucuruí, in the State of Pará; and Abacate da Pedreira, in the State of Amapá, and analysed for the COI gene (Barcode region) and 12 microsatellite loci. Phylogenetic analyses were performed using the maximum likelihood (ML) approach. Intra and inter samples genetic diversity were estimated using population genetics analyses, and the genetic groups were identified by means of the ML, Bayesian and factorial correspondence analyses and the Bayesian analysis of population structure. The Barcode region dataset (N = 103) generated 27 haplotypes. The haplotype network suggested three lineages. The ML tree retrieved five monophyletic groups. Group I clustered all specimens from Manaus and Careiro Castanho, the majority of Autazes and a few from Abacate da Pedreira. Group II clustered most of the specimens from Abacate da Pedreira and a few from Autazes and Tucuruí. Group III clustered only specimens from Tucuruí (lineage III), strongly supported (97 %). Groups IV and V clustered specimens of A. nuneztovari s.s. and A. dunhami, strongly (98 %) and weakly (70 %) supported, respectively. In the second phylogenetic analysis, the sequences from GenBank, identified as A. goeldii, clustered to groups I and II, but not to group III. Genetic distances (Kimura-2 parameters) among the groups ranged from 1.60 % (between I and II) to 2.32 % (between I and III). Microsatellite data revealed very high intra-population genetic variability. Genetic distances showed the highest and significant values (P = 0.005) between Tucuruí and all the other samples, and between Abacate da Pedreira and all the other samples. Genetic distances, Bayesian (Structure and BAPS) analyses and FCA suggested three distinct biological groups, supporting the barcode region results. The two markers revealed three genetic lineages for A. nuneztovari s.l. in the Brazilian Amazon region. Lineages I and II may represent genetically distinct groups or species within A. goeldii. Lineage III may represent a new species, distinct from the A. goeldii group, and may be the most ancestral in the Brazilian Amazon. They may have differences in Plasmodium susceptibility and should therefore be investigated further.
NASA Astrophysics Data System (ADS)
B. Shokouhi, Shahriar; Fooladivanda, Aida; Ahmadinejad, Nasrin
2017-12-01
A computer-aided detection (CAD) system is introduced in this paper for detection of breast lesions in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). The proposed CAD system firstly compensates motion artifacts and segments the breast region. Then, the potential lesion voxels are detected and used as the initial seed points for the seeded region-growing algorithm. A new and robust region-growing algorithm incorporating with Fuzzy C-means (FCM) clustering and vesselness filter is proposed to segment any potential lesion regions. Subsequently, the false positive detections are reduced by applying a discrimination step. This is based on 3D morphological characteristics of the potential lesion regions and kinetic features which are fed to the support vector machine (SVM) classifier. The performance of the proposed CAD system is evaluated using the free-response operating characteristic (FROC) curve. We introduce our collected dataset that includes 76 DCE-MRI studies, 63 malignant and 107 benign lesions. The prepared dataset has been used to verify the accuracy of the proposed CAD system. At 5.29 false positives per case, the CAD system accurately detects 94% of the breast lesions.
Su, Jin-He; Piao, Ying-Chao; Luo, Ze; Yan, Bao-Ping
2018-04-26
With the application of various data acquisition devices, a large number of animal movement data can be used to label presence data in remote sensing images and predict species distribution. In this paper, a two-stage classification approach for combining movement data and moderate-resolution remote sensing images was proposed. First, we introduced a new density-based clustering method to identify stopovers from migratory birds’ movement data and generated classification samples based on the clustering result. We split the remote sensing images into 16 × 16 patches and labeled them as positive samples if they have overlap with stopovers. Second, a multi-convolution neural network model is proposed for extracting the features from temperature data and remote sensing images, respectively. Then a Support Vector Machines (SVM) model was used to combine the features together and predict classification results eventually. The experimental analysis was carried out on public Landsat 5 TM images and a GPS dataset was collected on 29 birds over three years. The results indicated that our proposed method outperforms the existing baseline methods and was able to achieve good performance in habitat suitability prediction.
NASA Astrophysics Data System (ADS)
Usacheva, T. M.; Zhuravlev, V. I.
2013-03-01
Dielectric radiospectra (DRS) of 2,5-hexanediol and 1,2,6-hexanetriol at frequencies of 1 MHz, 9.375, 36.885, and 74.569 GHz in a temperature range of 303-423 K (above the glass transition temperatures) are studied. Experimental DRS are analyzed using the Dissado-Hill (DH) cluster model. The dependence of the equilibrium and relaxation characteristics of DRS on the number of OH groups is studied. The dipole moments of the clusters are calculated. The change in the orientation of the dipole moments of the molecules in the cluster during the rearranging of its structure is characterized through the unit vector of the longitudinal component of dipole moment M e of the cluster. The relation between a change in the Onsager-Kirkwood-Fröhlich correlation factor and the behavior of M e is shown.
Utilizing the Structure and Content Information for XML Document Clustering
NASA Astrophysics Data System (ADS)
Tran, Tien; Kutty, Sangeetha; Nayak, Richi
This paper reports on the experiments and results of a clustering approach used in the INEX 2008 document mining challenge. The clustering approach utilizes both the structure and content information of the Wikipedia XML document collection. A latent semantic kernel (LSK) is used to measure the semantic similarity between XML documents based on their content features. The construction of a latent semantic kernel involves the computing of singular vector decomposition (SVD). On a large feature space matrix, the computation of SVD is very expensive in terms of time and memory requirements. Thus in this clustering approach, the dimension of the document space of a term-document matrix is reduced before performing SVD. The document space reduction is based on the common structural information of the Wikipedia XML document collection. The proposed clustering approach has shown to be effective on the Wikipedia collection in the INEX 2008 document mining challenge.
A Wavelet Support Vector Machine Combination Model for Singapore Tourist Arrival to Malaysia
NASA Astrophysics Data System (ADS)
Rafidah, A.; Shabri, Ani; Nurulhuda, A.; Suhaila, Y.
2017-08-01
In this study, wavelet support vector machine model (WSVM) is proposed and applied for monthly data Singapore tourist time series prediction. The WSVM model is combination between wavelet analysis and support vector machine (SVM). In this study, we have two parts, first part we compare between the kernel function and second part we compare between the developed models with single model, SVM. The result showed that kernel function linear better than RBF while WSVM outperform with single model SVM to forecast monthly Singapore tourist arrival to Malaysia.
Bisenius, Sandrine; Mueller, Karsten; Diehl-Schmid, Janine; Fassbender, Klaus; Grimmer, Timo; Jessen, Frank; Kassubek, Jan; Kornhuber, Johannes; Landwehrmeyer, Bernhard; Ludolph, Albert; Schneider, Anja; Anderl-Straub, Sarah; Stuke, Katharina; Danek, Adrian; Otto, Markus; Schroeter, Matthias L
2017-01-01
Primary progressive aphasia (PPA) encompasses the three subtypes nonfluent/agrammatic variant PPA, semantic variant PPA, and the logopenic variant PPA, which are characterized by distinct patterns of language difficulties and regional brain atrophy. To validate the potential of structural magnetic resonance imaging data for early individual diagnosis, we used support vector machine classification on grey matter density maps obtained by voxel-based morphometry analysis to discriminate PPA subtypes (44 patients: 16 nonfluent/agrammatic variant PPA, 17 semantic variant PPA, 11 logopenic variant PPA) from 20 healthy controls (matched for sample size, age, and gender) in the cohort of the multi-center study of the German consortium for frontotemporal lobar degeneration. Here, we compared a whole-brain with a meta-analysis-based disease-specific regions-of-interest approach for support vector machine classification. We also used support vector machine classification to discriminate the three PPA subtypes from each other. Whole brain support vector machine classification enabled a very high accuracy between 91 and 97% for identifying specific PPA subtypes vs. healthy controls, and 78/95% for the discrimination between semantic variant vs. nonfluent/agrammatic or logopenic PPA variants. Only for the discrimination between nonfluent/agrammatic and logopenic PPA variants accuracy was low with 55%. Interestingly, the regions that contributed the most to the support vector machine classification of patients corresponded largely to the regions that were atrophic in these patients as revealed by group comparisons. Although the whole brain approach took also into account regions that were not covered in the regions-of-interest approach, both approaches showed similar accuracies due to the disease-specificity of the selected networks. Conclusion, support vector machine classification of multi-center structural magnetic resonance imaging data enables prediction of PPA subtypes with a very high accuracy paving the road for its application in clinical settings.
NASA Technical Reports Server (NTRS)
Gramenopoulos, N. (Principal Investigator)
1974-01-01
The author has identified the following significant results. A diffraction pattern analysis of MSS images led to the development of spatial signatures for farm land, urban areas and mountains. Four spatial features are employed to describe the spatial characteristics of image cells in the digital data. Three spectral features are combined with the spatial features to form a seven dimensional vector describing each cell. Then, the classification of the feature vectors is accomplished by using the maximum likelihood criterion. It was determined that the recognition accuracy with the maximum likelihood criterion depends on the statistics of the feature vectors. It was also determined that for a given geographic area the statistics of the classes remain invariable for a period of a month, but vary substantially between seasons. Three ERTS-1 images from the Phoenix, Arizona area were processed, and recognition rates between 85% and 100% were obtained for the terrain classes of desert, farms, mountains, and urban areas. To eliminate the need for training data, a new clustering algorithm has been developed. Seven ERTS-1 images from four test sites have been processed through the clustering algorithm, and high recognition rates have been achieved for all terrain classes.
A Fast Reduced Kernel Extreme Learning Machine.
Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua
2016-04-01
In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on Big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear functions accurately under the condition of support vectors sufficiency. Experimental results on a wide variety of real world small instance size and large instance size applications in the context of binary classification, multi-class problem and regression are then reported to show that RKELM can perform at competitive level of generalized performance as the SVM/LS-SVM at only a fraction of the computational effort incurred. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Garay, Michael J.; Mazzoni, Dominic; Davies, Roger; Wagstaff, Kiri
2004-01-01
Support Vector Machines (SVMs) are a type of supervised learning algorith,, other examples of which are Artificial Neural Networks (ANNs), Decision Trees, and Naive Bayesian Classifiers. Supervised learning algorithms are used to classify objects labled by a 'supervisor' - typically a human 'expert.'.
Lysine acetylation sites prediction using an ensemble of support vector machine classifiers.
Xu, Yan; Wang, Xiao-Bo; Ding, Jun; Wu, Ling-Yun; Deng, Nai-Yang
2010-05-07
Lysine acetylation is an essentially reversible and high regulated post-translational modification which regulates diverse protein properties. Experimental identification of acetylation sites is laborious and expensive. Hence, there is significant interest in the development of computational methods for reliable prediction of acetylation sites from amino acid sequences. In this paper we use an ensemble of support vector machine classifiers to perform this work. The experimentally determined acetylation lysine sites are extracted from Swiss-Prot database and scientific literatures. Experiment results show that an ensemble of support vector machine classifiers outperforms single support vector machine classifier and other computational methods such as PAIL and LysAcet on the problem of predicting acetylation lysine sites. The resulting method has been implemented in EnsemblePail, a web server for lysine acetylation sites prediction available at http://www.aporc.org/EnsemblePail/. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
Product Quality Modelling Based on Incremental Support Vector Machine
NASA Astrophysics Data System (ADS)
Wang, J.; Zhang, W.; Qin, B.; Shi, W.
2012-05-01
Incremental Support vector machine (ISVM) is a new learning method developed in recent years based on the foundations of statistical learning theory. It is suitable for the problem of sequentially arriving field data and has been widely used for product quality prediction and production process optimization. However, the traditional ISVM learning does not consider the quality of the incremental data which may contain noise and redundant data; it will affect the learning speed and accuracy to a great extent. In order to improve SVM training speed and accuracy, a modified incremental support vector machine (MISVM) is proposed in this paper. Firstly, the margin vectors are extracted according to the Karush-Kuhn-Tucker (KKT) condition; then the distance from the margin vectors to the final decision hyperplane is calculated to evaluate the importance of margin vectors, where the margin vectors are removed while their distance exceed the specified value; finally, the original SVs and remaining margin vectors are used to update the SVM. The proposed MISVM can not only eliminate the unimportant samples such as noise samples, but also can preserve the important samples. The MISVM has been experimented on two public data and one field data of zinc coating weight in strip hot-dip galvanizing, and the results shows that the proposed method can improve the prediction accuracy and the training speed effectively. Furthermore, it can provide the necessary decision supports and analysis tools for auto control of product quality, and also can extend to other process industries, such as chemical process and manufacturing process.
Multiscale limited penetrable horizontal visibility graph for analyzing nonlinear time series
NASA Astrophysics Data System (ADS)
Gao, Zhong-Ke; Cai, Qing; Yang, Yu-Xuan; Dang, Wei-Dong; Zhang, Shan-Shan
2016-10-01
Visibility graph has established itself as a powerful tool for analyzing time series. We in this paper develop a novel multiscale limited penetrable horizontal visibility graph (MLPHVG). We use nonlinear time series from two typical complex systems, i.e., EEG signals and two-phase flow signals, to demonstrate the effectiveness of our method. Combining MLPHVG and support vector machine, we detect epileptic seizures from the EEG signals recorded from healthy subjects and epilepsy patients and the classification accuracy is 100%. In addition, we derive MLPHVGs from oil-water two-phase flow signals and find that the average clustering coefficient at different scales allows faithfully identifying and characterizing three typical oil-water flow patterns. These findings render our MLPHVG method particularly useful for analyzing nonlinear time series from the perspective of multiscale network analysis.
Chen, Gila; Elisha, Ety; Timor, Uri; Ronel, Natti
2013-11-01
A qualitative phenomenological study of parents of addicted male adolescents who were residents of a Jewish therapeutic community (TC) describes and interprets the parents' perceptions of the recovery process. Deep, semistructured interviews with 14 parents provided the data. The parents' perceptions were clustered into three main themes of meaning: (a) the process of change, (b) the experiences of family members in the course of the son's recovery process, and (c) the parents' perception of the treatment at Retorno. According to the parents, the admission of their sons into the TC brought notable relief to the family life, which enabled the whole family to begin a recovery process. The findings support the positive criminology perspective that emphasizes the disintegration-integration vector as significant in the recovery process. Recommendations for intervention planning are provided.
Current status of genome editing in vector mosquitoes: A review.
Reegan, Appadurai Daniel; Ceasar, Stanislaus Antony; Paulraj, Michael Gabriel; Ignacimuthu, Savarimuthu; Al-Dhabi, Naif Abdullah
2017-01-16
Mosquitoes pose a major threat to human health as they spread many deadly diseases like malaria, dengue, chikungunya, filariasis, Japanese encephalitis and Zika. Identification and use of novel molecular tools are essential to combat the spread of vector borne diseases. Genome editing tools have been used for the precise alterations of the gene of interest for producing the desirable trait in mosquitoes. Deletion of functional genes or insertion of toxic genes in vector mosquitoes will produce either knock-out or knock-in mutants that will check the spread of vector-borne diseases. Presently, three types of genome editing tools viz., zinc finger nuclease (ZFN), transcription activator-like effector nucleases (TALEN) and clustered regulatory interspaced short palindromic repeats (CRISPR) and CRISPR associated protein 9 (Cas9) are widely used for the editing of the genomes of diverse organisms. These tools are also applied in vector mosquitoes to control the spread of vector-borne diseases. A few studies have been carried out on genome editing to control the diseases spread by vector mosquitoes and more studies need to be performed with the utilization of more recently invented tools like CRISPR/Cas9 to combat the spread of deadly diseases by vector mosquitoes. The high specificity and flexibility of CRISPR/Cas9 system may offer possibilities for novel genome editing for the control of important diseases spread by vector mosquitoes. In this review, we present the current status of genome editing research on vector mosquitoes and also discuss the future applications of vector mosquito genome editing to control the spread of vectorborne diseases.
NASA Astrophysics Data System (ADS)
Maleki, Farahnaz; Schlexer, Philomena; Pacchioni, Gianfranco
2018-02-01
Oxide-supported Cu nanoparticles and clusters catalyze a variety of important reactions, such as CO/CO2 hydrogenation to methanol. Recent studies demonstrate that also sub-nanometer clusters consisting of only a few atoms can actively catalyze chemical reactions. In this study, we investigate the interaction between Cu4 clusters and silica-surfaces, considering the de-hydroxylated and the fully hydroxylated α-quartz surfaces. We also considered various dopants such as Ti- and Nb-ions substitutional to Si, respectively, in order to see if an electronic change of the support has an effect on the reaction of the supported cluster. We find that hydroxyl groups can enhance the adsorption energy of the cluster, whereas the dopants have only little effects on the adsorption mode of the Cu cluster. On the fully hydroxylated surface, the cluster may react with the hydroxyl groups via reverse hydrogen spillover. Finally, we explore the reactivity of the silica-supported Cu4 cluster in terms of acetylene trimerization, for which extended Cu surfaces have shown catalytic activity. We find that this reaction should occur with activation barriers below 0.8 eV; Nb-doping of the support does not seem to produce any direct effect on the reactivity of the Cu tetramer.
DeGroote, John P; Sugumaran, Ramanathan; Ecker, Mark
2014-11-01
After several years of low West Nile virus (WNV) occurrence in the United States of America (USA), 2012 witnessed large outbreaks in several parts of the country. In order to understand the outbreak dynamics, spatial clustering and landscape, demographic and climatic associations with WNV occurrence were investigated at a regional level in the USA. Previous research has demonstrated that there are a handful of prominent WNV mosquito vectors with varying ecological requirements responsible for WNV transmission in the USA. Published range maps of these important vectors were georeferenced and used to define eight functional ecological regions in the coterminous USA. The number of human WNV cases and human populations by county were attained in order to calculate a WNV rate for each county in 2012. Additionally, a binary value (high/low) was calculated for each county based on whether the county WNV rate was above or below the rate for the region it fell in. Global Moran's I and Anselin Local Moran's I statistics of spatial association were used per region to examine and visualize clustering of the WNV rate and the high/low rating. Spatial data on landscape, demographic and climatic variables were compiled and derived from a variety of sources and then investigated in relation to human WNV using both Spearman rho correlation coefficients and Poisson regression models. Findings demonstrated significant spatial clustering of WNV and substantial inter-regional differences in relationships between WNV occurrence and landscape, demographic and climatically related variables. The regional associations were consistent with the ecologies of the dominant vectors for those regions. The large outbreak in the Southeast region was preceded by higher than normal winter and spring precipitation followed by dry and hot conditions in the summer.
Compute Server Performance Results
NASA Technical Reports Server (NTRS)
Stockdale, I. E.; Barton, John; Woodrow, Thomas (Technical Monitor)
1994-01-01
Parallel-vector supercomputers have been the workhorses of high performance computing. As expectations of future computing needs have risen faster than projected vector supercomputer performance, much work has been done investigating the feasibility of using Massively Parallel Processor systems as supercomputers. An even more recent development is the availability of high performance workstations which have the potential, when clustered together, to replace parallel-vector systems. We present a systematic comparison of floating point performance and price-performance for various compute server systems. A suite of highly vectorized programs was run on systems including traditional vector systems such as the Cray C90, and RISC workstations such as the IBM RS/6000 590 and the SGI R8000. The C90 system delivers 460 million floating point operations per second (FLOPS), the highest single processor rate of any vendor. However, if the price-performance ration (PPR) is considered to be most important, then the IBM and SGI processors are superior to the C90 processors. Even without code tuning, the IBM and SGI PPR's of 260 and 220 FLOPS per dollar exceed the C90 PPR of 160 FLOPS per dollar when running our highly vectorized suite,
Sheela, A M; Sarun, S; Justus, J; Vineetha, P; Sheeja, R V
2015-04-01
Vector borne diseases are a threat to human health. Little attention has been paid to the prevention of these diseases. We attempted to identify the significant wetland characteristics associated with the spread of chikungunya, dengue fever and malaria in Kerala, a tropical region of South West India using multivariate analyses (hierarchical cluster analysis, factor analysis and multiple regression). High/medium turbid coastal lagoons and inland water-logged wetlands with aquatic vegetation have significant effect on the incidence of chikungunya while dengue influenced by high turbid coastal beaches and malaria by medium turbid coastal beaches. The high turbidity in water is due to the urban waste discharge namely sewage, sullage and garbage from the densely populated cities and towns. The large extent of wetland is low land area favours the occurrence of vector borne diseases. Hence the provision of pollution control measures at source including soil erosion control measures is vital. The identification of vulnerable zones favouring the vector borne diseases will help the authorities to control pollution especially from urban areas and prevent these vector borne diseases. Future research should cover land use cover changes, climatic factors, seasonal variations in weather and pollution factors favouring the occurrence of vector borne diseases.
A machine learning approach to galaxy-LSS classification - I. Imprints on halo merger trees
NASA Astrophysics Data System (ADS)
Hui, Jianan; Aragon, Miguel; Cui, Xinping; Flegal, James M.
2018-04-01
The cosmic web plays a major role in the formation and evolution of galaxies and defines, to a large extent, their properties. However, the relation between galaxies and environment is still not well understood. Here, we present a machine learning approach to study imprints of environmental effects on the mass assembly of haloes. We present a galaxy-LSS machine learning classifier based on galaxy properties sensitive to the environment. We then use the classifier to assess the relevance of each property. Correlations between galaxy properties and their cosmic environment can be used to predict galaxy membership to void/wall or filament/cluster with an accuracy of 93 per cent. Our study unveils environmental information encoded in properties of haloes not normally considered directly dependent on the cosmic environment such as merger history and complexity. Understanding the physical mechanism by which the cosmic web is imprinted in a halo can lead to significant improvements in galaxy formation models. This is accomplished by extracting features from galaxy properties and merger trees, computing feature scores for each feature and then applying support vector machine (SVM) to different feature sets. To this end, we have discovered that the shape and depth of the merger tree, formation time, and density of the galaxy are strongly associated with the cosmic environment. We describe a significant improvement in the original classification algorithm by performing LU decomposition of the distance matrix computed by the feature vectors and then using the output of the decomposition as input vectors for SVM.
Smith, T M; Jiang, Y F; Shipley, P; Floss, H G
1995-10-16
A common approach to identify and clone biosynthetic gene from an antibiotic-producing streptomycete is to clone the resistance gene for the antibiotic of interest and then use that gene to clone DNA that is linked to it. As a first step toward cloning the genes responsible for the biosynthesis of thiostrepton (Th) in Streptomyces laurentii (Sl), the Th resistance-encoding gene (tsnR) was cloned as a 1.5-kb BamHI-PvuII fragment in Escherichia coli (Ec), and shown to confer Th resistance when introduced into S. lividans TK24. The tsnR-containing DNA fragment was used as a probe to isolate clones from cosmid libraries of DNA in the Ec cosmid vector SuperCos, and pOJ446 (an Ec/streptomycete) cosmid vector. Sequence and genetic analysis of the DNA flanking the tsnR indicates that the Sl tsnR is not closely linked to biosynthetic genes. Instead it is located within a cluster of ribosomal protein operons.
Vector analysis of chemical variation in the lavas of Parícutin volcano, Mexico
Miesch, A.T.
1979-01-01
Compositional variations in the lavas of Parícutin volcano, Mexico, have been examined by an extended method of Q-mode factor analysis. Each sample composition is treated as a vector projected from an original eight-dimensional space into a vector system of three dimensions. The compositions represented by the vectors after projection are closely similar to the original compositions except for Na2Oand Fe2O3.The vectors in the three-dimensional system cluster about three different planes that represent three stages of compositional change in the Parícutin lavas. Because chemical data on the compositions of the minerals in the lavas are presently lacking, interpretations of the mineral phases that may have been involved in fractional crystallization are based on CIPW norm calculations. Changes during the first stage are attributed largely to the fractional crystallization of plagioclase and olivine. Changes during the second stage can be explained by the separation of plagioclase and pyroxene. Changes during the final stage may have resulted mostly from the assimilation of a granitic material, as previously proposed by R. E. Wilcox.
2013-05-28
those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms . When one class occurs...incremental support vector machine algorithm for online learning when fewer than 50 data points are available. (a) Papers published in peer-reviewed journals...learning environments, where data processing occurs one observation at a time and the classification algorithm improves over time with new
NASA Astrophysics Data System (ADS)
Nourani, Vahid; Andalib, Gholamreza; Dąbrowska, Dominika
2017-05-01
Accurate nitrate load predictions can elevate decision management of water quality of watersheds which affects to environment and drinking water. In this paper, two scenarios were considered for Multi-Station (MS) nitrate load modeling of the Little River watershed. In the first scenario, Markovian characteristics of streamflow-nitrate time series were proposed for the MS modeling. For this purpose, feature extraction criterion of Mutual Information (MI) was employed for input selection of artificial intelligence models (Feed Forward Neural Network, FFNN and least square support vector machine). In the second scenario for considering seasonality-based characteristics of the time series, wavelet transform was used to extract multi-scale features of streamflow-nitrate time series of the watershed's sub-basins to model MS nitrate loads. Self-Organizing Map (SOM) clustering technique which finds homogeneous sub-series clusters was also linked to MI for proper cluster agent choice to be imposed into the models for predicting the nitrate loads of the watershed's sub-basins. The proposed MS method not only considers the prediction of the outlet nitrate but also covers predictions of interior sub-basins nitrate load values. The results indicated that the proposed FFNN model coupled with the SOM-MI improved the performance of MS nitrate predictions compared to the Markovian-based models up to 39%. Overall, accurate selection of dominant inputs which consider seasonality-based characteristics of streamflow-nitrate process could enhance the efficiency of nitrate load predictions.
A Parallel Processing Algorithm for Remote Sensing Classification
NASA Technical Reports Server (NTRS)
Gualtieri, J. Anthony
2005-01-01
A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level commuters using the Linux operating system. For example on the Medusa cluster at NASA/GSFC, this provides for super computing performance, 130 G(sub flops) (Linpack Benchmark) at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
Generation of nanoclusters by ultrafast laser ablation of Al: Molecular dynamics study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miloshevsky, Alexander; Phillips, Mark C.; Harilal, Sivanandan S.
The laser ablation of materials induced by an ultrashort femtosecond pulse is a complex phenomenon, which depends on both the material properties and the properties of the laser pulse. The unique capability of a combination of molecular dynamics (MD) and Momentum Scaling Model (MSM) methods is developed and applied to a large atomic system for studying the process of ultrafast laser-material interactions, behavior of matter in a highly non-equilibrium state, material disintegration, and formation of nanoparticles (NPs). Laser pulses with several fluences in the range from 500 J/m2 to 5000 J/m2 interacting with a large system of aluminum atoms aremore » simulated. The response of Al material to the laser energy deposition is investigated within the finite-size laser spot. It is found that the shape of the plasma plume is dynamically changing during an expansion process. At several tens of picoseconds it can be characterized as a long hollow ellipsoid surrounded by atomized and nano-clustered particles. The time evolution of NP clusters in the plume is investigated. The collisions between the single Al atoms and generated NPs and fragmentation of large NPs determine the fractions of different-size NP clusters in the plume. The MD-MSM simulations show that laser fluence greatly affects the size distribution of NPs, their polar angles, magnitude and direction vectors of NP velocities. These results and predictions are supported by the experimental data and previous MD simulations.« less
Comparison of four statistical and machine learning methods for crash severity prediction.
Iranitalab, Amirfarrokh; Khattak, Aemal
2017-11-01
Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset and the correct prediction rates for each crash severity level, overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed NNC had the best prediction performance in overall and in more severe crashes. RF and SVM had the next two sufficient performances and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. Overall correct prediction rate had almost the exact opposite results compared to the proposed approach, showing that neglecting the crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
2014-01-01
Background Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. Results MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Conclusions Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy. PMID:24731387
Cao, Renzhi; Wang, Zheng; Cheng, Jianlin
2014-04-15
Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy.
NASA Astrophysics Data System (ADS)
Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.
2016-09-01
In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.
MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.
Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G
2012-12-07
MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is the first metagenomic web service that is capable of processing SOLiD color-space reads, to authors' knowledge. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.
NASA Astrophysics Data System (ADS)
Srivastava, D. P.; Sahni, V.; Satsangi, P. S.
2014-08-01
Graph-theoretic quantum system modelling (GTQSM) is facilitated by considering the fundamental unit of quantum computation and information, viz. a quantum bit or qubit as a basic building block. Unit directional vectors "ket 0" and "ket 1" constitute two distinct fundamental quantum across variable orthonormal basis vectors, for the Hilbert space, specifying the direction of propagation of information, or computation data, while complementary fundamental quantum through, or flow rate, variables specify probability parameters, or amplitudes, as surrogates for scalar quantum information measure (von Neumann entropy). This paper applies GTQSM in continuum of protein heterodimer tubulin molecules of self-assembling polymers, viz. microtubules in the brain as a holistic system of interacting components representing hierarchical clustered quantum Hopfield network, hQHN, of networks. The quantum input/output ports of the constituent elemental interaction components, or processes, of tunnelling interactions and Coulombic bidirectional interactions are in cascade and parallel interconnections with each other, while the classical output ports of all elemental components are interconnected in parallel to accumulate micro-energy functions generated in the system as Hamiltonian, or Lyapunov, energy function. The paper presents an insight, otherwise difficult to gain, for the complex system of systems represented by clustered quantum Hopfield network, hQHN, through the application of GTQSM construct.
Geraci, Nicholas S.; Mukbel, Rami M.; Kemp, Michael T.; Wadsworth, Mariha N.; Lesho, Emil; Stayback, Gwen M.; Champion, Matthew M.; Bernard, Megan A.; Abo-Shehada, Mahmoud; Coutinho-Abreu, Iliano V.; Ramalho-Ortigão, Marcelo; Hanafi, Hanafi A.; Fawaz, Emadeldin Y.; El-Hossary, Shabaan S.; Wortmann, Glenn; Hoel, David F.; McDowell, Mary Ann
2014-01-01
Phlebotomus papatasi sand flies are among the primary vectors of Leishmania major parasites from Morocco to the Indian subcontinent and from southern Europe to central and eastern Africa. Antibody-based immunity to sand fly salivary gland proteins in human populations remains a complex contextual problem that is not yet fully understood. We profiled the immunoreactivities of plasma antibodies to sand fly salivary gland sonicates (SGSs) from 229 human blood donors residing in different regions of sand fly endemicity throughout Jordan and Egypt as well as 69 US military personnel, who were differentially exposed to P. papatasi bites and L. major infections in Iraq. Compared with plasma from control region donors, antibodies were significantly immunoreactive to five salivary proteins (12, 26, 30, 38, and 44 kDa) among Jordanian and Egyptian donors, with immunoglobulin G4 being the dominant anti-SGS isotype. US personnel were significantly immunoreactive to only two salivary proteins (38 and 14 kDa). Using k-means clustering, donors were segregated into four clusters distinguished by unique immunoreactivity profiles to varying combinations of the significantly immunogenic salivary proteins. SGS-induced cellular proliferation was diminished among donors residing in sand fly-endemic regions. These data provide a clearer picture of human immune responses to sand fly vector salivary constituents. PMID:24615125
Image quality guided approach for adaptive modelling of biometric intra-class variations
NASA Astrophysics Data System (ADS)
Abboud, Ali J.; Jassim, Sabah A.
2010-04-01
The high intra-class variability of acquired biometric data can be attributed to several factors such as quality of acquisition sensor (e.g. thermal), environmental (e.g. lighting), behavioural (e.g. change face pose). Such large fuzziness of biometric data can cause a big difference between an acquired and stored biometric data that will eventually lead to reduced performance. Many systems store multiple templates in order to account for such variations in the biometric data during enrolment stage. The number and typicality of these templates are the most important factors that affect system performance than other factors. In this paper, a novel offline approach is proposed for systematic modelling of intra-class variability and typicality in biometric data by regularly selecting new templates from a set of available biometric images. Our proposed technique is a two stage algorithm whereby in the first stage image samples are clustered in terms of their image quality profile vectors, rather than their biometric feature vectors, and in the second stage a per cluster template is selected from a small number of samples in each clusters to create an ultimate template sets. These experiments have been conducted on five face image databases and their results will demonstrate the effectiveness of proposed quality guided approach.
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis to distributed computing. The intent is first to provide some guidance and directions in the rapidly increasing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors, as well as high massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience made at the NAS facility at NASA Ames Research Center. Over the last five years at NAS massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon Machines from Intel were used in a production supercomputer center alongside with traditional vector supercomputers such as the Cray Y-MP and C90.
Analyzing coastal environments by means of functional data analysis
NASA Astrophysics Data System (ADS)
Sierra, Carlos; Flor-Blanco, Germán; Ordoñez, Celestino; Flor, Germán; Gallego, José R.
2017-07-01
Here we used Functional Data Analysis (FDA) to examine particle-size distributions (PSDs) in a beach/shallow marine sedimentary environment in Gijón Bay (NW Spain). The work involved both Functional Principal Components Analysis (FPCA) and Functional Cluster Analysis (FCA). The grainsize of the sand samples was characterized by means of laser dispersion spectroscopy. Within this framework, FPCA was used as a dimension reduction technique to explore and uncover patterns in grain-size frequency curves. This procedure proved useful to describe variability in the structure of the data set. Moreover, an alternative approach, FCA, was applied to identify clusters and to interpret their spatial distribution. Results obtained with this latter technique were compared with those obtained by means of two vector approaches that combine PCA with CA (Cluster Analysis). The first method, the point density function (PDF), was employed after adapting a log-normal distribution to each PSD and resuming each of the density functions by its mean, sorting, skewness and kurtosis. The second applied a centered-log-ratio (clr) to the original data. PCA was then applied to the transformed data, and finally CA to the retained principal component scores. The study revealed functional data analysis, specifically FPCA and FCA, as a suitable alternative with considerable advantages over traditional vector analysis techniques in sedimentary geology studies.
George, Phillip; Jensen, Silke; Pogorelcnik, Romain; Lee, Jiyoung; Xing, Yi; Brasset, Emilie; Vaury, Chantal; Sharakhov, Igor V
2015-01-01
Specific genomic loci, termed Piwi-interacting RNA (piRNA) clusters, manufacture piRNAs that serve as guides for the inactivation of complementary transposable elements (TEs). The piRNA pathway has been accurately detailed in Drosophila melanogaster, while it remains poorly examined in other insects. This pathway is increasingly recognized as critical for germline development and reproduction. Understanding of the piRNA functions in mosquitoes could offer an opportunity for disease vector control by the reduction of their reproductive potential. To analyze the similarities and differences in this pathway between Drosophila and mosquito, we performed an in-depth analysis of the genomic loci producing piRNAs and their targets in the African malaria vector Anopheles gambiae. We identified 187 piRNA clusters in the An. gambiae genome and 155 piRNA clusters in the D. melanogaster genome. We demonstrate that many more piRNA clusters in the mosquito compared with the fruit fly are uni-directionally transcribed and are located outside pericentromeric heterochromatin. About 11 % of the An. gambiae piRNA population map to gene transcripts. This is a noticeable increase compared with the ~6 % of the piRNA population mapped to genes in D. melanogaster. A subset of the piRNA-enriched genes in An. gambiae has functions related to reproduction and development. At least 24 and 65 % of the mapped piRNAs correspond to genomic TE sequences in An. gambiae and D. melanogaster, respectively. DNA transposons and non-LTR retrotransposons are more abundant in An. gambiae, while LTR retrotransposons are more abundant in D. melanogaster. Yet, piRNAs predominantly target LTR retrotransposons in both species, which may point to a distinct feature of these elements compared to the other classes of TEs concerning their silencing by the piRNA pathway. Here, we demonstrate that piRNA-producing loci have more ubiquitous distribution in the An. gambiae genome than in the genome of D. melanogaster. Also, protein-coding genes have an increased role in production of piRNAs in the germline of this mosquito. Genes involved in germline and embryonic development of An. gambiae generate a substantial portion of piRNAs, suggesting a role of the piRNA pathway in the epigenetic regulation of the reproductive processes in the African malaria vector.
Lee, Wen-Li; Chang, Koyin; Hsieh, Kai-Sheng
2016-09-01
Segmenting lung fields in a chest radiograph is essential for automatically analyzing an image. We present an unsupervised method based on multiresolution fractal feature vector. The feature vector characterizes the lung field region effectively. A fuzzy c-means clustering algorithm is then applied to obtain a satisfactory initial contour. The final contour is obtained by deformable models. The results show the feasibility and high performance of the proposed method. Furthermore, based on the segmentation of lung fields, the cardiothoracic ratio (CTR) can be measured. The CTR is a simple index for evaluating cardiac hypertrophy. After identifying a suspicious symptom based on the estimated CTR, a physician can suggest that the patient undergoes additional extensive tests before a treatment plan is finalized.
NASA Astrophysics Data System (ADS)
Ksoll, Victor F.; Gouliermis, Dimitrios A.; Klessen, Ralf S.; Grebel, Eva K.; Sabbi, Elena; Anderson, Jay; Lennon, Daniel J.; Cignoni, Michele; de Marchi, Guido; Smith, Linda J.; Tosi, Monica; van der Marel, Roeland P.
2018-05-01
The Hubble Tarantula Treasury Project (HTTP) has provided an unprecedented photometric coverage of the entire star-burst region of 30 Doradus down to the half Solar mass limit. We use the deep stellar catalogue of HTTP to identify all the pre-main-sequence (PMS) stars of the region, i.e., stars that have not started their lives on the main-sequence yet. The photometric distinction of these stars from the more evolved populations is not a trivial task due to several factors that alter their colour-magnitude diagram positions. The identification of PMS stars requires, thus, sophisticated statistical methods. We employ Machine Learning Classification techniques on the HTTP survey of more than 800,000 sources to identify the PMS stellar content of the observed field. Our methodology consists of 1) carefully selecting the most probable low-mass PMS stellar population of the star-forming cluster NGC2070, 2) using this sample to train classification algorithms to build a predictive model for PMS stars, and 3) applying this model in order to identify the most probable PMS content across the entire Tarantula Nebula. We employ Decision Tree, Random Forest and Support Vector Machine classifiers to categorise the stars as PMS and Non-PMS. The Random Forest and Support Vector Machine provided the most accurate models, predicting about 20,000 sources with a candidateship probability higher than 50 percent, and almost 10,000 PMS candidates with a probability higher than 95 percent. This is the richest and most accurate photometric catalogue of extragalactic PMS candidates across the extent of a whole star-forming complex.
Kazemi, Fatemeh; Najafabadi, Tooraj Abbasian; Araabi, Babak Nadjar
2016-01-01
Acute myelogenous leukemia (AML) is a subtype of acute leukemia, which is characterized by the accumulation of myeloid blasts in the bone marrow. Careful microscopic examination of stained blood smear or bone marrow aspirate is still the most significant diagnostic methodology for initial AML screening and considered as the first step toward diagnosis. It is time-consuming and due to the elusive nature of the signs and symptoms of AML; wrong diagnosis may occur by pathologists. Therefore, the need for automation of leukemia detection has arisen. In this paper, an automatic technique for identification and detection of AML and its prevalent subtypes, i.e., M2-M5 is presented. At first, microscopic images are acquired from blood smears of patients with AML and normal cases. After applying image preprocessing, color segmentation strategy is applied for segmenting white blood cells from other blood components and then discriminative features, i.e., irregularity, nucleus-cytoplasm ratio, Hausdorff dimension, shape, color, and texture features are extracted from the entire nucleus in the whole images containing multiple nuclei. Images are classified to cancerous and noncancerous images by binary support vector machine (SVM) classifier with 10-fold cross validation technique. Classifier performance is evaluated by three parameters, i.e., sensitivity, specificity, and accuracy. Cancerous images are also classified into their prevalent subtypes by multi-SVM classifier. The results show that the proposed algorithm has achieved an acceptable performance for diagnosis of AML and its common subtypes. Therefore, it can be used as an assistant diagnostic tool for pathologists.
Enhancement of plant metabolite fingerprinting by machine learning.
Scott, Ian M; Vermeer, Cornelia P; Liakata, Maria; Corol, Delia I; Ward, Jane L; Lin, Wanchang; Johnson, Helen E; Whitehead, Lynne; Kular, Baldeep; Baker, John M; Walsh, Sean; Dave, Anuja; Larson, Tony R; Graham, Ian A; Wang, Trevor L; King, Ross D; Draper, John; Beale, Michael H
2010-08-01
Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by (1)H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, (1)H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Sungsik; Lee, Byeongdu; Seifert, Sönke
2015-05-21
In this study, the catalytic activity and changes in the oxidation state during the Fischer Tropsch (FT) reaction was investigated on subnanometer size-selected cobalt clusters deposited on oxide (Al2O3, MgO) and carbon-based (ultrananocrystalline diamond UNCD) supports by temperature programmed reaction (TPRx) combined with in-situ grazing-incidence X-ray absorption characterization (GIXAS). The activity and selectivity of ultrasmall cobalt clusters exhibits a very strong dependence on cluster size and support. The evolution of the oxidation state of metal cluster during the reaction reveals that metal-support interaction plays a key role in the reaction.
Vector-model-supported approach in prostate plan optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Eva Sau Fan; Department of Health Technology and Informatics, The Hong Kong Polytechnic University; Wu, Vincent Wing Cheung
Lengthy time consumed in traditional manual plan optimization can limit the use of step-and-shoot intensity-modulated radiotherapy/volumetric-modulated radiotherapy (S&S IMRT/VMAT). A vector model base, retrieving similar radiotherapy cases, was developed with respect to the structural and physiologic features extracted from the Digital Imaging and Communications in Medicine (DICOM) files. Planning parameters were retrieved from the selected similar reference case and applied to the test case to bypass the gradual adjustment of planning parameters. Therefore, the planning time spent on the traditional trial-and-error manual optimization approach in the beginning of optimization could be reduced. Each S&S IMRT/VMAT prostate reference database comprised 100more » previously treated cases. Prostate cases were replanned with both traditional optimization and vector-model-supported optimization based on the oncologists' clinical dose prescriptions. A total of 360 plans, which consisted of 30 cases of S&S IMRT, 30 cases of 1-arc VMAT, and 30 cases of 2-arc VMAT plans including first optimization and final optimization with/without vector-model-supported optimization, were compared using the 2-sided t-test and paired Wilcoxon signed rank test, with a significance level of 0.05 and a false discovery rate of less than 0.05. For S&S IMRT, 1-arc VMAT, and 2-arc VMAT prostate plans, there was a significant reduction in the planning time and iteration with vector-model-supported optimization by almost 50%. When the first optimization plans were compared, 2-arc VMAT prostate plans had better plan quality than 1-arc VMAT plans. The volume receiving 35 Gy in the femoral head for 2-arc VMAT plans was reduced with the vector-model-supported optimization compared with the traditional manual optimization approach. Otherwise, the quality of plans from both approaches was comparable. Vector-model-supported optimization was shown to offer much shortened planning time and iteration number without compromising the plan quality.« less
Zhang, Guo-rong; Geller, Alfred I
2010-05-17
Multiple potential uses of direct gene transfer into neurons require restricting expression to specific classes of glutamatergic neurons. Thus, it is desirable to develop vectors containing glutamatergic class-specific promoters. The three vesicular glutamate transporters (VGLUTs) are expressed in distinct populations of neurons, and VGLUT1 is the predominant VGLUT in the neocortex, hippocampus, and cerebellar cortex. We previously reported a plasmid (amplicon) Herpes Simplex Virus (HSV-1) vector that placed the Lac Z gene under the regulation of the VGLUT1 promoter (pVGLUT1lac). Using helper virus-free vector stocks, we showed that this vector supported approximately 90% glutamatergic neuron-specific expression in postrhinal (POR) cortex, in rats sacrificed at either 4 days or 2 months after gene transfer. We now show that pVGLUT1lac supports expression preferentially in VGLUT1-containing glutamatergic neurons. pVGLUT1lac vector stock was injected into either POR cortex, which contains primarily VGLUT1-containing glutamatergic neurons, or into the ventral medial hypothalamus (VMH), which contains predominantly VGLUT2-containing glutamatergic neurons. Rats were sacrificed at 4 days after gene transfer, and the types of cells expressing ss-galactosidase were determined by immunofluorescent costaining. Cell counts showed that pVGLUT1lac supported expression in approximately 10-fold more cells in POR cortex than in the VMH, whereas a control vector supported expression in similar numbers of cells in these two areas. Further, in POR cortex, pVGLUT1lac supported expression predominately in VGLUT1-containing neurons, and, in the VMH, pVGLUT1lac showed an approximately 10-fold preference for the rare VGLUT1-containing neurons. VGLUT1-specific expression may benefit specific experiments on learning or specific gene therapy approaches, particularly in the neocortex. Copyright 2010 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Jia, Rui-Sheng; Sun, Hong-Mei; Peng, Yan-Jun; Liang, Yong-Quan; Lu, Xin-Ming
2017-07-01
Microseismic monitoring is an effective means for providing early warning of rock or coal dynamical disasters, and its first step is microseismic event detection, although low SNR microseismic signals often cannot effectively be detected by routine methods. To solve this problem, this paper presents permutation entropy and a support vector machine to detect low SNR microseismic events. First, an extraction method of signal features based on multi-scale permutation entropy is proposed by studying the influence of the scale factor on the signal permutation entropy. Second, the detection model of low SNR microseismic events based on the least squares support vector machine is built by performing a multi-scale permutation entropy calculation for the collected vibration signals, constructing a feature vector set of signals. Finally, a comparative analysis of the microseismic events and noise signals in the experiment proves that the different characteristics of the two can be fully expressed by using multi-scale permutation entropy. The detection model of microseismic events combined with the support vector machine, which has the features of high classification accuracy and fast real-time algorithms, can meet the requirements of online, real-time extractions of microseismic events.
NASA Astrophysics Data System (ADS)
Bowling, R. D.; Laya, J. C.; Everett, M. E.
2018-07-01
The study of exposed carbonate platforms provides observational constraints on regional tectonics and sea-level history. In this work Miocene-aged carbonate platform units of the Seroe Domi Formation are investigated on the island of Bonaire, located in the Southern Caribbean. Ground penetrating radar (GPR) was used to probe near-surface structural geometries associated with these lithologies. The single cross-island transect described herein allowed for continuous mapping of geologic structures on kilometre length scales. Numerical analysis was applied to the data in the form of k-means clustering of structure-parallel vectors derived from image structure tensors. This methodology enables radar facies along the survey transect to be semi-automatically mapped. The results provide subsurface evidence to support previous surficial and outcrop observations, and reveal complex stratigraphy within the platform. From the GPR data analysis, progradational clinoform geometries were observed on the northeast side of the island which support the tectonics and depositional trends of the region. Furthermore, several leeward-side radar facies are identified which correlate to environments of deposition conducive to dolomitization via reflux mechanisms.
NASA Astrophysics Data System (ADS)
Bowling, R. D.; Laya, J. C.; Everett, M. E.
2018-05-01
The study of exposed carbonate platforms provides observational constraints on regional tectonics and sea-level history. In this work Miocene-aged carbonate platform units of the Seroe Domi Formation are investigated, on the island of Bonaire, located in the Southern Caribbean. Ground penetrating radar (GPR) was used to probe near-surface structural geometries associated with these lithologies. The single cross-island transect described herein allowed for continuous mapping of geologic structures on kilometer length scales. Numerical analysis was applied to the data in the form of k-means clustering of structure-parallel vectors derived from image structure tensors. This methodology enables radar facies along the survey transect to be semi-automatically mapped. The results provide subsurface evidence to support previous surficial and outcrop observations, and reveal complex stratigraphy within the platform. From the GPR data analysis, progradational clinoform geometries were observed on the northeast side of the island which supports the tectonics and depositional trends of the region. Furthermore, several leeward-side radar facies are identified which correlate to environments of deposition conducive to dolomitization via reflux mechanisms.
Huda, M Mamun; Kumar, Vijay; Das, Murari Lal; Ghosh, Debashis; Priyanka, Jyoti; Das, Pradeep; Alim, Abdul; Matlashewski, Greg; Kroeger, Axel; Alfonso-Sierra, Eduardo; Mondal, Dinesh
2016-10-06
New methods for controlling sand fly are highly desired by the Visceral Leishmaniasis (VL) elimination program of Bangladesh, India and Nepal for its consolidation and maintenance phases. To support the program we investigated safety, efficacy and cost of Durable Wall Lining to control sand fly. This multicentre randomized controlled study in Bangladesh, India and Nepal included randomized two intervention clusters and one control cluster. Each cluster had 50 households except full wall surface coverage (DWL-FWSC) cluster in Nepal which had 46 households. Ten of 50 households were randomly selected for entomological activities except India where it was 6 households. Interventions were DWL-FWSC and reduced wall surface coverage (DWL-RWSC) with DWL which covers 1.8 m and 1.5 m height from floor respectively. Efficacy was measured by reduction in sand fly density by intervention and sand fly mortality assessment by the WHO cone bioassay test at 1 month after intervention. Trained field research assistants interviewed household heads for socio-demographic information, knowledge and practice about VL, vector control, and for their experience following the intervention. Cost data was collected using cost data collection tool which was designed for this study. Statistical analysis included difference-in-differences estimate, bivariate analysis, Poisson regression model and incremental cost-efficacy ratio calculation. Mean sand fly density reduction by DWL-FWSC and DWL-RWSC was respectively -4.96 (95 % CI, -4.54, -5.38) and -5.38 (95 % CI, -4.89, -5.88). The sand fly density reduction attributed by both the interventions were statistically significant after adjusting for covariates (IRR = 0.277, p < 0.001 for DWL-RWSC and IRR = 0.371, p < 0.001 for DWL-FWSC). The efficacy of DWL-RWSC and DWL-FWSC on sand fly density reduction was statistically comparable (p = 0.214). The acceptability of both interventions was high. Transient burning sensations, flash on face and itching were most common adverse events and were observed mostly in Indian site. There was no serious adverse event. DWL-RWSC is cost-saving compared to DWL-FWSC. The incremental cost-efficacy ratio was -6.36, where DWL-RWSC dominates DWL-FWSC. DWL-RWSC intervention is safe, efficacious, cost-saving and cost-effective in reducing indoor sand fly density. The VL elimination program in the Indian sub-continent may consider DWL-RWSC for sand fly control for its consolidation and maintenance phases.
Yang, Jun; Hou, Ziming; Wang, Changjiang; Wang, Hao; Zhang, Hongbing
2018-04-23
Adamantinomatous craniopharyngioma (ACP) is an aggressive brain tumor that occurs predominantly in the pediatric population. Conventional diagnosis method and standard therapy cannot treat ACPs effectively. In this paper, we aimed to identify key genes for ACP early diagnosis and treatment. Datasets GSE94349 and GSE68015 were obtained from Gene Expression Omnibus database. Consensus clustering was applied to discover the gene clusters in the expression data of GSE94349 and functional enrichment analysis was performed on gene set in each cluster. The protein-protein interaction (PPI) network was built by the Search Tool for the Retrieval of Interacting Genes, and hubs were selected. Support vector machine (SVM) model was built based on the signature genes identified from enrichment analysis and PPI network. Dataset GSE94349 was used for training and testing, and GSE68015 was used for validation. Besides, RT-qPCR analysis was performed to analyze the expression of signature genes in ACP samples compared with normal controls. Seven gene clusters were discovered in the differentially expressed genes identified from GSE94349 dataset. Enrichment analysis of each cluster identified 25 pathways that highly associated with ACP. PPI network was built and 46 hubs were determined. Twenty-five pathway-related genes that overlapped with the hubs in PPI network were used as signatures to establish the SVM diagnosis model for ACP. The prediction accuracy of SVM model for training, testing, and validation data were 94, 85, and 74%, respectively. The expression of CDH1, CCL2, ITGA2, COL8A1, COL6A2, and COL6A3 were significantly upregulated in ACP tumor samples, while CAMK2A, RIMS1, NEFL, SYT1, and STX1A were significantly downregulated, which were consistent with the differentially expressed gene analysis. SVM model is a promising classification tool for screening and early diagnosis of ACP. The ACP-related pathways and signature genes will advance our knowledge of ACP pathogenesis and benefit the therapy improvement.
Multiscale benchmarking of drug delivery vectors.
Summers, Huw D; Ware, Matthew J; Majithia, Ravish; Meissner, Kenith E; Godin, Biana; Rees, Paul
2016-10-01
Cross-system comparisons of drug delivery vectors are essential to ensure optimal design. An in-vitro experimental protocol is presented that separates the role of the delivery vector from that of its cargo in determining the cell response, thus allowing quantitative comparison of different systems. The technique is validated through benchmarking of the dose-response of human fibroblast cells exposed to the cationic molecule, polyethylene imine (PEI); delivered as a free molecule and as a cargo on the surface of CdSe nanoparticles and Silica microparticles. The exposure metrics are converted to a delivered dose with the transport properties of the different scale systems characterized by a delivery time, τ. The benchmarking highlights an agglomeration of the free PEI molecules into micron sized clusters and identifies the metric determining cell death as the total number of PEI molecules presented to cells, determined by the delivery vector dose and the surface density of the cargo. Copyright © 2016 Elsevier Inc. All rights reserved.
Suerth, Julia D; Maetzig, Tobias; Brugman, Martijn H; Heinz, Niels; Appelt, Jens-Uwe; Kaufmann, Kerstin B; Schmidt, Manfred; Grez, Manuel; Modlich, Ute; Baum, Christopher; Schambach, Axel
2012-01-01
Comparative integrome analyses have highlighted alpharetroviral vectors with a relatively neutral, and thus favorable, integration spectrum. However, previous studies used alpharetroviral vectors harboring viral coding sequences and intact long-terminal repeats (LTRs). We recently developed self-inactivating (SIN) alpharetroviral vectors with an advanced split-packaging design. In a murine bone marrow (BM) transplantation model we now compared alpharetroviral, gammaretroviral, and lentiviral SIN vectors and showed that all vectors transduced hematopoietic stem cells (HSCs), leading to comparable, sustained multilineage transgene expression in primary and secondary transplanted mice. Alpharetroviral integrations were decreased near transcription start sites, CpG islands, and potential cancer genes compared with gammaretroviral, and decreased in genes compared with lentiviral integrations. Analyzing the transcriptome and intragenic integrations in engrafting cells, we observed stronger correlations between in-gene integration targeting and transcriptional activity for gammaretroviral and lentiviral vectors than for alpharetroviral vectors. Importantly, the relatively “extragenic” alpharetroviral integration pattern still supported long-term transgene expression upon serial transplantation. Furthermore, sensitive genotoxicity studies revealed a decreased immortalization incidence compared with gammaretroviral and lentiviral SIN vectors. We conclude that alpharetroviral SIN vectors have a favorable integration pattern which lowers the risk of insertional mutagenesis while supporting long-term transgene expression in the progeny of transplanted HSCs. PMID:22334016
NASA Astrophysics Data System (ADS)
Sun, J.; Li, Y.
2017-12-01
Magnetic data contain important information about the subsurface rocks that were magnetized in the geological history, which provides an important avenue to the study of the crustal heterogeneities associated with magmatic and hydrothermal activities. Interpretation of magnetic data has been widely used in mineral exploration, basement characterization and large scale crustal studies for several decades. However, interpreting magnetic data has been often complicated by the presence of remanent magnetizations with unknown magnetization directions. Researchers have developed different methods to deal with the challenges posed by remanence. We have developed a new and effective approach to inverting magnetic data for magnetization vector distributions characterized by region-wise consistency in the magnetization directions. This approach combines the classical Tikhonov inversion scheme with fuzzy C-means clustering algorithm, and constrains the estimated magnetization vectors to a specified small number of possible directions while fitting the observed magnetic data to within noise level. Our magnetization vector inversion recovers both the magnitudes and the directions of the magnetizations in the subsurface. Magnetization directions reflect the unique geological or hydrothermal processes applied to each geological unit, and therefore, can potentially be used for the purpose of differentiating various geological units. We have developed a practically convenient and effective way of assessing the uncertainty associated with the inverted magnetization directions (Figure 1), and investigated how geological differentiation results might be affected (Figure 2). The algorithm and procedures we have developed for magnetization vector inversion and uncertainty analysis open up new possibilities of extracting useful information from magnetic data affected by remanence. We will use a field data example from exploration of an iron-oxide-copper-gold (IOCG) deposit in Brazil to illustrate how to solve the inverse problem, assess uncertainty, and perform geology differentiation in practice. We will also discuss the potential applications of this new method to large scale crustal studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shareghe, Mehraeen; Chi, Miaofang; Browning, Nigel D.
2011-01-01
The structures of small, robust metal clusters on a solid support were determined by a combination of spectroscopic and microscopic methods: extended X-ray absorption fine structure (EXAFS) spectroscopy, scanning transmission electron microscopy (STEM), and aberration-corrected STEM. The samples were synthesized from [Os{sub 3}(CO){sub 12}] on MgO powder to provide supported clusters intended to be triosmium. The results demonstrate that the supported clusters are robust in the absence of oxidants. Conventional high-angle annular dark-field (HAADF) STEM images demonstrate a high degree of uniformity of the clusters, with root-mean-square (rms) radii of 2.03 {+-} 0.06 {angstrom}. The EXAFS OsOs coordination number ofmore » 2.1 {+-} 0.4 confirms the presence of triosmium clusters on average and correspondingly determines an average rms cluster radius of 2.02 {+-} 0.04 {angstrom}. The high-resolution STEM images show the individual Os atoms in the clusters, confirming the triangular structures of their frames and determining OsOs distances of 2.80 {+-} 0.14 {angstrom}, matching the EXAFS value of 2.89 {+-} 0.06 {angstrom}. IR and EXAFS spectra demonstrate the presence of CO ligands on the clusters. This set of techniques is recommended as optimal for detailed and reliable structural characterization of supported clusters.« less
Mirabello, Lisa; Vineis, Joseph H; Yanoviak, Stephen P; Scarpassa, Vera M; Póvoa, Marinete M; Padilla, Norma; Achee, Nicole L; Conn, Jan E
2008-03-26
Anopheles darlingi is the most important malaria vector in the Neotropics. An understanding of A. darlingi's population structure and contemporary gene flow patterns is necessary if vector populations are to be successfully controlled. We assessed population genetic structure and levels of differentiation based on 1,376 samples from 31 localities throughout the Peruvian and Brazilian Amazon and Central America using 5-8 microsatellite loci. We found high levels of polymorphism for all of the Amazonian populations (mean RS = 7.62, mean HO = 0.742), and low levels for the Belize and Guatemalan populations (mean RS = 4.3, mean HO = 0.457). The Bayesian clustering analysis revealed five population clusters: northeastern Amazonian Brazil, southeastern and central Amazonian Brazil, western and central Amazonian Brazil, Peruvian Amazon, and the Central American populations. Within Central America there was low non-significant differentiation, except for between the populations separated by the Maya Mountains. Within Amazonia there was a moderate level of significant differentiation attributed to isolation by distance. Within Peru there was no significant population structure and low differentiation, and some evidence of a population expansion. The pairwise estimates of genetic differentiation between Central America and Amazonian populations were all very high and highly significant (FST = 0.1859 - 0.3901, P < 0.05). Both the DA and FST distance-based trees illustrated the main division to be between Central America and Amazonia. We detected a large amount of population structure in Amazonia, with three population clusters within Brazil and one including the Peru populations. The considerable differences in Ne among the populations may have contributed to the observed genetic differentiation. All of the data suggest that the primary division within A. darlingi corresponds to two white gene genotypes between Amazonia (genotype 1) and Central America, parts of Colombia and Venezuela (genotype 2), and are in agreement with previously published mitochondrial COI gene sequences interpreted as incipient species. Overall, it appears that two main factors have contributed to the genetic differentiation between the population clusters: physical distance between the populations and the differences in effective population sizes among the subpopulations.
Marchant, Axelle; Mougel, Florence; Jacquin-Joly, Emmanuelle; Costa, Jane; Almeida, Carlos Eduardo; Harry, Myriam
2016-01-01
Background In Latin America, the bloodsucking bugs Triatominae are vectors of Trypanosoma cruzi, the parasite that causes Chagas disease. Chemical elimination programs have been launched to control Chagas disease vectors. However, the disease persists because native vectors from sylvatic habitats are able to (re)colonize houses—a process called domiciliation. Triatoma brasiliensis is one example. Because the chemosensory system allows insects to interact with their environment and plays a key role in insect adaption, we conducted a descriptive and comparative study of the chemosensory transcriptome of T. brasiliensis samples from different ecotopes. Methodology/Principal Finding In a reference transcriptome built using de novo assembly, we found transcripts encoding 27 odorant-binding proteins (OBPs), 17 chemosensory proteins (CSPs), 3 odorant receptors (ORs), 5 transient receptor potential channel (TRPs), 1 sensory neuron membrane protein (SNMPs), 25 takeout proteins, 72 cytochrome P450s, 5 gluthatione S-transferases, and 49 cuticular proteins. Using protein phylogenies, we showed that most of the OBPs and CSPs for T. brasiliensis had well supported orthologs in the kissing bug Rhodnius prolixus. We also showed a higher number of these genes within the bloodsucking bugs and more generally within all Hemipterans compared to the other species in the super-order Paraneoptera. Using both DESeq2 and EdgeR software, we performed differential expression analyses between samples of T. brasiliensis, taking into account their environment (sylvatic, peridomiciliary and domiciliary) and sex. We also searched clusters of co-expressed contigs using HTSCluster. Among differentially expressed (DE) contigs, most were under-expressed in the chemosensory organs of the domiciliary bugs compared to the other samples and in females compared to males. We clearly identified DE genes that play a role in the chemosensory system. Conclusion/Significance Chemosensory genes could be good candidates for genes that contribute to adaptation or plastic rearrangement to an anthropogenic system. The domiciliary environment probably includes less diversity of xenobiotics and probably has more stable abiotic parameters than do sylvatic and peridomiciliary environments. This could explain why both detoxification and cuticle protein genes are less expressed in domiciliary bugs. Understanding the molecular basis for how vectors adapt to human dwellings may reveal new tools to control disease vectors; for example, by disrupting chemical communication. PMID:27792774
Marchant, Axelle; Mougel, Florence; Jacquin-Joly, Emmanuelle; Costa, Jane; Almeida, Carlos Eduardo; Harry, Myriam
2016-10-01
In Latin America, the bloodsucking bugs Triatominae are vectors of Trypanosoma cruzi, the parasite that causes Chagas disease. Chemical elimination programs have been launched to control Chagas disease vectors. However, the disease persists because native vectors from sylvatic habitats are able to (re)colonize houses-a process called domiciliation. Triatoma brasiliensis is one example. Because the chemosensory system allows insects to interact with their environment and plays a key role in insect adaption, we conducted a descriptive and comparative study of the chemosensory transcriptome of T. brasiliensis samples from different ecotopes. In a reference transcriptome built using de novo assembly, we found transcripts encoding 27 odorant-binding proteins (OBPs), 17 chemosensory proteins (CSPs), 3 odorant receptors (ORs), 5 transient receptor potential channel (TRPs), 1 sensory neuron membrane protein (SNMPs), 25 takeout proteins, 72 cytochrome P450s, 5 gluthatione S-transferases, and 49 cuticular proteins. Using protein phylogenies, we showed that most of the OBPs and CSPs for T. brasiliensis had well supported orthologs in the kissing bug Rhodnius prolixus. We also showed a higher number of these genes within the bloodsucking bugs and more generally within all Hemipterans compared to the other species in the super-order Paraneoptera. Using both DESeq2 and EdgeR software, we performed differential expression analyses between samples of T. brasiliensis, taking into account their environment (sylvatic, peridomiciliary and domiciliary) and sex. We also searched clusters of co-expressed contigs using HTSCluster. Among differentially expressed (DE) contigs, most were under-expressed in the chemosensory organs of the domiciliary bugs compared to the other samples and in females compared to males. We clearly identified DE genes that play a role in the chemosensory system. Chemosensory genes could be good candidates for genes that contribute to adaptation or plastic rearrangement to an anthropogenic system. The domiciliary environment probably includes less diversity of xenobiotics and probably has more stable abiotic parameters than do sylvatic and peridomiciliary environments. This could explain why both detoxification and cuticle protein genes are less expressed in domiciliary bugs. Understanding the molecular basis for how vectors adapt to human dwellings may reveal new tools to control disease vectors; for example, by disrupting chemical communication.
2017-08-01
An outbreak of Zika virus infection was detected in Singapore in August, 2016. We report the first comprehensive analysis of a national response to an outbreak of Zika virus infection in Asia. In the first phase of the outbreak, patients with suspected Zika virus infection were isolated in two national referral hospitals until their serum tested negative for the virus. Enhanced vector control and community engagement measures were deployed in disease clusters, including stepped-up mosquito larvicide and adulticide use, community participation in source reduction (destruction of mosquito breeding sites), and work with the local media to promote awareness of the outbreak. Clinical and epidemiological data were collected from patients with confirmed Zika virus infection during the first phase. In the second phase, admission into hospitals for isolation was stopped but vector control efforts continued. Mosquitoes were captured from areas with Zika disease clusters to assess which species were present, their breeding numbers, and to test for Zika virus. Mosquito virus strains were compared with human strains through phylogenetic analysis after full genome sequencing. Reproductive numbers and inferred dates of strain diversification were estimated through Bayesian analyses. From Aug 27 to Nov 30, 2016, 455 cases of Zika virus infection were confirmed in Singapore. Of 163 patients with confirmed Zika virus infection who presented to national referral hospitals during the first phase of the outbreak, Zika virus was detected in the blood samples of 97 (60%) patients and the urine samples of 157 (96%) patients. There were 15 disease clusters, 12 of which had high Aedes aegypti breeding percentages. Captured mosquitoes were pooled into 517 pools for Zika virus screening; nine abdomen pools (2%) were positive for Zika virus, of which seven head and thorax pools were Zika-virus positive. In the phylogenetic analysis, all mosquito sequences clustered within the outbreak lineage. The lineage showed little diversity and was distinct from other Asian lineages. The estimated most recent common ancestor of the outbreak lineage was from May, 2016. With the deployment of vector control and community engagement measures, the estimated reproductive number fell from 3·62 (95% CI 3·48-3·77) for July 31 to Sept 1, 2016, to 1·22 (95% CI 1·19-1·24) 4 weeks later (Sept 1 to Nov 24, 2016). The outbreak shows the ease with which Zika virus can be introduced and spread despite good baseline vector control. Disease surveillance, enhanced vector control, and community awareness and engagement helped to quickly curb further spread of the virus. These intensive measures might be useful for other countries facing the same threat. National Medical Research Council Singapore, Centre for Infectious Disease Epidemiology and Research, and A*STAR Biomedical Research Council. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Šilhavý, Jakub; Minár, Jozef; Mentlík, Pavel; Sládek, Ján
2016-07-01
This paper presents a new method of automatic lineament extraction which includes the removal of the 'artefacts effect' which is associated with the process of raster based analysis. The core of the proposed Multi-Hillshade Hierarchic Clustering (MHHC) method incorporates a set of variously illuminated and rotated hillshades in combination with hierarchic clustering of derived 'protolineaments'. The algorithm also includes classification into positive and negative lineaments. MHHC was tested in two different territories in Bohemian Forest and Central Western Carpathians. The original vector-based algorithm was developed for comparison of the individual lineaments proximity. Its use confirms the compatibility of manual and automatic extraction and their similar relationships to structural data in the study areas.
Identifying saltcedar with hyperspectral data and support vector machines
USDA-ARS?s Scientific Manuscript database
Saltcedar (Tamarix spp.) are a group of dense phreatophytic shrubs and trees that are invasive to riparian areas throughout the United States. This study determined the feasibility of using hyperspectral data and a support vector machine (SVM) classifier to discriminate saltcedar from other cover t...
Decaying vector dark matter as an explanation for the 3.5 keV line from galaxy clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farzan, Yasaman; Akbarieh, Amin Rezaei, E-mail: yasaman@theory.ipm.ac.ir, E-mail: am_rezaei@physics.sharif.ir
2014-11-01
We present a Vector Dark Matter (VDM) model that explains the 3.5 keV line recently observed in the XMM-Newton observatory data from galaxy clusters. In this model, dark matter is composed of two vector bosons, V and V', which couple to the photon through an effective generalized Chern-Simons coupling, g{sub V}. V' is slightly heavier than V with a mass splitting m{sub V'} – m{sub V} ≅ 3.5 keV. The decay of V' to V and a photon gives rise to the 3.5 keV line. The production of V and V' takes place in the early universe within the freeze-in framework through the effectivemore » g{sub V} coupling when m{sub V'} < T < Λ, Λ being the cut-off above which the effective g{sub V} coupling is not valid. We introduce a high energy model that gives rise to the g{sub V} coupling at low energies. To do this, V and V' are promoted to gauge bosons of spontaneously broken new U(1){sub V} and U(1){sub V'} gauge symmetries, respectively. The high energy sector includes milli-charged chiral fermions that lead to the g{sub V} coupling at low energy via triangle diagrams.« less
Mirjankar, Nikhil S; Fraga, Carlos G; Carman, April J; Moran, James J
2016-02-02
Chemical attribution signatures (CAS) for chemical threat agents (CTAs), such as cyanides, are being investigated to provide an evidentiary link between CTAs and specific sources to support criminal investigations and prosecutions. Herein, stocks of KCN and NaCN were analyzed for trace anions by high performance ion chromatography (HPIC), carbon stable isotope ratio (δ(13)C) by isotope ratio mass spectrometry (IRMS), and trace elements by inductively coupled plasma optical emission spectroscopy (ICP-OES). The collected analytical data were evaluated using hierarchical cluster analysis (HCA), Fisher-ratio (F-ratio), interval partial least-squares (iPLS), genetic algorithm-based partial least-squares (GAPLS), partial least-squares discriminant analysis (PLSDA), K nearest neighbors (KNN), and support vector machines discriminant analysis (SVMDA). HCA of anion impurity profiles from multiple cyanide stocks from six reported countries of origin resulted in cyanide samples clustering into three groups, independent of the associated alkali metal (K or Na). The three groups were independently corroborated by HCA of cyanide elemental profiles and corresponded to countries each having one known solid cyanide factory: Czech Republic, Germany, and United States. Carbon stable isotope measurements resulted in two clusters: Germany and United States (the single Czech stock grouped with United States stocks). Classification errors for two validation studies using anion impurity profiles collected over five years on different instruments were as low as zero for KNN and SVMDA, demonstrating the excellent reliability associated with using anion impurities for matching a cyanide sample to its factory using our current cyanide stocks. Variable selection methods reduced errors for those classification methods having errors greater than zero; iPLS-forward selection and F-ratio typically provided the lowest errors. Finally, using anion profiles to classify cyanides to a specific stock or stock group for a subset of United States stocks resulted in cross-validation errors ranging from 0 to 5.3%.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keller, Brad M.; Nathan, Diane L.; Wang Yan
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') andmore » vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r= 0.82, p < 0.001) and processed (r= 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r= 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's {kappa}{>=} 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.« less
Keller, Brad M.; Nathan, Diane L.; Wang, Yan; Zheng, Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-01-01
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., “FOR PROCESSING”) and vendor postprocessed (i.e., “FOR PRESENTATION”), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies. PMID:22894417
Keller, Brad M; Nathan, Diane L; Wang, Yan; Zheng, Yuanjie; Gee, James C; Conant, Emily F; Kontos, Despina
2012-08-01
The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., "FOR PROCESSING") and vendor postprocessed (i.e., "FOR PRESENTATION"), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.
Yoshioka, Kota; Nakamura, Jiro; Pérez, Byron; Tercero, Doribel; Pérez, Lenin; Tabaru, Yuichiro
2015-12-01
Chagas disease is one of the most serious health problems in Latin America. Because the disease is transmitted mainly by triatomine vectors, a three-phase vector control strategy was used to reduce its vector-borne transmission. In Nicaragua, we implemented an indoor insecticide spraying program in five northern departments to reduce house infestation by Triatoma dimidiata. The spraying program was performed in two rounds. After each round, we conducted entomological evaluation to compare the vector infestation level before and after spraying. A total of 66,200 and 44,683 houses were sprayed in the first and second spraying rounds, respectively. The entomological evaluation showed that the proportion of houses infested by T. dimidiata was reduced from 17.0% to 3.0% after the first spraying, which was statistically significant (P < 0.0001). However, the second spraying round did not demonstrate clear effectiveness. Space-time analysis revealed that reinfestation of T. dimidiata is more likely to occur in clusters where the pre-spray infestation level is high. Here we discuss how large-scale insecticide spraying is neither effective nor affordable when T. dimidiata is widely distributed at low infestation levels. Further challenges involve research on T. dimidiata reinfestation, diversification of vector control strategies, and implementation of sustainable vector surveillance. © The American Society of Tropical Medicine and Hygiene.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Dahms, Sven O.; Kuester, Miriam; Streb, Carsten; Roth, Christian; Sträter, Norbert; Than, Manuel E.
2013-01-01
Heavy-atom clusters (HA clusters) containing a large number of specifically arranged electron-dense scatterers are especially useful for experimental phase determination of large complex structures, weakly diffracting crystals or structures with large unit cells. Often, the determination of the exact orientation of the HA cluster and hence of the individual heavy-atom positions proves to be the critical step in successful phasing and subsequent structure solution. Here, it is demonstrated that molecular replacement (MR) with either anomalous or isomorphous differences is a useful strategy for the correct placement of HA cluster compounds. The polyoxometallate cluster hexasodium α-metatungstate (HMT) was applied in phasing the structure of death receptor 6. Even though the HA cluster is bound in alternate partially occupied orientations and is located at a special position, its correct localization and orientation could be determined at resolutions as low as 4.9 Å. The broad applicability of this approach was demonstrated for five different derivative crystals that included the compounds tantalum tetradecabromide and trisodium phosphotungstate in addition to HMT. The correct placement of the HA cluster depends on the length of the intramolecular vectors chosen for MR, such that both a larger cluster size and the optimal choice of the wavelength used for anomalous data collection strongly affect the outcome. PMID:23385464
Scoring clustering solutions by their biological relevance.
Gat-Viks, I; Sharan, R; Shamir, R
2003-12-12
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
Almendros, J.; Chouet, B.; Dawson, P.
2001-01-01
Array data from a seismic experiment carried out at Kilauea Volcano, Hawaii, in February 1997, are analyzed by the frequency-slowness method. The slowness vectors are determined at each of three small-aperture seismic antennas for the first arrivals of 1129 long-period (LP) events and 147 samples of volcanic tremor. The source locations are determined by using a probabilistic method which compares the event azimuths and slownesses with a slowness vector model. The results show that all the LP seismicity, including both discrete LP events and tremor, was generated in the same source region along the east flank of the Halemaumau pit crater, demonstrating the strong relation that exists between the two types of activities. The dimensions of the source region are approximately 0.6 X 1.0 X 0.5 km. For LP events we are able to resolve at least three different clusters of events. The most active cluster is centered ???200 m northeast of Halemaumau at depths shallower than 200 m beneath the caldera floor. A second cluster is located beneath the northeast quadrant of Halemaumau at a depth of ???400 m. The third cluster is <200 m deep and extends southeastward from the northeast quadrant of Halemaumau. Only one source zone is resolved for tremor. This zone is coincident with the most active source zone of LP events, northeast of Halemaumau. The location, depth, and size of the source region suggest a hydrothermal origin for all the analyzed LP seismicity. Copyright 2001 by the American Geophysical Union.
Pérez de Rosas, Alicia R.; Restelli, María F.; Fernández, Cintia J.; Blariza, María J.; García, Beatriz A.
2017-01-01
Here we apply inter-simple sequence repeat (ISSR) markers to explore the fine-scale genetic structure and dispersal in populations of Triatoma infestans. Five selected primers from 30 primers were used to amplify ISSRs by polymerase chain reaction. A total of 90 polymorphic bands were detected across 134 individuals captured from 11 peridomestic sites from the locality of San Martín (Capayán Department, Catamarca Province, Argentina). Significant levels of genetic differentiation suggest limited gene flow among sampling sites. Spatial autocorrelation analysis confirms that dispersal occurs on the scale of ∼469 m, suggesting that insecticide spraying should be extended at least within a radius of ∼500 m around the infested area. Moreover, Bayesian clustering algorithms indicated genetic exchange among different sites analyzed, supporting the hypothesis of an important role of peridomestic structures in the process of reinfestation. PMID:28115670
High-speed cell recognition algorithm for ultrafast flow cytometer imaging system.
Zhao, Wanyue; Wang, Chao; Chen, Hongwei; Chen, Minghua; Yang, Sigang
2018-04-01
An optical time-stretch flow imaging system enables high-throughput examination of cells/particles with unprecedented high speed and resolution. A significant amount of raw image data is produced. A high-speed cell recognition algorithm is, therefore, highly demanded to analyze large amounts of data efficiently. A high-speed cell recognition algorithm consisting of two-stage cascaded detection and Gaussian mixture model (GMM) classification is proposed. The first stage of detection extracts cell regions. The second stage integrates distance transform and the watershed algorithm to separate clustered cells. Finally, the cells detected are classified by GMM. We compared the performance of our algorithm with support vector machine. Results show that our algorithm increases the running speed by over 150% without sacrificing the recognition accuracy. This algorithm provides a promising solution for high-throughput and automated cell imaging and classification in the ultrafast flow cytometer imaging platform. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
CNN based approach for activity recognition using a wrist-worn accelerometer.
Panwar, Madhuri; Dyuthi, S Ram; Chandra Prakash, K; Biswas, Dwaipayan; Acharyya, Amit; Maharatna, Koushik; Gautam, Arvind; Naik, Ganesh R
2017-07-01
In recent years, significant advancements have taken place in human activity recognition using various machine learning approaches. However, feature engineering have dominated conventional methods involving the difficult process of optimal feature selection. This problem has been mitigated by using a novel methodology based on deep learning framework which automatically extracts the useful features and reduces the computational cost. As a proof of concept, we have attempted to design a generalized model for recognition of three fundamental movements of the human forearm performed in daily life where data is collected from four different subjects using a single wrist worn accelerometer sensor. The validation of the proposed model is done with different pre-processing and noisy data condition which is evaluated using three possible methods. The results show that our proposed methodology achieves an average recognition rate of 99.8% as opposed to conventional methods based on K-means clustering, linear discriminant analysis and support vector machine.
High-speed cell recognition algorithm for ultrafast flow cytometer imaging system
NASA Astrophysics Data System (ADS)
Zhao, Wanyue; Wang, Chao; Chen, Hongwei; Chen, Minghua; Yang, Sigang
2018-04-01
An optical time-stretch flow imaging system enables high-throughput examination of cells/particles with unprecedented high speed and resolution. A significant amount of raw image data is produced. A high-speed cell recognition algorithm is, therefore, highly demanded to analyze large amounts of data efficiently. A high-speed cell recognition algorithm consisting of two-stage cascaded detection and Gaussian mixture model (GMM) classification is proposed. The first stage of detection extracts cell regions. The second stage integrates distance transform and the watershed algorithm to separate clustered cells. Finally, the cells detected are classified by GMM. We compared the performance of our algorithm with support vector machine. Results show that our algorithm increases the running speed by over 150% without sacrificing the recognition accuracy. This algorithm provides a promising solution for high-throughput and automated cell imaging and classification in the ultrafast flow cytometer imaging platform.
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach
NASA Astrophysics Data System (ADS)
Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew
2017-05-01
This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
MetaDP: a comprehensive web server for disease prediction of 16S rRNA metagenomic datasets.
Xu, Xilin; Wu, Aiping; Zhang, Xinlei; Su, Mingming; Jiang, Taijiao; Yuan, Zhe-Ming
2016-01-01
High-throughput sequencing-based metagenomics has garnered considerable interest in recent years. Numerous methods and tools have been developed for the analysis of metagenomic data. However, it is still a daunting task to install a large number of tools and complete a complicated analysis, especially for researchers with minimal bioinformatics backgrounds. To address this problem, we constructed an automated software named MetaDP for 16S rRNA sequencing data analysis, including data quality control, operational taxonomic unit clustering, diversity analysis, and disease risk prediction modeling. Furthermore, a support vector machine-based prediction model for intestinal bowel syndrome (IBS) was built by applying MetaDP to microbial 16S sequencing data from 108 children. The success of the IBS prediction model suggests that the platform may also be applied to other diseases related to gut microbes, such as obesity, metabolic syndrome, or intestinal cancer, among others (http://metadp.cn:7001/).
A linear-RBF multikernel SVM to classify big text corpora.
Romero, R; Iglesias, E L; Borrajo, L
2015-01-01
Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers.
Uarrota, Virgílio Gavicho; Moresco, Rodolfo; Coelho, Bianca; Nunes, Eduardo da Costa; Peruch, Luiz Augusto Martins; Neubert, Enilto de Oliveira; Rocha, Miguel; Maraschin, Marcelo
2014-10-15
Cassava roots are an important source of dietary and industrial carbohydrates and suffer markedly from postharvest physiological deterioration (PPD). This paper deals with metabolomics combined with chemometric tools for screening the chemical and enzymatic composition in several genotypes of cassava roots during PPD. Metabolome analyses showed increases in carotenoids, flavonoids, anthocyanins, phenolics, reactive scavenging species, and enzymes (superoxide dismutase family, hydrogen peroxide, and catalase) until 3-5days postharvest. PPD correlated negatively with phenolics and carotenoids and positively with anthocyanins and flavonoids. Chemometric tools such as principal component analysis, partial least squares discriminant analysis, and support vector machines discriminated well cassava samples and enabled a good prediction of samples. Hierarchical clustering analyses grouped samples according to their levels of PPD and chemical compositions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Anders, Katherine L; Cutcher, Zoe; Kleinschmidt, Immo; Donnelly, Christl A; Ferguson, Neil M; Indriani, Citra; O'Neill, Scott L; Jewell, Nicholas P; Simmons, Cameron P
2018-05-07
Cluster randomized trials are the gold standard for assessing efficacy of community-level interventions, such as vector control strategies against dengue. We describe a novel cluster randomized trial methodology with a test-negative design, which offers advantages over traditional approaches. It utilizes outcome-based sampling of patients presenting with a syndrome consistent with the disease of interest, who are subsequently classified as test-positive cases or test-negative controls on the basis of diagnostic testing. We use simulations of a cluster trial to demonstrate validity of efficacy estimates under the test-negative approach. This demonstrates that, provided study arms are balanced for both test-negative and test-positive illness at baseline and that other test-negative design assumptions are met, the efficacy estimates closely match true efficacy. We also briefly discuss analytical considerations for an odds ratio-based effect estimate arising from clustered data, and outline potential approaches to analysis. We conclude that application of the test-negative design to certain cluster randomized trials could increase their efficiency and ease of implementation.
Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.
Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G
2012-10-01
The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was a 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.
Hoffmann, Thomas J
2011-03-01
It is often useful to rerun a command line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to pass in multiple command line options, including vectors of values in the usual R format, easily into R. The same script can be setup to run things in parallel via different command line arguments. The R package batch also provides a means to simplify this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally it provides a means to aggregate the results together of multiple processes run on a cluster.
Clustering Tree-structured Data on Manifold
Lu, Na; Miao, Hongyu
2016-01-01
Tree-structured data usually contain both topological and geometrical information, and are necessarily considered on manifold instead of Euclidean space for appropriate data parameterization and analysis. In this study, we propose a novel tree-structured data parameterization, called Topology-Attribute matrix (T-A matrix), so the data clustering task can be conducted on matrix manifold. We incorporate the structure constraints embedded in data into the non-negative matrix factorization method to determine meta-trees from the T-A matrix, and the signature vector of each single tree can then be extracted by meta-tree decomposition. The meta-tree space turns out to be a cone space, in which we explore the distance metric and implement the clustering algorithm based on the concepts like Fréchet mean. Finally, the T-A matrix based clustering (TAMBAC) framework is evaluated and compared using both simulated data and real retinal images to illus trate its efficiency and accuracy. PMID:26660696
High-dimensional cluster analysis with the Masked EM Algorithm
Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.
2014-01-01
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
Gould, S J; Hong, S T; Carney, J R
1998-01-01
The genes for most of the biosynthesis of the kinamycin antibiotics have been cloned and heterologously expressed. Genomic DNA of Streptomyces murayamaensis was partially digested with MboI and a library of approximately 40 kb fragments in E. coli XL1-BlueMR was prepared using the cosmid vector pOJ446. Hybridization with the actI probe from the actinorhodin polyketide synthase genes identified two clusters of polyketide genes. After transferal of these clusters to S. lividans ZX7, expression of one cluster was established by HPLC with photodiode array detection. Peaks were identified from the kin cluster for dehydrorabelomycin, kinobscurinone, and stealthin C, which are known intermediates in kinamycin biosynthesis. Two shunt metabolites, kinafluorenone and seongomycin were also identified. The structure of the latter was determined from a quantity obtained from large-scale fermentation of one of the clones.
USDA-ARS?s Scientific Manuscript database
This study evaluated linear spectral unmixing (LSU), mixture tuned matched filtering (MTMF) and support vector machine (SVM) techniques for detecting and mapping giant reed (Arundo donax L.), an invasive weed that presents a severe threat to agroecosystems and riparian areas throughout the southern ...
Support vector machines classifiers of physical activities in preschoolers
USDA-ARS?s Scientific Manuscript database
The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...
USDA-ARS?s Scientific Manuscript database
This paper presents a novel wrinkle evaluation method that uses modified wavelet coefficients and an optimized support-vector-machine (SVM) classification scheme to characterize and classify wrinkle appearance of fabric. Fabric images were decomposed with the wavelet transform (WT), and five parame...
Support vector machine (SVM) was applied for land-cover characterization using MODIS time-series data. Classification performance was examined with respect to training sample size, sample variability, and landscape homogeneity (purity). The results were compared to two convention...
Boelaert, M; Meheus, F; Sanchez, A; Singh, S P; Vanlerberghe, V; Picado, A; Meessen, B; Sundar, S
2009-06-01
To provide data about wealth distribution in visceral leishmaniasis (VL)-affected communities compared to that of the general population of Bihar State, India. After extensive disease risk mapping, 16 clusters with high VL transmission were selected in Bihar. An exhaustive census of all households in the clusters was conducted and socio-economic household characteristics were documented by questionnaire. Data on the general Bihar population taken from the National Family Health Survey of India were used for comparison. An asset index was developed based on Principal Components Analysis and the distribution of this asset index for the VL communities was compared with that of the general population of Bihar. 83% of households in communities with high VL attack rates belonged to the two lowest quintiles of the Bihar wealth distribution. All socio-economic indicators showed significantly lower wealth for those households. Visceral leishmaniasis clearly affects the poorest of the poor in India. They are most vulnerable, as this vector-born disease is linked to poor housing and unhealthy habitats. The disease leads the affected households to more destitution because of its impact on household income and wealth. Support for the present VL elimination initiative is important in the fight against poverty.
Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology.
Heinson, Ashley I; Gunawardana, Yawwani; Moesker, Bastiaan; Hume, Carmen C Denman; Vataga, Elena; Hall, Yper; Stylianou, Elena; McShane, Helen; Williams, Ann; Niranjan, Mahesan; Woelk, Christopher H
2017-02-01
Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.
ELM: an Algorithm to Estimate the Alpha Abundance from Low-resolution Spectra
NASA Astrophysics Data System (ADS)
Bu, Yude; Zhao, Gang; Pan, Jingchang; Bharat Kumar, Yerra
2016-01-01
We have investigated a novel methodology using the extreme learning machine (ELM) algorithm to determine the α abundance of stars. Applying two methods based on the ELM algorithm—ELM+spectra and ELM+Lick indices—to the stellar spectra from the ELODIE database, we measured the α abundance with a precision better than 0.065 dex. By applying these two methods to the spectra with different signal-to-noise ratios (S/Ns) and different resolutions, we found that ELM+spectra is more robust against degraded resolution and ELM+Lick indices is more robust against variation in S/N. To further validate the performance of ELM, we applied ELM+spectra and ELM+Lick indices to SDSS spectra and estimated α abundances with a precision around 0.10 dex, which is comparable to the results given by the SEGUE Stellar Parameter Pipeline. We further applied ELM to the spectra of stars in Galactic globular clusters (M15, M13, M71) and open clusters (NGC 2420, M67, NGC 6791), and results show good agreement with previous studies (within 1σ). A comparison of the ELM with other widely used methods including support vector machine, Gaussian process regression, artificial neural networks, and linear least-squares regression shows that ELM is efficient with computational resources and more accurate than other methods.
Alagha, Jawad S; Said, Md Azlin Md; Mogheir, Yunes
2014-01-01
Nitrate concentration in groundwater is influenced by complex and interrelated variables, leading to great difficulty during the modeling process. The objectives of this study are (1) to evaluate the performance of two artificial intelligence (AI) techniques, namely artificial neural networks and support vector machine, in modeling groundwater nitrate concentration using scant input data, as well as (2) to assess the effect of data clustering as a pre-modeling technique on the developed models' performance. The AI models were developed using data from 22 municipal wells of the Gaza coastal aquifer in Palestine from 2000 to 2010. Results indicated high simulation performance, with the correlation coefficient and the mean average percentage error of the best model reaching 0.996 and 7 %, respectively. The variables that strongly influenced groundwater nitrate concentration were previous nitrate concentration, groundwater recharge, and on-ground nitrogen load of each land use land cover category in the well's vicinity. The results also demonstrated the merit of performing clustering of input data prior to the application of AI models. With their high performance and simplicity, the developed AI models can be effectively utilized to assess the effects of future management scenarios on groundwater nitrate concentration, leading to more reasonable groundwater resources management and decision-making.
Blind source computer device identification from recorded VoIP calls for forensic investigation.
Jahanirad, Mehdi; Anuar, Nor Badrul; Wahab, Ainuddin Wahid Abdul
2017-03-01
The VoIP services provide fertile ground for criminal activity, thus identifying the transmitting computer devices from recorded VoIP call may help the forensic investigator to reveal useful information. It also proves the authenticity of the call recording submitted to the court as evidence. This paper extended the previous study on the use of recorded VoIP call for blind source computer device identification. Although initial results were promising but theoretical reasoning for this is yet to be found. The study suggested computing entropy of mel-frequency cepstrum coefficients (entropy-MFCC) from near-silent segments as an intrinsic feature set that captures the device response function due to the tolerances in the electronic components of individual computer devices. By applying the supervised learning techniques of naïve Bayesian, linear logistic regression, neural networks and support vector machines to the entropy-MFCC features, state-of-the-art identification accuracy of near 99.9% has been achieved on different sets of computer devices for both call recording and microphone recording scenarios. Furthermore, unsupervised learning techniques, including simple k-means, expectation-maximization and density-based spatial clustering of applications with noise (DBSCAN) provided promising results for call recording dataset by assigning the majority of instances to their correct clusters. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Eva Sau Fan; Department of Health Technology and Informatics, The Hong Kong Polytechnic University; Wu, Vincent Wing Cheung
Long planning time in volumetric-modulated arc stereotactic radiotherapy (VMA-SRT) cases can limit its clinical efficiency and use. A vector model could retrieve previously successful radiotherapy cases that share various common anatomic features with the current case. The prsent study aimed to develop a vector model that could reduce planning time by applying the optimization parameters from those retrieved reference cases. Thirty-six VMA-SRT cases of brain metastasis (gender, male [n = 23], female [n = 13]; age range, 32 to 81 years old) were collected and used as a reference database. Another 10 VMA-SRT cases were planned with both conventional optimization and vector-model-supported optimization, followingmore » the oncologists' clinical dose prescriptions. Planning time and plan quality measures were compared using the 2-sided paired Wilcoxon signed rank test with a significance level of 0.05, with positive false discovery rate (pFDR) of less than 0.05. With vector-model-supported optimization, there was a significant reduction in the median planning time, a 40% reduction from 3.7 to 2.2 hours (p = 0.002, pFDR = 0.032), and for the number of iterations, a 30% reduction from 8.5 to 6.0 (p = 0.006, pFDR = 0.047). The quality of plans from both approaches was comparable. From these preliminary results, vector-model-supported optimization can expedite the optimization of VMA-SRT for brain metastasis while maintaining plan quality.« less
Functional analysis of the upstream regulatory region of chicken miR-17-92 cluster.
Cheng, Min; Zhang, Wen-jian; Xing, Tian-yu; Yan, Xiao-hong; Li, Yu-mao; Li, Hui; Wang, Ning
2016-08-01
miR-17-92 cluster plays important roles in cell proliferation, differentiation, apoptosis, animal development and tumorigenesis. The transcriptional regulation of miR-17-92 cluster has been extensively studied in mammals, but not in birds. To date, avian miR-17-92 cluster genomic structure has not been fully determined. The promoter location and sequence of miR-17-92 cluster have not been determined, due to the existence of a genomic gap sequence upstream of miR-17-92 cluster in all the birds whose genomes have been sequenced. In this study, genome walking was used to close the genomic gap upstream of chicken miR-17-92 cluster. In addition, bioinformatics analysis, reporter gene assay and truncation mutagenesis were used to investigate functional role of the genomic gap sequence. Genome walking analysis showed that the gap region was 1704 bp long, and its GC content was 80.11%. Bioinformatics analysis showed that in the gap region, there was a 200 bp conserved sequence among the tested 10 species (Gallus gallus, Homo sapiens, Pan troglodytes, Bos taurus, Sus scrofa, Rattus norvegicus, Mus musculus, Possum, Danio rerio, Rana nigromaculata), which is core promoter region of mammalian miR-17-92 host gene (MIR17HG). Promoter luciferase reporter gene vector of the gap region was constructed and reporter assay was performed. The result showed that the promoter activity of pGL3-cMIR17HG (-4228/-2506) was 417 times than that of negative control (empty pGL3 basic vector), suggesting that chicken miR-17-92 cluster promoter exists in the gap region. To further gain insight into the promoter structure, two different truncations for the cloned gap sequence were generated by PCR. One had a truncation of 448 bp at the 5'-end and the other had a truncation of 894 bp at the 3'-end. Further reporter analysis showed that compared with the promoter activity of pGL3-cMIR17HG (-4228/-2506), the reporter activities of the 5'-end truncation and the 3'-end truncation were reduced by 19.82% and 60.14%, respectively. These data demonstrated that the important promoter region of chicken miR-17-92 cluster is located in the -3400/-2506 bp region. Our results lay the foundation for revealing the transcriptional regulatory mechanisms of chicken miR-17-92 cluster.
Data Mining Technologies Inspired from Visual Principle
NASA Astrophysics Data System (ADS)
Xu, Zongben
In this talk we review the recent work done by our group on data mining (DM) technologies deduced from simulating visual principle. Through viewing a DM problem as a cognition problems and treading a data set as an image with each light point located at a datum position, we developed a series of high efficient algorithms for clustering, classification and regression via mimicking visual principles. In pattern recognition, human eyes seem to possess a singular aptitude to group objects and find important structure in an efficient way. Thus, a DM algorithm simulating visual system may solve some basic problems in DM research. From this point of view, we proposed a new approach for data clustering by modeling the blurring effect of lateral retinal interconnections based on scale space theory. In this approach, as the data image blurs, smaller light blobs merge into large ones until the whole image becomes one light blob at a low enough level of resolution. By identifying each blob with a cluster, the blurring process then generates a family of clustering along the hierarchy. The proposed approach provides unique solutions to many long standing problems, such as the cluster validity and the sensitivity to initialization problems, in clustering. We extended such an approach to classification and regression problems, through combatively employing the Weber's law in physiology and the cell response classification facts. The resultant classification and regression algorithms are proven to be very efficient and solve the problems of model selection and applicability to huge size of data set in DM technologies. We finally applied the similar idea to the difficult parameter setting problem in support vector machine (SVM). Viewing the parameter setting problem as a recognition problem of choosing a visual scale at which the global and local structures of a data set can be preserved, and the difference between the two structures be maximized in the feature space, we derived a direct parameter setting formula for the Gaussian SVM. The simulations and applications show that the suggested formula significantly outperforms the known model selection methods in terms of efficiency and precision.
Schwach, Frank; Bushell, Ellen; Gomes, Ana Rita; Anar, Burcu; Girling, Gareth; Herd, Colin; Rayner, Julian C; Billker, Oliver
2015-01-01
The Plasmodium Genetic Modification (PlasmoGEM) database (http://plasmogem.sanger.ac.uk) provides access to a resource of modular, versatile and adaptable vectors for genome modification of Plasmodium spp. parasites. PlasmoGEM currently consists of >2000 plasmids designed to modify the genome of Plasmodium berghei, a malaria parasite of rodents, which can be requested by non-profit research organisations free of charge. PlasmoGEM vectors are designed with long homology arms for efficient genome integration and carry gene specific barcodes to identify individual mutants. They can be used for a wide array of applications, including protein localisation, gene interaction studies and high-throughput genetic screens. The vector production pipeline is supported by a custom software suite that automates both the vector design process and quality control by full-length sequencing of the finished vectors. The PlasmoGEM web interface allows users to search a database of finished knock-out and gene tagging vectors, view details of their designs, download vector sequence in different formats and view available quality control data as well as suggested genotyping strategies. We also make gDNA library clones and intermediate vectors available for researchers to produce vectors for themselves. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
WIYN OPEN CLUSTER STUDY. LV. ASTROMETRY AND MEMBERSHIP IN NGC 6819
DOE Office of Scientific and Technical Information (OSTI.GOV)
Platais, Imants; Gosnell, Natalie M.; Meibom, Soren
2013-08-01
We present proper motions and astrometric membership analysis for 15,750 stars around the intermediate-age open cluster NGC 6819. The accuracy of relative proper motions for well-measured stars ranges from {approx}0.2 mas yr{sup -1} within 10' of the cluster center to 1.1 mas yr{sup -1} outside this radius. In the proper motion vector-point diagram, the separation between the cluster members and field stars is convincing down to V {approx} 18 and within 10' from the cluster center. The formal sum of membership probabilities indicates a total of {approx}2500 cluster members down to V {approx} 22. We confirm the cluster membership ofmore » several variable stars, including some eclipsing binaries. The estimated absolute proper motion of NGC 6819 is {mu}{sub x}{sup abs}=-2.6{+-}0.5 and {mu}{sub y}{sup abs}=-4.2{+-}0.5 mas yr{sup -1}. A cross-identification between the proper motion catalog and a list of X-ray sources in the field of NGC 6819 resulted in a number of new likely optical counterparts, including a candidate CV. For the first time we show that there is significant differential reddening toward NGC 6819.« less
Zhao, Yan-Hui; Ren, Zong-Xin; Lázaro, Amparo; Wang, Hong; Bernhardt, Peter; Li, Hai-Dong; Li, De-Zhu
2016-05-24
How floral traits and community composition influence plant specialization is poorly understood and the existing evidence is restricted to regions where plant diversity is low. Here, we assessed whether plant specialization varied among four species-rich subalpine/alpine communities on the Yulong Mountain, SW China (elevation from 2725 to 3910 m). We analyzed two factors (floral traits and pollen vector community composition: richness and density) to determine the degree of plant specialization across 101 plant species in all four communities. Floral visitors were collected and pollen load analyses were conducted to identify and define pollen vectors. Plant specialization of each species was described by using both pollen vector diversity (Shannon's diversity index) and plant selectiveness (d' index), which reflected how selective a given species was relative to available pollen vectors. Pollen vector diversity tended to be higher in communities at lower elevations, while plant selectiveness was significantly lower in a community with the highest proportion of unspecialized flowers (open flowers and clusters of flowers in open inflorescences). In particular, we found that plant species with large and unspecialized flowers attracted a greater diversity of pollen vectors and showed higher selectiveness in their use of pollen vectors. Plant species with large floral displays and high flower abundance were more selective in their exploitation of pollen vectors. Moreover, there was a negative relationship between plant selectiveness and pollen vector density. These findings suggest that flower shape and flower size can increase pollen vector diversity but they also increased plant selectiveness. This indicated that those floral traits that were more attractive to insects increased the diversity of pollen vectors to plants while decreasing overlap among co-blooming plant species for the same pollen vectors. Furthermore, floral traits had a more important impact on the diversity of pollen vectors than the composition of anthophilous insect communities. Plant selectiveness of pollen vectors was strongly influenced by both floral traits and insect community composition. These findings provide a basis for a better understanding of how floral traits and community context shape interactions between flowers and their pollen vectors in species-rich communities.
Jia, Fan; Gampala, Srinivas S.L.; Mittal, Amandeep; Luo, Qingjun; Rock, Christopher D.
2009-01-01
The 14,200 available full length Arabidopsis thaliana cDNAs in the Universal Plasmid System (UPS) donor vector pUNI51 should be applied broadly and efficiently to leverage a “functional map-space” of homologous plant genes. We have engineered Cre-lox UPS host acceptor vectors (pCR701- 705) with N-terminal epitope tags in frame with the loxH site and downstream from the maize Ubiquitin promoter for use in transient protoplast expression assays and particle bombardment transformation of monocots. As an example of the utility of these vectors, we recombined them with several Arabidopsis cDNAs encoding Ser/Thr protein phosphatase type 2C (PP2Cs) known from genetic studies or predicted by hierarchical clustering meta-analysis to be involved in ABA and stress responses. Our functional results in Zea mays mesophyll protoplasts on ABA-inducible expression effects on the Late Embryogenesis Abundant promoter ProEm:GUS reporter were consistent with predictions and resulted in identification of novel activities of some PP2Cs. Deployment of these vectors can facilitate functional genomics and proteomics and identification of novel gene activities. PMID:19499346
Influence of the Lower Jaw Position on the Running Pattern.
Maurer, Christian; Stief, Felix; Jonas, Alexander; Kovac, Andrej; Groneberg, David Alexander; Meurer, Andrea; Ohlendorf, Daniela
2015-01-01
The effects of manipulated dental occlusion on body posture has been investigated quite often and discussed controversially in the literature. Far less attention has been paid to the influence of dental occlusion position on human movement. If human movement was analysed, it was mostly while walking and not while running. This study was therefore designed to identify the effect of lower jaw positions on running behaviour according to different dental occlusion positions. Twenty healthy young recreational runners (mean age = 33.9±5.8 years) participated in this study. Kinematic data were collected using an eight-camera Vicon motion capture system (VICON Motion Systems, Oxford, UK). Subjects were consecutively prepared with four different dental occlusion conditions in random order and performed five running trials per test condition on a level walkway with their preferred running shoes. Vector based pattern recognition methods, in particular cluster analysis and support vector machines (SVM) were used for movement pattern identification. Subjects exhibited unique movement patterns leading to 18 clusters for the 20 subjects. No overall classification of the splint condition could be observed. Within individual subjects different running patterns could be identified for the four splint conditions. The splint conditions lead to a more symmetrical running pattern than the control condition. The influence of an occlusal splint on running pattern can be confirmed in this study. Wearing a splint increases the symmetry of the running pattern. A more symmetrical running pattern might help to reduce the risk of injuries or help in performance. The change of the movement pattern between the neutral condition and any of the three splint conditions was significant within subjects but not across subjects. Therefore the dental splint has a measureable influence on the running pattern of subjects, however subjects individuality has to be considered when choosing the optimal splint condition for a specific subject.
Typologies of Social Support and Associations with Mental Health Outcomes Among LGBT Youth.
McConnell, Elizabeth A; Birkett, Michelle A; Mustanski, Brian
2015-03-01
Lesbian, gay, bisexual, and transgender (LGBT) youth show increased risk for a number of negative mental health outcomes, which research has linked to minority stressors such as victimization. Further, social support promotes positive mental health outcomes for LGBT youth, and different sources of social support show differential relationships with mental health outcomes. However, little is known about how combinations of different sources of support impact mental health. In the present study, we identify clusters of family, peer, and significant other social support and then examine demographic and mental health differences by cluster in an analytic sample of 232 LGBT youth between the ages of 16 and 20 years. Using k-means cluster analysis, three social support cluster types were identified: high support (44.0% of participants), low support (21.6%), and non-family support (34.5%). A series of chi-square tests were used to examine demographic differences between these clusters, which were found for socio-economic status (SES). Regression analyses indicated that, while controlling for victimization, individuals within the three clusters showed different relationships with multiple mental health outcomes: loneliness, hopelessness, depression, anxiety, somatization, general symptom severity, and symptoms of major depressive disorder (MDD). Findings suggest the combinations of sources of support LGBT youth receive are related to their mental health. Higher SES youth are more likely to receive support from family, peers, and significant others. For most mental health outcomes, family support appears to be an especially relevant and important source of support to target for LGBT youth.
1-norm support vector novelty detection and its sparseness.
Zhang, Li; Zhou, WeiDa
2013-12-01
This paper proposes a 1-norm support vector novelty detection (SVND) method and discusses its sparseness. 1-norm SVND is formulated as a linear programming problem and uses two techniques for inducing sparseness, or the 1-norm regularization and the hinge loss function. We also find two upper bounds on the sparseness of 1-norm SVND, or exact support vector (ESV) and kernel Gram matrix rank bounds. The ESV bound indicates that 1-norm SVND has a sparser representation model than SVND. The kernel Gram matrix rank bound can loosely estimate the sparseness of 1-norm SVND. Experimental results show that 1-norm SVND is feasible and effective. Copyright © 2013 Elsevier Ltd. All rights reserved.
ℓ(p)-Norm multikernel learning approach for stock market price forecasting.
Shao, Xigao; Wu, Kun; Liao, Bifeng
2012-01-01
Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ(1)-norm multiple support vector regression model.
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics
HUANG, SHUJUN; CAI, NIANGUANG; PACHECO, PEDRO PENZUTI; NARANDES, SHAVIRA; WANG, YANG; XU, WAYNE
2017-01-01
Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. PMID:29275361
Catalysis applications of size-selected cluster deposition
Vajda, Stefan; White, Michael G.
2015-10-23
In this Perspective, we review recent studies of size-selected cluster deposition for catalysis applications performed at the U.S. DOE National Laboratories, with emphasis on work at Argonne National Laboratory (ANL) and Brookhaven National Laboratory (BNL). The focus is on the preparation of model supported catalysts in which the number of atoms in the deposited clusters is precisely controlled using a combination of gas-phase cluster ion sources, mass spectrometry, and soft-landing techniques. This approach is particularly effective for investigations of small nanoclusters, 0.5-2 nm (<200 atoms), where the rapid evolution of the atomic and electronic structure makes it essential to havemore » precise control over cluster size. Cluster deposition allows for independent control of cluster size, coverage, and stoichiometry (e.g., the metal-to-oxygen ratio in an oxide cluster) and can be used to deposit on any substrate without constraints of nucleation and growth. Examples are presented for metal, metal oxide, and metal sulfide cluster deposition on a variety of supports (metals, oxides, carbon/diamond) where the reactivity, cluster-support electronic interactions, and cluster stability and morphology are investigated. Both UHV and in situ/operando studies are presented that also make use of surface-sensitive X-ray characterization tools from synchrotron radiation facilities. Novel applications of cluster deposition to electrochemistry and batteries are also presented. This review also highlights the application of modern ab initio electronic structure calculations (density functional theory), which can essentially model the exact experimental system used in the laboratory (i.e., cluster and support) to provide insight on atomic and electronic structure, reaction energetics, and mechanisms. As amply demonstrated in this review, the powerful combination of atomically precise cluster deposition and theory is able to address fundamental aspects of size-effects, cluster-support interactions, and reaction mechanisms of cluster materials that are central to how catalysts function. Lastly, the insight gained from such studies can be used to further the development of novel nanostructured catalysts with high activity and selectivity.« less
A transversal approach to predict gene product networks from ontology-based similarity
Chabalier, Julie; Mosser, Jean; Burgun, Anita
2007-01-01
Background Interpretation of transcriptomic data is usually made through a "standard" approach which consists in clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to underline functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and data expression. Results The transversal approach presented in this paper consists in computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity. Conclusion Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity, and data expression. PMID:17605807
Salim, Shelly; Moh, Sangman; Choi, Dongmin; Chung, Ilyong
2014-08-11
A cognitive radio sensor network (CRSN) is a wireless sensor network whose sensor nodes are equipped with cognitive radio capability. Clustering is one of the most challenging issues in CRSNs, as all sensor nodes, including the cluster head, have to use the same frequency band in order to form a cluster. However, due to the nature of heterogeneous channels in cognitive radio, it is difficult for sensor nodes to find a cluster head. This paper proposes a novel energy-efficient and compact clustering scheme named clustering with temporary support nodes (CENTRE). CENTRE efficiently achieves a compact cluster formation by adopting two-phase cluster formation with fixed duration. By introducing a novel concept of temporary support nodes to improve the cluster formation, the proposed scheme enables sensor nodes in a network to find a cluster head efficiently. The performance study shows that not only is the clustering process efficient and compact but it also results in remarkable energy savings that prolong the overall network lifetime. In addition, the proposed scheme decreases both the clustering overhead and the average distance between cluster heads and their members.
Salim, Shelly; Moh, Sangman; Choi, Dongmin; Chung, Ilyong
2014-01-01
A cognitive radio sensor network (CRSN) is a wireless sensor network whose sensor nodes are equipped with cognitive radio capability. Clustering is one of the most challenging issues in CRSNs, as all sensor nodes, including the cluster head, have to use the same frequency band in order to form a cluster. However, due to the nature of heterogeneous channels in cognitive radio, it is difficult for sensor nodes to find a cluster head. This paper proposes a novel energy-efficient and compact clustering scheme named clustering with temporary support nodes (CENTRE). CENTRE efficiently achieves a compact cluster formation by adopting two-phase cluster formation with fixed duration. By introducing a novel concept of temporary support nodes to improve the cluster formation, the proposed scheme enables sensor nodes in a network to find a cluster head efficiently. The performance study shows that not only is the clustering process efficient and compact but it also results in remarkable energy savings that prolong the overall network lifetime. In addition, the proposed scheme decreases both the clustering overhead and the average distance between cluster heads and their members. PMID:25116905
Adaptive Hybrid Picture Coding.
1986-11-30
the cluster , where 1 6 ct ;- z Ay 37 6 i=1 30 where P dk’ X. - . - X. c I c I t ’ k 38 cyi : ~i x1 38 with c the index over the cluster obtained from...by Principal Invesigator AIR PORCE C OF SCIENTIFIC ktSEARCH (AFSC) NOTICE OF TRANSMITTAL TO DrIC Udhnetrt as he- reviwed and is ’*d or tis" 1AWAfR190...compared. 2 The basic element of shape space is the shape vector k.Z, where j indicates the jth set of measurements from the kth shape. If there are K
Support vector machine applied to predict the zoonotic potential of E. coli O157 cattle isolates
USDA-ARS?s Scientific Manuscript database
Methods based on sequence data analysis facilitate the tracking of disease outbreaks, allow relationships between strains to be reconstructed and virulence factors to be identified. However, these methods are used postfactum after an outbreak has happened. Here, we show that support vector machine a...
Support vector machine incremental learning triggered by wrongly predicted samples
NASA Astrophysics Data System (ADS)
Tang, Ting-long; Guan, Qiu; Wu, Yi-rong
2018-05-01
According to the classic Karush-Kuhn-Tucker (KKT) theorem, at every step of incremental support vector machine (SVM) learning, the newly adding sample which violates the KKT conditions will be a new support vector (SV) and migrate the old samples between SV set and non-support vector (NSV) set, and at the same time the learning model should be updated based on the SVs. However, it is not exactly clear at this moment that which of the old samples would change between SVs and NSVs. Additionally, the learning model will be unnecessarily updated, which will not greatly increase its accuracy but decrease the training speed. Therefore, how to choose the new SVs from old sets during the incremental stages and when to process incremental steps will greatly influence the accuracy and efficiency of incremental SVM learning. In this work, a new algorithm is proposed to select candidate SVs and use the wrongly predicted sample to trigger the incremental processing simultaneously. Experimental results show that the proposed algorithm can achieve good performance with high efficiency, high speed and good accuracy.
Prediction of Spirometric Forced Expiratory Volume (FEV1) Data Using Support Vector Regression
NASA Astrophysics Data System (ADS)
Kavitha, A.; Sujatha, C. M.; Ramakrishnan, S.
2010-01-01
In this work, prediction of forced expiratory volume in 1 second (FEV1) in pulmonary function test is carried out using the spirometer and support vector regression analysis. Pulmonary function data are measured with flow volume spirometer from volunteers (N=175) using a standard data acquisition protocol. The acquired data are then used to predict FEV1. Support vector machines with polynomial kernel function with four different orders were employed to predict the values of FEV1. The performance is evaluated by computing the average prediction accuracy for normal and abnormal cases. Results show that support vector machines are capable of predicting FEV1 in both normal and abnormal cases and the average prediction accuracy for normal subjects was higher than that of abnormal subjects. Accuracy in prediction was found to be high for a regularization constant of C=10. Since FEV1 is the most significant parameter in the analysis of spirometric data, it appears that this method of assessment is useful in diagnosing the pulmonary abnormalities with incomplete data and data with poor recording.
Right hemisphere grey matter structure and language outcomes in chronic left hemisphere stroke
Xing, Shihui; Lacey, Elizabeth H.; Skipper-Kallal, Laura M.; Jiang, Xiong; Harris-Love, Michelle L.; Zeng, Jinsheng
2016-01-01
The neural mechanisms underlying recovery of language after left hemisphere stroke remain elusive. Although older evidence suggested that right hemisphere language homologues compensate for damage in left hemisphere language areas, the current prevailing theory suggests that right hemisphere engagement is ineffective or even maladaptive. Using a novel combination of support vector regression-based lesion-symptom mapping and voxel-based morphometry, we aimed to determine whether local grey matter volume in the right hemisphere independently contributes to aphasia outcomes after chronic left hemisphere stroke. Thirty-two left hemisphere stroke survivors with aphasia underwent language assessment with the Western Aphasia Battery-Revised and tests of other cognitive domains. High-resolution T1-weighted images were obtained in aphasia patients and 30 demographically matched healthy controls. Support vector regression-based multivariate lesion-symptom mapping was used to identify critical language areas in the left hemisphere and then to quantify each stroke survivor’s lesion burden in these areas. After controlling for these direct effects of the stroke on language, voxel-based morphometry was then used to determine whether local grey matter volumes in the right hemisphere explained additional variance in language outcomes. In brain areas in which grey matter volumes related to language outcomes, we then compared grey matter volumes in patients and healthy controls to assess post-stroke plasticity. Lesion–symptom mapping showed that specific left hemisphere regions related to different language abilities. After controlling for lesion burden in these areas, lesion size, and demographic factors, grey matter volumes in parts of the right temporoparietal cortex positively related to spontaneous speech, naming, and repetition scores. Examining whether domain general cognitive functions might explain these relationships, partial correlations demonstrated that grey matter volumes in these clusters related to verbal working memory capacity, but not other cognitive functions. Further, grey matter volumes in these areas were greater in stroke survivors than healthy control subjects. To confirm this result, 10 chronic left hemisphere stroke survivors with no history of aphasia were identified. Grey matter volumes in right temporoparietal clusters were greater in stroke survivors with aphasia compared to those without history of aphasia. These findings suggest that the grey matter structure of right hemisphere posterior dorsal stream language homologues independently contributes to language production abilities in chronic left hemisphere stroke, and that these areas may undergo hypertrophy after a stroke causing aphasia. PMID:26521078
Large inserts for big data: artificial chromosomes in the genomic era.
Tocchetti, Arianna; Donadio, Stefano; Sosio, Margherita
2018-05-01
The exponential increase in available microbial genome sequences coupled with predictive bioinformatic tools is underscoring the genetic capacity of bacteria to produce an unexpected large number of specialized bioactive compounds. Since most of the biosynthetic gene clusters (BGCs) present in microbial genomes are cryptic, i.e. not expressed under laboratory conditions, a variety of cloning systems and vectors have been devised to harbor DNA fragments large enough to carry entire BGCs and to allow their transfer in suitable heterologous hosts. This minireview provides an overview of the vectors and approaches that have been developed for cloning large BGCs, and successful examples of heterologous expression.
Quantum Support Vector Machine for Big Data Classification
NASA Astrophysics Data System (ADS)
Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth
2014-09-01
Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.
Ranked centroid projection: a data visualization approach with self-organizing maps.
Yen, G G; Wu, Z
2008-02-01
The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text mining. The proposed approach first transforms the document space into a multidimensional vector space by means of document encoding. Afterwards, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. Following the GHSOM training, the new projection method, namely the ranked centroid projection (RCP), is applied to project the input vectors to a hierarchy of 2-D output maps. The RCP is used as a data analysis tool as well as a direct interface to the data. In a set of simulations, the proposed approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability.
DOE Office of Scientific and Technical Information (OSTI.GOV)
García-Sánchez, Tania; Gómez-Lázaro, Emilio; Muljadi, E.
An alternative approach to characterise real voltage dips is proposed and evaluated in this study. The proposed methodology is based on voltage-space vector solutions, identifying parameters for ellipses trajectories by using the least-squares algorithm applied on a sliding window along the disturbance. The most likely patterns are then estimated through a clustering process based on the k-means algorithm. The objective is to offer an efficient and easily implemented alternative to characterise faults and visualise the most likely instantaneous phase-voltage evolution during events through their corresponding voltage-space vector trajectories. This novel solution minimises the data to be stored but maintains extensivemore » information about the dips including starting and ending transients. The proposed methodology has been applied satisfactorily to real voltage dips obtained from intensive field-measurement campaigns carried out in a Spanish wind power plant up to a time period of several years. A comparison to traditional minimum root mean square-voltage and time-duration classifications is also included in this study.« less
SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.
Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue
2018-05-02
Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely, clustering and classification, we compared SSAW against the the-state-of-the-art alignment free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These make SSAW a suitable method for sequence analysis, especially, given the rapidly increasing volumes of sequence data required by most modern applications.
Predicting complications of percutaneous coronary intervention using a novel support vector method.
Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan
2013-01-01
To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer-Lemeshow χ(2) value (seven cases) and the mean cross-entropy error (eight cases). The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains.
Predicting complications of percutaneous coronary intervention using a novel support vector method
Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan
2013-01-01
Objective To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Materials and methods Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. Results The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer–Lemeshow χ2 value (seven cases) and the mean cross-entropy error (eight cases). Conclusions The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains. PMID:23599229
Personalized identification of differentially expressed pathways in pediatric sepsis.
Li, Binjie; Zeng, Qiyi
2017-10-01
Sepsis is a leading killer of children worldwide with numerous differentially expressed genes reported to be associated with sepsis. Identifying core pathways in an individual is important for understanding septic mechanisms and for the future application of custom therapeutic decisions. Samples used in the study were from a control group (n=18) and pediatric sepsis group (n=52). Based on Kauffman's attractor theory, differentially expressed pathways associated with pediatric sepsis were detected as attractors. When the distribution results of attractors are consistent with the distribution of total data assessed using support vector machine, the individualized pathway aberrance score (iPAS) was calculated to distinguish differences. Through attractor and Kyoto Encyclopedia of Genes and Genomes functional analysis, 277 enriched pathways were identified as attractors. There were 81 pathways with P<0.05 and 59 pathways with P<0.01. Distribution outcomes of screened attractors were mostly consistent with the total data demonstrated by the six classifying parameters, which suggested the efficiency of attractors. Cluster analysis of pediatric sepsis using the iPAS method identified seven pathway clusters and four sample clusters. Thus, in the majority pediatric sepsis samples, core pathways can be detected as different from accumulated normal samples. In conclusion, a novel procedure that identified the dysregulated attractors in individuals with pediatric sepsis was constructed. Attractors can be markers to identify pathways involved in pediatric sepsis. iPAS may provide a correlation score for each of the signaling pathways present in an individual patient. This process may improve the personalized interpretation of disease mechanisms and may be useful in the forthcoming era of personalized medicine.
Gene Discovery in Bladder Cancer Progression using cDNA Microarrays
Sanchez-Carbayo, Marta; Socci, Nicholas D.; Lozano, Juan Jose; Li, Wentian; Charytonowicz, Elizabeth; Belbin, Thomas J.; Prystowsky, Michael B.; Ortiz, Angel R.; Childs, Geoffrey; Cordon-Cardo, Carlos
2003-01-01
To identify gene expression changes along progression of bladder cancer, we compared the expression profiles of early-stage and advanced bladder tumors using cDNA microarrays containing 17,842 known genes and expressed sequence tags. The application of bootstrapping techniques to hierarchical clustering segregated early-stage and invasive transitional carcinomas into two main clusters. Multidimensional analysis confirmed these clusters and more importantly, it separated carcinoma in situ from papillary superficial lesions and subgroups within early-stage and invasive tumors displaying different overall survival. Additionally, it recognized early-stage tumors showing gene profiles similar to invasive disease. Different techniques including standard t-test, single-gene logistic regression, and support vector machine algorithms were applied to identify relevant genes involved in bladder cancer progression. Cytokeratin 20, neuropilin-2, p21, and p33ING1 were selected among the top ranked molecular targets differentially expressed and validated by immunohistochemistry using tissue microarrays (n = 173). Their expression patterns were significantly associated with pathological stage, tumor grade, and altered retinoblastoma (RB) expression. Moreover, p33ING1 expression levels were significantly associated with overall survival. Analysis of the annotation of the most significant genes revealed the relevance of critical genes and pathways during bladder cancer progression, including the overexpression of oncogenic genes such as DEK in superficial tumors or immune response genes such as Cd86 antigen in invasive disease. Gene profiling successfully classified bladder tumors based on their progression and clinical outcome. The present study has identified molecular biomarkers of potential clinical significance and critical molecular targets associated with bladder cancer progression. PMID:12875971
Banerjee, Arindam; Ghosh, Joydeep
2004-05-01
Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produced high-quality and well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all the three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms-Balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
Brain tumor classification using the diffusion tensor image segmentation (D-SEG) technique.
Jones, Timothy L; Byrnes, Tiernan J; Yang, Guang; Howe, Franklyn A; Bell, B Anthony; Barrick, Thomas R
2015-03-01
There is an increasing demand for noninvasive brain tumor biomarkers to guide surgery and subsequent oncotherapy. We present a novel whole-brain diffusion tensor imaging (DTI) segmentation (D-SEG) to delineate tumor volumes of interest (VOIs) for subsequent classification of tumor type. D-SEG uses isotropic (p) and anisotropic (q) components of the diffusion tensor to segment regions with similar diffusion characteristics. DTI scans were acquired from 95 patients with low- and high-grade glioma, metastases, and meningioma and from 29 healthy subjects. D-SEG uses k-means clustering of the 2D (p,q) space to generate segments with different isotropic and anisotropic diffusion characteristics. Our results are visualized using a novel RGB color scheme incorporating p, q and T2-weighted information within each segment. The volumetric contribution of each segment to gray matter, white matter, and cerebrospinal fluid spaces was used to generate healthy tissue D-SEG spectra. Tumor VOIs were extracted using a semiautomated flood-filling technique and D-SEG spectra were computed within the VOI. Classification of tumor type using D-SEG spectra was performed using support vector machines. D-SEG was computationally fast and stable and delineated regions of healthy tissue from tumor and edema. D-SEG spectra were consistent for each tumor type, with constituent diffusion characteristics potentially reflecting regional differences in tissue microstructure. Support vector machines classified tumor type with an overall accuracy of 94.7%, providing better classification than previously reported. D-SEG presents a user-friendly, semiautomated biomarker that may provide a valuable adjunct in noninvasive brain tumor diagnosis and treatment planning. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Neuro-Oncology.
Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim
2015-07-30
Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique and a wide range of time and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88 respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92 respectively. The DSVM framework outperforms classification with more standard multi-class "one-against-all" SVM and linear-discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
Enhancement of Plant Metabolite Fingerprinting by Machine Learning1[W
Scott, Ian M.; Vermeer, Cornelia P.; Liakata, Maria; Corol, Delia I.; Ward, Jane L.; Lin, Wanchang; Johnson, Helen E.; Whitehead, Lynne; Kular, Baldeep; Baker, John M.; Walsh, Sean; Dave, Anuja; Larson, Tony R.; Graham, Ian A.; Wang, Trevor L.; King, Ross D.; Draper, John; Beale, Michael H.
2010-01-01
Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by 1H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, 1H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted. PMID:20566707
Nagaoka, Shuhei; Matsumoto, Takeshi; Okada, Eiji; Mitsui, Masaaki; Nakajima, Atsushi
2006-08-17
The adsorption state and thermal stability of V(benzene)2 sandwich clusters soft-landed onto a self-assembled monolayer of different chain-length n-alkanethiols (Cn-SAM, n = 8, 12, 16, 18, and 22) were studied by means of infrared reflection absorption spectroscopy (IRAS) and temperature-programmed desorption (TPD). The IRAS measurement confirmed that V(benzene)2 clusters are molecularly adsorbed and maintain a sandwich structure on all of the SAM substrates. In addition, the clusters supported on the SAM substrates are oriented with their molecular axes tilted 70-80 degrees off the surface normal. An Arrhenius analysis of the TPD spectra reveals that the activation energy for the desorption of the supported clusters increases linearly with the chain length of the SAMs. For the longest chain C22-SAM, the activation energy reaches approximately 150 kJ/mol, and the thermal desorption of the supported clusters can be considerably suppressed near room temperature. The clear chain-length-dependent thermal stability of the supported clusters observed here can be explained well in terms of the cluster penetration into the SAM matrixes.
A support vector machine approach for classification of welding defects from ultrasonic signals
NASA Astrophysics Data System (ADS)
Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming
2014-07-01
Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.
Ben Salem, Samira; Bacha, Khmais; Chaari, Abdelkader
2012-09-01
In this work we suggest an original fault signature based on an improved combination of Hilbert and Park transforms. Starting from this combination we can create two fault signatures: Hilbert modulus current space vector (HMCSV) and Hilbert phase current space vector (HPCSV). These two fault signatures are subsequently analysed using the classical fast Fourier transform (FFT). The effects of mechanical faults on the HMCSV and HPCSV spectrums are described, and the related frequencies are determined. The magnitudes of spectral components, relative to the studied faults (air-gap eccentricity and outer raceway ball bearing defect), are extracted in order to develop the input vector necessary for learning and testing the support vector machine with an aim of classifying automatically the various states of the induction motor. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
Clustered Integrin Ligands as a Novel Approach for the Targeting of Non-Viral Vectors
NASA Astrophysics Data System (ADS)
Ng, Quinn Kwan Tai
Gene transfer or gene delivery is described as the process in which foreign DNA is introduced into cells. Over the years, gene delivery has gained the attention of many researchers and has been developed as powerful tools for use in biotechnology and medicine. With the completion of the Human Genome Project, such advances in technology allowed for the identification of diseases ranging from hereditary disorders to acquired ones (cancer) which were thought to be incurable. Gene therapy provides the means necessary to treat or eliminate genetic diseases from its origin, unlike traditional medicine which only treat symptoms. With ongoing clinical trials for gene therapy increasing, the greatest difficulty still lies in developing safe systems which can target cells of interest to provide efficient delivery. Nature, over millions of years of evolution, has provided an example of one of the most efficient delivery systems: viruses. Although the use of viruses for gene delivery has been well studied, the safety issues involving immunogenicity, insertional mutagenesis, high cost, and poor reproducibility has provided problems for their clinical application. From understanding viruses, we gain insight to designing new systems for non-viral gene delivery. One of these techniques utilized by adenoviruses is the clustering of ligands on its surface through the use of a protein called a penton base. Through the use of nanotechnology we can mimic this basic concept in non-viral gene delivery systems. This dissertation research is focused on developing and applying a novel system for displaying the integrin binding ligand (RGD) in a constrained manner to form a clustered integrin ligand binding platform to be used to enhance the targeting and efficiency of non-viral gene delivery vectors. Peptide mixed monolayer protected gold nanoparticles provides a suitable surface for ligand clustering. A relationship between the peptide ratios in the reaction solution used to form these ligand clusters compared to the reacted amounts on the surface of the particle was studied. This provided us the ability to control the size of the clusters formed and the spacing between the integrins for gold nanoparticles of various sizes. We then applied the clustered ligand binding system for targeting of DNA/PEI polyplexes and demonstrated that the use of RGD nanoclusters enhances gene transfer up to 35-fold which was dependent on the density of alphavbeta3 integrins on the cell surface. Cell integrin sensitivity was shown in which cells with higher alpha vbeta3 densities resulting in higher luciferase transgene expression. The targeting of RGD nanoclusters for DNA/PEI polyplexes was further shown in vivo using PET/CT technology which displayed improved targeting towards high level alphavbeta3 integrin expression (U87MG) tumors over medium level alphavbeta 3 integrin expression (HeLa). In addition to studying the clustered integrin binding system, the current non-viral vectors used suffer from stability and toxicity issues in vitro and in vivo. We have applied a new chemistry for synthesizing nanogels utilizing a Traut's reagent initiated Michael addition reaction for modification of diamine containing crosslikers which will allow for the development of stable and cell demanded release of oligonucleotides. We have shown bulk gels made were capable of encapsulating and holding DNA within the gel and were able to synthesize them into nanogels. The combined research shown here using clustered integrin ligands and a new type of nanogel synthesis provides an ideal system for gene delivery in the future.
Supporting Dynamic Quantization for High-Dimensional Data Analytics.
Guzun, Gheorghi; Canahuate, Guadalupe
2017-05-01
Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions only consider dimensions close to query to compute the distance/similarity. Furthermore, in order to enable interactive explorations of high-dimensional data, indexing support for ad-hoc queries is needed. In this work we set up to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big-data. We also propose a novel dynamic quantization called Query dependent Equi-Depth (QED) quantization and show its effectiveness on characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions. Gheorghi Guzun and Guadalupe Canahuate. 2017. Supporting Dynamic Quantization for High-Dimensional Data Analytics. In Proceedings of Ex-ploreDB'17, Chicago, IL, USA, May 14-19, 2017, 6 pages. https://doi.org/http://dx.doi.org/10.1145/3077331.3077336.
Brown, Zachary S; Kramer, Randall A; Ocan, David; Oryema, Christine
2016-10-06
Insecticide-based tools remain critical for controlling vector-borne diseases in Uganda. Securing public support from targeted populations for such tools is an important component in sustaining their long-run effectiveness. Yet little quantitative evidence is available on the perceived benefits and costs of vector control programmes among targeted households. A survey was administered to a clustered random sample of 612 households in Gulu and Oyam districts of northern Uganda during a period of very high malaria transmission and following a pilot indoor residual spray (IRS) programme. A discrete choice experiment was conducted within the survey, in which respondents indicated their preferences for different IRS programmes relative to money compensation in a series of experimentally controlled, hypothetical choice sets. The data were analysed using conditional logit regression models to estimate respondents' willingness to accept (WTA) some amount of money compensation in lieu of foregone malaria risk reductions. Latent class models were used to analyse whether respondent characteristics predicted WTA. Average WTA is estimated at $8.94 annually for a 10 % reduction in malaria risk, and additional co-benefits of IRS were estimated to be worth on average $54-$56 (depending on insecticide type) per round of IRS. Significant heterogeneity is observed: Four in five household heads in northern Uganda have high valuations for IRS programmes, while the remaining 20 % experience costly side effects of IRS (valued at between $2 and $3 per round). Statistically significant predictors of belonging to the high-value group include respondent gender, mean age of household members, participation in previous IRS, basic knowledge of mosquito reproduction, and the number of mosquito nets owned. Proxies for household income and wealth are not found to be statistically significant predictors of WTA. This study suggests that the majority of people in areas of high malaria transmission like northern Uganda place a high value on vector control programmes using IRS. However, there is significant heterogeneity in terms of the perceived side effects (positive and negative). This has implications for sustaining public support for these programmes in the long-term.
Learning atoms for materials discovery.
Zhou, Quan; Tang, Peizhe; Liu, Shenxiu; Pan, Jinbo; Yan, Qimin; Zhang, Shou-Cheng
2018-06-26
Exciting advances have been made in artificial intelligence (AI) during recent decades. Among them, applications of machine learning (ML) and deep learning techniques brought human-competitive performances in various tasks of fields, including image recognition, speech recognition, and natural language understanding. Even in Go, the ancient game of profound complexity, the AI player has already beat human world champions convincingly with and without learning from the human. In this work, we show that our unsupervised machines (Atom2Vec) can learn the basic properties of atoms by themselves from the extensive database of known compounds and materials. These learned properties are represented in terms of high-dimensional vectors, and clustering of atoms in vector space classifies them into meaningful groups consistent with human knowledge. We use the atom vectors as basic input units for neural networks and other ML models designed and trained to predict materials properties, which demonstrate significant accuracy. Copyright © 2018 the Author(s). Published by PNAS.
Gloria-Soria, Andrea; Caccone, Adalgisa; Evans, Benjamin; Schama, Renata; Martins, Ademir Jesus; Powell, Jeffrey R.
2017-01-01
Background Aedes aegypti, commonly known as “the yellow fever mosquito”, is of great medical concern today primarily as the major vector of dengue, chikungunya and Zika viruses, although yellow fever remains a serious health concern in some regions. The history of Ae. aegypti in Brazil is of particular interest because the country was subjected to a well-documented eradication program during 1940s-1950s. After cessation of the campaign, the mosquito quickly re-established in the early 1970s with several dengue outbreaks reported during the last 30 years. Brazil can be considered the country suffering the most from the yellow fever mosquito, given the high number of dengue, chikungunya and Zika cases reported in the country, after having once been declared “free of Ae. aegypti”. Methodology/Principal findings We used 12 microsatellite markers to infer the genetic structure of Brazilian Ae. aegypti populations, genetic variability, genetic affinities with neighboring geographic areas, and the timing of their arrival and spread. This enabled us to reconstruct their recent history and evaluate whether the reappearance in Brazil was the result of re-invasion from neighboring non-eradicated areas or re-emergence from local refugia surviving the eradication program. Our results indicate a genetic break separating the northern and southern Brazilian Ae. aegypti populations, with further genetic differentiation within each cluster, especially in southern Brazil. Conclusions/Significance Based on our results, re-invasions from non-eradicated regions are the most likely scenario for the reappearance of Ae. aegypti in Brazil. While populations in the northern cluster are likely to have descended from Venezuela populations as early as the 1970s, southern populations seem to have derived more recently from northern Brazilian areas. Possible entry points are also revealed within both southern and northern clusters that could inform strategies to control and monitor this important arbovirus vector. PMID:28742801
Kotsakiozi, Panayiota; Gloria-Soria, Andrea; Caccone, Adalgisa; Evans, Benjamin; Schama, Renata; Martins, Ademir Jesus; Powell, Jeffrey R
2017-07-01
Aedes aegypti, commonly known as "the yellow fever mosquito", is of great medical concern today primarily as the major vector of dengue, chikungunya and Zika viruses, although yellow fever remains a serious health concern in some regions. The history of Ae. aegypti in Brazil is of particular interest because the country was subjected to a well-documented eradication program during 1940s-1950s. After cessation of the campaign, the mosquito quickly re-established in the early 1970s with several dengue outbreaks reported during the last 30 years. Brazil can be considered the country suffering the most from the yellow fever mosquito, given the high number of dengue, chikungunya and Zika cases reported in the country, after having once been declared "free of Ae. aegypti". We used 12 microsatellite markers to infer the genetic structure of Brazilian Ae. aegypti populations, genetic variability, genetic affinities with neighboring geographic areas, and the timing of their arrival and spread. This enabled us to reconstruct their recent history and evaluate whether the reappearance in Brazil was the result of re-invasion from neighboring non-eradicated areas or re-emergence from local refugia surviving the eradication program. Our results indicate a genetic break separating the northern and southern Brazilian Ae. aegypti populations, with further genetic differentiation within each cluster, especially in southern Brazil. Based on our results, re-invasions from non-eradicated regions are the most likely scenario for the reappearance of Ae. aegypti in Brazil. While populations in the northern cluster are likely to have descended from Venezuela populations as early as the 1970s, southern populations seem to have derived more recently from northern Brazilian areas. Possible entry points are also revealed within both southern and northern clusters that could inform strategies to control and monitor this important arbovirus vector.
NASA Astrophysics Data System (ADS)
Anikeenko, A. V.; Malenkov, G. G.; Naberukhin, Yu. I.
2018-03-01
We propose a new measure of collectivity of molecular motion in the liquid: the average vector of displacement of the particles, ⟨ΔR⟩, which initially have been localized within a sphere of radius Rsph and then have executed the diffusive motion during a time interval Δt. The more correlated the motion of the particles is, the longer will be the vector ⟨ΔR⟩. We visualize the picture of collective motions in molecular dynamics (MD) models of liquids by constructing the ⟨ΔR⟩ vectors and pinning them to the sites of the uniform grid which divides each of the edges of the model box into equal parts. MD models of liquid argon and water have been studied by this method. Qualitatively, the patterns of ⟨ΔR⟩ vectors are similar for these two liquids but differ in minor details. The most important result of our research is the revealing of the aggregates of ⟨ΔR⟩ vectors which have the form of extended flows which sometimes look like the parts of vortices. These vortex-like clusters of ⟨ΔR⟩ vectors have the mesoscopic size (of the order of 10 nm) and persist for tens of picoseconds. Dependence of the ⟨ΔR⟩ vector field on parameters Rsph, Δt, and on the model size has been investigated. This field in the models of liquids differs essentially from that in a random-walk model.
Zhang, Heng; Pan, Zhongming; Zhang, Wenna
2018-06-07
An acoustic⁻seismic mixed feature extraction method based on the wavelet coefficient energy ratio (WCER) of the target signal is proposed in this study for classifying vehicle targets in wireless sensor networks. The signal was decomposed into a set of wavelet coefficients using the à trous algorithm, which is a concise method used to implement the wavelet transform of a discrete signal sequence. After the wavelet coefficients of the target acoustic and seismic signals were obtained, the energy ratio of each layer coefficient was calculated as the feature vector of the target signals. Subsequently, the acoustic and seismic features were merged into an acoustic⁻seismic mixed feature to improve the target classification accuracy after the acoustic and seismic WCER features of the target signal were simplified using the hierarchical clustering method. We selected the support vector machine method for classification and utilized the data acquired from a real-world experiment to validate the proposed method. The calculated results show that the WCER feature extraction method can effectively extract the target features from target signals. Feature simplification can reduce the time consumption of feature extraction and classification, with no effect on the target classification accuracy. The use of acoustic⁻seismic mixed features effectively improved target classification accuracy by approximately 12% compared with either acoustic signal or seismic signal alone.
Typologies of Social Support and Associations with Mental Health Outcomes Among LGBT Youth
Birkett, Michelle A.; Mustanski, Brian
2015-01-01
Abstract Purpose: Lesbian, gay, bisexual, and transgender (LGBT) youth show increased risk for a number of negative mental health outcomes, which research has linked to minority stressors such as victimization. Further, social support promotes positive mental health outcomes for LGBT youth, and different sources of social support show differential relationships with mental health outcomes. However, little is known about how combinations of different sources of support impact mental health. Methods: In the present study, we identify clusters of family, peer, and significant other social support and then examine demographic and mental health differences by cluster in an analytic sample of 232 LGBT youth between the ages of 16 and 20 years. Results: Using k-means cluster analysis, three social support cluster types were identified: high support (44.0% of participants), low support (21.6%), and non-family support (34.5%). A series of chi-square tests were used to examine demographic differences between these clusters, which were found for socio-economic status (SES). Regression analyses indicated that, while controlling for victimization, individuals within the three clusters showed different relationships with multiple mental health outcomes: loneliness, hopelessness, depression, anxiety, somatization, general symptom severity, and symptoms of major depressive disorder (MDD). Conclusion: Findings suggest the combinations of sources of support LGBT youth receive are related to their mental health. Higher SES youth are more likely to receive support from family, peers, and significant others. For most mental health outcomes, family support appears to be an especially relevant and important source of support to target for LGBT youth. PMID:26790019
GRASS GIS: The first Open Source Temporal GIS
NASA Astrophysics Data System (ADS)
Gebbert, Sören; Leppelt, Thomas
2015-04-01
GRASS GIS is a full featured, general purpose Open Source geographic information system (GIS) with raster, 3D raster and vector processing support[1]. Recently, time was introduced as a new dimension that transformed GRASS GIS into the first Open Source temporal GIS with comprehensive spatio-temporal analysis, processing and visualization capabilities[2]. New spatio-temporal data types were introduced in GRASS GIS version 7, to manage raster, 3D raster and vector time series. These new data types are called space time datasets. They are designed to efficiently handle hundreds of thousands of time stamped raster, 3D raster and vector map layers of any size. Time stamps can be defined as time intervals or time instances in Gregorian calendar time or relative time. Space time datasets are simplifying the processing and analysis of large time series in GRASS GIS, since these new data types are used as input and output parameter in temporal modules. The handling of space time datasets is therefore equal to the handling of raster, 3D raster and vector map layers in GRASS GIS. A new dedicated Python library, the GRASS GIS Temporal Framework, was designed to implement the spatio-temporal data types and their management. The framework provides the functionality to efficiently handle hundreds of thousands of time stamped map layers and their spatio-temporal topological relations. The framework supports reasoning based on the temporal granularity of space time datasets as well as their temporal topology. It was designed in conjunction with the PyGRASS [3] library to support parallel processing of large datasets, that has a long tradition in GRASS GIS [4,5]. We will present a subset of more than 40 temporal modules that were implemented based on the GRASS GIS Temporal Framework, PyGRASS and the GRASS GIS Python scripting library. These modules provide a comprehensive temporal GIS tool set. The functionality range from space time dataset and time stamped map layer management over temporal aggregation, temporal accumulation, spatio-temporal statistics, spatio-temporal sampling, temporal algebra, temporal topology analysis, time series animation and temporal topology visualization to time series import and export capabilities with support for NetCDF and VTK data formats. We will present several temporal modules that support parallel processing of raster and 3D raster time series. [1] GRASS GIS Open Source Approaches in Spatial Data Handling In Open Source Approaches in Spatial Data Handling, Vol. 2 (2008), pp. 171-199, doi:10.1007/978-3-540-74831-19 by M. Neteler, D. Beaudette, P. Cavallini, L. Lami, J. Cepicky edited by G. Brent Hall, Michael G. Leahy [2] Gebbert, S., Pebesma, E., 2014. A temporal GIS for field based environmental modeling. Environ. Model. Softw. 53, 1-12. [3] Zambelli, P., Gebbert, S., Ciolli, M., 2013. Pygrass: An Object Oriented Python Application Programming Interface (API) for Geographic Resources Analysis Support System (GRASS) Geographic Information System (GIS). ISPRS Intl Journal of Geo-Information 2, 201-219. [4] Löwe, P., Klump, J., Thaler, J. (2012): The FOSS GIS Workbench on the GFZ Load Sharing Facility compute cluster, (Geophysical Research Abstracts Vol. 14, EGU2012-4491, 2012), General Assembly European Geosciences Union (Vienna, Austria 2012). [5] Akhter, S., Aida, K., Chemin, Y., 2010. "GRASS GIS on High Performance Computing with MPI, OpenMP and Ninf-G Programming Framework". ISPRS Conference, Kyoto, 9-12 August 2010
Atrial fibrillation detection by heart rate variability in Poincare plot.
Park, Jinho; Lee, Sangwook; Jeon, Moongu
2009-12-11
Atrial fibrillation (AFib) is one of the prominent causes of stroke, and its risk increases with age. We need to detect AFib correctly as early as possible to avoid medical disaster because it is likely to proceed into a more serious form in short time. If we can make a portable AFib monitoring system, it will be helpful to many old people because we cannot predict when a patient will have a spasm of AFib. We analyzed heart beat variability from inter-beat intervals obtained by a wavelet-based detector. We made a Poincare plot using the inter-beat intervals. By analyzing the plot, we extracted three feature measures characterizing AFib and non-AFib: the number of clusters, mean stepping increment of inter-beat intervals, and dispersion of the points around a diagonal line in the plot. We divided distribution of the number of clusters into two and calculated mean value of the lower part by k-means clustering method. We classified data whose number of clusters is more than one and less than this mean value as non-AFib data. In the other case, we tried to discriminate AFib from non-AFib using support vector machine with the other feature measures: the mean stepping increment and dispersion of the points in the Poincare plot. We found that Poincare plot from non-AFib data showed some pattern, while the plot from AFib data showed irregularly irregular shape. In case of non-AFib data, the definite pattern in the plot manifested itself with some limited number of clusters or closely packed one cluster. In case of AFib data, the number of clusters in the plot was one or too many. We evaluated the accuracy using leave-one-out cross-validation. Mean sensitivity and mean specificity were 91.4% and 92.9% respectively. Because pulse beats of ventricles are less likely to be influenced by baseline wandering and noise, we used the inter-beat intervals to diagnose AFib. We visually displayed regularity of the inter-beat intervals by way of Poincare plot. We tried to design an automated algorithm which did not require any human intervention and any specific threshold, and could be installed in a portable AFib monitoring system.
LANTCET: laser nanotechnology for screening and treating tumors ex vivo and in vivo
NASA Astrophysics Data System (ADS)
Lapotko, Dmitri O.; Lukianova-Hleb, Ekaterina Y.; Zhdanok, Sergei A.; Hafner, Jason H.; Rostro, Betty C.; Scully, Peter; Konopleva, Marina; Andreeff, Michael; Li, Chun; Hanna, Ehab Y.; Myers, Jeffrey N.; Oraevsky, Alexander A.
2007-06-01
LANTCET (laser-activated nano-thermolysis as cell elimination technology) was developed for selective detection and destruction of individual tumor cells through generation of photothermal bubbles around clusters of light absorbing gold nanoparticles (nanorods and nanoshells) that are selectively formed in target tumor cells. We have applied bare nanoparticles and their conjugates with cell-specific vectors such as monoclonal antibodies CD33 (specific for Acute Myeloid Leukemia) and C225 (specific for carcinoma cells that express epidermal growth factor -EGF). Clusters were formed by using vector-receptor interactions with further clusterization of nanoparticles due to endocytosis. Formation of clusters was verified directly with optical resonance scattering microscopy and microspectroscopy. LANTCET method was tested in vitro for living cell samples with: (1) model myeloid K562 cells (CD33 positive), (2) primary human bone marrow CD33-positive blast cells from patients with the diagnosis of acute myeloid leukemia, (3) monolayers of living EGF-positive carcinoma cells (Hep-2C), (4) human lymphocytes and red blood cells as normal cells. The LANTCET method was also tested in vivo using rats with experimental polymorphic sarcoma. Photothermal bubbles were generated and detected in vitro with a photothermal microscope equipped with a tunable Ti-Sa pulsed laser. We have found that cluster formation caused an almost 100-fold decrease in the bubble generation threshold of laser pulse fluence in tumor cells compared to the bubble generation threshold for normal cells. The animal tumor that was treated with a single laser pulse showed a necrotic area of diameter close to the pump laser beam diameter and a depth of 1-2 mm. Cell level selectivity of tumor damage with single laser pulse was demonstrated. Combining lightscattering imaging with bubble imaging, we introduced a new image-guided mode of the LANTCET operation for screening and treatment of tumors ex vivo and in vivo.
Conformational and functional analysis of molecular dynamics trajectories by Self-Organising Maps
2011-01-01
Background Molecular dynamics (MD) simulations are powerful tools to investigate the conformational dynamics of proteins that is often a critical element of their function. Identification of functionally relevant conformations is generally done clustering the large ensemble of structures that are generated. Recently, Self-Organising Maps (SOMs) were reported performing more accurately and providing more consistent results than traditional clustering algorithms in various data mining problems. We present a novel strategy to analyse and compare conformational ensembles of protein domains using a two-level approach that combines SOMs and hierarchical clustering. Results The conformational dynamics of the α-spectrin SH3 protein domain and six single mutants were analysed by MD simulations. The Cα's Cartesian coordinates of conformations sampled in the essential space were used as input data vectors for SOM training, then complete linkage clustering was performed on the SOM prototype vectors. A specific protocol to optimize a SOM for structural ensembles was proposed: the optimal SOM was selected by means of a Taguchi experimental design plan applied to different data sets, and the optimal sampling rate of the MD trajectory was selected. The proposed two-level approach was applied to single trajectories of the SH3 domain independently as well as to groups of them at the same time. The results demonstrated the potential of this approach in the analysis of large ensembles of molecular structures: the possibility of producing a topological mapping of the conformational space in a simple 2D visualisation, as well as of effectively highlighting differences in the conformational dynamics directly related to biological functions. Conclusions The use of a two-level approach combining SOMs and hierarchical clustering for conformational analysis of structural ensembles of proteins was proposed. It can easily be extended to other study cases and to conformational ensembles from other sources. PMID:21569575
Bayesian data assimilation provides rapid decision support for vector-borne diseases.
Jewell, Chris P; Brown, Richard G
2015-07-06
Predicting the spread of vector-borne diseases in response to incursions requires knowledge of both host and vector demographics in advance of an outbreak. Although host population data are typically available, for novel disease introductions there is a high chance of the pathogen using a vector for which data are unavailable. This presents a barrier to estimating the parameters of dynamical models representing host-vector-pathogen interaction, and hence limits their ability to provide quantitative risk forecasts. The Theileria orientalis (Ikeda) outbreak in New Zealand cattle demonstrates this problem: even though the vector has received extensive laboratory study, a high degree of uncertainty persists over its national demographic distribution. Addressing this, we develop a Bayesian data assimilation approach whereby indirect observations of vector activity inform a seasonal spatio-temporal risk surface within a stochastic epidemic model. We provide quantitative predictions for the future spread of the epidemic, quantifying uncertainty in the model parameters, case infection times and the disease status of undetected infections. Importantly, we demonstrate how our model learns sequentially as the epidemic unfolds and provide evidence for changing epidemic dynamics through time. Our approach therefore provides a significant advance in rapid decision support for novel vector-borne disease outbreaks. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Support Vector Machines: Relevance Feedback and Information Retrieval.
ERIC Educational Resources Information Center
Drucker, Harris; Shahrary, Behzad; Gibbon, David C.
2002-01-01
Compares support vector machines (SVMs) to Rocchio, Ide regular and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred. Includes nine tables. (Contains 24…
Jeffrey T. Walton
2008-01-01
Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.
Huang, Shujun; Cai, Nianguang; Pacheco, Pedro Penzuti; Narrandes, Shavira; Wang, Yang; Xu, Wayne
2018-01-01
Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
Dual linear structured support vector machine tracking method via scale correlation filter
NASA Astrophysics Data System (ADS)
Li, Weisheng; Chen, Yanquan; Xiao, Bin; Feng, Chen
2018-01-01
Adaptive tracking-by-detection methods based on structured support vector machine (SVM) performed well on recent visual tracking benchmarks. However, these methods did not adopt an effective strategy of object scale estimation, which limits the overall tracking performance. We present a tracking method based on a dual linear structured support vector machine (DLSSVM) with a discriminative scale correlation filter. The collaborative tracker comprised of a DLSSVM model and a scale correlation filter obtains good results in tracking target position and scale estimation. The fast Fourier transform is applied for detection. Extensive experiments show that our tracking approach outperforms many popular top-ranking trackers. On a benchmark including 100 challenging video sequences, the average precision of the proposed method is 82.8%.
Object recognition of ladar with support vector machine
NASA Astrophysics Data System (ADS)
Sun, Jian-Feng; Li, Qi; Wang, Qi
2005-01-01
Intensity, range and Doppler images can be obtained by using laser radar. Laser radar can detect much more object information than other detecting sensor, such as passive infrared imaging and synthetic aperture radar (SAR), so it is well suited as the sensor of object recognition. Traditional method of laser radar object recognition is extracting target features, which can be influenced by noise. In this paper, a laser radar recognition method-Support Vector Machine is introduced. Support Vector Machine (SVM) is a new hotspot of recognition research after neural network. It has well performance on digital written and face recognition. Two series experiments about SVM designed for preprocessing and non-preprocessing samples are performed by real laser radar images, and the experiments results are compared.
nu-Anomica: A Fast Support Vector Based Novelty Detection Technique
NASA Technical Reports Server (NTRS)
Das, Santanu; Bhaduri, Kanishka; Oza, Nikunj C.; Srivastava, Ashok N.
2009-01-01
In this paper we propose nu-Anomica, a novel anomaly detection technique that can be trained on huge data sets with much reduced running time compared to the benchmark one-class Support Vector Machines algorithm. In -Anomica, the idea is to train the machine such that it can provide a close approximation to the exact decision plane using fewer training points and without losing much of the generalization performance of the classical approach. We have tested the proposed algorithm on a variety of continuous data sets under different conditions. We show that under all test conditions the developed procedure closely preserves the accuracy of standard one-class Support Vector Machines while reducing both the training time and the test time by 5 - 20 times.
ℓ p-Norm Multikernel Learning Approach for Stock Market Price Forecasting
Shao, Xigao; Wu, Kun; Liao, Bifeng
2012-01-01
Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ 1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ p-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ 1-norm multiple support vector regression model. PMID:23365561
Support vector machine for automatic pain recognition
NASA Astrophysics Data System (ADS)
Monwar, Md Maruf; Rezaei, Siamak
2009-02-01
Facial expressions are a key index of emotion and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects face from the stored video frame using skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experiment results indicate that using support vector machine as classifier can certainly improve the performance of automatic pain recognition system.
Design of 2D time-varying vector fields.
Chen, Guoning; Kwatra, Vivek; Wei, Li-Yi; Hansen, Charles D; Zhang, Eugene
2012-10-01
Design of time-varying vector fields, i.e., vector fields that can change over time, has a wide variety of important applications in computer graphics. Existing vector field design techniques do not address time-varying vector fields. In this paper, we present a framework for the design of time-varying vector fields, both for planar domains as well as manifold surfaces. Our system supports the creation and modification of various time-varying vector fields with desired spatial and temporal characteristics through several design metaphors, including streamlines, pathlines, singularity paths, and bifurcations. These design metaphors are integrated into an element-based design to generate the time-varying vector fields via a sequence of basis field summations or spatial constrained optimizations at the sampled times. The key-frame design and field deformation are also introduced to support other user design scenarios. Accordingly, a spatial-temporal constrained optimization and the time-varying transformation are employed to generate the desired fields for these two design scenarios, respectively. We apply the time-varying vector fields generated using our design system to a number of important computer graphics applications that require controllable dynamic effects, such as evolving surface appearance, dynamic scene design, steerable crowd movement, and painterly animation. Many of these are difficult or impossible to achieve via prior simulation-based methods. In these applications, the time-varying vector fields have been applied as either orientation fields or advection fields to control the instantaneous appearance or evolving trajectories of the dynamic effects.
Marcet, PL; Mora, MS; Cutrera, AP; Jones, L; Gürtler, RE; Kitron, U; Dotson, EM
2008-01-01
To gain an understanding of the genetic structure and dispersal dynamics of T. infestans populations, we analyzed the multilocus genotype of 10 microsatellite loci for 352 T. infestans collected in 21 houses of 11 rural communities in October 2002. Genetic structure was analyzed at the community and house compound levels. Analysis revealed that vector control actions affected the genetic structure of T. infestans populations. Bug populations from communities under sustained vector control (core area) were highly structured and genetic differentiation between neighboring house compounds was significant. In contrast, bug populations from communities with sporadic vector control actions were more homogeneous and lacked defined genetic clusters. Genetic differentiation between population pairs did not fit a model of isolation by distance at the microgeographical level. Evidence consistent with flight or walking bug dispersal was detected within and among communities, dispersal was more female-biased in the core area and results suggested that houses received immigrants from more than one source. Putative sources and mechanisms of re-infestation are described. These data may be use to design improved vector control strategies PMID:18773972
Mao, Bao -Hua; Chang, Rui; Shi, Lei; ...
2014-10-29
Here, we have investigated model systems of silver clusters with different sizes (3 and 15 atoms) deposited on alumina and titania supports using ambient pressure X-ray photoelectron spectroscopy. The electronic structures of silver clusters and support materials are studied upon exposure to various atmospheres (ultrahigh vacuum, O 2 and CO) at different temperatures. Compared to bulk silver, the binding energies of silver clusters are about 0.55 eV higher on TiO 2 and 0.95 eV higher on Al 2O 3 due to the final state effect and the interaction with supports. No clear size effect of the silver XPS peak ismore » observed on different silver clusters among these samples. Silver clusters on titania show better stability against sintering. Al 2p and Ti 2p core level peak positions of the alumina and titania support surfaces change upon exposure to oxygen while the Ag 3d core level position remains unchanged. We discuss the origin of these core level shifts and their implications for catalytic properties of Ag clusters.« less
Techniques utilized in the simulated altitude testing of a 2D-CD vectoring and reversing nozzle
NASA Technical Reports Server (NTRS)
Block, H. Bruce; Bryant, Lively; Dicus, John H.; Moore, Allan S.; Burns, Maureen E.; Solomon, Robert F.; Sheer, Irving
1988-01-01
Simulated altitude testing of a two-dimensional, convergent-divergent, thrust vectoring and reversing exhaust nozzle was accomplished. An important objective of this test was to develop test hardware and techniques to properly operate a vectoring and reversing nozzle within the confines of an altitude test facility. This report presents detailed information on the major test support systems utilized, the operational performance of the systems and the problems encountered, and test equipment improvements recommended for future tests. The most challenging support systems included the multi-axis thrust measurement system, vectored and reverse exhaust gas collection systems, and infrared temperature measurement systems used to evaluate and monitor the nozzle. The feasibility of testing a vectoring and reversing nozzle of this type in an altitude chamber was successfully demonstrated. Supporting systems performed as required. During reverser operation, engine exhaust gases were successfully captured and turned downstream. However, a small amount of exhaust gas spilled out the collector ducts' inlet openings when the reverser was opened more than 60 percent. The spillage did not affect engine or nozzle performance. The three infrared systems which viewed the nozzle through the exhaust collection system worked remarkably well considering the harsh environment.
TOSCA-based orchestration of complex clusters at the IaaS level
NASA Astrophysics Data System (ADS)
Caballer, M.; Donvito, G.; Moltó, G.; Rocha, R.; Velten, M.
2017-10-01
This paper describes the adoption and extension of the TOSCA standard by the INDIGO-DataCloud project for the definition and deployment of complex computing clusters together with the required support in both OpenStack and OpenNebula, carried out in close collaboration with industry partners such as IBM. Two examples of these clusters are described in this paper, the definition of an elastic computing cluster to support the Galaxy bioinformatics application where the nodes are dynamically added and removed from the cluster to adapt to the workload, and the definition of an scalable Apache Mesos cluster for the execution of batch jobs and support for long-running services. The coupling of TOSCA with Ansible Roles to perform automated installation has resulted in the definition of high-level, deterministic templates to provision complex computing clusters across different Cloud sites.
Adenovirus Vectors Target Several Cell Subtypes of Mammalian Inner Ear In Vivo
Li, Wenyan; Shen, Jun
2016-01-01
Mammalian inner ear harbors diverse cell types that are essential for hearing and balance. Adenovirus is one of the major vectors to deliver genes into the inner ear for functional studies and hair cell regeneration. To identify adenovirus vectors that target specific cell subtypes in the inner ear, we studied three adenovirus vectors, carrying a reporter gene encoding green fluorescent protein (GFP) from two vendors or with a genome editing gene Cre recombinase (Cre), by injection into postnatal days 0 (P0) and 4 (P4) mouse cochlea through scala media by cochleostomy in vivo. We found three adenovirus vectors transduced mouse inner ear cells with different specificities and expression levels, depending on the type of adenoviral vectors and the age of mice. The most frequently targeted region was the cochlear sensory epithelium, including auditory hair cells and supporting cells. Adenovirus with GFP transduced utricular supporting cells as well. This study shows that adenovirus vectors are capable of efficiently and specifically transducing different cell types in the mammalian inner ear and provides useful tools to study inner ear gene function and to evaluate gene therapy to treat hearing loss and vestibular dysfunction. PMID:28116172
Bayesian data assimilation provides rapid decision support for vector-borne diseases
Jewell, Chris P.; Brown, Richard G.
2015-01-01
Predicting the spread of vector-borne diseases in response to incursions requires knowledge of both host and vector demographics in advance of an outbreak. Although host population data are typically available, for novel disease introductions there is a high chance of the pathogen using a vector for which data are unavailable. This presents a barrier to estimating the parameters of dynamical models representing host–vector–pathogen interaction, and hence limits their ability to provide quantitative risk forecasts. The Theileria orientalis (Ikeda) outbreak in New Zealand cattle demonstrates this problem: even though the vector has received extensive laboratory study, a high degree of uncertainty persists over its national demographic distribution. Addressing this, we develop a Bayesian data assimilation approach whereby indirect observations of vector activity inform a seasonal spatio-temporal risk surface within a stochastic epidemic model. We provide quantitative predictions for the future spread of the epidemic, quantifying uncertainty in the model parameters, case infection times and the disease status of undetected infections. Importantly, we demonstrate how our model learns sequentially as the epidemic unfolds and provide evidence for changing epidemic dynamics through time. Our approach therefore provides a significant advance in rapid decision support for novel vector-borne disease outbreaks. PMID:26136225
Stable Local Volatility Calibration Using Kernel Splines
NASA Astrophysics Data System (ADS)
Coleman, Thomas F.; Li, Yuying; Wang, Cheng
2010-09-01
We propose an optimization formulation using L1 norm to ensure accuracy and stability in calibrating a local volatility function for option pricing. Using a regularization parameter, the proposed objective function balances the calibration accuracy with the model complexity. Motivated by the support vector machine learning, the unknown local volatility function is represented by a kernel function generating splines and the model complexity is controlled by minimizing the 1-norm of the kernel coefficient vector. In the context of the support vector regression for function estimation based on a finite set of observations, this corresponds to minimizing the number of support vectors for predictability. We illustrate the ability of the proposed approach to reconstruct the local volatility function in a synthetic market. In addition, based on S&P 500 market index option data, we demonstrate that the calibrated local volatility surface is simple and resembles the observed implied volatility surface in shape. Stability is illustrated by calibrating local volatility functions using market option data from different dates.
Patterns of workplace supervisor support desired by abused women.
Perrin, Nancy A; Yragui, Nanette L; Hanson, Ginger C; Glass, Nancy
2011-07-01
The purpose of this study was to understand differences in patterns of supervisor support desired by female victims of intimate partner violence (IPV) and to examine whether the pattern of support desired at work is reflective of a woman's stage of change in the abusive relationship, IPV-related work interference, and IPV-related job reprimands or job loss. We conducted interviews in Spanish or English with adult women working in low-income jobs who had been physically or sexually abused by an intimate partner/ ex-partner in the past year ( N = 133). Cluster analysis revealed three distinct clusters that form a hierarchy of type of support wanted: those who desired limited support; those who desired confidential, time-off, and emotional support; and those who desired support in wide variety of ways from their supervisor. The clusters appeared to reflect stages of behavior change in an abusive relationship. Specifically, the limited-support cluster may represent an early precontemplation stage, with women reporting the least interference with work. The support-in-every-way cluster may represent later stages of change, in which women are breaking away from the abusive partner and report the greatest interference with work. Women in the confidential-, time-off-, and emotional-support cluster are in a transition stage in which they are considering change and are exploring options in their abusive relationship. Understanding the hierarchy of the type of support desired, and its relationship to stages of change in the abusive relationship and work interference, may provide a strong foundation for developing appropriate and effective workplace interventions to guide supervisors in providing support to women experiencing IPV.
Efficient architecture for spike sorting in reconfigurable hardware.
Hwang, Wen-Jyi; Lee, Wei-Hao; Lin, Shiow-Jyu; Lai, Sheng-Ying
2013-11-01
This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both the feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The employment of GHA allows efficient computation of principal components for subsequent clustering operations. The FCM is able to achieve near optimal clustering for spike sorting. Its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computation of different weight vectors share the same circuit for lowering the area costs. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to evade the large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented by field programmable gate array (FPGA). It is embedded in a System-on-Chip (SOC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining high classification correct rate and high speed computation.
Differential dynamic microscopy of weakly scattering and polydisperse protein-rich clusters
NASA Astrophysics Data System (ADS)
Safari, Mohammad S.; Vorontsova, Maria A.; Poling-Skutvik, Ryan; Vekilov, Peter G.; Conrad, Jacinta C.
2015-10-01
Nanoparticle dynamics impact a wide range of biological transport processes and applications in nanomedicine and natural resource engineering. Differential dynamic microscopy (DDM) was recently developed to quantify the dynamics of submicron particles in solutions from fluctuations of intensity in optical micrographs. Differential dynamic microscopy is well established for monodisperse particle populations, but has not been applied to solutions containing weakly scattering polydisperse biological nanoparticles. Here we use bright-field DDM (BDDM) to measure the dynamics of protein-rich liquid clusters, whose size ranges from tens to hundreds of nanometers and whose total volume fraction is less than 10-5. With solutions of two proteins, hemoglobin A and lysozyme, we evaluate the cluster diffusion coefficients from the dependence of the diffusive relaxation time on the scattering wave vector. We establish that for weakly scattering populations, an optimal thickness of the sample chamber exists at which the BDDM signal is maximized at the smallest sample volume. The average cluster diffusion coefficient measured using BDDM is consistently lower than that obtained from dynamic light scattering at a scattering angle of 90∘. This apparent discrepancy is due to Mie scattering from the polydisperse cluster population, in which larger clusters preferentially scatter more light in the forward direction.
Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure.
Zhang, Wen; Xiao, Fan; Li, Bin; Zhang, Siguang
2016-01-01
Recently, LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) is proposed to overcome the problems of polysemy and homonym in traditional lexical matching. However, it is usually criticized as with low discriminative power for representing documents although it has been validated as with good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is three manifolds. Firstly, we make a survey of existing linear algebra methods for LSI, including both SVD based methods and non-SVD based methods. Secondly, we propose SVD on clusters for LSI and theoretically explain that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold in new documents and terms in a decomposed matrix by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performances of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of interdocument similarity measure in comparison with other SVD based LSI methods.
Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure
Xiao, Fan; Li, Bin; Zhang, Siguang
2016-01-01
Recently, LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) is proposed to overcome the problems of polysemy and homonym in traditional lexical matching. However, it is usually criticized as with low discriminative power for representing documents although it has been validated as with good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is three manifolds. Firstly, we make a survey of existing linear algebra methods for LSI, including both SVD based methods and non-SVD based methods. Secondly, we propose SVD on clusters for LSI and theoretically explain that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold in new documents and terms in a decomposed matrix by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performances of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of interdocument similarity measure in comparison with other SVD based LSI methods. PMID:27579031
Hong, S T; Carney, J R; Gould, S J
1997-01-01
The genes for the complete pathways for two polycyclic aromatic polyketides of the angucyclinone class have been cloned and heterologously expressed. Genomic DNAs of Streptomyces rimosus NRRL 3016 and Streptomyces strain WP 4669 were partially digested with MboI, and libraries (ca. 40-kb fragments) in Escherichia coli XL1-Blue MR were prepared with the cosmid vector pOJ446. Hybridization with the actI probe from the actinorhodin polyketide synthase genes identified two clusters of polyketide genes from each organism. After transfer of the four clusters to Streptomyces lividans TK24, expression of one cluster from each organism was established through the identification of pathway-specific products by high-performance liquid chromatography with photodiode array detection. Peaks were identified from the S. rimosus cluster (pksRIM-1) for tetrangulol, tetrangomycin, and fridamycin E. Peaks were identified from the WP 4669 cluster (pksWP-2) for tetrangulol, 19-hydroxytetrangulol, 8-O-methyltetrangulol, 19-hydroxy-8-O-methyltetrangulol, and PD 116740. Structures were confirmed by 1H nuclear magnetic resonance spectroscopy and high-resolution mass spectrometry. PMID:8990300
Hong, S T; Carney, J R; Gould, S J
1997-01-01
The genes for the complete pathways for two polycyclic aromatic polyketides of the angucyclinone class have been cloned and heterologously expressed. Genomic DNAs of Streptomyces rimosus NRRL 3016 and Streptomyces strain WP 4669 were partially digested with MboI, and libraries (ca. 40-kb fragments) in Escherichia coli XL1-Blue MR were prepared with the cosmid vector pOJ446. Hybridization with the actI probe from the actinorhodin polyketide synthase genes identified two clusters of polyketide genes from each organism. After transfer of the four clusters to Streptomyces lividans TK24, expression of one cluster from each organism was established through the identification of pathway-specific products by high-performance liquid chromatography with photodiode array detection. Peaks were identified from the S. rimosus cluster (pksRIM-1) for tetrangulol, tetrangomycin, and fridamycin E. Peaks were identified from the WP 4669 cluster (pksWP-2) for tetrangulol, 19-hydroxytetrangulol, 8-O-methyltetrangulol, 19-hydroxy-8-O-methyltetrangulol, and PD 116740. Structures were confirmed by 1H nuclear magnetic resonance spectroscopy and high-resolution mass spectrometry.
An adaptive clustering algorithm for image matching based on corner feature
NASA Astrophysics Data System (ADS)
Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song
2018-04-01
The traditional image matching algorithm always can not balance the real-time and accuracy better, to solve the problem, an adaptive clustering algorithm for image matching based on corner feature is proposed in this paper. The method is based on the similarity of the matching pairs of vector pairs, and the adaptive clustering is performed on the matching point pairs. Harris corner detection is carried out first, the feature points of the reference image and the perceived image are extracted, and the feature points of the two images are first matched by Normalized Cross Correlation (NCC) function. Then, using the improved algorithm proposed in this paper, the matching results are clustered to reduce the ineffective operation and improve the matching speed and robustness. Finally, the Random Sample Consensus (RANSAC) algorithm is used to match the matching points after clustering. The experimental results show that the proposed algorithm can effectively eliminate the most wrong matching points while the correct matching points are retained, and improve the accuracy of RANSAC matching, reduce the computation load of whole matching process at the same time.
Yang, Yao Ming; Jia, Ruo; Xun, Hui; Yang, Jie; Chen, Qiang; Zeng, Xiang Guang; Yang, Ming
2018-02-21
Simulium quinquestriatum Shiraki (Diptera: Simuliidae), a human-biting fly that is distributed widely across Asia, is a vector for multiple pathogens. However, the larval development of this species is poorly understood. In this study, we determined the number of instars in this pest using three batches of field-collected larvae from Guiyang, Guizhou, China. The postgenal length, head capsule width, mandibular phragma length, and body length of 773 individuals were measured, and k-means clustering was used for instar grouping. Four distance measures-Manhattan, Euclidean, Chebyshev, and Canberra-were determined. The reported instar numbers, ranging from 4 to 11, were set as initial cluster centers for k-means clustering. The Canberra distance yielded reliable instar grouping, which was consistent with the first instar, as characterized by egg bursters and prepupae with dark histoblasts. Females and males of the last cluster of larvae were identified using Feulgen-stained gonads. Morphometric differences between the two sexes were not significant. Validation was performed using the Brooks-Dyar and Crosby rules, revealing that the larval stage of S. quinquestriatum is composed of eight instars.
Reducing the Volume of NASA Earth-Science Data
NASA Technical Reports Server (NTRS)
Lee, Seungwon; Braverman, Amy J.; Guillaume, Alexandre
2010-01-01
A computer program reduces data generated by NASA Earth-science missions into representative clusters characterized by centroids and membership information, thereby reducing the large volume of data to a level more amenable to analysis. The program effects an autonomous data-reduction/clustering process to produce a representative distribution and joint relationships of the data, without assuming a specific type of distribution and relationship and without resorting to domain-specific knowledge about the data. The program implements a combination of a data-reduction algorithm known as the entropy-constrained vector quantization (ECVQ) and an optimization algorithm known as the differential evolution (DE). The combination of algorithms generates the Pareto front of clustering solutions that presents the compromise between the quality of the reduced data and the degree of reduction. Similar prior data-reduction computer programs utilize only a clustering algorithm, the parameters of which are tuned manually by users. In the present program, autonomous optimization of the parameters by means of the DE supplants the manual tuning of the parameters. Thus, the program determines the best set of clustering solutions without human intervention.
High performance geospatial and climate data visualization using GeoJS
NASA Astrophysics Data System (ADS)
Chaudhary, A.; Beezley, J. D.
2015-12-01
GeoJS (https://github.com/OpenGeoscience/geojs) is an open-source library developed to support interactive scientific and geospatial visualization of climate and earth science datasets in a web environment. GeoJS has a convenient application programming interface (API) that enables users to harness the fast performance of WebGL and Canvas 2D APIs with sophisticated Scalable Vector Graphics (SVG) features in a consistent and convenient manner. We started the project in response to the need for an open-source JavaScript library that can combine traditional geographic information systems (GIS) and scientific visualization on the web. Many libraries, some of which are open source, support mapping or other GIS capabilities, but lack the features required to visualize scientific and other geospatial datasets. For instance, such libraries are not be capable of rendering climate plots from NetCDF files, and some libraries are limited in regards to geoinformatics (infovis in a geospatial environment). While libraries such as d3.js are extremely powerful for these kinds of plots, in order to integrate them into other GIS libraries, the construction of geoinformatics visualizations must be completed manually and separately, or the code must somehow be mixed in an unintuitive way.We developed GeoJS with the following motivations:• To create an open-source geovisualization and GIS library that combines scientific visualization with GIS and informatics• To develop an extensible library that can combine data from multiple sources and render them using multiple backends• To build a library that works well with existing scientific visualizations tools such as VTKWe have successfully deployed GeoJS-based applications for multiple domains across various projects. The ClimatePipes project funded by the Department of Energy, for example, used GeoJS to visualize NetCDF datasets from climate data archives. Other projects built visualizations using GeoJS for interactively exploring data and analysis regarding 1) the human trafficking domain, 2) New York City taxi drop-offs and pick-ups, and 3) the Ebola outbreak. GeoJS supports advanced visualization features such as picking and selecting, as well as clustering. It also supports 2D contour plots, vector plots, heat maps, and geospatial graphs.