An Extraction Method of an Informative DOM Node from a Web Page by Using Layout Information
NASA Astrophysics Data System (ADS)
Tsuruta, Masanobu; Masuyama, Shigeru
We propose a method for extracting the informative DOM node from a Web page as preprocessing for Web content mining. The proposed method, LM, uses layout data of DOM nodes generated by a generic Web browser; its learning set consists of hundreds of Web pages together with annotations of their informative DOM nodes. Our method does not require large-scale crawling of the whole Web site to which the target Web page belongs. We design LM to use the learning set more efficiently than the existing method trained on the same data. In our experiments, we evaluate combinations of an informative-DOM-node extraction method (either the proposed method or an existing one) with existing noise elimination methods: Heur, which removes advertisements and link lists by heuristics, and CE, which removes DOM nodes that also appear in other Web pages of the same Web site. Experimental results show that 1) LM outperforms the other methods for extracting the informative DOM node, and 2) the combination method (LM, {CE(10), Heur}) based on LM (precision: 0.755, recall: 0.826, F-measure: 0.746) outperforms the other combination methods.
Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels
2014-01-01
Background Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find complexes of size greater than three. However, complexes of smaller size are known to account for a large fraction of all complexes in several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for the prediction of heterodimeric protein complexes, which outperforms existing methods. Results We propose methods for the prediction of heterotrimeric protein complexes by extending techniques from our previous work, based on the idea that most heterotrimeric protein complexes are unlikely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for the prediction of heterotrimeric protein complexes. Conclusions We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for the prediction of heterotrimeric protein complexes. PMID:24564744
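The two-phase idea above, feeding the first SVM's discriminant value into a second classifier as an extra feature, can be sketched generically. The dataset, kernels, and parameters below are illustrative stand-ins, not the paper's domain composition kernel or protein features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (the paper uses protein-complex features).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Phase 1: train an SVM and keep its discriminant (decision-function) value.
phase1 = SVC(kernel="linear").fit(X_tr, y_tr)
f_tr = phase1.decision_function(X_tr).reshape(-1, 1)
f_te = phase1.decision_function(X_te).reshape(-1, 1)

# Phase 2: append the discriminant value to the feature vector and
# train a second classifier on the extended features.
phase2 = SVC(kernel="rbf").fit(np.hstack([X_tr, f_tr]), y_tr)
acc = phase2.score(np.hstack([X_te, f_te]), y_te)
```

The second phase could equally be a relevance vector machine, as the abstract describes; an SVM is used here only because it is readily available in scikit-learn.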
Classification of hyperspectral imagery with neural networks: comparison to conventional tools
NASA Astrophysics Data System (ADS)
Merényi, Erzsébet; Farrand, William H.; Taranik, James V.; Minor, Timothy B.
2014-12-01
Efficient exploitation of hyperspectral imagery is of great importance in remote sensing. Artificial intelligence approaches have been receiving favorable reviews for classification of hyperspectral data because the complexity of such data challenges the limitations of many conventional methods. Artificial neural networks (ANNs) have been shown to outperform traditional classifiers in many situations. However, studies that use the full spectral dimensionality of hyperspectral images to classify a large number of surface covers are scarce, if not non-existent. We advocate the need for methods that can handle the full dimensionality and a large number of classes to retain the discovery potential and the ability to discriminate classes with subtle spectral differences. We demonstrate that such a method exists in the family of ANNs. We compare the maximum likelihood, Mahalanobis distance, minimum distance, spectral angle mapper, and a hybrid ANN classifier on real hyperspectral AVIRIS data, using the full spectral resolution to map 23 cover types from a small training set. Rigorous evaluation of the classification accuracies shows that the ANN outperforms the other methods and achieves ≈90% accuracy on test data.
A new family of Polak-Ribiere-Polyak conjugate gradient method with the strong-Wolfe line search
NASA Astrophysics Data System (ADS)
Ghani, Nur Hamizah Abdul; Mamat, Mustafa; Rivaie, Mohd
2017-08-01
The conjugate gradient (CG) method is an important technique in unconstrained optimization, due to its effectiveness and low memory requirements. The focus of this paper is to introduce a new CG method for solving large-scale unconstrained optimization problems. Theoretical proofs show that the new method fulfills the sufficient descent condition if the strong Wolfe-Powell inexact line search is used. Besides, computational results show that our proposed method outperforms other existing CG methods.
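As a concrete illustration of this family of methods, here is a minimal Polak-Ribiere-Polyak CG loop with a strong-Wolfe line search (SciPy's `line_search` enforces the Wolfe conditions). The PRP+ restart safeguard and all parameters are standard illustrative choices, not the authors' new variant:

```python
import numpy as np
from scipy.optimize import line_search

def prp_cg(f, grad, x0, tol=1e-6, max_iter=200):
    """Classical Polak-Ribiere-Polyak CG with a strong-Wolfe line search.

    A sketch of the general scheme discussed above, not the paper's method.
    """
    x = x0.astype(float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # SciPy's line_search enforces the (strong) Wolfe conditions.
        alpha = line_search(f, grad, x, d, gfk=g)[0]
        if alpha is None:          # fall back if the search fails
            alpha = 1e-4
        x_new = x + alpha * d
        g_new = grad(x_new)
        # PRP beta, clipped at zero (the common "PRP+" restart safeguard).
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Minimize a simple quadratic with minimum at (1, 2).
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] - 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] - 2)])
x_star = prp_cg(f, grad, np.array([5.0, 5.0]))
```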
A reconsideration of negative ratings for network-based recommendation
NASA Astrophysics Data System (ADS)
Hu, Liang; Ren, Liang; Lin, Wenbin
2018-01-01
Recommendation algorithms based on bipartite networks have become increasingly popular, thanks to their accuracy and flexibility. Currently, many of these methods ignore users' negative ratings. In this work, we propose a method to exploit negative ratings for the network-based inference algorithm. We find that negative ratings play a positive role regardless of sparsity of data sets. Furthermore, we improve the efficiency of our method and compare it with the state-of-the-art algorithms. Experimental results show that the present method outperforms the existing algorithms.
Predicting missing links via correlation between nodes
NASA Astrophysics Data System (ADS)
Liao, Hao; Zeng, An; Zhang, Yi-Cheng
2015-10-01
As a fundamental problem in many different fields, link prediction aims to estimate the likelihood that a link exists between two nodes, based on the observed information. Since this problem is related to many applications ranging from uncovering missing data to predicting the evolution of networks, link prediction has been intensively investigated recently and many methods have been proposed. The essential challenge of link prediction is to estimate the similarity between nodes. Most of the existing methods are based on the common neighbor index and its variants. In this paper, we propose to calculate the similarity between nodes by the Pearson correlation coefficient. This method is found to be very effective when applied to calculate similarity based on high-order paths. We finally fuse the correlation-based method with the resource allocation method, and find that the combined method can substantially outperform the existing methods, especially in sparse networks.
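A minimal sketch of the correlation idea: take each node's row of the adjacency matrix as its profile, and use the Pearson correlation between profiles as the similarity score. The high-order-path extension and the fusion with resource allocation described above are not shown:

```python
import numpy as np

def pearson_similarity(A):
    """Node-node similarity as the Pearson correlation between rows of
    the adjacency matrix (a sketch of the correlation-based index)."""
    return np.corrcoef(A)

# Toy graph: nodes 0 and 1 share neighbors 2 and 3, so they should come
# out strongly correlated even though the edge (0, 1) itself is absent.
A = np.array([[0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0],
              [1, 1, 0, 0, 1],
              [1, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
S = pearson_similarity(A)
# S[0, 1] is the predicted score for the missing link (0, 1)
```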
Li, Jian-Long; Wang, Peng; Fung, Wing Kam; Zhou, Ji-Yuan
2017-10-16
For dichotomous traits, the generalized disequilibrium test with the moment estimate of the variance (GDT-ME) is a powerful family-based association method. Genomic imprinting is an important epigenetic phenomenon, and there has been increasing interest in incorporating imprinting effects to improve the power of association analysis. However, GDT-ME does not take imprinting effects into account, and it has not been investigated whether it can be used for association analysis when such effects indeed exist. In this article, based on a novel decomposition of the genotype score according to the paternal or maternal source of the allele, we propose the generalized disequilibrium test with imprinting (GDTI) for complete pedigrees without any missing genotypes. We then extend GDTI and GDT-ME to accommodate incomplete pedigrees with missing genotypes, using a Monte Carlo (MC) sampling and estimation scheme to infer missing genotypes given the available genotypes in each pedigree; these extensions are denoted MCGDTI and MCGDT-ME, respectively. The proposed GDTI and MCGDTI methods evaluate the differences of the paternal as well as maternal allele scores for all discordant relative pairs in a pedigree, including beyond first-degree relative pairs. Advantages of the proposed GDTI and MCGDTI test statistics over existing methods are demonstrated by simulation studies under various settings and by application to the rheumatoid arthritis dataset. Simulation results show that the proposed tests control the size well under the null hypothesis of no association, and outperform the existing methods under various imprinting effect models. The existing GDT-ME and the proposed MCGDT-ME can be used to test for association even when imprinting effects exist. In the application to the rheumatoid arthritis data, compared to the existing methods, MCGDTI identifies more loci statistically significantly associated with the disease.
Under complete and incomplete imprinting effect models, our proposed GDTI and MCGDTI methods, by considering the information on imprinting effects and all discordant relative pairs within each pedigree, outperform all the existing test statistics and MCGDTI can recapture much of the missing information. Therefore, MCGDTI is recommended in practice.
Efficient calibration for imperfect computer models
Tuo, Rui; Wu, C. F. Jeff
2015-12-01
Many computer models contain unknown parameters which need to be estimated using physical observations. Furthermore, calibration methods based on Gaussian process models may lead to unreasonable estimates for imperfect computer models. In this work, we extend this line of study to calibration problems with stochastic physical data. We propose a novel method, called the L2 calibration, and show its semiparametric efficiency. The conventional ordinary least squares method is also studied; theoretical analysis shows that it is consistent but not efficient. Numerical examples show that the proposed method outperforms the existing ones.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niu, S; Zhang, Y; Ma, J
Purpose: To investigate iterative reconstruction via prior image constrained total generalized variation (PICTGV) for spectral computed tomography (CT), using fewer projections while achieving better image quality. Methods: The proposed PICTGV method is formulated as an optimization problem that balances data fidelity and the prior-image-constrained total generalized variation of the reconstructed images in one framework. The PICTGV method exploits structure correlations among images in the energy domain and uses a high-quality image to guide the reconstruction of energy-specific images. In the PICTGV method, the high-quality image is reconstructed from all detector-collected X-ray signals and is referred to as the broad-spectrum image. Distinct from existing reconstruction methods that operate on images with the first-order derivative, the PICTGV method incorporates higher-order derivatives of the images. An alternating optimization algorithm is used to minimize the PICTGV objective function. We evaluate the performance of PICTGV on noise and artifact suppression using phantom studies, and compare the method with the conventional filtered back-projection method as well as a TGV-based method without a prior image. Results: On the digital phantom, the proposed method outperforms the existing TGV method in terms of noise reduction, artifact suppression, and edge detail preservation. Compared to the TGV-based method without a prior image, the relative root mean square error of the images reconstructed by the proposed method is reduced by over 20%. Conclusion: The authors propose an iterative reconstruction method via prior image constrained total generalized variation for spectral CT, develop an alternating optimization algorithm for it, and numerically demonstrate the merits of the approach. Results show that the proposed PICTGV method outperforms the TGV method for spectral CT.
NASA Astrophysics Data System (ADS)
Mansourian, Leila; Taufik Abdullah, Muhamad; Nurliyana Abdullah, Lili; Azman, Azreen; Mustaffa, Mas Rina
2017-02-01
Pyramid Histogram of Words (PHOW) combines Bag of Visual Words (BoVW) with spatial pyramid matching (SPM) in order to add location information to the extracted features. However, existing PHOW variants are extracted from various color spaces without extracting color information individually; that is, they discard color information, which is an important characteristic of any image and is motivated by human vision. This article concatenates PHOW, a Multi-Scale Dense Scale Invariant Feature Transform (MSDSIFT) histogram, and a proposed Color histogram to improve the performance of existing image classification algorithms. Performance evaluation on several datasets shows that the new approach outperforms existing state-of-the-art methods.
Storyline Visualization: A Compelling Way to Understand Patterns over Time and Space
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2017-10-16
Storyline visualization is a compelling way to understand patterns over time and space. Much effort has been spent developing efficient and aesthetically pleasing layout optimization algorithms. But what if those algorithms are optimizing the wrong things? To answer this question, we conducted a design study with different storyline layout algorithms. We found that layouts guided by our new design principles for storyline visualization outperform those produced by existing methods.
Multiple network alignment via multiMAGNA++.
Vijayan, Vipin; Milenkovic, Tijana
2017-08-21
Network alignment (NA) aims to find a node mapping that identifies topologically or functionally similar network regions between molecular networks of different species. Analogous to genomic sequence alignment, NA can be used to transfer biological knowledge from well- to poorly-studied species between aligned network regions. Pairwise NA (PNA) finds similar regions between two networks, while multiple NA (MNA) can align more than two networks. We focus on MNA. Existing MNA methods aim to maximize total similarity over all aligned nodes (node conservation). They then evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Directly optimizing edge conservation during alignment construction, in addition to node conservation, may result in superior alignments. Thus, we present a novel MNA method called multiMAGNA++ that can achieve this. Indeed, multiMAGNA++ outperforms or is on par with existing MNA methods, while often completing faster. That is, multiMAGNA++ scales well to larger network data and can be parallelized effectively. During method evaluation, we also introduce new MNA quality measures to allow for a fairer comparison of MNA methods than the existing alignment quality measures permit. MultiMAGNA++ code is available on the method's web page at http://nd.edu/~cone/multiMAGNA++/.
Joint histogram-based cost aggregation for stereo matching.
Min, Dongbo; Lu, Jiangbo; Do, Minh N
2013-10-01
This paper presents a novel method for performing efficient cost aggregation in stereo matching. The cost aggregation problem is reformulated from the perspective of a histogram, giving us the potential to reduce the complexity of cost aggregation in stereo matching significantly. Unlike previous methods, which have tried to reduce the complexity in terms of the size of the image and the matching window, our approach focuses on reducing the computational redundancy that exists across the search range, caused by repeated filtering for all the hypotheses. Moreover, we also reduce the complexity of the window-based filtering through an efficient sampling scheme inside the matching window. The tradeoff between accuracy and complexity is extensively investigated by varying the parameters used in the proposed method. Experimental results show that the proposed method provides high-quality disparity maps with low complexity and outperforms existing local methods. This paper also provides new insights into complexity-constrained stereo-matching algorithm design.
Hybrid recommendation methods in complex networks.
Fiasconaro, A; Tumminello, M; Nicosia, V; Latora, V; Mantegna, R N
2015-07-01
We propose two recommendation methods, based on the appropriate normalization of already existing similarity measures, and on the convex combination of the recommendation scores derived from similarity between users and between objects. We validate the proposed measures on three data sets, and we compare the performance of our methods to other recommendation systems recently proposed in the literature. We show that the proposed similarity measures allow us to attain an improvement in performance of up to 20% over existing nonparametric methods, and that the accuracy of a recommendation can vary widely from one specific bipartite network to another, which suggests that a careful choice of the most suitable method is highly relevant for effective recommendation on a given system. Finally, we study how an increasing presence of random links in the network affects the recommendation scores, finding that one of the two recommendation algorithms introduced here can systematically outperform the others on noisy data sets.
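The convex-combination idea above can be sketched as follows; the co-rating similarities and row normalization here are simple stand-ins for the paper's specific normalized measures:

```python
import numpy as np

def normalize(S):
    # Row-normalize a similarity matrix so different measures are comparable.
    rs = S.sum(axis=1, keepdims=True)
    rs[rs == 0] = 1.0
    return S / rs

def hybrid_recommend(R, lam=0.5):
    """Score items by a convex combination of user-based and item-based
    scores. A minimal sketch of the idea above; the paper's similarity
    measures and normalizations differ in detail."""
    user_sim = normalize(R @ R.T)   # user-user co-rating similarity
    item_sim = normalize(R.T @ R)   # item-item co-rating similarity
    user_scores = user_sim @ R      # propagate ratings via similar users
    item_scores = R @ item_sim      # propagate ratings via similar items
    return lam * user_scores + (1 - lam) * item_scores

# Toy 3-user x 4-item rating matrix (1 = rated/liked).
R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]], dtype=float)
scores = hybrid_recommend(R)
# scores[0] ranks items 2 and 3 for user 0, who has not rated them
```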
FIND: difFerential chromatin INteractions Detection using a spatial Poisson process
Djekidel, Mohamed Nadhir; Chen, Yang; Zhang, Michael Q.
2018-01-01
Polymer-based simulations and experimental studies indicate the existence of a spatial dependency between the adjacent DNA fibers involved in the formation of chromatin loops. However, the existing strategies for detecting differential chromatin interactions assume that the interacting segments are spatially independent from the other segments nearby. To resolve this issue, we developed a new computational method, FIND, which considers the local spatial dependency between interacting loci. FIND uses a spatial Poisson process to detect differential chromatin interactions that show a significant difference in their interaction frequency and the interaction frequency of their neighbors. Simulation and biological data analysis show that FIND outperforms the widely used count-based methods and has a better signal-to-noise ratio. PMID:29440282
A hybrid frame concealment algorithm for H.264/AVC.
Yan, Bo; Gharavi, Hamid
2010-01-01
In packet-based video transmissions, packet loss due to channel errors may result in the loss of a whole video frame. Recently, many error concealment algorithms have been proposed to combat channel errors; however, most existing algorithms can only deal with the loss of macroblocks and are not able to conceal a whole missing frame. To resolve this problem, in this paper we propose a new hybrid motion vector extrapolation (HMVE) algorithm to recover the whole missing frame; it provides more accurate estimation of the motion vectors of the missing frame than other conventional methods. Simulation results show that it is highly effective and significantly outperforms other existing frame recovery methods.
Classifying medical relations in clinical text via convolutional neural networks.
He, Bin; Guan, Yi; Dai, Rui
2018-05-16
Deep learning research on relation classification has achieved solid performance in the general domain. This study proposes a convolutional neural network (CNN) architecture with a multi-pooling operation for medical relation classification on clinical records, and explores a loss function with a category-level constraint matrix. Experiments using the 2010 i2b2/VA relation corpus demonstrate that these models, which do not depend on any external features, outperform previous single-model methods, and that our best model is competitive with the existing ensemble-based method. Copyright © 2018. Published by Elsevier B.V.
Waytowich, Nicholas R.; Lawhern, Vernon J.; Bohannon, Addison W.; Ball, Kenneth R.; Lance, Brent J.
2016-01-01
Recent advances in signal processing and machine learning techniques have enabled the application of Brain-Computer Interface (BCI) technologies to fields such as medicine, industry, and recreation; however, BCIs still suffer from the requirement of frequent calibration sessions due to the intra- and inter-individual variability of brain-signals, which makes calibration suppression through transfer learning an area of increasing interest for the development of practical BCI systems. In this paper, we present an unsupervised transfer method (spectral transfer using information geometry, STIG), which ranks and combines unlabeled predictions from an ensemble of information geometry classifiers built on data from individual training subjects. The STIG method is validated in both off-line and real-time feedback analysis during a rapid serial visual presentation task (RSVP). For detection of single-trial, event-related potentials (ERPs), the proposed method can significantly outperform existing calibration-free techniques as well as outperform traditional within-subject calibration techniques when limited data is available. This method demonstrates that unsupervised transfer learning for single-trial detection in ERP-based BCIs can be achieved without the requirement of costly training data, representing a step-forward in the overall goal of achieving a practical user-independent BCI system. PMID:27713685
ERIC Educational Resources Information Center
Kember, David
2016-01-01
One of the major current issues in education is the question of why Chinese and East Asian students are outperforming those from Western countries. Research into the approaches to learning of Chinese students revealed the existence of intermediate approaches, combining memorising and understanding, which were distinct from rote learning. At the…
Hybrid statistics-simulations based method for atom-counting from ADF STEM images.
De Wael, Annelies; De Backer, Annick; Jones, Lewys; Nellist, Peter D; Van Aert, Sandra
2017-06-01
A hybrid statistics-simulations based method for atom-counting from annular dark field scanning transmission electron microscopy (ADF STEM) images of monotype crystalline nanostructures is presented. Different atom-counting methods already exist for model-like systems. However, the increasing relevance of radiation damage in the study of nanostructures demands a method that allows atom-counting from low dose images with a low signal-to-noise ratio. Therefore, the hybrid method directly includes prior knowledge from image simulations into the existing statistics-based method for atom-counting, and accounts in this manner for possible discrepancies between actual and simulated experimental conditions. It is shown by means of simulations and experiments that this hybrid method outperforms the statistics-based method, especially for low electron doses and small nanoparticles. The analysis of a simulated low dose image of a small nanoparticle suggests that this method allows for far more reliable quantitative analysis of beam-sensitive materials. Copyright © 2017 Elsevier B.V. All rights reserved.
Analysis of Genome-Wide Association Studies with Multiple Outcomes Using Penalization
Liu, Jin; Huang, Jian; Ma, Shuangge
2012-01-01
Genome-wide association studies have been extensively conducted, searching for markers of biologically meaningful outcomes and phenotypes. Penalization methods have been adopted to analyze the joint effects of a large number of SNPs (single nucleotide polymorphisms) and to identify markers. This study is partly motivated by the analysis of the heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables, and an efficient computational algorithm is developed. A simulation study and the analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods. PMID:23272092
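As an illustration of group-Lasso selection across correlated outcomes, scikit-learn's `MultiTaskLasso` applies an L2,1 penalty so that a SNP is selected for all phenotypes or for none. The simulated data and `alpha` below are invented for the sketch; this is not the paper's algorithm or dataset:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, p, q = 100, 30, 3                   # samples, SNPs, correlated phenotypes
X = rng.standard_normal((n, p))
B = np.zeros((p, q))
B[:3] = rng.standard_normal((3, q))    # only the first 3 SNPs matter
Y = X @ B + 0.1 * rng.standard_normal((n, q))

# The L21 penalty groups each SNP's coefficients across all phenotypes,
# so markers are selected jointly for all outcome variables.
model = MultiTaskLasso(alpha=0.1).fit(X, Y)
selected = np.flatnonzero(np.linalg.norm(model.coef_.T, axis=1) > 1e-8)
# `selected` should recover (a subset of) the first three SNPs
```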
Schouten, Kim; van der Weijde, Onne; Frasincar, Flavius; Dekker, Rommert
2018-04-01
Using online consumer reviews as electronic word of mouth to assist purchase decision making has become increasingly popular. The Web provides an extensive source of consumer reviews, but one can hardly read all reviews to obtain a fair evaluation of a product or service. A text processing framework that can summarize reviews would therefore be desirable. A subtask to be performed by such a framework would be to find the general aspect categories addressed in review sentences, for which this paper presents two methods. In contrast to most existing approaches, the first method is an unsupervised method that applies association rule mining on co-occurrence frequency data obtained from a corpus to find these aspect categories. While not on par with state-of-the-art supervised methods, the proposed unsupervised method performs better than several simple baselines, a similar but supervised method, and a supervised baseline, with an F1-score of 67%. The second method is a supervised variant that outperforms existing methods with an F1-score of 84%.
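A toy sketch of the unsupervised idea: mine word-to-category association rules from sentence co-occurrence counts and keep those above a confidence threshold. The corpus, seed category labels, and threshold below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Tiny invented corpus: each sentence is a set of content words, some of
# which are the aspect-category labels themselves ("service", "food").
sentences = [
    ("waiter", "rude", "service"),
    ("service", "slow", "waiter"),
    ("pizza", "cold", "food"),
    ("food", "tasty", "pizza"),
    ("service", "friendly"),
]
MIN_CONF = 0.5

pair_counts = Counter()
word_counts = Counter()
for s in sentences:
    words = set(s)
    word_counts.update(words)
    pair_counts.update(combinations(sorted(words), 2))

def rules_for(category):
    """Words implying `category` with confidence >= MIN_CONF."""
    out = {}
    for (a, b), c in pair_counts.items():
        for w, cat in ((a, b), (b, a)):
            if cat == category and w != category:
                conf = c / word_counts[w]
                if conf >= MIN_CONF:
                    out[w] = conf
    return out

service_rules = rules_for("service")
# e.g. "waiter" -> service with confidence 1.0 in this toy corpus
```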
Walking on a user similarity network towards personalized recommendations.
Gan, Mingxin
2014-01-01
Personalized recommender systems have been receiving more and more attention in addressing the serious problem of information overload accompanying the rapid evolution of the world-wide-web. Although traditional collaborative filtering approaches based on similarities between users have achieved remarkable success, it has been shown that the existence of popular objects may adversely influence the correct scoring of candidate objects, leading to unreasonable recommendation results. Meanwhile, recent advances have demonstrated that approaches based on diffusion and random walk processes exhibit superior performance over collaborative filtering methods in both recommendation accuracy and diversity. Building on these results, we adopt three strategies (power-law adjustment, nearest neighbor, and threshold filtration) to adjust a user similarity network built from user similarity scores calculated on historical data, and then propose a random walk with restart model on the constructed network to achieve personalized recommendations. We perform cross-validation experiments on two real data sets (MovieLens and Netflix) and compare the performance of our method against the existing state-of-the-art methods. Results show that our method outperforms existing methods not only in recommendation accuracy and diversity, but also in retrieval performance.
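The random walk with restart model can be sketched as the fixed point of p = alpha*W*p + (1-alpha)*e, where e restarts the walker at the seed user. The restart probability and stopping rule below are illustrative choices, and the tiny network is a stand-in for a user similarity network built with the three adjustment strategies above:

```python
import numpy as np

def rwr(W, seed, alpha=0.85, tol=1e-10):
    """Random walk with restart: with probability alpha the walker follows
    a similarity edge, otherwise it restarts at the seed user. A generic
    sketch, not the paper's exact model or parameters."""
    W = W / W.sum(axis=0, keepdims=True)       # column-normalize
    e = np.zeros(W.shape[0])
    e[seed] = 1.0
    p = e.copy()
    while True:
        p_next = alpha * (W @ p) + (1 - alpha) * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next

# Tiny 4-user similarity network (symmetric, unweighted for simplicity).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
p = rwr(W, seed=0)
# p gives a steady-state relevance of every user to user 0
```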
Yamagata, Koichi; Yamanishi, Ayako; Kokubu, Chikara; Takeda, Junji; Sese, Jun
2016-01-01
An important challenge in cancer genomics is precise detection of structural variations (SVs) by high-throughput short-read sequencing, which is hampered by the high false discovery rates of existing analysis tools. Here, we propose an accurate SV detection method named COSMOS, which compares the statistics of the mapped read pairs in tumor samples with isogenic normal control samples in a distinct asymmetric manner. COSMOS also prioritizes the candidate SVs using strand-specific read-depth information. Performance tests on modeled tumor genomes revealed that COSMOS outperformed existing methods in terms of F-measure. We also applied COSMOS to an experimental mouse cell-based model, in which SVs were induced by genome engineering and gamma-ray irradiation, followed by polymerase chain reaction-based confirmation. The precision of COSMOS was 84.5%, while the next best existing method was 70.4%. Moreover, the sensitivity of COSMOS was the highest, indicating that COSMOS has great potential for cancer genome analysis. PMID:26833260
Mao, Wenzhi; Kaya, Cihan; Dutta, Anindita; Horovitz, Amnon; Bahar, Ivet
2015-06-15
With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings. Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs; whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources. Software is freely available through the Evol component of ProDy API. © The Author 2015. Published by Oxford University Press.
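The traditional mutual-information coevolution signal mentioned above can be illustrated directly on alignment columns (without the shuffling or background corrections that the text notes as useful refinements; the toy columns are invented):

```python
import numpy as np
from collections import Counter

def column_mi(col_a, col_b):
    """Mutual information (in bits) between two MSA columns, the classic
    pairwise coevolution signal. No phylogenetic or APC correction."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        p_ab = c / n
        # p_ab * n * n / (pa * pb) == p(a,b) / (p(a) * p(b))
        mi += p_ab * np.log2(p_ab * n * n / (pa[a] * pb[b]))
    return mi

# Two perfectly covarying columns vs. an independent one.
colX = list("AAAABBBB")
colY = list("CCCCDDDD")   # follows colX exactly -> strong signal
colZ = list("EFEFEFEF")   # independent of colX -> no signal
mi_xy = column_mi(colX, colY)
mi_xz = column_mi(colX, colZ)
```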
Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs
Gómez-Adorno, Helena; Sidorov, Grigori; Pinto, David; Vilariño, Darnes; Gelbukh, Alexander
2016-01-01
We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and apply them to determine the authors of documents. On average, our method outperforms state-of-the-art approaches and gives consistently high results across different corpora, unlike existing methods. Our results show that our textual patterns are useful for the task of authorship attribution. PMID:27589740
Gaussian Multiscale Aggregation Applied to Segmentation in Hand Biometrics
de Santos Sierra, Alberto; Ávila, Carmen Sánchez; Casanova, Javier Guerra; del Pozo, Gonzalo Bailador
2011-01-01
This paper presents an image segmentation algorithm based on Gaussian multiscale aggregation oriented to hand biometric applications. The method is able to isolate the hand from a wide variety of background textures such as carpets, fabric, glass, grass, soil or stones. The evaluation was carried out by using a publicly available synthetic database with 408,000 hand images in different backgrounds, comparing the performance in terms of accuracy and computational cost to two competitive segmentation methods existing in literature, namely Lossy Data Compression (LDC) and Normalized Cuts (NCuts). The results highlight that the proposed method outperforms current competitive segmentation methods with regard to computational cost, time performance, accuracy and memory usage. PMID:22247658
A new distributed systems scheduling algorithm: a swarm intelligence approach
NASA Astrophysics Data System (ADS)
Haghi Kashani, Mostafa; Sarvizadeh, Raheleh; Jameii, Mahdi
2011-12-01
The scheduling problem in distributed systems is known to be NP-complete, and methods based on heuristic or metaheuristic search have been proposed to obtain optimal and suboptimal solutions. Task scheduling is a key factor for distributed systems to gain better performance. In this paper, an efficient method based on a memetic algorithm is developed to solve the problem of distributed systems scheduling. To handle load balancing efficiently, the Artificial Bee Colony (ABC) algorithm has been applied as the local search in the proposed memetic algorithm. The proposed method has been compared to an existing memetic-based approach in which the Learning Automata method has been used as local search. The results demonstrate that the proposed method outperforms the above-mentioned method in terms of communication cost.
Sengupta Chattopadhyay, Amrita; Hsiao, Ching-Lin; Chang, Chien Ching; Lian, Ie-Bin; Fann, Cathy S J
2014-01-01
Identifying susceptibility genes that influence complex diseases is extremely difficult because loci often influence the disease state through genetic interactions. Numerous approaches to detect disease-associated SNP-SNP interactions have been developed, but none consistently generates high-quality results under different disease scenarios. Using summarizing techniques to combine a number of existing methods may provide a solution to this problem. Here we used three popular non-parametric methods (Gini, absolute probability difference (APD), and entropy) to develop two novel summary scores, namely the principal component score (PCS) and the Z-sum score (ZSS), with which to predict disease-associated genetic interactions. We used a simulation study to compare performance of the non-parametric scores, the summary scores, the scaled-sum score (SSS; used in polymorphism interaction analysis (PIA)), and multifactor dimensionality reduction (MDR). The non-parametric methods achieved high power, but no single non-parametric method outperformed all others under a variety of epistatic scenarios. PCS and ZSS, however, outperformed MDR. PCS, ZSS and SSS displayed controlled type-I errors (<0.05), whereas GS, APDS and ES did not (>0.05). A real data study using the genetic-analysis-workshop 16 (GAW 16) rheumatoid arthritis dataset identified a number of interesting SNP-SNP interactions. © 2013 Elsevier B.V. All rights reserved.
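The two summary scores can be sketched as follows, assuming the per-method statistics (e.g. Gini, APD, entropy) are collected in a pairs-by-methods matrix; the exact standardization used in the paper may differ:

```python
import numpy as np

def z_sum_score(scores):
    """ZSS: z-standardize each method's column, then sum per SNP pair.

    scores: (n_pairs, n_methods) matrix of raw statistics.
    """
    S = np.asarray(scores, dtype=float)
    z = (S - S.mean(axis=0)) / S.std(axis=0)
    return z.sum(axis=1)

def pc_score(scores):
    """PCS: projection of the centered score matrix onto its first
    principal component (via SVD)."""
    S = np.asarray(scores, dtype=float)
    S = S - S.mean(axis=0)
    _, _, vt = np.linalg.svd(S, full_matrices=False)
    return S @ vt[0]
```

Pairs with a high summary score across methods are then prioritized as candidate interactions.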
EMUDRA: Ensemble of Multiple Drug Repositioning Approaches to Improve Prediction Accuracy.
Zhou, Xianxiao; Wang, Minghui; Katsyv, Igor; Irie, Hanna; Zhang, Bin
2018-04-24
Availability of large-scale genomic, epigenetic and proteomic data in complex diseases makes it possible to objectively and comprehensively identify therapeutic targets that can lead to new therapies. The Connectivity Map has been widely used to explore novel indications of existing drugs. However, the prediction accuracy of existing methods, such as the Kolmogorov-Smirnov statistic, remains low. Here we present a novel high-performance drug repositioning approach that improves over the state-of-the-art methods. We first designed an expression-weighted cosine method (EWCos) to minimize the influence of uninformative expression changes and then developed an ensemble approach termed EMUDRA (Ensemble of Multiple Drug Repositioning Approaches) to integrate EWCos and three existing state-of-the-art methods. EMUDRA significantly outperformed individual drug repositioning methods when applied to simulated and independent evaluation datasets. Using EMUDRA, we predicted and then experimentally validated the antibiotic rifabutin as an inhibitor of cell growth in triple-negative breast cancer. EMUDRA can identify drugs that more effectively target disease gene signatures and will thus be a useful tool for identifying novel therapies for complex diseases and predicting new indications for existing drugs. The EMUDRA R package is available at doi:10.7303/syn11510888. Contact: bin.zhang@mssm.edu or zhangb@hotmail.com. Supplementary data are available at Bioinformatics online.
2017-01-01
Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014
Walking on a User Similarity Network towards Personalized Recommendations
Gan, Mingxin
2014-01-01
Personalized recommender systems have been receiving more and more attention in addressing the serious problem of information overload accompanying the rapid evolution of the World Wide Web. Although traditional collaborative filtering approaches based on similarities between users have achieved remarkable success, it has been shown that the existence of popular objects may adversely influence the correct scoring of candidate objects, which leads to unreasonable recommendation results. Meanwhile, recent advances have demonstrated that approaches based on diffusion and random walk processes exhibit superior performance over collaborative filtering methods in both recommendation accuracy and diversity. Building on these results, we adopt three strategies (power-law adjustment, nearest neighbor, and threshold filtration) to adjust a user similarity network from user similarity scores calculated on historical data, and then propose a random walk with restart model on the constructed network to achieve personalized recommendations. We perform cross-validation experiments on two real data sets (MovieLens and Netflix) and compare the performance of our method against the existing state-of-the-art methods. Results show that our method outperforms existing methods in not only recommendation accuracy and diversity, but also retrieval performance. PMID:25489942
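A random walk with restart on a weighted user similarity network can be sketched as follows. The restart probability of 0.15 and the column normalization are conventional choices, not taken from the paper:

```python
import numpy as np

def random_walk_with_restart(W, seed_idx, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a restarting random walk on a
    weighted similarity network W (square, nonnegative), seeded at one user."""
    W = np.asarray(W, dtype=float)
    # column-normalize so each column is a transition distribution
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    P = W / col_sums
    n = W.shape[0]
    e = np.zeros(n)
    e[seed_idx] = 1.0          # restart vector: the target user
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * P @ p + restart * e
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p
```

Ranking candidate objects by the stationary probabilities of a walk seeded at the target user then yields the personalized recommendation order.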
Towards an Automated Acoustic Detection System for Free Ranging Elephants.
Zeppelzauer, Matthias; Hensman, Sean; Stoeger, Angela S
The human-elephant conflict is one of the most serious conservation problems in Asia and Africa today. The involuntary confrontation of humans and elephants claims the lives of many animals and humans every year. A promising approach to alleviate this conflict is the development of an acoustic early warning system. Such a system requires the robust automated detection of elephant vocalizations under unconstrained field conditions. Today, no system exists that fulfills these requirements. In this paper, we present a method for the automated detection of elephant vocalizations that is robust to the diverse noise sources present in the field. We evaluate the method on a dataset recorded under natural field conditions to simulate a real-world scenario. The proposed method outperformed existing approaches and robustly and accurately detected elephants. It thus can form the basis for a future automated early warning system for elephants. Furthermore, the method may be a useful tool for scientists in bioacoustics for the study of wildlife recordings.
The Filament Sensor for Near Real-Time Detection of Cytoskeletal Fiber Structures
Eltzner, Benjamin; Wollnik, Carina; Gottschlich, Carsten; Huckemann, Stephan; Rehfeldt, Florian
2015-01-01
A reliable extraction of filament data from microscopic images is of high interest in the analysis of acto-myosin structures as early morphological markers in mechanically guided differentiation of human mesenchymal stem cells and the understanding of the underlying fiber arrangement processes. In this paper, we propose the filament sensor (FS), a fast and robust processing sequence which detects and records location, orientation, length, and width for each single filament of an image, and thus allows for the above described analysis. The extraction of these features has previously not been possible with existing methods. We evaluate the performance of the proposed FS in terms of accuracy and speed in comparison to three existing methods with respect to their limited output. Further, we provide a benchmark dataset of real cell images along with filaments manually marked by a human expert as well as simulated benchmark images. The FS clearly outperforms existing methods in terms of computational runtime and filament extraction accuracy. The implementation of the FS and the benchmark database are available as open source. PMID:25996921
Yamagata, Koichi; Yamanishi, Ayako; Kokubu, Chikara; Takeda, Junji; Sese, Jun
2016-05-05
An important challenge in cancer genomics is precise detection of structural variations (SVs) by high-throughput short-read sequencing, which is hampered by the high false discovery rates of existing analysis tools. Here, we propose an accurate SV detection method named COSMOS, which compares the statistics of the mapped read pairs in tumor samples with isogenic normal control samples in a distinct asymmetric manner. COSMOS also prioritizes the candidate SVs using strand-specific read-depth information. Performance tests on modeled tumor genomes revealed that COSMOS outperformed existing methods in terms of F-measure. We also applied COSMOS to an experimental mouse cell-based model, in which SVs were induced by genome engineering and gamma-ray irradiation, followed by polymerase chain reaction-based confirmation. The precision of COSMOS was 84.5%, while the next best existing method was 70.4%. Moreover, the sensitivity of COSMOS was the highest, indicating that COSMOS has great potential for cancer genome analysis. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
On the Quantification of Cellular Velocity Fields.
Vig, Dhruv K; Hamby, Alex E; Wolgemuth, Charles W
2016-04-12
The application of flow visualization in biological systems is becoming increasingly common in studies ranging from intracellular transport to the movements of whole organisms. In cell biology, the standard method for measuring cell-scale flows and/or displacements has been particle image velocimetry (PIV); however, alternative methods exist, such as optical flow constraint. Here we review PIV and optical flow, focusing on the accuracy and efficiency of these methods in the context of cellular biophysics. Although optical flow is not as common, a relatively simple implementation of this method can outperform PIV and is easily augmented to extract additional biophysical/chemical information such as local vorticity or net polymerization rates from speckle microscopy. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
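The optical flow constraint mentioned in the review admits a compact window-based least-squares solution. This is a textbook Lucas-Kanade style sketch under the brightness-constancy assumption, not the implementation evaluated in the paper:

```python
import numpy as np

def optical_flow_lk(I0, I1, y, x, win=2):
    """Estimate the displacement (vy, vx) at pixel (y, x) by solving the
    optical flow constraint  Iy*vy + Ix*vx + It = 0  in least squares
    over a (2*win+1)^2 window."""
    Iy, Ix = np.gradient(I0.astype(float))     # spatial gradients
    It = I1.astype(float) - I0.astype(float)   # temporal difference
    ys = slice(y - win, y + win + 1)
    xs = slice(x - win, x + win + 1)
    A = np.stack([Iy[ys, xs].ravel(), Ix[ys, xs].ravel()], axis=1)
    b = -It[ys, xs].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimum-norm solution
    return v  # (vy, vx)
```

For a pure horizontal translation of a smooth ramp, the recovered flow is exact, which is the kind of controlled case where optical flow can outperform window-correlation PIV at lower cost.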
Multiplicative noise removal via a learned dictionary.
Huang, Yu-Mei; Moisan, Lionel; Ng, Michael K; Zeng, Tieyong
2012-11-01
Multiplicative noise removal is a challenging image processing problem, and most existing methods are based on the maximum a posteriori formulation and the logarithmic transformation of multiplicative denoising problems into additive denoising problems. Sparse representations of images have been shown to be efficient approaches for image recovery. Following this idea, in this paper we propose to learn a dictionary from the log-transformed image, and then to use it in a variational model built for noise removal. Extensive experimental results suggest that in terms of visual quality, peak signal-to-noise ratio, and mean absolute deviation error, the proposed algorithm outperforms state-of-the-art methods.
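The logarithmic reduction from multiplicative to additive denoising described above can be sketched as follows; a simple 3x3 mean filter stands in for the paper's learned-dictionary variational model:

```python
import numpy as np

def log_domain_denoise(noisy, denoiser):
    """Multiplicative noise: y = x * n. Taking logs gives
    log y = log x + log n, an additive problem, so any additive
    denoiser can be applied in the log domain and inverted with exp."""
    z = np.log(np.maximum(noisy, 1e-12))  # avoid log(0)
    z_hat = denoiser(z)
    return np.exp(z_hat)

def mean_filter(img):
    """A trivial additive denoiser stand-in: 3x3 box average."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0
```

Any stronger additive denoiser (such as one based on a learned dictionary) can be dropped in for `mean_filter` without changing the surrounding logic.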
Rational-operator-based depth-from-defocus approach to scene reconstruction.
Li, Ang; Staunton, Richard; Tjahjadi, Tardi
2013-09-01
This paper presents a rational-operator-based approach to depth from defocus (DfD) for the reconstruction of three-dimensional scenes from two-dimensional images, which enables fast DfD computation that is independent of scene textures. Two variants of the approach, one using the Gaussian rational operators (ROs) that are based on the Gaussian point spread function (PSF) and the second based on the generalized Gaussian PSF, are considered. A novel DfD correction method is also presented to further improve the performance of the approach. Experimental results are considered for real scenes and show that both approaches outperform existing RO-based methods.
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2016-09-01
Deep Convolutional Neural Networks (DCNNs) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNNs with Conditional Random Fields (CRFs), for sequence labeling with an imbalanced label distribution. The widely used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, since their label distributions are highly imbalanced, and has similar performance to the other two training methods on solvent accessibility prediction, which has three equally distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
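The pairwise-ranking view of AUC can be sketched as below. A sigmoid stands in for the paper's polynomial approximation of the 0/1 step function; this is an illustrative surrogate, not the DeepCNF training code:

```python
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Exact empirical AUC: fraction of (positive, negative) pairs
    ranked correctly, with ties counted as half."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return (sp > sn).mean() + 0.5 * (sp == sn).mean()

def pairwise_auc_surrogate(scores_pos, scores_neg):
    """Smooth surrogate: replace the step 1[sp > sn] with a sigmoid of
    the score difference so gradients can flow through the scores."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return 1.0 / (1.0 + np.exp(-(sp - sn)))  # per-pair soft correctness
```

Maximizing the mean of the surrogate pushes positive scores above negative ones, which directly targets the ranking quality that AUC measures rather than per-label likelihood.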
Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge
2013-01-01
This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392
Brain medical image diagnosis based on corners with importance-values.
Gao, Linlin; Pan, Haiwei; Li, Qing; Xie, Xiaoqin; Zhang, Zhiqiang; Han, Jinming; Zhai, Xiao
2017-11-21
Brain disorders are one of the top causes of human death. Generally, neurologists analyze brain medical images for diagnosis. In the image analysis field, corners are among the most important features, which makes corner detection and matching studies essential. However, existing corner detection studies do not consider the domain information of the brain. This leads to many useless corners and the loss of significant information. Regarding corner matching, the uncertainty and structure of the brain are not employed in existing methods. Moreover, most corner matching studies are used for 3D image registration; they are inapplicable to 2D brain image diagnosis because of the different mechanisms. To address these problems, we propose a novel corner-based brain medical image classification method. Specifically, we automatically extract multilayer texture images (MTIs) which embody diagnostic information from neurologists. Moreover, we present a corner matching method utilizing the uncertainty and structure of brain medical images and a bipartite graph model. Finally, we propose a similarity calculation method for diagnosis. Brain CT and MRI image sets are utilized to evaluate the proposed method. First, classifiers are trained in N-fold cross-validation analysis to produce the best θ and K. Then independent brain image sets are tested to evaluate the classifiers. Moreover, the classifiers are also compared with advanced brain image classification studies. For the brain CT image set, the proposed classifier outperforms the comparison methods by at least 8% on accuracy and 2.4% on F1-score. Regarding the brain MRI image set, the proposed classifier is superior to the comparison methods by more than 7.3% on accuracy and 4.9% on F1-score. Results also demonstrate that the proposed method is robust to different intensity ranges of brain medical images. In this study, we develop a robust corner-based brain medical image classifier.
Specifically, we propose a corner detection method utilizing the diagnostic information from neurologists and a corner matching method based on the uncertainty and structure of brain medical images. Additionally, we present a similarity calculation method for brain image classification. Experimental results on two brain image sets show the proposed corner-based brain medical image classifier outperforms the state-of-the-art studies.
A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.
Hart, Emma; Sim, Kevin
2016-01-01
We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
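The divide-and-conquer idea, with each heuristic covering the subset of instances it handles best, can be sketched as a simple assignment rule. This illustrates only the principle, not the NELLI-GP evolutionary machinery; lower scores (e.g. makespans) are assumed better:

```python
def assign_instances(heuristics, instances, score):
    """Divide-and-conquer ensemble: route each instance to the heuristic
    that achieves the best (lowest) score on it, so every heuristic ends
    up responsible for a unique subset of the instance set."""
    assignment = {}
    for inst in instances:
        assignment[inst] = min(heuristics, key=lambda h: score(h, inst))
    return assignment
```

In the full method the heuristics themselves are evolved sequences of dispatching rules; here they are opaque callables scored by a user-supplied objective.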
Product component genealogy modeling and field-failure prediction
DOE Office of Scientific and Technical Information (OSTI.GOV)
King, Caleb; Hong, Yili; Meeker, William Q.
Many industrial products consist of multiple components that are necessary for system operation. There is an abundance of literature on modeling the lifetime of such components through competing risks models. During the life-cycle of a product, it is common for there to be incremental design changes to improve reliability, to reduce costs, or due to changes in availability of certain part numbers. These changes can affect product reliability but are often ignored in system lifetime modeling. By incorporating this information about changes in part numbers over time (information that is readily available in most production databases), better accuracy can be achieved in predicting time to failure, thus yielding more accurate field-failure predictions. This paper presents methods for estimating parameters and predictions for this generational model and a comparison with existing methods through the use of simulation. Our results indicate that the generational model has important practical advantages and outperforms the existing methods in predicting field failures.
Gene regulatory network identification from the yeast cell cycle based on a neuro-fuzzy system.
Wang, B H; Lim, J W; Lim, J S
2016-08-30
Many studies exist for reconstructing gene regulatory networks (GRNs). In this paper, we propose a method based on an advanced neuro-fuzzy system, for gene regulatory network reconstruction from microarray time-series data. This approach uses a neural network with a weighted fuzzy function to model the relationships between genes. Fuzzy rules, which determine the regulators of genes, are very simplified through this method. Additionally, a regulator selection procedure is proposed, which extracts the exact dynamic relationship between genes, using the information obtained from the weighted fuzzy function. Time-series related features are extracted from the original data to employ the characteristics of temporal data that are useful for accurate GRN reconstruction. The microarray dataset of the yeast cell cycle was used for our study. We measured the mean squared prediction error for the efficiency of the proposed approach and evaluated the accuracy in terms of precision, sensitivity, and F-score. The proposed method outperformed the other existing approaches.
NASA Astrophysics Data System (ADS)
Liu, Miaofeng
2017-07-01
In recent years, deep convolutional neural networks have come into use for image inpainting and super-resolution in many fields. Unlike most earlier methods, which require the local information of corrupted pixels to be known beforehand, we propose a 20-layer fully convolutional network that learns an end-to-end mapping from a dataset of damaged/ground-truth subimage pairs, realizing non-local blind inpainting and super-resolution. Because existing approaches often perform poorly on images with severe corruption, or when inpainting a low-resolution image, we also share parameters within local areas of layers to achieve spatial recursion and enlarge the receptive field. To ease the training of this deep network, skip connections are designed between symmetric convolutional layers. Experimental results show that the proposed method outperforms state-of-the-art methods under diverse corruption and low-resolution conditions, and it works particularly well when performing super-resolution and image inpainting simultaneously.
Sub-pattern based multi-manifold discriminant analysis for face recognition
NASA Astrophysics Data System (ADS)
Dai, Jiangyan; Guo, Changlu; Zhou, Wei; Shi, Yanjiao; Cong, Lin; Yi, Yugen
2018-04-01
In this paper, we present a Sub-pattern based Multi-manifold Discriminant Analysis (SpMMDA) algorithm for face recognition. Unlike the existing Multi-manifold Discriminant Analysis (MMDA) approach, which is based on holistic information of the face image for recognition, SpMMDA operates on sub-images partitioned from the original face image and then extracts the discriminative local features from the sub-images separately. Moreover, the structure information of different sub-images from the same face image is considered in the proposed method with the aim of further improving the recognition performance. Extensive experiments on three standard face databases (Extended YaleB, CMU PIE and AR) demonstrate that the proposed method is effective and outperforms some other sub-pattern based face recognition methods.
A Penalized Robust Method for Identifying Gene-Environment Interactions
Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Xie, Yang; Ma, Shuangge
2015-01-01
In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model mis-specification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computational feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications. PMID:24616063
BPP: a sequence-based algorithm for branch point prediction.
Zhang, Qing; Fan, Xiaodan; Wang, Yejun; Sun, Ming-An; Shao, Jianlin; Guo, Dianjing
2017-10-15
Although high-throughput sequencing methods have been proposed to identify splicing branch points in the human genome, these methods can only detect a small fraction of the branch points subject to the sequencing depth, experimental cost and the expression level of the mRNA. An accurate computational model for branch point prediction is therefore an ongoing objective in human genome research. We here propose a novel branch point prediction algorithm that utilizes information on the branch point sequence and the polypyrimidine tract. Using experimentally validated data, we demonstrate that our proposed method outperforms existing methods. Availability and implementation: https://github.com/zhqingit/BPP. djguo@cuhk.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Identification of influential users by neighbors in online social networks
NASA Astrophysics Data System (ADS)
Sheikhahmadi, Amir; Nematbakhsh, Mohammad Ali; Zareie, Ahmad
2017-11-01
Identification and ranking of influential users in social networks for the sake of news spreading and advertising has recently become an attractive field of research. Given the large number of users in social networks and the various relations that exist among them, providing an effective method to identify influential users has gradually come to be considered an essential factor. In most of the previously proposed methods, users who are located in an appropriate structural position of the network are regarded as influential users. These methods do not usually pay attention to the interactions among users, and also treat those relations as binary in nature. This paper, therefore, proposes a new method to identify influential users in a social network by considering the interactions that exist among the users. Since users tend to act within the frame of communities, the network is initially divided into different communities. Then the amount of interaction among users is used as a parameter to set the weight of relations existing within the network. Afterward, by determining the neighbors' role for each user, a two-level method is proposed for both detecting users' influence and ranking them. Simulation and experimental results on Twitter data show that users selected by the proposed method are, compared with those selected by existing methods, distributed at more appropriate distances from one another. Moreover, the proposed method outperforms the others in terms of both the influence speed and the influence capacity of the users it selects.
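The two-level idea, scoring a user first by its own weighted interactions and then by the strength of its neighbors, can be sketched as follows (a simplified illustration; the paper's community detection step and exact weighting scheme are omitted):

```python
def influence_scores(weights):
    """weights: dict mapping user -> {neighbor: interaction_weight}.

    Level 1: each user's own total interaction strength.
    Level 2: add the interaction strength of its neighbors, weighted by
    the tie to each neighbor, so users with strong neighbors rank higher.
    """
    own = {u: sum(nbrs.values()) for u, nbrs in weights.items()}
    return {u: own[u] + sum(own.get(v, 0.0) * w for v, w in nbrs.items())
            for u, nbrs in weights.items()}
```

On a star-shaped interaction graph the hub accumulates both its own strength and that of all leaves, so it ranks first, matching the intuition behind neighbor-aware influence measures.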
Mapping quantitative trait loci for traits defined as ratios.
Yang, Runqing; Li, Jiahan; Xu, Shizhong
2008-03-01
Many traits are defined as ratios of two quantitative traits. Methods of QTL mapping developed for regular quantitative traits are not optimal when applied to ratios, because traits defined as ratios lack normality. We develop a new method of QTL mapping for traits defined as ratios. The new method uses a special linear combination of the two component traits, and thus takes advantage of the normality of the new variable. A simulation study shows that the new method can substantially increase the statistical power of QTL detection relative to the method that treats ratios as regular quantitative traits. The new method also outperforms the method that uses the Box-Cox-transformed ratio as the phenotype. A real example of QTL mapping for relative growth rate in soybean demonstrates that the new method can detect more QTL than existing methods of QTL mapping for traits defined as ratios.
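The normality argument can be illustrated with a small simulation: a ratio of two normal component traits is skewed, while a linear combination of them remains approximately normal. The slope `r` below is one simple illustrative choice, not the paper's derived combination, and the trait distributions are hypothetical.

```python
import random

def skewness(xs):
    """Population sample skewness of a list of values."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / (n * s2 ** 1.5)

random.seed(1)
num = [random.gauss(20.0, 4.0) for _ in range(20000)]  # component trait 1
den = [random.gauss(10.0, 2.0) for _ in range(20000)]  # component trait 2

ratio = [a / b for a, b in zip(num, den)]   # ratio trait: noticeably skewed
r = sum(num) / sum(den)                      # illustrative slope for the combination
linear = [a - r * b for a, b in zip(num, den)]  # linear combination: near-normal

print(abs(skewness(linear)) < abs(skewness(ratio)))  # → True
```

The linear combination inherits normality from its normal components, which is what lets standard QTL mapping machinery apply directly.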
An image retrieval framework for real-time endoscopic image retargeting.
Ye, Menglong; Johns, Edward; Walter, Benjamin; Meining, Alexander; Yang, Guang-Zhong
2017-08-01
Serial endoscopic examinations of a patient are important for early diagnosis of malignancies in the gastrointestinal tract. However, retargeting for optical biopsy is challenging due to extensive tissue variations between examinations, requiring the method to be tolerant to these changes whilst enabling real-time retargeting. This work presents an image retrieval framework for inter-examination retargeting. We propose both a novel image descriptor that is tolerant of long-term tissue changes and a novel descriptor-matching method that operates in real time. The descriptor is based on histograms generated from regional intensity comparisons over multiple scales, offering stability over long-term appearance changes at the higher levels whilst remaining discriminative at the lower levels. The matching method then learns a hashing function using random forests to compress the descriptor and allow fast image comparison by a simple Hamming distance metric. A dataset containing 13 in vivo gastrointestinal videos was collected from six patients, representing serial examinations of each patient, including videos captured with significant time intervals. Precision-recall results for retargeting show that our new descriptor outperforms a number of alternative descriptors, whilst our hashing method outperforms a number of alternative hashing approaches. We have proposed a novel framework for optical biopsy in serial endoscopic examinations. A new descriptor, combined with a novel hashing method, achieves state-of-the-art retargeting, validated on in vivo videos from six patients. Real-time performance also allows for practical integration without disturbing the existing clinical workflow.
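The fast Hamming comparison step can be sketched generically: once images are hashed to binary codes, nearest-neighbour search reduces to XOR and bit counting. This is only the distance step with toy 8-bit codes, not the authors' random-forest hashing itself.

```python
def hamming(code_a: int, code_b: int) -> int:
    """Hamming distance between two binary hash codes stored as ints."""
    return bin(code_a ^ code_b).count("1")

def nearest(query: int, database: list) -> int:
    """Index of the database code closest to the query in Hamming distance."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))

# Toy 8-bit codes: the second entry differs from the query in only one bit.
db = [0b10110100, 0b11001011, 0b00001111]
query = 0b11001010
print(nearest(query, db))  # → 1
```

Because XOR and popcount are single machine instructions on real hardware, this comparison scales to large databases in real time.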
View-Invariant Gait Recognition Through Genetic Template Segmentation
NASA Astrophysics Data System (ADS)
Isaac, Ebenezer R. H. P.; Elias, Susan; Rajagopalan, Srinivasan; Easwarakumar, K. S.
2017-08-01
Template-based model-free approaches provide by far the most successful solutions to the gait recognition problem in the literature. Recent work discusses how isolating the head and leg portions of the template increases the performance of a gait recognition system, making it robust against covariates such as clothing and carrying conditions. However, most approaches involve a manual definition of the boundaries. The method we propose, genetic template segmentation (GTS), employs a genetic algorithm to automate the boundary selection process. The method was tested on the GEI, GEnI and AEI templates; GEI exhibits the best results when segmented with our approach. Experimental results show that our approach significantly outperforms existing implementations of view-invariant gait recognition.
Dong, Yadong; Sun, Yongqi; Qin, Chao
2018-01-01
The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most unsupervised learning methods assume that protein complexes lie in dense regions of protein-protein interaction (PPI) networks, even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search for new complexes. However, insufficient extracted features, noise in the PPI data, and the incompleteness of complex data make the classification model imprecise and, on its own, insufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. Results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach achieves better performance than state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, in some cases significantly outperforming them.
On piecewise interpolation techniques for estimating solar radiation missing values in Kedah
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saaban, Azizan; Zainudin, Lutfi; Bakar, Mohd Nazari Abu
2014-12-04
This paper discusses the use of a piecewise interpolation method based on cubic Ball and Bézier curve representations to estimate missing values of solar radiation in Kedah. An hourly solar radiation dataset was collected at the Alor Setar Meteorology Station, obtained from the Malaysian Meteorology Department. The piecewise cubic Ball and Bézier functions that interpolate the data points are defined on each hourly interval of solar radiation measurement and are obtained by prescribing first-order derivatives at the start and end of each interval. We compare the performance of our proposed method with existing methods using the Root Mean Squared Error (RMSE) and Coefficient of Determination (CoD) on simulated missing-value datasets. The results show that our method outperforms the previous methods.
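Prescribing endpoint values and first derivatives on each interval determines a unique cubic. The cubic Hermite basis below is an illustrative stand-in for the paper's Ball/Bézier forms (an equivalent cubic parameterization), with hypothetical radiation values.

```python
def hermite(t, y0, y1, d0, d1):
    """Cubic Hermite interpolant on [0, 1] with endpoint values y0, y1
    and endpoint first derivatives d0, d1."""
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y0 + h10 * d0 + h01 * y1 + h11 * d1

# Fill a missing half-hour value between two hourly readings
# (hypothetical numbers; derivatives in units per interval).
y_missing = hermite(0.5, 420.0, 480.0, 60.0, 60.0)
# y_missing == 450.0 here: for data consistent with a linear trend,
# the interpolant reproduces the midpoint exactly.
```

Chaining one such cubic per hourly interval, with derivatives matched at the knots, yields the smooth piecewise curve used to read off missing measurements.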
A New Shape Description Method Using Angular Radial Transform
NASA Astrophysics Data System (ADS)
Lee, Jong-Min; Kim, Whoi-Yul
Shape is one of the primary low-level image features in content-based image retrieval. In this paper we propose a new shape description method that consists of a rotationally invariant angular radial transform descriptor (IARTD). The IARTD is a feature vector that combines the magnitude and aligned phases of the angular radial transform (ART) coefficients. A phase correction scheme is employed to produce the aligned phase so that the IARTD is invariant to rotation. The distance between two IARTDs is defined by combining differences in the magnitudes and aligned phases. In an experiment using the MPEG-7 shape dataset, the proposed method outperforms existing methods; the average BEP of the proposed method is 57.69%, while the average BEPs of the invariant Zernike moments descriptor and the traditional ART are 41.64% and 36.51%, respectively.
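The role of magnitudes in rotation invariance can be illustrated with a generic analogue (not the ART itself): a rotation of the shape circularly shifts its angular samples, which changes only the phases of the DFT coefficients and leaves their magnitudes unchanged.

```python
import cmath

def angular_magnitudes(samples):
    """Magnitudes of the DFT of angular samples; a rotation of the shape
    circularly shifts the samples and leaves these magnitudes unchanged."""
    n = len(samples)
    return [abs(sum(samples[k] * cmath.exp(-2j * cmath.pi * m * k / n)
                    for k in range(n)))
            for m in range(n)]

profile = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # hypothetical angular profile
rotated = profile[3:] + profile[:3]                  # same shape, rotated
a, b = angular_magnitudes(profile), angular_magnitudes(rotated)
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # → True
```

The IARTD goes further by also keeping the phases, aligned by a correction scheme, which preserves discriminative information that magnitude-only descriptors discard.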
Zhao, Huaqing; Rebbeck, Timothy R; Mitra, Nandita
2009-12-01
Confounding due to population stratification (PS) arises when differences in both allele and disease frequencies exist in a population of mixed racial/ethnic subpopulations. Genomic control, structured association, principal components analysis (PCA), and multidimensional scaling (MDS) approaches have been proposed to address this bias using genetic markers. However, confounding due to PS can also be due to non-genetic factors. Propensity scores are widely used to address confounding in observational studies but have not been adapted to deal with PS in genetic association studies. We propose a genomic propensity score (GPS) approach to correct for bias due to PS that considers both genetic and non-genetic factors. We compare the GPS method with PCA and MDS using simulation studies. Our results show that GPS can adequately adjust and consistently correct for bias due to PS. Under no/mild, moderate, and severe PS, GPS yielded estimates with bias close to 0 (mean=-0.0044, standard error=0.0087). Under moderate or severe PS, the GPS method consistently outperforms the PCA method in terms of bias, coverage probability (CP), and type I error. Under moderate PS, the GPS method consistently outperforms the MDS method in terms of CP. PCA maintains relatively high power compared to both MDS and GPS methods under the simulated situations. GPS and MDS are comparable in terms of statistical properties such as bias, type I error, and power. The GPS method provides a novel and robust tool for obtaining less-biased estimates of genetic associations that can consider both genetic and non-genetic factors.
Xi, Jianing; Wang, Minghui; Li, Ao
2018-06-05
Discovery of mutated driver genes is one of the primary objectives in studying tumorigenesis. To discover relatively infrequently mutated driver genes from somatic mutation data, many existing methods incorporate interaction networks as prior information. However, prior information from mRNA expression patterns, which has also been proven highly informative of cancer progression, is not exploited by these network-based methods. To incorporate prior information from both interaction networks and mRNA expression, we propose a robust and sparse co-regularized nonnegative matrix factorization to discover driver genes from mutation data. Our framework also applies Frobenius norm regularization to overcome overfitting. A sparsity-inducing penalty is employed to obtain sparse scores in gene representations, of which the top-scored genes are selected as driver candidates. Evaluation experiments with known benchmark genes indicate that the performance of our method benefits from the two types of prior information. Our method also outperforms the existing network-based methods and detects some driver genes that are not predicted by the competing methods. In summary, our proposed method can improve driver gene discovery by effectively incorporating prior information from interaction networks and mRNA expression patterns into a robust and sparse co-regularized matrix factorization framework.
Multi-Label Learning via Random Label Selection for Protein Subcellular Multi-Locations Prediction.
Wang, Xiao; Li, Guo-Zheng
2013-03-12
Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most existing protein subcellular localization methods deal only with single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations, and they adopt a simple strategy: transforming the multi-location proteins into multiple proteins with a single location, which does not take correlations among different subcellular locations into account. In this paper, a novel method named RALS (multi-label learning via RAndom Label Selection) is proposed to learn from multi-location proteins in an effective and efficient way. Through a five-fold cross-validation test on a benchmark dataset, we demonstrate that our proposed method, which considers label correlations, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations really exist and contribute to improved prediction performance. Experimental results on two benchmark datasets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multi-locations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for public usage.
Joint Concept Correlation and Feature-Concept Relevance Learning for Multilabel Classification.
Zhao, Xiaowei; Ma, Zhigang; Li, Zhi; Li, Zhihui
2018-02-01
In recent years, multilabel classification has attracted significant attention in multimedia annotation. However, most multilabel classification methods focus only on the inherent correlations existing among multiple labels and concepts and ignore the relevance between features and the target concepts. To obtain more robust multilabel classification results, we propose a new multilabel classification method that aims to capture the correlations among multiple concepts by leveraging a hypergraph, which has been shown to be beneficial for relational learning. Moreover, we consider mining feature-concept relevance, which is often overlooked by many multilabel learning algorithms. To better expose the feature-concept relevance, we impose a sparsity constraint on the proposed method. We compare the proposed method with several other multilabel classification methods and evaluate the classification performance by mean average precision on several data sets. The experimental results show that the proposed method outperforms the state-of-the-art methods.
Modeling Aromatic Liquids: Toluene, Phenol, and Pyridine.
Baker, Christopher M; Grant, Guy H
2007-03-01
Aromatic groups are now acknowledged to play an important role in many systems of interest. However, existing molecular mechanics methods provide a poor representation of these groups. In a previous paper, we showed that the molecular mechanics treatment of benzene can be improved by incorporating an explicit representation of the aromatic π electrons. Here we extend this concept, developing charge-separation models for toluene, phenol, and pyridine. Monte Carlo simulations are used to parametrize the models via the reproduction of experimental thermodynamic data, and our models are shown to outperform an existing atom-centered model. The models are then used to make predictions about the structures of the liquids at the molecular level and are tested further through their application to the modeling of gas-phase dimers and cation-π interactions.
enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.
Xu, Ruifeng; Zhou, Jiyun; Liu, Bin; Yao, Lin; He, Yulan; Zou, Quan; Wang, Xiaolong
2014-01-01
DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed to date, but most focus on only one classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with performance improvements in the range of 3.97-9.52% in ACC and 0.08-0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in terms of ACC and 0.02-0.16 in terms of MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public.
Generative model selection using a scalable and size-independent complex network classifier
NASA Astrophysics Data System (ADS)
Motallebi, Sadegh; Aliakbary, Sadegh; Habibi, Jafar
2013-12-01
Real networks exhibit nontrivial topological features, such as heavy-tailed degree distributions, high clustering, and small-worldness. Researchers have developed several generative models for synthesizing artificial networks that are structurally similar to real networks. An important research problem is to identify the generative model that best fits a target network. In this paper, we investigate this problem; our goal is to select the model that is able to generate graphs similar to a given network instance. By generating synthetic networks with seven prominent generative models, we have utilized machine learning methods to develop a decision tree for model selection. Our proposed method, named "Generative Model Selection for Complex Networks," outperforms existing methods with respect to accuracy, scalability, and size-independence.
Prediction of protein-protein interaction network using a multi-objective optimization approach.
Chowdhury, Archana; Rakshit, Pratyusha; Konar, Amit
2016-06-01
Protein-Protein Interactions (PPIs) are very important as they coordinate almost all cellular processes. This paper attempts to formulate PPI prediction problem in a multi-objective optimization framework. The scoring functions for the trial solution deal with simultaneous maximization of functional similarity, strength of the domain interaction profiles, and the number of common neighbors of the proteins predicted to be interacting. The above optimization problem is solved using the proposed Firefly Algorithm with Nondominated Sorting. Experiments undertaken reveal that the proposed PPI prediction technique outperforms existing methods, including gene ontology-based Relative Specific Similarity, multi-domain-based Domain Cohesion Coupling method, domain-based Random Decision Forest method, Bagging with REP Tree, and evolutionary/swarm algorithm-based approaches, with respect to sensitivity, specificity, and F1 score.
Dimitrakopoulos, Christos; Theofilatos, Konstantinos; Pegkas, Andreas; Likothanassis, Spiros; Mavroudi, Seferina
2016-07-01
Proteins are vital biological molecules driving many fundamental cellular processes. They rarely act alone, but form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein-protein interaction (PPI) datasets have been published and a plethora of computational methods that provide new ideas for the prediction of protein complexes have been implemented. However, most of the methods suffer from two major limitations: First, they do not account for proteins participating in multiple functions and second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open as existing algorithms and tools are insufficient in terms of predictive metrics. In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative "seed" nodes. GENA considers proteins as multifunctional molecules allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster. In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in gene ontology (GO) terms with depth higher than five in the GO hierarchy. 
In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters. Copyright © 2016 Elsevier B.V. All rights reserved.
Literature-based condition-specific miRNA-mRNA target prediction.
Oh, Minsik; Rhee, Sungmin; Moon, Ji Hwan; Chae, Heejoon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun
2017-01-01
miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based target prediction algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target prediction and produce many false positives; thus, expression-based target prediction algorithms have been developed for condition-specific target prediction. A typical strategy for utilizing expression data is to leverage the negative regulatory roles of miRNAs on genes. To control false positives, a stringent cutoff value is typically set, but these methods then tend to reject many true target relationships, i.e., produce false negatives. To overcome these limitations, additional information should be utilized, and the literature is probably the best resource available. Recent literature mining systems compile millions of articles with experiments designed for specific biological questions and provide functions to search for specific information. To utilize literature information, we used a literature mining system, BEST, which automatically extracts information from the literature in PubMed and allows the user to search the literature with any English words. By integrating omics data analysis methods and BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results with literature information extracted based on the user-specified context. In a pathway enrichment analysis using genes included in the top 200 miRNA targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In another test of whether prediction methods can reproduce experimentally validated target relationships, Context-MMIA again outperformed the four existing methods.
In summary, Context-MMIA allows the user to specify a context of the experimental data to predict miRNA targets, and we believe that Context-MMIA is very useful for predicting condition-specific miRNA targets.
Using constrained information entropy to detect rare adverse drug reactions from medical forums.
Yi Zheng; Chaowang Lan; Hui Peng; Jinyan Li
2016-08-01
Adverse drug reaction (ADR) detection is critical for avoiding malpractice, yet challenging due to uncertainty in pre-marketing review and underreporting in post-marketing surveillance. To overcome this predicament, social media based ADR detection methods have been proposed recently. However, existing studies mostly rely on co-occurrence based methods and face several issues; in particular, they leave out rare ADRs and are unable to distinguish irrelevant ADRs. In this work, we introduce a constrained information entropy (CIE) method to solve these problems. CIE first recognizes drug-related adverse reactions using a predefined keyword dictionary and then captures high- and low-frequency (rare) ADRs by information entropy. Extensive experiments on a medical forum dataset demonstrate that CIE outperforms state-of-the-art co-occurrence based methods, especially in rare ADR detection.
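The entropy component can be sketched generically: the Shannon entropy of how a drug-reaction pair's mentions distribute across contexts distinguishes broadly attested pairs from one-off bursts, independently of raw frequency. The thread counts below are hypothetical.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical mention counts of a drug-reaction pair across five forum threads.
spread_out = [4, 5, 3, 4, 4]   # mentioned fairly evenly across threads
one_burst  = [19, 1, 0, 0, 0]  # concentrated in a single thread
print(shannon_entropy(spread_out) > shannon_entropy(one_burst))  # → True
```

Both pairs have 20 total mentions, so a pure co-occurrence count cannot separate them; the entropy signal can, which is what helps with rare but consistently reported ADRs.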
Fast Image Restoration for Spatially Varying Defocus Blur of Imaging Sensor
Cheong, Hejin; Chae, Eunjung; Lee, Eunsung; Jo, Gwanghyun; Paik, Joonki
2015-01-01
This paper presents a fast adaptive image restoration method for removing spatially varying out-of-focus blur of a general imaging sensor. After estimating the parameters of the space-variant point-spread-function (PSF) using the derivative in each uniformly blurred region, the proposed method performs spatially adaptive image restoration by selecting the optimal restoration filter according to the estimated blur parameters. Each restoration filter is implemented as a combination of multiple FIR filters, which guarantees fast image restoration without the need for iterative or recursive processing. Experimental results show that the proposed method outperforms existing space-invariant restoration methods in terms of both objective and subjective performance measures. The proposed algorithm can be employed in a wide range of image restoration applications, such as mobile imaging devices, robot vision, and satellite image processing. PMID:25569760
Body-Earth Mover's Distance: A Matching-Based Approach for Sleep Posture Recognition.
Xu, Xiaowei; Lin, Feng; Wang, Aosen; Hu, Yu; Huang, Ming-Chun; Xu, Wenyao
2016-10-01
Sleep posture is a key component in sleep quality assessment and pressure ulcer prevention. Body pressure analysis is currently a popular approach for sleep posture recognition. In this paper, a matching-based approach, Body-Earth Mover's Distance (BEMD), is proposed for sleep posture recognition. BEMD treats pressure images as weighted 2D shapes and combines EMD and Euclidean distance for similarity measurement. Compared with existing work, sleep posture recognition is achieved through posture similarity rather than multiple features tailored to specific postures. A pilot study was performed with 14 persons and six different postures. The experimental results show that the proposed BEMD achieves 91.21% accuracy, outperforming the previous method by 8.01%.
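For 1-D distributions, the earth mover's distance reduces to the L1 distance between cumulative sums. The sketch below illustrates the idea of blending EMD with Euclidean distance; it is not the 2-D BEMD itself, and `alpha` is a hypothetical blending weight.

```python
def emd_1d(p, q):
    """Earth mover's distance between two 1-D distributions,
    computed as the L1 distance between their normalized cumulative sums."""
    sp, sq = sum(p), sum(q)
    cum, dist = 0.0, 0.0
    for a, b in zip(p, q):
        cum += a / sp - b / sq
        dist += abs(cum)
    return dist

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def similarity_distance(p, q, alpha=0.5):
    """Hypothetical blend of shape (EMD) and intensity (Euclidean) distance."""
    return alpha * emd_1d(p, q) + (1 - alpha) * euclidean(p, q)
```

EMD rewards mass being *near* the right place (moving one unit of pressure by two bins costs 2), which is why it tolerates the small spatial shifts that defeat pointwise metrics.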
Song, Dandan; Li, Ning; Liao, Lejian
2015-01-01
Due to the generation of enormous amounts of data at lower cost and in shorter time, whole-exome sequencing technologies provide dramatic opportunities for identifying disease genes implicated in Mendelian disorders. Since thousands of genomic variants are identified in each exome, it is challenging to filter pathogenic variants in protein-coding regions while reducing the number of missed true variants. Therefore, we design an automatic and efficient pipeline for finding disease variants in Mendelian disorders that exploits a combination of variant-filtering steps within a family-based exome sequencing analysis. Recent studies on Freeman-Sheldon disease are revisited, and the results show that the proposed method outperforms other existing candidate gene identification methods.
A salient region detection model combining background distribution measure for indoor robots.
Li, Na; Xu, Hui; Wang, Zhenhua; Sun, Lining; Chen, Guodong
2017-01-01
Vision systems play an important role in the field of indoor robotics. Saliency detection methods, which capture regions perceived as important, are used to improve the performance of a visual perception system. Most state-of-the-art saliency detection methods, although performing outstandingly on natural images, cannot cope with complicated indoor environments. Therefore, we propose a new method comprising graph-based RGB-D segmentation, a primary saliency measure, a background distribution measure, and their combination. In addition, region roundness is proposed to describe the compactness of a region, measuring the background distribution more robustly. To validate the proposed approach, eleven influential methods are compared on the DSD and ECSSD datasets. Moreover, we build a mobile robot platform for application in an actual environment and design three kinds of experimental conditions: different viewpoints, illumination variations, and partial occlusions. Experimental results demonstrate that our model outperforms existing methods and is useful for indoor mobile robots.
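One plausible reading of region roundness is the isoperimetric compactness 4πA/P²; the paper's exact definition may differ. A pixel-region sketch under that assumption:

```python
import math

def roundness(region):
    """Isoperimetric roundness 4*pi*A / P^2 of a set of (row, col) pixels,
    using pixel count as area and exposed 4-neighbour edges as perimeter.
    (One plausible reading of 'region roundness'; the paper's exact
    definition may differ.)"""
    cells = set(region)
    area = len(cells)
    perimeter = sum(1 for (r, c) in cells
                    for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if nb not in cells)
    return 4 * math.pi * area / perimeter ** 2

square = [(r, c) for r in range(4) for c in range(4)]  # compact blob
line = [(0, c) for c in range(16)]                     # elongated strip
print(roundness(square) > roundness(line))  # → True
```

Elongated wall and floor strips score low on this measure while compact foreground objects score high, which is why a roundness term helps separate background distribution from salient regions.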
Exploiting salient semantic analysis for information retrieval
NASA Astrophysics Data System (ADS)
Luo, Jing; Meng, Bo; Quan, Changqin; Tu, Xinhui
2016-11-01
Recently, many Wikipedia-based methods have been proposed to improve the performance of different natural language processing (NLP) tasks, such as semantic relatedness computation, text classification and information retrieval. Among these methods, salient semantic analysis (SSA) has been proven to be an effective way to generate conceptual representations for words or documents. However, its feasibility and effectiveness in information retrieval are mostly unknown. In this paper, we study how to use SSA efficiently to improve information retrieval performance, and propose an SSA-based retrieval method under the language model framework. First, the SSA model is adopted to build conceptual representations for documents and queries. Then, these conceptual representations and the bag-of-words (BOW) representations are used in combination to estimate the language models of queries and documents. Experimental results on several standard text retrieval conference (TREC) collections show that the proposed models consistently outperform existing Wikipedia-based retrieval methods.
Drift-Free Position Estimation of Periodic or Quasi-Periodic Motion Using Inertial Sensors
Latt, Win Tun; Veluvolu, Kalyana Chakravarthy; Ang, Wei Tech
2011-01-01
Position sensing with inertial sensors such as accelerometers and gyroscopes usually requires other aiding sensors or prior knowledge of motion characteristics to remove the position drift that results from integrating acceleration or velocity, so as to obtain accurate position estimates. A method based on analytical integration has previously been developed to obtain accurate position estimates of periodic or quasi-periodic motion from inertial sensors using prior knowledge of the motion but without aiding sensors. In this paper, a new method is proposed that employs a linear filtering stage coupled with an adaptive filtering stage to remove drift and attenuation. The only prior knowledge the proposed method requires is the approximate frequency band of the motion. Existing adaptive filtering methods based on Fourier series, such as the weighted-frequency Fourier linear combiner (WFLC) and the band-limited multiple Fourier linear combiner (BMFLC), are modified to combine with the proposed method. To validate the proposed method and compare it with the analytical integration approach, a simulation study is performed using periodic signals as well as real physiological tremor data, and real-time experiments are conducted using an ADXL-203 accelerometer. Results demonstrate that the proposed method outperforms the existing analytical integration method. PMID:22163935
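A band-limited Fourier linear combiner can be sketched as LMS-adapted weights on sin/cos references spanning the assumed frequency band. This is a simplified illustration of the BMFLC idea, not the paper's modified filter; `mu`, the candidate frequencies, and the sampling rate are illustrative choices.

```python
import math

def bmflc_track(signal, freqs, dt, mu=0.02):
    """Track a periodic signal with a band-limited Fourier linear combiner:
    LMS-adapted weights on sin/cos references at each candidate frequency
    in the assumed band. A simplified sketch."""
    w = [0.0] * (2 * len(freqs))
    estimates = []
    for n, y in enumerate(signal):
        t = n * dt
        # Reference vector: sin and cos at each candidate frequency.
        x = [f(2 * math.pi * fr * t) for fr in freqs for f in (math.sin, math.cos)]
        est = sum(wi * xi for wi, xi in zip(w, x))
        err = y - est
        # LMS weight update.
        w = [wi + 2 * mu * err * xi for wi, xi in zip(w, x)]
        estimates.append(est)
    return estimates

# A pure 8 Hz sinusoid sampled at 200 Hz; the 7-9 Hz band covers it,
# so the combiner should track it closely once the weights adapt.
dt = 1 / 200
sig = [math.sin(2 * math.pi * 8 * n * dt) for n in range(2000)]
est = bmflc_track(sig, freqs=[7.0, 8.0, 9.0], dt=dt)
```

Because the references are pure sinusoids in the motion band, integrating the tracked component analytically (rather than the raw signal) is what avoids the drift of direct numerical integration.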
Sketch Matching on Topology Product Graph.
Liang, Shuang; Luo, Jun; Liu, Wenyin; Wei, Yichen
2015-08-01
Sketch matching is the fundamental problem in sketch-based interfaces. After years of study, it remains challenging when there is large irregularity and variation in hand-drawn sketch shapes. While most existing works exploit topology relations and graph representations for this problem, they are usually limited by coarse topology exploration and heuristic (thus suboptimal) similarity metrics between graphs. We present a new sketch matching method with two novel contributions. We introduce a comprehensive definition of topology relations, which results in a rich and informative graph representation of sketches. For graph matching, we propose the topology product graph, which retains the full correspondence for matching two graphs. Based on it, we derive an intuitive sketch similarity metric whose exact solution is easy to compute. In addition, the graph representation and new metric naturally support partial matching, an important practical problem that has received less attention in the literature. Extensive experiments on a challenging real-world dataset show that our method outperforms the state of the art.
RootGraph: a graphic optimization tool for automated image analysis of plant roots
Cai, Jinhai; Zeng, Zhanghui; Connor, Jason N.; Huang, Chun Yuan; Melino, Vanessa; Kumar, Pankaj; Miklavcic, Stanley J.
2015-01-01
This paper outlines a numerical scheme for accurate, detailed, and high-throughput image analysis of plant roots. In contrast to existing root image analysis tools that focus on root system-average traits, a novel, fully automated and robust approach for the detailed characterization of root traits, based on a graph optimization process is presented. The scheme, firstly, distinguishes primary roots from lateral roots and, secondly, quantifies a broad spectrum of root traits for each identified primary and lateral root. Thirdly, it associates lateral roots and their properties with the specific primary root from which the laterals emerge. The performance of this approach was evaluated through comparisons with other automated and semi-automated software solutions as well as against results based on manual measurements. The comparisons and subsequent application of the algorithm to an array of experimental data demonstrate that this method outperforms existing methods in terms of accuracy, robustness, and the ability to process root images under high-throughput conditions. PMID:26224880
Multiuser receiver for DS-CDMA signals in multipath channels: an enhanced multisurface method.
Mahendra, Chetan; Puthusserypady, Sadasivan
2006-11-01
This paper deals with the problem of multiuser detection in direct-sequence code-division multiple-access (DS-CDMA) systems in multipath environments. Existing multiuser detectors can be divided into two categories: (1) low-complexity, poorer-performance linear detectors and (2) high-complexity, good-performance nonlinear detectors. In particular, in channels where the orthogonality of the code sequences is destroyed by multipath, detectors with linear complexity perform much worse than nonlinear detectors. In this paper, we propose an enhanced multisurface method (EMSM) for multiuser detection in multipath channels. EMSM is an intermediate piecewise-linear detection scheme with a run-time complexity linear in the number of users. Its bit error rate performance is compared with those of existing linear detectors, a nonlinear radial basis function detector trained by the new support vector learning algorithm, and Verdu's optimal detector. Simulations in multipath channels, for both synchronous and asynchronous cases, indicate that EMSM always outperforms all other linear detectors, performing nearly as well as the nonlinear detectors.
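As background for the linear-detector baseline mentioned above (not the proposed EMSM), a minimal decorrelating multiuser detector can be sketched as follows; the spreading codes, transmitted bits, and noise level are invented for the example.

```python
import numpy as np

# Columns of S are two users' (deliberately non-orthogonal) spreading codes.
S = np.array([[+1, +1],
              [+1, -1],
              [+1, +1],
              [-1, -1]], dtype=float)
b = np.array([+1.0, -1.0])                   # transmitted bits
rng = np.random.default_rng(0)
y = S @ b + 0.05 * rng.standard_normal(4)    # received chips with mild noise

# Matched-filter outputs, then decorrelation by the code correlation matrix:
# this inverts the inter-user interference at the cost of noise enhancement.
R = S.T @ S                                  # non-diagonal because codes correlate
b_hat = np.sign(np.linalg.solve(R, S.T @ y))
```

The decorrelator is linear in complexity, which is exactly the class of detectors the abstract says degrades in multipath; EMSM sits between this and the nonlinear detectors.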
Szatkiewicz, Jin P; Wang, WeiBo; Sullivan, Patrick F; Wang, Wei; Sun, Wei
2013-02-01
Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental biases in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth-based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth-based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.
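To illustrate the hidden-Markov-model component of read-depth CNV calling in miniature (using plain Poisson emissions rather than GENSENG's negative binomial regression with bias correction, and with invented window depths and state means), a Viterbi decoder over copy-number states might look like:

```python
import math

def viterbi_copy_number(depths, means, stay=0.95):
    """Most likely copy-number state path for window read depths under a
    simple HMM with Poisson emissions (a stand-in for the negative
    binomial regression used by read-depth CNV callers)."""
    n = len(means)
    switch = (1.0 - stay) / (n - 1)

    def emit(depth, mu):                      # Poisson log-likelihood
        return depth * math.log(mu) - mu - math.lgamma(depth + 1)

    score = [emit(depths[0], mu) - math.log(n) for mu in means]
    back = []
    for depth in depths[1:]:
        pointers, new_score = [], []
        for s, mu in enumerate(means):
            prev = max(range(n),
                       key=lambda p: score[p]
                       + math.log(stay if p == s else switch))
            pointers.append(prev)
            new_score.append(score[prev]
                             + math.log(stay if prev == s else switch)
                             + emit(depth, mu))
        score = new_score
        back.append(pointers)
    path = [max(range(n), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# States: 0 = deletion (mean depth 15), 1 = diploid (30), 2 = duplication (60).
depths = [31, 29, 33, 14, 16, 13, 15, 30, 28, 62, 58, 61]
states = viterbi_copy_number(depths, means=[15.0, 30.0, 60.0])
```

The sticky self-transition (`stay=0.95`) is what yields discrete copy-number segments rather than a per-window classification.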
Adaptive Distributed Video Coding with Correlation Estimation using Expectation Propagation
Cui, Lijuan; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel
2013-01-01
Distributed video coding (DVC) is rapidly gaining popularity by shifting complexity from the encoder to the decoder, with no degradation of compression performance, at least in theory. In contrast to conventional video codecs, the inter-frame correlation in DVC is exploited at the decoder, based on the received syndromes of the Wyner-Ziv (WZ) frame and the side information (SI) frame generated from other frames available only at the decoder. However, the ultimate decoding performance of DVC rests on the assumption that perfect knowledge of the correlation statistics between the WZ and SI frames is available at the decoder. Therefore, the ability to obtain a good statistical correlation estimate is becoming increasingly important in practical DVC implementations. Generally, existing correlation estimation methods in DVC fall into two main types: pre-estimation, where estimation is completed before decoding, and on-the-fly (OTF) estimation, where the estimate is refined iteratively during decoding. Since changes between frames may be unpredictable or dynamic, OTF estimation methods usually outperform pre-estimation techniques, at the cost of increased decoding complexity (e.g., sampling methods). In this paper, we propose a low-complexity adaptive DVC scheme using expectation propagation (EP), where correlation estimation is performed OTF, jointly with the decoding of the factor-graph-based DVC code. Among approximate inference methods, EP generally offers a better tradeoff between accuracy and complexity. Experimental results show that our proposed scheme outperforms the benchmark state-of-the-art DISCOVER codec and other schemes without correlation tracking, and achieves comparable decoding performance with significantly lower complexity than sampling-based methods. PMID:23750314
A study of active learning methods for named entity recognition in clinical text.
Chen, Yukun; Lasko, Thomas A; Mei, Qiaozhu; Denny, Joshua C; Xu, Hua
2015-12-01
Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build because domain experts are required for annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task: identifying concepts of medical problems, treatments, and lab tests in clinical notes. Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge, which contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three categories: uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with passive learning, which uses random sampling. Learning curves that plot the performance of the NER model against the estimated annotation cost (based on the number of sentences or words in the training set) were generated to evaluate the different active learning methods and passive learning, and the area under the learning curve (ALC) score was computed. Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best method based on uncertainty sampling could save 66% of the annotation effort in sentences, as compared to random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve an F-measure of 0.80, in comparison to random sampling, the best uncertainty-based method saved 42% of the annotation effort in words, whereas the best diversity-based method reduced the annotation effort by only 7%. In this simulated setting, AL methods, particularly uncertainty-sampling-based approaches, appeared to significantly reduce the annotation cost for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting. Copyright © 2015 Elsevier Inc. All rights reserved.
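A minimal sketch of the uncertainty-based selection step discussed above, using least-confidence sampling over hypothetical class-probability outputs (the probabilities and batch size are invented; a real AL loop would retrain the NER model between batches):

```python
import numpy as np

def least_confidence_batch(class_probs, batch_size):
    """Indices of the unlabeled samples whose most probable class has
    the lowest posterior probability (least-confidence sampling)."""
    confidence = class_probs.max(axis=1)
    return np.argsort(confidence)[:batch_size]

# Hypothetical classifier outputs over four unlabeled sentences.
probs = np.array([[0.90, 0.05, 0.05],    # confident prediction
                  [0.40, 0.35, 0.25],
                  [0.55, 0.30, 0.15],
                  [0.34, 0.33, 0.33]])   # most uncertain prediction
picked = least_confidence_batch(probs, 2)  # send these for annotation
```

The intuition is that annotating the examples the current model is least sure about yields the steepest learning curve per annotated sentence.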
Overlapping communities from dense disjoint and high total degree clusters
NASA Astrophysics Data System (ADS)
Zhang, Hongli; Gao, Yang; Zhang, Yue
2018-04-01
Communities play an important role in sociology and biology, and especially in computer science, where systems are often represented as networks; community detection is therefore of great importance in these domains. A community is a dense subgraph of the whole graph, with more links among its members than between its members and outside nodes, and nodes in the same community likely share common properties or play similar roles in the graph. Communities overlap when nodes in a graph belong to multiple communities. A vast variety of overlapping community detection methods have been proposed in the literature, and local expansion is one of the most successful techniques for dealing with large networks. This paper presents a density-based seeding method, in which dense disjoint local clusters are searched for and selected as seeds. The proposed method selects a seed by the total degree and density of local clusters, utilizing merely local structures of the network. Furthermore, this paper proposes a novel community refining phase that minimizes the conductance of each community, through which the quality of the identified communities is largely improved in linear time. Experimental results on synthetic networks show that the proposed seeding method outperforms other state-of-the-art seeding methods and that the proposed refining method largely enhances the quality of the identified communities. Experimental results on real graphs with ground-truth communities show that the proposed approach outperforms other state-of-the-art overlapping community detection algorithms; in particular, it is more than two orders of magnitude faster than existing global algorithms while achieving higher quality, and it obtains a much more accurate community structure than current local algorithms without any a priori information.
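The conductance objective used in the refining phase can be illustrated with a small sketch (the adjacency-dict graph of two bridged triangles is an invented example, not the paper's benchmark data):

```python
def conductance(adj, community):
    """cut(S, rest) / min(vol(S), vol(rest)) for an undirected graph
    stored as an adjacency dict {node: set_of_neighbours}."""
    community = set(community)
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    vol_s = sum(len(adj[u]) for u in community)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    return cut / min(vol_s, vol_rest)

# Two triangles joined by the single bridge edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
good = conductance(adj, {0, 1, 2})   # whole triangle: low conductance
bad = conductance(adj, {0, 1})       # partial triangle: higher conductance
```

Minimizing conductance pushes each community toward cuts that sever few edges relative to the community's internal volume, which is why the refinement improves community quality.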
Virtual fringe projection system with nonparallel illumination based on iteration
NASA Astrophysics Data System (ADS)
Zhou, Duo; Wang, Zhangying; Gao, Nan; Zhang, Zonghua; Jiang, Xiangqian
2017-06-01
Fringe projection profilometry has been widely applied in many fields. To set up an ideal measuring system, virtual fringe projection techniques have been studied to assist in the design of hardware configurations. However, existing virtual fringe projection systems use parallel illumination and have a fixed optical framework. This paper presents a virtual fringe projection system with nonparallel illumination. Using an iterative method to calculate the intersection points between rays and reference planes or object surfaces, the proposed system can simulate projected fringe patterns and captured images. A new explicit calibration method is presented to validate the precision of the system. Simulation results indicate that the proposed iterative method outperforms previous systems. Our virtual system can be applied to error analysis and algorithm optimization, and can help operators find ideal system parameter settings for actual measurements.
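The kind of iterative ray-surface intersection described above can be sketched as a fixed-point iteration on the ray parameter; the height-field surface, ray, and convergence assumptions below are invented for illustration and are far simpler than a full fringe projection simulation:

```python
import math

def intersect_height_field(origin, direction, height, iters=50):
    """Fixed-point iteration for the intersection of a ray with a
    height-field surface z = height(x, y); assumes the surface slope is
    small relative to the ray's z component so the iteration contracts."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    t = 0.0
    for _ in range(iters):
        x, y = ox + t * dx, oy + t * dy
        t = (height(x, y) - oz) / dz          # re-solve the z equation at (x, y)
    return ox + t * dx, oy + t * dy, oz + t * dz

surface = lambda x, y: 1.0 + 0.1 * math.sin(x)    # gently varying toy surface
hit = intersect_height_field((0.0, 0.0, 5.0), (0.3, 0.0, -1.0), surface)
```

For a flat reference plane the loop converges in one step to the closed-form ray-plane solution; the iteration earns its keep only on curved object surfaces.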
A dictionary learning approach for Poisson image deblurring.
Ma, Liyan; Moisan, Lionel; Yu, Jian; Zeng, Tieyong
2013-07-01
The restoration of images corrupted by blur and Poisson noise is a key issue in medical and biological image processing. While most existing methods are based on variational models, generally derived from a maximum a posteriori (MAP) formulation, sparse representations of images have recently been shown to be efficient approaches for image recovery. Following this idea, we propose in this paper a model containing three terms: a patch-based sparse representation prior over a learned dictionary, a pixel-based total variation regularization term, and a data-fidelity term capturing the statistics of Poisson noise. The resulting optimization problem can be solved by an alternating minimization technique combined with variable splitting. Extensive experimental results suggest that, in terms of visual quality, peak signal-to-noise ratio, and method noise, the proposed algorithm outperforms state-of-the-art methods.
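As a classical point of reference for Poisson deblurring (the Richardson-Lucy iteration, a standard baseline rather than the dictionary-learning model proposed above), a one-dimensional sketch with an invented spike signal and blur kernel:

```python
import numpy as np

def richardson_lucy_1d(observed, psf, iterations=50):
    """Classical Richardson-Lucy deconvolution, the standard iterative
    baseline for data degraded by blur and Poisson noise."""
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    psf_mirror = psf[::-1]
    for _ in range(iterations):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)   # avoid division by zero
        estimate = estimate * np.convolve(ratio, psf_mirror, mode="same")
    return estimate

psf = np.array([0.25, 0.5, 0.25])      # invented symmetric blur kernel
truth = np.zeros(32)
truth[10] = 5.0                        # a single bright spike
observed = np.convolve(truth, psf, mode="same")
restored = richardson_lucy_1d(observed, psf)
```

The multiplicative update keeps the estimate nonnegative, which is the natural constraint under a Poisson likelihood; regularized models such as the one in this paper add priors on top of this data term.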
Retinal artery-vein classification via topology estimation
Estrada, Rolando; Allingham, Michael J.; Mettu, Priyatham S.; Cousins, Scott W.; Tomasi, Carlo; Farsiu, Sina
2015-01-01
We propose a novel, graph-theoretic framework for distinguishing arteries from veins in a fundus image. We make use of the underlying vessel topology to better classify small and midsized vessels. We extend our previously proposed tree topology estimation framework by incorporating expert, domain-specific features to construct a simple, yet powerful global likelihood model. We efficiently maximize this model by iteratively exploring the space of possible solutions consistent with the projected vessels. We tested our method on four retinal datasets and achieved classification accuracies of 91.0%, 93.5%, 91.7%, and 90.9%, outperforming existing methods. Our results show the effectiveness of our approach, which is capable of analyzing the entire vasculature, including peripheral vessels, in wide field-of-view fundus photographs. This topology-based method is a potentially important tool for diagnosing diseases with retinal vascular manifestation. PMID:26068204
Mining the preferences of patients for ubiquitous clinic recommendation.
Chen, Tin-Chih Toly; Chiu, Min-Chi
2018-03-06
A challenge facing all ubiquitous clinic recommendation systems is that patients often have difficulty articulating their requirements. To overcome this problem, a ubiquitous clinic recommendation mechanism was designed in this study by mining the clinic preferences of patients. These preferences were defined using the weights in the ubiquitous clinic recommendation mechanism. An integer nonlinear programming problem was solved to tune the values of the weights on a rolling basis. In addition, since it may take a long time for the weights to reach their asymptotic values, a back-propagation network (BPN) and response surface method (RSM) approach is applied to estimate those asymptotic values. The proposed methodology was tested in a regional study. Experimental results indicated that the ubiquitous clinic recommendation system outperformed several existing methods in improving the successful recommendation rate.
Generative model selection using a scalable and size-independent complex network classifier
DOE Office of Scientific and Technical Information (OSTI.GOV)
Motallebi, Sadegh, E-mail: motallebi@ce.sharif.edu; Aliakbary, Sadegh, E-mail: aliakbary@ce.sharif.edu; Habibi, Jafar, E-mail: jhabibi@sharif.edu
2013-12-15
Real networks exhibit nontrivial topological features, such as heavy-tailed degree distributions, high clustering, and small-worldness. Researchers have developed several generative models for synthesizing artificial networks that are structurally similar to real networks. An important research problem is to identify the generative model that best fits a target network. In this paper, we investigate this problem and our goal is to select the model that is able to generate graphs similar to a given network instance. By generating synthetic networks with seven outstanding generative models, we have utilized machine learning methods to develop a decision tree for model selection. Our proposed method, named "Generative Model Selection for Complex Networks," outperforms existing methods with respect to accuracy, scalability, and size-independence.
Determining the semantic similarities among Gene Ontology terms.
Taha, Kamal
2013-05-01
We present in this paper novel techniques that determine the semantic relationships among Gene Ontology (GO) terms. We implemented these techniques in a prototype system called GoSE, which resides between the user application and the GO database. Given a set S of GO terms, GoSE returns another set S' of GO terms, where each term in S' is semantically related to each term in S. Most current research focuses on determining the semantic similarities among GO terms based solely on their IDs and proximity to one another in the GO graph structure, while overlooking the contexts of the terms, which may lead to erroneous results. The context of a GO term T is the set of other terms whose existence in the GO graph structure is dependent on T. We propose novel techniques that determine the contexts of terms based on the concept of existence dependency, and we present a stack-based sort-merge algorithm employing these techniques for determining the semantic similarities among GO terms. We evaluated GoSE experimentally and compared it with three existing methods. The results of measuring the semantic similarities among genes in KEGG and Pfam pathways, retrieved from the DBGET and Sanger Pfam databases, respectively, show that our method outperforms the other three methods in recall and precision.
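One simple way to make term context explicit (a generic illustration, not GoSE's existence-dependency technique) is to score two terms by the overlap of their ancestor sets in the GO DAG; the toy DAG below is invented:

```python
def ancestors(term, parents):
    """All terms reachable upward from `term` in a DAG given as
    {term: set_of_direct_parents}; includes the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def jaccard_similarity(a, b, parents):
    """Similarity of two terms as the Jaccard overlap of ancestor sets."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

# Toy DAG: root -> {x, y}; x -> {x1, x2}; y -> {y1}.
parents = {"x": {"root"}, "y": {"root"},
           "x1": {"x"}, "x2": {"x"}, "y1": {"y"}}
```

Sibling terms under the same parent share more ancestors than terms in different branches, so this score respects the DAG structure rather than term IDs alone.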
FGWAS: Functional genome wide association analysis.
Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-10-01
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.
Tweaked residual convolutional network for face alignment
NASA Astrophysics Data System (ADS)
Du, Wenchao; Li, Ke; Zhao, Qijun; Zhang, Yi; Chen, Hu
2017-08-01
We propose a novel Tweaked Residual Convolutional Network approach for face alignment with a two-level convolutional network architecture. Specifically, the first-level Tweaked Convolutional Network (TCN) module predicts the landmarks quickly, yet accurately enough to serve as a preliminary estimate, by taking a low-resolution version of the detected face holistically as input. The following Residual Convolutional Network (RCN) module progressively refines each landmark by taking as input the local patch extracted around the predicted landmark, which allows the Convolutional Neural Network (CNN) to extract local shape-indexed features to fine-tune the landmark position. Extensive evaluations show that the proposed Tweaked Residual Convolutional Network approach outperforms existing methods.
Salient regions detection using convolutional neural networks and color volume
NASA Astrophysics Data System (ADS)
Liu, Guang-Hai; Hou, Yingkun
2018-03-01
Convolutional neural networks are an important technique in machine learning, pattern recognition, and image processing. In order to reduce the computational burden and extend the classical LeNet-5 model to the field of saliency detection, we propose a simple and novel computing model based on the LeNet-5 network. In the proposed model, hue, saturation, and intensity are utilized to extract depth cues, and we then integrate the depth cues and color volume for saliency detection, following the basic structure of the feature integration theory. Experimental results show that the proposed computing model outperforms several existing state-of-the-art methods on the MSRA1000 and ECSSD datasets.
Pooling across cells to normalize single-cell RNA sequencing data with many zero counts.
Lun, Aaron T L; Bach, Karsten; Marioni, John C
2016-04-27
Normalization of single-cell RNA sequencing data is necessary to eliminate cell-specific biases prior to downstream analyses. However, this is not straightforward for noisy single-cell data where many counts are zero. We present a novel approach where expression values are summed across pools of cells, and the summed values are used for normalization. Pool-based size factors are then deconvolved to yield cell-based factors. Our deconvolution approach outperforms existing methods for accurate normalization of cell-specific biases in simulated data. Similar behavior is observed in real data, where deconvolution improves the relevance of results of downstream analyses.
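The pool-then-deconvolve idea can be sketched in miniature: sum counts over overlapping rings of cells, estimate a size factor per pool against an average pseudo-cell, and solve the resulting linear system for cell-level factors. The noise-free counts and fixed ring pooling scheme below are illustrative simplifications of the published method:

```python
import numpy as np

rng = np.random.default_rng(7)
n_genes, n_cells = 200, 20
true_sf = rng.uniform(0.5, 2.0, n_cells)          # cell-specific biases
expression = rng.gamma(2.0, 50.0, n_genes)        # shared expression profile
counts = expression[:, None] * true_sf[None, :]   # noise-free toy count matrix

reference = counts.mean(axis=1)                   # average pseudo-cell
pool_size = 3
rows, pool_factors = [], []
for start in range(n_cells):                      # overlapping ring of pools
    members = [(start + j) % n_cells for j in range(pool_size)]
    pooled = counts[:, members].sum(axis=1)       # summing beats per-cell zeros
    pool_factors.append(np.median(pooled / reference))
    indicator = np.zeros(n_cells)
    indicator[members] = 1.0
    rows.append(indicator)

# Deconvolve: each pool factor is (roughly) the sum of its members' factors.
cell_sf, *_ = np.linalg.lstsq(np.array(rows), np.array(pool_factors),
                              rcond=None)
cell_sf /= cell_sf.mean()                         # fix the arbitrary scale
```

Pooling sidesteps the zero-inflation problem because the pooled counts are rarely zero even when individual cells' counts often are, which is the motivation stated in the abstract.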
Robust Transceiver Design for Multiuser MIMO Downlink with Channel Uncertainties
NASA Astrophysics Data System (ADS)
Miao, Wei; Li, Yunzhou; Chen, Xiang; Zhou, Shidong; Wang, Jing
This letter addresses the problem of robust transceiver design for the multiuser multiple-input-multiple-output (MIMO) downlink where the channel state information at the base station (BS) is imperfect. A stochastic approach which minimizes the expectation of the total mean square error (MSE) of the downlink conditioned on the channel estimates under a total transmit power constraint is adopted. The iterative algorithm reported in [2] is improved to handle the proposed robust optimization problem. Simulation results show that our proposed robust scheme effectively reduces the performance loss due to channel uncertainties and outperforms existing methods, especially when the channel errors of the users are different.
Information Filtering via Heterogeneous Diffusion in Online Bipartite Networks
Zhang, Fu-Guo; Zeng, An
2015-01-01
The rapid expansion of the Internet brings us overwhelming amounts of online information, impossible for any individual to go through in its entirety. Recommender systems were therefore created to help people dig through this abundance of information. In networks composed of users and objects, recommender algorithms based on diffusion have proven to be among the best performing methods. Previous works considered the diffusion processes from user to object and from object to user to be equivalent. We show in this work that this is not the case, and we improve the quality of recommendation by taking into account the asymmetrical nature of this process. We apply this idea to modify state-of-the-art recommendation methods. Simulation results show that the new methods can outperform the existing methods in both recommendation accuracy and diversity. Finally, this modification is shown to improve recommendation in a realistic case. PMID:26125631
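A minimal sketch of the symmetric two-step mass-diffusion (ProbS) scoring that such methods modify (the 4x4 user-object adjacency matrix is invented; the paper's contribution lies in treating the two diffusion steps asymmetrically, which is not shown here):

```python
import numpy as np

# Binary user-object adjacency: A[u, o] = 1 if user u collected object o.
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 0, 1]], dtype=float)
k_user = A.sum(axis=1)
k_obj = A.sum(axis=0)

def probs_scores(u):
    """Two-step mass diffusion (ProbS): objects -> users -> objects,
    splitting each node's resource equally among its neighbours."""
    resource = A[u].copy()                      # unit resource on collected objects
    on_users = (A / k_obj) @ resource           # objects divide by object degree
    return (A / k_user[:, None]).T @ on_users   # users divide by user degree

scores = probs_scores(0)                        # rank uncollected objects by score
```

Both steps are column-stochastic, so the total resource is conserved; an asymmetric variant would change how the resource is divided in one of the two steps.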
Huang, Yi-Fei; Gulko, Brad; Siepel, Adam
2017-04-01
Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the 'big data' available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.
COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator.
Rawi, Reda; Mall, Raghvendra; Kunji, Khalid; El Anbari, Mohammed; Aupetit, Michael; Ullah, Ehsan; Bensmail, Halima
2016-12-15
The post-genomic era, with its wealth of sequences, gave rise to a broad range of protein residue-residue contact detection methods. Although various coevolution methods such as PSICOV, DCA, and plmDCA provide correct contact predictions, their predictions do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detection method, COUSCOus, built by combining the empirical Bayes covariance estimator, the best-performing shrinkage approach, with GLasso. Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long-, medium- and short-range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts, outperforming PSICOV. We also observed that COUSCOus outperforms PSICOV with respect to the Matthews correlation coefficient criterion on the full list of residue contacts. Furthermore, COUSCOus achieves on average a 10% gain in prediction accuracy compared to PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that when using a simple random forest meta-classifier combining contact detection techniques and sequence-derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. We conclude that the consideration of superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, among them the presented residue-residue contact prediction as well as fields such as gene network reconstruction.
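The covariance shrinkage idea underlying COUSCOus can be illustrated generically (this is a simple identity-target shrinkage with a fixed weight, not the empirical Bayes estimator used in the paper): with fewer samples than dimensions the raw sample covariance is singular, while the shrunk estimate is well-conditioned and thus usable by GLasso-style procedures.

```python
import numpy as np

def shrink_covariance(samples, lam=0.2):
    """Shrink the sample covariance toward a scaled identity target:
    (1 - lam) * S + lam * mu * I, with mu the average variance."""
    S = np.cov(samples, rowvar=False)
    mu = np.trace(S) / S.shape[0]
    return (1.0 - lam) * S + lam * mu * np.eye(S.shape[0])

rng = np.random.default_rng(3)
n_samples, n_dims = 10, 30                # fewer samples than dimensions
X = rng.standard_normal((n_samples, n_dims))
S_raw = np.cov(X, rowvar=False)           # rank-deficient, hence singular
S_shrunk = shrink_covariance(X)           # positive definite
```

Choosing the shrinkage weight from the data, rather than fixing it as here, is exactly where estimators such as the empirical Bayes approach differ.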
Duan, Qian-Qian; Yang, Gen-Ke; Pan, Chang-Chun
2014-01-01
A hybrid optimization algorithm combining the finite state method (FSM) and a genetic algorithm (GA) is proposed to solve the crude oil scheduling problem. The FSM and GA are combined to exploit the advantages of each method and to compensate for their individual deficiencies. In the proposed algorithm, the finite state method makes up for the GA's weak local search ability, and the heuristic returned by the FSM guides the GA towards good solutions. The idea behind this is that promising substructures, or partial solutions, can be generated using the FSM. Furthermore, the FSM guarantees that the entire solution space is uniformly covered. Therefore, the combination of the two algorithms has better global performance than either the GA or the FSM operated individually. Finally, a real-life crude oil scheduling problem from the literature is used for simulation. The experimental results validate that the proposed method outperforms the state-of-the-art GA method. PMID:24772031
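The mechanism of seeding a GA's initial population with heuristic partial solutions, the role the FSM plays above, can be sketched on a toy bit-string problem. `run_ga`, its parameters, and the one-max fitness are all illustrative assumptions, not the paper's crude-oil model:

```python
import random

def run_ga(fitness, seeds, length, pop_size=30, gens=60, seed=1):
    """Tiny GA over bit-strings. The initial population is seeded with
    heuristic solutions (the role played by the FSM in the hybrid),
    then filled with random individuals."""
    rng = random.Random(seed)
    pop = [s[:] for s in seeds]
    while len(pop) < pop_size:
        pop.append([rng.randint(0, 1) for _ in range(length)])
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]                       # elitism: keep the two best
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:10], 2)  # mate among the best ten
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]       # one-point crossover
            i = rng.randrange(length)       # single-point mutation
            child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

Because the heuristic seed is already a decent solution, the elitist GA can never finish worse than the seed, which mirrors the claim that the FSM heuristic guides the GA toward good regions of the search space.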
A preliminary study of muscular artifact cancellation in single-channel EEG.
Chen, Xun; Liu, Aiping; Peng, Hu; Ward, Rabab K
2014-10-01
Electroencephalogram (EEG) recordings are often contaminated with muscular artifacts that strongly obscure the EEG signals and complicate their analysis. For the conventional case, where the EEG recordings are obtained simultaneously over many EEG channels, there exists a considerable range of methods for removing muscular artifacts. In recent years, there has been an increasing trend to use EEG information in ambulatory healthcare and related physiological signal monitoring systems. For practical reasons, a single EEG channel system must be used in these situations. Unfortunately, there exist few studies for muscular artifact cancellation in single-channel EEG recordings. To address this issue, in this preliminary study, we propose a simple, yet effective, method to achieve muscular artifact cancellation for the single-channel EEG case. This method is a combination of the ensemble empirical mode decomposition (EEMD) and joint blind source separation (JBSS) techniques. We also conduct a study that compares and investigates all possible single-channel solutions and demonstrates the performance of these methods using numerical simulations and real-life applications. The proposed method is shown to significantly outperform all other methods. It can successfully remove muscular artifacts without altering the underlying EEG activity. It is thus a promising tool for use in ambulatory healthcare systems.
Multiagent scheduling method with earliness and tardiness objectives in flexible job shops.
Wu, Zuobao; Weng, Michael X
2005-04-01
Flexible job-shop scheduling problems are an important extension of the classical job-shop scheduling problems and present additional complexity. Such complexity is mainly due to the considerable capacity overlap among modern machines, which classical scheduling methods are generally incapable of addressing. We propose a multiagent scheduling method with job earliness and tardiness objectives in a flexible job-shop environment. The earliness and tardiness objectives are consistent with the just-in-time production philosophy, which has attracted significant attention in both industry and the academic community. A new job-routing and sequencing mechanism is proposed. In this mechanism, two kinds of jobs are defined to distinguish jobs with one operation left from jobs with more than one operation left, and different criteria are proposed to route these two kinds of jobs. Job sequencing makes it possible to hold a job that would otherwise be completed too early, and two heuristic algorithms for job sequencing are developed to deal with the two kinds of jobs. The computational experiments show that the proposed multiagent scheduling method significantly outperforms the existing scheduling methods in the literature. In addition, the proposed method is quite fast: the simulation time to find a complete schedule with over 2000 jobs on ten machines is less than 1.5 min.
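The earliness/tardiness objective referred to above is conventionally a weighted sum of deviations from due dates. A generic sketch follows; the job tuple layout `(completion_time, due_date, early_weight, tardy_weight)` is an assumption for illustration, not the paper's data model:

```python
def earliness_tardiness_cost(jobs):
    """Total weighted earliness + tardiness over a set of scheduled jobs.
    Each job is (completion_time, due_date, early_weight, tardy_weight).
    A job finished early is penalized (just-in-time: holding cost),
    a job finished late is penalized (lateness cost)."""
    total = 0.0
    for completion, due, w_early, w_tardy in jobs:
        total += w_early * max(0, due - completion)   # earliness penalty
        total += w_tardy * max(0, completion - due)   # tardiness penalty
    return total
```

Minimizing this objective is what motivates holding a job that would otherwise complete too early, since finishing exactly at the due date incurs zero cost.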
A nonparametric spatial scan statistic for continuous data.
Jung, Inkyung; Cho, Ho Jin
2015-10-20
Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compare the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases considered in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
Sparse nonnegative matrix factorization with ℓ0-constraints
Peharz, Robert; Pernkopf, Franz
2012-01-01
Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, there is no guarantee for this behavior. Several authors proposed NMF methods which enforce sparseness by constraining or penalizing the ℓ1-norm of the factor matrices. On the other hand, little work has been done using a more natural sparseness measure, the ℓ0-pseudo-norm. In this paper, we propose a framework for approximate NMF which constrains the ℓ0-norm of the basis matrix, or the coefficient matrix, respectively. For this purpose, techniques for unconstrained NMF can be easily incorporated, such as multiplicative update rules or the alternating nonnegative least-squares scheme. In experiments we demonstrate the benefits of our methods, which compare to, or outperform, existing approaches. PMID:22505792
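A rough sketch of ℓ0-constrained NMF: standard multiplicative updates alternated with a hard projection that keeps only the k largest entries per basis column. This is a pure-Python toy under assumed update rules; the paper's actual algorithms and projection schedule may differ:

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def project_l0_columns(W, k):
    """Hard l0 projection: keep the k largest entries of each column
    of W and zero out the rest."""
    cols = transpose(W)
    for col in cols:
        keep = set(sorted(range(len(col)), key=lambda i: -col[i])[:k])
        for i in range(len(col)):
            if i not in keep:
                col[i] = 0.0
    return transpose(cols)

def nmf_l0(V, r, k, iters=50, eps=1e-9):
    """Approximate nonnegative V as W @ H with at most k nonzeros
    per column of W, via multiplicative updates + l0 projection."""
    random.seed(0)
    m, n = len(V), len(V[0])
    W = [[random.random() for _ in range(r)] for _ in range(m)]
    H = [[random.random() for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps)
              for j in range(n)] for i in range(r)]
        # W <- W * (V H^T) / (W H H^T), then project its columns
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps)
              for j in range(r)] for i in range(m)]
        W = project_l0_columns(W, k)
    return W, H
```

One known caveat of this sketch: multiplicative updates leave projected zeros at zero, so the support of W can only shrink over iterations; more refined schemes re-fit the nonzero entries after projection.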
Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong
2012-01-01
Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed the existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although in that case the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95% CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95% CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir
2010-07-15
With the increasing popularity of using electroencephalography (EEG) to reveal treatment effects in drug development clinical trials, the vast volume and complex nature of EEG data make their analysis an intriguing but challenging topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.
2 Major incident triage and the implementation of a new triage tool, the MPTT-24.
Vassallo, James; Smith, Jason
2017-12-01
Over the last decade, a number of European cities, including London, have witnessed high-profile terrorist attacks resulting in major incidents with large numbers of casualties. Triage, the process of categorising casualties on the basis of their clinical acuity, is a key principle in the effective management of major incidents. The Modified Physiological Triage Tool (MPTT) is a recently developed primary triage tool which, in comparison to existing triage tools including the 2013 UK NARU Sieve, demonstrates the greatest sensitivity at predicting need for life-saving intervention (LSI) within both military and civilian populations. To improve the applicability and usability of the MPTT we increased the upper respiratory rate threshold to 24 breaths per minute (MPTT-24), to make it divisible by four, and included an assessment of external catastrophic haemorrhage. The aim of this study was to conduct a feasibility analysis of the proposed MPTT-24 (Figure 1: MPTT-24). METHODS: A retrospective review of the Joint Theatre Trauma Registry (JTTR) and Trauma Audit Research Network (TARN) databases was performed for all adult (>18 years) patients presenting between 2006-2013 (JTTR) and in 2014 (TARN). Patients were defined as priority one (P1) if they had received one or more life-saving interventions. Using first recorded hospital physiology, patients were categorised as P1 or not-P1 by the existing triage tools and by both the MPTT and MPTT-24. Performance characteristics were evaluated using sensitivity, specificity, and under- and over-triage, with a McNemar test to determine statistical significance. Basic study characteristics are shown in Table 1. Both the MPTT and MPTT-24 outperformed all existing triage methods, with a statistically significant (p<0.001) absolute reduction of 25.5%-29.5% in under-triage when compared to existing UK civilian methods (NARU Sieve).
In both populations the MPTT-24 demonstrated an absolute reduction in sensitivity and an increase in specificity when compared to the MPTT. A statistically significant difference was observed between the MPTT and MPTT-24 in the way they categorised TARN and JTTR cases as P1 (p<0.001) (Table 1: study characteristics; Table 2: performance analysis). CONCLUSION: Existing UK methods of primary major incident triage, including the NARU Sieve, are not fit for purpose, with unacceptably high rates of under-triage. When compared to the MPTT, the MPTT-24 allows for a more rapid triage assessment and continues to outperform existing triage tools at predicting need for life-saving intervention. Its use should be considered in civilian and military major incidents. © 2017, Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Revealing the hidden language of complex networks.
Yaveroğlu, Ömer Nebil; Malod-Dognin, Noël; Davis, Darren; Levnajic, Zoran; Janjic, Vuk; Karapandza, Rasa; Stojmirovic, Aleksandar; Pržulj, Nataša
2014-04-01
Sophisticated methods for analysing complex networks promise to be of great benefit to almost all scientific disciplines, yet they elude us. In this work, we make fundamental methodological advances to rectify this. We discover that the interaction between a small number of roles, played by nodes in a network, can characterize a network's structure and also provide a clear real-world interpretation. Given this insight, we develop a framework for analysing and comparing networks, which outperforms all existing ones. We demonstrate its strength by uncovering novel relationships between seemingly unrelated networks, such as Facebook, metabolic, and protein structure networks. We also use it to track the dynamics of the world trade network, showing that a country's role of a broker between non-trading countries indicates economic prosperity, whereas peripheral roles are associated with poverty. This result, though intuitive, has escaped all existing frameworks. Finally, our approach translates network topology into everyday language, bringing network analysis closer to domain scientists.
Multilabel learning via random label selection for protein subcellular multilocations prediction.
Wang, Xiao; Li, Guo-Zheng
2013-01-01
Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular localization methods can only deal with single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they adopt only a simple strategy, that is, transforming the multilocation proteins to multiple proteins with single locations, which does not take the correlations among different subcellular locations into account. In this paper, a novel multilabel learning method named random label selection (RALS), which extends the simple binary relevance (BR) method, is proposed to learn from multilocation proteins in an effective and efficient way. RALS does not explicitly find the correlations among labels, but rather implicitly attempts to learn the label correlations from data by augmenting the original feature space with randomly selected labels as additional input features. Through a fivefold cross-validation test on a benchmark data set, we demonstrate that our proposed method, which takes label correlations into consideration, clearly outperforms the baseline BR method that ignores them, indicating that correlations among different subcellular locations really exist and contribute to the improvement of prediction performance. Experimental results on two benchmark data sets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multilocations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for public use.
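The core RALS trick, appending randomly selected label columns to the feature vectors used at training time, can be sketched as follows. The function name and the row-per-sample matrix layout are illustrative assumptions, not the authors' implementation:

```python
import random

def rals_augment(X, Y, n_extra, seed=0):
    """Augment each feature vector in X with the values of n_extra
    randomly selected label columns of the label matrix Y.
    X: list of feature lists; Y: list of 0/1 label lists (same order).
    Returns the augmented features and the chosen label indices."""
    rng = random.Random(seed)
    q = len(Y[0])                      # number of labels
    chosen = rng.sample(range(q), n_extra)
    X_aug = [list(x) + [Y[i][j] for j in chosen]
             for i, x in enumerate(X)]
    return X_aug, chosen
```

At prediction time the extra inputs are not observed, so they are typically filled with a first-round classifier's label estimates; the augmentation lets the second-round classifier pick up label correlations implicitly, as the abstract describes.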
Knowledge Discovery from Posts in Online Health Communities Using Unified Medical Language System.
Chen, Donghua; Zhang, Runtong; Liu, Kecheng; Hou, Lei
2018-06-19
Patient-reported posts in Online Health Communities (OHCs) contain a variety of valuable information that can help establish knowledge-based online support for patients. However, utilizing these posts to improve online patient services in the absence of appropriate medical and healthcare expert knowledge is difficult. Thus, we propose a comprehensive knowledge discovery method based on the Unified Medical Language System for the analysis of narrative posts in OHCs. First, we propose a domain-knowledge support framework for OHCs to provide a basis for post analysis. Second, we develop a Knowledge-Involved Topic Modeling (KI-TM) method to extract and expand explicit knowledge within the text. We propose four metrics, namely, explicit knowledge rate, latent knowledge rate, knowledge correlation rate, and perplexity, for the evaluation of the KI-TM method. Our experimental results indicate that our proposed method outperforms existing methods in terms of providing knowledge support. Our method enhances knowledge support for online patients and can help develop intelligent OHCs in the future.
Automatic comic page image understanding based on edge segment analysis
NASA Astrophysics Data System (ADS)
Liu, Dong; Wang, Yongtao; Tang, Zhi; Li, Luyuan; Gao, Liangcai
2013-12-01
Comic page image understanding aims to analyse the layout of comic page images by detecting the storyboards and identifying the reading order automatically. It is the key technique for producing digital comic documents suitable for reading on mobile devices. In this paper, we propose a novel comic page image understanding method based on edge segment analysis. First, we propose an efficient edge point chaining method to extract Canny edge segments (i.e., contiguous chains of Canny edge points) from the input comic page image; second, we propose a top-down scheme to detect line segments within each obtained edge segment; third, we develop a novel method to detect the storyboards by selecting the border lines and further identify the reading order of these storyboards. The proposed method is evaluated on a data set consisting of 2000 comic page images from ten printed comic series. The experimental results demonstrate that the proposed method achieves satisfactory results on different comics and outperforms the existing methods.
Recovery of failed solid-state anaerobic digesters.
Yang, Liangcheng; Ge, Xumeng; Li, Yebo
2016-08-01
This study examined the performance of three methods for recovering failed solid-state anaerobic digesters. The 9-L digesters, which were fed with corn stover, failed at a feedstock/inoculum (F/I) ratio of 10 with negligible methane yields. To recover the systems, inoculum was added to bring the F/I ratio to 4. Inoculum was either added to the top of a failed digester, injected into it, or well mixed with the existing feedstock. Digesters using the top-addition and injection methods quickly resumed operation and achieved peak yields in 10 days, while digesters using the well-mixed method recovered more slowly but showed 50% higher peak yields. Overall, these methods recovered 30-40% of the methane from failed digesters. The well-mixed method showed the highest methane yield, followed by the injection and top-addition methods. Recovered digesters outperformed digesters operated at a constant F/I ratio of 4. Slow mass transfer and slow growth of microbes were believed to be the major factors limiting recovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
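The amount of inoculum needed to bring a digester from a failed F/I ratio down to the target ratio follows from a simple mass balance. This helper function is a hypothetical illustration of that arithmetic, not part of the study's methodology:

```python
def inoculum_to_add(feedstock, inoculum, target_ratio):
    """Inoculum mass to add so that feedstock/inoculum drops to
    target_ratio, holding the feedstock mass constant."""
    return feedstock / target_ratio - inoculum
```

For example, a digester loaded at F/I = 10 (10 mass units of feedstock per 1 of inoculum) needs 1.5 further units of inoculum to reach the recovery ratio of 4.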
Serang, Oliver
2012-01-01
Linear programming (LP) problems are commonly used in analysis and resource allocation, frequently surfacing as approximations to more difficult problems. Existing approaches to LP have been dominated by a small group of methods, and randomized algorithms have not enjoyed popularity in practice. This paper introduces a novel randomized method of solving LP problems by moving along the facets and within the interior of the polytope along rays randomly sampled from the polyhedral cones defined by the bounding constraints. This conic sampling method is then applied to randomly sampled LPs, and its runtime performance is shown to compare favorably to the simplex and primal affine-scaling algorithms, especially on polytopes with certain characteristics. The conic sampling method is then adapted and applied to solve a certain quadratic program, which computes a projection onto a polytope; the proposed method is shown to outperform the proprietary software Mathematica on large, sparse QP problems constructed from mass spectrometry-based proteomics. PMID:22952741
Yousefi, Siavash; Qin, Jia; Zhi, Zhongwei; Wang, Ruikang K
2013-02-01
Optical microangiography is an imaging technology capable of providing detailed functional blood flow maps within microcirculatory tissue beds in vivo. Some practical issues, however, exist when displaying and quantifying the microcirculation that perfuses the scanned tissue volume. These issues include: (I) the probing light is subject to specular reflection when it shines onto the sample, and the unevenness of the tissue surface makes the light energy entering the tissue non-uniform over the entire scanned tissue volume; (II) biological tissue is heterogeneous in nature, meaning that its scattering and absorption properties attenuate the probe beam. These physical limitations can result in local contrast degradation and non-uniform micro-angiogram images. In this paper, we propose a post-processing method that uses Rayleigh contrast-limited adaptive histogram equalization to increase the contrast and improve the overall appearance and uniformity of optical micro-angiograms without saturating the vessel intensity or changing the physical meaning of the micro-angiograms. The qualitative and quantitative performance of the proposed method is compared with those of common histogram equalization and contrast enhancement methods. We demonstrate that the proposed method outperforms other existing approaches. The proposed method is not limited to optical microangiography and can be used in other imaging modalities such as photo-acoustic tomography and scanning laser confocal microscopy.
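The contrast-limiting mechanism behind CLAHE-style methods can be sketched with a plain clip-limited histogram equalization. Note the simplifications: this sketch is global (single tile) and redistributes the clipped excess uniformly, whereas the paper's method is tiled and remaps toward a Rayleigh target distribution:

```python
def clip_limited_equalize(pixels, levels=256, clip_limit=40):
    """Histogram equalization with a clip limit: bins above the limit
    are clipped and the excess counts are spread over all bins, which
    caps the contrast gain and avoids saturating bright structures."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    excess = 0
    for i in range(levels):            # clip the histogram
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit
    hist = [h + excess // levels for h in hist]  # redistribute excess
    cdf, total = [], 0
    for h in hist:                     # cumulative distribution
        total += h
        cdf.append(total)
    scale = (levels - 1) / cdf[-1]
    mapping = [round(c * scale) for c in cdf]   # monotone tone curve
    return [mapping[p] for p in pixels]
```

Because the mapping is built from a cumulative sum, it is monotone nondecreasing, so relative intensity ordering between vessels is preserved while contrast gain is bounded by the clip limit.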
Convex blind image deconvolution with inverse filtering
NASA Astrophysics Data System (ADS)
Lv, Xiao-Guang; Li, Fang; Zeng, Tieyong
2018-03-01
Blind image deconvolution is the process of estimating both the original image and the blur kernel from the degraded image with only partial or no information about degradation and the imaging system. It is a bilinear ill-posed inverse problem corresponding to the direct problem of convolution. Regularization methods are used to handle the ill-posedness of blind deconvolution and get meaningful solutions. In this paper, we investigate a convex regularized inverse filtering method for blind deconvolution of images. We assume that the support region of the blur object is known, as has been done in a few existing works. By studying the inverse filters of signal and image restoration problems, we observe the oscillation structure of the inverse filters. Inspired by the oscillation structure of the inverse filters, we propose to use the star norm to regularize the inverse filter. Meanwhile, we use the total variation to regularize the resulting image obtained by convolving the inverse filter with the degraded image. The proposed minimization model is shown to be convex. We employ the first-order primal-dual method for the solution of the proposed minimization model. Numerical examples for blind image restoration are given to show that the proposed method outperforms some existing methods in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), visual quality and time consumption.
Image quality assessment using deep convolutional networks
NASA Astrophysics Data System (ADS)
Li, Yezhou; Ye, Xiang; Li, Yong
2017-12-01
This paper proposes a method of accurately assessing image quality without a reference image by using a deep convolutional neural network. Existing training-based methods usually utilize a compact set of linear filters for learning features of images captured by different sensors to assess their quality. These methods may not be able to learn the semantic features that are intimately related to the features used in human subjective assessment. Observing this drawback, this work proposes training a deep convolutional neural network (CNN) with labelled images for image quality assessment. The ReLU in the CNN allows non-linear transformations for extracting high-level image features, providing a more reliable assessment of image quality than linear filters. To enable the neural network to take images of any arbitrary size as input, spatial pyramid pooling (SPP) is introduced to connect the top convolutional layer and the fully-connected layer. In addition, the SPP makes the CNN robust to object deformations to a certain extent. The proposed method takes an image as input, carries out an end-to-end learning process, and outputs the quality of the image. It is tested on public datasets. Experimental results show that it outperforms existing methods by a large margin and can accurately assess the image quality of images taken by different sensors of varying sizes.
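The key property of the SPP layer mentioned above is that it produces a fixed-length vector from a feature map of any size, which is what lets the fully-connected layer accept arbitrary input images. A minimal 2-D sketch (single channel, max pooling; the grid sizes in `levels` are illustrative, not the paper's configuration):

```python
def spp(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid pooling: max-pool a 2-D map over grids of
    1x1, 2x2 and 4x4 cells, concatenating the cell maxima into a
    vector whose length (1 + 4 + 16 = 21) is independent of the
    input map's size."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for g in levels:
        for r in range(g):
            for c in range(g):
                r0, r1 = r * h // g, (r + 1) * h // g
                c0, c1 = c * w // g, (c + 1) * w // g
                out.append(max(feature_map[i][j]
                               for i in range(r0, r1)
                               for j in range(c0, c1)))
    return out
```

In a real network this pooling is applied per channel on the top convolutional layer's output, and the concatenated vector feeds the fully-connected layers.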
Manifold Regularized Experimental Design for Active Learning.
Zhang, Lining; Shum, Hubert P H; Shao, Ling
2016-12-02
Various machine learning and data mining tasks in classification require abundant data samples to be labeled for training. Conventional active learning methods aim at labeling the most informative samples to reduce the labeling effort of the user. Many previous studies in active learning select one sample after another in a greedy manner. However, this is not very effective because the classification model has to be retrained for each newly labeled sample. Moreover, many popular active learning approaches select the most uncertain samples by leveraging the classification hyperplane of the classifier, which is not appropriate since the classification hyperplane is inaccurate when the training data are small-sized. The problem of insufficient training data in real-world systems limits the potential applications of these approaches. This paper presents a novel active learning method called manifold regularized experimental design (MRED), which can label multiple informative samples at one time for training. In addition, MRED gives an explicit geometric explanation for the samples selected to be labeled by the user. Different from existing active learning methods, our method avoids the intrinsic problems caused by insufficiently labeled samples in real-world applications. Various experiments on synthetic datasets, the Yale face database and the Corel image database have been carried out to show how MRED outperforms existing methods.
Misaligned Image Integration With Local Linear Model.
Baba, Tatsuya; Matsuoka, Ryo; Shirai, Keiichiro; Okuda, Masahiro
2016-05-01
We present a new image integration technique for a flash and long-exposure image pair to capture a dark scene without incurring blurring or noisy artifacts. Most existing methods require well-aligned images for the integration, which is often a burdensome restriction in practical use. We address this issue by locally transferring the colors of the flash images using a small fraction of the corresponding pixels in the long-exposure images. We formulate the image integration as a convex optimization problem with the local linear model. The proposed method makes it possible to integrate the color of the long-exposure image with the detail of the flash image without causing any harmful effects to its contrast, where we do not need perfect alignment between the images by virtue of our new integration principle. We show that our method successfully outperforms the state of the art in the image integration and reference-based color transfer for challenging misaligned data sets.
Automatic PSO-Based Deformable Structures Markerless Tracking in Laparoscopic Cholecystectomy
NASA Astrophysics Data System (ADS)
Djaghloul, Haroun; Batouche, Mohammed; Jessel, Jean-Pierre
An automatic, markerless method for tracking deformable structures (digestive organs) during laparoscopic cholecystectomy, which uses particle swarm optimization (PSO) and preoperative a priori knowledge, is presented. The shape associated with the global best particles of the population determines a coarse representation of the targeted organ (the gallbladder) in monocular laparoscopic colour images. The swarm behaviour is directed by a new fitness function, optimized to improve detection and tracking performance. The function is defined as a linear combination of two terms, namely, the human a priori knowledge term (H) and the particle density term (D). Within the limits of standard PSO settings, experimental results on both synthetic and real data show the effectiveness and robustness of our method. Indeed, it outperforms existing methods (such as active contours, deformable models and Gradient Vector Flow) in accuracy and convergence rate, without the need for explicit initialization.
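A bare-bones global-best PSO of the kind the tracker builds on can be sketched as follows. The inertia and acceleration constants are common textbook choices, and the sphere function stands in for the paper's fitness (which instead combines the H and D terms):

```python
import random

def pso(f, dim, n=20, iters=100, seed=0):
    """Global-best particle swarm optimization minimizing f over
    [-5, 5]^dim. Each particle is pulled toward its personal best
    and the swarm's global best position."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pval = [f(p) for p in pos]
    g = min(range(n), key=lambda i: pval[i])
    gbest, gval = pbest[g][:], pval[g]
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]                       # inertia
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social
                pos[i][d] += vel[i][d]
            v = f(pos[i])
            if v < pval[i]:
                pbest[i], pval[i] = pos[i][:], v
                if v < gval:
                    gbest, gval = pos[i][:], v
    return gbest, gval
```

In the tracking setting, each particle would encode shape/pose parameters and `f` would score how well the induced contour matches the image under the H and D terms.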
Detection of tuberculosis using hybrid features from chest radiographs
NASA Astrophysics Data System (ADS)
Fatima, Ayesha; Akram, M. Usman; Akhtar, Mahmood; Shafique, Irrum
2017-02-01
Tuberculosis is an infectious disease that has become a major threat all over the world, yet its diagnosis remains a challenging task. In the literature, chest radiographs are considered the most commonly used medical images in developing countries for the diagnosis of TB. Different methods have been proposed, but they are not helpful for radiologists due to cost and accuracy issues. Our paper presents a methodology in which different combinations of features are extracted based on the intensities, shape and texture of the chest radiograph and given to a classifier for the detection of TB. The performance of our methodology is evaluated using the publicly available standard Montgomery County (MC) dataset, which contains 138 CXRs, of which 80 are normal and 58 are abnormal, including effusion and miliary patterns. An accuracy of 81.16% was achieved, and the results show that the proposed method outperforms existing state-of-the-art methods on the MC dataset.
Song, Youyi; Zhang, Ling; Chen, Siping; Ni, Dong; Lei, Baiying; Wang, Tianfu
2015-10-01
In this paper, a multiscale convolutional network (MSCN) and graph-partitioning-based method is proposed for accurate segmentation of cervical cytoplasm and nuclei. Specifically, deep learning via the MSCN is explored to extract scale invariant features, and then, segment regions centered at each pixel. The coarse segmentation is refined by an automated graph partitioning method based on the pretrained feature. The texture, shape, and contextual information of the target objects are learned to localize the appearance of distinctive boundary, which is also explored to generate markers to split the touching nuclei. For further refinement of the segmentation, a coarse-to-fine nucleus segmentation framework is developed. The computational complexity of the segmentation is reduced by using superpixel instead of raw pixels. Extensive experimental results demonstrate that the proposed cervical nucleus cell segmentation delivers promising results and outperforms existing methods.
Variable context Markov chains for HIV protease cleavage site prediction.
Oğul, Hasan
2009-06-01
Deciphering the knowledge of HIV protease specificity and developing computational tools for detecting its cleavage sites in protein polypeptide chains are very desirable for designing efficient and specific chemical inhibitors to prevent acquired immunodeficiency syndrome. In this study, we developed a generative model based on a generalization of variable order Markov chains (VOMC) for peptide sequences and adapted the model for prediction of their cleavability by certain proteases. The new method, called variable context Markov chains (VCMC), attempts to identify context equivalence based on the evolutionary similarities between individual amino acids. It was applied to the HIV-1 protease cleavage site prediction problem and shown to outperform existing methods in terms of prediction accuracy on a common dataset. In general, the method is a promising tool for prediction of cleavage sites of all proteases, and we encourage its use for other kinds of peptide classification problems as well.
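A toy variable-order Markov chain with longest-context fallback, the family of models the VCMC method generalizes (VCMC additionally merges contexts by amino acid similarity). The class below is an illustrative sketch, not the authors' implementation:

```python
from collections import defaultdict

class VOMC:
    """Variable-order Markov chain: counts next-symbol frequencies for
    every context up to max_order, and predicts by backing off to the
    longest context actually seen in training."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, seq):
        for i in range(len(seq)):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                self.counts[seq[i - k:i]][seq[i]] += 1

    def prob(self, ctx, sym):
        # back off from the longest usable suffix of ctx to ""
        for k in range(min(self.max_order, len(ctx)), -1, -1):
            c = ctx[len(ctx) - k:]
            if c in self.counts:
                total = sum(self.counts[c].values())
                return self.counts[c][sym] / total
        return 0.0
```

For cleavage prediction, one such model is trained on cleavable peptides and another on non-cleavable ones, and a query octamer is scored by the likelihood ratio of the two models.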
New insights from cluster analysis methods for RNA secondary structure prediction
Rogers, Emily; Heitsch, Christine
2016-01-01
A widening gap exists between the best practices for RNA secondary structure prediction developed by computational researchers and the methods used in practice by experimentalists. Minimum free energy (MFE) predictions, although broadly used, are outperformed by methods which sample from the Boltzmann distribution and data mine the results. In particular, moving beyond the single structure prediction paradigm yields substantial gains in accuracy. Furthermore, the largest improvements in accuracy and precision come from viewing secondary structures not at the base pair level but at lower granularity/higher abstraction. This suggests that random errors affecting precision and systematic ones affecting accuracy are both reduced by this “fuzzier” view of secondary structures. Thus experimentalists who are willing to adopt a more rigorous, multilayered approach to secondary structure prediction by iterating through these levels of granularity will be much better able to capture fundamental aspects of RNA base pairing. PMID:26971529
Enriching consumer health vocabulary through mining a social Q&A site: A similarity-based approach.
He, Zhe; Chen, Zhiwei; Oh, Sanghee; Hou, Jinghui; Bian, Jiang
2017-05-01
The widely known vocabulary gap between health consumers and healthcare professionals hinders information seeking and health dialogue of consumers on end-user health applications. The Open Access and Collaborative Consumer Health Vocabulary (OAC CHV), which contains health-related terms used by lay consumers, has been created to bridge such a gap. Specifically, the OAC CHV facilitates consumers' health information retrieval by enabling consumer-facing health applications to translate between professional language and consumer-friendly language. To keep up with the constantly evolving medical knowledge and language use, new terms need to be identified and added to the OAC CHV. User-generated content on social media, including social question and answer (social Q&A) sites, affords us an enormous opportunity for mining consumer health terms. Existing methods of identifying new consumer terms from text typically use ad hoc lexical-syntactic patterns and human review. Our study extends an existing method by extracting n-grams from a social Q&A textual corpus and representing them with a rich set of contextual and syntactic features. Using K-means clustering, our method, simiTerm, was able to identify terms that are both contextually and syntactically similar to the existing OAC CHV terms. We tested our method on social Q&A corpora on two disease domains: diabetes and cancer. Our method outperformed three baseline ranking methods. A post-hoc qualitative evaluation by human experts further validated that our method can effectively identify meaningful new consumer terms on social Q&A. Copyright © 2017 Elsevier Inc. All rights reserved.
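The clustering step can be sketched with plain Lloyd's-algorithm K-means over dense feature vectors. This is a generic stand-in, not simiTerm itself: the contextual and syntactic n-gram features, the ranking criterion, and the vector values below are all illustrative assumptions.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=1):
    """Plain K-means (Lloyd's algorithm) on dense feature vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: dist2(p, centers[c]))
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

Candidate n-grams whose vectors fall close to clusters dominated by existing OAC CHV terms would then be ranked as likely new consumer terms.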
Generalized superradiant assembly for nanophotonic thermal emitters
NASA Astrophysics Data System (ADS)
Mallawaarachchi, Sudaraka; Gunapala, Sarath D.; Stockman, Mark I.; Premaratne, Malin
2018-03-01
Superradiance explains the collective enhancement of emission observed when nanophotonic emitters are arranged within subwavelength proximity and perfect symmetry. Thermal superradiant emitter assemblies with variable photon far-field coupling rates are known to be capable of outperforming their conventional, nonsuperradiant counterparts. However, due to the inability to account for assemblies comprising emitters with various materials and dimensional configurations, existing thermal superradiant models are inadequate and incongruent. In this paper, a generalized thermal superradiant assembly for nanophotonic emitters is developed from first principles. Spectral analysis shows that not only does the proposed model outperform existing models in power delivery, but it also exhibits unforeseen and startling emission characteristics. These electromagnetically-induced-transparency-like (EIT-like) and superscattering-like characteristics are reported here for a superradiant assembly, and the effects escalate as the emitters become increasingly disparate. The fact that the EIT-like characteristics are in close agreement with a recent experimental observation involving the superradiant decay of qubits strongly bolsters the validity of the proposed model.
Frappier, Vincent; Najmanovich, Rafael J.
2014-01-01
Normal mode analysis (NMA) methods are widely used to study dynamic aspects of protein structures. Two critical components of NMA methods are the level of coarse-graining used to represent protein structures and the choice of potential energy functional form. There is a trade-off between speed and accuracy in different choices. At one extreme one finds accurate but slow molecular-dynamics-based methods with all-atom representations and detailed atom potentials. At the other extreme are fast elastic network model (ENM) methods with Cα-only representations and simplified potentials based on geometry alone, and thus oblivious to protein sequence. Here we present ENCoM, an Elastic Network Contact Model that employs a potential energy function including a pairwise atom-type non-bonded interaction term, which makes it possible to consider the effect of the specific nature of amino acids on dynamics within the context of NMA. ENCoM is as fast as existing ENM methods and outperforms such methods in the generation of conformational ensembles. Here we introduce a new application for NMA methods with the use of ENCoM in the prediction of the effect of mutations on protein stability. While existing methods are based on machine learning or enthalpic considerations, the use of ENCoM, based on vibrational normal modes, rests on entropic considerations. This represents a novel area of application for NMA methods and a novel approach for the prediction of the effect of mutations. We compare ENCoM to a large number of methods in terms of accuracy and self-consistency. We show that the accuracy of ENCoM is comparable to that of the best existing methods. We show that existing methods are biased towards the prediction of destabilizing mutations and that ENCoM is less biased at predicting stabilizing mutations. PMID:24762569
Deng, Bai-chuan; Yun, Yong-huan; Liang, Yi-zeng; Yi, Lun-zhao
2014-10-07
In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA) that is based on the idea of model population analysis (MPA) is proposed for variable selection. Unlike most of the existing optimization methods for variable selection, VISSA statistically evaluates the performance of variable space in each step of optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks in each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied in most of the existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
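The mechanics of the weighted binary matrix sampling loop can be sketched as follows. This is a simplified illustration, not the published VISSA code: a toy fitness function stands in for the cross-validated calibration error used in the paper, and the population size, top fraction, and iteration count are hypothetical defaults.

```python
import random

def wbms_vissa(n_vars, fitness, n_models=100, top_frac=0.1, iters=10, seed=0):
    """Sketch of the weighted binary matrix sampling (WBMS) loop.

    Each variable i carries an inclusion weight w[i] (initially 0.5).
    Every iteration draws n_models random sub-models by including
    variable i with probability w[i]; the best-performing sub-models
    vote on the next weights, shrinking the variable space."""
    rng = random.Random(seed)
    w = [0.5] * n_vars
    for _ in range(iters):
        models = [
            [i for i in range(n_vars) if rng.random() < w[i]]
            for _ in range(n_models)
        ]
        # Keep the best-scoring fraction of sub-models.
        models.sort(key=fitness, reverse=True)
        best = models[: max(1, int(top_frac * n_models))]
        # New weight = inclusion frequency among the best sub-models.
        w = [sum(v in m for m in best) / len(best) for v in range(n_vars)]
    return w
```

Variables whose weights converge toward one are retained; those drifting toward zero drop out of the sampled space, so each step's space both shrinks and (by construction of the vote) tends to outperform the previous one.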
Active link selection for efficient semi-supervised community detection
NASA Astrophysics Data System (ADS)
Yang, Liang; Jin, Di; Wang, Xiao; Cao, Xiaochun
2015-03-01
Several semi-supervised community detection algorithms have been proposed recently to improve the performance of traditional topology-based methods. However, most of them focus on how to integrate supervised information with topology information; few pay attention to which information is critical for performance improvement. This leads to a large demand for supervised information, which is expensive or difficult to obtain in most fields. To address this problem, we propose an active link selection framework; that is, we actively select the most uncertain and informative links for human labeling so that the supervised information is used efficiently. We also disconnect the most likely inter-community edges to further improve the efficiency. Our main idea is that, by connecting uncertain nodes to their community hubs and disconnecting the inter-community edges, one can sharpen the block structure of the adjacency matrix more efficiently than by randomly labeling links, as the existing methods do. Experiments on both synthetic and real networks demonstrate that our new approach significantly outperforms the existing methods in terms of the efficiency of using supervised information. It needs only ~13% of the supervised information to achieve a performance similar to that of the original semi-supervised approaches.
NASA Astrophysics Data System (ADS)
Liu, Chunlei; Ding, Wenrui; Li, Hongguang; Li, Jiankun
2017-09-01
Haze removal is a nontrivial task for medium-altitude unmanned aerial vehicle (UAV) image processing because of the effects of light absorption and scattering. The challenges are attributed mainly to image distortion and detail blur during the long-distance and large-scale imaging process. In our work, a metadata-assisted nonuniform atmospheric scattering model is proposed to deal with the aforementioned problems in medium-altitude UAV imaging. First, to better describe the real atmosphere, we propose a nonuniform atmospheric scattering model according to the aerosol distribution, which directly benefits the image distortion correction. Second, considering the characteristics of long-distance imaging, we calculate the depth map, which is an essential clue to modeling, on the basis of UAV metadata information. An accurate depth map reduces the color distortion compared with the depth of field obtained by other existing methods based on priors or assumptions. Furthermore, we use an adaptive median filter to address the problem of fuzzy details caused by the global airlight value. Experimental results on both real flight and synthetic images demonstrate that our proposed method outperforms four other existing haze removal methods.
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-06-19
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Unlike existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden-layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radial basis function neural network (RBFNN), and probabilistic neural network (PNN). The results demonstrate that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.
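The composite kernel construction can be sketched as a weighted sum of base kernel matrices. This is a minimal illustration of the kernel-combination idea only: the QPSO optimizer that tunes the coefficients, kernel parameters, and regularization constant is not shown, and all parameter values below are hypothetical.

```python
import math

def gaussian_k(x, y, gamma):
    # Gaussian (RBF) base kernel.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def poly_k(x, y, degree, c):
    # Polynomial base kernel.
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def composite_kernel(X, kernels, coeffs):
    """Weighted composite kernel matrix K = sum_j mu_j * K_j.

    In QWMK-ELM the coefficients mu_j are treated as external
    parameters and optimized by QPSO; here they are fixed inputs."""
    n = len(X)
    return [[sum(mu * k(X[i], X[j]) for k, mu in zip(kernels, coeffs))
             for j in range(n)] for i in range(n)]
```

The resulting matrix is symmetric and positive semidefinite whenever the coefficients are non-negative, so it can be plugged into any kernel machine, including KELM.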
Heating and flooding: A unified approach for rapid generation of free energy surfaces
NASA Astrophysics Data System (ADS)
Chen, Ming; Cuendet, Michel A.; Tuckerman, Mark E.
2012-07-01
We propose a general framework for the efficient sampling of conformational equilibria in complex systems and the generation of associated free energy hypersurfaces in terms of a set of collective variables. The method is a strategic synthesis of the adiabatic free energy dynamics approach, previously introduced by us and others, and existing schemes using Gaussian-based adaptive bias potentials to disfavor previously visited regions. In addition, we suggest sampling the thermodynamic force instead of the probability density to reconstruct the free energy hypersurface. All these elements are combined into a robust extended phase-space formalism that can be easily incorporated into existing molecular dynamics packages. The unified scheme is shown to outperform both metadynamics and adiabatic free energy dynamics in generating two-dimensional free energy surfaces for several example cases including the alanine dipeptide in the gas and aqueous phases and the met-enkephalin oligopeptide. In addition, the method can efficiently generate higher dimensional free energy landscapes, which we demonstrate by calculating a four-dimensional surface in the Ramachandran angles of the gas-phase alanine tripeptide.
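The "flooding" ingredient alone can be sketched as an adaptive Gaussian bias on a collective variable, in the spirit of metadynamics: previously visited values are disfavored by depositing repulsive Gaussians. This is only one component of the unified scheme; the adiabatic free energy dynamics machinery and the thermodynamic-force estimator are not reproduced, and the height and width values are hypothetical.

```python
import math

class GaussianBias:
    """Adaptive Gaussian bias potential on a single collective variable."""

    def __init__(self, height=1.0, width=0.2):
        self.height, self.width = height, width
        self.centers = []

    def deposit(self, s):
        # Record a visited CV value; a Gaussian hill is centered there.
        self.centers.append(s)

    def energy(self, s):
        # Total bias V(s) = sum of deposited Gaussians.
        return sum(self.height * math.exp(-(s - c) ** 2 / (2 * self.width ** 2))
                   for c in self.centers)

    def force(self, s):
        # Bias force -dV/ds: pushes the system away from visited regions.
        return sum(self.height * (s - c) / self.width ** 2
                   * math.exp(-(s - c) ** 2 / (2 * self.width ** 2))
                   for c in self.centers)
```

In the full method this bias acts on extended phase-space variables adiabatically decoupled from the physical degrees of freedom, and the free energy surface is reconstructed from the sampled thermodynamic force rather than from the bias itself.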
Pose estimation for augmented reality applications using genetic algorithm.
Yu, Ying Kin; Wong, Kin Hong; Chang, Michael Ming Yuen
2005-12-01
This paper describes a genetic algorithm that tackles the pose-estimation problem in computer vision. Our genetic algorithm can find the rotation and translation of an object accurately when the three-dimensional structure of the object is given. In our implementation, each chromosome encodes both the pose and the indexes of the selected point features of the object. Instead of only searching for the pose, as in existing work, our algorithm at the same time searches for a set containing the most reliable feature points. This mismatch filtering strategy makes the algorithm more robust in the presence of point mismatches and outliers in the images. Our algorithm has been tested with both synthetic and real data with good results. The accuracy of the recovered pose is compared to that of existing algorithms. Our approach outperformed Lowe's method and two other genetic algorithms in the presence of point mismatches and outliers. In addition, it has been used to estimate the pose of a real object. It is shown that the proposed method is applicable to augmented reality applications.
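The chromosome layout and the outlier-robust fitness idea can be sketched as follows. This is an illustrative assumption, not the paper's exact encoding: the 6-DOF pose parameterization, the bit-mask representation, and the toy reprojection-error function are all hypothetical, and the evolutionary operators (crossover, mutation, selection) are omitted.

```python
import random

def make_chromosome(rng, n_points, n_select):
    """One candidate solution: a 6-DOF pose (3 rotation angles plus 3
    translations) concatenated with a bit mask choosing which feature
    points the fitness is allowed to trust."""
    pose = [rng.uniform(-1.0, 1.0) for _ in range(6)]
    mask = [1] * n_select + [0] * (n_points - n_select)
    rng.shuffle(mask)
    return pose + mask

def fitness(chrom, reproj_error, n_points):
    pose, mask = chrom[:6], chrom[6:]
    # Only selected points contribute, so mismatched or outlying
    # features can be voted out of the error measure by evolution.
    errs = [reproj_error(pose, i) for i in range(n_points) if mask[i]]
    return -sum(errs) / max(1, len(errs))
```

Chromosomes that happen to select mismatched points incur large reprojection errors and are bred out, which is the mismatch-filtering behavior described in the abstract.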
Deep-Learning-Based Drug-Target Interaction Prediction.
Wen, Ming; Zhang, Zhimin; Niu, Shaoyu; Sha, Haozhi; Yang, Ruihan; Yun, Yonghuan; Lu, Hongmei
2017-04-07
Identifying interactions between known drugs and targets is a major challenge in drug repositioning. In silico prediction of drug-target interactions (DTI) can speed up the expensive and time-consuming experimental work by providing the most potent DTIs. In silico prediction of DTI can also provide insights about potential drug-drug interactions and promote the exploration of drug side effects. Traditionally, the performance of DTI prediction depends heavily on the descriptors used to represent the drugs and the target proteins. In this paper, to accurately predict new DTIs between approved drugs and targets without separating the targets into different classes, we developed a deep-learning-based algorithmic framework named DeepDTIs. It first abstracts representations from raw input descriptors using unsupervised pretraining and then applies known label pairs of interaction to build a classification model. Compared with other methods, DeepDTIs is found to reach or outperform other state-of-the-art methods. DeepDTIs can be further used to predict whether a new drug targets some existing targets or whether a new target interacts with some existing drugs.
Mifsud, Borbala; Martincorena, Inigo; Darbo, Elodie; Sugar, Robert; Schoenfelder, Stefan; Fraser, Peter; Luscombe, Nicholas M
2017-01-01
Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a BioConductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
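The core of a binomial significance test for contact counts can be sketched directly. The coverage-product null below is a simplified stand-in for GOTHiC's bias-corrected expectation, and the parameter names are illustrative; a production implementation would work in log space to stay numerically stable for large read counts.

```python
from math import comb

def binomial_pvalue(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the probability of observing
    at least k read pairs by random ligation alone."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def interaction_pvalue(obs, reads_i, reads_j, total_reads):
    # Expected contact probability under the random-ligation null,
    # taken proportional to the product of the two fragments' coverages
    # (a simplification of GOTHiC's bias-corrected estimate).
    p = (reads_i / total_reads) * (reads_j / total_reads)
    return binomial_pvalue(obs, total_reads, p)
```

Each fragment pair thus receives a p-value, and a significance threshold (after multiple-testing correction) separates true interactions from spurious contacts.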
Kong, Xiang-Zhen; Liu, Jin-Xing; Zheng, Chun-Hou; Hou, Mi-Xiao; Wang, Juan
2017-07-01
High dimensionality has become a typical feature of biomolecular data. In this paper, a novel dimension reduction method named p-norm singular value decomposition (PSVD) is proposed to seek the low-rank approximation matrix to the biomolecular data. To enhance the robustness to outliers, the Lp-norm is taken as the error function and the Schatten p-norm is used as the regularization function in the optimization model. To evaluate the performance of PSVD, the K-means clustering method is then employed for tumor clustering based on the low-rank approximation matrix. Extensive experiments are carried out on five gene expression data sets, including two benchmark data sets and three higher-dimensional data sets from The Cancer Genome Atlas. The experimental results demonstrate that the PSVD-based method outperforms many existing methods. In particular, it is experimentally shown that the proposed method is more efficient for processing higher-dimensional data, with good robustness, stability, and superior time performance.
Global positioning method based on polarized light compass system
NASA Astrophysics Data System (ADS)
Liu, Jun; Yang, Jiangtao; Wang, Yubo; Tang, Jun; Shen, Chong
2018-05-01
This paper presents a global positioning method based on a polarized light compass system. A main limitation of polarization-based positioning is the environment: weak or locally destroyed polarization patterns degrade performance. The solution presented in this paper is polarization image de-noising and segmentation, for which the pulse-coupled neural network is employed to enhance positioning performance. The prominent advantages of the present positioning technique are as follows: (i) compared to existing positioning methods based on polarized light, better sun-tracking accuracy can be achieved, and (ii) the robustness and accuracy of positioning under weak and locally destroyed polarization environments, such as cloud cover or building shielding, are improved significantly. Finally, field experiments are presented to demonstrate the effectiveness and applicability of the proposed global positioning technique. The experiments show that our proposed method outperforms the conventional polarization positioning method, yielding real-time longitude and latitude with accuracies up to 0.0461° and 0.0911°, respectively.
A new exact method for line radiative transfer
NASA Astrophysics Data System (ADS)
Elitzur, Moshe; Asensio Ramos, Andrés
2006-01-01
We present a new method, the coupled escape probability (CEP), for exact calculation of line emission from multi-level systems, solving only algebraic equations for the level populations. The CEP formulation of the classical two-level problem is a set of linear equations, and we uncover an exact analytic expression for the emission from two-level optically thick sources that holds as long as they are in the `effectively thin' regime. In a comparative study of a number of standard problems, the CEP method outperformed the leading line transfer methods by substantial margins. The algebraic equations employed by our new method are already incorporated in numerous codes based on the escape probability approximation. All that is required for an exact solution with these existing codes is to augment the expression for the escape probability with simple zone-coupling terms. As an application, we find that standard escape probability calculations generally produce the correct cooling emission by the CII 158-μm line but not by the 3P lines of OI.
Castellana, Stefano; Fusilli, Caterina; Mazzoccoli, Gianluigi; Biagini, Tommaso; Capocefalo, Daniele; Carella, Massimo; Vescovi, Angelo Luigi; Mazza, Tommaso
2017-06-01
There are 24,189 possible non-synonymous amino acid changes potentially affecting the human mitochondrial DNA. Only a tiny subset has been functionally evaluated with certainty so far, while the pathogenicity of the vast majority has been assessed only in silico by software predictors. Since these tools have proved to be rather incongruent, we have designed and implemented APOGEE, a machine-learning algorithm that outperforms all existing prediction methods in estimating the harmfulness of mitochondrial non-synonymous genome variations. We provide a detailed description of the underlying algorithm, of the selected and manually curated training and test sets of variants, and of its classification ability.
Yang, James J; Li, Jia; Williams, L Keoki; Buu, Anne
2016-01-05
In genome-wide association studies (GWAS) for complex diseases, the association between a SNP and each phenotype is usually weak. Combining multiple related phenotypic traits can increase the power of gene search and thus is a practically important area that requires methodology work. This study provides a comprehensive review of existing methods for conducting GWAS on complex diseases with multiple phenotypes, including the multivariate analysis of variance (MANOVA), the principal component analysis (PCA), the generalized estimating equations (GEE), the trait-based association test involving the extended Simes procedure (TATES), and the classical Fisher combination test. We propose a new method that relaxes the unrealistic independence assumption of the classical Fisher combination test and is computationally efficient. To demonstrate applications of the proposed method, we also present the results of statistical analysis on the Study of Addiction: Genetics and Environment (SAGE) data. Our simulation study shows that the proposed method has higher power than existing methods while controlling the type I error rate. The GEE and the classical Fisher combination test, on the other hand, do not control the type I error rate and thus are not recommended. In general, the power of the competing methods decreases as the correlation between phenotypes increases. All the methods tend to have lower power when the multivariate phenotypes come from long-tailed distributions. The real data analysis also demonstrates that the proposed method allows us to compare the marginal results with the multivariate results and to specify which SNPs are specific to a particular phenotype or contribute to the common construct. The proposed method outperforms existing methods in most settings and also has great applications in GWAS on complex diseases with multiple phenotypes, such as substance abuse disorders.
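The classical Fisher combination test that serves as the baseline here is compact enough to sketch. Under the null hypothesis and independence of the k p-values, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom. The proposed method's correlation adjustment is not shown.

```python
import math

def fisher_combination(pvalues):
    """Classical Fisher combination test for k independent p-values.

    Test statistic: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.
    For even df = 2k the chi-square survival function is closed-form:
    P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!"""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
```

When the per-phenotype p-values are correlated, this combined p-value is anticonservative, which is exactly the independence assumption the proposed method relaxes.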
Flip-avoiding interpolating surface registration for skull reconstruction.
Xie, Shudong; Leow, Wee Kheng; Lee, Hanjing; Lim, Thiam Chye
2018-03-30
Skull reconstruction is an important and challenging task in craniofacial surgery planning, forensic investigation and anthropological studies. Existing methods typically reconstruct approximating surfaces that regard corresponding points on the target skull as soft constraints, thus incurring non-zero error even for non-defective parts and high overall reconstruction error. This paper proposes a novel geometric reconstruction method that non-rigidly registers an interpolating reference surface that regards corresponding target points as hard constraints, thus achieving low reconstruction error. To overcome the shortcoming of interpolating a surface, a flip-avoiding method is used to detect and exclude conflicting hard constraints that would otherwise cause surface patches to flip and self-intersect. Comprehensive test results show that our method is more accurate and robust than existing skull reconstruction methods. By incorporating symmetry constraints, it can produce more symmetric and normal results than other methods in reconstructing defective skulls with a large number of defects. It is robust against severe outliers such as radiation artifacts in computed tomography due to dental implants. In addition, test results also show that our method outperforms thin-plate spline for model resampling, which enables the active shape model to yield more accurate reconstruction results. As the reconstruction accuracy of defective parts varies with the use of different reference models, we also study the implication of reference model selection for skull reconstruction. Copyright © 2018 John Wiley & Sons, Ltd.
Zhou, Jiyun; Xu, Ruifeng; He, Yulan; Lu, Qin; Wang, Hongpeng; Kong, Bing
2016-01-01
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that the combination of them yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is made freely available to the biological research community. PMID:27282833
On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis.
Li, Bing; Chun, Hyonho; Zhao, Hongyu
2014-09-01
We introduce a nonparametric method for estimating non-Gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernels regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a structure parallel to the Gaussian graphical model that replaces the precision matrix with an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the Gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.
Semi-supervised prediction of gene regulatory networks using machine learning algorithms.
Patel, Nihir; Wang, Jason T L
2015-10-01
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
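The shared skeleton of the semi-supervised procedure, iteratively harvesting reliable negative examples from the unlabelled pool, can be sketched with a simple centroid scorer standing in for the SVM/RF classifiers used in the paper. The function name, the fraction harvested per round, and the toy feature vectors are all illustrative assumptions.

```python
def select_reliable_negatives(positives, unlabelled, n_rounds=3, frac=0.2):
    """Iteratively pick reliable negatives: the unlabelled examples that
    look least like the known positives under the current scorer.

    A nearest-centroid score replaces the SVM/RF classifier here; in the
    full method the classifier would be retrained each round using the
    accumulated negative set."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    negatives = []
    pool = list(unlabelled)
    for _ in range(n_rounds):
        pos_c = centroid(positives)
        # Score unlabelled pairs by distance to the positive centroid;
        # the farthest ones are harvested as reliable negatives.
        pool.sort(key=lambda r: dist2(r, pos_c), reverse=True)
        take = max(1, int(frac * len(pool)))
        negatives.extend(pool[:take])
        pool = pool[take:]
    return negatives, pool
```

The harvested negatives then join the known positives as labelled training data, which is how the transductive and inductive variants exploit unlabelled gene pairs.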
Segmentation of malignant lesions in 3D breast ultrasound using a depth-dependent model.
Tan, Tao; Gubern-Mérida, Albert; Borelli, Cristina; Manniesing, Rashindra; van Zelst, Jan; Wang, Lei; Zhang, Wei; Platel, Bram; Mann, Ritse M; Karssemeijer, Nico
2016-07-01
Automated 3D breast ultrasound (ABUS) has been proposed as a complementary screening modality to mammography for early detection of breast cancers. To facilitate the interpretation of ABUS images, automated diagnosis and detection techniques are being developed, in which malignant lesion segmentation plays an important role. However, automated segmentation of cancer in ABUS is challenging since lesion edges might not be well defined. In this study, the authors aim at developing an automated segmentation method for malignant lesions in ABUS that is robust to ill-defined cancer edges and posterior shadowing. A segmentation method using depth-guided dynamic programming based on spiral scanning is proposed. The method automatically adjusts the aggressiveness of the segmentation according to the position of the voxels relative to the lesion center. Segmentation is more aggressive in the upper part of the lesion (close to the transducer) than at the bottom (far away from the transducer), where posterior shadowing is usually visible. The authors used the Dice similarity coefficient (Dice) for evaluation. The proposed method is compared to existing state-of-the-art approaches such as graph cut, level set, and smart opening, and to an existing dynamic programming method without depth dependence. In a dataset of 78 cancers, our proposed segmentation method achieved a mean Dice of 0.73 ± 0.14. The method outperforms an existing dynamic programming method (0.70 ± 0.16) on this task (p = 0.03) and it is also significantly (p < 0.001) better than graph cut (0.66 ± 0.18), the level-set-based approach (0.63 ± 0.20) and smart opening (0.65 ± 0.12). The proposed depth-guided dynamic programming method achieves accurate breast malignant lesion segmentation results in automated breast ultrasound.
Joint Facial Action Unit Detection and Feature Fusion: A Multi-conditional Learning Approach.
Eleftheriadis, Stefanos; Rudovic, Ognjen; Pantic, Maja
2016-10-05
Automated analysis of facial expressions can benefit many domains, from marketing to clinical diagnosis of neurodevelopmental disorders. Facial expressions are typically encoded as a combination of facial muscle activations, i.e., action units. Depending on context, these action units co-occur in specific patterns, and rarely in isolation. Yet, most existing methods for automatic action unit detection fail to exploit dependencies among them and the corresponding facial features. To address this, we propose a novel multi-conditional latent variable model for simultaneous fusion of facial features and joint action unit detection. Specifically, the proposed model performs feature fusion in a generative fashion via a low-dimensional shared subspace, while simultaneously performing action unit detection using a discriminative classification approach. We show that by combining the merits of both approaches, the proposed methodology outperforms existing purely discriminative/generative methods for the target task. To reduce the number of parameters and avoid overfitting, a novel Bayesian learning approach based on Monte Carlo sampling is proposed to integrate out the shared subspace. We validate the proposed method on posed and spontaneous data from three publicly available datasets (CK+, DISFA and Shoulder-pain), and show that both feature fusion and joint learning of action units lead to improved performance compared to the state-of-the-art methods for the task.
Lu, Chao; Chelikani, Sudhakar; Jaffray, David A.; Milosevic, Michael F.; Staib, Lawrence H.; Duncan, James S.
2013-01-01
External beam radiation therapy (EBRT) for the treatment of cancer enables accurate placement of radiation dose on the cancerous region. However, the deformation of soft tissue during the course of treatment, such as in cervical cancer, presents significant challenges for the delineation of the target volume and other structures of interest. Furthermore, the presence and regression of pathologies such as tumors may violate registration constraints and cause registration errors. In this paper, automatic segmentation, nonrigid registration and tumor detection in cervical magnetic resonance (MR) data are addressed simultaneously using a unified Bayesian framework. The proposed novel method can generate a tumor probability map while progressively identifying the boundary of an organ of interest based on the achieved nonrigid transformation. The method is able to handle the challenges of significant tumor regression and its effect on surrounding tissues. The new method was compared to various existing algorithms on a set of 36 MR images from six patients, each with six T2-weighted cervical MR images. The results show that the proposed approach achieves an accuracy comparable to manual segmentation and significantly outperforms the existing registration algorithms. In addition, the tumor detection result generated by the proposed method is in high agreement with manual delineation by a qualified clinician. PMID:22328178
Acoustic window planning for ultrasound acquisition.
Göbl, Rüdiger; Virga, Salvatore; Rackerseder, Julia; Frisch, Benjamin; Navab, Nassir; Hennersperger, Christoph
2017-06-01
Autonomous robotic ultrasound has recently gained considerable interest, especially for collaborative applications. Existing methods for acquisition trajectory planning are based solely on geometrical considerations, such as the pose of the transducer with respect to the patient surface. This work aims at establishing acoustic window planning to enable autonomous ultrasound acquisitions of anatomies with restricted acoustic windows, such as the liver or the heart. We propose a fully automatic approach for the planning of acquisition trajectories, which only requires information about the target region as well as existing tomographic imaging data, such as X-ray computed tomography. The framework integrates both geometrical and physics-based constraints to estimate the best ultrasound acquisition trajectories with respect to the available acoustic windows. We evaluate the developed method using virtual planning scenarios based on real patient data as well as in real robotic ultrasound acquisitions on a tissue-mimicking phantom. The proposed method yields superior image quality in comparison with a naive planning approach, while maintaining the necessary coverage of the target. We demonstrate that, by taking image formation properties into account, acquisition planning methods can outperform naive planning. Furthermore, we show the need for such planning techniques, since naive approaches are not sufficient as they do not take the expected image quality into account.
Closing the Education Gap: A Mayo Clinic Approach to Academic Achievement.
ERIC Educational Resources Information Center
Sang, Herb A.
Despite recent efforts to provide equal education, agreement exists that blacks, females, and disadvantaged students as a group are outperformed in mathematics and science by white middle-class students. To help disadvantaged students, the Duval County Public Schools (Jacksonville, Florida) have developed a "Mayo Clinic" approach to…
Ma, Junshui; Bayram, Sevinç; Tao, Peining; Svetnik, Vladimir
2011-03-15
After a review of the ocular artifact reduction literature, a high-throughput method designed to reduce the ocular artifacts in multichannel continuous EEG recordings acquired at clinical EEG laboratories worldwide is proposed. The proposed method belongs to the category of component-based methods, and does not rely on any electrooculography (EOG) signals. Based on a concept that all ocular artifact components exist in a signal component subspace, the method can uniformly handle all types of ocular artifacts, including eye-blinks, saccades, and other eye movements, by automatically identifying ocular components from decomposed signal components. This study also proposes an improved strategy to objectively and quantitatively evaluate artifact reduction methods. The evaluation strategy uses real EEG signals to synthesize realistic simulated datasets with different amounts of ocular artifacts. The simulated datasets enable us to objectively demonstrate that the proposed method outperforms some existing methods when no high-quality EOG signals are available. Moreover, the results of the simulated datasets improve our understanding of the involved signal decomposition algorithms, and provide us with insights into the inconsistency regarding the performance of different methods in the literature. The proposed method was also applied to two independent clinical EEG datasets involving 28 volunteers and over 1000 EEG recordings. This effort further confirms that the proposed method can effectively reduce ocular artifacts in large clinical EEG datasets in a high-throughput fashion. Copyright © 2011 Elsevier B.V. All rights reserved.
A new method for enhancer prediction based on deep belief network.
Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong
2017-10-16
Studies have shown that enhancers are significant regulatory elements that play crucial roles in gene expression regulation. Since enhancer activity is independent of the orientation of, and distance to, the target genes, accurately predicting distal enhancers remains a challenging task. In the past years, with the development of high-throughput ChIP-seq technologies, several computational techniques have emerged to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.
Naturalness preservation image contrast enhancement via histogram modification
NASA Astrophysics Data System (ADS)
Tian, Qi-Chong; Cohen, Laurent D.
2018-04-01
Contrast enhancement is a technique for enhancing image contrast to obtain better visual quality. Since many existing contrast enhancement algorithms tend to produce over-enhanced results, naturalness preservation needs to be considered in the framework of image contrast enhancement. This paper proposes a naturalness-preserving contrast enhancement method, which adopts histogram matching to improve the contrast and uses image quality assessment to automatically select the optimal target histogram. Both contrast improvement and naturalness preservation are considered in the target histogram, so this method can avoid the over-enhancement problem. In the proposed method, the optimal target histogram is a weighted sum of the original histogram, the uniform histogram, and a Gaussian-shaped histogram. A structural metric and a statistical naturalness metric are then used to determine the weights of the corresponding histograms. Finally, the contrast-enhanced image is obtained by matching the optimal target histogram. The experiments demonstrate that the proposed method outperforms the compared histogram-based contrast enhancement algorithms.
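The weighted target histogram and the final matching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weights `w_orig`, `w_uni`, `w_gauss` and the Gaussian parameters `mu`, `sigma` are assumed inputs here, whereas the paper selects the weights automatically via the structural and naturalness metrics.

```python
import numpy as np

def target_histogram(hist_orig, w_orig, w_uni, w_gauss, mu=128.0, sigma=64.0):
    """Blend the original, uniform, and Gaussian-shaped histograms
    (256 gray levels) into a single normalized target histogram."""
    bins = np.arange(256)
    h_uni = np.full(256, 1.0 / 256)
    h_gauss = np.exp(-0.5 * ((bins - mu) / sigma) ** 2)
    h_gauss /= h_gauss.sum()
    h = w_orig * hist_orig + w_uni * h_uni + w_gauss * h_gauss
    return h / h.sum()

def match_histogram(img, h_target):
    """Build a monotone look-up table mapping the image's CDF onto the
    target CDF, then apply it to every pixel."""
    h_src = np.bincount(img.ravel(), minlength=256) / img.size
    cdf_src = np.cumsum(h_src)
    cdf_tgt = np.cumsum(h_target)
    lut = np.searchsorted(cdf_tgt, cdf_src).clip(0, 255).astype(np.uint8)
    return lut[img]
```

A pure-uniform target (`w_uni = 1`) reduces this to ordinary histogram equalization; the blend is what lets the method trade contrast against naturalness.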
Jia, Erik; Chen, Tianlu
2018-01-01
Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered missing not at random (MNAR). Improper processing procedures for missing values will adversely affect subsequent statistical analyses. However, few imputation methods have been developed and applied to MNAR situations in the field of metabolomics, so a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with three other imputation methods on two real-world targeted metabolomics datasets and one simulated dataset using our imputation evaluation pipeline. The results show that GSimp outperforms the other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large-scale metabolomics datasets. The R code for GSimp, the evaluation pipeline, a tutorial, and the real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. PMID:29385130
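GSimp itself iterates a Gibbs sampler across variables; as a much-simplified, hedged illustration of the left-censored idea only, the sketch below imputes values below a limit of detection (LOD) by rejection sampling from a normal distribution fitted to the observed values and truncated above at the LOD. The function name and the fallback behavior are our own assumptions, not part of GSimp.

```python
import numpy as np

def impute_left_censored(x, lod, rng=None, max_tries=1000):
    """Replace NaNs (values below the limit of detection `lod`) with draws
    from a normal fitted to the observed values, truncated above at `lod`."""
    rng = rng or np.random.default_rng(0)
    obs = x[~np.isnan(x)]
    mu, sd = obs.mean(), obs.std(ddof=1)
    out = x.copy()
    for i in np.flatnonzero(np.isnan(x)):
        for _ in range(max_tries):          # rejection sampling
            draw = rng.normal(mu, sd)
            if draw < lod:
                out[i] = draw
                break
        else:
            out[i] = lod                    # fall back to the detection limit
    return out
```

Unlike substituting a constant such as LOD/2, drawing from the truncated tail preserves the spread of the censored region, which is what MNAR-aware imputation is after.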
Clustering Categorical Data Using Community Detection Techniques
2017-01-01
With the advent of the k-modes algorithm, the toolbox for clustering categorical data has an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed methods for better initialization are deterministic and reduce the clustering cost considerably; they differ in the heuristic used to choose the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The k largest detected communities define the k modes. Evaluation on ten real categorical datasets shows that our method outperforms the existing initialization methods for k-modes in terms of accuracy, precision, and recall in most of the cases. PMID:29430249
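The abstract does not specify the graph construction or the community detection algorithm used; as a rough sketch of the CD-Clustering idea, the code below links data items that share at least one attribute value and runs a simple deterministic label propagation (one fast family of community detection techniques) as a stand-in. The function names and the tie-breaking rule are hypothetical.

```python
from collections import Counter, defaultdict

def build_graph(rows):
    """Link two data items if they share at least one attribute value."""
    adj = defaultdict(set)
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if any(a == b for a, b in zip(rows[i], rows[j])):
                adj[i].add(j)
                adj[j].add(i)
    return adj

def label_propagation(adj, n, iters=20):
    """Each node repeatedly adopts the most frequent label among its
    neighbors; ties break to the smallest label, so runs are deterministic."""
    labels = list(range(n))
    for _ in range(iters):
        changed = False
        for v in range(n):
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = min(l for l, c in counts.items() if c == max(counts.values()))
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels
```

Grouping items by final label and taking the k largest groups gives the k communities whose per-attribute majority values would serve as the initial modes.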
Crawford, Forrest W.; Suchard, Marc A.
2011-01-01
A birth-death process is a continuous-time Markov chain that counts the number of particles in a system over time. In the general process with n current particles, a new particle is born with instantaneous rate λn and a particle dies with instantaneous rate μn. Currently no robust and efficient method exists to evaluate the finite-time transition probabilities in a general birth-death process with arbitrary birth and death rates. In this paper, we first revisit the theory of continued fractions to obtain expressions for the Laplace transforms of these transition probabilities and make explicit an important derivation connecting transition probabilities and continued fractions. We then develop an efficient algorithm for computing these probabilities that analyzes the error associated with approximations in the method. We demonstrate that this error-controlled method agrees with known solutions and outperforms previous approaches to computing these probabilities. Finally, we apply our novel method to several important problems in ecology, evolution, and genetics. PMID:21984359
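The authors' method works through continued-fraction representations of the Laplace transforms; a standard alternative, useful as a baseline, is to truncate the state space and apply uniformization. The sketch below is that baseline, not the paper's algorithm, and the truncation level `n_states` is an assumed input.

```python
import numpy as np

def bd_transition_matrix(birth, death, n_states, t, tol=1e-12):
    """Finite-time transition probabilities of a birth-death chain on the
    truncated state space {0, ..., n_states - 1}, via uniformization."""
    Q = np.zeros((n_states, n_states))
    for n in range(n_states):
        lam = birth(n) if n + 1 < n_states else 0.0  # truncate at the top
        mu = death(n) if n > 0 else 0.0
        if n + 1 < n_states:
            Q[n, n + 1] = lam
        if n > 0:
            Q[n, n - 1] = mu
        Q[n, n] = -(lam + mu)
    rate = max(-Q.diagonal().min(), 1e-12)   # uniformization constant
    P = np.eye(n_states) + Q / rate          # embedded discrete-time chain
    weight = np.exp(-rate * t)               # Poisson(rate * t) pmf at k = 0
    Pk = np.eye(n_states)
    result = weight * Pk
    k = 0
    while (weight > tol or k < rate * t) and k < 5000:
        k += 1
        Pk = Pk @ P
        weight *= rate * t / k
        result += weight * Pk
    return result
```

For general rates λn, μn the truncation must be chosen large enough that probability mass near the boundary is negligible over [0, t]; this is exactly the robustness issue the continued-fraction method avoids.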
Bounded Kalman filter method for motion-robust, non-contact heart rate estimation
Prakash, Sakthi Kumar Arul; Tucker, Conrad S.
2018-01-01
The authors of this work present a real-time measurement of heart rate across different lighting conditions and motion categories. This is an advancement over existing remote photoplethysmography (rPPG) methods that require a static, controlled environment for heart rate detection, making them impractical for real-world scenarios wherein a patient may be in motion, or remotely connected to a healthcare provider through telehealth technologies. The algorithm aims to minimize motion artifacts such as blurring and noise due to head movements (uniform, random) by employing i) a blur identification and denoising algorithm for each frame and ii) a bounded Kalman filter technique for motion estimation and feature tracking. A case study is presented that demonstrates the feasibility of the algorithm for non-contact estimation of the pulse rate of subjects performing everyday head and body movements. The method in this paper outperforms state-of-the-art rPPG methods in heart rate detection, as revealed by the benchmarked results. PMID:29552419
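The paper's bounded Kalman filter tracks 2D image features; the 1D constant-velocity sketch below illustrates the bounding idea as we read it, clipping the innovation so that a motion-induced outlier cannot drag the state arbitrarily far. The noise parameters, the clipping rule, and the state model are all assumptions, not the authors' design.

```python
import numpy as np

def bounded_kalman_track(zs, q=1e-3, r=0.5, bound=2.0):
    """1-D constant-velocity Kalman filter whose innovation is clipped to
    +/- bound, limiting the influence of outlier measurements."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])               # we observe position only
    x = np.array([zs[0], 0.0])
    P = np.eye(2)
    out = []
    for z in zs:
        x = F @ x                            # predict
        P = F @ P @ F.T + q * np.eye(2)
        innov = float(np.clip(z - (H @ x)[0], -bound, bound))
        S = (H @ P @ H.T)[0, 0] + r          # innovation variance
        K = (P @ H.T)[:, 0] / S              # Kalman gain
        x = x + K * innov                    # update with clipped innovation
        P = (np.eye(2) - np.outer(K, H[0])) @ P
        out.append(x[0])
    return np.array(out)
```

A hard clip is the simplest choice; a gated filter that inflates r for large innovations would be a smoother variant of the same idea.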
Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery.
Liu, Han; Wang, Lie; Zhao, Tuo
2015-08-01
We propose a calibrated multivariate regression method named CMR for fitting high-dimensional multivariate regression models. Compared with existing methods, CMR calibrates regularization for each regression task with respect to its noise level so that it simultaneously attains improved finite-sample performance and tuning insensitivity. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence O(1/ε), where ε is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high-dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is as competitive as a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network: http://cran.r-project.org/web/packages/camel/.
LOCALIZER: subcellular localization prediction of both plant and effector proteins in the plant cell
Sperschneider, Jana; Catanzariti, Ann-Maree; DeBoer, Kathleen; Petre, Benjamin; Gardiner, Donald M.; Singh, Karam B.; Dodds, Peter N.; Taylor, Jennifer M.
2017-01-01
Pathogens secrete effector proteins and many operate inside plant cells to enable infection. Some effectors have been found to enter subcellular compartments by mimicking host targeting sequences. Although many computational methods exist to predict plant protein subcellular localization, they perform poorly for effectors. We introduce LOCALIZER for predicting plant and effector protein localization to chloroplasts, mitochondria, and nuclei. LOCALIZER shows greater prediction accuracy for chloroplast and mitochondrial targeting compared to other methods for 652 plant proteins. For 107 eukaryotic effectors, LOCALIZER outperforms other methods and predicts a previously unrecognized chloroplast transit peptide for the ToxA effector, which we show translocates into tobacco chloroplasts. Secretome-wide predictions and confocal microscopy reveal that rust fungi might have evolved multiple effectors that target chloroplasts or nuclei. LOCALIZER is the first method for predicting effector localization in plants and is a valuable tool for prioritizing effector candidates for functional investigations. LOCALIZER is available at http://localizer.csiro.au/. PMID:28300209
Gao, Yujuan; Wang, Sheng; Deng, Minghua; Xu, Jinbo
2018-05-08
Protein dihedral angles provide a detailed description of protein local conformation. Predicted dihedral angles can be used to narrow down the conformational space of the whole polypeptide chain significantly, thus aiding protein tertiary structure prediction. However, direct angle prediction from sequence alone is challenging. In this article, we present a novel method (named RaptorX-Angle) to predict real-valued angles by combining clustering and deep learning. Tested on a subset of PDB25 and the targets in the latest two Critical Assessment of protein Structure Prediction (CASP) experiments, our method outperforms the existing state-of-the-art method SPIDER2 in terms of Pearson Correlation Coefficient (PCC) and Mean Absolute Error (MAE). Our results also show an approximately linear relationship between the real prediction errors and our estimated bounds; that is, the real prediction error can be well approximated by our estimated bounds. Our study provides an alternative and more accurate prediction of dihedral angles, which may facilitate protein structure prediction and functional studies.
The cost-constrained traveling salesman problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sokkappa, P.R.
1990-10-01
The Cost-Constrained Traveling Salesman Problem (CCTSP) is a variant of the well-known Traveling Salesman Problem (TSP). In the TSP, the goal is to find a tour of a given set of cities such that the total cost of the tour is minimized. In the CCTSP, each city is given a value, and a fixed cost-constraint is specified. The objective is to find a subtour of the cities that achieves maximum value without exceeding the cost-constraint. Thus, unlike the TSP, the CCTSP requires both selection and sequencing. As a consequence, most results for the TSP cannot be extended to the CCTSP. We show that the CCTSP is NP-hard and that no K-approximation algorithm or fully polynomial approximation scheme exists, unless P = NP. We also show that several special cases are polynomially solvable. Algorithms for the CCTSP, which outperform previous methods, are developed in three areas: upper bounding methods, exact algorithms, and heuristics. We found that a bounding strategy based on the knapsack problem performs better, both in speed and in the quality of the bounds, than methods based on the assignment problem. Likewise, we found that a branch-and-bound approach using the knapsack bound was superior to a method based on a common branch-and-bound method for the TSP. In our study of heuristic algorithms, we found that, when selecting nodes for inclusion in the subtour, it is important to consider the "neighborhood" of the nodes. A node with low value that brings the subtour near many other nodes may be more desirable than an isolated node of high value. We found two types of repetition to be desirable: repetitions based on randomization in the subtour-building process, and repetitions encouraging the inclusion of different subsets of the nodes. By varying the number and type of repetitions, we can adjust the computation time required by our method to obtain algorithms that outperform previous methods.
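The knapsack-based bounding strategy can be illustrated with the classical fractional-knapsack relaxation: dropping the sequencing requirement and charging each city only its own cost yields an upper bound on the value of any feasible subtour. The sketch below is illustrative only; the report's actual bound is more elaborate, and how each city's cost is charged is an assumption here.

```python
def knapsack_upper_bound(values, costs, budget):
    """Fractional-knapsack relaxation: an upper bound on the value of any
    subtour, ignoring sequencing (each city is charged only its own cost).
    Items are taken greedily by value-to-cost ratio, fractionally at the end."""
    items = sorted(zip(values, costs), key=lambda vc: vc[0] / vc[1], reverse=True)
    total, remaining = 0.0, float(budget)
    for v, c in items:
        if c <= remaining:
            total += v
            remaining -= c
        else:
            total += v * remaining / c   # take a fraction of the last item
            break
    return total
```

In a branch-and-bound search, a node is pruned whenever this bound on its remaining choices falls below the best integral subtour value found so far.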
Butch-Femme Identity and Visuospatial Performance Among Lesbian and Bisexual Women in China.
Zheng, Lijun; Wen, Guangju; Zheng, Yong
2018-05-01
Lesbian and bisexual women who self-identify as "butch" show a masculine profile with regard to gender roles, gender nonconformity, and systemizing cognitive style, whereas lesbian and bisexual women who self-identify as "femme" show a corresponding feminine profile and those who self-identify as "androgynes" show an intermediate profile. This study examined the association between butch or femme lesbian or bisexual identity and visuospatial ability among 323 lesbian and bisexual women, compared to heterosexual women (n = 207) and men (n = 125), from multiple cities in China. Visuospatial ability was assessed using a Shepard and Metzler-type mental rotation task and Judgment of Line Angle and Position (JLAP) test on the Internet. Heterosexual men outperformed heterosexual women on both mental rotation and JLAP tasks. Lesbian and bisexual women outperformed heterosexual women on mental rotation, but not on JLAP. There were significant differences in mental rotation performance among women, with butch- and androgyne-identified lesbian/bisexual women outperforming femme-identified and heterosexual women. There were also significant differences in JLAP performance among women, with butch- and androgyne-identified lesbian/bisexual women and heterosexual women outperforming femme-identified lesbian/bisexual women. The butch-femme differences in visuospatial ability indicated an association between cognitive ability and butch-femme identity and suggest that neurobiological underpinnings may contribute to butch-femme identity although alternative explanations exist.
Fidelity of the Integrated Force Method Solution
NASA Technical Reports Server (NTRS)
Hopkins, Dale; Halford, Gary; Coroneos, Rula; Patnaik, Surya
2002-01-01
The theory of strain compatibility in solid mechanics had remained incomplete since St. Venant's 'strain formulation' in 1876. We have addressed the compatibility condition both in the continuum and in the discrete system. This has led to the formulation of the Integrated Force Method (IFM). A dual Integrated Force Method (IFMD) with displacement as the primal variable has also been formulated. A modest finite element code (IFM/Analyzers) based on the IFM theory has been developed. For a set of standard test problems, the IFM results were compared with the stiffness method solutions and the MSC/Nastran code. For these problems, IFM outperformed the existing methods. Superior IFM performance is attributed to simultaneous compliance with the equilibrium equations and the compatibility condition. The MSC/Nastran organization expressed reluctance to accept the high-fidelity IFM solutions. This report discusses the solutions to the examples. No inaccuracy was detected in the IFM solutions. A stiffness method code can be improved, with a small programming effort, to reap the many IFM benefits when implemented with the IFMD elements. Dr. Halford conducted a peer review of the Integrated Force Method; the reviewers' responses are included.
Combining Biomarkers Linearly and Nonlinearly for Classification Using the Area Under the ROC Curve
Fong, Youyi; Yin, Shuxin; Huang, Ying
2016-01-01
In biomedical studies, it is often of interest to classify/predict a subject's disease status based on a variety of biomarker measurements. A commonly used classification criterion is based on the AUC, the area under the receiver operating characteristic (ROC) curve. Many methods have been proposed to optimize approximated empirical AUC criteria, but there are two limitations to the existing methods. First, most methods are only designed to find the best linear combination of biomarkers, which may not perform well when there is strong nonlinearity in the data. Second, many existing linear combination methods use gradient-based algorithms to find the best marker combination, which often result in sub-optimal local solutions. In this paper, we address these two problems by proposing a new kernel-based AUC optimization method called Ramp AUC (RAUC). This method approximates the empirical AUC loss function with a ramp function, and finds the best combination by a difference-of-convex-functions algorithm. We show that as a linear combination method, RAUC leads to a consistent and asymptotically normal estimator of the linear marker combination when the data are generated from a semiparametric generalized linear model, just as the Smoothed AUC method (SAUC) does. Through simulation studies and real data examples, we demonstrate that RAUC outperforms SAUC in finding the best linear marker combinations, and can successfully capture nonlinear patterns in the data to achieve better classification performance. We illustrate our method with a dataset from a recent HIV vaccine trial. PMID:27058981
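The ramp surrogate of the empirical AUC loss can be sketched as follows: for every positive-negative pair, the 0-1 pairwise loss is replaced by a ramp of the score difference. This is a minimal reading of the RAUC objective; the slope parameter `s` and the function names are assumptions, and the difference-of-convex optimization itself is omitted.

```python
import numpy as np

def ramp(u, s=1.0):
    """Ramp function: 1 for u <= 0, 0 for u >= s, linear in between."""
    return np.clip(1.0 - u / s, 0.0, 1.0)

def ramp_auc_loss(w, X_pos, X_neg, s=1.0):
    """Ramp surrogate of the empirical AUC loss for the linear score x @ w:
    the average of ramp(score_pos - score_neg) over all pos-neg pairs."""
    sp = X_pos @ w
    sn = X_neg @ w
    diffs = sp[:, None] - sn[None, :]
    return float(ramp(diffs, s).mean())
```

Because the ramp is bounded above by 1, a single badly-scored pair contributes at most 1/(n_pos * n_neg) to the loss, which is what makes the surrogate robust compared with unbounded hinge-type losses.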
Multiscale site-response mapping: A case study of Parkfield, California
Thompson, E.M.; Baise, L.G.; Kayen, R.E.; Morgan, E.C.; Kaklamanos, J.
2011-01-01
The scale of previously proposed methods for mapping site response ranges from global coverage down to individual urban regions. Typically, spatial coverage and accuracy are inversely related. We use the densely spaced strong-motion stations in Parkfield, California, to estimate the accuracy of different site-response mapping methods and demonstrate a method for integrating multiple site-response estimates from the site scale to the global scale. This method is simply a weighted mean of a suite of different estimates, where the weights are the inverse of the variance of the individual estimates. Thus, the dominant site-response model varies in space as a function of the accuracy of the different models. For mapping applications, site-response models should be judged in terms of both spatial coverage and the degree of correlation with observed amplifications. Performance varies with period, but in general the Parkfield data show that: (1) where a velocity profile is available, the square-root-of-impedance (SRI) method outperforms the measured VS30 (30 m divided by the S-wave travel time to 30 m depth) and (2) where velocity profiles are unavailable, the topographic slope method outperforms surficial geology for short periods, but geology outperforms slope at longer periods. We develop new equations to estimate site response from topographic slope, derived from the Next Generation Attenuation (NGA) database.
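The combination rule described above is just an inverse-variance weighted mean of the available model estimates at a site; a minimal sketch:

```python
import numpy as np

def combine_estimates(estimates, variances):
    """Inverse-variance weighted mean of site-response estimates; the most
    precise (lowest-variance) model dominates at each site."""
    w = 1.0 / np.asarray(variances, dtype=float)
    est = np.asarray(estimates, dtype=float)
    return float(np.sum(w * est) / np.sum(w))
```

Where a high-accuracy local model exists (e.g., an SRI estimate from a measured velocity profile), its small variance makes it dominate; elsewhere the broader-coverage, higher-variance models carry the weight.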
Drug-target interaction prediction: A Bayesian ranking approach.
Peska, Ladislav; Buza, Krisztian; Koller, Júlia
2017-12-01
In silico prediction of drug-target interactions (DTI) can provide valuable information and speed up the process of drug repositioning, i.e., finding novel uses for existing drugs. In our work, we focus on machine learning algorithms supporting the drug-centric repositioning approach, which aims to find novel uses for existing or abandoned drugs. We propose a per-drug, ranking-based method, which reflects the needs of drug-centric repositioning research better than conventional drug-target prediction approaches. We propose Bayesian Ranking Prediction of Drug-Target Interactions (BRDTI). The method is based on Bayesian Personalized Ranking matrix factorization (BPR), which has been shown to be an excellent approach for various preference learning tasks; however, it has not previously been used for DTI prediction. In order to successfully deal with DTI challenges, we extended BPR by proposing: (i) the incorporation of target bias, (ii) a technique to handle new drugs and (iii) content alignment to take structural similarities of drugs and targets into account. Evaluation on five benchmark datasets shows that BRDTI outperforms several state-of-the-art approaches in terms of per-drug nDCG and AUC. BRDTI results w.r.t. nDCG are 0.929, 0.953, 0.948, 0.897 and 0.690 for the G-Protein Coupled Receptors (GPCR), Ion Channels (IC), Nuclear Receptors (NR), Enzymes (E) and Kinase (K) datasets respectively. Additionally, BRDTI significantly outperformed the other methods (BLM-NII, WNN-GIP, NetLapRLS and CMF) w.r.t. nDCG in 17 out of 20 cases. Furthermore, BRDTI was also shown to be able to predict novel drug-target interactions not contained in the original datasets. The average recall at top-10 predicted targets for each drug was 0.762, 0.560, 1.000 and 0.404 for the GPCR, IC, NR, and E datasets respectively.
Based on the evaluation, we can conclude that BRDTI is an appropriate choice for researchers looking for an in silico DTI prediction technique to be used in drug-centric repositioning scenarios. BRDTI Software and supplementary materials are available online at www.ksi.mff.cuni.cz/∼peska/BRDTI. Copyright © 2017 Elsevier B.V. All rights reserved.
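The core BPR update, extended with the per-target bias that BRDTI adds, can be sketched as a single stochastic gradient step. Hyperparameters, array layout, and function names are assumptions; BRDTI's content alignment and new-drug handling are omitted.

```python
import numpy as np

def bpr_step(D, T, b, drug, t_pos, t_neg, lr=0.05, reg=0.01):
    """One stochastic BPR update with a per-target bias b: raise the score
    of the observed pair (drug, t_pos) above the unobserved (drug, t_neg).
    D: drug latent factors, T: target latent factors."""
    x = b[t_pos] - b[t_neg] + D[drug] @ (T[t_pos] - T[t_neg])
    g = 1.0 / (1.0 + np.exp(x))      # sigma(-x), gradient of ln sigma(x)
    d = D[drug].copy()               # cache factors before updating
    D[drug] += lr * (g * (T[t_pos] - T[t_neg]) - reg * D[drug])
    T[t_pos] += lr * (g * d - reg * T[t_pos])
    T[t_neg] += lr * (-g * d - reg * T[t_neg])
    b[t_pos] += lr * (g - reg * b[t_pos])
    b[t_neg] += lr * (-g - reg * b[t_neg])
    return x                          # pre-update score margin
```

Repeating this over sampled (drug, observed target, unobserved target) triples maximizes the BPR criterion, i.e., the posterior probability that observed pairs rank above unobserved ones per drug.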
Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information
McDonald, Daniel; Gonzalez, Antonio; Navas-Molina, Jose A.; Jiang, Lingjing; Xu, Zhenjiang Zech; Winker, Kevin; Kado, Deborah M.; Orwoll, Eric; Manary, Mark; Mirarab, Siavash
2018-01-01
Recent algorithmic advances in amplicon-based microbiome studies enable the inference of exact amplicon sequence fragments. These new methods enable the investigation of sub-operational taxonomic units (sOTUs) by removing erroneous sequences. However, short (e.g., 150-nucleotide [nt]) DNA sequence fragments do not contain sufficient phylogenetic signal to reproduce a reasonable tree, introducing a barrier to the utilization of critical phylogenetically aware metrics such as Faith's PD or UniFrac. Although fragment insertion methods do exist, they have not been tested for inserting sOTUs from high-throughput amplicon studies into a broad reference phylogeny. We benchmarked the SATé-enabled phylogenetic placement (SEPP) technique explicitly against 16S V4 sequence fragments and showed that it outperforms the conceptually problematic but often-used practice of reconstructing de novo phylogenies. In addition, we provide a BSD-licensed QIIME2 plugin (https://github.com/biocore/q2-fragment-insertion) for SEPP and integration into the microbial study management platform QIITA. IMPORTANCE The move from OTU-based to sOTU-based analysis, while providing additional resolution, also introduces computational challenges. We demonstrate that one popular method of dealing with sOTUs (building a de novo tree from the short sequences) can provide incorrect results in human gut metagenomic studies and show that phylogenetic placement of the new sequences with SEPP resolves this problem while also yielding other benefits over existing methods. PMID:29719869
Yousefi, Siavash; Qin, Jia; Zhi, Zhongwei
2013-01-01
Optical microangiography is an imaging technology capable of providing detailed functional blood flow maps within microcirculatory tissue beds in vivo. Some practical issues, however, exist when displaying and quantifying the microcirculation that perfuses the scanned tissue volume. These issues include: (I) the probing light is subject to specular reflection when it shines onto the sample, and the unevenness of the tissue surface makes the light energy entering the tissue non-uniform across the scanned volume; (II) biological tissue is heterogeneous in nature, meaning that its scattering and absorption properties attenuate the probe beam. These physical limitations can result in local contrast degradation and non-uniform micro-angiogram images. In this paper, we propose a post-processing method that uses Rayleigh contrast-limited adaptive histogram equalization to increase the contrast and improve the overall appearance and uniformity of optical micro-angiograms without saturating the vessel intensity or changing the physical meaning of the micro-angiograms. The qualitative and quantitative performance of the proposed method is compared with those of common histogram equalization and contrast enhancement methods. We demonstrate that the proposed method outperforms other existing approaches. The proposed method is not limited to optical microangiography and can be used in other imaging modalities such as photoacoustic tomography and scanning laser confocal microscopy. PMID:23482880
Mousavi Kahaki, Seyed Mostafa; Nordin, Md Jan; Ashtari, Amir H.; J. Zahra, Sophia
2016-01-01
An invariant feature matching method is proposed as a spatially invariant feature matching approach. Deformation effects, such as affine and homography transformations, change the local information within the image and can result in ambiguous local information pertaining to image points. A new method based on dissimilarity values, which measure the dissimilarity of the features along a path using eigenvector properties, is proposed. Evidence shows that existing matching techniques using similarity metrics—such as normalized cross-correlation, the squared sum of intensity differences, and the correlation coefficient—are insufficient for achieving adequate results under different image deformations. Thus, new descriptor similarity metrics based on normalized eigenvector correlation and signal directional differences, which are robust under local variation of the image information, are proposed to establish an efficient feature matching technique. The method proposed in this study measures the dissimilarity in the signal frequency along the path between two features. Moreover, these dissimilarity values are accumulated in a 2D dissimilarity space, allowing accurate corresponding features to be extracted from the cumulative space using a voting strategy. This method can be used in image registration applications, as it overcomes the limitations of the existing approaches. The output results demonstrate that the proposed technique outperforms the other methods when evaluated using a standard dataset, in terms of precision-recall and corner correspondence. PMID:26985996
A blur-invariant local feature for motion blurred image matching
NASA Astrophysics Data System (ADS)
Tong, Qiang; Aoki, Terumasa
2017-07-01
Image matching between a blurred image (caused by camera motion, defocus, etc.) and a non-blurred image is a critical task for many image/video applications. However, most existing local feature schemes fail at this task. This paper presents a blur-invariant descriptor and a novel local feature scheme including the descriptor and an interest point detector based on moment symmetry, the authors' previous work. The descriptor is based on a new concept, the center peak moment-like element (CPME), which is robust to blur and boundary effects. By constructing CPMEs, the descriptor is also distinctive and thus suitable for image matching. Experimental results show our scheme outperforms state-of-the-art methods for blurred image matching.
Underwater image enhancement through depth estimation based on random forest
NASA Astrophysics Data System (ADS)
Tai, Shen-Chuan; Tsai, Ting-Chou; Huang, Jyun-Han
2017-11-01
Light absorption and scattering in underwater environments can result in low-contrast images with a distinct color cast. This paper proposes a systematic framework for the enhancement of underwater images. Light transmission is estimated using the random forest algorithm. RGB values, luminance, color difference, blurriness, and the dark channel are treated as features in training and estimation. Transmission is calculated using an ensemble machine learning algorithm to deal with a variety of conditions encountered in underwater environments. A color compensation and contrast enhancement algorithm based on depth information was also developed with the aim of improving the visual quality of underwater images. Experimental results demonstrate that the proposed scheme outperforms existing methods with regard to subjective visual quality as well as objective measurements.
Vibration control in smart coupled beams subjected to pulse excitations
NASA Astrophysics Data System (ADS)
Pisarski, Dominik; Bajer, Czesław I.; Dyniewicz, Bartłomiej; Bajkowski, Jacek M.
2016-10-01
In this paper, a control method to stabilize the vibration of adjacent structures is presented. The control is realized by changes of the stiffness parameters of the structure's couplers. A pulse excitation applied to the coupled adjacent beams is imposed as the kinematic excitation. For such a representation, the designed control law provides the best rate of energy dissipation. By means of a stability analysis, the performance in different structural settings is studied. The efficiency of the proposed strategy is examined via numerical simulations. In terms of the assumed energy metric, the controlled structure outperforms its passively damped equivalent by over 50 percent. The functionality of the proposed control strategy should attract the attention of practising engineers who seek solutions to upgrade existing damping systems.
Chen Peng; Ao Li
2017-01-01
The emergence of multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of human diseases and therefore for improving diagnosis, treatment, and prevention. In this study, we proposed a heterogeneous network based method integrating multi-dimensional data (HNMD) to identify GBM-related genes. The novelty of the method lies in combining the multi-dimensional GBM data from the TCGA dataset, which provide comprehensive information about genes, with protein-protein interactions to construct a weighted heterogeneous network that reflects both the general and disease-specific relationships between genes. In addition, a propagation algorithm with resistance is introduced to precisely score and rank GBM-related genes. The results of a comprehensive performance evaluation show that the proposed method significantly outperforms network based methods with single-dimensional data and other existing approaches. Subsequent analysis of the top ranked genes suggests that they may be functionally implicated in GBM, which further corroborates the superiority of the proposed method. The source code and the results of HNMD can be downloaded from the following URL: http://bioinformatics.ustc.edu.cn/hnmd/ .
Lazy collaborative filtering for data sets with missing values.
Ren, Yongli; Li, Gang; Zhang, Jun; Zhou, Wanlei
2013-12-01
As one of the biggest challenges in research on recommender systems, the data sparsity issue is mainly caused by the fact that users tend to rate a small proportion of items from the huge number of available items. This issue becomes even more problematic for the neighborhood-based collaborative filtering (CF) methods, as there are even lower numbers of ratings available in the neighborhood of the query item. In this paper, we aim to address the data sparsity issue in the context of neighborhood-based CF. For a given query (user, item), a set of key ratings is first identified by taking the historical information of both the user and the item into account. Then, an auto-adaptive imputation (AutAI) method is proposed to impute the missing values in the set of key ratings. We present a theoretical analysis to show that the proposed imputation method effectively improves the performance of the conventional neighborhood-based CF methods. The experimental results show that our new method of CF with AutAI outperforms six existing recommendation methods in terms of accuracy.
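As a rough, illustrative sketch of the imputation idea (not the paper's AutAI rule; the similarity measure, ratings matrix, and user-mean imputation are simplifications introduced here), an item-neighborhood prediction that imputes a missing neighbor rating with the user's mean could look like:

```python
import numpy as np

def predict(R, user, item, k=2):
    """Item-based CF prediction with naive mean imputation of missing ratings.

    R: ratings matrix with np.nan for missing entries.
    The paper's AutAI imputes adaptively per (user, item) query; here we simply
    fill a missing neighbor rating with the user's observed mean (a crude stand-in).
    """
    target = R[:, item]
    sims = []
    for j in range(R.shape[1]):
        if j == item:
            sims.append(-np.inf)
            continue
        # cosine similarity over co-rated users only
        mask = ~np.isnan(target) & ~np.isnan(R[:, j])
        if mask.sum() < 2:
            sims.append(-np.inf)
            continue
        a, b = target[mask], R[mask, j]
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    neighbors = np.argsort(sims)[::-1][:k]
    user_mean = np.nanmean(R[user])              # imputation value for this user
    ratings = [R[user, j] if not np.isnan(R[user, j]) else user_mean
               for j in neighbors]
    return float(np.mean(ratings))

R = np.array([[5, 4, np.nan, 1],
              [4, 5, 4,      2],
              [1, 2, 1,      5],
              [np.nan, 4, 5, np.nan]])
print(predict(R, user=0, item=2))  # → 4.5
```

The point of the paper is that *which* values get imputed, and how, is chosen adaptively per query rather than with a fixed rule like the user mean above.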
MLACP: machine-learning-based prediction of anticancer peptides
Manavalan, Balachandran; Basith, Shaherin; Shin, Tae Hwan; Choi, Sun; Kim, Myeong Ok; Lee, Gwang
2017-01-01
Cancer is the second leading cause of death globally, and use of therapeutic peptides to target and kill cancer cells has received considerable attention in recent years. Identification of anticancer peptides (ACPs) through wet-lab experimentation is expensive and often time consuming; therefore, development of an efficient computational method is essential to identify potential ACP candidates prior to in vitro experimentation. In this study, we developed support vector machine- and random forest-based machine-learning methods for the prediction of ACPs using the features calculated from the amino acid sequence, including amino acid composition, dipeptide composition, atomic composition, and physicochemical properties. We trained our methods using the Tyagi-B dataset and determined the machine parameters by 10-fold cross-validation. Furthermore, we evaluated the performance of our methods on two benchmarking datasets, with our results showing that the random forest-based method outperformed the existing methods with an average accuracy and Matthews correlation coefficient value of 88.7% and 0.78, respectively. To assist the scientific community, we also developed a publicly accessible web server at www.thegleelab.org/MLACP.html. PMID:29100375
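The sequence-derived feature groups named in the abstract are standard definitions; for instance, amino acid composition and dipeptide composition can be computed as below (a sketch only; the trained SVM/RF models themselves are on the authors' server):

```python
# Standard sequence features used by many peptide predictors, including MLACP.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq):
    """20-dim amino acid composition: fraction of each residue in the peptide."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400-dim dipeptide composition: frequency of each ordered residue pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs)
            for a in AMINO_ACIDS for b in AMINO_ACIDS]

v = aac("ACCW")
print(len(v), v[0], v[1])  # → 20 0.25 0.5
```

Atomic composition and physicochemical properties are computed analogously from per-residue lookup tables; the concatenated vector is what the SVM or random forest is trained on.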
Jonnagaddala, Jitendra; Jue, Toni Rose; Chang, Nai-Wen; Dai, Hong-Jie
2016-01-01
The rapidly increasing biomedical literature calls for automatic approaches to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among them, conditional random fields (CRFs) and dictionary lookup are widely used for named entity recognition and normalization, respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and compare with other existing dictionary lookup based normalization methods. The best configuration achieved an F-measure of 0.77 for disease normalization, which outperformed the best dictionary lookup based baseline method studied in this work by an F-measure of 0.13. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract PMID:27504009
The Achievement Crisis Is Real: A Review of "The Manufactured Crisis."
ERIC Educational Resources Information Center
Stedman, Lawrence C.
1996-01-01
In "The Manufactured Crisis," D. Berliner and B. Biddle argue that there has been no decline in achievement test scores, that today's students outperform their parents and do well in international examinations, and that the supposed crisis in American education does not exist. This review refutes all these claims. (SLD)
Stauffer, Reto; Mayr, Georg J; Messner, Jakob W; Umlauf, Nikolaus; Zeileis, Achim
2017-06-15
Flexible spatio-temporal models are widely used to create reliable and accurate estimates for precipitation climatologies. Most models are based on square root transformed monthly or annual means, for which a normal distribution seems to be appropriate. This assumption becomes invalid on a daily time scale, as the observations include large fractions of zeros and are limited to non-negative values. We develop a novel spatio-temporal model to estimate the full climatological distribution of precipitation on a daily time scale over complex terrain using a left-censored normal distribution. The results demonstrate that the new method is able to account for the non-normal distribution and the large fraction of zero observations. The new climatology provides the full climatological distribution at a very high spatial and temporal resolution, and is competitive with, or even outperforms, existing methods, even for arbitrary locations.
An Integrated Method Based on PSO and EDA for the Max-Cut Problem.
Lin, Geng; Guan, Jian
2016-01-01
The max-cut problem is an NP-hard combinatorial optimization problem with many real-world applications. In this paper, we propose an integrated method based on particle swarm optimization and estimation of distribution algorithms (PSO-EDA) for solving the max-cut problem. The integrated algorithm overcomes the shortcomings of particle swarm optimization and estimation of distribution algorithms. To enhance the performance of PSO-EDA, a fast local search procedure is applied. In addition, a path relinking procedure is developed to intensify the search. To evaluate the performance of PSO-EDA, extensive experiments were carried out on two sets of benchmark instances with 800 to 20,000 vertices from the literature. Computational results and comparisons show that PSO-EDA significantly outperforms existing PSO-based and EDA-based algorithms for the max-cut problem. Compared with other best-performing algorithms, PSO-EDA finds very competitive results in terms of solution quality.
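The local search step embedded in such metaheuristics can be illustrated with the classic one-flip improvement for max-cut (a sketch only; the PSO-EDA machinery, its gain tables, and path relinking are not reproduced here):

```python
def cut_value(edges, side):
    """Weight of edges crossing the partition; side[v] in {0, 1}."""
    return sum(w for u, v, w in edges if side[u] != side[v])

def one_flip_local_search(edges, side):
    """Repeatedly flip any vertex whose move increases the cut (1-flip descent).

    This is the standard local-improvement step; PSO-EDA embeds a faster
    gain-table variant inside its evolutionary loop.
    """
    n = len(side)
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # gain of flipping v: non-crossing incident edges become crossing
            # (+w), crossing incident edges stop crossing (-w)
            gain = sum(w if side[u] == side[v] else -w
                       for a, b, w in edges
                       for u in ([b] if a == v else [a] if b == v else []))
            if gain > 0:
                side[v] ^= 1
                improved = True
    return side, cut_value(edges, side)

# 4-cycle with unit weights: the optimal cut of 4 alternates sides.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
side, best = one_flip_local_search(edges, [0, 0, 0, 0])
print(best)  # → 4
```

One-flip descent only reaches a local optimum in general, which is exactly why it is paired with global search components such as PSO and EDA.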
Kang, Dongwan D.; Froula, Jeff; Egan, Rob; ...
2015-01-01
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. Lastly, it automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
Refining Automatically Extracted Knowledge Bases Using Crowdsourcing.
Li, Chunhua; Zhao, Pengpeng; Sheng, Victor S; Xian, Xuefeng; Wu, Jian; Cui, Zhiming
2017-01-01
Machine-constructed knowledge bases often contain noisy and inaccurate facts. There exists significant work in developing automated algorithms for knowledge base refinement. Automated approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. As human labelling is costly, an important research challenge is how we can use limited human resources to maximize the quality improvement for a knowledge base. To address this problem, we first introduce a concept of semantic constraints that can be used to detect potential errors and do inference among candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts to conduct crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods under a reasonable crowdsourcing cost.
Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy
Tian, Yuling; Zhang, Hongxian
2016-01-01
For the purposes of information retrieval, users must find highly relevant documents from within a system (often a quite large one comprising many individual documents) based on an input query. Ranking the documents according to their relevance to user needs is a challenging endeavor and a hot research topic; several rank-learning methods based on machine learning techniques already exist which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning, which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others with respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm effectively and rapidly identifies optimal ranking functions. PMID:27487242
Historical data learning based dynamic LSP routing for overlay IP/MPLS over WDM networks
NASA Astrophysics Data System (ADS)
Yu, Xiaojun; Xiao, Gaoxi; Cheng, Tee Hiang
2013-08-01
The overlay IP/MPLS over WDM network is a promising architecture that has recently started to gain wide deployment. A desirable feature of such a network is to achieve efficient routing with limited information exchange between the IP/MPLS and WDM layers. This paper studies dynamic label switched path (LSP) routing in overlay IP/MPLS over WDM networks. To enhance network performance while maintaining simplicity, we propose to learn from the historical data of lightpath setup costs maintained by the IP-layer integrated service provider (ISP) when making routing decisions. Using a novel historical data learning scheme for logical link cost estimation, we develop a new dynamic LSP routing method named the Existing Link First (ELF) algorithm. Simulation results show that the proposed algorithm significantly outperforms existing ones under different traffic loads, with either limited or unlimited numbers of optical ports. The effects of the number of candidate routes, the add/drop ratio, and the amount of historical data are also evaluated.
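The "existing link first" idea can be sketched as a shortest-path search in which already-established logical links are discounted relative to setting up a new lightpath (the graph, costs, and discount below are illustrative, not the paper's cost-estimation scheme):

```python
import heapq

def elf_route(links, existing, src, dst, new_cost=10, reuse_cost=1):
    """Dijkstra over logical links, preferring already-established ones.

    links: set of undirected (u, v) pairs that are feasible.
    existing: subset of links that already carry traffic (cheap to reuse).
    New lightpath setup is penalized via new_cost >> reuse_cost, so routes
    reuse existing links first, the core intuition behind ELF.
    """
    def cost(u, v):
        e = (min(u, v), max(u, v))
        return reuse_cost if e in existing else new_cost

    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v in (b if a == u else a for a, b in links if u in (a, b)):
            nd = d + cost(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, n = [], dst
    while n != src:
        path.append(n)
        n = prev[n]
    return [src] + path[::-1], dist[dst]

links = {(0, 1), (1, 2), (2, 3), (0, 3)}
existing = {(0, 1), (1, 2), (2, 3)}          # the 3-hop route is already lit
path, c = elf_route(links, existing, 0, 3)
print(path, c)  # → [0, 1, 2, 3] 3  (three reused links beat one new lightpath)
```

The paper's contribution is in estimating these link costs from historical setup data rather than fixing them by hand as above.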
Hemmelmayr, Vera C.; Cordeau, Jean-François; Crainic, Teodor Gabriel
2012-01-01
In this paper, we propose an adaptive large neighborhood search heuristic for the Two-Echelon Vehicle Routing Problem (2E-VRP) and the Location Routing Problem (LRP). The 2E-VRP arises in two-level transportation systems such as those encountered in the context of city logistics. In such systems, freight arrives at a major terminal and is shipped through intermediate satellite facilities to the final customers. The LRP can be seen as a special case of the 2E-VRP in which vehicle routing is performed only at the second level. We have developed new neighborhood search operators by exploiting the structure of the two problem classes considered and have also adapted existing operators from the literature. The operators are used in a hierarchical scheme reflecting the multi-level nature of the problem. Computational experiments conducted on several sets of instances from the literature show that our algorithm outperforms existing solution methods for the 2E-VRP and achieves excellent results on the LRP. PMID:23483764
Saliency Detection for Stereoscopic 3D Images in the Quaternion Frequency Domain
NASA Astrophysics Data System (ADS)
Cai, Xingyu; Zhou, Wujie; Cen, Gang; Qiu, Weiwei
2018-06-01
Recent studies have shown that a remarkable distinction exists between human binocular and monocular viewing behaviors. Compared with two-dimensional (2D) saliency detection models, stereoscopic three-dimensional (S3D) image saliency detection is a more challenging task. In this paper, we propose a saliency detection model for S3D images. The final saliency map of this model is constructed from the local quaternion Fourier transform (QFT) sparse feature and global QFT log-Gabor feature. More specifically, the local QFT feature measures the saliency map of an S3D image by analyzing the location of a similar patch. The similar patch is chosen using a sparse representation method. The global saliency map is generated by applying the wake edge-enhanced gradient QFT map through a band-pass filter. The results of experiments on two public datasets show that the proposed model outperforms existing computational saliency models for estimating S3D image saliency.
Revealing the Hidden Language of Complex Networks
Yaveroğlu, Ömer Nebil; Malod-Dognin, Noël; Davis, Darren; Levnajic, Zoran; Janjic, Vuk; Karapandza, Rasa; Stojmirovic, Aleksandar; Pržulj, Nataša
2014-01-01
Sophisticated methods for analysing complex networks promise to be of great benefit to almost all scientific disciplines, yet they elude us. In this work, we make fundamental methodological advances to rectify this. We discover that the interaction between a small number of roles, played by nodes in a network, can characterize a network's structure and also provide a clear real-world interpretation. Given this insight, we develop a framework for analysing and comparing networks, which outperforms all existing ones. We demonstrate its strength by uncovering novel relationships between seemingly unrelated networks, such as Facebook, metabolic, and protein structure networks. We also use it to track the dynamics of the world trade network, showing that a country's role of a broker between non-trading countries indicates economic prosperity, whereas peripheral roles are associated with poverty. This result, though intuitive, has escaped all existing frameworks. Finally, our approach translates network topology into everyday language, bringing network analysis closer to domain scientists. PMID:24686408
Fresh-slice multicolour X-ray free-electron lasers
Lutman, Alberto A.; Maxwell, Timothy J.; MacArthur, James P.; ...
2016-10-24
X-ray free-electron lasers (XFELs) provide femtosecond X-ray pulses with a narrow energy bandwidth and unprecedented brightness. Ultrafast physical and chemical dynamics, initiated with a site-specific X-ray pulse, can be explored using XFELs with a second ultrashort X-ray probe pulse. However, existing double-pulse schemes are complicated, difficult to customize, or provide only low-intensity pulses. Here we present the novel fresh-slice technique for multicolour pulse production, wherein different temporal slices of an electron bunch lase to saturation in separate undulator sections. This method combines electron bunch tailoring from a passive wakefield device with trajectory control to provide multicolour pulses. The fresh-slice scheme outperforms existing techniques at soft X-ray wavelengths. It produces femtosecond pulses with a power of tens of gigawatts and flexible colour separation. The pulse delay can be varied from temporal overlap to almost one picosecond. We also demonstrate the first three-colour XFEL and variably polarized two-colour pulses.
Jeong, Jeong-Won; Shin, Dae C; Do, Synho; Marmarelis, Vasilis Z
2006-08-01
This paper presents a novel segmentation methodology for automated classification and differentiation of soft tissues using multiband data obtained with the newly developed system of high-resolution ultrasonic transmission tomography (HUTT) for imaging biological organs. This methodology extends and combines two existing approaches: the L-level set active contour (AC) segmentation approach and the agglomerative hierarchical kappa-means approach for unsupervised clustering (UC). To prevent the trapping of the current iterative minimization AC algorithm in a local minimum, we introduce a multiresolution approach that applies the level set functions at successively increasing resolutions of the image data. The resulting AC clusters are subsequently rearranged by the UC algorithm that seeks the optimal set of clusters yielding the minimum within-cluster distances in the feature space. The presented results from Monte Carlo simulations and experimental animal-tissue data demonstrate that the proposed methodology outperforms other existing methods without depending on heuristic parameters and provides a reliable means for soft tissue differentiation in HUTT images.
Franklin, Brandon M.; Xiang, Lin; Collett, Jason A.; Rhoads, Megan K.
2015-01-01
Student populations are diverse such that different types of learners struggle with traditional didactic instruction. Problem-based learning has existed for several decades, but there is still controversy regarding the optimal mode of instruction to ensure success at all levels of students' past achievement. The present study addressed this problem by dividing students into the following three instructional groups for an upper-level course in animal physiology: traditional lecture-style instruction (LI), guided problem-based instruction (GPBI), and open problem-based instruction (OPBI). Student performance was measured by three summative assessments consisting of 50% multiple-choice questions and 50% short-answer questions as well as a final overall course assessment. The present study also examined how students of different academic achievement histories performed under each instructional method. When student achievement levels were not considered, the effects of instructional methods on student outcomes were modest; OPBI students performed moderately better on short-answer exam questions than both LI and GPBI groups. High-achieving students showed no difference in performance for any of the instructional methods on any metric examined. In students with low-achieving academic histories, OPBI students largely outperformed LI students on all metrics (short-answer exam: P < 0.05, d = 1.865; multiple-choice question exam: P < 0.05, d = 1.166; and final score: P < 0.05, d = 1.265). They also outperformed GPBI students on short-answer exam questions (P < 0.05, d = 1.109) but not multiple-choice exam questions (P = 0.071, d = 0.716) or final course outcome (P = 0.328, d = 0.513). These findings strongly suggest that typically low-achieving students perform at a higher level under OPBI as long as the proper support systems (formative assessment and scaffolding) are provided to encourage student success. PMID:26628656
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are driven jointly by alterations of numerous genes. Genes often coordinate as a functional repertoire, e.g., a biological pathway/network, and are highly correlated. However, most existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to exploit this important feature of a gene set to improve statistical power in gene set analyses. We model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for gene set effects that assumes a common distribution for the regression coefficients in the multivariate linear regression model, and calculate p-values using permutation and a scaled chi-square approximation. We show using simulations that the type I error is protected under different choices of working covariance matrices and that power improves as the working covariance approaches the true covariance. The global test is a special case of TEGS in which the correlation among genes in a gene set is ignored. Using both simulated data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). In summary, we develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and the diabetes microarray data.
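TEGS itself uses a variance component score with a working covariance; as a loose, illustrative analogue only (the statistic, simulated data, and labels below are made up, not TEGS), a permutation test of a group effect on a correlated gene set can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def gene_set_perm_test(X, y, n_perm=2000):
    """Permutation p-value for a group effect on a gene set.

    X: (samples x genes) expression for one gene set; y: 0/1 status.
    Statistic: sum of squared per-gene mean differences between groups.
    Permuting y preserves the between-gene correlation, the gene-set
    feature that TEGS is designed to exploit.
    """
    def stat(labels):
        d = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
        return float((d ** 2).sum())
    observed = stat(y)
    perms = [stat(rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in perms)) / (1 + n_perm)

# 20 samples x 5 genes; a shared factor induces correlation between genes.
n, g = 20, 5
base = rng.normal(size=(n, 1))
X = 0.5 * base + 0.5 * rng.normal(size=(n, g))
y = np.array([0] * 10 + [1] * 10)
X[y == 1] += 3.0                                # clear group effect on every gene
print(gene_set_perm_test(X, y) < 0.05)  # → True
```

TEGS replaces this generic statistic with a variance component score test and supplements permutation with a scaled chi-square approximation for speed.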
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
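Regression-ABC adds a regression step (LASSO, in the paper) on top of basic rejection ABC; the rejection step itself, with a toy simulator standing in for the paper's epidemiological models, can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(rate, n=200):
    """Toy stand-in for a phylogeny simulator: n waiting times at a given rate."""
    x = rng.exponential(1 / rate, size=n)
    return np.array([x.mean(), x.std()])        # summary statistics

def rejection_abc(observed_stats, prior_draws, keep=0.02):
    """Basic rejection ABC: keep the prior draws whose simulated summaries fall
    closest to the observed ones. Regression-ABC additionally regresses the
    parameters on the summaries to adjust the kept draws and select among
    many summary statistics."""
    sims = np.array([simulate(r) for r in prior_draws])
    dist = np.linalg.norm(sims - observed_stats, axis=1)
    cutoff = np.quantile(dist, keep)
    return prior_draws[dist <= cutoff]

true_rate = 2.0
obs = simulate(true_rate)
prior = rng.uniform(0.1, 10.0, size=5000)       # uniform prior on the rate
posterior = rejection_abc(obs, prior)
print(round(float(np.median(posterior)), 1))    # concentrates near the true rate 2.0
```

In the paper's setting the simulator produces phylogenies and lineage-through-time plots, the summary vector is large, and the LASSO regression both corrects the accepted draws and performs the variable selection that makes so many summaries usable.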
A Graph-Embedding Approach to Hierarchical Visual Word Mergence.
Wang, Lei; Liu, Lingqiao; Zhou, Luping
2017-02-01
Appropriately merging visual words is an effective dimension reduction method for the bag-of-visual-words model in image classification. The approach of hierarchically merging visual words has been extensively employed, because it gives a fully determined merging hierarchy. Existing supervised hierarchical merging methods take different approaches and realize the merging process with various formulations. In this paper, we propose a unified hierarchical merging approach built upon the graph-embedding framework. Our approach is able to merge visual words for any scenario in which a preferred structure and an undesired structure are defined, and can therefore effectively attend to all kinds of requirements for the word-merging process. In terms of computational efficiency, we show that our algorithm can seamlessly integrate a fast search strategy developed in our previous work and thus maintain the state-of-the-art merging speed. To the best of our knowledge, the proposed approach is the first to address hierarchical visual word mergence in such a flexible and unified manner. As demonstrated, it maintains excellent image classification performance even after a significant dimension reduction, and outperforms all existing comparable visual word-merging methods.
Devasenapathy, Deepa; Kannan, Kathiravan
2015-01-01
Traffic in road networks is increasing to an ever greater extent. Good knowledge of network traffic can minimize congestion using information pertaining to the road network, obtained with the aid of communal callers, pavement detectors, and so on. These methods, however, generate only low-featured information about users in the road network. Although the existing schemes obtain urban traffic information, they fail to calculate the energy drain rate of nodes and to strike a balance between the overhead and the quality of the routing protocol, which poses a great challenge. Thus, an energy-efficient cluster-based vehicle detection in road network using the intention numeration method (CVDRN-IN) is developed. Initially, sensor nodes that detect a vehicle are grouped into separate clusters. Further, we approximate the strength of the node drain rate for a cluster using a polynomial regression function. In addition, the total node energy is estimated by taking the integral over the area. Finally, enhanced data aggregation is performed to reduce the amount of data transmission using a digital signature tree. The experimental performance is evaluated with the Dodgers loop sensor data set from the UCI repository, and the method outperforms existing work on energy consumption, clustering efficiency, and node drain rate. PMID:25793221
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-01-01
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Unlike existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radial basis function neural network (RBFNN), and probabilistic neural network (PNN). The results demonstrate that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency, for gas classification. PMID:28629202
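The core of a KELM with a composite kernel fits in a few lines. Below is a hedged pure-Python sketch: the two base kernels, their weights `mu`, and the regularization `C` are fixed by hand here, standing in for the quantities QWMK-ELM would tune with QPSO, and the tiny XOR dataset is invented test data.

```python
import math

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gaussian_k(a, b, gamma=0.5):
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def poly_k(a, b, degree=2):
    return (1.0 + sum(u * v for u, v in zip(a, b))) ** degree

def composite_k(a, b, mu=(0.6, 0.4)):
    # Weighted sum of base kernels; the weights are what QPSO would optimize.
    return mu[0] * gaussian_k(a, b) + mu[1] * poly_k(a, b)

def kelm_train(X, y, C=100.0):
    """Closed-form KELM training: beta = (K + I/C)^-1 y."""
    n = len(X)
    K = [[composite_k(X[i], X[j]) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return gauss_solve(K, y)

def kelm_predict(X_train, beta, x):
    return sum(b * composite_k(xi, x) for b, xi in zip(beta, X_train))

# Toy two-class problem (XOR), labels in {-1, +1}.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1.0, 1.0, 1.0, -1.0]
beta = kelm_train(X, y)
preds = [kelm_predict(X, beta, x) for x in X]
```

Because training is a single linear solve, changing the kernel weights and re-evaluating fitness is cheap, which is what makes a swarm-based outer search over `mu`, kernel parameters, and `C` practical.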
Application of Wavelet Transform for PDZ Domain Classification
Daqrouq, Khaled; Alhmouz, Rami; Balamesh, Ahmed; Memic, Adnan
2015-01-01
PDZ domains have been identified as part of an array of signaling proteins that are often unrelated, except for the well-conserved structural PDZ domain they contain. These domains have been linked to many disease processes, including common avian influenza as well as very rare conditions such as Fraser and Usher syndromes. Historically, based on the interactions and the nature of the bonds they form, PDZ domains have most often been classified into one of three classes (class I, class II, and others, i.e., class III), depending directly on their binding partner. In this study, we report on three unique feature extraction approaches based on the bigram and trigram occurrence and existence rearrangements within the domain's primary amino acid sequences for assisting PDZ domain classification. Wavelet packet transform (WPT) and Shannon entropy denoted by wavelet entropy (WE) feature extraction methods were proposed. Using 115 unique human and mouse PDZ domains, the existence rearrangement approach yielded a high recognition rate (78.34%), which outperformed our occurrence-rearrangement-based method. With a validation technique, the recognition rate reached 81.41%. The method reported for PDZ domain classification from primary sequences proved to be an encouraging approach for obtaining consistent classification results. We anticipate that by increasing the database size, we can further improve feature extraction and correct classification. PMID:25860375
MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data
Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong
2015-01-01
Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill this gap, we develop a new method, MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the involved parameter is very close to the boundary of the parametric space, so standard large-sample theory is not suitable for evaluating the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control the false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201
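The zero-plus-continuous mixture idea can be illustrated compactly. The sketch below is not MAFsnp's actual fitted family (the abstract does not specify it): a point mass at zero plus a scaled chi-square with one degree of freedom serves as an assumed stand-in for the continuous part, fitted by method of moments.

```python
import math
import random

def chi2_1_sf(t):
    """Survival function of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(t / 2.0))

def fit_mixture(stats):
    """Estimate (pi0, scale): point mass at zero plus a scaled chi2_1 part."""
    pi0 = sum(1 for t in stats if t <= 1e-12) / len(stats)
    pos = [t for t in stats if t > 1e-12]
    # Method of moments: chi2_1 has mean 1, so the scale is the mean positive value.
    scale = sum(pos) / len(pos) if pos else 1.0
    return pi0, scale

def mixture_pvalue(t, pi0, scale):
    """P(T >= t) under the fitted mixture null."""
    if t <= 0.0:
        return 1.0
    return (1.0 - pi0) * chi2_1_sf(t / scale)

# Simulated null statistics: half exact zeros, half continuous draws.
random.seed(1)
null_stats = [0.0 if random.random() < 0.5 else random.gauss(0.0, 1.0) ** 2
              for _ in range(2000)]
pi0, scale = fit_mixture(null_stats)
```

Once the two parameters are estimated, each observed statistic maps directly to a p-value, which is what enables standard multiple-testing corrections for FDR control.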
Spatio-temporal colour correction of strongly degraded movies
NASA Astrophysics Data System (ADS)
Islam, A. B. M. Tariqul; Farup, Ivar
2011-01-01
The archives of motion pictures represent an important part of precious cultural heritage. Unfortunately, these cinematography collections are vulnerable to different distortions such as colour fading, which is beyond the capability of the photochemical restoration process. Spatial colour algorithms such as Retinex and ACE provide a helpful tool for restoring strongly degraded colour films, but there are some challenges associated with these algorithms. We present an automatic colour correction technique for digital colour restoration of strongly degraded movie material. The method is based upon the existing STRESS algorithm. In order to cope with the problem of highly correlated colour channels, we implemented a preprocessing step in which saturation enhancement is performed in a PCA space. Spatial colour algorithms tend to emphasize all details in the images, including dust and scratches. Surprisingly, we found that the presence of these defects does not affect the behaviour of the colour correction algorithm. Although the STRESS algorithm is already in itself more efficient than traditional spatial colour algorithms, it is still computationally expensive. To speed it up further, we went beyond the spatial domain of the frames and extended the algorithm to the temporal domain. This way, we were able to achieve an 80 percent reduction in computational time compared to processing every single frame individually. We performed two user experiments and found that the visual quality of the resulting frames was significantly better than with existing methods. Thus, our method outperforms the existing ones in terms of both visual quality and computational efficiency.
MISTICA: Minimum Spanning Tree-based Coarse Image Alignment for Microscopy Image Sequences
Ray, Nilanjan; McArdle, Sara; Ley, Klaus; Acton, Scott T.
2016-01-01
Registration of an in vivo microscopy image sequence is necessary in many significant studies, including studies of atherosclerosis in large arteries and the heart. Significant cardiac and respiratory motion of the living subject, occasional spells of focal plane changes, drift in the field of view, and long image sequences are the principal roadblocks. The first step in such a registration process is the removal of translational and rotational motion. Next, a deformable registration can be performed. The focus of our study here is to remove the translation and/or rigid body motion that we refer to here as coarse alignment. The existing techniques for coarse alignment are unable to accommodate long sequences often consisting of periods of poor quality images (as quantified by a suitable perceptual measure). Many existing methods require the user to select an anchor image to which other images are registered. We propose a novel method for coarse image sequence alignment based on minimum weighted spanning trees (MISTICA) that overcomes these difficulties. The principal idea behind MISTICA is to re-order the images in shorter sequences, to demote nonconforming or poor quality images in the registration process, and to mitigate the error propagation. The anchor image is selected automatically making MISTICA completely automated. MISTICA is computationally efficient. It has a single tuning parameter that determines graph width, which can also be eliminated by way of additional computation. MISTICA outperforms existing alignment methods when applied to microscopy image sequences of mouse arteries. PMID:26415193
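The reordering idea behind MISTICA can be sketched with a small Prim's-algorithm routine. This is an illustration, not the published algorithm: the dissimilarity matrix is invented, the automatic anchor rule (smallest total dissimilarity) and registering each frame to its MST parent are simplifying assumptions.

```python
def mst_registration_order(D):
    """Prim's MST over a frame-dissimilarity matrix; returns (parent, order).

    Each frame would be registered to its MST parent. The anchor is chosen
    automatically as the frame with the smallest total dissimilarity.
    """
    n = len(D)
    anchor = min(range(n), key=lambda i: sum(D[i]))
    parent = {anchor: None}
    order = [anchor]
    # best[j] = (cheapest edge weight into the tree so far, its tree endpoint)
    best = {j: (D[anchor][j], anchor) for j in range(n) if j != anchor}
    while best:
        j = min(best, key=lambda k: best[k][0])
        w, p = best.pop(j)
        parent[j] = p
        order.append(j)
        for k in best:
            if D[j][k] < best[k][0]:
                best[k] = (D[j][k], j)
    return parent, order

# Toy dissimilarities: frames 0-3 form a smooth sequence, frame 4 is poor quality.
D = [[0, 1, 2, 3, 9],
     [1, 0, 1, 2, 9],
     [2, 1, 0, 1, 9],
     [3, 2, 1, 0, 8],
     [9, 9, 9, 8, 0]]
parent, order = mst_registration_order(D)
```

Note how the poor-quality frame 4 is demoted to the end of the order and attached to its most similar neighbour, so its error cannot propagate through the rest of the sequence.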
Blind technique using blocking artifacts and entropy of histograms for image tampering detection
NASA Astrophysics Data System (ADS)
Manu, V. T.; Mehtre, B. M.
2017-06-01
The tremendous technological advancements of recent times have enabled people to create, edit, and circulate images more easily than ever before. As a result, ensuring the integrity and authenticity of images has become challenging. Malicious editing of images to deceive the viewer is referred to as image tampering. A widely used image tampering technique is image splicing or compositing, in which regions from different images are copied and pasted. In this paper, we propose a tamper detection method utilizing the blocking and blur artifacts which are the footprints of splicing. The classification of images as tampered or not is based on the standard deviations of the entropy histograms and block discrete cosine transformations. If an image is classified as tampered, we can detect the exact boundaries of the tampered area. Experimental results on publicly available image tampering datasets show that the proposed method outperforms the existing methods in terms of accuracy.
Spatial Copula Model for Imputing Traffic Flow Data from Remote Microwave Sensors.
Ma, Xiaolei; Luan, Sen; Du, Bowen; Yu, Bin
2017-09-21
Issues of missing data have become increasingly serious with the rapid increase in usage of traffic sensors. Analyses of the Beijing ring expressway have shown that up to 50% of microwave sensors report missing values. The imputation of missing traffic data is an urgent problem, although a precise solution is not easily achieved because of the significant number of missing portions. In this study, copula-based models are proposed for the spatial interpolation of traffic flow from remote traffic microwave sensors. Most existing interpolation methods rely only on covariance functions to depict spatial correlation and are unsuitable for coping with anomalies due to the Gaussian assumption. Copula theory overcomes this issue and provides a connection between the correlation function and the marginal distribution function of traffic flow. To validate copula-based models, a comparison with three kriging methods is conducted. Results indicate that copula-based models outperform kriging methods, especially on roads with irregular traffic patterns. Copula-based models demonstrate significant potential to impute missing data in large-scale transportation networks.
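A Gaussian copula, the simplest member of the family, makes the separation of marginals and dependence concrete. The sketch below is illustrative only: the synthetic traffic flows, the conditional-mean imputation rule, and the empirical marginals are assumptions, not the paper's exact model.

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_inv(p):
    """Inverse normal CDF by bisection."""
    lo, hi = -8.0, 8.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def normal_score(sample, value):
    """Empirical-CDF transform of `value`, then probit (marginal -> Gaussian)."""
    u = sum(1 for v in sample if v <= value) / (len(sample) + 1.0)
    return phi_inv(u)

def empirical_quantile(sample, u):
    s = sorted(sample)
    return s[min(int(u * len(s)), len(s) - 1)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def copula_impute(a_obs, a_hist, b_hist, rho):
    """Impute sensor B's flow from sensor A via a Gaussian copula."""
    za = normal_score(a_hist, a_obs)
    zb = rho * za                      # E[Z_b | Z_a = za] for a Gaussian copula
    return empirical_quantile(b_hist, phi(zb))

# Synthetic historical flows from two correlated sensors.
random.seed(2)
a_hist = [random.gauss(500.0, 100.0) for _ in range(200)]
b_hist = [1.2 * v + random.gauss(0.0, 30.0) for v in a_hist]
rho = pearson([normal_score(a_hist, v) for v in a_hist],
              [normal_score(b_hist, v) for v in b_hist])
imputed = copula_impute(700.0, a_hist, b_hist, rho)
```

Because the dependence is estimated on normal scores while the imputed value is read off the empirical marginal, non-Gaussian (e.g., skewed or heavy-tailed) traffic distributions are handled without the Gaussian assumption that limits covariance-only interpolation.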
Effective Multifocus Image Fusion Based on HVS and BP Neural Network
Yang, Yong
2014-01-01
The aim of multifocus image fusion is to fuse images taken from the same scene with different focuses to obtain a resultant image with all objects in focus. In this paper, a novel multifocus image fusion method based on the human visual system (HVS) and a back propagation (BP) neural network is presented. Three features which reflect the clarity of a pixel are first extracted and used to train a BP neural network to determine which pixel is clearer. The clearer pixels are then used to construct the initial fused image. Third, the focused regions are detected by measuring the similarity between the source images and the initial fused image, followed by morphological opening and closing operations. Finally, the final fused image is obtained by a fusion rule for those focused regions. Experimental results show that the proposed method can provide better performance and outperform several existing popular fusion methods in terms of both objective and subjective evaluations. PMID:24683327
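The pixel-selection step can be illustrated without the trained network. In the sketch below a hand-coded Laplacian focus measure stands in for the BP network's learned decision, the tiny images are invented test data, and the paper's similarity-based region refinement and morphological stages are omitted.

```python
def laplacian_energy(img, r, c):
    """Clarity feature: magnitude of a discrete Laplacian at pixel (r, c)."""
    return abs(4 * img[r][c] - img[r - 1][c] - img[r + 1][c]
               - img[r][c - 1] - img[r][c + 1])

def fuse(img_a, img_b):
    """Per pixel, keep the source whose neighbourhood is sharper."""
    h, w = len(img_a), len(img_a[0])
    out = [row[:] for row in img_a]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if laplacian_energy(img_b, r, c) > laplacian_energy(img_a, r, c):
                out[r][c] = img_b[r][c]
    return out

# Toy sources: A is sharp on the left (edge at column 1), B on the right (column 4).
img_a = [[0, 10, 0, 5, 5, 5] for _ in range(5)]
img_b = [[3, 3, 3, 0, 10, 0] for _ in range(5)]
fused = fuse(img_a, img_b)
```

The fused result keeps the sharp edge from each source, which is the per-pixel decision the trained network makes from its three clarity features.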
Cell dynamic morphology classification using deep convolutional neural networks.
Li, Heng; Pang, Fengqian; Shi, Yonggang; Liu, Zhiwen
2018-05-15
Cell morphology is often used as a proxy measurement of cell status to understand cell physiology. Hence, interpretation of cell dynamic morphology is a meaningful task in biomedical research. Inspired by the recent success of deep learning, we here explore the application of convolutional neural networks (CNNs) to cell dynamic morphology classification. An innovative strategy for the implementation of CNNs is introduced in this study. Mouse lymphocytes were collected to observe the dynamic morphology, and two datasets were thus set up to investigate the performances of CNNs. Considering the installation of deep learning, the classification problem was simplified from video data to image data, and was then solved by CNNs in a self-taught manner with the generated image data. CNNs were separately performed in three installation scenarios and compared with existing methods. Experimental results demonstrated the potential of CNNs in cell dynamic morphology classification, and validated the effectiveness of the proposed strategy. CNNs were successfully applied to the classification problem, and outperformed the existing methods in the classification accuracy. For the installation of CNNs, transfer learning was proved to be a promising scheme. © 2018 International Society for Advancement of Cytometry. © 2018 International Society for Advancement of Cytometry.
Link-Based Similarity Measures Using Reachability Vectors
Yoon, Seok-Ho; Kim, Ji-Soo; Ryu, Minsoo; Choi, Ho-Jin
2014-01-01
We present a novel approach for computing link-based similarities among objects accurately by utilizing the link information pertaining to the objects involved. We discuss the problems with previous link-based similarity measures and propose a novel approach for computing link-based similarities that does not suffer from these problems. In the proposed approach, each target object is represented by a vector. Each element of the vector corresponds to one of the objects in the given data, and the value of each element denotes the weight for the corresponding object. For this weight value, we propose to utilize the probability of reaching from the target object to the specific object, computed using the “Random Walk with Restart” strategy. Then, we define the similarity between two objects as the cosine similarity of the two vectors. In this paper, we provide examples to show that our approach does not suffer from the aforementioned problems. We also evaluate the performance of the proposed methods in comparison with existing link-based measures, qualitatively and quantitatively, with respect to two kinds of data sets, scientific papers and Web documents. Our experimental results indicate that the proposed methods significantly outperform the existing measures. PMID:24701188
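The reachability-vector construction is easy to prototype. The following sketch uses plain power iteration for Random Walk with Restart on a toy undirected graph; the restart probability of 0.15, the iteration count, and the graph itself are illustrative choices, and every node is assumed to have at least one neighbour.

```python
import math

def rwr(adj, start, restart=0.15, iters=100):
    """Random Walk with Restart by power iteration; returns visit probabilities."""
    nodes = list(adj)
    p = {v: 1.0 if v == start else 0.0 for v in nodes}
    for _ in range(iters):
        q = {v: restart if v == start else 0.0 for v in nodes}
        for v in nodes:
            # Spread the non-restart mass uniformly over v's neighbours.
            share = (1.0 - restart) * p[v] / len(adj[v])
            for u in adj[v]:
                q[u] += share
        p = q
    return [p[v] for v in nodes]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy undirected graph: B and C sit in a triangle with A; E hangs off D.
adj = {'A': ['B', 'C', 'D'], 'B': ['A', 'C'], 'C': ['A', 'B'],
       'D': ['A', 'E'], 'E': ['D']}
vec_b, vec_c, vec_e = rwr(adj, 'B'), rwr(adj, 'C'), rwr(adj, 'E')
```

Each object's vector is its RWR reachability profile over all nodes, and the similarity of two objects is the cosine of their profiles, so B and C (which occupy symmetric positions) come out far more similar to each other than either is to E.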
Towards Open-World Person Re-Identification by One-Shot Group-Based Verification.
Zheng, Wei-Shi; Gong, Shaogang; Xiang, Tao
2016-03-01
Solving the problem of matching people across non-overlapping multi-camera views, known as person re-identification (re-id), has received increasing interest in computer vision. In a real-world application scenario, a watch-list (gallery set) of a handful of known target people is provided with very few (in many cases only a single) image(s) (shots) per target. Existing re-id methods are largely unsuitable for addressing this open-world re-id challenge because they are designed for (1) a closed-world scenario where the gallery and probe sets are assumed to contain exactly the same people, (2) person-wise identification whereby the model attempts to verify exhaustively against each individual in the gallery set, and (3) learning a matching model using multi-shots. In this paper, a novel transfer local relative distance comparison (t-LRDC) model is formulated to address the open-world person re-identification problem by one-shot group-based verification. The model is designed to mine and transfer useful information from a labelled open-world non-target dataset. Extensive experiments demonstrate that the proposed approach outperforms both non-transfer learning and existing transfer learning based re-id methods.
SCOUT: simultaneous time segmentation and community detection in dynamic networks
Hulovatyy, Yuriy; Milenković, Tijana
2016-01-01
Many evolving complex real-world systems can be modeled via dynamic networks. An important problem in dynamic network research is community detection, which finds groups of topologically related nodes. Typically, this problem is approached by assuming either that each time point has a distinct community organization or that all time points share a single community organization. The reality likely lies between these two extremes. To find the compromise, we consider community detection in the context of the problem of segment detection, which identifies contiguous time periods with consistent network structure. Consequently, we formulate a combined problem of segment community detection (SCD), which simultaneously partitions the network into contiguous time segments with consistent community organization and finds this community organization for each segment. To solve SCD, we introduce SCOUT, an optimization framework that explicitly considers both segmentation quality and partition quality. SCOUT addresses limitations of existing methods that can be adapted to solve SCD, which consider only one of segmentation quality or partition quality. In a thorough evaluation, SCOUT outperforms the existing methods in terms of both accuracy and computational complexity. We apply SCOUT to biological network data to study human aging. PMID:27881879
Link Prediction in Evolving Networks Based on Popularity of Nodes.
Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian
2017-08-02
Link prediction aims to uncover the underlying relationships behind networks, which can be utilized to predict missing edges or identify spurious ones. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure-based methods ignore the temporal aspects of networks; limited by time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose the hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes are much more likely to attract future links. A novel approach named the popularity-based structural perturbation method (PBSPM), together with a fast algorithm, is then proposed to characterize the likelihood of an edge from both the existing connectivity structure and the current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Besides, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.
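The hypothesis lends itself to a compact score. The sketch below is a simplified stand-in for PBSPM, not its structural-perturbation formulation: it multiplies a common-neighbour count by the recent activity (popularity) of the two endpoints, and the snapshot window and exponent are arbitrary illustrative choices.

```python
def recent_activity(snapshots, node, window):
    """Popularity proxy: edges touching `node` in the last `window` snapshots."""
    return sum(1 for snap in snapshots[-window:] for e in snap if node in e)

def popularity_score(adj, snapshots, u, v, window=2, alpha=1.0):
    """Common-neighbour count boosted by the current popularity of u and v."""
    cn = len(adj[u] & adj[v])
    pop = (1 + recent_activity(snapshots, u, window)) * \
          (1 + recent_activity(snapshots, v, window))
    return cn * pop ** alpha

# Two candidate pairs with identical structure; only (a, b) were active recently.
adj = {'a': {'c', 'd'}, 'b': {'c', 'd'}, 'e': {'c', 'd'}, 'f': {'c', 'd'},
       'c': {'a', 'b', 'e', 'f'}, 'd': {'a', 'b', 'e', 'f'}}
snapshots = [
    [('e', 'c'), ('e', 'd'), ('f', 'c'), ('f', 'd')],  # oldest
    [('a', 'c'), ('a', 'd')],
    [('b', 'c'), ('b', 'd')],                          # newest
]
s_ab = popularity_score(adj, snapshots, 'a', 'b')
s_ef = popularity_score(adj, snapshots, 'e', 'f')
```

Although both pairs share the same two common neighbours, the recently active pair scores higher, which is exactly the bias toward edges between active nodes that the experiments report.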
PET-CT image fusion using random forest and à-trous wavelet transform.
Seal, Ayan; Bhattacharjee, Debotosh; Nasipuri, Mita; Rodríguez-Esparragón, Dionisio; Menasalvas, Ernestina; Gonzalo-Martin, Consuelo
2018-03-01
New image fusion rules for multimodal medical images are proposed in this work. The fusion rules are defined by a random forest learning algorithm and a translation-invariant à-trous wavelet transform (AWT). The proposed method is threefold. First, source images are decomposed into approximation and detail coefficients using AWT. Second, a random forest is used to choose pixels from the approximation and detail coefficients for forming the approximation and detail coefficients of the fused image. Lastly, the inverse AWT is applied to reconstruct the fused image. All experiments have been performed on 198 slices of both computed tomography and positron emission tomography images of a patient. A traditional fusion method based on the Mallat wavelet transform has also been implemented on these slices. A new image fusion performance measure, along with four existing measures, has been presented, which helps to compare the performance of two pixel-level fusion methods. The experimental results clearly indicate that the proposed method outperforms the traditional method in terms of visual and quantitative qualities and that the new measure is meaningful. Copyright © 2017 John Wiley & Sons, Ltd.
Cohen, Michael X; Gulbinaite, Rasa
2017-02-15
Steady-state evoked potentials (SSEPs) are rhythmic brain responses to rhythmic sensory stimulation, and are often used to study perceptual and attentional processes. We present a data analysis method for maximizing the signal-to-noise ratio of the narrow-band steady-state response in the frequency and time-frequency domains. The method, termed rhythmic entrainment source separation (RESS), is based on denoising source separation approaches that take advantage of the simultaneous but differential projection of neural activity to multiple electrodes or sensors. Our approach is a combination and extension of existing multivariate source separation methods. We demonstrate that RESS performs well on both simulated and empirical data, and outperforms conventional SSEP analysis methods based on selecting electrodes with the strongest SSEP response, as well as several other linear spatial filters. We also discuss the potential confound of overfitting, whereby the filter captures noise in absence of a signal. Matlab scripts are available to replicate and extend our simulations and methods. We conclude with some practical advice for optimizing SSEP data analyses and interpreting the results. Copyright © 2016 Elsevier Inc. All rights reserved.
Spreading to localized targets in complex networks
NASA Astrophysics Data System (ADS)
Sun, Ye; Ma, Long; Zeng, An; Wang, Wen-Xu
2016-12-01
As an important type of dynamics on complex networks, spreading is widely used to model many real processes such as the epidemic contagion and information propagation. One of the most significant research questions in spreading is to rank the spreading ability of nodes in the network. To this end, substantial effort has been made and a variety of effective methods have been proposed. These methods usually define the spreading ability of a node as the number of finally infected nodes given that the spreading is initialized from the node. However, in many real cases such as advertising and news propagation, the spreading only aims to cover a specific group of nodes. Therefore, it is necessary to study the spreading ability of nodes towards localized targets in complex networks. In this paper, we propose a reversed local path algorithm for this problem. Simulation results show that our method outperforms the existing methods in identifying the influential nodes with respect to these localized targets. Moreover, the influential spreaders identified by our method can effectively avoid infecting the non-target nodes in the spreading process.
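One plausible reading of a local-path score toward a target set can be sketched as follows; the decay weights and maximum path length are illustrative assumptions, since the abstract does not give the exact formula of the reversed local path algorithm.

```python
def local_path_influence(adj, targets, eps=0.05):
    """Decay-weighted counts of length-1..3 walks that end in the target set.

    The counts are accumulated outward from the targets ("reversed"), so
    only walks terminating on a target contribute to a node's score.
    """
    p1 = {v: sum(1 for u in adj[v] if u in targets) for v in adj}
    p2 = {v: sum(p1[u] for u in adj[v]) for v in adj}
    p3 = {v: sum(p2[u] for u in adj[v]) for v in adj}
    return {v: p1[v] + eps * p2[v] + eps ** 2 * p3[v]
            for v in adj if v not in targets}

# 'hub' touches both targets directly; 'far' reaches them only through 'hub'.
adj = {'t1': ['hub'], 't2': ['hub'], 'hub': ['t1', 't2', 'far', 'x'],
       'far': ['hub'], 'x': ['hub']}
scores = local_path_influence(adj, targets={'t1', 't2'})
```

A spreader seeded at the top-scoring node reaches the targets through short paths while contributing little score, and hence little spreading priority, to nodes far from the target group.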
Ensemble-based prediction of RNA secondary structures.
Aghaeepour, Nima; Hoos, Holger H
2013-04-24
Accurate structure prediction methods play an important role in the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach.
In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
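The abstract does not detail AveRNA's combination scheme; as a hedged illustration, a weighted vote over predicted base pairs captures both the ensembling and the false-positive/false-negative trade-off through a single threshold `tau`.

```python
from collections import defaultdict

def ensemble_pairs(predictions, weights, tau):
    """Weighted vote over predicted base-pair sets.

    Lowering tau admits more pairs (fewer false negatives); raising it
    keeps only consensus pairs (fewer false positives).
    """
    vote = defaultdict(float)
    for pairs, w in zip(predictions, weights):
        for p in pairs:
            vote[p] += w
    return {p for p, v in vote.items() if v >= tau}

# Three hypothetical predictors, each emitting a set of (i, j) base pairs.
preds = [{(1, 10), (2, 9)}, {(1, 10), (3, 8)}, {(1, 10), (2, 9), (4, 7)}]
consensus = ensemble_pairs(preds, [1.0, 1.0, 1.0], tau=3.0)
majority = ensemble_pairs(preds, [1.0, 1.0, 1.0], tau=2.0)
union = ensemble_pairs(preds, [1.0, 1.0, 1.0], tau=1.0)
```

Sweeping `tau` from the total weight down to the smallest single weight moves the ensemble from strict consensus to the union of all predictions, which is the intuitive control over the false-negative/false-positive balance described above.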
A Robust Gradient Based Method for Building Extraction from LiDAR and Photogrammetric Imagery.
Siddiqui, Fasahat Ullah; Teng, Shyh Wei; Awrangjeb, Mohammad; Lu, Guojun
2016-07-19
Existing automatic building extraction methods are not effective in extracting buildings which are small in size or have transparent roofs. The application of a large area threshold prohibits detection of small buildings, and the use of ground points in generating the building mask prevents detection of transparent buildings. In addition, the existing methods use numerous parameters to extract buildings in complex environments, e.g., hilly areas and high vegetation. However, the empirical tuning of a large number of parameters reduces the robustness of building extraction methods. This paper proposes a novel Gradient-based Building Extraction (GBE) method to address these limitations. The proposed method transforms the Light Detection And Ranging (LiDAR) height information into an intensity image without interpolation of point heights and then analyses the gradient information in the image. Generally, building roof planes have a constant height change along the slope of a roof plane, whereas trees have a random height change. With such an analysis, buildings of a greater range of sizes with a transparent or opaque roof can be extracted. In addition, a local colour matching approach is introduced as a post-processing stage to eliminate trees. This stage of our proposed method does not require any manual setting, and all parameters are set automatically from the data. The other post-processing stages, including variance, point density and shadow elimination, are also applied to verify the extracted buildings, where comparatively fewer empirically set parameters are used. The performance of the proposed GBE method is evaluated on two benchmark data sets by using object- and pixel-based metrics (completeness, correctness and quality). Our experimental results show the effectiveness of the proposed method in eliminating trees, extracting buildings of all sizes, and extracting buildings with and without transparent roofs. 
When compared with current state-of-the-art building extraction methods, the proposed method outperforms the existing methods in various evaluation metrics.
A Robust Gradient Based Method for Building Extraction from LiDAR and Photogrammetric Imagery
Siddiqui, Fasahat Ullah; Teng, Shyh Wei; Awrangjeb, Mohammad; Lu, Guojun
2016-01-01
Existing automatic building extraction methods are not effective in extracting buildings that are small in size or have transparent roofs. The application of a large area threshold prohibits detection of small buildings, and the use of ground points in generating the building mask prevents detection of transparent buildings. In addition, the existing methods use numerous parameters to extract buildings in complex environments, e.g., hilly areas and high vegetation. However, the empirical tuning of a large number of parameters reduces the robustness of building extraction methods. This paper proposes a novel Gradient-based Building Extraction (GBE) method to address these limitations. The proposed method transforms the Light Detection And Ranging (LiDAR) height information into an intensity image without interpolation of point heights and then analyses the gradient information in the image. Generally, building roof planes have a constant height change along the slope of a roof plane, whereas trees have a random height change. With such an analysis, buildings of a greater range of sizes, with transparent or opaque roofs, can be extracted. In addition, a local colour matching approach is introduced as a post-processing stage to eliminate trees. This stage of our proposed method does not require any manual setting; all parameters are set automatically from the data. The other post-processing stages, including variance, point density and shadow elimination, are also applied to verify the extracted buildings, where comparatively fewer empirically set parameters are used. The performance of the proposed GBE method is evaluated on two benchmark data sets by using object- and pixel-based metrics (completeness, correctness and quality). Our experimental results show the effectiveness of the proposed method in eliminating trees, extracting buildings of all sizes, and extracting buildings with and without transparent roofs. When compared with current state-of-the-art building extraction methods, the proposed method outperforms the existing methods in various evaluation metrics. PMID:27447631
Learning Rotation-Invariant Local Binary Descriptor.
Duan, Yueqi; Lu, Jiwen; Feng, Jianjiang; Zhou, Jie
2017-08-01
In this paper, we propose a rotation-invariant local binary descriptor (RI-LBD) learning method for visual recognition. Compared with hand-crafted local binary descriptors, such as local binary pattern and its variants, which require strong prior knowledge, local binary feature learning methods are more efficient and data-adaptive. Unlike existing learning-based local binary descriptors, such as compact binary face descriptor and simultaneous local binary feature learning and encoding, which are susceptible to rotations, our RI-LBD first categorizes each local patch into a rotational binary pattern (RBP), and then jointly learns the orientation for each pattern and the projection matrix to obtain RI-LBDs. As all the rotation variants of a patch belong to the same RBP, they are rotated into the same orientation and projected into the same binary descriptor. Then, we construct a codebook by a clustering method on the learned binary codes, and obtain a histogram feature for each image as the final representation. In order to exploit higher order statistical information, we extend our RI-LBD to the triple rotation-invariant co-occurrence local binary descriptor (TRICo-LBD) learning method, which learns a triple co-occurrence binary code for each local patch. Extensive experimental results on four different visual recognition tasks, including image patch matching, texture classification, face recognition, and scene classification, show that our RI-LBD and TRICo-LBD outperform most existing local descriptors.
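The rotation handling that RI-LBD learns can be contrasted with the classic hand-crafted rotation-invariant LBP the abstract alludes to, which maps each 8-bit neighborhood code to its minimum over all circular bit rotations. A minimal sketch of that classic baseline (not the learned descriptor from the paper):

```python
def rotate_right(code: int, bits: int = 8) -> int:
    """Circularly rotate an n-bit code right by one position."""
    return ((code >> 1) | ((code & 1) << (bits - 1))) & ((1 << bits) - 1)

def ri_lbp(code: int, bits: int = 8) -> int:
    """Map an LBP code to its rotation-invariant representative:
    the minimum value over all circular bit rotations."""
    best = cur = code
    for _ in range(bits - 1):
        cur = rotate_right(cur, bits)
        best = min(best, cur)
    return best

# All rotations of the same neighborhood pattern share one representative.
assert ri_lbp(0b00000001) == ri_lbp(0b00010000) == 1
```

Because every rotated variant of a patch collapses to the same code, descriptors built on top of it are rotation invariant by construction, which is the property RI-LBD obtains instead by learning per-pattern orientations.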
Staley, Dennis M.; Negri, Jacquelyn; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2017-01-01
Early warning of post-fire debris-flow occurrence during intense rainfall has traditionally relied upon a library of regionally specific empirical rainfall intensity–duration thresholds. Development of this library and the calculation of rainfall intensity–duration thresholds often require several years of monitoring local rainfall and hydrologic response to rainstorms, a time-consuming approach where results are often only applicable to the specific region where data were collected. Here, we present a new, fully predictive approach that utilizes rainfall, hydrologic response, and readily available geospatial data to predict rainfall intensity–duration thresholds for debris-flow generation in recently burned locations in the western United States. Unlike the traditional approach to defining regional thresholds from historical data, the proposed methodology permits the direct calculation of rainfall intensity–duration thresholds for areas where no such data exist. The thresholds calculated by this method are demonstrated to provide predictions of similar accuracy to, and in some cases better than, previously published regional intensity–duration thresholds. The method also provides improved predictions of debris-flow likelihood, which can be incorporated into existing approaches for post-fire debris-flow hazard assessment. Our results also provide guidance for the operational expansion of post-fire debris-flow early warning systems in areas where empirically defined regional rainfall intensity–duration thresholds do not currently exist.
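Rainfall intensity–duration thresholds of the kind discussed above are commonly expressed as a power law I = alpha * D**beta. A minimal sketch of exceedance checking, with alpha and beta as hypothetical illustrative values rather than parameters from this study:

```python
def exceeds_threshold(intensity_mm_h: float, duration_h: float,
                      alpha: float, beta: float) -> bool:
    """Check whether an observed rainfall intensity exceeds a power-law
    intensity-duration threshold I = alpha * D**beta. beta is typically
    negative: short bursts must be more intense to trigger debris flows."""
    return intensity_mm_h > alpha * duration_h ** beta

# Hypothetical threshold parameters: alpha = 12 mm/h, beta = -0.6.
# A 30-minute storm must then exceed 12 * 0.5**-0.6 (about 18.2 mm/h).
print(exceeds_threshold(25.0, 0.5, alpha=12.0, beta=-0.6))
```

The paper's contribution is predicting alpha and beta (and debris-flow likelihood) from geospatial data where no monitoring record exists; the exceedance test itself stays this simple.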
Data-driven confounder selection via Markov and Bayesian networks.
Häggström, Jenny
2018-06-01
To unbiasedly estimate a causal effect on an outcome, unconfoundedness is often assumed. If there is sufficient knowledge on the underlying causal structure, then existing confounder selection criteria can be used to select subsets of the observed pretreatment covariates, X, sufficient for unconfoundedness, if such subsets exist. Here, estimation of these target subsets is considered when the underlying causal structure is unknown. The proposed method is to model the causal structure by a probabilistic graphical model, for example, a Markov or Bayesian network, estimate this graph from observed data and select the target subsets given the estimated graph. The approach is evaluated by simulation both in a high-dimensional setting where unconfoundedness holds given X and in a setting where unconfoundedness only holds given subsets of X. Several common target subsets are investigated and the selected subsets are compared with respect to accuracy in estimating the average causal effect. The proposed method is implemented with existing software that can easily handle high-dimensional data, in terms of large samples and a large number of covariates. The results from the simulation study show that, if unconfoundedness holds given X, this approach is very successful in selecting the target subsets, outperforming alternative approaches based on random forests and LASSO, and that the subset estimating the target subset containing all causes of outcome yields the smallest MSE in the average causal effect estimation. © 2017, The International Biometric Society.
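One common target subset in the confounder-selection literature is the union of the parents of treatment and the parents of outcome in the causal graph. A toy sketch on a hand-coded DAG (the paper estimates the graph from data; here it is simply given):

```python
def parents(dag: dict, node: str) -> set:
    """Parents of `node` in a DAG given as {node: set_of_children}."""
    return {p for p, children in dag.items() if node in children}

def target_subset(dag: dict, treatment: str, outcome: str) -> set:
    """A common confounder-selection target: pretreatment covariates
    that are parents of the treatment or parents of the outcome."""
    return (parents(dag, treatment) | parents(dag, outcome)) - {treatment, outcome}

# Toy graph: X1 -> T, X1 -> Y, X2 -> Y, T -> Y.
dag = {"X1": {"T", "Y"}, "X2": {"Y"}, "T": {"Y"}, "Y": set()}
print(sorted(target_subset(dag, "T", "Y")))  # X1 confounds; X2 causes Y only
```

In the paper the graph is itself estimated (e.g., as a Markov or Bayesian network), so the quality of the selected subset depends on the quality of the structure learning step.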
Gender Gaps and Gendered Action in a First-Year Physics Laboratory
ERIC Educational Resources Information Center
Day, James; Stang, Jared B.; Holmes, N. G.; Kumar, Dhaneesh; Bonn, D. A.
2016-01-01
It is established that male students outperform female students on almost all commonly used physics concept inventories. However, there is significant variation in the factors that contribute to the gap, as well as the direction in which they influence it. It is presently unknown if such a gender gap exists on the relatively new Concise Data…
Searching Information Sources in Networks
2017-06-14
During the course of this project, we made significant progress in multiple directions of the information detection... result on information source detection on non-tree networks; (2) the development of information source localization algorithms to detect multiple... information sources. The algorithms have provable performance guarantees and outperform existing algorithms.
Towards a SIM-Less Existence: The Evolution of Smart Learning Networks
ERIC Educational Resources Information Center
Al-Khouri, Ali M.
2015-01-01
This article proposes that the widespread availability of wireless networks creates a case in which there is no real need for SIM cards. Recent technological developments offer the capability to outperform SIM cards and provide more innovative dimensions to current systems of mobility. In this context of changing realities in the domain of…
Adaptive correction of ensemble forecasts
NASA Astrophysics Data System (ADS)
Pelosi, Anna; Battista Chirico, Giovanni; Van den Bergh, Joris; Vannitsem, Stephane
2017-04-01
Forecasts from numerical weather prediction (NWP) models often suffer from both systematic and non-systematic errors. These are present in both deterministic and ensemble forecasts, and originate from various sources such as model error and subgrid variability. Statistical post-processing techniques can partly remove such errors, which is particularly important when NWP outputs concerning surface weather variables are employed for site-specific applications. Many different post-processing techniques have been developed. For deterministic forecasts, adaptive methods such as the Kalman filter are often used, which sequentially post-process the forecasts by continuously updating the correction parameters as new ground observations become available. These methods are especially valuable when long training data sets do not exist. For ensemble forecasts, well-known techniques are ensemble model output statistics (EMOS) and so-called "member-by-member" approaches (MBM). Here, we introduce a new adaptive post-processing technique for ensemble predictions. The proposed method is a sequential Kalman filtering technique that fully exploits the information content of the ensemble. One correction equation is retrieved and applied to all members; however, the parameters of the regression equations are retrieved by exploiting the second order statistics of the forecast ensemble. We compare our new method with two other techniques: a simple method that makes use of a running bias correction of the ensemble mean, and an MBM post-processing approach that rescales the ensemble mean and spread, based on minimization of the Continuous Ranked Probability Score (CRPS). We perform a verification study for the region of Campania in southern Italy. 
We use two years (2014-2015) of daily meteorological observations of 2-meter temperature and 10-meter wind speed from 18 ground-based automatic weather stations distributed across the region, comparing them with the corresponding COSMO-LEPS ensemble forecasts. Deterministic verification scores (e.g., mean absolute error, bias) and probabilistic scores (e.g., CRPS) are used to evaluate the post-processing techniques. We conclude that the new adaptive method outperforms the simpler running bias-correction. The proposed adaptive method often outperforms the MBM method in removing bias. The MBM method has the advantage of correcting the ensemble spread, although it needs more training data.
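The running bias correction of the ensemble mean used as a comparison baseline above can be sketched as an exponentially weighted update (a simplified stand-in, not the paper's sequential Kalman filtering formulation):

```python
def run_bias_correction(forecasts, observations, alpha=0.1):
    """Sequentially bias-corrected ensemble-mean forecasts. After each
    verification, the running bias estimate is nudged toward the latest
    forecast error with weight alpha (a simplified running correction)."""
    bias = 0.0
    corrected = []
    for f, o in zip(forecasts, observations):
        corrected.append(f - bias)                   # correct with current bias
        bias = (1 - alpha) * bias + alpha * (f - o)  # then learn from the obs
    return corrected

fc  = [21.0, 22.0, 20.5, 23.0]   # raw ensemble-mean forecasts (deg C)
obs = [20.0, 21.0, 19.5, 22.0]   # verifying observations
print(run_bias_correction(fc, obs, alpha=0.5))
```

With a persistent +1 degree bias, the correction converges toward subtracting 1 from each forecast; adaptive schemes of this family need no long training archive, which is the property the abstract emphasizes.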
Metal artifact reduction for CT-based luggage screening.
Karimi, Seemeen; Martz, Harry; Cosman, Pamela
2015-01-01
In aviation security, checked luggage is screened by computed tomography scanning. Metal objects in the bags create artifacts that degrade image quality. Metal artifact reduction (MAR) methods exist, mainly in the medical imaging literature, but they either require knowledge of the materials in the scan or are outlier-rejection methods. Our aim is to improve and evaluate a MAR method we previously introduced that does not require knowledge of the materials in the scan and gives good results on data with large quantities and different kinds of metal. We describe in detail an optimization which de-emphasizes metal projections and has a constraint for beam hardening and scatter. This method isolates and reduces artifacts in an intermediate image, which is then fed to a previously published sinogram replacement method. We evaluate the algorithm for luggage data containing multiple and large metal objects. We define measures of artifact reduction, and compare this method against others in the MAR literature. Metal artifacts were reduced in our test images, even for multiple and large metal objects, without much loss of structure or resolution. Our MAR method outperforms the methods with which we compared it. Our approach does not make assumptions about image content, nor does it discard metal projections.
Prediction of fatigue-related driver performance from EEG data by deep Riemannian model.
Hajinoroozi, Mehdi; Jianqiu Zhang; Yufei Huang
2017-07-01
Prediction of drivers' drowsy and alert states is important for safety purposes. We present the prediction of drivers' drowsy and alert states from electroencephalography (EEG) using shallow and deep Riemannian methods. For the shallow Riemannian methods, the minimum distance to Riemannian mean (mdm) and the Log-Euclidean metric are investigated, and the Log-Euclidean metric is shown to outperform the mdm algorithm. In addition, SPDNet, a deep Riemannian model that takes the EEG covariance matrix as input, is investigated. SPDNet is shown to outperform all tested shallow and deep classification methods: its performance is 6.02% and 2.86% higher than the best performance of the conventional Euclidean classifiers and the shallow Riemannian models, respectively.
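The Log-Euclidean metric investigated above measures the distance between symmetric positive-definite (SPD) covariance matrices as the Frobenius norm of the difference of their matrix logarithms. A sketch restricted to diagonal SPD matrices, where the matrix logarithm reduces to elementwise logs (the general case needs an eigendecomposition):

```python
import math

def logm_diag(d):
    """Matrix logarithm of a diagonal SPD matrix, given its diagonal d."""
    return [math.log(x) for x in d]

def log_euclidean_dist(d1, d2):
    """Log-Euclidean distance between two SPD matrices, restricted here
    to diagonal matrices for simplicity: the Frobenius norm of the
    difference of their matrix logarithms."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(logm_diag(d1), logm_diag(d2))))

# Toy 2x2 diagonal covariance matrices diag(1, 4) and diag(e, 4):
print(round(log_euclidean_dist([1.0, 4.0], [math.e, 4.0]), 3))  # 1.0
```

Working in the log domain respects the curved geometry of SPD matrices, which is why Riemannian metrics tend to beat plain Euclidean distances on EEG covariance features.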
On accuracy, privacy, and complexity in the identification problem
NASA Astrophysics Data System (ADS)
Beekhof, F.; Voloshynovskiy, S.; Koval, O.; Holotyak, T.
2010-02-01
This paper presents recent advances in the identification problem taking into account the accuracy, complexity and privacy leak of different decoding algorithms. Using a model of different actors from literature, we show that it is possible to use more accurate decoding algorithms using reliability information without increasing the privacy leak relative to algorithms that only use binary information. Existing algorithms from literature have been modified to take advantage of reliability information, and we show that a proposed branch-and-bound algorithm can outperform existing work, including the enhanced variants.
Fuzzy attitude control of solar sail via linear matrix inequalities
NASA Astrophysics Data System (ADS)
Baculi, Joshua; Ayoubi, Mohammad A.
2017-09-01
This study presents a fuzzy tracking controller based on the Takagi-Sugeno (T-S) fuzzy model of the solar sail. First, the T-S fuzzy model is constructed by linearizing the existing nonlinear equations of motion of the solar sail. Then, the T-S fuzzy model is used to derive the state feedback controller gains for the Twin Parallel Distributed Compensation (TPDC) technique. The TPDC tracks and stabilizes the attitude of the solar sail to any desired state in the presence of parameter uncertainties and external disturbances while satisfying actuator constraints. The performance of the TPDC is compared to a PID controller that is tuned using the Ziegler-Nichols method. Numerical simulation shows the TPDC outperforms the PID controller when stabilizing the solar sail to a desired state.
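The comparison PID controller above is tuned with the Ziegler-Nichols method, a standard closed-loop rule based on the ultimate gain K_u and ultimate oscillation period T_u. A sketch using the classic textbook constants (generic rule, not gains from this study):

```python
def ziegler_nichols_pid(ku: float, tu: float):
    """Classic Ziegler-Nichols PID tuning from the ultimate gain Ku and
    ultimate oscillation period Tu:
        Kp = 0.6*Ku,  Ti = Tu/2,  Td = Tu/8
    Returned as parallel-form gains (Kp, Ki, Kd) with Ki = Kp/Ti and
    Kd = Kp*Td."""
    kp = 0.6 * ku
    ti = tu / 2.0
    td = tu / 8.0
    return kp, kp / ti, kp * td

# Hypothetical plant: sustained oscillation at Ku = 4.0, Tu = 2.0 s.
kp, ki, kd = ziegler_nichols_pid(ku=4.0, tu=2.0)
print(kp, ki, kd)  # 2.4 2.4 0.6
```

The rule gives aggressive gains with modest damping, which helps explain why a model-based controller such as the TPDC can outperform it under disturbances and actuator constraints.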
Time-Efficient High-Rate Data Flooding in One-Dimensional Acoustic Underwater Sensor Networks
Kwon, Jae Kyun; Seo, Bo-Min; Yun, Kyungsu; Cho, Ho-Shin
2015-01-01
Because underwater communication environments have poor characteristics, such as severe attenuation, large propagation delays and narrow bandwidths, data is normally transmitted at low rates through acoustic waves. On the other hand, as high traffic has recently been required in diverse areas, high rate transmission has become necessary. In this paper, transmission/reception timing schemes that maximize the time axis use efficiency to improve the resource efficiency for high rate transmission are proposed. The advantages of the proposed scheme are demonstrated by examining the power distributions by node, rate bounds, power levels depending on the rates and number of nodes, and network split gains, through mathematical analysis and numerical results. In addition, the simulation results show that the proposed scheme outperforms the existing packet train method. PMID:26528983
Qu, Jianfeng; Ouyang, Dantong; Hua, Wen; Ye, Yuxin; Li, Ximing
2018-04-01
Distant supervision for neural relation extraction is an efficient approach to extracting massive relations with reference to plain texts. However, the existing neural methods fail to capture the critical words in sentence encoding and meanwhile lack useful sentence information for some positive training instances. To address the above issues, we propose a novel neural relation extraction model. First, we develop a word-level attention mechanism to distinguish the importance of each individual word in a sentence, increasing the attention weights for those critical words. Second, we investigate the semantic information from word embeddings of target entities, which can be developed as a supplementary feature for the extractor. Experimental results show that our model outperforms previous state-of-the-art baselines. Copyright © 2018 Elsevier Ltd. All rights reserved.
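A word-level attention mechanism of the general kind described above softmaxes per-word scores into weights and pools the word embeddings accordingly. A minimal sketch with hand-chosen scores (the paper learns the scoring function from data):

```python
import math

def attention_pool(embeddings, scores):
    """Word-level attention: softmax the per-word scores into weights,
    then return the weighted sum of the word embeddings together with
    the weights (a minimal sketch, not the paper's trained scorer)."""
    m = max(scores)                              # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(embeddings[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, embeddings))
              for d in range(dim)]
    return pooled, weights

emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # toy 2-d word embeddings
pooled, weights = attention_pool(emb, scores=[2.0, 2.0, 2.0])
print([round(x, 3) for x in pooled])  # equal scores -> plain average
```

Raising one word's score concentrates the weight mass on that word, which is exactly how "critical words" come to dominate the sentence encoding.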
Photon-efficient super-resolution laser radar
NASA Astrophysics Data System (ADS)
Shin, Dongeek; Shapiro, Jeffrey H.; Goyal, Vivek K.
2017-08-01
The resolution achieved in photon-efficient active optical range imaging systems can be low due to non-idealities such as propagation through a diffuse scattering medium. We propose a constrained optimization-based framework to address extremes in scarcity of photons and blurring by a forward imaging kernel. We provide two algorithms for the resulting inverse problem: a greedy algorithm, inspired by sparse pursuit algorithms; and a convex optimization heuristic that incorporates image total variation regularization. We demonstrate that our framework outperforms existing deconvolution imaging techniques in terms of peak signal-to-noise ratio. Since our proposed method is able to super-resolve depth features using small numbers of photon counts, it can be useful for observing fine-scale phenomena in remote sensing through a scattering medium and through-the-skin biomedical imaging applications.
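The image total variation regularizer mentioned above penalizes the sum of absolute differences between neighboring pixels, favoring piecewise-constant depth maps. A sketch of the anisotropic form on a 2D list-of-lists image:

```python
def total_variation(img):
    """Anisotropic total variation of a 2D image: the sum of absolute
    differences between horizontally and vertically adjacent pixels
    (the penalty used by TV-regularized reconstruction)."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                tv += abs(img[r][c + 1] - img[r][c])  # horizontal neighbor
            if r + 1 < rows:
                tv += abs(img[r + 1][c] - img[r][c])  # vertical neighbor
    return tv

flat = [[1, 1], [1, 1]]   # constant image: zero penalty
edge = [[0, 1], [0, 1]]   # one vertical edge: penalized once per row
print(total_variation(flat), total_variation(edge))  # 0.0 2.0
```

Minimizing a data-fit term plus this penalty suppresses photon-starved noise while keeping sharp depth edges, which is the behavior the convex heuristic in the abstract relies on.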
Performance of device-independent quantum key distribution
NASA Astrophysics Data System (ADS)
Cao, Zhu; Zhao, Qi; Ma, Xiongfeng
2016-07-01
Quantum key distribution provides information-theoretically-secure communication. In practice, device imperfections may jeopardise the system security. Device-independent quantum key distribution solves this problem by providing secure keys even when the quantum devices are untrusted and uncharacterized. Following a recent security proof of the device-independent quantum key distribution, we improve the key rate by tightening the parameter choice in the security proof. In practice where the system is lossy, we further improve the key rate by taking into account the loss position information. From our numerical simulation, our method can outperform existing results. Meanwhile, we outline clear experimental requirements for implementing device-independent quantum key distribution. The maximal tolerable error rate is 1.6%, the minimal required transmittance is 97.3%, and the minimal required visibility is 96.8%.
FBP and BPF reconstruction methods for circular X-ray tomography with off-center detector.
Schäfer, Dirk; Grass, Michael; van de Haar, Peter
2011-07-01
Circular scanning with an off-center planar detector is an acquisition scheme that saves detector area while keeping a large field of view (FOV). Several filtered back-projection (FBP) algorithms have been proposed earlier. The purpose of this work is to present two newly developed back-projection filtration (BPF) variants and evaluate the image quality of these methods compared to the existing state-of-the-art FBP methods. The first new BPF algorithm applies redundancy weighting of overlapping opposite projections before differentiation in a single projection. The second one uses the Katsevich-type differentiation involving two neighboring projections followed by redundancy weighting and back-projection. An averaging scheme is presented to mitigate streak artifacts inherent to circular BPF algorithms along the Hilbert filter lines in the off-center transaxial slices of the reconstructions. The image quality is assessed visually on reconstructed slices of simulated and clinical data. Quantitative evaluation studies are performed with the Forbild head phantom by calculating root-mean-square deviations (RMSDs) to the voxelized phantom for different detector overlap settings and by investigating the noise-resolution trade-off with a wire phantom in the full detector and off-center scenario. The noise-resolution behavior of all off-center reconstruction methods corresponds to their full detector performance with the best resolution for the FDK based methods with the given imaging geometry. With respect to RMSD and visual inspection, the proposed BPF with Katsevich-type differentiation outperforms all other methods for the smallest chosen detector overlap of about 15 mm. The best FBP method is the algorithm that is also based on the Katsevich-type differentiation and subsequent redundancy weighting. For wider overlap of about 40-50 mm, these two algorithms produce similar results outperforming the other three methods. 
The clinical case with a detector overlap of about 17 mm confirms these results. The BPF-type reconstructions with Katsevich differentiation are widely independent of the size of the detector overlap and give the best results with respect to RMSD and visual inspection for minimal detector overlap. The increased homogeneity will improve correct assessment of lesions in the entire field of view.
Liu, Zhenqiu; Hsiao, William; Cantarel, Brandi L; Drábek, Elliott Franco; Fraser-Liggett, Claire
2011-12-01
Direct sequencing of microbes in human ecosystems (the human microbiome) has complemented single genome cultivation and sequencing to understand and explore the impact of commensal microbes on human health. As sequencing technologies improve and costs decline, the sophistication of data has outgrown available computational methods. While several existing machine learning methods have been adapted for analyzing microbiome data recently, there is not yet an efficient and dedicated algorithm available for multiclass classification of human microbiota. By combining instance-based and model-based learning, we propose a novel sparse distance-based learning method for simultaneous class prediction and feature (used interchangeably with variable or taxon) selection from multiple treatment populations on the basis of 16S rRNA sequence count data. Our proposed method simultaneously minimizes the intraclass distance and maximizes the interclass distance with many fewer estimated parameters than other methods. It is very efficient for problems with small sample sizes and unbalanced classes, which are common in metagenomic studies. We implemented this method in a MATLAB toolbox called MetaDistance. We also propose several approaches for data normalization and variance stabilization transformation in MetaDistance. We validate this method on several real and simulated 16S rRNA datasets to show that it outperforms existing methods for classifying metagenomic data. This article is the first to address simultaneous multifeature selection and class prediction with metagenomic count data. The MATLAB toolbox is freely available online at http://metadistance.igs.umaryland.edu/. Contact: zliu@umm.edu. Supplementary data are available at Bioinformatics online.
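The idea of minimizing intraclass distance while maximizing interclass distance can be illustrated, in a much-simplified form, by a nearest-centroid classifier over taxa count vectors (a toy sketch, not the MetaDistance algorithm itself):

```python
def centroid(vectors):
    """Componentwise mean of a list of equal-length feature vectors."""
    dim, n = len(vectors[0]), len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(dim)]

def nearest_centroid_predict(train, sample):
    """Distance-based multiclass prediction: assign `sample` to the
    class whose centroid is nearest in squared Euclidean distance.
    `train` maps class label -> list of feature vectors (e.g. taxa
    counts); hypothetical toy data below."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    cents = {label: centroid(vecs) for label, vecs in train.items()}
    return min(cents, key=lambda label: dist2(cents[label], sample))

train = {"healthy": [[5, 1], [7, 1]], "disease": [[1, 6], [1, 8]]}
print(nearest_centroid_predict(train, [6, 2]))  # healthy
```

MetaDistance additionally learns sparse feature weights, so uninformative taxa drop out of the distance; the plain version above weights every feature equally.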
Efficient biprediction decision scheme for fast high efficiency video coding encoding
NASA Astrophysics Data System (ADS)
Park, Sang-hyo; Lee, Seung-ho; Jang, Euee S.; Jun, Dongsan; Kang, Jung-Won
2016-11-01
An efficient biprediction decision scheme of high efficiency video coding (HEVC) is proposed for fast-encoding applications. For low-delay video applications, bidirectional prediction can be used to increase compression performance efficiently with previous reference frames. However, at the same time, the computational complexity of the HEVC encoder is significantly increased due to the additional biprediction search. Although some research has attempted to reduce this complexity, whether the prediction is strongly related to both motion complexity and prediction modes in a coding unit has not yet been investigated. A method that avoids most compression-inefficient search points is proposed so that the computational complexity of the motion estimation process can be dramatically decreased. To determine if biprediction is critical, the proposed method exploits the stochastic correlation of the context of prediction units (PUs): the direction of a PU and the accuracy of a motion vector. Through experimental results, the proposed method showed that the time complexity of biprediction can be reduced to 30% on average, outperforming existing methods in view of encoding time, number of function calls, and memory access.
Efficient Iris Recognition Based on Optimal Subfeature Selection and Weighted Subregion Fusion
Chen, Ying; Liu, Yuanning; Zhu, Xiaodong; He, Fei; Wang, Hongye; Deng, Ning
2014-01-01
In this paper, we propose three discriminative feature selection strategies and a weighted subregion matching method to improve the performance of iris recognition systems. Firstly, we introduce the process of feature extraction and representation based on scale invariant feature transformation (SIFT) in detail. Secondly, three strategies are described: an orientation probability distribution function (OPDF) based strategy to delete some redundant feature keypoints, a magnitude probability distribution function (MPDF) based strategy to reduce the dimensionality of feature elements, and a compounded strategy combining OPDF and MPDF to further select an optimal subfeature. Thirdly, to make matching more effective, this paper proposes a novel matching method based on weighted subregion matching fusion. Particle swarm optimization is utilized to efficiently determine the different subregions' weights, and the weighted subregion matching scores are then combined to generate the final decision. The experimental results, on three public and renowned iris databases (CASIA-V3 Interval, Lamp, and MMU-V1), demonstrate that our proposed methods outperform some of the existing methods in terms of correct recognition rate, equal error rate, and computation complexity. PMID:24683317
ITALICS: an algorithm for normalization and DNA copy number calling for Affymetrix SNP arrays.
Rigaill, Guillem; Hupé, Philippe; Almeida, Anna; La Rosa, Philippe; Meyniel, Jean-Philippe; Decraene, Charles; Barillot, Emmanuel
2008-03-15
Affymetrix SNP arrays can be used to determine the DNA copy number measurement of 11 000-500 000 SNPs along the genome. Their high density facilitates the precise localization of genomic alterations and makes them a powerful tool for studies of cancers and copy number polymorphism. Like other microarray technologies, it is influenced by non-relevant sources of variation, requiring correction. Moreover, the amplitude of variation induced by non-relevant effects is similar to or greater than the biologically relevant effect (i.e. true copy number), making it difficult to estimate non-relevant effects accurately without including the biologically relevant effect. We addressed this problem by developing ITALICS, a normalization method that estimates both biological and non-relevant effects in an alternate, iterative manner, accurately eliminating irrelevant effects. We compared our normalization method with other existing and available methods, and found that ITALICS outperformed these methods for several in-house datasets and one public dataset. These results were validated biologically by quantitative PCR. The R package ITALICS (ITerative and Alternative normaLIzation and Copy number calling for affymetrix Snp arrays) has been submitted to Bioconductor.
Xiao, Zhu; Havyarimana, Vincent; Li, Tong; Wang, Dong
2016-05-13
In this paper, a novel nonlinear smoothing framework, the non-Gaussian delayed particle smoother (nGDPS), is proposed, which enables vehicle state estimation (VSE) with high accuracy by taking into account the non-Gaussianity of the measurement and process noises. Within the proposed method, the multivariate Student's t-distribution is adopted in order to compute the probability distribution function (PDF) related to the process and measurement noises, which are assumed to be non-Gaussian distributed. A computation approach based on the Ensemble Kalman Filter (EnKF) is designed to cope with the mean and the covariance matrix of the proposal non-Gaussian distribution. A delayed Gibbs sampling algorithm, which incorporates smoothing of the sampled trajectories over a fixed delay, is proposed to deal with the sample degeneracy of particles. The performance is investigated on real-world data collected by low-cost on-board vehicle sensors. The comparison study based on the real-world experiments and the statistical analysis demonstrates that the proposed nGDPS significantly improves the vehicle state accuracy and outperforms the existing filtering and smoothing methods.
Lin, Hongli; Yang, Xuedong; Wang, Weisheng
2014-08-01
Devising a method that can select cases based on the performance levels of trainees and the characteristics of cases is essential for developing a personalized training program in radiology education. In this paper, we propose a novel hybrid prediction algorithm called content-boosted collaborative filtering (CBCF) to predict the difficulty level of each case for each trainee. The CBCF utilizes a content-based filtering (CBF) method to enhance existing trainee-case ratings data and then provides final predictions through a collaborative filtering (CF) algorithm. The CBCF algorithm incorporates the advantages of both CBF and CF, while not inheriting the disadvantages of either. The CBCF method is compared with the pure CBF and pure CF approaches using three datasets. The experimental data are then evaluated in terms of the MAE metric. Our experimental results show that the CBCF outperforms the pure CBF and CF methods by 13.33% and 12.17%, respectively, in terms of prediction precision. This also suggests that the CBCF can be used in the development of personalized training systems in radiology education.
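The two-stage CBCF idea, densify the sparse ratings matrix with content-based estimates and then run collaborative filtering on the filled matrix, can be sketched in a toy form. This is not the authors' algorithm: the cosine similarity on item feature vectors and user rows, the mean fallback, and the k-nearest-user prediction are simplifying assumptions.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def content_fill(R, item_feats):
    """Stage 1 (content-based): fill each missing rating (None) with a
    feature-similarity-weighted mean of that user's known ratings."""
    filled = [row[:] for row in R]
    for u, row in enumerate(R):
        known = [(i, r) for i, r in enumerate(row) if r is not None]
        for i, r in enumerate(row):
            if r is None and known:
                w = [(cosine(item_feats[i], item_feats[j]), rj) for j, rj in known]
                tot = sum(abs(s) for s, _ in w)
                filled[u][i] = (sum(s * rj for s, rj in w) / tot) if tot else \
                               sum(rj for _, rj in known) / len(known)
    return filled

def predict_cf(filled, u, i, k=2):
    """Stage 2 (collaborative): similarity-weighted average over the k
    most similar users in the densified matrix."""
    sims = sorted(((cosine(filled[u], filled[v]), v)
                   for v in range(len(filled)) if v != u), reverse=True)[:k]
    tot = sum(s for s, _ in sims)
    return sum(s * filled[v][i] for s, v in sims) / tot if tot else filled[u][i]
```

The point of stage 1 is that stage 2 never has to compute user similarity over sparse, barely-overlapping rows, which is where pure CF degrades.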
González, Juan R; Carrasco, Josep L; Armengol, Lluís; Villatoro, Sergi; Jover, Lluís; Yasui, Yutaka; Estivill, Xavier
2008-01-01
Background Multiplex ligation-dependent probe amplification (MLPA) is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a normalization procedure based on a non-linear mixed model, as well as a new approach for determining the statistical significance of altered probes based on a linear mixed model. This approach establishes a threshold by using different tolerance intervals that accommodate the specific random error variability observed in each test sample. Results Through simulation studies we have shown that our proposed method outperforms two existing methods that are based on simple threshold rules or iterative regression. We have illustrated the method using a controlled MLPA assay in which targeted regions vary in copy number in individuals suffering from disorders such as Prader-Willi, DiGeorge or autism; among the compared methods, ours showed the best performance. Conclusion Using the proposed mixed model, we are able to determine thresholds to decide whether a region is altered. These thresholds are specific for each individual and incorporate experimental variability, resulting in improved sensitivity and specificity, as the examples with real data have revealed. PMID:18522760
Estimating uncertainty in respondent-driven sampling using a tree bootstrap method.
Baraff, Aaron J; McCormick, Tyler H; Raftery, Adrian E
2016-12-20
Respondent-driven sampling (RDS) is a network-based form of chain-referral sampling used to estimate attributes of populations that are difficult to access using standard survey tools. Although it has grown quickly in popularity since its introduction, the statistical properties of RDS estimates remain elusive. In particular, the sampling variability of these estimates has been shown to be much higher than previously acknowledged, and even methods designed to account for RDS result in misleadingly narrow confidence intervals. In this paper, we introduce a tree bootstrap method for estimating uncertainty in RDS estimates based on resampling recruitment trees. We use simulations from known social networks to show that the tree bootstrap method not only outperforms existing methods but also captures the high variability of RDS, even in extreme cases with high design effects. We also apply the method to data from injecting drug users in Ukraine. Unlike other methods, the tree bootstrap depends only on the structure of the sampled recruitment trees, not on the attributes being measured on the respondents, so correlations between attributes can be estimated as well as variability. Our results suggest that it is possible to accurately assess the high level of uncertainty inherent in RDS.
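The core of the tree bootstrap is that whole recruitment chains, not individual respondents, are resampled with replacement. The sketch below is an illustrative reduction of that idea (tree stored as a parent-to-children dict, percentile interval on the mean of a node attribute); the actual estimator and interval construction in the paper may differ:

```python
import random

def resample_tree(node, tree, rng):
    """Recursively resample a node's recruits with replacement, so each
    bootstrap replicate preserves chain structure, not just counts."""
    kids = tree.get(node, [])
    sample = [node]
    for child in rng.choices(kids, k=len(kids)) if kids else []:
        sample.extend(resample_tree(child, tree, rng))
    return sample

def tree_bootstrap_ci(tree, seeds, values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a node attribute,
    resampling seeds and then recruitment subtrees."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        nodes = []
        for s in rng.choices(seeds, k=len(seeds)):
            nodes.extend(resample_tree(s, tree, rng))
        means.append(sum(values[n] for n in nodes) / len(nodes))
    means.sort()
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]
```

Because only the tree structure is resampled, the same replicates can be reused to get uncertainty for any attribute, which is the property the abstract highlights.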
A fast algorithm to compute precise type-2 centroids for real-time control applications.
Chakraborty, Sumantra; Konar, Amit; Ralescu, Anca; Pal, Nikhil R
2015-02-01
An interval type-2 fuzzy set (IT2 FS) is characterized by its upper and lower membership functions, which bound all possible embedded fuzzy sets and together form the footprint of uncertainty (FOU). The FOU results in a span of uncertainty measured in the defuzzified space, determined by the positional difference of the centroids of all the embedded fuzzy sets taken together. This paper provides a closed-form formula to evaluate the span of uncertainty of an IT2 FS. The closed-form formula offers a precise measurement of the degree of uncertainty in an IT2 FS with a runtime complexity lower than that of the classical iterative Karnik-Mendel algorithm and of other formulations employing the iterative Newton-Raphson algorithm. This paper also demonstrates a real-time control application using the proposed closed-form formula for centroids, with lower root mean square error and computational overhead than those of existing methods. Computer simulations for this real-time control application indicate that a parallel realization of the IT2 defuzzification outperforms its competitors with respect to maximum overshoot, even at high sampling rates. Furthermore, in the presence of measurement noise in the system (plant) states, the proposed IT2 FS based scheme outperforms its type-1 counterpart with respect to peak overshoot and root mean square error in the plant response.
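The abstract does not give the closed-form formula itself, but the baseline it improves on, the centroid interval of a discretized IT2 FS, can be illustrated by brute-force enumeration of the Karnik-Mendel switch points. This is an illustrative sketch of the quantity being computed, not the paper's fast formula:

```python
def it2_centroid(x, lmf, umf):
    """Centroid interval [c_l, c_r] of a discretized IT2 FS: for each
    candidate switch index, build the embedded set that uses upper
    memberships on one side of the switch and lower on the other; the
    extremes over all switch points give the interval endpoints."""
    n = len(x)

    def centroid(theta):
        return sum(xi * t for xi, t in zip(x, theta)) / sum(theta)

    c_l = min(centroid([umf[i] if i <= k else lmf[i] for i in range(n)])
              for k in range(n))
    c_r = max(centroid([lmf[i] if i <= k else umf[i] for i in range(n)])
              for k in range(n))
    return c_l, c_r
```

The width c_r - c_l is exactly the "span of uncertainty" the paper evaluates; the iterative KM algorithm and the proposed closed form are faster ways to reach the same endpoints.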
First in the Class? Age and the Education Production Function. NBER Working Paper No. 13663
ERIC Educational Resources Information Center
Cascio, Elizabeth; Schanzenbach, Diane Whitmore
2007-01-01
Older children outperform younger children in a school-entry cohort well into their school careers. The existing literature has provided little insight into the causes of this phenomenon, leaving open the possibility that school-entry age is a zero-sum game, where relatively young students lose what relatively old students gain. In this paper, we…
The Impact of Seating Location and Seating Type on Student Performance
ERIC Educational Resources Information Center
Meeks, Michael D.; Knotts, Tami L.; James, Karen D.; Williams, Felice; Vassar, John A.; Wren, Amy Oakes
2013-01-01
While an extensive body of research exists regarding the delivery of course knowledge and material, much less attention has been paid to the performance effect of seating position within a classroom. Research findings are mixed as to whether students in the front row of a classroom outperform students in the back row. Another issue that has not…
Wijetunge, Chalini D; Saeed, Isaam; Boughton, Berin A; Roessner, Ute; Halgamuge, Saman K
2015-01-01
Mass Spectrometry (MS) is a ubiquitous analytical tool in biological research and is used to measure the mass-to-charge ratio of bio-molecules. Peak detection is the essential first step in MS data analysis. Precise estimation of peak parameters such as peak summit location and peak area are critical to identify underlying bio-molecules and to estimate their abundances accurately. We propose a new method to detect and quantify peaks in mass spectra. It uses dual-tree complex wavelet transformation along with Stein's unbiased risk estimator for spectra smoothing. Then, a new method, based on the modified Asymmetric Pseudo-Voigt (mAPV) model and hierarchical particle swarm optimization, is used for peak parameter estimation. Using simulated data, we demonstrated the benefit of using the mAPV model over Gaussian, Lorentz and Bi-Gaussian functions for MS peak modelling. The proposed mAPV model achieved the best fitting accuracy for asymmetric peaks, with lower percentage errors in peak summit location estimation, which were 0.17% to 4.46% less than those of the other models. It also outperformed the other models in peak area estimation, delivering percentage errors about 0.7% lower than its closest competitor, the Bi-Gaussian model. In addition, using data generated from a MALDI-TOF computer model, we showed that the proposed overall algorithm outperformed the existing methods mainly in terms of sensitivity. It achieved a sensitivity of 85%, compared to 77% and 71% for the two benchmark algorithms, the continuous wavelet transformation based method and Cromwell, respectively. The proposed algorithm is particularly useful for peak detection and parameter estimation in MS data with overlapping peak distributions and asymmetric peaks. The algorithm is implemented using MATLAB and the source code is freely available at http://mapv.sourceforge.net.
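A pseudo-Voigt peak is a convex mixture of a Gaussian and a Lorentzian with a shared centre and width; one simple way to make it asymmetric is to let the width differ on the two sides of the centre. The sketch below shows that idea only; the paper's mAPV parameterization is not given in the abstract and may differ:

```python
import math

def pseudo_voigt(x, h, c, w, eta, w_right=None):
    """Pseudo-Voigt profile of height h, centre c, width w, and mixing
    weight eta in [0, 1] between the Lorentzian and Gaussian parts.
    Passing a different right-side width w_right gives a naive
    asymmetric variant (an illustrative stand-in for mAPV)."""
    wi = w if (w_right is None or x <= c) else w_right
    g = math.exp(-((x - c) ** 2) / (2.0 * wi ** 2))   # Gaussian part
    l = wi ** 2 / ((x - c) ** 2 + wi ** 2)            # Lorentzian part
    return h * (eta * l + (1.0 - eta) * g)
```

At the summit both parts equal 1, so the model returns exactly h, which is why the summit location and height are directly interpretable fit parameters.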
A Review of Depth and Normal Fusion Algorithms
Štolc, Svorad; Pock, Thomas
2018-01-01
Geometric surface information such as depth maps and surface normals can be acquired by various methods such as stereo light fields, shape from shading and photometric stereo techniques. We compare several algorithms which deal with the combination of depth with surface normal information in order to reconstruct a refined depth map. The reasons for performance differences are examined from the perspective of alternative formulations of surface normals for depth reconstruction. We review and analyze methods in a systematic way. Based on our findings, we introduce a new generalized fusion method, which is formulated as a least squares problem and outperforms previous methods in the depth error domain by introducing a novel normal weighting that performs closer to the geodesic distance measure. Furthermore, a novel method is introduced based on Total Generalized Variation (TGV) which further outperforms previous approaches in terms of the geodesic normal distance error and maintains comparable quality in the depth error domain. PMID:29389903
Automatic allograft bone selection through band registration and its application to distal femur.
Zhang, Yu; Qiu, Lei; Li, Fengzan; Zhang, Qing; Zhang, Li; Niu, Xiaohui
2017-09-01
Clinical reports suggest that large bone defects can be effectively restored by allograft bone transplantation, in which allograft bone selection plays an important role. Moreover, there is strong demand for automatic allograft bone selection methods, as they could greatly improve the management efficiency of large bone banks. Although several automatic methods have been presented to select the most suitable allograft bone from a massive allograft bone bank, these methods still suffer from inaccuracy. In this paper, we propose an effective allograft bone selection method that does not use the contralateral bones. First, the allograft bone is globally aligned to the recipient bone by surface registration. Then, the global alignment is refined through band registration. The band, defined as the recipient points within the lifted and lowered cutting planes, captures more of the local structure of the defected segment. Therefore, our method achieves robust alignment and high registration accuracy between the allograft and the recipient. Moreover, the existing contour method and surface method can be unified into one framework under our method by adjusting the lift and lower distances of the cutting planes. Finally, our method has been validated on a database of distal femurs. The experimental results indicate that our method outperforms both the surface method and the contour method.
Efficiently computing exact geodesic loops within finite steps.
Xin, Shi-Qing; He, Ying; Fu, Chi-Wing
2012-06-01
Closed geodesics, or geodesic loops, are crucial to the study of differential topology and differential geometry. Although the existence and properties of closed geodesics on smooth surfaces have been widely studied in the mathematics community, relatively little progress has been made on how to compute them on polygonal surfaces. Most existing algorithms simply treat the mesh as a graph, so the resultant loops are restricted to mesh edges, which are far from the actual geodesics. This paper is the first to prove the existence and uniqueness of the geodesic loop restricted to a closed face sequence; it also contributes an efficient algorithm to iteratively evolve an initial closed path on a given mesh into an exact geodesic loop within finitely many steps. Our proposed algorithm requires only O(k) space and, experimentally, O(mk) time, where m is the number of vertices in the region bounded by the initial loop and the resultant geodesic loop, and k is the average number of edges in the edge sequences that the evolving loop passes through. In contrast to existing geodesic curvature flow methods, which compute an approximate geodesic loop within a predefined threshold, our method is exact and applies directly to triangular meshes without solving any differential equation with a numerical solver; it runs at interactive speed, e.g., on the order of milliseconds for a mesh with around 50K vertices, and hence significantly outperforms existing algorithms. In fact, our algorithm runs at interactive speed even for larger meshes. Besides the complexity of the input mesh, the geometric shape can also affect the number of evolving steps, i.e., the performance. We motivate our algorithm with an interactive shape segmentation example shown later in the paper.
Reference point detection for camera-based fingerprint image based on wavelet transformation.
Khalil, Mohammed S
2015-04-30
Fingerprint recognition systems essentially require core-point detection prior to fingerprint matching. The core point is used as a reference point to align the fingerprint with a template database, and when processing a larger fingerprint database it is necessary to consider the core point during feature extraction. Numerous core-point detection methods are available and have been reported in the literature; however, these methods are generally applied to scanner-based images. Hence, this paper explores the feasibility of applying a core-point detection method to a fingerprint image obtained using a camera phone. The proposed method utilizes a discrete wavelet transform to extract the ridge information from a color image. The performance of the proposed method is evaluated in terms of accuracy and consistency; these two indicators are calculated automatically by comparing the method's output with the defined core points. The proposed method is tested on two data sets, collected from 13 different subjects in controlled and uncontrolled environments. In the controlled environment, the proposed method achieved a detection rate of 82.98%; in the uncontrolled environment, it yielded a detection rate of 78.21%. The proposed method thus yields promising results on the collected image database and outperforms the existing method.
Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments.
Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke E
2018-03-01
Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.
Poisson-Gaussian Noise Analysis and Estimation for Low-Dose X-ray Images in the NSCT Domain.
Lee, Sangyoon; Lee, Min Seok; Kang, Moon Gi
2018-03-29
The noise distribution of images obtained by X-ray sensors in low-dosage situations can be analyzed using the Poisson and Gaussian mixture model. Multiscale conversion is one of the most popular noise reduction methods used in recent years. Estimation of the noise distribution of each subband in the multiscale domain is the most important factor in performing noise reduction, with non-subsampled contourlet transform (NSCT) representing an effective method for scale and direction decomposition. In this study, we use artificially generated noise to analyze and estimate the Poisson-Gaussian noise of low-dose X-ray images in the NSCT domain. The noise distribution of the subband coefficients is analyzed using the noiseless low-band coefficients and the variance of the noisy subband coefficients. The noise-after-transform also follows a Poisson-Gaussian distribution, and the relationship between the noise parameters of the subband and the full-band image is identified. We then analyze noise of actual images to validate the theoretical analysis. Comparison of the proposed noise estimation method with an existing noise reduction method confirms that the proposed method outperforms traditional methods.
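The mixed Poisson-Gaussian model used to generate the artificial noise can be illustrated directly: a clean intensity x yields an observation Poisson(x) + N(0, sigma^2), whose variance is x + sigma^2, the signal-dependent relation that subband noise estimation relies on. This is a minimal simulation sketch under that standard model, not the authors' code:

```python
import random, math

def poisson_knuth(lam, rng):
    """Knuth's multiplication method; adequate for modest intensities."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

def poisson_gaussian(x, sigma, n, seed=0):
    """Simulate n noisy observations of a clean intensity x under
    y = Poisson(x) + N(0, sigma^2); the sample variance should
    approach x + sigma^2."""
    rng = random.Random(seed)
    return [poisson_knuth(x, rng) + rng.gauss(0.0, sigma) for _ in range(n)]
```

Fitting the observed variance against the (estimated) clean intensity in each subband is what lets the two noise parameters be separated.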
Jonnagaddala, Jitendra; Jue, Toni Rose; Chang, Nai-Wen; Dai, Hong-Jie
2016-01-01
The rapidly increasing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditional random fields (CRFs) and dictionary lookup are widely used for named entity recognition and normalization, respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and compare with other existing dictionary lookup based normalization methods. The best configuration achieved an F-measure of 0.77 for disease normalization, outperforming the best dictionary lookup based baseline method studied in this work by an F-measure of 0.13. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract. © The Author(s) 2016. Published by Oxford University Press.
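The dictionary lookup baseline can be sketched as a normalised-synonym table: recognised mentions and lexicon synonyms are put through the same light preprocessing, then matched exactly. The concept identifiers below ("C1", "C2") are hypothetical placeholders, and the normalisation steps are generic assumptions, not the specific techniques studied in the paper:

```python
import re

def normalize_mention(mention):
    """Light normalisation: lowercase, replace punctuation with spaces,
    collapse whitespace - typical preprocessing before lookup."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", mention.lower())).strip()

def build_lookup(lexicon):
    """Map every normalised synonym to its concept identifier."""
    return {normalize_mention(syn): cid
            for cid, syns in lexicon.items() for syn in syns}

def link(mention, table):
    """Return the concept id for a mention, or None if out of lexicon."""
    return table.get(normalize_mention(mention))
```

Most of the gains reported for lookup-based normalization come from exactly this layer: how aggressively mentions and synonyms are canonicalised before the exact match.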
New Method of Calculating a Multiplication by using the Generalized Bernstein-Vazirani Algorithm
NASA Astrophysics Data System (ADS)
Nagata, Koji; Nakamura, Tadao; Geurdes, Han; Batle, Josep; Abdalla, Soliman; Farouk, Ahmed
2018-06-01
We present a new method of more speedily calculating a multiplication by using the generalized Bernstein-Vazirani algorithm and many parallel quantum systems. Given the set of real values a_1, a_2, a_3, ..., a_N and a function g: R -> {0, 1}, we determine the values g(a_1), g(a_2), g(a_3), ..., g(a_N) simultaneously. The speed of determining the values is shown to outperform the classical case by a factor of N. Next, we consider the result as a number in binary representation: M_1 = (g(a_1), g(a_2), g(a_3), ..., g(a_N)). By using M parallel quantum systems, we obtain M such numbers in binary representation simultaneously. The speed of obtaining the M numbers is shown to outperform the classical case by a factor of M. Finally, we calculate the product M_1 x M_2 x ... x M_M. The speed of obtaining the product is shown to outperform the classical case by a factor of N x M.
Gehrmann, Sebastian; Dernoncourt, Franck; Li, Yeran; Carlson, Eric T; Wu, Joy T; Welt, Jonathan; Foote, John; Moseley, Edward T; Grant, David W; Tyler, Patrick D; Celi, Leo A
2018-01-01
In secondary analysis of electronic health records, a crucial task consists in correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with improvements of up to 26 percentage points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
Kuang, Li; Yu, Long; Huang, Lan; Wang, Yin; Ma, Pengju; Li, Chuanbin; Zhu, Yujia
2018-05-14
With the rapid development of cyber-physical systems (CPS), building cyber-physical systems with high quality of service (QoS) has become an urgent requirement in both academia and industry. During the building of cyber-physical systems, it has been found that a large number of functionally equivalent services exist, so recommending suitable services from the many services available in CPS becomes an urgent task. However, since it is time-consuming, and even impractical, for a single user to invoke all of the services in CPS to experience their QoS, a robust QoS prediction method is needed to predict unknown QoS values. A commonly used method in QoS prediction is collaborative filtering; however, it struggles with data sparsity and the cold-start problem, and most of the existing methods also ignore the issue of data credibility. Hence, in order to solve both of these challenging problems, in this paper we design a framework of QoS prediction for CPS services and propose a personalized QoS prediction approach based on reputation and location-aware collaborative filtering. Our approach first calculates the reputation of users by using the Dirichlet probability distribution, so as to identify untrusted users and process their unreliable data, and then it exploits the geographic neighborhood at three levels to improve the similarity calculation of users and services. Finally, the data from geographical neighbors of users and services are fused to predict the unknown QoS values. Experiments using real datasets show that our proposed approach outperforms other existing methods in terms of accuracy, efficiency, and robustness.
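A Dirichlet-based reputation score can be illustrated as the posterior mean of a Dirichlet distribution over discrete feedback levels, which for binary feedback reduces to the familiar Beta posterior mean (pos + 1) / (pos + neg + 2). This is a generic sketch of the idea, with the level-to-score mapping and uniform prior as assumptions; the paper's exact formulation may differ:

```python
def reputation(feedback, k=2, prior=1.0):
    """Posterior-mean reputation over k discrete feedback levels
    (0 = worst, k-1 = best) with a symmetric Dirichlet(prior) prior.
    Levels are mapped linearly onto [0, 1]; no feedback gives 0.5."""
    counts = [0] * k
    for f in feedback:
        counts[f] += 1
    total = sum(counts) + k * prior
    weights = [i / (k - 1) for i in range(k)]  # map level i to a score in [0, 1]
    return sum((c + prior) * w for c, w in zip(counts, weights)) / total
```

Users whose reputation falls below a threshold can then be treated as untrusted and their contributed QoS records down-weighted before the similarity computation.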
Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination.
Zhao, Qibin; Zhang, Liqing; Cichocki, Andrzej
2015-09-01
CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. Existing CP algorithms require the tensor rank to be specified manually; however, determining the tensor rank remains a challenging problem, especially for CP rank. In addition, existing approaches do not take into account uncertainty information of the latent factors or of the missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over the multiple latent factors and appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm that scales linearly with data size. Our method is a tuning-parameter-free approach that can effectively infer the underlying multilinear factors with a low-rank constraint, while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth CP rank and prevent overfitting, even when a large proportion of entries are missing. Moreover, results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance.
YamiPred: A Novel Evolutionary Method for Predicting Pre-miRNAs and Selecting Relevant Features.
Kleftogiannis, Dimitrios; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina
2015-01-01
MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameter optimization. YamiPred was tested on a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of the geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs in other organisms, including viruses.
An effective and efficient compression algorithm for ECG signals with irregular periods.
Chou, Hsiao-Hsuan; Chen, Ying-Jui; Shiau, Yu-Chien; Kuo, Te-Son
2006-06-01
This paper presents an effective and efficient preprocessing algorithm for two-dimensional (2-D) electrocardiogram (ECG) compression to better compress irregular ECG signals by exploiting their inter- and intra-beat correlations. To better reveal the correlation structure, we first convert the ECG signal into a proper 2-D representation, or image. This involves a few steps, including QRS detection and alignment, period sorting, and length equalization. The resulting 2-D ECG representation is then ready to be compressed by an appropriate image compression algorithm. We choose the state-of-the-art JPEG2000 for its high efficiency and flexibility. In this way, the proposed algorithm is shown to outperform existing methods in the literature by simultaneously achieving high compression ratio (CR), low percent root mean squared difference (PRD), low maximum error (MaxErr), and low standard deviation of errors (StdErr). In particular, because the proposed period sorting method rearranges the detected heartbeats into a smoother image that is easier to compress, this algorithm is insensitive to irregular ECG periods. Thus either irregular ECG signals or QRS false-detection cases can be better compressed. This is a significant improvement over existing 2-D ECG compression methods. Moreover, this algorithm is not tied exclusively to JPEG2000. It can also be combined with other 2-D preprocessing methods or appropriate codecs to enhance the compression performance in irregular ECG cases.
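The 2-D conversion pipeline described above (beat segmentation at QRS locations, length equalization, period sorting) can be sketched in Python. The R-peak positions, the target beat width, and the synthetic sinusoidal "ECG" below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def beats_to_image(sig, r_peaks, width=200):
    """Cut a 1-D signal at detected R peaks, resample each beat to a common
    length, and sort rows by original period so adjacent rows are similar
    (a smoother 2-D image compresses better)."""
    beats, periods = [], []
    for a, b in zip(r_peaks[:-1], r_peaks[1:]):
        beat = sig[a:b]
        periods.append(len(beat))
        # length equalization by linear resampling onto a common grid
        x_old = np.linspace(0.0, 1.0, len(beat))
        x_new = np.linspace(0.0, 1.0, width)
        beats.append(np.interp(x_new, x_old, beat))
    order = np.argsort(periods)  # period sorting
    return np.asarray(beats)[order]

t = np.arange(1000)
sig = np.sin(2 * np.pi * t / 100.0)      # synthetic quasi-periodic "ECG"
r_peaks = [0, 100, 210, 300, 410, 500]   # pretend detected QRS locations
img = beats_to_image(sig, r_peaks, width=200)
print(img.shape)  # (5, 200)
```

The resulting 2-D array would then be handed to an image codec such as JPEG2000.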
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.
Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi
2015-04-22
Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single-marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene-set selection are derived, and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.
Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels
Maulik, Ujjwal; Sarkar, Anasua
2013-01-01
Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with the corrections based on a modified symmetry-based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439
Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.
Maulik, Ujjwal; Sarkar, Anasua
2013-01-01
Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with the corrections based on a modified symmetry-based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. sarkar@labri.fr.
Correlated Topic Vector for Scene Classification.
Wei, Pengxu; Qin, Fei; Wan, Fang; Zhu, Yi; Jiao, Jianbin; Ye, Qixiang
2017-07-01
Scene images usually involve semantic correlations, particularly when considering large-scale image data sets. This paper proposes a novel generative image representation, correlated topic vector, to model such semantic correlations. Oriented from the correlated topic model, correlated topic vector intends to naturally utilize the correlations among topics, which are seldom considered in the conventional feature encoding, e.g., Fisher vector, but do exist in scene images. It is expected that the involvement of correlations can increase the discriminative capability of the learned generative model and consequently improve the recognition accuracy. Incorporated with the Fisher kernel method, correlated topic vector inherits the advantages of Fisher vector. The contributions to the topics of visual words have been further employed by incorporating the Fisher kernel framework to indicate the differences among scenes. Combined with the deep convolutional neural network (CNN) features and Gibbs sampling solution, correlated topic vector shows great potential when processing large-scale and complex scene image data sets. Experiments on two scene image data sets demonstrate that correlated topic vector significantly improves on the deep CNN features, and outperforms existing Fisher kernel-based features.
VanderKraats, Nathan D.; Hiken, Jeffrey F.; Decker, Keith F.; Edwards, John R.
2013-01-01
Methylation of the CpG-rich region (CpG island) overlapping a gene’s promoter is a generally accepted mechanism for silencing expression. While recent technological advances have enabled measurement of DNA methylation and expression changes genome-wide, only modest correlations between differential methylation at gene promoters and expression have been found. We hypothesize that stronger associations are not observed because existing analysis methods oversimplify their representation of the data and do not capture the diversity of existing methylation patterns. Recently, other patterns such as CpG island shore methylation and long partially hypomethylated domains have also been linked with gene silencing. Here, we detail a new approach for discovering differential methylation patterns associated with expression change using genome-wide high-resolution methylation data: we represent differential methylation as an interpolated curve, or signature, and then identify groups of genes with similarly shaped signatures and corresponding expression changes. Our technique uncovers a diverse set of patterns that are conserved across embryonic stem cell and cancer data sets. Overall, we find strong associations between these methylation patterns and expression. We further show that an extension of our method also outperforms other approaches by generating a longer list of genes with higher quality associations between differential methylation and expression. PMID:23748561
Deconvolution of mixing time series on a graph
Blocker, Alexander W.; Airoldi, Edoardo M.
2013-01-01
In many applications we are interested in making inference on latent time series from indirect measurements, which are often low-dimensional projections resulting from mixing or aggregation. Positron emission tomography, super-resolution, and network traffic monitoring are some examples. Inference in such settings requires solving a sequence of ill-posed inverse problems, yt = Axt, where the projection mechanism provides information on A. We consider problems in which A specifies mixing on a graph of time series that are bursty and sparse. We develop a multilevel state-space model for mixing time series and an efficient approach to inference. A simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. We apply this method to the problem of estimating point-to-point traffic flows on a network from aggregate measurements. Our solution outperforms existing methods for this problem, and our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series. PMID:25309135
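The ill-posed inversion yt = Axt can be illustrated on a toy network tomography instance. The routing matrix below is invented for illustration, and a plain ridge regularizer stands in for the paper's multilevel state-space prior:

```python
import numpy as np

# Toy network tomography: 3 link-load measurements of 4 origin-destination flows
A = np.array([[1., 1., 0., 0.],
              [0., 1., 1., 0.],
              [1., 0., 1., 1.]])
x_true = np.array([5., 0., 3., 0.])   # bursty, sparse flows
y = A @ x_true                        # aggregate link measurements

# More unknowns than measurements => ill-posed. Regularize toward small,
# nonnegative flows: a crude stand-in for a sparsity-aware multilevel prior.
lam = 1e-3
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ y)
x_hat = np.clip(x_hat, 0.0, None)
print(np.round(x_hat, 2), np.round(A @ x_hat, 2))  # fits the measurements
```

Note the ridge solution matches the measurements but spreads mass across flows; a sparsity-promoting prior, as in the paper's setting, would recover the bursty structure of x_true more faithfully.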
Binary Interval Search: a scalable algorithm for counting interval intersections.
Layer, Ryan M; Skadron, Kevin; Robins, Gabriel; Hall, Ira M; Quinlan, Aaron R
2013-01-01
The comparison of diverse genomic datasets is fundamental to understand genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. https://github.com/arq5x/bits.
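The core observation behind BITS, as described above, is that an intersection count needs no interval tree: with the start and end coordinates of a database sorted independently, the number of intervals overlapping a query equals the total minus those starting after the query ends minus those ending before it starts. A minimal sketch for closed intervals:

```python
from bisect import bisect_left, bisect_right

def count_intersections(query, starts, ends):
    """Count database intervals intersecting query = (s, e).

    `starts` and `ends` hold the interval start/end coordinates,
    each sorted independently (the core trick of BITS).
    Intervals are closed: [s, e].
    """
    s, e = query
    n = len(starts)
    starting_after = n - bisect_right(starts, e)  # starts > e: cannot overlap
    ending_before = bisect_left(ends, s)          # ends < s: cannot overlap
    return n - starting_after - ending_before

intervals = [(1, 5), (3, 8), (10, 12), (6, 9)]
starts = sorted(i[0] for i in intervals)
ends = sorted(i[1] for i in intervals)
print(count_intersections((4, 7), starts, ends))  # (1,5), (3,8), (6,9) overlap -> 3
```

Each query costs two binary searches, which is also why the counting maps well onto GPUs for Monte Carlo significance testing.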
Tenzer, S; Peters, B; Bulik, S; Schoor, O; Lemmel, C; Schatz, M M; Kloetzel, P-M; Rammensee, H-G; Schild, H; Holzhütter, H-G
2005-05-01
Epitopes presented by major histocompatibility complex (MHC) class I molecules are selected by a multi-step process. Here we present the first computational prediction of this process based on in vitro experiments characterizing proteasomal cleavage, transport by the transporter associated with antigen processing (TAP) and MHC class I binding. Our novel prediction method for proteasomal cleavages outperforms existing methods when tested on in vitro cleavage data. The analysis of our predictions for a new dataset consisting of 390 endogenously processed MHC class I ligands from cells with known proteasome composition shows that the immunological advantage of switching from constitutive to immunoproteasomes is mainly to suppress the creation of peptides in the cytosol that TAP cannot transport. Furthermore, we show that proteasomes are unlikely to generate MHC class I ligands with a C-terminal lysine residue, suggesting processing of these ligands by a different protease that may be tripeptidyl-peptidase II (TPPII).
Boyen, Peter; Van Dyck, Dries; Neven, Frank; van Ham, Roeland C H J; van Dijk, Aalt D J
2011-01-01
Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we obtain that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic SLIDER, which uses steepest ascent with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks. The SLIDER implementation and the data used in the experiments are available on http://bioinformatics.uhasselt.be.
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.
Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen
2011-04-01
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property
Storlie, Curtis B.; Bondell, Howard D.; Reich, Brian J.; Zhang, Hao Helen
2010-01-01
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting. PMID:21603586
Link prediction based on local weighted paths for complex networks
NASA Astrophysics Data System (ADS)
Yao, Yabing; Zhang, Ruisheng; Yang, Fan; Yuan, Yongna; Hu, Rongjing; Zhao, Zhili
As a significant problem in complex networks, link prediction aims to find the missing and future links between two unconnected nodes by estimating the existence likelihood of potential links. It plays an important role in understanding the evolution mechanism of networks and has broad applications in practice. In order to improve prediction performance, a variety of structural similarity-based methods that rely on different topological features have been put forward. As one topological feature, the path information between node pairs is utilized to calculate the node similarity. However, many path-dependent methods neglect the different contributions of paths for a pair of nodes. In this paper, a local weighted path (LWP) index is proposed to differentiate the contributions between paths. The LWP index considers the effect of the link degrees of intermediate links and the connectivity influence of intermediate nodes on paths to quantify the path weight in the prediction procedure. The experimental results on 12 real-world networks show that the LWP index outperforms seven other prediction baselines.
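For context, the unweighted ancestor of such indices, the local path index S = A² + εA³, can be computed directly from the adjacency matrix; the LWP index additionally weights each path by link degrees and intermediate-node connectivity, which this sketch omits:

```python
import numpy as np

def local_path_scores(A, eps=0.01):
    """Local path index: counts 2-hop paths plus damped 3-hop paths,
    S = A^2 + eps * A^3. Higher score => link more likely."""
    A2 = A @ A
    return A2 + eps * (A2 @ A)

# toy undirected graph with edges 0-1, 1-2, 2-3, 0-2
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    A[u, v] = A[v, u] = 1
S = local_path_scores(A)
# unconnected pair (0, 3): one 2-hop path 0-2-3 plus one damped 3-hop path 0-1-2-3
print(S[0, 3])  # 1.01
```

Replacing the raw adjacency entries with per-link weights derived from endpoint degrees would move this toward the LWP idea of differentiating path contributions.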
NASA Astrophysics Data System (ADS)
Wan, Minjie; Gu, Guohua; Qian, Weixian; Ren, Kan; Chen, Qian; Maldague, Xavier
2018-06-01
Infrared image enhancement plays a significant role in intelligent urban surveillance systems for smart city applications. Unlike existing methods that only exaggerate the global contrast, we propose a particle swarm optimization-based local entropy weighted histogram equalization which enhances both local details and foreground-background contrast. First of all, a novel local entropy weighted histogram depicting the distribution of detail information is calculated based on a modified hyperbolic tangent function. Then, the histogram is divided into two parts via a threshold maximizing the inter-class variance in order to improve the contrasts of foreground and background, respectively. To avoid over-enhancement and noise amplification, double plateau thresholds of the presented histogram are formulated by means of a particle swarm optimization algorithm. Lastly, each sub-image is equalized independently according to the constrained sub-local entropy weighted histogram. Comparative experiments implemented on real infrared images prove that our algorithm outperforms other state-of-the-art methods in terms of both visual and quantized evaluations.
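A minimal sketch of plateau-limited histogram equalization, with fixed upper/lower plateau thresholds standing in for the PSO-optimized ones and an ordinary (not entropy-weighted) histogram:

```python
import numpy as np

def double_plateau_equalize(img, t_up, t_low):
    """Histogram equalization with double plateau thresholds.

    Bins above t_up are clipped (limits over-enhancement of flat
    background); nonzero bins below t_low are raised (protects weak
    details from being compressed away).
    """
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    nz = hist > 0
    hist[nz] = np.clip(hist[nz], t_low, t_up)
    cdf = np.cumsum(hist)
    lut = np.round(cdf / cdf[-1] * 255).astype(np.uint8)
    return lut[img]

rng = np.random.default_rng(0)
img = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)  # low-contrast frame
out = double_plateau_equalize(img, t_up=200.0, t_low=5.0)
print(out.min(), out.max())  # contrast stretched toward the full 0-255 range
```

In the paper's method the two thresholds are chosen per image by particle swarm optimization and the equalization is applied separately to the foreground and background sub-histograms.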
Edge-directed inference for microaneurysms detection in digital fundus images
NASA Astrophysics Data System (ADS)
Huang, Ke; Yan, Michelle; Aviyente, Selin
2007-03-01
Microaneurysms (MAs) detection is a critical step in diabetic retinopathy screening, since MAs are the earliest visible warning of potential future problems. A variety of methods have been proposed for MAs detection in mass screening. The core technology for most existing methods is a directional mathematical morphological operation called the "Top-Hat" filter, which requires multiple filtering operations at each pixel. Background structure, uneven illumination, and noise often cause confusion between MAs and some non-MA structures and limit the applicability of the filter. In this paper, a novel detection framework based on edge-directed inference is proposed for MAs detection. The candidate MA regions are first delineated from the edge map of a fundus image. Features measuring shape, brightness, and contrast are extracted for each candidate MA region to better exclude false detections from true MAs. Algorithmic analysis and empirical evaluation reveal that the proposed edge-directed inference outperforms the "Top-Hat"-based algorithm in both detection accuracy and computational speed.
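The baseline "Top-Hat" filter referenced above is a standard morphological operation: subtracting the opening of the image keeps bright structures smaller than the structuring element. A non-directional sketch with a square element (the paper's baseline uses directional elements, omitted here):

```python
import numpy as np

def erode(img, k=1):
    # grayscale erosion with a (2k+1)x(2k+1) square structuring element
    p = np.pad(img, k, mode="edge")
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(2 * k + 1) for j in range(2 * k + 1)]
    return np.min(stack, axis=0)

def dilate(img, k=1):
    p = np.pad(img, k, mode="edge")
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(2 * k + 1) for j in range(2 * k + 1)]
    return np.max(stack, axis=0)

def white_tophat(img, k=1):
    # top-hat = image - morphological opening (erosion then dilation):
    # isolates bright structures smaller than the structuring element
    return img - dilate(erode(img, k), k)

img = np.zeros((9, 9))
img[4, 4] = 1.0            # a single bright, lesion-sized dot
th = white_tophat(img, k=1)
print(th[4, 4])            # the dot survives the top-hat; flat regions go to 0
```

This also makes the cost argument visible: each pixel touches the full structuring-element neighborhood, and a directional variant repeats this per orientation, which is the per-pixel overhead the edge-directed approach avoids.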
A Hybrid alldifferent-Tabu Search Algorithm for Solving Sudoku Puzzles
Crawford, Broderick; Paredes, Fernando; Norero, Enrique
2015-01-01
The Sudoku problem is a well-known logic-based puzzle of combinatorial number-placement. It consists in filling an n² × n² grid, composed of n columns, n rows, and n subgrids, each one containing distinct integers from 1 to n². Such a puzzle belongs to the NP-complete collection of problems, to which there exist diverse exact and approximate methods able to solve it. In this paper, we propose a new hybrid algorithm that smartly combines a classic tabu search procedure with the alldifferent global constraint from the constraint programming world. The alldifferent constraint is known to be efficient for domain filtering in the presence of constraints that must be pairwise different, which are exactly the kind of constraints that Sudokus own. This ability clearly alleviates the work of the tabu search, resulting in a faster and more robust approach for solving Sudokus. We illustrate interesting experimental results where our proposed algorithm outperforms the best results previously reported by hybrids and approximate methods. PMID:26078751
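The domain-filtering effect of alldifferent can be illustrated with a naive propagator that repeatedly removes values already taken by assigned cells; the full constraint-programming propagator (Régin's matching-based algorithm) prunes strictly more, but this conveys how filtering shrinks the search space the tabu search must explore:

```python
def alldifferent_filter(domains):
    """One naive pass of alldifferent domain filtering: values held by
    singleton domains are pruned from all other domains, repeated until
    a fixed point is reached."""
    changed = True
    while changed:
        changed = False
        fixed = {next(iter(d)) for d in domains if len(d) == 1}
        for d in domains:
            if len(d) > 1 and d & fixed:
                d -= fixed
                changed = True
    return domains

# a row of four cells (an n = 2 Sudoku): two cells already assigned
row = [{3}, {1, 2, 3, 4}, {1}, {1, 3, 4}]
print(alldifferent_filter(row))  # -> [{3}, {2}, {1}, {4}]
```

Here propagation alone solves the row: pruning 1 and 3 leaves cell four with only 4, and a second pass then fixes cell two to 2.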
Refining Automatically Extracted Knowledge Bases Using Crowdsourcing
Xian, Xuefeng; Cui, Zhiming
2017-01-01
Machine-constructed knowledge bases often contain noisy and inaccurate facts. There exists significant work in developing automated algorithms for knowledge base refinement. Automated approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. As human labelling is costly, an important research challenge is how we can use limited human resources to maximize the quality improvement for a knowledge base. To address this problem, we first introduce a concept of semantic constraints that can be used to detect potential errors and do inference among candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts to conduct crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods under a reasonable crowdsourcing cost. PMID:28588611
NASA Astrophysics Data System (ADS)
Biazzo, Indaco; Braunstein, Alfredo; Zecchina, Riccardo
2012-08-01
We study the behavior of an algorithm derived from the cavity method for the prize-collecting Steiner tree (PCST) problem on graphs. The algorithm is based on the zero temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmarks, networks, and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the dhea solver, a branch and cut integer linear programming based approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances both in running time and quality of the solution. Finally we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two postprocessing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.
A Study on the Secure User Profiling Structure and Procedure for Home Healthcare Systems.
Ko, Hoon; Song, MoonBae
2016-01-01
Despite various benefits such as convenience and efficiency, home healthcare systems have inherent security risks that may cause a serious leak of personal health information. This work presents a Secure User Profiling Structure that holds patient information, including health information. A patient and a hospital both keep it, and they share the updated data. While they share the data and communicate, the data can be leaked. To solve the security problems, a secure communication channel between a client and a hospital should be established using a hash function and a One-Time Password (OTP); to generate an input value to the OTP, a dual hash function is used. This work presents a dual hash function-based approach to generate the One-Time Password, ensuring a secure communication channel with the secured key. As a result, attackers are unable to decrypt the leaked information without the secured key; in addition, the proposed method outperforms the existing methods in terms of computation cost.
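A hypothetical sketch of a dual hash-function OTP: two digests are chained so the transmitted code reveals neither the shared key nor the counter. The hash choices, counter encoding, and truncation below are illustrative assumptions, not the paper's exact construction:

```python
import hashlib

def dual_hash_otp(shared_key: bytes, counter: int, digits: int = 6) -> str:
    """Illustrative dual-hash OTP: the inner digest seeds the outer one,
    so observing a leaked code exposes neither the key nor the counter."""
    inner = hashlib.sha256(shared_key + counter.to_bytes(8, "big")).digest()
    outer = hashlib.sha512(inner).digest()
    code = int.from_bytes(outer[:8], "big") % (10 ** digits)
    return str(code).zfill(digits)

key = b"patient-hospital-shared-secret"
otp1 = dual_hash_otp(key, 1)   # both sides derive the same code
otp2 = dual_hash_otp(key, 2)   # next counter -> a fresh one-time code
print(otp1, otp2)
```

Both sides can derive the same code from the shared key and a synchronized counter, so the code itself never needs to be stored or reused.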
Image-guided filtering for improving photoacoustic tomographic image reconstruction.
Awasthi, Navchetan; Kalva, Sandeep Kumar; Pramanik, Manojit; Yalavarthy, Phaneendra K
2018-06-01
Several algorithms exist to solve the photoacoustic image reconstruction problem, depending on the expected reconstructed image features. These reconstruction algorithms typically promote one feature, such as being smooth or sharp, in the output image. Combining these features using a guided filtering approach was attempted in this work, which requires an input image and a guiding image. This approach acts as a postprocessing step to improve the commonly used Tikhonov or total variation regularization methods. The result obtained from linear backprojection was used as the guiding image to improve these results. Using both numerical and experimental phantom cases, it was shown that the proposed guided filtering approach was able to improve (by as much as 11.23 dB) the signal-to-noise ratio of the reconstructed images, with the added advantage of being computationally efficient. This approach was compared with state-of-the-art basis pursuit deconvolution as well as standard denoising methods and shown to outperform them. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
A Hybrid alldifferent-Tabu Search Algorithm for Solving Sudoku Puzzles.
Soto, Ricardo; Crawford, Broderick; Galleguillos, Cristian; Paredes, Fernando; Norero, Enrique
2015-01-01
The Sudoku problem is a well-known logic-based puzzle of combinatorial number-placement. It consists in filling an n² × n² grid, composed of n columns, n rows, and n subgrids, each one containing distinct integers from 1 to n². Such a puzzle belongs to the NP-complete collection of problems, to which there exist diverse exact and approximate methods able to solve it. In this paper, we propose a new hybrid algorithm that smartly combines a classic tabu search procedure with the alldifferent global constraint from the constraint programming world. The alldifferent constraint is known to be efficient for domain filtering in the presence of constraints that must be pairwise different, which are exactly the kind of constraints that Sudokus own. This ability clearly alleviates the work of the tabu search, resulting in a faster and more robust approach for solving Sudokus. We illustrate interesting experimental results where our proposed algorithm outperforms the best results previously reported by hybrids and approximate methods.
Convolutional Neural Network-Based Shadow Detection in Images Using Visible Light Camera Sensor.
Kim, Dong Seop; Arsalan, Muhammad; Park, Kang Ryoung
2018-03-23
Recent developments in intelligence surveillance camera systems have enabled more research on the detection, tracking, and recognition of humans. Such systems typically use visible light cameras and images, in which shadows make it difficult to detect and recognize the exact human area. Near-infrared (NIR) light cameras and thermal cameras are used to mitigate this problem. However, such instruments require a separate NIR illuminator, or are prohibitively expensive. Existing research on shadow detection in images captured by visible light cameras has utilized object and shadow color features for detection. Unfortunately, various environmental factors such as illumination change and brightness of background cause detection to be a difficult task. To overcome this problem, we propose a convolutional neural network-based shadow detection method. Experimental results with a database built from various outdoor surveillance camera environments, and from the context-aware vision using image-based active recognition (CAVIAR) open database, show that our method outperforms previous works.
Convolutional Neural Network-Based Shadow Detection in Images Using Visible Light Camera Sensor
Kim, Dong Seop; Arsalan, Muhammad; Park, Kang Ryoung
2018-01-01
Recent developments in intelligence surveillance camera systems have enabled more research on the detection, tracking, and recognition of humans. Such systems typically use visible light cameras and images, in which shadows make it difficult to detect and recognize the exact human area. Near-infrared (NIR) light cameras and thermal cameras are used to mitigate this problem. However, such instruments require a separate NIR illuminator, or are prohibitively expensive. Existing research on shadow detection in images captured by visible light cameras has utilized object and shadow color features for detection. Unfortunately, various environmental factors such as illumination change and brightness of background cause detection to be a difficult task. To overcome this problem, we propose a convolutional neural network-based shadow detection method. Experimental results with a database built from various outdoor surveillance camera environments, and from the context-aware vision using image-based active recognition (CAVIAR) open database, show that our method outperforms previous works. PMID:29570690
A Robust Statistics Approach to Minimum Variance Portfolio Optimization
NASA Astrophysics Data System (ADS)
Yang, Liusha; Couillet, Romain; McKay, Matthew R.
2015-12-01
We study the design of portfolios under a minimum risk criterion. The performance of the optimized portfolio relies on the accuracy of the estimated covariance matrix of the portfolio asset returns. For large portfolios, the number of available market returns is often of similar order to the number of assets, so that the sample covariance matrix performs poorly as a covariance estimator. Additionally, financial market data often contain outliers which, if not correctly handled, may further corrupt the covariance estimation. We address these shortcomings by studying the performance of a hybrid covariance matrix estimator based on Tyler's robust M-estimator and on Ledoit-Wolf's shrinkage estimator while assuming samples with heavy-tailed distribution. Employing recent results from random matrix theory, we develop a consistent estimator of (a scaled version of) the realized portfolio risk, which is minimized by optimizing online the shrinkage intensity. Our portfolio optimization method is shown via simulations to outperform existing methods both for synthetic and real market data.
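A sketch of a hybrid Tyler-plus-shrinkage scatter estimator of the kind described, with a fixed shrinkage intensity rho instead of the paper's online-optimized one:

```python
import numpy as np

def shrinkage_tyler(X, rho, n_iter=100):
    """Shrinkage Tyler M-estimator of scatter:
    Sigma <- (1-rho) * (p/n) * sum_i x_i x_i^T / (x_i^T Sigma^{-1} x_i) + rho * I,
    renormalized to trace p each iteration. Robust to heavy tails because
    each sample is normalized by its own Mahalanobis-type length."""
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        q = np.einsum("ij,jk,ik->i", X, inv, X)   # x_i^T Sigma^{-1} x_i
        s = (X.T / q) @ X * (p / n)
        sigma = (1 - rho) * s + rho * np.eye(p)
        sigma *= p / np.trace(sigma)
    return sigma

rng = np.random.default_rng(1)
true_cov = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.standard_normal((500, 2)) @ np.linalg.cholesky(true_cov).T
X[:10] *= 20.0                       # inject heavy-tailed outliers
sigma = shrinkage_tyler(X, rho=0.1)  # shape estimate, unfazed by the outliers

ones = np.ones(2)
w = np.linalg.solve(sigma, ones)
w /= w.sum()                         # global minimum-variance portfolio weights
print(np.round(sigma, 2), np.round(w, 3))
```

Because Tyler's weights divide out each sample's scale, the rescaled outliers leave the estimated shape essentially unchanged, which is exactly what plugging a sample covariance into the minimum-variance weights would not achieve.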
A new method to improve network topological similarity search: applied to fold recognition
Lhota, John; Hauptman, Ruth; Hart, Thomas; Ng, Clara; Xie, Lei
2015-01-01
Motivation: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework—Enrichment of Network Topological Similarity (ENTS)—to improve the performance of large scale similarity searches in bioinformatics. Results: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. Availability and implementation: Source code freely available upon request. Contact: lxie@iscb.org PMID:25717198
Li, Longhai; Feng, Cindy X; Qiu, Shi
2017-06-30
An important statistical task in disease mapping problems is to identify divergent regions with unusually high or low risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is the gold standard for estimating predictive p-values that can flag such divergent regions. However, actual LOOCV is time-consuming because one needs to rerun a Markov chain Monte Carlo analysis for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called integrated importance sampling (iIS), for estimating LOOCV predictive p-values with only Markov chain samples drawn from the posterior based on the full data set. The key step in iIS is that we integrate away the latent variables associated with the test observation with respect to their conditional distribution, without reference to the actual observation. Following the general theory of importance sampling, the formula used by iIS can be proved to be equivalent to the LOOCV predictive p-value. We compare iIS with three other existing methods in the literature on two disease mapping datasets. Our empirical results show that the predictive p-values estimated with iIS are almost identical to those estimated with actual LOOCV and outperform those given by the three existing methods, namely, posterior predictive checking, ordinary importance sampling, and the ghosting method by Marshall and Spiegelhalter (2003). Copyright © 2017 John Wiley & Sons, Ltd.
A comparison of regional flood frequency analysis approaches in a simulation framework
NASA Astrophysics Data System (ADS)
Ganora, D.; Laio, F.
2016-07-01
Regional frequency analysis (RFA) is a well-established methodology to provide an estimate of the flood frequency curve at ungauged (or scarcely gauged) sites. Different RFA approaches exist, depending on the way the information is transferred to the site of interest, but it is not clear in the literature whether a specific method systematically outperforms the others. The aim of this study is to provide a framework in which such an intercomparison can be carried out, by building up a virtual environment based on synthetically generated data. The considered regional approaches include: (i) a unique regional curve for the whole region; (ii) a multiple-region model where homogeneous subregions are determined through cluster analysis; (iii) a Region-of-Influence model which defines a homogeneous subregion for each site; (iv) a spatially smooth estimation procedure where the parameters of the regional model vary continuously in space. Virtual environments are generated considering different patterns of heterogeneity, including step change and smooth variations. If the region is heterogeneous, with the parent distribution changing continuously within the region, the spatially smooth regional approach outperforms the others, with overall errors 10-50% lower than the other methods. In the case of a step change, the spatially smooth and clustering procedures perform similarly if the heterogeneity is moderate, while clustering procedures work better when the step change is severe. To extend our findings, an extensive sensitivity analysis has been performed to investigate the effect of sample length, number of virtual stations, return period of the predicted quantile, variability of the scale parameter of the parent distribution, number of predictor variables and different parent distributions.
Overall, the spatially smooth approach appears to be the most robust, as its performance is more stable across different patterns of heterogeneity, especially when short records are considered.
ERIC Educational Resources Information Center
Brandstatter, Eduard; Gigerenzer, Gerd; Hertwig, Ralph
2008-01-01
E. Brandstatter, G. Gigerenzer, and R. Hertwig (2006) showed that the priority heuristic matches or outperforms modifications of expected utility theory in predicting choice in 4 diverse problem sets. M. H. Birnbaum (2008) argued that sets exist in which the opposite is true. The authors agree--but stress that all choice strategies have regions of…
Jaccard distance based weighted sparse representation for coarse-to-fine plant species recognition.
Zhang, Shanwen; Wu, Xiaowei; You, Zhuhong
2017-01-01
Leaf-based plant species recognition plays an important role in ecological protection; however, its application to large and modern leaf databases has long been hindered by computational cost and feasibility. Recognizing such limitations, we propose a Jaccard distance based sparse representation (JDSR) method which adopts a two-stage, coarse-to-fine strategy for plant species recognition. In the first stage, we use the Jaccard distance between the test sample and each training sample to coarsely determine the candidate classes of the test sample. The second stage includes a Jaccard distance based weighted sparse representation based classification (WSRC), which aims to approximately represent the test sample in the training space, and classify it by the approximation residuals. Since the training model of our JDSR method involves far fewer but more informative representatives, this method is expected to overcome the limitation of high computational and memory costs in traditional sparse representation based classification. Comparative experimental results on a public leaf image database demonstrate that the proposed method outperforms other existing feature extraction and SRC-based plant recognition methods in terms of both accuracy and computational speed.
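The coarse stage above can be sketched with set-valued features as below. This is a toy illustration under assumed binary features: the helper `candidate_classes` and the choice of `k` are hypothetical, and the fine-stage weighted sparse representation classifier is not reproduced.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two feature sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def candidate_classes(test, training, k=2):
    """Coarse stage: keep the k classes whose nearest training sample is
    closest to the test sample in Jaccard distance."""
    best = {}
    for label, feats in training:
        d = jaccard_distance(test, feats)
        best[label] = min(d, best.get(label, 1.0))
    return sorted(best, key=best.get)[:k]
```

The fine stage would then run a sparse-representation classifier over only the training samples of the surviving candidate classes, which is where the computational saving comes from.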
Methods of Reverberation Mapping. I. Time-lag Determination by Measures of Randomness
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chelouche, Doron; Pozo-Nuñez, Francisco; Zucker, Shay, E-mail: doron@sci.haifa.ac.il, E-mail: francisco.pozon@gmail.com, E-mail: shayz@post.tau.ac.il
A class of methods for measuring time delays between astronomical time series is introduced in the context of quasar reverberation mapping, which is based on measures of randomness or complexity of the data. Several distinct statistical estimators are considered that do not rely on polynomial interpolations of the light curves nor on their stochastic modeling, and do not require binning in correlation space. Methods based on von Neumann's mean-square successive-difference estimator are found to be superior to those using other estimators. An optimized von Neumann scheme is formulated, which better handles sparsely sampled data and outperforms current implementations of discrete correlation function methods. This scheme is applied to existing reverberation data of varying quality, and consistency with previously reported time delays is found. In particular, the size–luminosity relation of the broad-line region in quasars is recovered with a scatter comparable to that obtained by other works, yet with fewer assumptions made concerning the process underlying the variability. The proposed method for time-lag determination is particularly relevant for irregularly sampled time series, and in cases where the process underlying the variability cannot be adequately modeled.
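A minimal sketch of a von Neumann-style lag estimator, under simplifying assumptions: merge the two light curves after shifting the second by a trial lag, and report the lag minimizing the mean-square successive difference of the combined series. Flux normalization, measurement errors, and the paper's optimized scheme are all omitted.

```python
import numpy as np

def von_neumann(times, values, lag):
    """Mean-square successive difference of the combined light curve after
    shifting the second series by `lag` (a sketch; flux normalization and
    measurement errors are ignored)."""
    t = np.concatenate([times[0], times[1] - lag])
    f = np.concatenate([values[0], values[1]])
    d = np.diff(f[np.argsort(t)])           # successive differences in time order
    return np.mean(d ** 2)

def best_lag(times, values, lags):
    """The delay estimate is the trial lag minimizing the randomness measure."""
    scores = [von_neumann(times, values, lag) for lag in lags]
    return lags[int(np.argmin(scores))]
```

At the true delay the merged series traces a single smooth curve, so successive differences are small; at wrong lags the interleaved points disagree and the statistic grows.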
An Effective Palmprint Recognition Approach for Visible and Multispectral Sensor Images.
Gumaei, Abdu; Sammouda, Rachid; Al-Salman, Abdul Malik; Alsanad, Ahmed
2018-05-15
Among several palmprint feature extraction methods, the HOG-based method is attractive and performs well against changes in illumination and shadowing of palmprint images. However, it still lacks the robustness to extract palmprint features at different rotation angles. To solve this problem, this paper presents a hybrid feature extraction method, named HOG-SGF, that combines the histogram of oriented gradients (HOG) with a steerable Gaussian filter (SGF) to develop an effective palmprint recognition approach. The approach starts by processing all palmprint images by David Zhang's method to segment only the regions of interest. Next, we extract palmprint features based on the hybrid HOG-SGF feature extraction method. Then, an optimized auto-encoder (AE) is utilized to reduce the dimensionality of the extracted features. Finally, a fast and robust regularized extreme learning machine (RELM) is applied for the classification task. In the evaluation phase of the proposed approach, a number of experiments were conducted on three publicly available palmprint databases, namely MS-PolyU of multispectral palmprint images and CASIA and Tongji of contactless palmprint images. Experimentally, the results reveal that the proposed approach outperforms the existing state-of-the-art approaches even when a small number of training samples are used.
Solving multi-objective optimization problems in conservation with the reference point method
Dujardin, Yann; Chadès, Iadine
2018-01-01
Managing the biodiversity extinction crisis requires wise decision-making processes able to account for the limited resources available. In most decision problems in conservation biology, several conflicting objectives have to be taken into account. Most methods used in conservation either provide suboptimal solutions or use strong assumptions about the decision-maker’s preferences. Our paper reviews some of the existing approaches to solve multi-objective decision problems and presents new multi-objective linear programming formulations of two multi-objective optimization problems in conservation, allowing the use of a reference point approach. Reference point approaches solve multi-objective optimization problems by interactively representing the preferences of the decision-maker with a point in the criteria (objectives) space, called the reference point. We modelled and solved the following two problems in conservation: a dynamic multi-species management problem under uncertainty and a spatial allocation resource management problem. Results show that the reference point method outperforms classic methods while illustrating the use of an interactive methodology for solving combinatorial problems with multiple objectives. The method is general and can be adapted to a wide range of ecological combinatorial problems. PMID:29293650
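The reference point idea can be illustrated with an augmented Chebyshev achievement scalarizing function evaluated over a toy set of enumerated objective vectors. This is only a sketch of the scalarization step: the paper embeds it in multi-objective linear programs solved interactively, which is not reproduced here, and the candidate vectors below are invented for illustration.

```python
def achievement(z, ref, weights, rho=1e-4):
    """Augmented Chebyshev achievement scalarizing function (maximization
    objectives): lower values mean the objective vector z is closer to, or
    further beyond, the decision-maker's reference point `ref`."""
    terms = [w * (r - v) for v, r, w in zip(z, ref, weights)]
    return max(terms) + rho * sum(terms)

def best_solution(candidates, ref, weights):
    """Pick the candidate objective vector preferred under the reference point."""
    return min(candidates, key=lambda z: achievement(z, ref, weights))
```

Moving the reference point and re-solving is what makes the approach interactive: the decision-maker steers the search without ever stating explicit trade-off weights.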
Wang, Yubo; Veluvolu, Kalyana C
2017-06-14
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomenon. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identify the sparsity imposed on the signal model in order to reformulate the model as a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance, and the results show that the proposed method Sparse-BMFLC achieves a high mean energy ratio (0.9976) and outperforms existing methods such as the short-time Fourier transform (STFT), continuous wavelet transform (CWT) and the BMFLC Kalman smoother. Furthermore, the proposed method provides an overall 6.22% improvement in reconstruction error.
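The BMFLC signal model is a fixed sine/cosine dictionary on a band-limited frequency grid, which can be sketched as below. For simplicity the coefficients are fitted by plain least squares here; the paper instead solves a sparsity-penalized convex program over the same basis, so this is only a sketch of the model, not of Sparse-BMFLC itself.

```python
import numpy as np

def bmflc_basis(t, f_lo, f_hi, df):
    """BMFLC design matrix: sine/cosine pairs on a fixed band-limited grid."""
    freqs = np.arange(f_lo, f_hi + df / 2, df)
    cols = [np.sin(2 * np.pi * f * t) for f in freqs]
    cols += [np.cos(2 * np.pi * f * t) for f in freqs]
    return np.column_stack(cols)

def fit_bmflc(t, y, f_lo, f_hi, df):
    """Fit the truncated Fourier model by ordinary least squares; the paper
    instead estimates sparse coefficients by convex optimization."""
    A = bmflc_basis(t, f_lo, f_hi, df)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef
```

When the signal's frequencies lie on the grid, the model reconstructs it essentially exactly; the sparse formulation additionally zeroes out the unused grid frequencies.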
Ruan, Peiying; Hayashida, Morihiro; Maruyama, Osamu; Akutsu, Tatsuya
2013-01-01
Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many prediction methods for protein complexes from protein-protein interactions have been developed such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt with only complexes with size of more than three because the methods often are based on some density of subgraphs. However, heterodimeric protein complexes that consist of two distinct proteins occupy a large part according to several comprehensive databases of known complexes. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, domain composition kernel and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. These results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes. PMID:23776458
Emerging From Water: Underwater Image Color Correction Based on Weakly Supervised Color Transfer
NASA Astrophysics Data System (ADS)
Li, Chongyi; Guo, Jichang; Guo, Chunle
2018-03-01
Underwater vision suffers from severe degradation due to selective attenuation and scattering when light propagates through water. Such degradation not only affects the quality of underwater images but also limits the performance of vision tasks. Different from existing methods which either ignore the wavelength dependency of the attenuation or assume a specific spectral profile, we tackle the color distortion problem of underwater images from a new view. In this letter, we propose a weakly supervised color transfer method to correct color distortion, which relaxes the need for paired underwater images for training and does not require knowing where the underwater images were taken. Inspired by Cycle-Consistent Adversarial Networks, we design a multi-term loss function including adversarial loss, cycle consistency loss, and SSIM (Structural Similarity Index Measure) loss, which keeps the content and structure of the corrected result the same as the input, but with the color as if the image had been taken without the water. Experiments on underwater images captured under diverse scenes show that our method produces visually pleasing results and even outperforms state-of-the-art methods. Besides, our method can improve the performance of vision tasks.
Adaptive Modeling Procedure Selection by Data Perturbation.
Zhang, Yongli; Shen, Xiaotong
2015-10-01
Many procedures have been developed to deal with the high-dimensional problem that is emerging in various business and economics areas. To evaluate and compare these procedures, modeling uncertainty caused by model selection and parameter estimation has to be assessed and integrated into a modeling process. To do this, a data perturbation method estimates the modeling uncertainty inherited in a selection process by perturbing the data. Critical to data perturbation is the size of perturbation, as the perturbed data should resemble the original dataset. To account for the modeling uncertainty, we derive the optimal size of perturbation, which adapts to the data, the model space, and other relevant factors in the context of linear regression. On this basis, we develop an adaptive data-perturbation method that, unlike its nonadaptive counterpart, performs well in different situations. This leads to a data-adaptive model selection method. Both theoretical and numerical analysis suggest that the data-adaptive model selection method adapts to distinct situations in that it yields consistent model selection and optimal prediction, without knowing which situation exists a priori. The proposed method is applied to real data from the commodity market and outperforms its competitors in terms of price forecasting accuracy.
Female Chess Players Outperform Expectations When Playing Men.
Stafford, Tom
2018-03-01
Stereotype threat has been offered as a potential explanation of differential performance between men and women in some cognitive domains. Questions remain about the reliability and generality of the phenomenon. Previous studies have found that stereotype threat is activated in female chess players when they are matched against male players. I used data from over 5.5 million games of international tournament chess and found no evidence of a stereotype-threat effect. In fact, female players outperform expectations when playing men. Further analysis showed no influence of degree of challenge, player age, or prevalence of female role models in national chess leagues on differences in performance when women play men versus when they play women. Though this analysis contradicts one specific mechanism of influence of gender stereotypes, the persistent differences between male and female players suggest that systematic factors do exist and remain to be uncovered.
An effective convolutional neural network model for Chinese sentiment analysis
NASA Astrophysics Data System (ADS)
Zhang, Yu; Chen, Mengdong; Liu, Lianzhong; Wang, Yadong
2017-06-01
Nowadays microblogging is getting more and more popular. People are increasingly accustomed to expressing their opinions on Twitter, Facebook and Sina Weibo. Sentiment analysis of microblogs has received significant attention, both in academia and in industry. So far, Chinese microblog analysis still requires substantial further work. In recent years CNNs have also been used to deal with NLP tasks, and have already achieved good results. However, these methods ignore the effective use of a large number of existing sentiment resources. For this purpose, we propose a Lexicon-based Sentiment Convolutional Neural Network (LSCNN) model focused on Weibo sentiment analysis, which combines two CNNs, trained individually on sentiment features and word embeddings, at the fully connected hidden layer. The experimental results show that our model outperforms the CNN model with word embedding features alone on the microblog sentiment analysis task.
Adaptive filtering with the self-organizing map: a performance comparison.
Barreto, Guilherme A; Souza, Luís Gustavo M
2006-01-01
In this paper we provide an in-depth evaluation of the SOM as a feasible tool for nonlinear adaptive filtering. A comprehensive survey of existing SOM-based and related architectures for learning input-output mappings is carried out and the application of these architectures to nonlinear adaptive filtering is formulated. Then, we introduce two simple procedures for building RBF-based nonlinear filters using the Vector-Quantized Temporal Associative Memory (VQTAM), a recently proposed method for learning dynamical input-output mappings using the SOM. The aforementioned SOM-based adaptive filters are compared with standard FIR/LMS and FIR/LMS-Newton linear transversal filters, as well as with powerful MLP-based filters in nonlinear channel equalization and inverse modeling tasks. The obtained results in both tasks indicate that SOM-based filters can consistently outperform powerful MLP-based ones.
Fast Object Motion Estimation Based on Dynamic Stixels.
Morales, Néstor; Morell, Antonio; Toledo, Jonay; Acosta, Leopoldo
2016-07-28
The stixel world is a simplification of the world in which obstacles are represented as vertical instances, called stixels, standing on a surface assumed to be planar. In this paper, previous approaches for stixel tracking are extended using a two-level scheme. In the first level, stixels are tracked by matching them between frames using a bipartite graph in which edges represent a matching cost function. Then, stixels are clustered into sets representing objects in the environment. These objects are matched based on the number of stixels paired inside them. Furthermore, a faster, but less accurate approach is proposed in which only the second level is used. Several configurations of our method are compared to an existing state-of-the-art approach to show how our methodology outperforms it in several areas, including an improvement in the quality of the depth reconstruction.
Roth, Philip L; Le, Huy; Oh, In-Sue; Van Iddekinge, Chad H; Bobko, Philip
2018-06-01
Meta-analysis has become a well-accepted method for synthesizing empirical research about a given phenomenon. Many meta-analyses focus on synthesizing correlations across primary studies, but some primary studies do not report correlations. Peterson and Brown (2005) suggested that researchers could use standardized regression weights (i.e., beta coefficients) to impute missing correlations. Indeed, their beta estimation procedures (BEPs) have been used in meta-analyses in a wide variety of fields. In this study, the authors evaluated the accuracy of BEPs in meta-analysis. We first examined how use of BEPs might affect results from a published meta-analysis. We then developed a series of Monte Carlo simulations that systematically compared the use of existing correlations (that were not missing) to data sets that incorporated BEPs (that impute missing correlations from corresponding beta coefficients). These simulations estimated ρ̄ (mean population correlation) and SDρ (true standard deviation) across a variety of meta-analytic conditions. Results from both the existing meta-analysis and the Monte Carlo simulations revealed that BEPs were associated with potentially large biases when estimating ρ̄ and even larger biases when estimating SDρ. Using only existing correlations often substantially outperformed use of BEPs and virtually never performed worse than BEPs. Overall, the authors urge a return to the standard practice of using only existing correlations in meta-analysis. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
NASA Astrophysics Data System (ADS)
Zhang, Tianzhen; Wang, Xiumei; Gao, Xinbo
2018-04-01
Nowadays, many datasets are represented by multiple views, which usually include shared and complementary information. Multi-view clustering methods integrate the information of multiple views to obtain better clustering results. Nonnegative matrix factorization has become an essential and popular tool in clustering methods because of its interpretability. However, existing nonnegative matrix factorization based multi-view clustering algorithms do not consider the disagreement between views and neglect the fact that different views make different contributions to the data distribution. In this paper, we propose a new multi-view clustering method, named adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization. The proposed algorithm obtains the parts-based representation of multi-view data by nonnegative matrix factorization. Then, pairwise co-regularization is used to measure the disagreement between views. There is only one parameter, which automatically learns the weight values according to the contribution of each view to the data distribution. Experimental results show that the proposed algorithm outperforms several state-of-the-art algorithms for multi-view clustering.
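The parts-based factorization underlying the method can be illustrated with basic single-view multiplicative-update NMF (Lee-Seung). This is only the building block: the per-view factorizations, the pairwise co-regularization term, and the automatic view weighting of the proposed algorithm are not reproduced.

```python
import numpy as np

def nmf(X, k, iters=500, seed=0):
    """Basic multiplicative-update NMF (Lee-Seung): X ~= W @ H with W, H >= 0.

    Sketch of the single-view parts-based factorization only; the multi-view
    coupling by pairwise co-regularization is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    eps = 1e-10                                  # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # multiplicative updates keep
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # both factors nonnegative
    return W, H
```

In the multi-view setting, one such factorization per view would be coupled by penalizing disagreement between the views' coefficient matrices.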
Geographical topic learning for social images with a deep neural network
NASA Astrophysics Data System (ADS)
Feng, Jiangfan; Xu, Xin
2017-03-01
The use of geographical tagging in social-media images is becoming part of image metadata and is of great interest to geographical information science. It is well recognized that geographical topic learning is crucial for geographical annotation. Existing methods usually exploit geographical characteristics using image preprocessing, pixel-based classification, and feature recognition. How to effectively exploit high-level semantic features and the underlying correlation among different types of content is a crucial task for geographical topic learning. Deep learning (DL) has recently demonstrated robust capabilities for image tagging and has been introduced into geoscience. It extracts high-level features computed from a whole image component, where the cluttered background may dominate spatial features in the deep representation. Therefore, a method of spatial-attentional DL for geographical topic learning is provided, which can be regarded as a special case of DL combined with various deep networks and tuning tricks. Results demonstrated that the method is discriminative for different types of geographical topic learning. In addition, it outperforms other sequential processing models in a tagging task for a geographical image dataset.
Sufficient Dimension Reduction for Longitudinally Measured Predictors
Pfeiffer, Ruth M.; Forzani, Liliana; Bura, Efstathia
2013-01-01
We propose a method to combine several predictors (markers) that are measured repeatedly over time into a composite marker score without assuming a model and only requiring a mild condition on the predictor distribution. Assuming that the first and second moments of the predictors can be decomposed into a time and a marker component via a Kronecker product structure that accommodates the longitudinal nature of the predictors, we develop first moment sufficient dimension reduction techniques to replace the original markers with linear transformations that contain sufficient information for the regression of the predictors on the outcome. These linear combinations can then be combined into a score that has better predictive performance than a score built under a general model that ignores the longitudinal structure of the data. Our methods can be applied to either continuous or categorical outcome measures. In simulations we focus on binary outcomes and show that our method outperforms existing alternatives using the AUC, the area under the receiver-operator characteristics (ROC) curve, as a summary measure of the discriminatory ability of a single continuous diagnostic marker for binary disease outcomes. PMID:22161635
Compressive sensing of high betweenness centrality nodes in networks
NASA Astrophysics Data System (ADS)
Mahyar, Hamidreza; Hasheminezhad, Rouzbeh; Ghalebi K., Elahe; Nazemian, Ali; Grosu, Radu; Movaghar, Ali; Rabiee, Hamid R.
2018-05-01
Betweenness centrality is a prominent centrality measure expressing the importance of a node within a network, in terms of the fraction of shortest paths passing through that node. Nodes with high betweenness centrality have significant impacts on the spread of influence and ideas in social networks, the user activity in mobile phone networks, the contagion process in biological networks, and the bottlenecks in communication networks. Thus, identifying the k highest betweenness centrality nodes in networks will be of great interest in many applications. In this paper, we introduce CS-HiBet, a new method to efficiently detect top-k betweenness centrality nodes in networks, using compressive sensing. CS-HiBet can perform as a distributed algorithm by using only the local information at each node. Hence, it is applicable to large real-world and unknown networks in which the global approaches are usually unrealizable. The performance of the proposed method is evaluated by extensive simulations on several synthetic and real-world networks. The experimental results demonstrate that CS-HiBet outperforms the best existing methods with notable improvements.
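For contrast with the compressive-sensing estimator, exact betweenness centrality on a small unweighted graph can be computed with Brandes' algorithm; this global computation is precisely what CS-HiBet avoids on large networks. Note that each unordered pair is counted once per direction below, which does not affect the top-k ranking.

```python
from collections import deque

def betweenness(adj):
    """Exact betweenness centrality via Brandes' algorithm on an unweighted
    graph, where `adj` maps node -> list of neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        queue = deque([s])
        while queue:                                # BFS from source s,
            v = queue.popleft(); stack.append(v)    # counting shortest paths
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                # back-propagate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def top_k(adj, k):
    bc = betweenness(adj)
    return sorted(bc, key=bc.get, reverse=True)[:k]
```

On a path graph a-b-c-d-e, the middle node c carries every shortest path between the two halves and tops the ranking, while the endpoints score zero.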
Wu, Zhenyu; Guo, Yang; Lin, Wenfang; Yu, Shuyang; Ji, Yang
2018-04-05
Predictive maintenance plays an important role in modern Cyber-Physical Systems (CPSs) and data-driven methods have been a worthwhile direction for Prognostics Health Management (PHM). However, two main challenges have significant influences on the traditional fault diagnostic models: one is that extracting hand-crafted features from multi-dimensional sensors with internal dependencies depends too much on expert knowledge; the other is that imbalance pervasively exists among faulty and normal samples. As deep learning models have proved to be good methods for automatic feature extraction, the objective of this paper is to study an optimized deep learning model for imbalanced fault diagnosis for CPSs. Thus, this paper proposes a weighted Long Recurrent Convolutional LSTM model with sampling policy (wLRCL-D) to deal with these challenges. The model consists of 2-layer CNNs, 2-layer inner LSTMs and 2-layer outer LSTMs, with an under-sampling policy and a weighted cost-sensitive loss function. Experiments are conducted on PHM 2015 challenge datasets, and the results show that wLRCL-D outperforms other baseline methods.
An Improved BLE Indoor Localization with Kalman-Based Fusion: An Experimental Study
Röbesaat, Jenny; Zhang, Peilin; Abdelaal, Mohamed; Theel, Oliver
2017-01-01
Indoor positioning has attracted great attention in recent years. A number of efforts have been exerted to achieve high positioning accuracy. However, there exists no technology that proves its efficacy in various situations. In this paper, we propose a novel positioning method based on fusing trilateration and dead reckoning. We employ Kalman filtering as a position fusion algorithm. Moreover, we adopt an Android device with Bluetooth Low Energy modules as the communication platform to avoid excessive energy consumption and to improve the stability of the received signal strength. To further improve the positioning accuracy, we take the environmental context information into account while generating the position fixes. Extensive experiments in a testbed are conducted to examine the performance of three approaches: trilateration, dead reckoning and the fusion method. Additionally, the influence of the knowledge of the environmental context is also examined. Finally, our proposed fusion method outperforms both trilateration and dead reckoning in terms of accuracy: experimental results show that the Kalman-based fusion, for our settings, achieves a positioning accuracy of less than one meter. PMID:28445421
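The trilateration component can be sketched as a linearized least-squares fix from known beacon positions and range estimates. This is a generic sketch, not the paper's implementation: the RSSI-to-distance model, the dead-reckoning track, and the Kalman fusion of the two position sources are out of scope.

```python
import numpy as np

def trilaterate(anchors, dists):
    """Linearized least-squares position fix from >= 3 known beacon positions
    (rows of `anchors`) and the corresponding range estimates `dists`."""
    anchors = np.asarray(anchors, float)
    dists = np.asarray(dists, float)
    x0, d0 = anchors[0], dists[0]
    # Subtracting the first sphere equation from the others removes the
    # quadratic term in the unknown position, leaving a linear system.
    A = 2.0 * (anchors[1:] - x0)
    b = (d0 ** 2 - dists[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(x0 ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos
```

In the fused system, each such fix would serve as the measurement update of the Kalman filter, with dead reckoning driving the prediction step.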
Village Building Identification Based on Ensemble Convolutional Neural Networks
Guo, Zhiling; Chen, Qi; Xu, Yongwei; Shibasaki, Ryosuke; Shao, Xiaowei
2017-01-01
In this study, we present the Ensemble Convolutional Neural Network (ECNN), an elaborate CNN framework formulated by ensembling state-of-the-art CNN models, to identify village buildings from open high-resolution remote sensing (HRRS) images. First, to optimize and mine the capability of CNN for village mapping and to ensure compatibility with our classification targets, a few state-of-the-art models were carefully optimized and enhanced based on a series of rigorous analyses and evaluations. Second, rather than directly implementing building identification by using these models, we exploited most of their advantages by ensembling their feature extractor parts into a stronger model called ECNN based on the multiscale feature learning method. Finally, the generated ECNN was applied to a pixel-level classification framework to implement object identification. The proposed method can serve as a viable tool for village building identification with high accuracy and efficiency. The experimental results obtained from the test area in Savannakhet province, Laos, prove that the proposed ECNN model significantly outperforms existing methods, improving overall accuracy from 96.64% to 99.26%, and kappa from 0.57 to 0.86. PMID:29084154
Spatial Copula Model for Imputing Traffic Flow Data from Remote Microwave Sensors
Ma, Xiaolei; Du, Bowen; Yu, Bin
2017-01-01
Issues of missing data have become increasingly serious with the rapid increase in usage of traffic sensors. Analyses of the Beijing ring expressway have shown that up to 50% of microwave sensors exhibit missing values. The imputation of missing traffic data urgently needs to be solved, although a precise solution cannot easily be achieved due to the significant number of missing portions. In this study, copula-based models are proposed for the spatial interpolation of traffic flow from remote traffic microwave sensors. Most existing interpolation methods rely only on covariance functions to depict spatial correlation and are unsuitable for coping with anomalies due to the Gaussian assumption. Copula theory overcomes this issue and provides a connection between the correlation function and the marginal distribution function of traffic flow. To validate copula-based models, a comparison with three kriging methods is conducted. Results indicate that copula-based models outperform kriging methods, especially on roads with irregular traffic patterns. Copula-based models demonstrate significant potential to impute missing data in large-scale transportation networks. PMID:28934164
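A minimal sketch of the Gaussian-copula idea, assuming a single fully observed neighboring sensor and empirical marginals (the paper's models are richer): flows are mapped to normal scores, their dependence is summarized by a correlation, and the missing value is imputed as the conditional quantile mapped back to flow units.

```python
import numpy as np
from statistics import NormalDist

ND = NormalDist()

def _ranks(x):
    # 1..n ranks via double argsort (ties ignored in this sketch)
    return np.argsort(np.argsort(x)) + 1

def copula_impute(neighbor, target, miss_idx):
    """Impute target[miss_idx] (numpy arrays) from a neighboring sensor."""
    n = len(neighbor)
    obs = [i for i in range(n) if i != miss_idx]
    # probability integral transform -> normal scores (the copula step)
    z_n = np.array([ND.inv_cdf(u) for u in _ranks(neighbor) / (n + 1)])
    z_t = np.array([ND.inv_cdf(u) for u in _ranks(target[obs]) / (len(obs) + 1)])
    rho = np.corrcoef(z_n[obs], z_t)[0, 1]   # dependence in normal-score space
    # conditional mean in normal-score space, mapped back through the
    # empirical quantile of the observed target flows
    u_cond = ND.cdf(rho * z_n[miss_idx])
    return float(np.quantile(target[obs], u_cond))
```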
Guo, Yang; Lin, Wenfang; Yu, Shuyang; Ji, Yang
2018-01-01
Predictive maintenance plays an important role in modern Cyber-Physical Systems (CPSs), and data-driven methods have been a worthwhile direction for Prognostics Health Management (PHM). However, two main challenges have significant influence on traditional fault diagnostic models: one is that extracting hand-crafted features from multi-dimensional sensors with internal dependencies depends too much on expert knowledge; the other is that imbalance between faulty and normal samples is pervasive. As deep learning models have proved to be good methods for automatic feature extraction, the objective of this paper is to study an optimized deep learning model for imbalanced fault diagnosis for CPSs. Thus, this paper proposes a weighted Long Recurrent Convolutional LSTM model with sampling policy (wLRCL-D) to deal with these challenges. The model consists of 2-layer CNNs, 2-layer inner LSTMs and 2-layer outer LSTMs, with an under-sampling policy and a weighted cost-sensitive loss function. Experiments are conducted on the PHM 2015 challenge datasets, and the results show that wLRCL-D outperforms other baseline methods. PMID:29621131
Deformable registration of CT and cone-beam CT with local intensity matching.
Park, Seyoun; Plishker, William; Quon, Harry; Wong, John; Shekhar, Raj; Lee, Junghoon
2017-02-07
Cone-beam CT (CBCT) is a widely used intra-operative imaging modality in image-guided radiotherapy and surgery. A short scan followed by a filtered-backprojection is typically used for CBCT reconstruction. While data on the mid-plane (plane of source-detector rotation) is complete, off-mid-planes suffer from varying degrees of information deficiency, and the computed reconstructions are approximate. This causes different reconstruction artifacts at off-mid-planes depending on slice location, and therefore impedes accurate registration between CT and CBCT. In this paper, we propose a method to accurately register CT and CBCT by iteratively matching local CT and CBCT intensities. We correct CBCT intensities by matching local intensity histograms slice by slice in conjunction with intensity-based deformable registration. The correction-registration steps are repeated in an alternating way until the result image converges. We integrate the intensity matching into three different deformable registration methods, B-spline, demons, and optical flow, that are widely used for CT-CBCT registration. All three registration methods were implemented on a graphics processing unit for efficient parallel computation. We tested the proposed methods on twenty-five head and neck cancer cases and compared the performance with state-of-the-art registration methods. Normalized cross correlation (NCC), structural similarity index (SSIM), and target registration error (TRE) were computed to evaluate the registration performance. Our method produced overall NCC of 0.96, SSIM of 0.94, and TRE of 2.26 ± 2.27 mm, outperforming existing methods by 9%, 12%, and 27%, respectively. Experimental results also show that our method performs consistently and is more accurate than existing algorithms, while also being computationally efficient.
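The per-slice intensity correction can be illustrated with simple quantile-based histogram matching, an assumed simplification of the paper's local intensity matching (the `match_histogram` helper is illustrative):

```python
import numpy as np

def match_histogram(cbct_slice, ct_slice):
    """Map CBCT slice intensities onto the CT slice's intensity
    distribution via quantile matching."""
    src = cbct_slice.ravel()
    # quantile of each CBCT voxel within its own slice
    order = np.argsort(src)
    quantiles = np.empty(src.size, dtype=float)
    quantiles[order] = np.linspace(0.0, 1.0, src.size)
    # read off the CT intensity at the same quantile
    matched = np.quantile(ct_slice.ravel(), quantiles)
    return matched.reshape(cbct_slice.shape)
```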
Bias correction for selecting the minimal-error classifier from many machine learning models.
Ding, Ying; Tang, Shaowu; Liao, Serena G; Jia, Jia; Oesterreich, Steffi; Lin, Yan; Tseng, George C
2014-11-15
Supervised machine learning is commonly applied in genomic research to construct a classifier from training data that is generalizable to predict independent testing data. When test datasets are not available, cross-validation is commonly used to estimate the error rate. Many machine learning methods are available, and it is well known that no universally best method exists. It has been common practice to apply many machine learning methods and report the method that produces the smallest cross-validation error rate. Theoretically, such a procedure produces a selection bias. Consequently, many clinical studies with moderate sample sizes (e.g. n = 30-60) risk reporting a falsely small cross-validation error rate that could not be validated later in independent cohorts. In this article, we illustrated the probabilistic framework of the problem and explored the statistical and asymptotic properties. We proposed a new bias correction method based on learning curve fitting by inverse power law (IPL) and compared it with three existing methods: nested cross-validation, weighted mean correction and the Tibshirani-Tibshirani procedure. All methods were compared on simulated datasets, five moderate-size real datasets and two large breast cancer datasets. The results showed that IPL outperforms the other methods in bias correction with smaller variance, and it has the additional advantage of extrapolating error estimates to larger sample sizes, a practical feature for deciding whether more samples should be recruited to improve the classifier and its accuracy. An R package 'MLbias' and all source files are publicly available at tsenglab.biostat.pitt.edu/software.htm. Contact: ctseng@pitt.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
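The learning-curve idea can be sketched by fitting a two-parameter inverse power law to cross-validation error rates measured at several training-set sizes and extrapolating it. The actual MLbias package fits a richer model; `fit_ipl` and its parameters are illustrative:

```python
import numpy as np

def fit_ipl(sizes, errors):
    """Fit err(n) ~ a * n**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return np.exp(intercept), -slope          # (a, b)

def extrapolate(a, b, n):
    """Predicted error rate at a (possibly larger) sample size n."""
    return a * n ** (-b)
```

Extrapolating the fitted curve to a larger n indicates how much the error rate would drop if more samples were recruited.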
Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B; Chen, Li; Wang, Yue; Clarke, Robert
2012-08-01
Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. xuan@vt.edu Supplementary data are available at Bioinformatics online.
Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B.; Chen, Li; Wang, Yue; Clarke, Robert
2012-01-01
Motivation: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive ‘noise’ in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. Results: In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Availability and implementation: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22595208
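An outlier-sum statistic of the kind used above can be computed as follows. This is a sketch in the Tibshirani-Hastie style under median/MAD standardization; the paper's exact definition may differ:

```python
import numpy as np

def outlier_sum(x):
    """Sum of standardized sample values above the upper outlier threshold."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med)) or 1.0  # guard against zero spread
    z = (x - med) / mad
    q1, q3 = np.percentile(z, [25, 75])
    thresh = q3 + 1.5 * (q3 - q1)            # upper Tukey fence
    return float(z[z > thresh].sum())
```

Large values of the statistic aggregate evidence that a subset of samples is strongly up-regulated relative to the rest.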
Good match exploration for infrared face recognition
NASA Astrophysics Data System (ADS)
Yang, Changcai; Zhou, Huabing; Sun, Sheng; Liu, Renfeng; Zhao, Ji; Ma, Jiayi
2014-11-01
Establishing good feature correspondence is a critical prerequisite and a challenging task for infrared (IR) face recognition. Recent studies revealed that the scale invariant feature transform (SIFT) descriptor outperforms other local descriptors for feature matching. However, it only uses local appearance information for matching, and hence inevitably leads to a number of false matches. To address this issue, this paper explores global structure information (GSI) among SIFT correspondences, and proposes a new method SIFT-GSI for good match exploration. This is achieved by fitting a smooth mapping function for the underlying correct matches, which involves softassign and deterministic annealing. Quantitative comparisons with state-of-the-art methods on a publicly available IR human face database demonstrate that SIFT-GSI significantly outperforms other methods for feature matching, and hence it is able to improve the reliability of IR face recognition systems.
Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun
2016-10-01
As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. The identification of m6A sites in RNAs is significantly important for both basic biomedical research and practical drug development. In this study, we designed a computational method, called TargetM6A, to rapidly and accurately target m6A sites solely from the primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a much more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that the proposed TargetM6A method outperformed the existing methods for predicting m6A sites and remarkably improved the prediction performances, with MCC = 0.526 and AUC = 0.818. We also provide a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
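The PSNP feature can be sketched as a per-position difference of nucleotide frequencies between positive and negative training sequences. This is an assumed formulation; `psnp_table` and `psnp_features` are illustrative names, not the paper's code:

```python
import numpy as np

IDX = {c: i for i, c in enumerate("ACGU")}

def psnp_table(pos_seqs, neg_seqs):
    """Per-position nucleotide frequency in positive samples minus that
    in negative samples, as a 4 x L table."""
    L = len(pos_seqs[0])
    def freq(seqs):
        f = np.zeros((4, L))
        for s in seqs:
            for j, c in enumerate(s):
                f[IDX[c], j] += 1.0
        return f / len(seqs)
    return freq(pos_seqs) - freq(neg_seqs)

def psnp_features(seq, table):
    """Encode a sequence as its per-position propensity values."""
    return [float(table[IDX[c], j]) for j, c in enumerate(seq)]
```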
The MIMIC Method with Scale Purification for Detecting Differential Item Functioning
ERIC Educational Resources Information Center
Wang, Wen-Chung; Shih, Ching-Lin; Yang, Chih-Chien
2009-01-01
This study implements a scale purification procedure onto the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling…
Wang, Yong-Cui; Wang, Yong; Yang, Zhi-Xia; Deng, Nai-Yang
2011-06-20
Enzymes are known as the largest class of proteins, and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchical structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing an enzyme into the EC hierarchy is crucial to understanding its specific molecular mechanism. In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. One is to introduce efficient sequence encoding methods for representing given proteins. The second is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of the EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature in both predictive accuracy and Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF obtains accuracy and MCC ranging from 81% to 98% and 0.82 to 0.98 when predicting the first three EC digits on a low-homology enzyme dataset. We further demonstrate that our method outperforms methods that do not take into account the hierarchical relationship among enzyme categories, as well as alternative methods that incorporate prior knowledge about inter-class relationships. Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore our new method will be a useful tool for the enzyme function prediction community.
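The conjoint triad feature can be sketched directly: amino acids are mapped to seven classes and every length-3 window of class labels is counted, giving a 7^3 = 343-dimensional vector. The grouping below follows the common dipole/volume-based scheme and is assumed, not quoted from the paper:

```python
# Amino acids grouped into 7 classes (assumed grouping).
GROUPS = {aa: g for g, aas in enumerate(
    ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]) for aa in aas}

def ctf(seq):
    """Conjoint triad feature: counts of length-3 class windows (343 dims)."""
    vec = [0] * 343
    cls = [GROUPS[a] for a in seq]
    for i in range(len(cls) - 2):
        vec[cls[i] * 49 + cls[i + 1] * 7 + cls[i + 2]] += 1
    return vec
```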
MINE: Module Identification in Networks
2011-01-01
Background Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks. Results MINE outperforms MCODE, CFinder, NEMO, SPICi, and MCL in identifying non-exclusive, high modularity clusters when applied to the C. elegans protein-protein interaction network. The algorithm generally achieves superior geometric accuracy and modularity for annotated functional categories. In comparison with the most closely related algorithm, MCODE, the top clusters identified by MINE are consistently of higher density and MINE is less likely to designate overlapping modules as a single unit. MINE offers a high level of granularity with a small number of adjustable parameters, enabling users to fine-tune cluster results for input networks with differing topological properties. Conclusions MINE was created in response to the challenge of discovering high quality modules of gene products within highly interconnected biological networks. The algorithm allows a high degree of flexibility and user-customisation of results with few adjustable parameters. MINE outperforms several popular clustering algorithms in identifying modules with high modularity and obtains good overall recall and precision of functional annotations in protein-protein interaction networks from both S. cerevisiae and C. elegans. PMID:21605434
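The modularity score on which the comparison above is based can be computed from an adjacency matrix and a hard partition (Newman's Q; MINE itself optimizes a local modularity variant during agglomeration, so this is only the evaluation measure):

```python
import numpy as np

def modularity(adj, labels):
    """Newman modularity Q of a hard partition of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    m2 = adj.sum()                            # 2m for a symmetric adjacency matrix
    k = adj.sum(axis=1)                       # node degrees
    same = np.equal.outer(labels, labels)     # same-community indicator
    return float(((adj - np.outer(k, k) / m2) * same).sum() / m2)
```

Two disconnected triangles placed in separate communities give the textbook value Q = 0.5.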
FBP and BPF reconstruction methods for circular X-ray tomography with off-center detector
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schaefer, Dirk; Grass, Michael; Haar, Peter van de
2011-05-15
Purpose: Circular scanning with an off-center planar detector is an acquisition scheme that makes it possible to save detector area while keeping a large field of view (FOV). Several filtered back-projection (FBP) algorithms have been proposed earlier. The purpose of this work is to present two newly developed back-projection filtration (BPF) variants and evaluate the image quality of these methods compared to the existing state-of-the-art FBP methods. Methods: The first new BPF algorithm applies redundancy weighting of overlapping opposite projections before differentiation in a single projection. The second one uses the Katsevich-type differentiation involving two neighboring projections followed by redundancy weighting and back-projection. An averaging scheme is presented to mitigate streak artifacts inherent to circular BPF algorithms along the Hilbert filter lines in the off-center transaxial slices of the reconstructions. The image quality is assessed visually on reconstructed slices of simulated and clinical data. Quantitative evaluation studies are performed with the Forbild head phantom by calculating root-mean-squared-deviations (RMSDs) to the voxelized phantom for different detector overlap settings and by investigating the noise resolution trade-off with a wire phantom in the full detector and off-center scenario. Results: The noise-resolution behavior of all off-center reconstruction methods corresponds to their full detector performance with the best resolution for the FDK based methods with the given imaging geometry. With respect to RMSD and visual inspection, the proposed BPF with Katsevich-type differentiation outperforms all other methods for the smallest chosen detector overlap of about 15 mm. The best FBP method is the algorithm that is also based on the Katsevich-type differentiation and subsequent redundancy weighting. For wider overlap of about 40-50 mm, these two algorithms produce similar results, outperforming the other three methods.
The clinical case with a detector overlap of about 17 mm confirms these results. Conclusions: The BPF-type reconstructions with Katsevich differentiation are largely independent of the size of the detector overlap and give the best results with respect to RMSD and visual inspection for minimal detector overlap. The increased homogeneity will improve correct assessment of lesions in the entire field of view.
Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao
2014-01-01
Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and large error margins; therefore, clustering methods are widely applied as a solution. Traditional clustering methods in positioning systems, however, can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. In addition, outages of access points can result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing the online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability with respect to the asymmetric matching problem. PMID:24451470
Learning to rank atlases for multiple-atlas segmentation.
Sanroma, Gerard; Wu, Guorong; Gao, Yaozong; Shen, Dinggang
2014-10-01
Recently, multiple-atlas segmentation (MAS) has achieved a great success in the medical imaging area. The key assumption is that multiple atlases have greater chances of correctly labeling a target image than a single atlas. However, the problem of atlas selection still remains unexplored. Traditionally, image similarity is used to select a set of atlases. Unfortunately, this heuristic criterion is not necessarily related to the final segmentation performance. To solve this seemingly simple but critical problem, we propose a learning-based atlas selection method to pick up the best atlases that would lead to a more accurate segmentation. Our main idea is to learn the relationship between the pairwise appearance of observed instances (i.e., a pair of atlas and target images) and their final labeling performance (e.g., using the Dice ratio). In this way, we select the best atlases based on their expected labeling accuracy. Our atlas selection method is general enough to be integrated with any existing MAS method. We show the advantages of our atlas selection method in an extensive experimental evaluation in the ADNI, SATA, IXI, and LONI LPBA40 datasets. As shown in the experiments, our method can boost the performance of three widely used MAS methods, outperforming other learning-based and image-similarity-based atlas selection methods.
Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.
Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W
2016-10-01
This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF), using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinically important because it increases the possibility of electrically stabilizing the heart and preventing the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and classifier parameters are optimized simultaneously using an optimization procedure based on a genetic algorithm (GA). Both the full feature set and a statistically significant feature subset are optimized by the GA. For the statistically significant feature subset, the Mann-Whitney U test is used to filter out features that do not pass the statistical test at the 20% significance level. The final stage of our predictor is a classifier based on the support vector machine (SVM). A 10-fold cross-validation is applied in performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minute HRV segments. This accuracy is comparable to that achieved by existing methods that use 30-minute HRV segments, most of which achieve accuracy of around 80%. More importantly, our method significantly outperforms those that applied segments shorter than 30 minutes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
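Two of the standard time-domain HRV features extracted in such a pipeline can be computed directly from the RR-interval series (SDNN and RMSSD; the paper's full feature set also includes frequency-domain, non-linear and bispectrum features):

```python
import numpy as np

def sdnn(rr_ms):
    """Standard deviation of RR intervals (overall variability), in ms."""
    return float(np.std(rr_ms, ddof=1))

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences, in ms."""
    diffs = np.diff(rr_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))
```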
Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia
2013-01-01
Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
Assessing Mediational Models: Testing and Interval Estimation for Indirect Effects.
Biesanz, Jeremy C; Falk, Carl F; Savalei, Victoria
2010-08-06
Theoretical models specifying indirect or mediated effects are common in the social sciences. An indirect effect exists when an independent variable's influence on the dependent variable is mediated through an intervening variable. Classic approaches to assessing such mediational hypotheses (Baron & Kenny, 1986; Sobel, 1982) have in recent years been supplemented by computationally intensive methods such as bootstrapping, distribution-of-the-product methods, and hierarchical Bayesian Markov chain Monte Carlo (MCMC) methods. These different approaches for assessing mediation are illustrated using data from Dunn, Biesanz, Human, and Finn (2007). However, little is known about how these methods perform relative to each other, particularly in more challenging situations, such as with data that are incomplete and/or nonnormal. This article presents an extensive Monte Carlo simulation evaluating a host of approaches for assessing mediation. We examine Type I error rates, power, and coverage. We study normal and nonnormal data as well as complete and incomplete data. In addition, we adapt a method, recently proposed in the statistical literature, that does not rely on confidence intervals (CIs) to test the null hypothesis of no indirect effect. The results suggest that the new inferential method, the partial posterior p value, slightly outperforms existing ones in terms of maintaining Type I error rates while maximizing power, especially with incomplete data. Among confidence interval approaches, the bias-corrected accelerated (BCa) bootstrapping approach often has inflated Type I error rates and inconsistent coverage and is not recommended. In contrast, the bootstrapped percentile confidence interval and the hierarchical Bayesian MCMC method perform best overall, maintaining Type I error rates, exhibiting reasonable power, and producing stable and accurate coverage rates.
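The percentile-bootstrap approach evaluated above can be sketched for a simple X -> M -> Y model by resampling cases and recomputing the product of OLS path coefficients (a minimal sketch; `indirect_ci` and its defaults are illustrative, not the article's code):

```python
import numpy as np

def indirect_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the indirect effect a*b."""
    rng = np.random.default_rng(seed)
    n = len(x)
    est = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)                       # resample cases
        xb, mb, yb = x[idx], m[idx], y[idx]
        a = np.polyfit(xb, mb, 1)[0]                      # path a: M ~ X
        design = np.column_stack([mb, xb, np.ones(n)])    # Y ~ M + X
        b = np.linalg.lstsq(design, yb, rcond=None)[0][0] # path b
        est[i] = a * b
    lo, hi = np.percentile(est, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

An interval that excludes zero is taken as evidence of a nonzero indirect effect.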
Hard exudates segmentation based on learned initial seeds and iterative graph cut.
Kusakunniran, Worapan; Wu, Qiang; Ritthipravat, Panrasee; Zhang, Jian
2018-05-01
(Background and Objective): The occurrence of hard exudates is one of the early signs of diabetic retinopathy, which is one of the leading causes of blindness. Many patients with diabetic retinopathy lose their vision because of late detection of the disease. Thus, this paper proposes a novel method for the automatic segmentation of hard exudates in retinal images. (Methods): The existing methods are based on either supervised or unsupervised learning techniques. In addition, learned segmentation models may often cause missed detections and/or false detections of hard exudates, due to the lack of rich characteristics, the intra-variations, and the similarity with other components in the retinal image. Thus, in this paper, supervised learning based on the multilayer perceptron (MLP) is used only to identify initial seeds with high confidence of being hard exudates. Then, the segmentation is finalized by unsupervised learning based on the iterative graph cut (GC) using clusters of initial seeds. Also, in order to reduce color intra-variations of hard exudates across different retinal images, color transfer (CT) is applied to normalize their color information in the pre-processing step. (Results): The experiments and comparisons with the other existing methods are based on two well-known datasets, e_ophtha EX and DIARETDB1. It can be seen that the proposed method outperforms the other existing methods in the literature, with pixel-level sensitivity of 0.891 for the DIARETDB1 dataset and 0.564 for the e_ophtha EX dataset. Cross-dataset validation, where the training process is performed on one dataset and the testing process is performed on another dataset, is also evaluated in this paper, in order to illustrate the robustness of the proposed method. (Conclusions): This newly proposed method integrates supervised and unsupervised learning based techniques.
It achieves improved performance when compared with the existing methods in the literature. The robustness of the proposed method in the cross-dataset scenario could enhance its practical usage: the trained model could be more practical for unseen data in real-world situations, especially when the capturing environments of the training and testing images are not the same. Copyright © 2018 Elsevier B.V. All rights reserved.
A threshold-based fixed predictor for JPEG-LS image compression
NASA Astrophysics Data System (ADS)
Deng, Lihua; Huang, Zhenghua; Yao, Shoukui
2018-03-01
In JPEG-LS, the fixed predictor based on the median edge detector (MED) detects only horizontal and vertical edges, and thus produces large prediction errors in the vicinity of diagonal edges. In this paper, we propose a threshold-based edge detection scheme for the fixed predictor. The proposed scheme can detect not only horizontal and vertical edges, but also diagonal edges. For certain thresholds, the proposed scheme reduces to other existing schemes, so it can also be regarded as an integration of those schemes. For a suitable threshold, the accuracy of horizontal and vertical edge detection is higher than that of the existing median edge detection in JPEG-LS. Thus, the proposed fixed predictor outperforms the existing JPEG-LS predictors for all images tested, while the complexity of the overall algorithm remains at a similar level.
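The baseline that this entry extends, the standard MED fixed predictor of JPEG-LS, fits in a few lines. The sketch below shows only that baseline predictor (the paper's threshold-based extension is not reproduced here); `a`, `b`, `c` follow the usual JPEG-LS neighbor naming.

```python
def med_predict(a, b, c):
    """Median edge detector (MED) fixed predictor from JPEG-LS.

    a = left neighbor, b = upper neighbor, c = upper-left neighbor.
    Predicts min(a, b) when c suggests an edge above, max(a, b) when
    it suggests an edge to the left, and the planar estimate a + b - c
    in smooth regions.
    """
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c

# Smooth region: neighbors nearly coplanar -> planar prediction.
print(med_predict(100, 102, 101))  # 101
# Strong edge: c >= max(a, b) -> predict min(a, b).
print(med_predict(50, 200, 200))   # 50
```

As the abstract notes, this detector never reacts to diagonal edges, which is exactly the failure mode the proposed threshold-based scheme targets.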
Schall, Marina; Martiny, Sarah E; Goetz, Thomas; Hall, Nathan C
2016-05-01
Although expressing positive emotions is typically socially rewarded, in the present work we predicted that people suppress positive emotions and thereby experience social benefits when outperformed others are present. We tested our predictions in three experimental studies with high school students. In Studies 1 and 2, we manipulated the type of social situation (outperformance vs. non-outperformance) and assessed suppression of positive emotions. In both studies, individuals reported suppressing positive emotions more in outperformance situations than in non-outperformance situations. In Study 3, we manipulated the social situation (outperformance vs. non-outperformance) as well as the videotaped person's expression of positive emotions (suppression vs. expression). The findings showed that when outperforming others, individuals were indeed evaluated more positively when they suppressed rather than expressed their positive emotions, demonstrating the importance of the specific social situation with respect to the effects of suppression. © 2016 by the Society for Personality and Social Psychology, Inc.
Wireless sensor networks for heritage object deformation detection and tracking algorithm.
Xie, Zhijun; Huang, Guangyan; Zarei, Roozbeh; He, Jing; Zhang, Yanchun; Ye, Hongwu
2014-10-31
Deformation is the direct cause of heritage object collapse, so it is important to monitor the deformation of heritage objects and signal early warnings. However, traditional heritage object monitoring methods only roughly monitor a simple-shaped heritage object as a whole and cannot monitor complicated heritage objects, which may have a large number of surfaces inside and outside. Wireless sensor networks, comprising many small-sized, low-cost, low-power intelligent sensor nodes, are more useful for detecting the deformation of every small part of a heritage object. Wireless sensor networks need an effective mechanism to reduce both communication costs and energy consumption in order to monitor heritage objects in real time. In this paper, we provide an effective heritage object deformation detection and tracking method using wireless sensor networks (EffeHDDT). In EffeHDDT, we discover a connected core set of sensor nodes to reduce the communication cost of transmitting and collecting the data of the sensor networks. In particular, we propose a heritage object boundary detecting and tracking mechanism. Both theoretical analysis and experimental results demonstrate that our EffeHDDT method outperforms the existing methods in terms of network traffic and the precision of deformation detection.
Empirical Likelihood-Based Estimation of the Treatment Effect in a Pretest-Posttest Study.
Huang, Chiung-Yu; Qin, Jing; Follmann, Dean A
2008-09-01
The pretest-posttest study design is commonly used in medical and social science research to assess the effect of a treatment or an intervention. Recently, interest has been rising in developing inference procedures that improve efficiency while relaxing assumptions used in the pretest-posttest data analysis, especially when the posttest measurement might be missing. In this article we propose a semiparametric estimation procedure based on empirical likelihood (EL) that incorporates the common baseline covariate information to improve efficiency. The proposed method also yields an asymptotically unbiased estimate of the response distribution. Thus functions of the response distribution, such as the median, can be estimated straightforwardly, and the EL method can provide a more appealing estimate of the treatment effect for skewed data. We show that, compared with existing methods, the proposed EL estimator has appealing theoretical properties, especially when the working model for the underlying relationship between the pretest and posttest measurements is misspecified. A series of simulation studies demonstrates that the EL-based estimator outperforms its competitors when the working model is misspecified and the data are missing at random. We illustrate the methods by analyzing data from an AIDS clinical trial (ACTG 175).
Extraction of skin lesions from non-dermoscopic images for surgical excision of melanoma.
Jafari, M Hossein; Nasr-Esfahani, Ebrahim; Karimi, Nader; Soroushmehr, S M Reza; Samavi, Shadrokh; Najarian, Kayvan
2017-06-01
Computerized prescreening of suspicious moles and lesions for malignancy is of great importance for assessing the need for, and the priority of, removal surgery. Detection can be done with images captured by standard cameras, which are preferable due to their low cost and availability. One important step in computerized evaluation is accurate detection of the lesion's region, i.e., segmentation of an image into two regions, lesion and normal skin. In this paper, a new method based on deep neural networks is proposed for accurate extraction of a lesion region. The input image is preprocessed, and then its patches are fed to a convolutional neural network. Local texture and global structure of the patches are processed in order to assign pixels to the lesion or normal class. A method for effective selection of training patches is proposed for more accurate detection of a lesion's border. Our results indicate that the proposed method could reach an accuracy of 98.7% and a sensitivity of 95.2% in segmentation of lesion regions over the dataset of clinical images. The experimental results of qualitative and quantitative evaluations demonstrate that our method can outperform other state-of-the-art algorithms existing in the literature.
A local immunization strategy for networks with overlapping community structure
NASA Astrophysics Data System (ADS)
Taghavian, Fatemeh; Salehi, Mostafa; Teimouri, Mehdi
2017-02-01
Since full-coverage treatment is not feasible due to limited resources, we need an immunization strategy to effectively distribute the available vaccines. On the other hand, the structure of the contact network among people has a significant impact on epidemics of infectious diseases (such as SARS and influenza) in a population. Therefore, network-based immunization strategies aim to reduce the spreading rate by removing the vaccinated nodes from the contact network. Such strategies try to identify the nodes that are most important for epidemic spreading over a network. In this paper, we address the effect of nodes that overlap several communities on epidemic spreading. The proposed strategy is an optimized random-walk-based selection of these nodes. The whole process is local, i.e. it requires contact network information only at the level of nodes. Thus, it is applicable to large-scale and unknown networks, in which global methods are usually infeasible. Our simulation results on different synthetic and real networks show that the proposed method outperforms the existing local methods in most cases. In particular, for networks with strong community structure, high overlapping membership of nodes, or small communities, the proposed method shows better performance.
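The local, random-walk flavor of such strategies can be illustrated with a generic sketch: score nodes by how often a simple random walk visits them, then vaccinate the top-scoring ones. This is only an illustration of the random-walk ingredient, not the paper's optimized, overlap-aware selection; the toy network and node names are invented for the example.

```python
import random
from collections import Counter

def random_walk_scores(adj, steps=10000, seed=0):
    """Score nodes by visit frequency of a simple random walk.

    adj: dict mapping node -> list of neighbors.  Frequently visited
    nodes tend to be structurally central and are natural candidates
    for vaccination.  Only local neighbor lists are consulted, so the
    procedure needs no global view of the network.
    """
    rng = random.Random(seed)
    node = next(iter(adj))
    visits = Counter()
    for _ in range(steps):
        visits[node] += 1
        node = rng.choice(adj[node])
    return visits

# Toy network: node 'c' bridges two small communities.
adj = {
    'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd', 'e'],
    'd': ['c', 'e'], 'e': ['c', 'd'],
}
scores = random_walk_scores(adj)
print(scores.most_common(1)[0][0])  # 'c' is visited most often
```

In the walk's stationary distribution, visit frequency is proportional to degree, so the bridging node surfaces first; the paper's method further biases the selection toward nodes that overlap communities.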
Collaborative Filtering Recommendation on Users' Interest Sequences.
Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong
2016-01-01
As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behavior is rarely studied in recommender systems. Due to users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in the sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS), which rank users' ratings or other online behaviors according to the timestamps at which they occurred. This method extracts the semantics hidden in the interest sequences via the length of users' longest common sub-IS (LCSIS) and the count of users' total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are taken into account. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on the MovieLens, Flixster and Ciao datasets. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.
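One plausible reading of the LCSIS ingredient is the classical longest-common-subsequence length between two users' time-ordered item sequences. The sketch below shows that standard dynamic program with a simple length-based normalization; the paper's exact sub-IS definitions and similarity formula may differ, and the item IDs are invented for the example.

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of two sequences,
    computed with the standard O(len(s) * len(t)) dynamic program."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Two users' item IDs, ordered by rating timestamp.
u = ['i1', 'i3', 'i4', 'i7', 'i9']
v = ['i3', 'i1', 'i4', 'i9', 'i8']
sim = lcs_length(u, v) / max(len(u), len(v))  # normalized similarity
print(sim)  # 0.6
```

Here the shared ordered pattern i1 -> i4 -> i9 (or i3 -> i4 -> i9) has length 3, so the users are moderately similar in the sequential dimension even though their rating values are never consulted.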
Shi, Xingjie; Zhao, Qing; Huang, Jian; Xie, Yang; Ma, Shuangge
2015-01-01
Motivation: Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The regulation analysis is challenging with one gene expression possibly regulated by multiple CNAs and one CNA potentially regulating the expressions of multiple genes. The correlations among GEs and among CNAs make the analysis even more complicated. The existing methods have limitations and cannot comprehensively describe the regulation. Results: A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures. Simulation shows that the proposed method outperforms the competing alternatives with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate advantages of the proposed method. Availability and implementation: R code is available at http://works.bepress.com/shuangge/49/ Contact: shuangge.ma@yale.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342102
Xiao, Zhu; Havyarimana, Vincent; Li, Tong; Wang, Dong
2016-01-01
In this paper, a novel nonlinear smoothing framework, the non-Gaussian delayed particle smoother (nGDPS), is proposed, which enables vehicle state estimation (VSE) with high accuracy while taking into account the non-Gaussianity of the measurement and process noises. Within the proposed method, the multivariate Student's t-distribution is adopted in order to compute the probability density function (PDF) related to the process and measurement noises, which are assumed to be non-Gaussian distributed. A computation approach based on the Ensemble Kalman Filter (EnKF) is designed to cope with the mean and the covariance matrix of the proposal non-Gaussian distribution. A delayed Gibbs sampling algorithm, which incorporates smoothing of the sampled trajectories over a fixed delay, is proposed to deal with the sample degeneracy of particles. The performance is investigated based on real-world data collected by low-cost on-board vehicle sensors. The comparison study based on the real-world experiments and the statistical analysis demonstrates that the proposed nGDPS significantly improves vehicle state accuracy and outperforms the existing filtering and smoothing methods. PMID:27187405
Still-to-video face recognition in unconstrained environments
NASA Astrophysics Data System (ADS)
Wang, Haoyu; Liu, Changsong; Ding, Xiaoqing
2015-02-01
Face images from video sequences captured in unconstrained environments usually contain several kinds of variations, e.g. pose, facial expression, illumination, image resolution and occlusion. Motion blur and compression artifacts also deteriorate recognition performance. Besides, in various practical systems such as law enforcement, video surveillance and e-passport identification, only a single still image per person is enrolled as the gallery set. Many existing methods may fail to work due to variations in face appearance and the limited number of available gallery samples. In this paper, we propose a novel approach for still-to-video face recognition in unconstrained environments. By assuming that faces from still images and video frames share the same identity space, a regularized least squares regression method is utilized to tackle the multi-modality problem. Regularization terms based on heuristic assumptions are introduced to avoid overfitting. In order to deal with the single-image-per-person problem, we exploit face variations learned from training sets to synthesize virtual samples for the gallery samples. We adopt a learning algorithm combining both an affine/convex hull-based approach and regularizations to match image sets. Experimental results on a real-world dataset consisting of unconstrained video sequences demonstrate that our method clearly outperforms the state-of-the-art methods.
NASA Astrophysics Data System (ADS)
Mercier, Sylvain; Gratton, Serge; Tardieu, Nicolas; Vasseur, Xavier
2017-12-01
Many applications in structural mechanics require the numerical solution of sequences of linear systems typically issued from a finite element discretization of the governing equations on fine meshes. The method of Lagrange multipliers is often used to take mechanical constraints into account. The resulting matrices then exhibit a saddle point structure, and the iterative solution of such preconditioned linear systems is considered challenging. A popular strategy is to combine preconditioning and deflation to yield an efficient method. We propose an alternative that is applicable to the general case and not only to matrices with a saddle point structure. In this approach, we consider updating an existing algebraic or application-based preconditioner, using specific available information exploiting the knowledge of an approximate invariant subspace or of matrix-vector products. The resulting preconditioner has the form of a limited-memory quasi-Newton matrix and requires a small number of linearly independent vectors. Numerical experiments performed on three large-scale applications in elasticity highlight the relevance of the new approach. We show that the proposed method outperforms the deflation method when considering sequences of linear systems with varying matrices.
Effective Clipart Image Vectorization through Direct Optimization of Bezigons.
Yang, Ming; Chao, Hongyang; Zhang, Chi; Guo, Jun; Yuan, Lu; Sun, Jian
2016-02-01
Bezigons, i.e., closed paths composed of Bézier curves, have been widely employed to describe shapes in image vectorization results. However, most existing vectorization techniques infer the bezigons by simply approximating an intermediate vector representation (such as polygons). Consequently, the resultant bezigons are sometimes imperfect due to accumulated errors, fitting ambiguities, and a lack of curve priors, especially for low-resolution images. In this paper, we describe a novel method for vectorizing clipart images. In contrast to previous methods, we directly optimize the bezigons rather than using other intermediate representations; therefore, the resultant bezigons are not only of higher fidelity compared with the original raster image but also more plausible, as if traced by a proficient expert. To enable such optimization, we have overcome several challenges and have devised a differentiable data energy as well as several curve-based prior terms. To improve the efficiency of the optimization, we also take advantage of the local control property of bezigons and adopt an overlapped piecewise optimization strategy. The experimental results show that our method outperforms both the current state-of-the-art method and commonly used commercial software in terms of bezigon quality.
Image Search Reranking With Hierarchical Topic Awareness.
Tian, Xinmei; Yang, Linjun; Lu, Yijuan; Tian, Qi; Tao, Dacheng
2015-10-01
With much attention from both academia and industrial communities, visual search reranking has recently been proposed to refine image search results obtained from text-based image search engines. Most traditional reranking methods cannot capture both relevance and diversity of the search results at the same time, or they ignore the hierarchical topic structure of the search results, treating each topic equally and independently. However, in real applications, images returned for certain queries are naturally in a hierarchical organization, rather than a simple parallel relation. In this paper, a new reranking method, "topic-aware reranking" (TARerank), is proposed. TARerank describes the hierarchical topic structure of search results in one model, and seamlessly captures both relevance and diversity of the image search results simultaneously. Through a structured learning framework, relevance and diversity are modeled in TARerank by a set of carefully designed features, and then the model is learned from human-labeled training samples. The learned model is expected to predict reranking results with high relevance and diversity for testing queries. To verify the effectiveness of the proposed method, we collect an image search dataset and conduct comparison experiments on it. The experimental results demonstrate that the proposed TARerank outperforms the existing relevance-based and diversified reranking methods.
Chen, Lidong; Basu, Anup; Zhang, Maojun; Wang, Wei; Liu, Yu
2014-03-20
A complementary catadioptric imaging technique was proposed to solve the problem of low and nonuniform resolution in omnidirectional imaging. Building on this work, our paper focuses on how to generate a high-resolution panoramic image from the captured omnidirectional image. To avoid interference between the inner and outer images while fusing the two complementary views, a cross-selection kernel regression method is proposed. First, in view of the complementarity of sampling resolution in the tangential and radial directions between the inner and the outer images, respectively, the horizontal gradients in the expected panoramic image are estimated based on the scattered neighboring pixels mapped from the outer image, while the vertical gradients are estimated using the inner image. Then, the size and shape of the regression kernel are adaptively steered based on the local gradients. Furthermore, the neighboring pixels in the next interpolation step of kernel regression are also selected based on the comparison between the horizontal and vertical gradients. In simulation and real-image experiments, the proposed method outperforms existing kernel regression methods and our previous wavelet-based fusion method in terms of both visual quality and objective evaluation.
Sparse kernel methods for high-dimensional survival data.
Evers, Ludger; Messow, Claudia-Martina
2008-07-15
Sparse kernel methods like support vector machines (SVMs) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model depends on the covariates only through inner products, it can be 'kernelized'. The kernelized proportional hazards model, however, yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where, akin to support vector classification, the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another, akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
Deviation-based spam-filtering method via stochastic approach
NASA Astrophysics Data System (ADS)
Lee, Daekyung; Lee, Mi Jin; Kim, Beom Jun
2018-03-01
In the presence of a huge number of possible purchase choices, ranks or ratings of items by others often play a very important role in a buyer's final purchase decision. Perfectly objective rating is impossible to achieve, and we often use an average rating built from how previous buyers estimated the quality of the product. The problem with using a simple average rating is that it can easily be polluted by careless users whose evaluations of products cannot be trusted, and by malicious spammers who try to bias the rating result on purpose. In this letter we suggest how the trustworthiness of individual users can be systematically and quantitatively reflected to build a more reliable rating system. We compute a suitably defined reliability for each user based on the user's rating pattern over all products she evaluated. We call our proposed method the deviation-based ranking, since the statistical significance of each user's rating pattern with respect to the average rating pattern is the key ingredient. We find that our deviation-based ranking method outperforms existing methods in filtering out careless random evaluators as well as malicious spammers.
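The core idea, down-weighting users whose ratings deviate strongly from the average pattern, can be sketched in a few lines. This is a hedged illustration of the deviation-based weighting principle, not the letter's exact statistical-significance test; the rating data and the 1/(1 + deviation) weight are invented for the example.

```python
import statistics

def reliability_weighted_ratings(ratings):
    """ratings: dict user -> {item: score}.

    A user's reliability decays with the mean absolute deviation of
    their ratings from the plain per-item averages; item scores are
    then recomputed as reliability-weighted means, so contrarian
    spammers pull the result far less than consistent raters.
    """
    items = {i for r in ratings.values() for i in r}
    avg = {i: statistics.mean(r[i] for r in ratings.values() if i in r)
           for i in items}
    weight = {}
    for u, r in ratings.items():
        dev = statistics.mean(abs(s - avg[i]) for i, s in r.items())
        weight[u] = 1.0 / (1.0 + dev)
    return {i: sum(weight[u] * r[i] for u, r in ratings.items() if i in r)
               / sum(weight[u] for u, r in ratings.items() if i in r)
            for i in items}

ratings = {
    'honest1': {'A': 4, 'B': 2},
    'honest2': {'A': 5, 'B': 1},
    'spammer': {'A': 1, 'B': 5},   # rates against the consensus
}
scores = reliability_weighted_ratings(ratings)
print(scores['A'] > 10 / 3)  # True: the spammer's pull on A is damped
```

Compared with the plain average (10/3 for item A), the weighted score moves toward the consensus of the two mutually consistent raters.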
Sun, Jimeng; Hu, Jianying; Luo, Dijun; Markatou, Marianthi; Wang, Fei; Edabollahi, Shahram; Steinhubl, Steven E.; Daar, Zahra; Stewart, Walter F.
2012-01-01
Background: The ability to identify the risk factors related to an adverse condition, e.g., heart failure (HF) diagnosis, is very important for improving care quality and reducing cost. Existing approaches for risk factor identification are either knowledge driven (from guidelines or the literature) or data driven (from observational data). No existing method provides a model to effectively combine expert knowledge with data-driven insight for risk factor identification. Methods: We present a systematic approach to enhance known knowledge-based risk factors with additional potential risk factors derived from data. The core of our approach is a sparse regression model with regularization terms that correspond to both knowledge- and data-driven risk factors. Results: The approach is validated using a large dataset containing 4,644 heart failure cases and 45,981 controls. The outpatient electronic health records (EHRs) for these patients include diagnoses, medications, and lab results from 2003 to 2010. We demonstrate that the proposed method can identify complementary risk factors that are not among the existing known factors and can better predict the onset of HF. We quantitatively compare different sets of risk factors in the context of predicting onset of HF using the performance metric of Area Under the ROC Curve (AUC). The combined knowledge and data risk factors significantly outperform knowledge-based risk factors alone. Furthermore, the additional risk factors are confirmed to be clinically meaningful by a cardiologist. Conclusion: We present a systematic framework for combining knowledge- and data-driven insights for risk factor identification. We demonstrate the power of this framework in the context of predicting onset of HF, where our approach can successfully identify intuitive and predictive risk factors beyond a set of known HF risk factors. PMID:23304365
G3//BMK and Its Application to Calculation of Bond Dissociation Enthalpies.
Zheng, Wen-Rui; Fu, Yao; Guo, Qing-Xiang
2008-08-01
On the basis of systematic examinations it was found that the BMK functional significantly outperformed the other popular density functional theory methods including B3LYP, B3P86, KMLYP, MPW1P86, O3LYP, and X3LYP for the calculation of bond dissociation enthalpies (BDEs). However, it was also found that even the BMK functional might dramatically fail in predicting the BDEs of some chemical bonds. To solve this problem, a new composite ab initio method named G3//BMK was developed by combining the strengths of both the G3 theory and BMK. G3//BMK was found to outperform the G3 and G3//B3LYP methods. It could accurately predict the BDEs of diverse types of chemical bonds in various organic molecules within a precision of ca. 1.2 kcal/mol.
Binding SNOMED CT terms to archetype elements. Establishing a baseline of results.
Berges, I; Bermudez, J; Illarramendi, A
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". The proliferation of archetypes as a means to represent information in Electronic Health Records has raised the need to bind terminological codes - such as SNOMED CT codes - to their elements, in order to identify them univocally. However, the large size of the terminologies makes it difficult to perform this task manually. Our objective is to establish a baseline of results for the aforementioned problem by using off-the-shelf string-comparison-based techniques, against which results from more complex techniques can be evaluated. Nine typed comparison methods were evaluated for binding using a set of 487 archetype elements. Their recall was calculated, and Friedman and Nemenyi tests were applied in order to assess whether any of the methods outperformed the others. Using the qGrams method along with the 'Text' information piece of archetype elements outperforms the other methods if a level of confidence of 90% is considered. A recall of 25.26% is obtained if just one SNOMED CT term is retrieved for each archetype element. This recall rises to 50.51% and 75.56% if 10 and 100 elements are retrieved respectively, which represents a reduction of more than 99.99% on the SNOMED CT code set. The baseline has been established following the above-mentioned results. Moreover, it has been observed that although string-comparison-based methods do not outperform more sophisticated techniques, they can still be an alternative for providing a reduced set of candidate terms for each archetype element, from which the ultimate term can be chosen later in the more-than-likely manual supervision task.
Distributed Adaptive Binary Quantization for Fast Nearest Neighbor Search.
Xianglong Liu; Zhujin Li; Cheng Deng; Dacheng Tao
2017-11-01
Hashing has proven to be an attractive technique for fast nearest neighbor search over big data. Compared with projection-based hashing methods, prototype-based ones have stronger power to generate discriminative binary codes for data with complex intrinsic structure. However, existing prototype-based methods, such as spherical hashing and K-means hashing, still suffer from ineffective coding that utilizes the complete set of binary codes in a hypercube. To address this problem, we propose an adaptive binary quantization (ABQ) method that learns a discriminative hash function with prototypes associated with small unique binary codes. Our alternating optimization adaptively discovers the prototype set and a code set of varying size in an efficient way, which together robustly approximate the data relations. Our method can be naturally generalized to the product space for long hash codes, and enjoys training time linear in the number of training samples. We further devise a distributed framework for large-scale learning, which can significantly speed up the training of ABQ in the distributed environments that are now widely deployed. Extensive experiments on four large-scale (up to 80 million) data sets demonstrate that our method significantly outperforms state-of-the-art hashing methods, with relative performance gains of up to 58.84%.
Advanced Steel Microstructural Classification by Deep Learning Methods.
Azimi, Seyed Majid; Britz, Dominik; Engstler, Michael; Fritz, Mario; Mücklich, Frank
2018-02-01
The inner structure of a material is called its microstructure. It stores the genesis of a material and determines all its physical and chemical properties. While microstructural characterization is widespread and well known, microstructural classification is mostly done manually by human experts, which gives rise to uncertainties due to subjectivity. Since a microstructure can be a combination of different phases or constituents with complex substructures, its automatic classification is very challenging and only a few prior studies exist. Prior works focused on features designed and engineered by experts, and classified microstructures separately from the feature extraction step. Recently, Deep Learning methods have shown strong performance in vision applications by learning the features from data together with the classification step. In this work, we propose a Deep Learning method for microstructural classification, exemplified by certain microstructural constituents of low-carbon steel. This novel method employs pixel-wise segmentation via a Fully Convolutional Neural Network (FCNN) accompanied by a max-voting scheme. Our system achieves 93.94% classification accuracy, drastically outperforming the state-of-the-art method's 48.89% accuracy. Beyond the strong performance of our method, this line of research offers a more robust and, above all, objective approach to the difficult task of steel quality assessment.
Cohen, Trevor; Schvaneveldt, Roger; Widdows, Dominic
2010-04-01
The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in particular is dependent on a computationally demanding method of dimension reduction as a means to obtain meaningful indirect inference, limiting its ability to scale to large text corpora. In this paper, we evaluate the ability of Random Indexing (RI), a scalable distributional model of word associations, to draw meaningful implicit relationships between terms in general and biomedical language. Proponents of this method have achieved comparable performance to LSA on several cognitive tasks while using a simpler and less computationally demanding method of dimension reduction than LSA employs. In this paper, we demonstrate that the original implementation of RI is ineffective at inferring meaningful indirect connections, and evaluate Reflective Random Indexing (RRI), an iterative variant of the method that is better able to perform indirect inference. RRI is shown to lead to more clearly related indirect connections and to outperform existing RI implementations in the prediction of future direct co-occurrence in the MEDLINE corpus.
Predicting protein contact map using evolutionary and physical constraints by integer programming.
Wang, Zhiyong; Xu, Jinbo
2013-07-01
Protein contact map describes the pairwise spatial and functional relationships of residues in a protein and contains key information for protein 3D structure prediction. Although studied extensively, it remains challenging to predict a contact map using only sequence information. Most existing methods predict the contact map matrix element-by-element, ignoring the correlation among contacts and the physical feasibility of the whole contact map. A couple of recent methods predict the contact map using mutual information, taking contact correlation into consideration and enforcing a sparsity restraint, but these methods demand a very large number of sequence homologs for the protein under consideration, and the resultant contact map may still be physically infeasible. This article presents a novel method, PhyCMAP, for contact map prediction, integrating both evolutionary and physical restraints by machine learning and integer linear programming. The evolutionary restraints are much more informative than mutual information, and the physical restraints specify more concrete relationships among contacts than the sparsity restraint. As such, our method greatly reduces the solution space of the contact map matrix and thus significantly improves prediction accuracy. Experimental results confirm that PhyCMAP outperforms currently popular methods no matter how many sequence homologs are available for the protein under consideration. PhyCMAP is available at http://raptorx.uchicago.edu.
Salient region detection by fusing bottom-up and top-down features extracted from a single image.
Tian, Huawei; Fang, Yuming; Zhao, Yao; Lin, Weisi; Ni, Rongrong; Zhu, Zhenfeng
2014-10-01
Recently, some global contrast-based salient region detection models have been proposed based only on the low-level feature of color. It is necessary to consider both color and orientation features to overcome their limitations, and thus improve the performance of salient region detection for images with low contrast in color and high contrast in orientation. In addition, the existing fusion methods for different feature maps, such as the simple averaging method and the selective method, are not sufficiently effective. To overcome these limitations of existing salient region detection models, we propose a novel salient region model based on bottom-up and top-down mechanisms: color contrast and orientation contrast are adopted to calculate the bottom-up feature maps, while the top-down cue of depth-from-focus from the same single image is used to guide the generation of the final salient regions, since depth-from-focus reflects the photographer's preference and knowledge of the task. A more general and effective fusion method is designed to combine the bottom-up feature maps. According to the degree of scattering and the eccentricities of the feature maps, the proposed fusion method can assign adaptive weights to different feature maps to reflect the confidence level of each. The depth-from-focus of the image, as a significant top-down feature for visual attention, is used to guide the salient regions during the fusion process; with its aid, the proposed fusion method can filter out the background and highlight salient regions. Experimental results show that the proposed model outperforms state-of-the-art models on three publicly available data sets.
Respiratory Artefact Removal in Forced Oscillation Measurements: A Machine Learning Approach.
Pham, Thuy T; Thamrin, Cindy; Robinson, Paul D; McEwan, Alistair L; Leong, Philip H W
2017-08-01
Respiratory artefact removal for the forced oscillation technique can be treated as an anomaly detection problem. Manual removal is currently considered the gold standard, but this approach is laborious and subjective. Most existing automated techniques use simple statistics and/or reject anomalous data points. Unfortunately, simple statistics are insensitive to numerous artefacts, leading to low reproducibility of results. Furthermore, rejecting anomalous data points causes an imbalance between the inspiratory and expiratory contributions. From a machine learning perspective, such methods are unsupervised and amount to simple feature extraction. We hypothesize that supervised techniques can be used to find improved features that are more discriminative and more highly correlated with the desired output. The features thus found are then used for anomaly detection by applying quartile thresholding, which rejects a complete breath if any of its features is out of range. The thresholds are determined by both saliency and performance metrics rather than by qualitative assumptions as in previous works. Feature ranking indicates that our new landmark features are among the highest-scoring candidates across saliency criteria, regardless of age. F1-scores, receiver operating characteristics, and the variability of the mean resistance metric show that the proposed scheme outperforms previous simple feature extraction approaches. Our subject-independent detector, 1IQR-SU, demonstrated approval rates of 80.6% for adults and 98% for children, higher than existing methods. Our new features are more relevant, and our removal is objective and comparable to the manual method. This work is a critical step toward automating quality control for the forced oscillation technique.
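The quartile-thresholding step described above — rejecting a whole breath when any of its features falls outside inter-quartile fences — can be sketched like this (a simplified sketch; the Tukey-style fences and the `k` factor are illustrative assumptions, not the paper's tuned thresholds):

```python
import numpy as np

def quartile_thresholds(values, k=1.5):
    """Tukey-style fences [Q1 - k*IQR, Q3 + k*IQR] for one feature."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def reject_breaths(breath_features, k=1.5):
    """breath_features: array of shape (n_breaths, n_features).
    Returns a boolean mask; a breath is kept only if ALL of its
    features lie within that feature's fences, so the whole breath
    (inspiration and expiration) is accepted or rejected together."""
    X = np.asarray(breath_features, dtype=float)
    keep = np.ones(len(X), dtype=bool)
    for j in range(X.shape[1]):
        lo, hi = quartile_thresholds(X[:, j], k)
        keep &= (X[:, j] >= lo) & (X[:, j] <= hi)
    return keep
```

Rejecting complete breaths, rather than individual data points, is what avoids the inspiratory/expiratory imbalance mentioned above.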
RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination
Mirzaei, Sajad; Wu, Yufeng
2017-01-01
Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, in the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes in the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in that it is more effective at extracting the information about the underlying genealogy contained in the haplotype data. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ to the inference of population demographic history from haplotypes, where it outperforms several existing methods. Availability and Implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contacts: sajad@engr.uconn.edu or ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28065901
NASA Astrophysics Data System (ADS)
Krestyannikov, E.; Tohka, J.; Ruotsalainen, U.
2008-06-01
This paper presents a novel statistical approach for joint estimation of regions-of-interest (ROIs) and the corresponding time-activity curves (TACs) from dynamic positron emission tomography (PET) brain projection data. It is based on optimizing the joint objective function that consists of a data log-likelihood term and two penalty terms reflecting the available a priori information about the human brain anatomy. The developed local optimization strategy iteratively updates both the ROI and TAC parameters and is guaranteed to monotonically increase the objective function. The quantitative evaluation of the algorithm is performed with numerically and Monte Carlo-simulated dynamic PET brain data of the 11C-Raclopride and 18F-FDG tracers. The results demonstrate that the method outperforms the existing sequential ROI quantification approaches in terms of accuracy, and can noticeably reduce the errors in TACs arising due to the finite spatial resolution and ROI delineation.
NASA Astrophysics Data System (ADS)
Liu, Hao; Li, Kangda; Wang, Bing; Tang, Hainie; Gong, Xiaohui
2017-01-01
A quantized block compressive sensing (QBCS) framework, which incorporates universal measurement, quantization/inverse quantization, an entropy coder/decoder, and iterative projected Landweber reconstruction, is summarized. Under the QBCS framework, this paper presents an improved reconstruction algorithm for aerial imagery: QBCS with entropy-aware projected Landweber (QBCS-EPL), which leverages a full-image sparse transform without the Wiener filter and an entropy-aware thresholding model for wavelet-domain image denoising. By analyzing the functional relation between the soft-thresholding factors and entropy-based bitrates for different quantization methods, the proposed model can effectively remove wavelet-domain noise via bivariate shrinkage and achieve better image reconstruction quality. For the overall performance of QBCS reconstruction, experimental results demonstrate that the proposed QBCS-EPL algorithm significantly outperforms several existing algorithms. With the experiment-driven methodology, the QBCS-EPL algorithm obtains better reconstruction quality at a relatively moderate computational cost, which makes it more desirable for aerial imagery applications.
Dong, Yitong; Qiao, Tian; Kim, Doyun; Parobek, David; Rossi, Daniel; Son, Dong Hee
2018-05-09
Cesium lead halide (CsPbX3) nanocrystals have emerged as a new family of materials that can outperform existing semiconductor nanocrystals due to their superb optical and charge-transport properties. However, the lack of a robust method for producing quantum dots with controlled size and high ensemble uniformity has been one of the major obstacles to exploring the useful properties of excitons in zero-dimensional nanostructures of CsPbX3. Here, we report a new synthesis approach that enables precise control of size based on equilibrium rather than kinetics, producing CsPbX3 quantum dots nearly free of heterogeneous broadening in their exciton luminescence. The high level of size control and ensemble uniformity achieved here will open the door to harnessing the benefits of excitons in CsPbX3 quantum dots for photonic and energy-harvesting applications.
Genome-wide assessment of differential translations with ribosome profiling data.
Xiao, Zhengtao; Zou, Qin; Liu, Yu; Yang, Xuerui
2016-04-04
The closely regulated process of mRNA translation is crucial for precise control of protein abundance and quality. Ribosome profiling, a combination of ribosome foot-printing and RNA deep sequencing, has been used in a large variety of studies to quantify genome-wide mRNA translation. Here, we developed Xtail, an analysis pipeline tailored for ribosome profiling data that comprehensively and accurately identifies differentially translated genes in pairwise comparisons. Applied to simulated and real datasets, Xtail exhibits high sensitivity with minimal false-positive rates, outperforming existing methods in the accuracy of quantifying differential translation. With published ribosome profiling datasets, Xtail not only reveals differentially translated genes that make biological sense, but also uncovers new events of differential translation in human cancer cells upon mTOR signalling perturbation and in human primary macrophages upon interferon gamma (IFN-γ) treatment. This demonstrates the value of Xtail in providing novel insights into the molecular mechanisms that involve translational dysregulation.
Automatic inference of indexing rules for MEDLINE
Névéol, Aurélie; Shooshan, Sonya E; Claveau, Vincent
2008-01-01
Background: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. Methods: In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. Results: Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. Conclusion: We expect the sets of ILP rules obtained in this experiment to be integrated into MTI. PMID:19025687
Active impulsive noise control using maximum correntropy with adaptive kernel size
NASA Astrophysics Data System (ADS)
Lu, Lu; Zhao, Haiquan
2017-03-01
Active noise control (ANC) based on the principle of superposition is an attractive method for attenuating noise signals. However, impulsive noise in ANC systems will degrade the performance of the controller. In this paper, a filtered-x recursive maximum correntropy (FxRMC) algorithm is proposed based on the maximum correntropy criterion (MCC) to reduce the effect of outliers. The proposed FxRMC algorithm does not require any a priori information about the noise characteristics and outperforms the filtered-x least mean square (FxLMS) algorithm for impulsive noise. Meanwhile, in order to adjust the kernel size of the FxRMC algorithm online, a recursive approach is proposed that takes into account past estimates of the error signals over a sliding window. Simulation and experimental results in the context of active impulsive noise control demonstrate that the proposed algorithms achieve much better performance than existing algorithms in various noise environments.
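The robustness idea behind an MCC-based update can be sketched as follows (a minimal sketch of a correntropy-weighted LMS-style step, not the authors' full FxRMC recursion; the step size `mu` and kernel width `sigma` are placeholder values):

```python
import numpy as np

def mcc_update(w, x_filt, e, mu=0.1, sigma=1.0):
    """One correntropy-weighted LMS-style step.
    w: controller weight vector; x_filt: filtered reference vector;
    e: residual error sample. The Gaussian kernel exp(-e^2 / (2*sigma^2))
    shrinks the step for outlier errors, making the update robust to
    impulsive noise, whereas a plain FxLMS step grows with |e|."""
    kernel = np.exp(-(e ** 2) / (2.0 * sigma ** 2))
    return w + mu * kernel * e * x_filt
```

For a nominal error the step behaves much like LMS, while a large impulsive error is almost entirely suppressed by the kernel; adapting `sigma` online, as the paper proposes, trades off this suppression against convergence speed.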
Cheung, Y M; Leung, W M; Xu, L
1997-01-01
We propose a prediction model that combines Rival Penalized Competitive Learning (RPCL) with the Combined Linear Predictor (CLP) method, which involves a set of local linear predictors such that a prediction is made by combining some activated predictors through a gating network (Xu et al., 1994). Furthermore, we present an improved variant named Adaptive RPCL-CLP that includes an adaptive learning mechanism as well as a data pre- and post-processing scheme. We compare them with some existing models by demonstrating their performance on two real-world financial time series - a China stock price and an exchange-rate series of US Dollar (USD) versus Deutschmark (DEM). Experiments have shown that Adaptive RPCL-CLP not only outperforms the other approaches with the smallest prediction error and training costs, but also yields considerably higher profits in the trading simulation of the foreign exchange market.
Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems.
Yu, Michael Ku; Kramer, Michael; Dutkowski, Janusz; Srivas, Rohith; Licon, Katherine; Kreisberg, Jason; Ng, Cherie T; Krogan, Nevan; Sharan, Roded; Ideker, Trey
2016-02-24
Accurately translating genotype to phenotype requires accounting for the functional impact of genetic variation at many biological scales. Here we present a strategy for genotype-phenotype reasoning based on existing knowledge of cellular subsystems. These subsystems and their hierarchical organization are defined by the Gene Ontology or a complementary ontology inferred directly from previously published datasets. Guided by the ontology's hierarchical structure, we organize genotype data into an "ontotype," that is, a hierarchy of perturbations representing the effects of genetic variation at multiple cellular scales. The ontotype is then interpreted using logical rules generated by machine learning to predict phenotype. This approach substantially outperforms previous, non-hierarchical methods for translating yeast genotype to cell growth phenotype, and it accurately predicts the growth outcomes of two new screens of 2,503 double gene knockouts impacting DNA repair or nuclear lumen. Ontotypes also generalize to larger knockout combinations, setting the stage for interpreting the complex genetics of disease.
A Generalized Mixture Framework for Multi-label Classification
Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos
2015-01-01
We develop a novel probabilistic ensemble framework for multi-label classification that is based on the mixtures-of-experts architecture. In this framework, we combine multi-label classification models in the classifier chains family that decompose the class posterior distribution P(Y1, …, Yd|X) using a product of posterior distributions over components of the output space. Our approach captures different input–output and output–output relations that tend to change across data. As a result, we can recover a rich set of dependency relations among inputs and outputs that a single multi-label classification model cannot capture due to its modeling simplifications. We develop and present algorithms for learning the mixtures-of-experts models from data and for performing multi-label predictions on unseen data instances. Experiments on multiple benchmark datasets demonstrate that our approach achieves highly competitive results and outperforms the existing state-of-the-art multi-label classification methods. PMID:26613069
A biologically inspired immunization strategy for network epidemiology.
Liu, Yang; Deng, Yong; Jusup, Marko; Wang, Zhen
2016-07-07
Well-known immunization strategies, based on degree centrality, betweenness centrality, or closeness centrality, either neglect the structural significance of a node or require global information about the network. We propose a biologically inspired immunization strategy that circumvents both of these problems by considering the number of links of a focal node and the way the neighbors are connected among themselves. The strategy thus measures the dependence of the neighbors on the focal node, identifying the ability of this node to spread the disease. Nodes with the highest ability in the network are the first to be immunized. To test the performance of our method, we conduct numerical simulations on several computer-generated and empirical networks, using the susceptible-infected-recovered (SIR) model. The results show that the proposed strategy largely outperforms the existing well-known strategies.
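One plausible way to operationalize "dependence of the neighbors on the focal node" is sketched below (an illustrative proxy, not the authors' exact measure: a neighbor contributes more to a node's score when it has few links to the rest of that node's neighborhood, so removing the node would isolate it):

```python
def dependence_score(adj, v):
    """adj: dict mapping each node to the set of its neighbors.
    Score node v by how much its neighbors depend on it: neighbor u
    contributes 1/(1 + ties) where ties counts u's links to v's
    other neighbors, i.e. u's alternatives to reaching the cluster."""
    neigh = adj[v]
    score = 0.0
    for u in neigh:
        links_within = len(adj[u] & neigh)  # u's ties inside v's neighborhood
        score += 1.0 / (1 + links_within)
    return score

def immunization_order(adj):
    """Immunize the highest-scoring (most depended-upon) nodes first."""
    return sorted(adj, key=lambda v: dependence_score(adj, v), reverse=True)
```

On a star graph this correctly ranks the hub first, since every leaf depends on it entirely; note the score uses only the focal node's neighborhood, so no global network information is needed.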
Identification of hybrid node and link communities in complex networks
He, Dongxiao; Jin, Di; Chen, Zheng; Zhang, Weixiong
2015-01-01
Identifying communities in complex networks is an effective means for analyzing complex systems, with applications in diverse areas such as social science, engineering, biology and medicine. Finding communities of nodes and finding communities of links are two popular schemes for network analysis. These schemes, however, have inherent drawbacks and are inadequate to capture complex organizational structures in real networks. We introduce a new scheme and an effective approach for identifying complex mixture structures of node and link communities, called hybrid node-link communities. A central piece of our approach is a probabilistic model that accommodates node, link and hybrid node-link communities. Our extensive experiments on various real-world networks, including a large protein-protein interaction network and a large network of semantically associated words, illustrated that the scheme for hybrid communities is superior in revealing network characteristics. Moreover, the new approach outperformed the existing methods for finding node or link communities separately. PMID:25728010
Infrared traffic image enhancement algorithm based on dark channel prior and gamma correction
NASA Astrophysics Data System (ADS)
Zheng, Lintao; Shi, Hengliang; Gu, Ming
2017-07-01
Infrared traffic images acquired by intelligent traffic surveillance equipment suffer from low contrast, weak hierarchical detail, and a blurred visual effect. Therefore, infrared traffic image enhancement is an indispensable step in nearly all infrared-imaging-based traffic engineering applications. In this paper, we propose an infrared traffic image enhancement algorithm based on the dark channel prior and gamma correction. The dark channel prior, well known as an image dehazing method, is used here for infrared image enhancement for the first time. In the proposed algorithm, the original degraded infrared traffic image is first transformed with the dark channel prior to obtain an initial enhanced result. A further adjustment based on the gamma curve is needed because the initial enhanced result has low brightness. Comprehensive validation experiments reveal that the proposed algorithm outperforms current state-of-the-art algorithms.
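A bare-bones version of the two steps can be sketched as follows (a simplified single-channel sketch under assumptions: the patch size, `omega`, the crude atmospheric-light estimate, and the gamma value are placeholder choices, and production dehazing pipelines additionally refine the transmission map, e.g. with guided filtering):

```python
import numpy as np

def dark_channel(img, patch=3):
    """Minimum over a local patch for a single-channel image."""
    h, w = img.shape
    pad = patch // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def enhance(img, patch=3, omega=0.9, gamma=0.6):
    """Dehaze-style contrast stretch from the dark channel,
    then gamma correction to lift the low brightness of the result."""
    img = img.astype(float) / 255.0
    A = img.max()                          # crude atmospheric-light estimate
    t = 1.0 - omega * dark_channel(img / A, patch)
    t = np.clip(t, 0.1, 1.0)               # lower bound avoids over-amplification
    J = (img - A) / t + A                   # scene radiance recovery
    J = np.clip(J, 0.0, 1.0)
    return np.clip(J ** gamma, 0.0, 1.0)    # gamma < 1 brightens the result
```

The final `J ** gamma` step corresponds to the gamma-curve adjustment mentioned above: the dehazing transform alone stretches contrast but leaves the image dark.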
A Hybrid Genetic Programming Algorithm for Automated Design of Dispatching Rules.
Nguyen, Su; Mei, Yi; Xue, Bing; Zhang, Mengjie
2018-06-04
Designing effective dispatching rules for production systems is a difficult and time-consuming task if done manually. In the last decade, the growth of computing power, advanced machine learning, and optimisation techniques has made the automated design of dispatching rules possible, and automatically discovered rules are competitive with or outperform existing rules developed by researchers. Genetic programming is one of the most popular approaches to discovering dispatching rules in the literature, especially for complex production systems. However, the large heuristic search space may prevent genetic programming from finding near-optimal dispatching rules. This paper develops a new hybrid genetic programming algorithm for dynamic job shop scheduling based on a new representation, a new local search heuristic, and efficient fitness evaluators. Experiments show that the new method is effective in terms of the quality of the evolved rules. Moreover, the evolved rules are also significantly smaller and contain more relevant attributes.
Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.
Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying
2018-01-01
Several factors contribute to individual variability in postoperative pain, therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only the correlation and have not subjected the statistical model to further tests in order to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for analgesic consumption prediction. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions by anesthesiologists and medical specialists and those of the computational approach for an independent test data set of 60 patients further evidenced the superiority of the computational approach in predicting analgesic consumption because it produced markedly lower root mean squared errors.
Consistently Sampled Correlation Filters with Space Anisotropic Regularization for Visual Tracking
Shi, Guokai; Xu, Tingfa; Luo, Jiqiang; Li, Yuankun
2017-01-01
Most existing correlation filter-based tracking algorithms, which use fixed patches and cyclic shifts as training and detection measures, assume that the training samples are reliable and ignore the inconsistencies between training samples and detection samples. We propose to construct and study a consistently sampled correlation filter with space anisotropic regularization (CSSAR) to solve these two problems simultaneously. Our approach constructs a spatiotemporally consistent sample strategy to alleviate the redundancies in training samples caused by the cyclical shifts, eliminate the inconsistencies between training samples and detection samples, and introduce space anisotropic regularization to constrain the correlation filter for alleviating drift caused by occlusion. Moreover, an optimization strategy based on the Gauss-Seidel method was developed for obtaining robust and efficient online learning. Both qualitative and quantitative evaluations demonstrate that our tracker outperforms state-of-the-art trackers in object tracking benchmarks (OTBs). PMID:29231876
Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan
2016-12-02
In the postgenomic era, the number of unreviewed protein sequences is remarkably larger than that of reviewed ones and grows tremendously faster. However, existing methods for protein subchloroplast localization often ignore the information from these unlabeled proteins. This paper proposes a multi-label predictor based on ensemble linear neighborhood propagation (LNP), namely LNP-Chlo, which leverages hybrid sequence-based feature information from both labeled and unlabeled proteins for predicting the localization of both single- and multi-label chloroplast proteins. Experimental results on a stringent benchmark dataset and a novel independent dataset suggest that LNP-Chlo performs at least 6% (absolute) better than state-of-the-art predictors. This paper also demonstrates that ensemble LNP significantly outperforms LNP based on individual features. For readers' convenience, the online Web server LNP-Chlo is freely available at http://bioinfo.eie.polyu.edu.hk/LNPChloServer/.
Jelínek, Jan; Škoda, Petr; Hoksza, David
2017-12-06
Protein-protein interactions (PPIs) play a key role in many biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in the Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. The structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood, and we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to the Matthews correlation coefficient, and extensive experiments on several well-established datasets confirm that it significantly surpasses existing PPI approaches.
Deep learning with domain adaptation for accelerated projection-reconstruction MR.
Han, Yoseob; Yoo, Jaejun; Kim, Hak Hee; Shin, Hee Jung; Sung, Kyunghyun; Ye, Jong Chul
2018-09-01
The radial k-space trajectory is a well-established sampling trajectory used in conjunction with magnetic resonance imaging. However, it requires a large number of radial lines for high-resolution reconstruction. Increasing the number of radial lines lengthens the acquisition time, making routine clinical use more difficult; on the other hand, reducing the number of radial lines makes streaking artifact patterns unavoidable. To solve this problem, we propose a novel deep learning approach with domain adaptation to restore high-resolution MR images from under-sampled k-space data. The proposed deep network removes the streaking artifacts from the artifact-corrupted images. To address the limited availability of data, we propose a domain adaptation scheme that employs a network pre-trained on a large number of X-ray computed tomography (CT) or synthesized radial MR datasets, which is then fine-tuned with only a few radial MR datasets. The proposed method outperforms existing compressed sensing algorithms, such as the total variation and PR-FOCUSS methods, and its calculation time is several orders of magnitude faster than those methods. Moreover, we found that pre-training using CT or MR data from a similar organ is more important than pre-training using data from the same modality for a different organ. We demonstrate the possibility of domain adaptation when only a limited amount of MR data is available. The proposed method surpasses the existing compressed sensing algorithms in terms of image quality and computation time. © 2018 International Society for Magnetic Resonance in Medicine.
Li, Ben; Sun, Zhaonan; He, Qing; Zhu, Yu; Qin, Zhaohui S.
2016-01-01
Motivation: Modern high-throughput biotechnologies such as microarrays are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment only a limited number of samples are assayed, hence the classical 'large p, small n' problem. On the other hand, the rapid propagation of these high-throughput technologies has resulted in a substantial collection of data, often carried out on the same platform and using the same protocol. It is highly desirable to utilize this existing data when performing analysis and inference on a new dataset. Results: Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework, in which the repository of historical data can be exploited to build informative priors for use in new data analyses. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them in the problem of detecting differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods, including popular state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference and presents a promising practical strategy for dealing with the 'large p, small n' problem. Availability and implementation: Our method is implemented in the R package IPBT, which is freely available from https://github.com/benliemory/IPBT. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26519502
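The Bayesian use of historical data described above can be illustrated with a minimal sketch: historical per-gene variance estimates define a conjugate prior, and the variance estimated from a new small-n experiment is shrunk toward it. The function name and the pseudo-degrees-of-freedom parametrization are illustrative assumptions, not the IPBT implementation.

```python
def shrink_variance(s2_new, n_new, s2_hist, n_hist_pseudo):
    """Posterior-mean variance under a conjugate scaled-inverse-chi^2 prior.

    s2_new       : sample variance from the new (small-n) dataset
    n_new        : degrees of freedom contributed by the new data
    s2_hist      : prior variance derived from historical data
    n_hist_pseudo: pseudo-degrees of freedom encoding prior strength
    """
    return (n_hist_pseudo * s2_hist + n_new * s2_new) / (n_hist_pseudo + n_new)

# A gene measured with only 3 new samples, but with a stable historical
# variance of 1.0 backed by 9 pseudo-observations:
print(shrink_variance(s2_new=4.0, n_new=3, s2_hist=1.0, n_hist_pseudo=9))  # 1.75
```

The noisy new-data estimate (4.0) is pulled strongly toward the historical value because the prior carries three times the weight of the new data.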
Jung, Ji-Young; Seo, Dong-Yoon; Lee, Jung-Ryun
2018-01-04
A wireless sensor network (WSN) is emerging as an innovative method for gathering information that will significantly improve the reliability and efficiency of infrastructure systems. Broadcast is a common method for disseminating information in WSNs, and a variety of counter-based broadcast schemes have been proposed to mitigate broadcast-storm problems using a count threshold value and a random access delay. However, because the propagation of the broadcast message is limited, there is a trade-off: redundant retransmissions of the broadcast message are reduced and the energy efficiency of each node is enhanced, but reachability becomes low. It is therefore necessary to study an efficient counter-based broadcast scheme that can dynamically adjust the random access delay and the count threshold value to ensure high reachability, low redundancy of broadcast messages, and low energy consumption. In this paper, we first measure the additional coverage provided by a node that receives the same broadcast message from two neighbor nodes, in order to achieve high reachability with few redundant retransmissions. Second, we propose a new counter-based broadcast scheme that considers the size of the additional coverage area, the distance between the node and the broadcasting node, the remaining battery of the node, and variations in node density. Finally, we evaluate the performance of the proposed scheme against existing counter-based broadcast schemes. Simulation results show that the proposed scheme outperforms the existing schemes in terms of saved rebroadcasts, reachability, and total energy consumption.
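The core of a counter-based scheme as described above fits in a few lines: during a random access delay, a node counts duplicate copies of the message it hears from neighbors and rebroadcasts only if the count stays below the threshold. This is a generic sketch of the counter-based family, not the paper's adaptive scheme; the function names and slot parameters are illustrative.

```python
import random

def should_rebroadcast(duplicates_heard, count_threshold):
    """Counter-based test: suppress the rebroadcast if enough neighbors
    were already heard relaying the same message during the delay."""
    return duplicates_heard < count_threshold

def random_access_delay(max_slots=31, slot_time=0.001):
    """Random assessment delay (in seconds) during which duplicates
    are counted before the rebroadcast decision is made."""
    return random.randint(0, max_slots) * slot_time

# A node that heard 4 duplicates with threshold 3 stays silent:
print(should_rebroadcast(4, 3))  # False
print(should_rebroadcast(1, 3))  # True
```

The paper's contribution is in making the threshold and delay adaptive (coverage, distance, battery, density) rather than fixed as here.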
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC
Satija, Rahul; Novák, Ádám; Miklós, István; Lyngsø, Rune; Hein, Jotun
2009-01-01
Background: We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden Markov model algorithms to analyze up to four sequences. Results: We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the α-globin gene in vertebrates. The results show how adding sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate that BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. Conclusion: BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/ PMID:19715598
Cang, Zixuan; Wei, Guo-Wei
2018-02-01
Protein-ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, and gene expression. Accurate prediction of protein-ligand binding affinities is vital to rational drug design and to understanding protein-ligand binding and binding-induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data. Topology provides the ultimate level of abstraction and thus incurs too much reduction of geometric information. Persistent homology embeds geometric information into topological invariants and bridges the gap between complex geometry and abstract topology; however, it oversimplifies biological information. This work introduces element-specific persistent homology (ESPH), or multicomponent persistent homology, to retain crucial biological information during topological simplification. The combination of ESPH and machine learning gives rise to a powerful paradigm for macromolecular analysis. Tests on two large data sets indicate that the proposed topology-based machine-learning paradigm outperforms other existing methods in protein-ligand binding affinity prediction. ESPH reveals protein-ligand binding mechanisms that cannot be attained by other conventional techniques. The present approach reveals that protein-ligand hydrophobic interactions extend up to 40 Å away from the binding site, which has significant ramifications for drug and protein design. Copyright © 2017 John Wiley & Sons, Ltd.
A novel missense-mutation-related feature extraction scheme for 'driver' mutation identification.
Tan, Hua; Bao, Jiguang; Zhou, Xiaobo
2012-11-15
It is now widely accepted that human cancer is a disease involving dynamic changes in the genome and that missense mutations constitute the bulk of human genetic variation. A multitude of computational algorithms, especially machine learning-based ones, has consequently been proposed to distinguish missense changes that contribute to cancer progression ('driver' mutations) from those that do not ('passenger' mutations). However, existing methods have multifaceted shortcomings: they either adopt an incomplete feature space or depend on protein structural databases that are usually far from integrated. In this article, we investigated multiple aspects of a missense mutation and identified a novel feature space that well distinguishes cancer-associated driver mutations from passenger ones. An index (DX score) was proposed to evaluate the discriminating capability of each feature, and a subset of top-ranking features was selected to build an SVM classifier. Cross-validation showed that the classifier trained on our selected features significantly outperforms existing ones in both precision and robustness. We applied our method to several datasets of missense mutations culled from published databases and the literature and obtained more reasonable results than previous studies. The software is available online at http://www.methodisthealth.com/software and https://sites.google.com/site/drivermutationidentification/. Contact: xzhou@tmhs.org. Supplementary data are available at Bioinformatics online.
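The abstract does not give the DX score's formula, so the following is a hypothetical discriminability index in its spirit: the separation of the per-class feature means relative to the pooled spread, usable for ranking features before training the SVM. Both the definition and the toy data are assumptions for illustration only.

```python
import statistics

def dx_score(feature_driver, feature_passenger):
    """Hypothetical discriminability index: separation of class means
    relative to pooled spread (the paper's exact DX definition is not
    given in the abstract). Returns 0.0 for degenerate features."""
    mu_d = statistics.mean(feature_driver)
    mu_p = statistics.mean(feature_passenger)
    spread = statistics.pstdev(feature_driver) + statistics.pstdev(feature_passenger)
    return abs(mu_d - mu_p) / spread if spread else 0.0

# A feature whose values separate driver from passenger mutations well
# receives a high score and would rank near the top:
drivers    = [0.9, 0.8, 0.85]
passengers = [0.2, 0.3, 0.25]
print(round(dx_score(drivers, passengers), 3))  # 7.348
```

Features would then be sorted by this score and the top subset fed to the classifier, mirroring the selection step described above.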
The Cape Town Clinical Decision Rule for Streptococcal Pharyngitis in Children
Engel, Mark Emmanuel; Cohen, Karen; Gounden, Ronald; Kengne, Andre P.; Barth, Dylan Dominic; Whitelaw, Andrew C.; Francis, Veronica; Badri, Motasim; Stewart, Annemie; Dale, James B.; Mayosi, Bongani M.; Maartens, Gary
2016-01-01
Background Existing clinical decision rules (CDR) to diagnose group A streptococcal (GAS) pharyngitis have not been validated in sub-Saharan Africa. We developed a locally applicable CDR while evaluating existing CDRs for diagnosing GAS pharyngitis in South African children. Methods We conducted a prospective cohort study and enrolled 997 children aged 3-15 years presenting to primary care clinics with a complaint of sore throat, and whose parents provided consent. Main outcome measures were signs and symptoms of pharyngitis, and a positive GAS culture from a throat swab. Bivariate and multivariate analyses were used to develop the clinical decision rule. In addition, the diagnostic effectiveness of six existing rules for predicting a positive culture in our cohort was assessed. Results 206 of 982 children (21%) had a positive GAS culture. Tonsillar swelling, tonsillar exudates, tender or enlarged anterior cervical lymph nodes, absence of cough and absence of rhinorrhea were associated with positive cultures in bivariate and multivariate analyses. Four variables (tonsillar swelling and one of tonsillar exudate, no rhinorrhea, no cough), when used in a cumulative score, showed 83.7% sensitivity and 32.2% specificity for GAS pharyngitis. Of existing rules tested, the McIsaac rule had the highest positive predictive value (28%), but missed 49% of the culture-positive children who should have been treated. Conclusion The new four-variable clinical decision rule for GAS pharyngitis (i.e., tonsillar swelling and one of tonsillar exudate, no rhinorrhea, no cough) outperformed existing rules for GAS pharyngitis diagnosis in children with symptomatic sore throat in Cape Town. PMID:27870815
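The four-variable rule quoted in the conclusion is mechanical enough to state as code; a minimal sketch, assuming boolean sign/symptom inputs (the function name is illustrative):

```python
def cape_town_cdr(tonsillar_swelling, tonsillar_exudate, rhinorrhea, cough):
    """Four-variable rule from the abstract: treat when tonsillar swelling
    is present together with at least one of tonsillar exudate,
    absence of rhinorrhea, or absence of cough."""
    supporting = tonsillar_exudate or (not rhinorrhea) or (not cough)
    return tonsillar_swelling and supporting

print(cape_town_cdr(True, False, False, True))   # True: swelling + no rhinorrhea
print(cape_town_cdr(True, False, True, True))    # False: swelling alone
print(cape_town_cdr(False, True, False, False))  # False: no swelling
```

As the abstract reports, such a rule trades specificity (32.2%) for sensitivity (83.7%), which matches its intended use as a treat/no-treat screen.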
NASA Astrophysics Data System (ADS)
Gide, Milind S.; Karam, Lina J.
2016-08-01
With the increased focus on visual attention (VA) in the last decade, a large number of computational visual saliency methods have been developed over the past few years. These models are traditionally evaluated using performance metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though a considerable number of such metrics have been proposed in the literature, they have notable shortcomings. In this work, we discuss the shortcomings of existing metrics through illustrative examples and propose a new metric that overcomes these flaws by using local weights based on fixation density. To compare the proposed metric with existing metrics at assessing the quality of saliency prediction, we construct a ground-truth subjective database in which saliency maps obtained from 17 different VA models are evaluated by 16 human observers on a 5-point categorical scale in terms of their visual resemblance to the corresponding ground-truth fixation density maps obtained from eye-tracking data. The metrics are evaluated by correlating metric scores with the human subjective ratings. The correlation results show that the proposed evaluation metric outperforms all other popular existing metrics. Additionally, the constructed database and corresponding subjective ratings provide insight into which existing and future metrics are better at estimating the quality of saliency prediction, and can be used as a benchmark.
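A density-weighted comparison of the kind proposed here can be sketched as a weighted Pearson correlation in which the weights come from the ground-truth fixation density, so errors at heavily fixated locations count more. The exact weighting in the proposed metric is not specified in this summary, so the scheme below is an assumption.

```python
def weighted_correlation(pred, fix_density):
    """Weighted Pearson correlation between a predicted saliency map and
    a fixation density map (both flattened to 1-D lists), with weights
    proportional to the ground-truth fixation density."""
    w = fix_density
    sw = sum(w)
    mp = sum(wi * p for wi, p in zip(w, pred)) / sw
    mf = sum(wi * f for wi, f in zip(w, fix_density)) / sw
    cov = sum(wi * (p - mp) * (f - mf) for wi, p, f in zip(w, pred, fix_density))
    vp = sum(wi * (p - mp) ** 2 for wi, p in zip(w, pred))
    vf = sum(wi * (f - mf) ** 2 for wi, f in zip(w, fix_density))
    return cov / (vp * vf) ** 0.5

# A prediction identical to the density scores a perfect 1.0:
density = [0.2, 0.4, 0.8]
print(round(weighted_correlation(density, density), 6))  # 1.0
```

Unweighted correlation treats every pixel equally; the local weights here are one simple way to make mismatches in salient regions dominate the score.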
Chen, Shyi-Ming; Chen, Shen-Wen
2015-03-01
In this paper, we present a new method for fuzzy forecasting based on two-factors second-order fuzzy-trend logical relationship groups and the probabilities of trends of fuzzy-trend logical relationships. First, the proposed method fuzzifies the historical training data of the main factor and the secondary factor into fuzzy sets to form two-factors second-order fuzzy logical relationships. It then groups these relationships into two-factors second-order fuzzy-trend logical relationship groups and calculates, for each group, the probabilities of the "down-trend," the "equal-trend," and the "up-trend" of the fuzzy-trend logical relationships it contains. Finally, it performs the forecasting based on these trend probabilities. We apply the proposed method to forecast the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) and the NTD/USD exchange rates. The experimental results show that the proposed method outperforms the existing methods.
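The probability step in the method above reduces to counting trend symbols within each fuzzy-trend logical relationship group; a minimal sketch (group contents are illustrative):

```python
from collections import Counter

def trend_probabilities(trends):
    """Empirical probabilities of the 'down', 'equal' and 'up' trends
    within one fuzzy-trend logical relationship group."""
    counts = Counter(trends)
    total = sum(counts.values())
    return {t: counts[t] / total for t in ("down", "equal", "up")}

# One group containing five fuzzy-trend logical relationships:
group = ["up", "up", "down", "equal", "up"]
print(trend_probabilities(group))  # {'down': 0.2, 'equal': 0.2, 'up': 0.6}
```

The forecast for a new observation matching this group would then weight the three trend outcomes by these probabilities.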
Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.
Hernandez, Troy; Yang, Jie
2016-10-01
The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates; that is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate them on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code are available upon request.
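A toy version of the generalized vector makes the idea concrete: combine each k-mer's frequency (as in the composition vector) with descriptive statistics of its positions (as in the natural vector). The exact statistics the authors use may differ; this is an illustrative sketch.

```python
from statistics import mean, pstdev

def generalized_vector(seq, k=2):
    """Toy alignment-free vectorization: for each k-mer in seq, record
    (frequency, mean position, positional spread). Frequency follows
    the composition-vector idea; the positional statistics follow the
    natural-vector idea."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    n = len(seq) - k + 1  # number of k-mer windows
    return {kmer: (len(pos) / n, mean(pos), pstdev(pos))
            for kmer, pos in positions.items()}

vec = generalized_vector("ACGACG", k=2)
print(vec["AC"])  # (0.4, 1.5, 1.5): occurs at positions 0 and 3
```

Genomes mapped to such fixed-length vectors can then be compared with ordinary distances and classified with k-nearest neighbors, as in the article.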
An Effective Palmprint Recognition Approach for Visible and Multispectral Sensor Images
Sammouda, Rachid; Al-Salman, Abdul Malik; Alsanad, Ahmed
2018-01-01
Among several palmprint feature extraction methods, the HOG-based method is attractive and performs well against changes in illumination and shadowing of palmprint images. However, it still lacks the robustness to extract palmprint features at different rotation angles. To solve this problem, this paper presents a hybrid feature extraction method, named HOG-SGF, that combines the histogram of oriented gradients (HOG) with a steerable Gaussian filter (SGF) to develop an effective palmprint recognition approach. The approach starts by processing all palmprint images with David Zhang's method to segment only the regions of interest. Next, palmprint features are extracted with the hybrid HOG-SGF method. Then, an optimized auto-encoder (AE) is utilized to reduce the dimensionality of the extracted features. Finally, a fast and robust regularized extreme learning machine (RELM) is applied for the classification task. In the evaluation of the proposed approach, a number of experiments were conducted on three publicly available palmprint databases, namely MS-PolyU of multispectral palmprint images and CASIA and Tongji of contactless palmprint images. The results reveal that the proposed approach outperforms existing state-of-the-art approaches even when a small number of training samples is used. PMID:29762519
RELIC: a novel dye-bias correction method for Illumina Methylation BeadChip.
Xu, Zongli; Langie, Sabine A S; De Boever, Patrick; Taylor, Jack A; Niu, Liang
2017-01-03
The Illumina Infinium HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, have been extensively utilized in epigenome-wide association studies. Both arrays use two fluorescent dyes (Cy3-green/Cy5-red) to measure methylation levels at CpG sites. However, performance differences between the dyes can result in biased estimates of methylation levels. Here we describe a novel method, called REgression on Logarithm of Internal Control probes (RELIC), that corrects for dye bias across the whole array by utilizing the intensity values of paired internal control probes that monitor the two color channels. We evaluate the method on several datasets against other widely used dye-bias correction methods. Results on data quality improvement showed that RELIC correction statistically significantly outperforms alternative dye-bias correction methods. We incorporated the method into the R package ENmix, which is freely available from the Bioconductor website (https://www.bioconductor.org/packages/release/bioc/html/ENmix.html). RELIC is an efficient and robust method for correcting dye bias in Illumina Methylation BeadChip data; it outperforms the alternatives and is conveniently implemented in the R package ENmix to facilitate DNA methylation studies.
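A simplified stand-in for the RELIC idea: fit a regression between the log intensities of the paired control probes, then use the fitted relationship to map the red channel onto the green channel's scale. The actual RELIC model in ENmix is more involved; the function names and control intensities below are illustrative.

```python
import math

def fit_loglog(green_ctrl, red_ctrl):
    """Least-squares fit of log(red) = a + b*log(green) on paired
    internal control probe intensities; returns (a, b)."""
    x = [math.log(g) for g in green_ctrl]
    y = [math.log(r) for r in red_ctrl]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def correct_red(intensity, a, b):
    """Map a red-channel intensity onto the green-channel scale by
    inverting the fitted log-log relationship."""
    return math.exp((math.log(intensity) - a) / b)

# Controls where red runs systematically hot (red = 2 * green):
a, b = fit_loglog([100, 200, 400], [200, 400, 800])
print(round(correct_red(800.0, a, b)))  # 400
```

With the two channels on a common scale, downstream methylation estimates (ratios of the two channels) are no longer biased by the dye difference.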
Connected Component Model for Multi-Object Tracking.
He, Zhenyu; Li, Xin; You, Xinge; Tao, Dacheng; Tang, Yuan Yan
2016-08-01
In multi-object tracking, it is critical to explore data associations by exploiting temporal information from a sequence of frames rather than from just two adjacent frames. Since straightforwardly obtaining data associations from multiple frames is an NP-hard multi-dimensional assignment (MDA) problem, most existing methods solve the MDA problem either by developing complicated approximate algorithms or by simplifying it to a 2D assignment problem based on information extracted only from adjacent frames. In this paper, we show that the relation between associations of two observations is an equivalence relation in the data association problem, based on the spatial-temporal constraint that the trajectories of different objects must be disjoint. Therefore, the MDA problem can be divided into independent subproblems by equivalence partitioning. In contrast to existing work on the MDA problem, we develop a connected component model (CCM) that exploits the constraints of the data association and the equivalence relation on those constraints. Based on CCM, we can efficiently obtain the global solution of the MDA problem for multi-object tracking by optimizing a sequence of independent data association subproblems. Experiments on challenging public data sets demonstrate that our algorithm outperforms state-of-the-art approaches.
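The equivalence-partitioning step can be sketched with a union-find structure: observations linked by any association constraint fall into the same connected component, and each component yields an independent subproblem. This is a sketch of the partitioning idea only, not the full CCM optimization.

```python
class DisjointSet:
    """Union-find used to partition association constraints into
    independent connected components."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        # Path-halving while walking to the root.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

# Observations 0-1-2 share constraints; 3-4 form a separate component,
# so their association subproblems can be solved independently.
ds = DisjointSet(5)
for i, j in [(0, 1), (1, 2), (3, 4)]:
    ds.union(i, j)
print(len({ds.find(i) for i in range(5)}))  # 2
```

Solving each component separately is what lets the global MDA solution be assembled from small, independent assignments.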
A novel swarm intelligence algorithm for finding DNA motifs.
Lei, Chengwei; Ruan, Jianhua
2009-01-01
Discovering DNA motifs from co-expressed or co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in computational molecular biology. In this work, we propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimisation technique called Particle Swarm Optimisation (PSO), which has been shown to be effective in optimising difficult multidimensional problems in continuous domains. We propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs, and propose a modification of the naive PSO algorithm to accommodate discrete variables. In order to improve efficiency, we also propose several strategies for escaping from local optima and for automatically determining the termination criteria. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most popular existing algorithms.
Stochastic gradient ascent outperforms gamers in the Quantum Moves game
NASA Astrophysics Data System (ADS)
Sels, Dries
2018-04-01
In a recent work on quantum state preparation, Sørensen and co-workers [Nature (London) 532, 210 (2016), 10.1038/nature17620] explore the possibility of using video games to help design quantum control protocols. The authors present a game called "Quantum Moves" (https://www.scienceathome.org/games/quantum-moves/) in which gamers have to move an atom from A to B by means of optical tweezers. They report that, "players succeed where purely numerical optimization fails." Moreover, by harnessing the player strategies, they can "outperform the most prominent established numerical methods." The aim of this Rapid Communication is to analyze the problem in detail and show that those claims are untenable. In fact, without any prior knowledge and starting from a random initial seed, a simple stochastic local optimization method finds near-optimal solutions which outperform all players. Counterdiabatic driving can even be used to generate protocols without resorting to numeric optimization. The analysis results in an accurate analytic estimate of the quantum speed limit which, apart from zero-point motion, is shown to be entirely classical in nature. The latter might explain why gamers are reasonably good at the game. A simple modification of the BringHomeWater challenge is proposed to test this hypothesis.
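The "simple stochastic local optimization method" the author refers to can be sketched generically: perturb one control parameter at a time and accept the move only if the figure of merit improves. The fidelity function below is a toy stand-in for the actual quantum control objective, and all names are illustrative.

```python
import random

def stochastic_ascent(fidelity, protocol, steps=3000, sigma=0.1, seed=0):
    """Stochastic local search: Gaussian-perturb one control parameter
    at a time, keeping the move only when the figure of merit improves."""
    rng = random.Random(seed)
    best = fidelity(protocol)
    for _ in range(steps):
        k = rng.randrange(len(protocol))
        trial = protocol[:]
        trial[k] += rng.gauss(0, sigma)
        f = fidelity(trial)
        if f > best:
            protocol, best = trial, f
    return protocol, best

# Toy figure of merit peaked at (1, -2), maximized from a cold start:
target = lambda p: -((p[0] - 1) ** 2 + (p[1] + 2) ** 2)
sol, val = stochastic_ascent(target, [0.0, 0.0])
print(round(sol[0], 1), round(sol[1], 1))  # converges near 1.0 -2.0
```

The point of the Rapid Communication is that even such an unsophisticated search, seeded randomly, finds near-optimal protocols for the game's control problem.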
Accurate diagnosis of thyroid follicular lesions from nuclear morphology using supervised learning.
Ozolek, John A; Tosun, Akif Burak; Wang, Wei; Chen, Cheng; Kolouri, Soheil; Basu, Saurav; Huang, Hu; Rohde, Gustavo K
2014-07-01
Follicular lesions of the thyroid remain significant diagnostic challenges in surgical pathology and cytology. The diagnosis often requires considerable resources and ancillary tests including immunohistochemistry, molecular studies, and expert consultation. Visual analyses of nuclear morphological features, generally speaking, have not been helpful in distinguishing this group of lesions. Here we describe a method for distinguishing between follicular lesions of the thyroid based on nuclear morphology. The method utilizes an optimal transport-based linear embedding for segmented nuclei, together with an adaptation of existing classification methods. We show that the method outputs assignments (classification results) that are nearly perfectly correlated with the clinical diagnosis of several lesion types, using a database of 94 patients in total. Experimental comparisons also show that the new method can significantly outperform standard numerical feature-type methods in terms of agreement with the clinical diagnosis gold standard. In addition, the new method could potentially be used to derive insights into biologically meaningful differences in nuclear morphology among these lesions. Our methods could be incorporated into a tool for pathologists to aid in distinguishing between follicular lesions of the thyroid. These results could also provide nuclear morphological correlates of biological behavior and reduce health care costs by decreasing histotechnician and pathologist time and obviating the need for ancillary testing. Copyright © 2014 Elsevier B.V. All rights reserved.
Geodesic denoising for optical coherence tomography images
NASA Astrophysics Data System (ADS)
Shahrian Varnousfaderani, Ehsan; Vogl, Wolf-Dieter; Wu, Jing; Gerendas, Bianca S.; Simader, Christian; Langs, Georg; Waldstein, Sebastian M.; Schmidt-Erfurth, Ursula
2016-03-01
Optical coherence tomography (OCT) is an optical signal acquisition method capturing micrometer-resolution, cross-sectional three-dimensional images. OCT images are used widely in ophthalmology to diagnose and monitor retinal diseases such as age-related macular degeneration (AMD) and glaucoma. While OCT allows the visualization of retinal structures such as vessels and retinal layers, image quality and contrast are reduced by speckle noise, which obfuscates small, low-intensity structures and structural boundaries. Existing denoising methods for OCT images may remove clinically significant image features such as texture and the boundaries of anomalies. In this paper, we propose a novel patch-based denoising method, geodesic denoising, that reduces noise in OCT images while preserving clinically significant, although small, pathological structures, such as fluid-filled cysts in diseased retinas. Our method selects optimal image patch distribution representations based on geodesic patch similarity to noisy samples. Patch distributions are then randomly sampled to build a set of best-matching candidates for every noisy sample, and the denoised value is computed as a geodesic weighted average of the best candidate samples. Our method is evaluated qualitatively on real pathological OCT scans and quantitatively on a proposed set of ground-truth, noise-free synthetic OCT scans with artificially added noise and pathologies. Experimental results show that the performance of our method is comparable with state-of-the-art denoising methods while outperforming them in preserving critical, clinically relevant structures.
Spatial modelling of disease using data- and knowledge-driven approaches.
Stevens, Kim B; Pfeiffer, Dirk U
2011-09-01
The purpose of spatial modelling in animal and public health is three-fold: describing existing spatial patterns of risk, attempting to understand the biological mechanisms that lead to disease occurrence and predicting what will happen in the medium to long-term future (temporal prediction) or in different geographical areas (spatial prediction). Traditional methods for temporal and spatial predictions include general and generalized linear models (GLM), generalized additive models (GAM) and Bayesian estimation methods. However, such models require both disease presence and absence data which are not always easy to obtain. Novel spatial modelling methods such as maximum entropy (MAXENT) and the genetic algorithm for rule set production (GARP) require only disease presence data and have been used extensively in the fields of ecology and conservation, to model species distribution and habitat suitability. Other methods, such as multicriteria decision analysis (MCDA), use knowledge of the causal factors of disease occurrence to identify areas potentially suitable for disease. In addition to their less restrictive data requirements, some of these novel methods have been shown to outperform traditional statistical methods in predictive ability (Elith et al., 2006). This review paper provides details of some of these novel methods for mapping disease distribution, highlights their advantages and limitations, and identifies studies which have used the methods to model various aspects of disease distribution. Copyright © 2011. Published by Elsevier Ltd.
Counting motifs in dynamic networks.
Mukherjee, Kingshuk; Hasan, Md Mahmudul; Boucher, Christina; Kahveci, Tamer
2018-04-11
A network motif is a sub-network that occurs frequently in a given network. Detection of such motifs is important since they uncover functions and local properties of the given biological network. Finding motifs is, however, a computationally challenging task, as it requires solving the costly subgraph isomorphism problem. Moreover, the topology of biological networks changes over time. These changing networks are called dynamic biological networks. As the network evolves, the frequency of each motif in the network also changes. Computing the frequency of a given motif from scratch every time the network topology evolves is infeasible, particularly for large and fast-evolving networks. In this article, we design and develop a scalable method for counting the number of motifs in a dynamic biological network. Our method incrementally updates the frequency of each motif as the underlying network's topology evolves. Our experiments demonstrate that our method can update the frequency of each motif orders of magnitude faster than counting the motif embeddings every time the network changes. The more frequently the network evolves, the larger the margin by which our method outperforms the existing static methods. We evaluated our method extensively using synthetic and real datasets, and show that our method is highly accurate (≥ 96%) and that it can be scaled to large, dense networks. The results on real data demonstrate the utility of our method in revealing interesting insights into the evolution of biological processes.
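The incremental idea is easiest to see for the simplest motif, the triangle. The sketch below is a generic illustration of incremental motif counting, not the paper's algorithm, and all names are hypothetical: it maintains the triangle count under edge insertions and deletions by inspecting only the neighborhoods of the two affected endpoints, rather than recounting from scratch.

```python
class DynamicTriangleCounter:
    """Incremental triangle counting in a dynamic undirected graph.

    When edge (u, v) is added, every common neighbor of u and v
    closes exactly one new triangle, so the count is updated with
    a single set intersection instead of a full recount.
    """

    def __init__(self):
        self.adj = {}        # node -> set of neighbors
        self.triangles = 0

    def _nbrs(self, u):
        return self.adj.setdefault(u, set())

    def add_edge(self, u, v):
        if v in self._nbrs(u):
            return  # edge already present
        # each common neighbor forms one new triangle with (u, v)
        self.triangles += len(self._nbrs(u) & self._nbrs(v))
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        if v not in self._nbrs(u):
            return
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # triangles that used (u, v) are exactly the common neighbors
        self.triangles -= len(self._nbrs(u) & self._nbrs(v))

g = DynamicTriangleCounter()
for e in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]:
    g.add_edge(*e)
print(g.triangles)  # triangles {1,2,3} and {2,3,4} -> 2
```

Each update costs time proportional to the smaller neighborhood of the touched endpoints, which is the source of the speedup over static recounting that the abstract reports for general motifs.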
Integrating linear optimization with structural modeling to increase HIV neutralization breadth.
Sevy, Alexander M; Panda, Swetasudha; Crowe, James E; Meiler, Jens; Vorobeychik, Yevgeniy
2018-02-01
Computational protein design has been successful in modeling fixed backbone proteins in a single conformation. However, when modeling large ensembles of flexible proteins, current methods in protein design have been insufficient. Large barriers in the energy landscape are difficult to traverse while redesigning a protein sequence, and as a result current design methods only sample a fraction of available sequence space. We propose a new computational approach that combines traditional structure-based modeling using the Rosetta software suite with machine learning and integer linear programming to overcome limitations in the Rosetta sampling methods. We demonstrate the effectiveness of this method, which we call BROAD, by benchmarking the performance on increasing predicted breadth of anti-HIV antibodies. We use this novel method to increase predicted breadth of naturally-occurring antibody VRC23 against a panel of 180 divergent HIV viral strains and achieve 100% predicted binding against the panel. In addition, we compare the performance of this method to state-of-the-art multistate design in Rosetta and show that we can outperform the existing method significantly. We further demonstrate that sequences recovered by this method recover known binding motifs of broadly neutralizing anti-HIV antibodies. Finally, our approach is general and can be extended easily to other protein systems. Although our modeled antibodies were not tested in vitro, we predict that these variants would have greatly increased breadth compared to the wild-type antibody.
Robust hashing with local models for approximate similarity search.
Song, Jingkuan; Yang, Yi; Li, Xuelong; Huang, Zi; Yang, Yang
2014-07-01
Similarity search plays an important role in many applications involving high-dimensional data. Due to the well-known curse of dimensionality, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing l2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
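For contrast, the randomized baseline the abstract alludes to, random-hyperplane LSH rather than RHLM itself, can be sketched in a few lines. Each hash bit is the sign of a projection onto a random direction, so points at small cosine distance tend to agree on many bits. All names and parameter values here are illustrative.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in dim dimensions.

    This is the data-independent step that RHLM replaces with
    learned, locally structured hash functions.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(planes, point):
    """One bit per hyperplane: which side of the plane the point is on."""
    return tuple(int(sum(w * x for w, x in zip(plane, point)) >= 0)
                 for plane in planes)

planes = make_hyperplanes(dim=3, n_bits=8)
a = hash_code(planes, [1.0, 0.2, 0.1])
b = hash_code(planes, [0.9, 0.25, 0.05])   # nearly parallel to a
c = hash_code(planes, [-1.0, 0.1, -0.8])   # roughly opposite to a
same_ab = sum(x == y for x, y in zip(a, b))
same_ac = sum(x == y for x, y in zip(a, c))
print(same_ab, same_ac)
```

The probability that two points disagree on a bit grows with the angle between them, which is why candidate buckets sharing many bits with the query code are explored first.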
Zhang, Xiao-Fei; Ou-Yang, Le; Yan, Hong
2017-08-15
Understanding how gene regulatory networks change under different cellular states is important for revealing insights into network dynamics. Gaussian graphical models, which assume that the data follow a joint normal distribution, have been used recently to infer differential networks. However, the distributions of the omics data are non-normal in general. Furthermore, although much biological knowledge (or prior information) has been accumulated, most existing methods ignore the valuable prior information. Therefore, new statistical methods are needed to relax the normality assumption and make full use of prior information. We propose a new differential network analysis method to address the above challenges. Instead of using Gaussian graphical models, we employ a non-paranormal graphical model that can relax the normality assumption. We develop a principled model to take into account the following prior information: (i) a differential edge less likely exists between two genes that do not participate together in the same pathway; (ii) changes in the networks are driven by certain regulator genes that are perturbed across different cellular states and (iii) the differential networks estimated from multi-view gene expression data likely share common structures. Simulation studies demonstrate that our method outperforms other graphical model-based algorithms. We apply our method to identify the differential networks between platinum-sensitive and platinum-resistant ovarian tumors, and the differential networks between the proneural and mesenchymal subtypes of glioblastoma. Hub nodes in the estimated differential networks rediscover known cancer-related regulator genes and contain interesting predictions. The source code is at https://github.com/Zhangxf-ccnu/pDNA. szuouyl@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. 
Constrained Active Learning for Anchor Link Prediction Across Multiple Heterogeneous Social Networks
Zhu, Junxing; Zhang, Jiawei; Wu, Quanyuan; Jia, Yan; Zhou, Bin; Wei, Xiaokai; Yu, Philip S.
2017-01-01
Nowadays, people are usually involved in multiple heterogeneous social networks simultaneously. Discovering the anchor links between the accounts owned by the same users across different social networks is crucial for many important inter-network applications, e.g., cross-network link transfer and cross-network recommendation. Many different supervised models have been proposed to predict anchor links so far, but they are effective only when the labeled anchor links are abundant. However, in real scenarios, such a requirement can hardly be met and most anchor links are unlabeled, since manually labeling the inter-network anchor links is quite costly and tedious. To overcome such a problem and utilize the numerous unlabeled anchor links in model building, in this paper, we introduce the active learning based anchor link prediction problem. Different from the traditional active learning problems, due to the one-to-one constraint on anchor links, if an unlabeled anchor link a=(u,v) is identified as positive (i.e., existing), all the other unlabeled anchor links incident to account u or account v will be negative (i.e., non-existing) automatically. Viewed in such a perspective, asking for the labels of potential positive anchor links in the unlabeled set will be rewarding in the active anchor link prediction problem. Various novel anchor link information gain measures are defined in this paper, based on which several constraint active anchor link prediction methods are introduced. Extensive experiments have been done on real-world social network datasets to compare the performance of these methods with state-of-art anchor link prediction methods. The experimental results show that the proposed Mean-entropy-based Constrained Active Learning (MC) method can outperform other methods with significant advantages. PMID:28771201
Does Video-Autotutorial Instruction Improve College Student Achievement?
ERIC Educational Resources Information Center
Fisher, K. M.; And Others
1977-01-01
Compares student achievement in an upper-division college introductory course taught by the video-autotutorial method with that in two comparable courses taught by the lecture-discussion method. Pre-post tests of 623 students reveal that video-autotutorial students outperform lecture/discussion participants at all ability levels and that in…
Binary Interval Search: a scalable algorithm for counting interval intersections
Layer, Ryan M.; Skadron, Kevin; Robins, Gabriel; Hall, Ira M.; Quinlan, Aaron R.
2013-01-01
Motivation: The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. Results: We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units, by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. Availability: https://github.com/arq5x/bits. Contact: arq5x@virginia.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23129298
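The counting idea can be illustrated with a small sketch (an illustration of binary-search-based intersection counting, not the authors' implementation): an interval fails to intersect a query [qs, qe] only if it ends before qs or starts after qe, so two binary searches over pre-sorted start and end coordinates count the intersections without enumerating them.

```python
from bisect import bisect_left, bisect_right

def count_intersections(intervals, query):
    """Count intervals overlapping the closed query interval [qs, qe].

    An interval misses the query only if it ends before qs or
    starts after qe; everything else intersects. Two binary
    searches over sorted endpoints give the answer without
    enumerating any intersection.
    """
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    qs, qe = query
    n = len(intervals)
    end_before = bisect_left(ends, qs)            # end < qs
    start_after = n - bisect_right(starts, qe)    # start > qe
    return n - end_before - start_after

intervals = [(1, 5), (4, 8), (9, 12), (14, 20)]
print(count_intersections(intervals, (5, 10)))  # (1,5), (4,8), (9,12) -> 3
```

In practice the two sorted arrays are built once and reused across many queries, which is what makes the approach fast for Monte Carlo significance testing.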
Estimation of signal-dependent noise level function in transform domain via a sparse recovery model.
Yang, Jingyu; Gan, Ziqiao; Wu, Zhaoyang; Hou, Chunping
2015-05-01
This paper proposes a novel algorithm to estimate the noise level function (NLF) of signal-dependent noise (SDN) from a single image based on the sparse representation of NLFs. Noise level samples are estimated from the high-frequency discrete cosine transform (DCT) coefficients of nonlocal-grouped low-variation image patches. Then, an NLF recovery model based on the sparse representation of NLFs under a trained basis is constructed to recover NLF from the incomplete noise level samples. Confidence levels of the NLF samples are incorporated into the proposed model to promote reliable samples and weaken unreliable ones. We investigate the behavior of the estimation performance with respect to the block size, sampling rate, and confidence weighting. Simulation results on synthetic noisy images show that our method outperforms existing state-of-the-art schemes. The proposed method is evaluated on real noisy images captured by three types of commodity imaging devices, and shows consistently excellent SDN estimation performance. The estimated NLFs are incorporated into two well-known denoising schemes, nonlocal means and BM3D, and show significant improvements in denoising SDN-polluted images.
Period Estimation for Sparsely-sampled Quasi-periodic Light Curves Applied to Miras
NASA Astrophysics Data System (ADS)
He, Shiyuan; Yuan, Wenlong; Huang, Jianhua Z.; Long, James; Macri, Lucas M.
2016-12-01
We develop a nonlinear semi-parametric Gaussian process model to estimate periods of Miras with sparsely sampled light curves. The model uses a sinusoidal basis for the periodic variation and a Gaussian process for the stochastic changes. We use maximum likelihood to estimate the period and the parameters of the Gaussian process, while integrating out the effects of other nuisance parameters in the model with respect to a suitable prior distribution obtained from earlier studies. Since the likelihood is highly multimodal in the period, we implement a hybrid method that applies the quasi-Newton algorithm for the Gaussian process parameters and searches the period/frequency parameter space over a dense grid. A large-scale, high-fidelity simulation is conducted to mimic the sampling quality of Mira light curves obtained by the M33 Synoptic Stellar Survey. The simulated data set is publicly available and can serve as a testbed for future evaluation of different period estimation methods. The semi-parametric model outperforms an existing algorithm on this simulated test data set as measured by period recovery rate and quality of the resulting period-luminosity relations.
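A stripped-down version of the dense grid-search step might look like the following. This is a sketch only: a crude periodogram power stands in for the paper's semi-parametric likelihood, and the Gaussian process component is omitted entirely.

```python
import math

def estimate_period(times, values, period_grid):
    """Grid search over trial periods for irregularly sampled data.

    The score at each trial period is the squared projection of
    the mean-centered data onto sine and cosine at that frequency
    (a crude periodogram); the full method would instead evaluate
    a semi-parametric likelihood at each grid point.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    best_period, best_power = None, -1.0
    for period in period_grid:
        w = 2.0 * math.pi / period
        c = sum(v * math.cos(w * t) for t, v in zip(times, centered))
        s = sum(v * math.sin(w * t) for t, v in zip(times, centered))
        power = c * c + s * s
        if power > best_power:
            best_period, best_power = period, power
    return best_period

# Irregular, noiseless sampling of a sinusoid with true period 3.7
times = [0.7 * k + 0.3 * math.sin(k) for k in range(40)]
values = [math.sin(2 * math.pi * t / 3.7) for t in times]
grid = [2.0 + 0.01 * k for k in range(301)]  # trial periods 2.0 .. 5.0
print(estimate_period(times, values, grid))
```

Because the score is highly multimodal in period, no local optimizer is trusted for this parameter; only the smooth Gaussian process parameters would be handled by quasi-Newton steps in the hybrid scheme the abstract describes.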
Tang, Rongnian; Chen, Xupeng; Li, Chuang
2018-05-01
Near-infrared spectroscopy is an efficient, low-cost technology with potential as an accurate method for detecting the nitrogen content of natural rubber leaves. The successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration, which uses projection operations to select a variable subset with minimum multi-collinearity. However, due to fluctuations in the correlation between variables, high collinearity may still exist among non-adjacent variables of the subset obtained by basic SPA. Based on an analysis of the correlation matrix of the spectral data, this paper proposes a correlation-based SPA (CB-SPA) that applies the successive projections algorithm in regions with consistent correlation. The results show that CB-SPA can select variable subsets with more valuable variables and less multi-collinearity. Meanwhile, models established with the CB-SPA subset outperform those built on basic SPA subsets in predicting nitrogen content in terms of both cross-validation and external prediction. Moreover, CB-SPA is more efficient: the time cost of its selection procedure is one-twelfth that of basic SPA.
Fractal analysis of bone structure with applications to osteoporosis and microgravity effects
NASA Astrophysics Data System (ADS)
Acharya, Raj S.; LeBlanc, Adrian; Shackelford, Linda; Swarnakar, Vivek; Krishnamurthy, Ram; Hausman, E.; Lin, Chin-Shoou
1995-05-01
We characterize the trabecular structure with the aid of fractal dimension. We use alternating sequential filters (ASF) to generate a nonlinear pyramid for fractal dimension computations. We do not make any assumptions of the statistical distributions of the underlying fractal bone structure. The only assumption of our scheme is the rudimentary definition of self-similarity. This allows us the freedom of not being constrained by statistical estimation schemes. With mathematical simulations, we have shown that the ASF methods outperform other existing methods for fractal dimension estimation. We have shown that the fractal dimension remains the same when computed with both the x-ray images and the MRI images of the patella. We have shown that the fractal dimension of osteoporotic subjects is lower than that of the normal subjects. In animal models, we have shown that the fractal dimension of osteoporotic rats was lower than that of the normal rats. In a 17 week bedrest study, we have shown that the subject's prebedrest fractal dimension is higher than that of the postbedrest fractal dimension.
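The quantity being estimated can be illustrated with the generic box-counting estimator. This is not the authors' ASF-pyramid method; it is shown only because it rests on the same rudimentary definition of self-similarity: count occupied boxes N(s) at several box sizes s and fit log N(s) against log s, and the fractal dimension is the negative slope.

```python
import math

def box_counting_dimension(points, scales):
    """Box-counting fractal dimension of a 2-D point set.

    For each box size s, count the distinct grid cells occupied
    by at least one point, then fit log N(s) ~ -D log s by least
    squares. Generic self-similarity estimator, not the paper's
    ASF-pyramid scheme.
    """
    logs, logn = [], []
    for s in scales:
        boxes = {(int(x // s), int(y // s)) for x, y in points}
        logs.append(math.log(s))
        logn.append(math.log(len(boxes)))
    m = len(scales)
    mean_x = sum(logs) / m
    mean_y = sum(logn) / m
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(logs, logn))
             / sum((x - mean_x) ** 2 for x in logs))
    return -slope  # D is the negative of the log-log slope

# A densely sampled line segment should have dimension close to 1
line = [(i / 1000.0, i / 1000.0) for i in range(1000)]
print(box_counting_dimension(line, [0.5, 0.25, 0.125, 0.0625]))
```

For trabecular bone images the same fit is applied to the set of bone-structure pixels; a lower dimension indicates a sparser, less space-filling structure, consistent with the osteoporosis findings reported above.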
Unsupervised Anomaly Detection Based on Clustering and Multiple One-Class SVM
NASA Astrophysics Data System (ADS)
Song, Jungsuk; Takakura, Hiroki; Okabe, Yasuo; Kwon, Yongjin
Intrusion detection system (IDS) has played an important role as a device to defend our networks from cyber attacks. However, since it is unable to detect unknown attacks, i.e., 0-day attacks, the ultimate challenge in the intrusion detection field is how to exactly identify such an attack in an automated manner. Over the past few years, several studies on solving these problems have been made on anomaly detection using unsupervised learning techniques such as clustering, one-class support vector machine (SVM), etc. Although they enable one to construct intrusion detection models at low cost and effort, and have the capability to detect unforeseen attacks, they still have two main problems in intrusion detection: a low detection rate and a high false positive rate. In this paper, we propose a new anomaly detection method based on clustering and multiple one-class SVM in order to improve the detection rate while maintaining a low false positive rate. We evaluated our method using the KDD Cup 1999 data set. Evaluation results show that our approach outperforms the existing algorithms reported in the literature, especially in detection of unknown attacks.
TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION
Allen, Genevera I.; Tibshirani, Robert
2015-01-01
Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility. PMID:26877823
Nadalin, Francesca; Carbone, Alessandra
2018-02-01
Large-scale computational docking will be increasingly used in future years to discriminate protein-protein interactions at residue resolution. Complete cross-docking experiments make in silico reconstruction of protein-protein interaction networks a feasible goal. They demand efficient and accurate screening of the millions of structural conformations produced by the calculations. We propose CIPS (Combined Interface Propensity for decoy Scoring), a new pair potential combining interface composition with residue-residue contact preference. CIPS outperforms several other methods at screening docking solutions obtained with either all-atom or coarse-grain rigid docking. Further testing on 28 CAPRI targets corroborates the predictive power of CIPS over existing methods. By combining CIPS with atomic potentials, discrimination of correct conformations in all-atom structures reaches optimal accuracy. The drastic reduction of candidate solutions produced by thousands of proteins docked against each other makes large-scale docking accessible to analysis. The CIPS source code is freely available at http://www.lcqb.upmc.fr/CIPS. alessandra.carbone@lip6.fr. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Accurately estimating PSF with straight lines detected by Hough transform
NASA Astrophysics Data System (ADS)
Wang, Ruichen; Xu, Liangpeng; Fan, Chunxiao; Li, Yong
2018-04-01
This paper presents an approach to estimating the point spread function (PSF) from low-resolution (LR) images. Existing techniques usually rely on accurate detection of the ending points of the profile normal to edges. In practice, however, it is often a great challenge to accurately localize edge profiles in an LR image, which leads to poor estimation of the PSF of the lens that took the LR image. To estimate the PSF precisely, this paper proposes first estimating a 1-D PSF kernel from straight lines, and then robustly obtaining the 2-D PSF from the 1-D kernel by least squares techniques and random sample consensus. The Canny operator is applied to the LR image to obtain edges, and the Hough transform is then utilized to extract straight lines of all orientations. Estimating the 1-D PSF kernel with straight lines effectively alleviates the influence of inaccurate edge detection on PSF estimation. The proposed method is investigated on both natural and synthetic images for estimating the PSF. Experimental results show that the proposed method outperforms the state-of-the-art and does not rely on accurate edge detection.
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kou, Qiang; Wu, Si; Tolić, Nikola
Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a "bird's eye view" of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry data sets showed that TopMG outperformed existing methods in identifying complex proteoforms.
Li, Tao; Hua, Zhendong; Meng, Xin; Liu, Cuimei
2018-03-01
Methamphetamine (MA) tablet production confers characteristic chemical and physical properties on the tablets. This study developed a simple and effective physical characteristic profiling method for MA tablets bearing the capital-letter "WY" logo, enabling discrimination between linked and unlinked seizures. Seventeen signature distances extracted from the "WY" logo were explored as factors for multivariate analysis and were demonstrated to effectively represent tablet features from a drug intelligence perspective. Receiver operating characteristic (ROC) curves were used to evaluate the efficiency of different pretreatments and distance/correlation metrics; the "Standardization + Euclidean" and "Logarithm + Euclidean" algorithms outperformed the rest. Finally, hierarchical cluster analysis (HCA) was applied to a data set of 200 MA tablet seizures randomly selected from cases all around China in 2015, and 76% of them were classified into a group named "WY-001." Moreover, "WY-001" tablets accounted for 51-80% of tablet seizures from 2011 to 2015 in China, indicating the existence of a large clandestine factory continuously manufacturing MA tablets. © 2017 American Academy of Forensic Sciences.
XQ-NLM: Denoising Diffusion MRI Data via x-q Space Non-Local Patch Matching.
Chen, Geng; Wu, Yafeng; Shen, Dinggang; Yap, Pew-Thian
2016-10-01
Noise is a major issue influencing quantitative analysis in diffusion MRI. The effects of noise can be reduced by repeated acquisitions, but this leads to long acquisition times that can be unrealistic in clinical settings. For this reason, post-acquisition denoising methods have been widely used to improve SNR. Among existing methods, non-local means (NLM) has been shown to produce good image quality with edge preservation. However, the application of NLM to diffusion MRI has so far been mostly focused on the spatial space (i.e., the x-space), despite the fact that diffusion data live in a combined space consisting of the x-space and the q-space (i.e., the space of wavevectors). In this paper, we propose to extend NLM to both x-space and q-space. We show how patch matching, as required in NLM, can be performed concurrently in x-q space with the help of azimuthal equidistant projection and rotation invariant features. Extensive experiments on both synthetic and real data confirm that the proposed x-q space NLM (XQ-NLM) outperforms the classic NLM.
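The core patch-matching step of non-local means, which XQ-NLM extends from x-space to the joint x-q space, can be sketched in one spatial dimension. This is an illustrative sketch, not the authors' implementation; the patch radius, search radius, and filtering parameter h are arbitrary choices:

```python
import numpy as np

def nlm_denoise_1d(signal, patch_radius=2, search_radius=5, h=1.0):
    """Minimal non-local means: each sample is replaced by a weighted
    average of samples whose surrounding patches look similar."""
    n = len(signal)
    padded = np.pad(signal, patch_radius, mode="reflect")
    # one patch of length 2*patch_radius+1 centred on each sample
    patches = np.array([padded[i:i + 2 * patch_radius + 1] for i in range(n)])
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search_radius), min(n, i + search_radius + 1)
        d2 = np.sum((patches[lo:hi] - patches[i]) ** 2, axis=1)
        w = np.exp(-d2 / (h ** 2))          # similar patches get high weight
        out[i] = np.sum(w * signal[lo:hi]) / np.sum(w)
    return out
```

Extending this to x-q space amounts to also drawing candidate patches from neighbouring wavevectors, after mapping them into a comparable frame (the paper uses azimuthal equidistant projection and rotation-invariant features for that step).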
Discrete False-Discovery Rate Improves Identification of Differentially Abundant Microbes.
Jiang, Lingjing; Amir, Amnon; Morton, James T; Heller, Ruth; Arias-Castro, Ery; Knight, Rob
2017-01-01
Differential abundance testing is a critical task in microbiome studies that is complicated by the sparsity of data matrices. Here we adapt for microbiome studies a solution from the field of gene expression analysis to produce a new method, discrete false-discovery rate (DS-FDR), that greatly improves the power to detect differential taxa by exploiting the discreteness of the data. Additionally, DS-FDR is relatively robust to the number of noninformative features, and thus removes the problem of filtering taxonomy tables by an arbitrary abundance threshold. We show by using a combination of simulations and reanalysis of nine real-world microbiome data sets that this new method outperforms existing methods at the differential abundance testing task, producing a false-discovery rate that is up to threefold more accurate, and halves the number of samples required to find a given difference (thus increasing the efficiency of microbiome experiments considerably). We therefore expect DS-FDR to be widely applied in microbiome studies. IMPORTANCE DS-FDR can achieve higher statistical power to detect significant findings in sparse and noisy microbiome data compared to the commonly used Benjamini-Hochberg procedure and other FDR-controlling procedures.
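For context, the Benjamini-Hochberg step-up procedure that DS-FDR is compared against can be written in a few lines; this is standard textbook material, not code from the paper:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: returns the indices of
    hypotheses rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its threshold alpha*rank/m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

DS-FDR gains power by accounting for the discreteness of sparse taxon counts, which makes fixed p-value thresholds like the ones above conservative on microbiome data.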
Zhao, Guangjun; Wang, Xuchu; Niu, Yanmin; Tan, Liwen; Zhang, Shao-Xiang
2016-01-01
Cryosection brain images in the Chinese Visible Human (CVH) dataset contain rich anatomical structure information of tissues because of their high resolution (e.g., 0.167 mm per pixel). Fast and accurate segmentation of these images into white matter, gray matter, and cerebrospinal fluid plays a critical role in analyzing and measuring the anatomical structures of the human brain. However, most existing automated segmentation methods are designed for computed tomography or magnetic resonance imaging data, and they may not be applicable to cryosection images due to the imaging difference. In this paper, we propose a supervised learning-based CVH brain tissue segmentation method that uses stacked autoencoders (SAE) to automatically learn deep feature representations. Specifically, our model includes two successive parts, where two three-layer SAEs take image patches as input to learn the complex anatomical feature representation, and then these features are sent to a Softmax classifier for inferring the labels. Experimental results validated the effectiveness of our method and showed that it outperformed four other classical brain tissue detection strategies. Furthermore, we reconstructed three-dimensional surfaces of these tissues, which show their potential in exploring the high-resolution anatomical structures of the human brain. PMID:27057543
Inference from Samples of DNA Sequences Using a Two-Locus Model
Griffiths, Robert C.
2011-01-01
Performing inference on contemporary samples of DNA sequence data is an important and challenging task. Computationally intensive methods such as importance sampling (IS) are attractive because they make full use of the available data, but in the presence of recombination the large state space of genealogies can be prohibitive. In this article, we make progress by developing an efficient IS proposal distribution for a two-locus model of sequence data. We show that the proposal developed here leads to much greater efficiency, outperforming existing IS methods that could be adapted to this model. Among several possible applications, the algorithm can be used to find maximum likelihood estimates for mutation and crossover rates, and to perform ancestral inference. We illustrate the method on previously reported sequence data covering two loci on either side of the well-studied TAP2 recombination hotspot. The two loci are themselves largely non-recombining, so we obtain a gene tree at each locus and are able to infer in detail the effect of the hotspot on their joint ancestry. We summarize this joint ancestry by introducing the gene graph, a summary of the well-known ancestral recombination graph. PMID:21210733
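The importance-sampling principle behind the proposal distribution can be illustrated with a generic self-normalised estimator; the target and proposal densities below are toy stand-ins, not the two-locus genealogical model:

```python
import math
import random

def importance_sampling_mean(f, target_pdf, proposal_pdf, proposal_sample,
                             n=50000, seed=1):
    """Self-normalised importance sampling estimate of E_p[f(X)]:
    draw from the proposal q and reweight each draw by p(x)/q(x).
    Both densities may be unnormalised, since the weights are
    normalised by their own sum."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = proposal_sample(rng)
        w = target_pdf(x) / proposal_pdf(x)
        num += w * f(x)
        den += w
    return num / den
```

A good proposal concentrates draws where the target has mass, which is exactly what the paper's genealogical proposal is engineered to do for two-locus histories.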
Probabilistic multi-person localisation and tracking in image sequences
NASA Astrophysics Data System (ADS)
Klinger, T.; Rottensteiner, F.; Heipke, C.
2017-05-01
The localisation and tracking of persons in image sequences is commonly guided by recursive filters. Especially in a multi-object tracking environment, where mutual occlusions are inherent, the predictive model is prone to drift away from the actual target position when not taking context into account. Further, if the image-based observations are imprecise, the trajectory is prone to be updated towards a wrong position. In this work we address both these problems by using a new predictive model on the basis of Gaussian Process Regression, and by using generic object detection, as well as instance-specific classification, for refined localisation. The predictive model takes into account the motion of every tracked pedestrian in the scene and the prediction is executed with respect to the velocities of neighbouring persons. In contrast to existing methods, our approach uses a Dynamic Bayesian Network in which the state vector of a recursive Bayes filter, as well as the location of the tracked object in the image, are modelled as unknowns. This allows the detection to be corrected before it is incorporated into the recursive filter. Our method is evaluated on a publicly available benchmark dataset and outperforms related methods in terms of geometric precision and tracking accuracy.
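A minimal Gaussian Process Regression predictor of the kind used for the predictive model can be sketched as follows; the RBF kernel and its hyperparameters are illustrative assumptions, and the paper's model additionally conditions on the velocities of neighbouring pedestrians:

```python
import numpy as np

def gp_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-2):
    """GP regression with an RBF kernel (unit prior variance):
    returns the posterior mean and variance at the test inputs."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)
    K = k(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = k(x_train, x_test)
    alpha = np.linalg.solve(K, y_train)      # K^{-1} y
    mean = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.sum(Ks * v, axis=0)       # prior var minus explained var
    return mean, var
```

The posterior variance is what makes a GP attractive for prediction under occlusion: the filter can weight the prediction by how uncertain it is.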
Chen, Hongyu; Martin, Bronwen; Daimon, Caitlin M; Maudsley, Stuart
2013-01-01
Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval methods the only feasible method of managing this expanding corpus. With the increasing prevalence of open-access journals and constant growth of publicly-available repositories of biomedical literature, literature mining has become much more effective with respect to the extraction of biomedically-relevant data. In recent years, text mining of popular databases such as MEDLINE has evolved from basic term-searches to more sophisticated natural language processing techniques, indexing and retrieval methods, structural analysis and integration of literature with associated metadata. In this review, we will focus on Latent Semantic Indexing (LSI), a computational linguistics technique increasingly used for a variety of biological purposes. It is noted for its ability to consistently outperform benchmark Boolean text searches and co-occurrence models at information retrieval and its power to extract indirect relationships within a data set. LSI has been used successfully to formulate new hypotheses, generate novel connections from existing data, and validate empirical data.
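The core of LSI is a truncated SVD of the term-document matrix; documents are then compared in the resulting low-rank space, which is what captures the indirect relationships mentioned above. A minimal sketch with an invented toy corpus:

```python
import numpy as np

def lsi_embed(term_doc, k=2):
    """Latent Semantic Indexing: truncated SVD of the term-document
    matrix gives a k-dimensional vector per document."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return Vt[:k].T * s[:k]   # rows = documents in latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Documents that share vocabulary (or co-occur with shared vocabulary) end up close in the latent space even when they have no terms in common, which is why LSI outperforms Boolean matching at retrieval.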
SPONGY (SPam ONtoloGY): Email Classification Using Two-Level Dynamic Ontology
2014-01-01
Email is one of the most common communication methods between people on the Internet. However, the increase of email misuse/abuse has resulted in an increasing volume of spam emails over recent years. An experimental system was designed and implemented with the hypothesis that this method would outperform existing techniques, and the experimental results showed that the proposed ontology-based approach indeed improves spam filtering accuracy significantly. In this paper, two levels of ontology spam filters were implemented: a first-level global ontology filter and a second-level user-customized ontology filter. The global ontology filter filtered about 91% of spam, which is comparable with other methods. The user-customized ontology filter was created based on the specific user's background as well as the filtering mechanism used in the global ontology filter creation. The main contributions of the paper are (1) to introduce an ontology-based multilevel filtering technique that uses both a global ontology and an individual filter for each user to increase spam filtering accuracy and (2) to create a spam filter in the form of an ontology, which is user-customized, scalable, and modularized, so that it can be embedded in many other systems for better performance. PMID:25254240
NASA Astrophysics Data System (ADS)
Lorenzi, Juan M.; Stecher, Thomas; Reuter, Karsten; Matera, Sebastian
2017-10-01
Many problems in computational materials science and chemistry require the evaluation of expensive functions with locally rapid changes, such as the turn-over frequency of first principles kinetic Monte Carlo models for heterogeneous catalysis. Because of the high computational cost, it is often desirable to replace the original with a surrogate model, e.g., for use in coupled multiscale simulations. The construction of surrogates becomes particularly challenging in high-dimensions. Here, we present a novel version of the modified Shepard interpolation method which can overcome the curse of dimensionality for such functions to give faithful reconstructions even from very modest numbers of function evaluations. The introduction of local metrics allows us to take advantage of the fact that, on a local scale, rapid variation often occurs only across a small number of directions. Furthermore, we use local error estimates to weigh different local approximations, which helps avoid artificial oscillations. Finally, we test our approach on a number of challenging analytic functions as well as a realistic kinetic Monte Carlo model. Our method not only outperforms existing isotropic metric Shepard methods but also state-of-the-art Gaussian process regression.
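The classic isotropic Shepard interpolation that the authors generalise with local metrics is a simple inverse-distance-weighted average; this one-dimensional sketch is the baseline method, not the modified local-metric version proposed in the paper:

```python
import numpy as np

def shepard_interpolate(x_data, y_data, x_query, power=2.0, eps=1e-12):
    """Shepard (inverse-distance-weighted) interpolation: the surrogate
    value at a query point is a distance-weighted average of the known
    samples, with weights 1/d^power."""
    out = np.empty(len(x_query))
    for j, xq in enumerate(x_query):
        d = np.abs(x_data - xq)
        if d.min() < eps:                  # query coincides with a sample
            out[j] = y_data[d.argmin()]
            continue
        w = 1.0 / d ** power
        out[j] = np.sum(w * y_data) / np.sum(w)
    return out
```

The modified Shepard family replaces the global samples with local (often low-order polynomial) approximations around each data point; the paper's contribution is to learn an anisotropic local metric for the distance d, so that rapid variation along a few directions does not require exponentially many samples.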
Vokhidov, Husan; Hong, Hyung Gil; Kang, Jin Kyu; Hoang, Toan Minh; Park, Kang Ryoung
2016-12-16
Automobile driver information as displayed on marked road signs indicates the state of the road, traffic conditions, proximity to schools, etc. These signs are important to ensure the safety of the driver and pedestrians. They are also important input to the automated advanced driver assistance system (ADAS) installed in many automobiles. Over time, arrow-road markings may be eroded or otherwise damaged by automobile contact, making it difficult for the driver to correctly identify the marking. Failure to properly identify an arrow-road marker creates a dangerous situation that may result in traffic accidents or pedestrian injury. Very little research exists that studies the problem of automated identification of damaged arrow-road markings painted on the road. In this study, we propose a method that uses a convolutional neural network (CNN) to recognize six types of arrow-road markings, possibly damaged, by a visible light camera sensor. Experimental results with six databases (Road marking dataset, KITTI dataset, Málaga dataset 2009, Málaga urban dataset, Naver street view dataset, and Road/Lane detection evaluation 2013 dataset) show that our method outperforms conventional methods.
Su, Jin-He; Piao, Ying-Chao; Luo, Ze; Yan, Bao-Ping
2018-04-26
With the application of various data acquisition devices, large volumes of animal movement data can be used to label presence data in remote sensing images and predict species distribution. In this paper, a two-stage classification approach combining movement data and moderate-resolution remote sensing images was proposed. First, we introduced a new density-based clustering method to identify stopovers from migratory birds’ movement data and generated classification samples based on the clustering result. We split the remote sensing images into 16 × 16 patches and labeled them as positive samples if they overlapped with stopovers. Second, a multi-convolution neural network model was proposed for extracting the features from temperature data and remote sensing images, respectively. Then a Support Vector Machines (SVM) model was used to combine the features together and predict the classification results. The experimental analysis was carried out on public Landsat 5 TM images and a GPS dataset collected from 29 birds over three years. The results indicated that our proposed method outperforms the existing baseline methods and achieves good performance in habitat suitability prediction.
nala: text mining natural language mutation mentions
Cejuela, Juan Miguel; Bojchevski, Aleksandar; Uhlig, Carsten; Bekmukhametov, Rustem; Kumar Karn, Sanjeev; Mahmuti, Shpend; Baghudana, Ashish; Dubey, Ankit; Satagopam, Venkata P.; Rost, Burkhard
2017-01-01
Motivation: The extraction of sequence variants from the literature remains an important task. Existing methods primarily target standard (ST) mutation mentions (e.g. ‘E6V’), leaving relevant mentions in natural language (NL) largely untapped (e.g. ‘glutamic acid was substituted by valine at residue 6’). Results: We introduced three new corpora suggesting named-entity recognition (NER) to be more challenging than anticipated: 28–77% of all articles contained mentions only available in NL. Our new method nala captured NL and ST by combining conditional random fields with word embedding features learned unsupervised from the entire PubMed. In our hands, nala substantially outperformed the state-of-the-art. For instance, we compared all unique mentions in new discoveries correctly detected by any of three methods (SETH, tmVar, or nala). Neither SETH nor tmVar discovered anything missed by nala, while nala uniquely tagged 33% of mentions. For NL mentions the corresponding value shot up to 100% nala-only. Availability and Implementation: Source code, API and corpora freely available at: http://tagtog.net/-corpora/IDP4+. Contact: nala@rostlab.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28200120
NASA Astrophysics Data System (ADS)
Roberts, Brenden; Vidick, Thomas; Motrunich, Olexei I.
2017-12-01
The success of polynomial-time tensor network methods for computing ground states of certain quantum local Hamiltonians has recently been given a sound theoretical basis by Arad et al. [Commun. Math. Phys. 356, 65 (2017), 10.1007/s00220-017-2973-z]. The convergence proof, however, relies on "rigorous renormalization group" (RRG) techniques which differ fundamentally from existing algorithms. We introduce a practical adaptation of the RRG procedure which, while no longer theoretically guaranteed to converge, finds matrix product state ansatz approximations to the ground spaces and low-lying excited spectra of local Hamiltonians in realistic situations. In contrast to other schemes, RRG does not utilize variational methods on tensor networks. Rather, it operates on subsets of the system Hilbert space by constructing approximations to the global ground space in a treelike manner. We evaluate the algorithm numerically, finding similar performance to the density matrix renormalization group (DMRG) in the case of a gapped nondegenerate Hamiltonian. Even in challenging situations of criticality, large ground-state degeneracy, or long-range entanglement, RRG remains able to identify candidate states having large overlap with ground and low-energy eigenstates, outperforming DMRG in some cases.
Multiratio fusion change detection with adaptive thresholding
NASA Astrophysics Data System (ADS)
Hytla, Patrick C.; Balster, Eric J.; Vasquez, Juan R.; Neuroth, Robert M.
2017-04-01
A ratio-based change detection method known as multiratio fusion (MRF) is proposed and tested. The MRF framework builds on other change detection components proposed in this work: dual ratio (DR) and multiratio (MR). The DR method involves two ratios coupled with adaptive thresholds to maximize detected changes and minimize false alarms. The use of two ratios is shown to outperform the single ratio case when the means of the image pairs are not equal. MR change detection builds on the DR method by including negative imagery to produce four total ratios with adaptive thresholds. Inclusion of negative imagery is shown to improve detection sensitivity and to boost detection performance in certain target and background cases. MRF further expands this concept by fusing together the ratio outputs using a routine in which detections must be verified by two or more ratios to be classified as a true changed pixel. The proposed method is tested with synthetically generated test imagery and real datasets with results compared to other methods found in the literature. DR is shown to significantly outperform the standard single ratio method. MRF produces excellent change detection results that exhibit up to a 22% performance improvement over other methods from the literature at low false-alarm rates.
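A dual-ratio detector with adaptive thresholds, in the spirit of the DR component, can be sketched as follows; the mean-plus-k-sigma threshold and the constant k are illustrative assumptions, not the paper's exact adaptive rule:

```python
import numpy as np

def dual_ratio_change(img1, img2, k=3.0):
    """Dual-ratio change detection sketch: both pixel ratio images are
    thresholded adaptively at mean + k * std, and the two binary maps
    are OR-ed so that changes in either direction are caught."""
    eps = 1e-6                       # avoid division by zero
    r1 = (img1 + eps) / (img2 + eps)
    r2 = (img2 + eps) / (img1 + eps)
    detect = lambda r: r > r.mean() + k * r.std()
    return detect(r1) | detect(r2)
```

Using both ratios is what makes the detector symmetric to brightening and darkening changes; the MR and MRF stages extend this by adding ratios of negative imagery and fusing the resulting maps with a two-of-N verification rule.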
Huet, Michaël; Jacobs, David M; Camachon, Cyril; Goulon, Cedric; Montagne, Gilles
2009-12-01
This study (a) compares the effectiveness of different types of feedback for novices who learn to land a virtual aircraft in a fixed-base flight simulator and (b) analyzes the informational variables that learners come to use after practice. An extensive body of research exists concerning the informational variables that allow successful landing. In contrast, few studies have examined how the attention of pilots can be directed toward these sources of information. In this study, 15 participants were asked to land a virtual Cessna 172 on 245 trials while trying to follow the glide-slope area as accurately as possible. Three groups of participants practiced under different feedback conditions: with self-controlled concurrent feedback (the self-controlled group), with imposed concurrent feedback (the yoked group), or without concurrent feedback (the control group). The self-controlled group outperformed the yoked group, which in turn outperformed the control group. Removing or manipulating specific sources of information during transfer tests had different effects for different individuals. However, removing the cockpit from the visual scene had a detrimental effect on the performance of the majority of the participants. Self-controlled concurrent feedback helps learners to more quickly attune to the informational variables that allow them to control the aircraft during the approach phase. Knowledge concerning feedback schedules can be used for the design of optimal practice methods for student pilots, and knowledge about the informational variables used by expert performers has implications for the design of cockpits and runways that facilitate the detection of these variables.
Efficient sequential and parallel algorithms for record linkage
Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar
2014-01-01
Background and objective Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Methods Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Results Our sequential and parallel algorithms have been tested on a real dataset of 1 083 878 records and synthetic datasets ranging in size from 50 000 to 9 000 000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). Conclusions We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm. PMID:24154837
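The connected-components idea described above — link similar records with edges, then read clusters off the components — can be sketched with a union-find structure; the pairwise similarity function here is a toy stand-in for the edit-distance comparison used in the paper, and the all-pairs loop ignores the radix-sort and blocking speedups:

```python
class DisjointSet:
    """Union-find over record ids; connected components = linked clusters."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i
    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def link_records(records, similar):
    """Connect every pair of records judged similar, return the clusters."""
    ds = DisjointSet(len(records))
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similar(records[i], records[j]):
                ds.union(i, j)
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(ds.find(i), []).append(i)
    return sorted(clusters.values())
```

Transitivity comes for free: records i and k end up in one cluster whenever both resemble some j, even if i and k were never directly compared as similar.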
Critchfield, Thomas S
2010-01-01
A popular-press self-help manual is reviewed with an eye toward two issues. First, the popularity of such books documents the existence of considerable demand for technologies that address the everyday problems (in the present case, troublesome conversations) of nondisordered individuals. Second, many ideas invoked in popular-press books may be interpretable within an analysis of verbal behavior, although much more than casual translation is required to develop technologies that outperform self-help manuals. I discuss several challenges relevant to research, theory refinement, technology development, and dissemination, and conclude that behavioral alternatives to existing popular-press resources may not emerge anytime soon. PMID:22477467
Dazard, Jean-Eudes; Rao, J. Sunil
2010-01-01
The search for structures in real datasets, e.g. in the form of bumps, components, classes or clusters, is important, as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without pre-specifying their total number. A number of related methods already exist, yet are challenged in the context of high-dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high dimensionality (the p ≫ n case), while making minimal assumptions. The method is based upon a divide-and-conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive non-parametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer micro-array dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online. PMID:22399839
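The top-down peeling phase of PRIM, which the proposed strategy builds upon, can be sketched as follows; the peeling fraction alpha and support floor are the usual PRIM meta-parameters, set here to arbitrary illustrative values:

```python
import numpy as np

def prim_peel(X, y, alpha=0.1, min_support=0.05):
    """PRIM top-down peeling: repeatedly shave the alpha-fraction edge
    (along some dimension, from either side) whose removal yields the
    highest mean of y inside the remaining box."""
    box = [(X[:, d].min(), X[:, d].max()) for d in range(X.shape[1])]
    inside = np.ones(len(y), dtype=bool)
    while inside.mean() > min_support:
        best = None
        for d in range(X.shape[1]):
            vals = X[inside, d]
            for side, cut in (("lo", np.quantile(vals, alpha)),
                              ("hi", np.quantile(vals, 1 - alpha))):
                keep = inside & ((X[:, d] >= cut) if side == "lo"
                                 else (X[:, d] <= cut))
                if keep.mean() <= min_support or keep.sum() == inside.sum():
                    continue                  # too small, or no shrink
                m = y[keep].mean()
                if best is None or m > best[0]:
                    best = (m, d, side, cut, keep)
        if best is None:
            break
        _, d, side, cut, keep = best
        box[d] = (cut, box[d][1]) if side == "lo" else (box[d][0], cut)
        inside = keep
    return box, inside
```

Full PRIM follows peeling with a bottom-up pasting pass and then removes the found box to hunt for the next bump; the paper's contribution is estimating alpha and the support floor rather than fixing them by hand.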
Multigrid contact detection method
NASA Astrophysics Data System (ADS)
He, Kejing; Dong, Shoubin; Zhou, Zhaoyao
2007-03-01
Contact detection is a general problem of many physical simulations. This work presents an O(N) multigrid method for general contact detection problems (MGCD). The multigrid idea is integrated with contact detection problems. Both the time complexity and memory consumption of the MGCD are O(N). Unlike other methods, whose efficiencies are influenced strongly by the object size distribution, the performance of MGCD is insensitive to the object size distribution. We compare the MGCD with the no binary search (NBS) method and the multilevel boxing method in three dimensions for both time complexity and memory consumption. For objects of similar size, the MGCD is as good as the NBS method, and both outperform the multilevel boxing method regarding memory consumption. For objects of diverse size, the MGCD outperforms both the NBS method and the multilevel boxing method. We use the MGCD to solve the contact detection problem for a granular simulation system based on the discrete element method. From this granular simulation, we obtain the packing density of monosize packing and of binary packing with size ratio equal to 10. The packing density for monosize particles is 0.636. For binary packing with size ratio equal to 10, when the number of small particles is 300 times that of big particles, a maximal packing density of 0.824 is achieved.
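The cell-binning idea underlying grid-based O(N) contact detection can be shown for equal-sized discs; this is a single-level grid sketch, not the paper's multigrid scheme (whose point is precisely to handle diverse object sizes by using one grid level per size class):

```python
from collections import defaultdict

def grid_contacts(centers, radius):
    """Bin each disc centre into a square cell of side 2*radius, then
    test candidate pairs only within each cell's 3x3 neighbourhood.
    Returns the set of touching/overlapping pairs (i, j), i < j."""
    cell = 2.0 * radius
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(centers):
        grid[(int(x // cell), int(y // cell))].append(idx)
    contacts = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in members:
                    for j in grid.get((cx + dx, cy + dy), ()):
                        if i < j:
                            xi, yi = centers[i]
                            xj, yj = centers[j]
                            if (xi - xj) ** 2 + (yi - yj) ** 2 <= (2 * radius) ** 2:
                                contacts.add((i, j))
    return contacts
```

With a cell size matched to the object diameter, each object is tested against O(1) neighbours, giving the O(N) total cost; a single grid degrades when sizes differ widely, which is the regime the multigrid variant targets.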
Jackowski, Konrad; Krawczyk, Bartosz; Woźniak, Michał
2014-05-01
Currently, methods of combined classification are the focus of intense research. A properly designed group of combined classifiers exploiting knowledge gathered in a pool of elementary classifiers can successfully outperform a single classifier. There are two essential issues to consider when creating combined classifiers: how to establish the most comprehensive pool and how to design a fusion model that allows for taking full advantage of the collected knowledge. In this work, we address these issues and propose AdaSS+, a training algorithm dedicated to compound classifier systems that effectively exploits local specialization of the elementary classifiers. The effective training procedure consists of two phases. The first phase detects the classifier competencies and adjusts the respective fusion parameters. The second phase boosts classification accuracy by elevating the degree of local specialization. The quality of the proposed algorithm is evaluated on the basis of a wide range of computer experiments, which show that AdaSS+ can outperform the original method and several reference classifiers.
An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes.
Shao, Mingfu; Lin, Yu; Moret, Bernard M E
2015-05-01
Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this article, we propose an integer linear programming (ILP) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse, and rat genomes, where once again our method outperforms MSOAR.
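The linear-time case without duplicate genes that the abstract mentions can be made concrete via the adjacency graph: for two circular genomes over the same n genes, the DCJ distance is n minus the number of cycles. The sketch below covers only that easy case (single circular chromosomes, no duplicates); the paper's ILP is needed for the NP-hard duplicated-gene case, and the representation here is a common textbook convention, not the authors' code.

```python
# DCJ distance for two circular genomes over the same genes, no duplicates:
# d = n - c, where c is the number of cycles in the adjacency graph.
from collections import defaultdict

def adjacencies(genome):
    """genome: signed gene list read circularly, e.g. [1, -2, 3]."""
    def ends(g):  # (left extremity, right extremity) of a signed gene
        return ((g, 'tail'), (g, 'head')) if g > 0 else ((-g, 'head'), (-g, 'tail'))
    return [(ends(a)[1], ends(b)[0])
            for a, b in zip(genome, genome[1:] + genome[:1])]

def dcj_distance(genome_a, genome_b):
    # Every extremity has degree 2 (one adjacency per genome), so the graph
    # decomposes into cycles; count them as connected components.
    edges = defaultdict(list)
    for u, v in adjacencies(genome_a) + adjacencies(genome_b):
        edges[u].append(v)
        edges[v].append(u)
    seen, cycles = set(), 0
    for start in edges:
        if start in seen:
            continue
        cycles += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
    return len(genome_a) - cycles
```

For example, a single inversion ([1, 2, 3] versus [1, -2, 3]) merges two cycles into one and costs exactly one DCJ operation.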
Comparison of DNA preservation methods for environmental bacterial community samples
Gray, Michael A.; Pratte, Zoe A.; Kellogg, Christina A.
2013-01-01
Field collections of environmental samples (for example, corals) for molecular microbial analyses present distinct challenges. The lack of laboratory facilities in remote locations is common, and preservation of microbial community DNA for later study is critical. A particular challenge is keeping samples frozen in transit. Five nucleic acid preservation methods that do not require cold storage were compared for effectiveness over time and ease of use. Mixed microbial communities of known composition were created and preserved by DNAgard™, RNAlater®, DMSO–EDTA–salt (DESS), FTA® cards, and FTA Elute® cards. Automated ribosomal intergenic spacer analysis and clone libraries were used to detect specific changes in the faux communities over weeks and months of storage. A previously known bias in FTA® cards that results in lower recovery of pure cultures of Gram-positive bacteria was also detected in mixed community samples. There appears to be a uniform bias across all five preservation methods against microorganisms with high G + C DNA. Overall, the liquid-based preservatives (DNAgard™, RNAlater®, and DESS) outperformed the card-based methods. No single liquid method clearly outperformed the others, leaving method choice to be based on experimental design, field facilities, shipping constraints, and allowable cost.
Bayesian source term estimation of atmospheric releases in urban areas using LES approach.
Xue, Fei; Kikumoto, Hideki; Li, Xiaofeng; Ooka, Ryozo
2018-05-05
The estimation of source information from limited measurements of a sensor network is a challenging inverse problem, which can be viewed as an assimilation process between the observed concentration data and the predicted concentration data. When dealing with releases in built-up areas, the predicted data are generally obtained by the Reynolds-averaged Navier-Stokes (RANS) equations, which yield building-resolving results; however, RANS-based models are outperformed by large-eddy simulation (LES) in the prediction of both airflow and dispersion. Therefore, it is important to explore the possibility of improving the estimation of the source parameters by using the LES approach. In this paper, a novel source term estimation method is proposed based on the LES approach using Bayesian inference. The source-receptor relationship is obtained by solving the adjoint equations constructed using the time-averaged flow field simulated by the LES approach, based on the gradient diffusion hypothesis. A wind tunnel experiment with a constant point source downwind of a single building model is used to evaluate the performance of the proposed method, which is compared with that of an existing method using a RANS model. The results show that the proposed method reduces the errors in source location and release strength by 77% and 28%, respectively. Copyright © 2018 Elsevier B.V. All rights reserved.
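The Bayesian inversion step can be sketched independently of the flow solver: given a source-receptor matrix (one coefficient per sensor and candidate source cell, obtained from adjoint simulations in the paper), the posterior over location and strength follows from a Gaussian measurement likelihood. All numbers and names below are made up for illustration; this is a toy grid search, not the authors' implementation.

```python
# Bayesian source estimation over a discrete grid of candidate locations and
# strengths, with a Gaussian measurement likelihood and a flat prior, so the
# maximum a posteriori estimate is the best least-squares fit.
import math

def posterior_map(srr, observed, strengths, sigma=0.05):
    """srr[k][s]: concentration at sensor k per unit strength from candidate s."""
    n_candidates = len(srr[0])
    best, best_logp = None, -math.inf
    for s in range(n_candidates):
        for q in strengths:
            # Log-likelihood of the sensor readings under (location s, strength q).
            logp = sum(-((obs - q * srr[k][s]) ** 2) / (2 * sigma ** 2)
                       for k, obs in enumerate(observed))
            if logp > best_logp:
                best, best_logp = (s, q), logp
    return best

# Three sensors, four candidate source cells; ground truth: cell 2, strength 3.0.
srr = [[0.10, 0.40, 0.20, 0.05],
       [0.05, 0.10, 0.30, 0.20],
       [0.30, 0.05, 0.10, 0.40]]
observed = [3.0 * row[2] for row in srr]   # noise-free synthetic readings
strengths = [1.0, 2.0, 3.0, 4.0]
location, strength = posterior_map(srr, observed, strengths)
```

In practice the likelihood would include measurement noise and the grid over strength would be replaced by a continuous prior, but the assimilation structure is the same.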
Simultaneous Tensor Decomposition and Completion Using Factor Priors.
Chen, Yi-Lei; Hsu, Chiou-Ting Candy; Liao, Hong-Yuan Mark
2013-08-27
Tensor completion, which is a high-order extension of matrix completion, has generated a great deal of research interest in recent years. Given a tensor with incomplete entries, existing methods use either factorization or completion schemes to recover the missing parts. However, as the number of missing entries increases, factorization schemes may overfit the model because of incorrectly predefined ranks, while completion schemes may fail to interpret the model factors. In this paper, we introduce a novel concept: complete the missing entries and simultaneously capture the underlying model structure. To this end, we propose a method called Simultaneous Tensor Decomposition and Completion (STDC) that combines a rank minimization technique with Tucker model decomposition. Moreover, as the model structure is implicitly included in the Tucker model, we use factor priors, which are usually known a priori in real-world tensor objects, to characterize the underlying joint-manifold drawn from the model factors. We conducted experiments to empirically verify the convergence of our algorithm on synthetic data, and evaluate its effectiveness on various kinds of real-world data. The results demonstrate the efficacy of the proposed method and its potential usage in tensor-based applications. It also outperforms state-of-the-art methods on multilinear model analysis and visual data completion tasks.
Image Quality Assessment Based on Local Linear Information and Distortion-Specific Compensation.
Wang, Hanli; Fu, Jie; Lin, Weisi; Hu, Sudeng; Kuo, C-C Jay; Zuo, Lingxuan
2016-12-14
Image Quality Assessment (IQA) is a fundamental yet constantly developing task in computer vision and image processing. Most IQA evaluation mechanisms are based on how closely objective estimates agree with subjective judgments. Each image distortion type has its own property correlated with human perception. However, this intrinsic property may not be fully exploited by existing IQA methods. In this paper, we make two main contributions to the IQA field. First, a novel IQA method is developed based on a local linear model that examines the distortion between the reference and the distorted images for better alignment with human visual experience. Second, a distortion-specific compensation strategy is proposed to offset the negative effect on IQA modeling caused by different image distortion types. These score offsets are learned from several known distortion types. Furthermore, for an image with an unknown distortion type, a Convolutional Neural Network (CNN) based method is proposed to compute the score offset automatically. Finally, an integrated IQA metric is proposed by combining the aforementioned two ideas. Extensive experiments are performed to verify the proposed IQA metric and demonstrate that the local linear model is useful in human perception modeling, especially for individual image distortions, and that the overall IQA method outperforms several state-of-the-art IQA approaches.
Accurate and reproducible functional maps in 127 human cell types via 2D genome segmentation
Hardison, Ross C.
2017-01-01
Abstract The Roadmap Epigenomics Consortium has published whole-genome functional annotation maps in 127 human cell types by integrating data from studies of multiple epigenetic marks. These maps have been widely used for studying gene regulation in cell type-specific contexts and predicting the functional impact of DNA mutations on disease. Here, we present a new map of functional elements produced by applying a method called IDEAS on the same data. The method has several unique advantages and outperforms existing methods, including that used by the Roadmap Epigenomics Consortium. Using five categories of independent experimental datasets, we compared the IDEAS and Roadmap Epigenomics maps. While the overall concordance between the two maps is high, the maps differ substantially in the prediction details and in their consistency of annotation of a given genomic position across cell types. The annotation from IDEAS is uniformly more accurate than the Roadmap Epigenomics annotation and the improvement is substantial based on several criteria. We further introduce a pipeline that improves the reproducibility of functional annotation maps. Thus, we provide a high-quality map of candidate functional regions across 127 human cell types and compare the quality of different annotation methods in order to facilitate biomedical research in epigenomics. PMID:28973456
External Prior Guided Internal Prior Learning for Real-World Noisy Image Denoising
NASA Astrophysics Data System (ADS)
Xu, Jun; Zhang, Lei; Zhang, David
2018-06-01
Most existing image denoising methods learn image priors from either external data or the noisy image itself to remove noise. However, priors learned from external data may not be adaptive to the image to be denoised, while priors learned from the given noisy image may not be accurate due to the interference of the corrupting noise. Meanwhile, the noise in real-world noisy images is very complex and hard to describe by simple distributions such as the Gaussian, making real noisy image denoising a very challenging problem. We propose to exploit the information in both external data and the given noisy image, and develop an external prior guided internal prior learning method for real noisy image denoising. We first learn external priors from an independent set of clean natural images. With the aid of the learned external priors, we then learn internal priors from the given noisy image to refine the prior model. The external and internal priors are formulated as a set of orthogonal dictionaries to efficiently reconstruct the desired image. Extensive experiments are performed on several real noisy image datasets. The proposed method demonstrates highly competitive denoising performance, outperforming state-of-the-art denoising methods including those designed for real noisy images.
Vision-Aided RAIM: A New Method for GPS Integrity Monitoring in Approach and Landing Phase
Fu, Li; Zhang, Jun; Li, Rui; Cao, Xianbin; Wang, Jinling
2015-01-01
In the 1980s, Global Positioning System (GPS) receiver autonomous integrity monitoring (RAIM) was proposed to provide the integrity of a navigation system by checking the consistency of GPS measurements. However, during the approach and landing phase of a flight path, where low GPS visibility is common, the performance of the existing RAIM method may not meet the stringent aviation requirements for availability and integrity due to insufficient observations. To solve this problem, a new RAIM method, named vision-aided RAIM (VA-RAIM), is proposed for GPS integrity monitoring in the approach and landing phase. By introducing landmarks as pseudo-satellites, the VA-RAIM enriches the navigation observations to improve the performance of RAIM. In this method, a computer vision system photographs and matches these landmarks to obtain additional measurements for navigation. Nevertheless, a challenging issue is that such additional measurements may suffer from vision errors. To ensure the reliability of the vision measurements, a GPS-based calibration algorithm is presented to reduce the time-invariant part of the vision errors. Then, the calibrated vision measurements are integrated with the GPS observations for integrity monitoring. Simulation results show that the VA-RAIM outperforms the conventional RAIM with a higher level of availability and fault detection rate. PMID:26378533
A Class of Prediction-Correction Methods for Time-Varying Convex Optimization
NASA Astrophysics Data System (ADS)
Simonetto, Andrea; Mokhtari, Aryan; Koppel, Alec; Leus, Geert; Ribeiro, Alejandro
2016-09-01
This paper considers unconstrained convex optimization problems with time-varying objective functions. We propose algorithms with a discrete time-sampling scheme to find and track the solution trajectory based on prediction and correction steps, while sampling the problem data at a constant rate of $1/h$, where $h$ is the length of the sampling interval. The prediction step is derived by analyzing the iso-residual dynamics of the optimality conditions. The correction step adjusts for the distance between the current prediction and the optimizer at each time step, and consists of one or multiple gradient steps or Newton steps, which respectively correspond to the gradient trajectory tracking (GTT) or Newton trajectory tracking (NTT) algorithms. Under suitable conditions, we establish that the asymptotic error incurred by both proposed methods behaves as $O(h^2)$, and in some cases as $O(h^4)$, which outperforms the state-of-the-art $O(h)$ error bound of correction-only methods. Moreover, when the characteristics of the objective function variation are not available, we propose approximate gradient and Newton tracking algorithms (AGT and ANT, respectively) that still attain these asymptotic error bounds. Numerical simulations demonstrate the practical utility of the proposed methods and that they improve upon existing techniques by several orders of magnitude.
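The prediction-correction idea can be demonstrated on a scalar toy problem. For $f(x;t) = \tfrac{1}{2}(x - \sin t)^2$ the optimizer is $x^*(t) = \sin t$, and the prediction step derived from the optimality-condition dynamics reduces to $x^+ = x - h\,[\nabla_{xx}f]^{-1}\nabla_{tx}f = x + h\cos t$. The sketch below (toy values, not the paper's algorithms) compares one-gradient-step correction with and without that prediction.

```python
# Track x*(t) = argmin 0.5*(x - sin t)^2 = sin t, sampling every h seconds:
# predict along the optimality-condition dynamics, then take one gradient
# correction step at the new sample time. Correction-only is the baseline.
import math

def track(h=0.1, alpha=0.8, steps=300, predict=True):
    x, errors = 0.0, []
    for k in range(steps):
        t_next = (k + 1) * h
        if predict:
            x = x + h * math.cos(k * h)          # prediction step
        x = x - alpha * (x - math.sin(t_next))   # gradient correction step
        errors.append(abs(x - math.sin(t_next)))
    # Average tracking error after transients die out.
    return sum(errors[steps // 2:]) / (steps - steps // 2)

err_pc = track(predict=True)    # prediction-correction: error O(h^2)
err_co = track(predict=False)   # correction-only: error O(h)
```

On this example the prediction step shrinks the steady tracking error by roughly a factor of $h$, matching the $O(h^2)$ versus $O(h)$ bounds quoted in the abstract.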
Predictive functional control for active queue management in congested TCP/IP networks.
Bigdeli, N; Haeri, M
2009-01-01
Predictive functional control (PFC) is proposed as a new active queue management (AQM) method for dynamic TCP networks supporting explicit congestion notification (ECN). The ability of the controller to handle system delay, along with its simplicity and low computational load, makes PFC a well-suited AQM method for high-speed networks. Besides, considering the disturbance term (which represents model/process mismatches, external disturbances, and existing noise) in the control formulation adds some level of robustness to the PFC-AQM controller. This is an important and desired property in the control of dynamically varying computer networks. In this paper, the controller is designed based on a small-signal linearized fluid-flow model of TCP/AQM networks. Then, a closed-loop transfer function representation of the system is derived to analyze robustness with respect to the network and controller parameters. The analytical as well as the packet-level ns-2 simulation results show that the developed controller outperforms well-known AQM methods such as RED, PI, and REM (also simulated for comparison) for both queue regulation and resource utilization. Fast response, low queue fluctuations (and consequently low delay jitter), high link utilization, good disturbance rejection, scalability, and low packet marking probability are other features of the developed method.
Waytowich, Nicholas R.; Lawhern, Vernon J.; Bohannon, Addison W.; ...
2016-09-22
Recent advances in signal processing and machine learning techniques have enabled the application of Brain-Computer Interface (BCI) technologies to fields such as medicine, industry, and recreation; however, BCIs still suffer from the requirement of frequent calibration sessions due to the intra- and inter-individual variability of brain signals, which makes calibration suppression through transfer learning an area of increasing interest for the development of practical BCI systems. In this paper, we present an unsupervised transfer method (spectral transfer using information geometry, STIG), which ranks and combines unlabeled predictions from an ensemble of information geometry classifiers built on data from individual training subjects. The STIG method is validated in both off-line and real-time feedback analysis during a rapid serial visual presentation (RSVP) task. For detection of single-trial, event-related potentials (ERPs), the proposed method can significantly outperform existing calibration-free techniques as well as traditional within-subject calibration techniques when limited data is available. This method demonstrates that unsupervised transfer learning for single-trial detection in ERP-based BCIs can be achieved without the requirement of costly training data, representing a step forward in the overall goal of achieving a practical user-independent BCI system.
NASA Astrophysics Data System (ADS)
Yu, H.; Barriga, S.; Agurto, C.; Zamora, G.; Bauman, W.; Soliz, P.
2012-03-01
Retinal vasculature is one of the most important anatomical structures in digital retinal photographs. Accurate segmentation of retinal blood vessels is an essential task in automated analysis of retinopathy. This paper presents a new and effective vessel segmentation algorithm that features computational simplicity and fast implementation. The method uses morphological pre-processing to decrease the disturbance of bright structures and lesions before vessel extraction. Next, a vessel probability map is generated by computing the eigenvalues of the second derivatives of the Gaussian-filtered image at multiple scales. Then, second-order local entropy thresholding is applied to segment the vessel map. Lastly, a rule-based decision step, which measures the geometric shape difference between vessels and lesions, is applied to reduce false positives. The algorithm is evaluated on the low-resolution DRIVE and STARE databases and on the publicly available high-resolution image database from Friedrich-Alexander University Erlangen-Nuremberg (Germany). The proposed method achieved performance comparable to state-of-the-art unsupervised vessel segmentation methods at a faster speed on the DRIVE and STARE databases. For the high-resolution fundus image database, the proposed algorithm outperforms an existing approach in both performance and speed. This efficiency and robustness make the blood vessel segmentation method described here suitable for broad application in automated analysis of retinal images.
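The paper's thresholding step uses second-order local entropy on the co-occurrence matrix; as a simplified stand-in that shows the core idea, the sketch below implements the classical first-order maximum-entropy (Kapur) threshold on the histogram. The function and test data are illustrative, not the authors' pipeline.

```python
# First-order maximum-entropy (Kapur) thresholding: choose the threshold that
# maximizes the summed entropies of the background and foreground histograms.
# (A simplified stand-in for the second-order *local* entropy variant used in
# the paper, which works on the grey-level co-occurrence matrix.)
import math

def entropy_threshold(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    prob = [h / len(pixels) for h in hist]

    def entropy(ps):
        s = sum(ps)
        if s == 0:
            return 0.0
        return -sum((p / s) * math.log(p / s) for p in ps if p > 0)

    best_t, best_h = 0, -1.0
    for t in range(levels - 1):
        h = entropy(prob[:t + 1]) + entropy(prob[t + 1:])
        if h > best_h:
            best_t, best_h = t, h
    return best_t  # pixels <= best_t form one class
```

On a bimodal image (dark vessels on a bright background, or vice versa) the maximizer lands between the two modes, separating the vessel map from the background.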
A permutation-based non-parametric analysis of CRISPR screen data.
Jia, Gaoxiang; Wang, Xinlei; Xiao, Guanghua
2017-07-19
Clustered regularly interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single algorithm has gained wide acceptance. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level by permuting sgRNA labels, and thus avoids restrictive distributional assumptions. Although PBNPA is designed to analyze CRISPR data, it can also be applied to analyze genetic screens implemented with siRNAs or shRNAs and drug screens. We compared the performance of PBNPA with competing methods on simulated data as well as on real data. PBNPA outperformed recent methods designed for CRISPR screen analysis, as well as methods used for analyzing other functional genomics screens, in terms of Receiver Operating Characteristic (ROC) curves and False Discovery Rate (FDR) control for simulated data under various settings. Remarkably, the PBNPA algorithm showed better consistency and FDR control on published real data as well. PBNPA yields more consistent and reliable results than its competitors, especially when the data quality is low. An R package of PBNPA is available at: https://cran.r-project.org/web/packages/PBNPA/.
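The core idea of a gene-level permutation p-value can be shown in a few lines: compare a gene's mean sgRNA score with the distribution of means obtained by resampling sgRNA labels. This sketch is a hedged illustration only; the published PBNPA package adds score definitions, two-sided handling, and FDR control not reproduced here, and all data below are synthetic.

```python
# Gene-level p-values by permuting sgRNA labels: the observed gene score is a
# mean over its sgRNAs, and the null distribution comes from random re-draws
# of sgRNA labels, so no distributional assumption is needed.
import random

def permutation_pvalue(gene_scores, all_scores, n_perm=1000, rng=None):
    """gene_scores: fold-change scores of one gene's sgRNAs (depletion < 0).
    all_scores: scores of every sgRNA in the screen."""
    rng = rng or random.Random(0)
    observed = sum(gene_scores) / len(gene_scores)
    hits = sum(
        1 for _ in range(n_perm)
        if sum(rng.sample(all_scores, len(gene_scores))) / len(gene_scores) <= observed
    )
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

rng = random.Random(42)
background = [rng.gauss(0, 1) for _ in range(95)]   # unremarkable sgRNAs
essential = [-5.0] * 5                              # strongly depleted sgRNAs
p = permutation_pvalue(essential, background + essential, rng=random.Random(1))
```

A gene whose sgRNAs are all strongly depleted receives a p-value near the resolution limit of the permutation count, while genes with background-level scores do not.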
Missing value imputation for gene expression data by tailored nearest neighbors.
Faisal, Shahla; Tutz, Gerhard
2017-04-25
High-dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of missing values. Several approaches to imputation of missing values in gene expression data have been developed, but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes, the distance is computed over genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods such as nearest neighbors are applied in high-dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high-dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
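A minimal sketch of weighted nearest-neighbour imputation, assuming a genes-by-samples matrix: distances are computed only over co-observed entries, and neighbours contribute in proportion to inverse distance. The paper's method additionally restricts the distance to genes apt to contribute, which this sketch omits; the function name and toy matrix are illustrative.

```python
# Weighted nearest-neighbour imputation for an expression matrix (rows: genes,
# columns: samples, None = missing). Distances use only co-observed samples;
# the k closest genes are weighted by inverse distance.
def impute(matrix, k=2, eps=1e-8):
    completed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for g, other in enumerate(matrix):
                if g == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    d = (sum((a - b) ** 2 for a, b in shared) / len(shared)) ** 0.5
                    candidates.append((d, other[j]))
            neighbours = sorted(candidates)[:k]
            weights = [1.0 / (d + eps) for d, _ in neighbours]
            completed[i][j] = (sum(w * v for w, (_, v) in zip(weights, neighbours))
                               / sum(weights))
    return completed
```

A gene whose observed profile matches a neighbour exactly inherits that neighbour's value almost entirely, since its weight dominates the inverse-distance average.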
Ma, Jianzhu; Wang, Sheng
2015-01-01
Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possibilities of protein conformations. The de novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are certainly related to each other, it is challenging to exploit this dependency for prediction. Method. We present a method, AcconPred, for predicting solvent accessibility and contact number simultaneously, based on a shared-weight multitask learning framework under the CNF (conditional neural fields) model. The multitask learning framework on a collection of related tasks provides more accurate prediction than a framework trained only on a single task. The CNF method not only models the complex relationship between the input features and the predicted labels, but also exploits the interdependency among adjacent labels. Results. Trained on a set of 5729 monomeric soluble globular proteins, AcconPred reached 0.68 three-state accuracy for solvent accessibility and 0.75 correlation for contact number. Tested on the 105 CASP11 domain datasets for solvent accessibility, AcconPred reached 0.64 accuracy, which outperforms existing methods. PMID:26339631
Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction
Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian
2017-01-01
Abstract Motivation: Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and Implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28453681
Safo, Sandra E; Li, Shuzhao; Long, Qi
2018-03-01
Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach. © 2017, The International Biometric Society.
Barmpoutis, Angelos
2010-01-01
Registration of Diffusion-Weighted MR Images (DW-MRI) can be achieved by registering the corresponding 2nd-order Diffusion Tensor Images (DTI). However, it has been shown that higher-order diffusion tensors (e.g. order-4) outperform the traditional DTI in approximating complex fiber structures such as fiber crossings. In this paper we present a novel method for unbiased group-wise non-rigid registration and atlas construction of 4th-order diffusion tensor fields. To the best of our knowledge there is no other existing method that achieves this task. First we define a metric on the space of positive-valued functions based on the Riemannian metric of the real positive numbers (denoted by ℝ+). Then, we use this metric in a novel functional minimization method for non-rigid 4th-order tensor field registration. We define a cost function that accounts for the 4th-order tensor re-orientation during the registration process and has analytic derivatives with respect to the transformation parameters. Finally, the tensor field atlas is computed as the minimizer of the variance defined using the Riemannian metric. We quantitatively compare the proposed method with other techniques that register scalar-valued or diffusion tensor (rank-2) representations of the DW-MRI data. PMID:20436782
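The Riemannian metric on ℝ+ that the method builds on has a compact pointwise form: the geodesic distance is d(a, b) = |log a − log b|, and the Fréchet mean (the minimizer of summed squared geodesic distances, the per-point analogue of the variance-minimizing atlas) is the geometric mean. A minimal sketch, not the authors' field-level implementation:

```python
# Riemannian metric on the positive reals: d(a, b) = |log a - log b|.
# Its Frechet mean (argmin of summed squared geodesic distances) is the
# geometric mean -- the scalar analogue of the paper's atlas, computed as the
# variance minimizer of positive-valued tensor functions under this metric.
import math

def geodesic_distance(a, b):
    return abs(math.log(a) - math.log(b))

def frechet_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

Unlike the arithmetic mean, this mean respects the multiplicative structure of positive values: the mean of {x, 1/x} is always 1, so halving and doubling a diffusivity are treated symmetrically.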
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie
2016-06-15
Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein-RNA binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240,000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art sequence model, DeepBind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data are designed to be unstructured, RCK can still learn structural preferences from them. RCK significantly outperforms both RNAcontext and DeepBind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction in a small-scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/. Contact: bab@mit.edu. Supplementary data are available at Bioinformatics online.
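RCK's learned model is k-mer based; the abstract does not give implementation details, but the core idea of scoring a probe against learned k-mer weights can be sketched as follows. The function name and the sum-of-weights scoring rule are illustrative assumptions, not RCK's actual scoring function:

```python
def kmer_score(seq, kmer_weights, k):
    """Score an RNA probe by summing learned k-mer weights over every
    length-k window of the sequence (unseen k-mers contribute 0)."""
    return sum(kmer_weights.get(seq[i:i + k], 0.0)
               for i in range(len(seq) - k + 1))
```

A structure-aware model in the spirit of RCK would additionally weight each k-mer's contribution by the predicted structural context (e.g., paired vs. unpaired) of its position in the probe.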
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Yanjuan; College of Material Science and Engineering, Key Laboratory of Automobile Materials of Ministry of Education, Jilin University, 2699 Qianjin Street, Changchun 130012; Li, Nan, E-mail: lin@jlu.edu.cn
2015-05-15
Highlights: • Highly crystalline RuS₂ nanoparticles have been synthesized for the first time by a “one-step” hydrothermal method. • The product presents a pure cubic phase of stoichiometric RuS₂ with an average particle size of 14.8 nm. • RuS₂ nanoparticles were used as cathodic catalysts in a methanol fuel cell and hydrochloric acid electrolysis. • The catalyst outperforms commercial Pt/C in methanol tolerance and stability towards Cl⁻. - Abstract: Highly crystalline ruthenium sulfide (RuS₂) nanoparticles have been synthesized for the first time by a “one-step” hydrothermal method at 400 °C, using ruthenium chloride and thiourea as reactants. The products were characterized by powder X-ray diffraction (XRD), scanning electron microscopy/energy dispersive spectroscopy (SEM/EDS), thermogravimetric-differential thermal analysis (TG-DTA), transmission electron microscopy equipped with selected area electron diffraction (TEM/SAED), Fourier transform infrared spectroscopy (FT-IR), and X-ray photoelectron spectroscopy (XPS). The XRD result illustrates that the highly crystalline product presents a pure cubic phase of stoichiometric RuS₂ with an average particle size of 14.8 nm. SEM and TEM images show that the products have irregular shapes of 6–25 nm. XPS analysis indicates that the sulfur exists in the form of S₂²⁻. Cyclic voltammetry (CV), rotating disk electrode (RDE), chronoamperometry (CA), and electrochemical impedance spectroscopy (EIS) measurements were conducted to evaluate the electrocatalytic activity and stability of the highly crystalline RuS₂ nanoparticles in the oxygen reduction reaction (ORR) for methanol fuel cells and hydrochloric acid electrolysis. The results illustrate that RuS₂ is active towards the oxygen reduction reaction. Although the activity of RuS₂ is lower than that of Pt/C, the RuS₂ catalyst outperforms commercial Pt/C in methanol tolerance and stability towards Cl⁻.
Multiple-Beam Detection of Fast Transient Radio Sources
NASA Technical Reports Server (NTRS)
Thompson, David R.; Wagstaff, Kiri L.; Majid, Walid A.
2011-01-01
A method has been designed for using multiple independent stations to discriminate fast transient radio sources from local anomalies, such as antenna noise or radio frequency interference (RFI). This can improve the sensitivity of incoherent detection for geographically separated stations such as the very long baseline array (VLBA), the future square kilometer array (SKA), or any other coincident observations by multiple separated receivers. The transients are short, broadband pulses of radio energy, often just a few milliseconds long, emitted by a variety of exotic astronomical phenomena. They generally represent rare, high-energy events making them of great scientific value. For RFI-robust adaptive detection of transients, using multiple stations, a family of algorithms has been developed. The technique exploits the fact that the separated stations constitute statistically independent samples of the target. This can be used to adaptively ignore RFI events for superior sensitivity. If the antenna signals are independent and identically distributed (IID), then RFI events are simply outlier data points that can be removed through robust estimation such as a trimmed or Winsorized estimator. The alternative "trimmed" estimator is considered, which excises the strongest n signals from the list of short-beamed intensities. Because local RFI is independent at each antenna, this interference is unlikely to occur at many antennas on the same step. Trimming the strongest signals provides robustness to RFI that can theoretically outperform even the detection performance of the same number of antennas at a single site. This algorithm requires sorting the signals at each time step and dispersion measure, an operation that is computationally tractable for existing array sizes. An alternative uses the various stations to form an ensemble estimate of the conditional density function (CDF) evaluated at each time step. 
Both methods outperform standard detection strategies on a test sequence of VLBA data, and both are efficient enough for deployment in real-time, online transient detection applications.
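The trimmed estimator described above lends itself to a compact sketch: sort the per-station intensities at each time step, excise the strongest n (the likely single-station RFI outliers), and threshold the incoherent sum of the survivors. The array shapes and the simple fixed threshold are illustrative assumptions; the actual pipeline also sweeps dispersion measures:

```python
import numpy as np

def trimmed_detection(intensities, n_trim, threshold):
    """Robust multi-station detection: at each time step, discard the
    n_trim strongest per-station intensities (assumed RFI outliers)
    and threshold the incoherent sum of the remaining stations."""
    # intensities: array of shape (n_stations, n_steps)
    sorted_i = np.sort(intensities, axis=0)              # ascending per time step
    kept = sorted_i[:-n_trim] if n_trim else sorted_i    # drop the strongest n_trim
    return kept.sum(axis=0) > threshold                  # boolean detection per step
```

A genuine pulse appears at every station, so trimming one or two values barely reduces its summed intensity, while a spike present at only one station is removed entirely.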
RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination.
Mirzaei, Sajad; Wu, Yufeng
2017-04-01
Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, in the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes in the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting the information contained in the haplotype data about the underlying genealogy. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ to the inference of population demographic history from haplotypes, where it outperforms several existing methods. RENT+ is implemented in Java, and is freely available for download from https://github.com/SajadMirzaei/RentPlus. Contact: sajad@engr.uconn.edu or ywu@engr.uconn.edu. Supplementary data are available at Bioinformatics online.
Linden, Ariel
2017-08-01
When a randomized controlled trial is not feasible, health researchers typically use observational data and rely on statistical methods to adjust for confounding when estimating treatment effects. These methods generally fall into 3 categories: (1) estimators based on a model for the outcome using conventional regression adjustment; (2) weighted estimators based on the propensity score (ie, a model for the treatment assignment); and (3) "doubly robust" (DR) estimators that model both the outcome and propensity score within the same framework. In this paper, we introduce a new DR estimator that utilizes marginal mean weighting through stratification (MMWS) as the basis for weighted adjustment. This estimator may prove more accurate than existing treatment effect estimators because MMWS has been shown to be more accurate than other models when the propensity score is misspecified. We therefore compare the performance of this new estimator to other commonly used treatment effect estimators. Monte Carlo simulation is used to compare the DR-MMWS estimator to regression adjustment, 2 weighted estimators based on the propensity score, and 2 other DR methods. To assess performance under varied conditions, we vary the level of misspecification of the propensity score model as well as misspecify the outcome model. Overall, DR estimators generally outperform methods that model only one of the components (eg, propensity score or outcome). The DR-MMWS estimator outperforms all other estimators when both the propensity score and outcome models are misspecified, and performs as well as other DR estimators when only the propensity score is misspecified. Health researchers should consider using DR-MMWS as the principal evaluation strategy in observational studies, as this estimator appears to outperform other estimators in its class.
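The abstract does not spell out the DR-MMWS estimator itself, but the doubly robust idea it builds on can be illustrated with the standard augmented inverse-probability-weighted (AIPW) estimator, which combines an outcome model with propensity-score weighting. This is a generic textbook sketch, not Linden's MMWS-based estimator:

```python
import numpy as np

def aipw_ate(y, t, ps, mu1, mu0):
    """Augmented IPW estimate of the average treatment effect.
    y: outcomes; t: 0/1 treatment indicator; ps: estimated propensity
    scores P(t=1|x); mu1, mu0: outcome-model predictions under
    treatment and control. The estimate is consistent if EITHER the
    outcome model or the propensity model is correctly specified."""
    term1 = mu1 + t * (y - mu1) / ps          # augmented treated-arm mean
    term0 = mu0 + (1 - t) * (y - mu0) / (1 - ps)  # augmented control-arm mean
    return float(np.mean(term1 - term0))
```

DR-MMWS differs in that the propensity-based weights come from marginal mean weighting through stratification rather than from the raw inverse propensities used here.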
Bayesian model aggregation for ensemble-based estimates of protein pKa values
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gosink, Luke J.; Hogan, Emilie A.; Pulsipher, Trenton C.
2014-03-01
This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exists for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of assumptions that have inherent bias and sensitivities that can affect a model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study, with improvements of 40-70% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.
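The BMA aggregation step can be sketched as follows: weight each prediction method by its posterior probability given held-out measurements, then average the per-method predictions under those weights. The Gaussian likelihood, uniform model prior, and fixed sigma are simplifying assumptions; the paper's actual BMA implementation may differ:

```python
import numpy as np

def bma_weights(preds_train, y_train, sigma=1.0):
    """Posterior model weights from Gaussian likelihoods of each
    method's training-set residuals, with a uniform prior over models.
    preds_train: (n_models, n_train) predictions; y_train: measured values."""
    loglik = -0.5 * np.sum((preds_train - y_train) ** 2, axis=1) / sigma ** 2
    loglik -= loglik.max()          # subtract max for numerical stability
    w = np.exp(loglik)
    return w / w.sum()

def bma_predict(preds_new, weights):
    """BMA point estimate: the weight-averaged ensemble prediction.
    preds_new holds one prediction per model for the new case."""
    return float(weights @ preds_new)
```

Methods with small training residuals dominate the average, while poorly calibrated methods are effectively down-weighted rather than discarded.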
Context-Aware Local Binary Feature Learning for Face Recognition.
Duan, Yueqi; Lu, Jiwen; Feng, Jianjiang; Zhou, Jie
2018-05-01
In this paper, we propose a context-aware local binary feature learning (CA-LBFL) method for face recognition. Unlike existing learning-based local face descriptors such as discriminant face descriptor (DFD) and compact binary face descriptor (CBFD) which learn each feature code individually, our CA-LBFL exploits the contextual information of adjacent bits by constraining the number of shifts from different binary bits, so that more robust information can be exploited for face representation. Given a face image, we first extract pixel difference vectors (PDV) in local patches, and learn a discriminative mapping in an unsupervised manner to project each pixel difference vector into a context-aware binary vector. Then, we perform clustering on the learned binary codes to construct a codebook, and extract a histogram feature for each face image with the learned codebook as the final representation. In order to exploit local information from different scales, we propose a context-aware local binary multi-scale feature learning (CA-LBMFL) method to jointly learn multiple projection matrices for face representation. To make the proposed methods applicable for heterogeneous face recognition, we present a coupled CA-LBFL (C-CA-LBFL) method and a coupled CA-LBMFL (C-CA-LBMFL) method to reduce the modality gap of corresponding heterogeneous faces in the feature level, respectively. Extensive experimental results on four widely used face datasets clearly show that our methods outperform most state-of-the-art face descriptors.
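The first step of CA-LBFL, extracting pixel difference vectors (PDVs) in local patches, can be sketched as follows. The 8-neighbor layout and the radius parameter are illustrative assumptions about the patch structure, not the paper's exact sampling scheme:

```python
import numpy as np

def pixel_difference_vectors(img, r=1):
    """Extract a pixel difference vector (PDV) at every interior pixel:
    the differences between the 8 neighbors at radius r and the center."""
    h, w = img.shape
    offsets = [(-r, -r), (-r, 0), (-r, r), (0, -r),
               (0, r), (r, -r), (r, 0), (r, r)]
    pdvs = []
    for y in range(r, h - r):
        for x in range(r, w - r):
            center = img[y, x]
            pdvs.append([img[y + dy, x + dx] - center for dy, dx in offsets])
    return np.array(pdvs, dtype=float)
```

These raw difference vectors are what the learned mapping then projects into context-aware binary codes before codebook construction.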
SimBA: simulation algorithm to fit extant-population distributions.
Parida, Laxmi; Haiminen, Niina
2015-03-14
Simulation of populations with specified characteristics such as allele frequencies, linkage disequilibrium etc., is an integral component of many studies, including in-silico breeding optimization. Since the accuracy and sensitivity of population simulation are critical to the quality of the output of the applications that use them, accurate algorithms are required to provide a strong foundation to the methods in these studies. In this paper we present SimBA (Simulation using Best-fit Algorithm), a non-generative approach based on a combination of stochastic techniques and discrete methods. We optimize a hill climbing algorithm and extend the framework to include multiple subpopulation structures. Additionally, we show that SimBA is very sensitive to the input specifications, i.e., very similar but distinct input characteristics result in distinct outputs with high fidelity to the specified distributions. This property of the simulation is not explicitly modeled or studied by previous methods. We show that SimBA outperforms the existing population simulation methods, both in terms of accuracy as well as time-efficiency. Not only does it construct populations that meet the input specifications more stringently than other published methods, SimBA is also easy to use. It does not require explicit parameter adaptations or calibrations. Also, it can work with input specified as distributions, without an exemplar matrix or population as required by some methods. SimBA is available at http://researcher.ibm.com/project/5669 .
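The abstract mentions an optimized hill-climbing algorithm for fitting a population to specified characteristics. A minimal hill-climbing sketch for one such characteristic, target allele frequencies, might look like this; the single-bit-flip proposal and L1 error are illustrative assumptions, not SimBA's actual moves or objective:

```python
import numpy as np

def hill_climb_fit(pop, target_freq, iters=3000, seed=0):
    """Greedy hill climbing: propose single genotype bit-flips and keep
    a flip only if it moves the column-wise allele frequencies closer
    (in L1 distance) to the target frequencies."""
    rng = np.random.default_rng(seed)
    pop = pop.copy()
    n, m = pop.shape

    def err(p):
        return float(np.abs(p.mean(axis=0) - target_freq).sum())

    best = err(pop)
    for _ in range(iters):
        i, j = rng.integers(n), rng.integers(m)
        pop[i, j] ^= 1                  # flip one 0/1 genotype entry
        e = err(pop)
        if e < best:
            best = e                    # keep the improving flip
        else:
            pop[i, j] ^= 1              # revert a non-improving flip
    return pop, best
```

A full best-fit simulator would score several characteristics at once (allele frequencies, LD, subpopulation structure) inside the same accept/reject loop.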
BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation
2011-01-01
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be. PMID:21696594
deepNF: Deep network fusion for protein function prediction.
Gligorijevic, Vladimir; Barot, Meet; Bonneau, Richard
2018-06-01
The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. deepNF is freely available at: https://github.com/VGligorijevic/deepNF. vgligorijevic@flatironinstitute.org, rb133@nyu.edu. Supplementary data are available at Bioinformatics online.
Warmerdam, G; Vullings, R; Van Pul, C; Andriessen, P; Oei, S G; Wijn, P
2013-01-01
Non-invasive fetal electrocardiography (ECG) can be used for prolonged monitoring of the fetal heart rate (FHR). However, the signal-to-noise ratio (SNR) of non-invasive ECG recordings is often insufficient for reliable detection of the FHR. To overcome this problem, source separation techniques can be used to enhance the fetal ECG. This study uses a physiology-based source separation (PBSS) technique that has already been demonstrated to outperform widely used blind source separation techniques. Despite the relatively good performance of PBSS in enhancing the fetal ECG, PBSS is still susceptible to artifacts. In this study, an augmented PBSS technique is developed to reduce the influence of artifacts. The performance of the developed method is compared to PBSS on multi-channel non-invasive fetal ECG recordings. Based on this comparison, the developed method is shown to outperform PBSS for the enhancement of the fetal ECG.
Wide coverage biomedical event extraction using multiple partially overlapping corpora
2013-01-01
Background Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes. Results We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011. Conclusions The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora. PMID:23731785
2013-01-01
Background Differential gene expression (DGE) analysis is commonly used to reveal the deregulated molecular mechanisms of complex diseases. However, traditional DGE analysis (e.g., the t test or the rank sum test) tests each gene independently without considering interactions between them. Top-ranked differentially regulated genes prioritized by the analysis may not directly relate to the coherent molecular changes underlying complex diseases. Joint analyses of co-expression and DGE have been applied to reveal the deregulated molecular modules underlying complex diseases. Most of these methods consist of separate steps: first identifying gene-gene relationships under the studied phenotype and then integrating them with gene expression changes for prioritizing signature genes, or vice versa. A method is therefore warranted that can simultaneously consider gene-gene co-expression strength and corresponding expression level changes, so that both types of information can be leveraged optimally. Results In this paper, we develop a gene module based method for differential gene expression analysis, named network-based differential gene expression (nDGE) analysis, a one-step integrative process for prioritizing deregulated genes and grouping them into gene modules. We demonstrate that nDGE outperforms existing methods in prioritizing deregulated genes and discovering deregulated gene modules using simulated data sets. When tested on a series of smoker and non-smoker lung adenocarcinoma data sets, we show that the top differentially regulated genes identified by the rank sum test in different sets are not consistent, while the top-ranked genes defined by nDGE in different data sets significantly overlap. nDGE results suggest that a differentially regulated gene module, which is enriched for cell cycle related genes and E2F1 targeted genes, plays a role in the molecular differences between smoker and non-smoker lung adenocarcinoma.
Conclusions In this paper, we develop nDGE to prioritize deregulated genes and group them into gene modules by simultaneously considering gene expression level changes and gene-gene co-regulations. When applied to both simulated and empirical data, nDGE outperforms the traditional DGE method. More specifically, when applied to smoker and non-smoker lung cancer sets, nDGE results illustrate the molecular differences between smoker and non-smoker lung cancer. PMID:24341432
Li, Ben; Sun, Zhaonan; He, Qing; Zhu, Yu; Qin, Zhaohui S
2016-03-01
Modern high-throughput biotechnologies such as microarrays are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment, only a limited number of samples are assayed, leading to the classical 'large p, small n' problem. On the other hand, rapid propagation of these high-throughput technologies has resulted in a substantial collection of data, often carried out on the same platform and using the same protocol. It is highly desirable to utilize the existing data when performing analysis and inference on a new dataset. Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework, in which the repository of historical data can be exploited to build informative priors for use in new data analysis. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them in the problem of detecting differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods, including the popular state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference and presents a promising practical strategy for dealing with the 'large p, small n' problem. Our method is implemented in the R package IPBT, which is freely available from https://github.com/benliemory/IPBT. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.
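The idea of building informative priors from historical data can be sketched with a conjugate normal-normal update, where each gene's prior mean and variance are estimated from the historical repository and combined with the new sample's evidence. This is a simplified known-variance sketch, not the IPBT model itself:

```python
import numpy as np

def eb_posterior_mean(x_new, hist):
    """Shrink each gene's new-sample mean toward a prior estimated from
    historical data (empirical-Bayes, normal-normal conjugate update).
    x_new: (n_new, n_genes) new data; hist: (n_hist, n_genes) history."""
    mu0 = hist.mean(axis=0)             # prior mean per gene, from history
    tau2 = hist.var(axis=0, ddof=1)     # prior variance per gene
    n = x_new.shape[0]
    s2 = x_new.var(axis=0, ddof=1)      # within-sample variance
    xbar = x_new.mean(axis=0)
    # precision-weighted shrinkage: more new data -> less shrinkage
    w = (n / s2) / (n / s2 + 1.0 / tau2)
    return w * xbar + (1 - w) * mu0
```

With few new samples the posterior leans on the historical prior; as n grows, the weight w approaches 1 and the estimate approaches the new-sample mean, which is the behavior that makes historical priors attractive for 'large p, small n' data.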
Improving cerebellar segmentation with statistical fusion
NASA Astrophysics Data System (ADS)
Plassard, Andrew J.; Yang, Zhen; Prince, Jerry L.; Claassen, Daniel O.; Landman, Bennett A.
2016-03-01
The cerebellum is a somatotopically organized central component of the central nervous system, well known to be involved in motor coordination and with increasingly recognized roles in cognition and planning. Recent work in multi-atlas labeling has created methods that offer the potential for fully automated 3-D parcellation of the cerebellar lobules and vermis (which are organizationally equivalent to cortical gray matter areas). This work explores the trade-offs of using different statistical fusion techniques and post hoc optimizations in two datasets with distinct imaging protocols. We offer a novel fusion technique by extending the ideas of the Selective and Iterative Method for Performance Level Estimation (SIMPLE) to a patch-based performance model. We demonstrate the effectiveness of our algorithm, Non-Local SIMPLE, for segmentation of a mixed population of healthy subjects and patients with severe cerebellar anatomy. Under the first imaging protocol, we show that Non-Local SIMPLE outperforms previous gold-standard segmentation techniques. In the second imaging protocol, we show that Non-Local SIMPLE outperforms previous gold-standard techniques but is outperformed by a non-locally weighted vote with the deeper population of atlases available. This work advances the state of the art in open-source cerebellar segmentation algorithms and offers the opportunity for routinely including cerebellar segmentation in magnetic resonance imaging studies that acquire whole brain T1-weighted volumes with approximately 1 mm isotropic resolution.
Optimal lattice-structured materials
Messner, Mark C.
2016-07-09
This paper describes a method for optimizing the mesostructure of lattice-structured materials. These materials are periodic arrays of slender members resembling efficient, lightweight macroscale structures like bridges and frame buildings. Current additive manufacturing technologies can assemble lattice structures with length scales ranging from nanometers to millimeters. Previous work demonstrates that lattice materials have excellent stiffness- and strength-to-weight scaling, outperforming natural materials. However, there are currently no methods for producing optimal mesostructures that consider the full space of possible 3D lattice topologies. The inverse homogenization approach for optimizing the periodic structure of lattice materials requires a parameterized, homogenized material model describing the response of an arbitrary structure. This work develops such a model, starting with a method for describing the long-wavelength, macroscale deformation of an arbitrary lattice. The work combines the homogenized model with a parameterized description of the total design space to generate a parameterized model. Finally, the work describes an optimization method capable of producing optimal mesostructures. Several examples demonstrate the optimization method. One of these examples produces an elastically isotropic, maximally stiff structure, here called the isotruss, that arguably outperforms the anisotropic octet truss topology.
Koutsoukas, Alexios; Monaghan, Keith J; Li, Xiaoli; Huan, Jun
2017-06-28
In recent years, research in artificial neural networks has resurged, now under the deep-learning umbrella, and grown extremely popular. The recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was twofold: first, a large number of hyper-parameter configurations were explored to investigate how they affect the performance of DNNs and could serve as starting points when tuning DNNs; second, DNN performance was compared to popular methods widely employed in the field of cheminformatics, namely Naïve Bayes, k-nearest neighbor, random forest, and support vector machines. Moreover, the robustness of the machine learning methods to different levels of artificially introduced noise was assessed. The open-source Caffe deep-learning framework and modern NVidia GPU units were utilized to carry out this study, allowing a large number of DNN configurations to be explored. We show that feed-forward deep neural networks are capable of achieving strong classification performance and outperform shallow methods across diverse activity classes when optimized. Hyper-parameters found to play a critical role are the activation function, dropout regularization, number of hidden layers, and number of neurons. When compared to the other methods, tuned DNNs were found to statistically outperform them, with a p-value < 0.01 based on the Wilcoxon statistical test. On average, DNNs achieved an MCC 0.149 units higher than NB, 0.092 higher than kNN, 0.052 higher than SVM with a linear kernel, 0.021 higher than RF, and 0.009 higher than SVM with a radial basis function kernel.
When exploring robustness to noise, non-linear methods were found to perform well at low levels of noise (at or below 20%); at higher levels of noise (above 30%), however, the Naïve Bayes method performed well and, at the highest noise level of 50%, even outperformed more sophisticated methods across several datasets.
Fringe pattern demodulation with a two-dimensional digital phase-locked loop algorithm.
Gdeisat, Munther A; Burton, David R; Lalor, Michael J
2002-09-10
A novel technique called a two-dimensional digital phase-locked loop (DPLL) for fringe pattern demodulation is presented. This algorithm is more suitable for demodulation of fringe patterns with varying phase in two directions than the existing DPLL techniques that assume that the phase of the fringe patterns varies only in one direction. The two-dimensional DPLL technique assumes that the phase of a fringe pattern is continuous in both directions and takes advantage of the phase continuity; consequently, the algorithm has better noise performance than the existing DPLL schemes. The two-dimensional DPLL algorithm is also suitable for demodulation of fringe patterns with low sampling rates, and it outperforms the Fourier fringe analysis technique in this aspect.
Transductive multi-view zero-shot learning.
Fu, Yanwei; Hospedales, Timothy M; Xiang, Tao; Gong, Shaogang
2015-11-01
Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
Application of a New Resampling Method to SEM: A Comparison of S-SMART with the Bootstrap
ERIC Educational Resources Information Center
Bai, Haiyan; Sivo, Stephen A.; Pan, Wei; Fan, Xitao
2016-01-01
Among the commonly used resampling methods of dealing with small-sample problems, the bootstrap enjoys the widest applications because it often outperforms its counterparts. However, the bootstrap still has limitations when its operations are contemplated. Therefore, the purpose of this study is to examine an alternative, new resampling method…
Resampling and Distribution of the Product Methods for Testing Indirect Effects in Complex Models
ERIC Educational Resources Information Center
Williams, Jason; MacKinnon, David P.
2008-01-01
Recent advances in testing mediation have found that certain resampling methods and tests based on the mathematical distribution of 2 normal random variables substantially outperform the traditional "z" test. However, these studies have primarily focused only on models with a single mediator and 2 component paths. To address this limitation, a…
Abbas, Ahmed; Guo, Xianrong; Jing, Bing-Yi; Gao, Xin
2014-06-01
Despite significant advances in automated nuclear magnetic resonance-based protein structure determination, the high numbers of false positives and false negatives among the peaks selected by fully automated methods remain a problem. These false positives and negatives impair the performance of resonance assignment methods. One of the main reasons for this problem is that the computational research community often considers peak picking and resonance assignment to be two separate problems, whereas spectroscopists use expert knowledge to pick peaks and assign their resonances at the same time. We propose a novel framework that simultaneously conducts slice picking and spin system forming, an essential step in resonance assignment. Our framework then employs a genetic algorithm, directed by both connectivity information and amino acid typing information from the spin systems, to assign the spin systems to residues. The inputs to our framework can be as few as two commonly used spectra, i.e., CBCA(CO)NH and HNCACB. Unlike existing peak picking and resonance assignment methods that treat peaks as the units, our method is based on 'slices', which are one-dimensional vectors in three-dimensional spectra that correspond to certain ([Formula: see text]) values. Experimental results on both benchmark simulated data sets and four real protein data sets demonstrate that our method significantly outperforms the state-of-the-art methods while using fewer spectra than those methods. Our method is freely available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
Radar studies of the atmosphere using spatial and frequency diversity
NASA Astrophysics Data System (ADS)
Yu, Tian-You
This work provides results from a thorough investigation of atmospheric radar imaging including theory, numerical simulations, observational verification, and applications. The theory is generalized to include the existing imaging techniques of coherent radar imaging (CRI) and range imaging (RIM), which are shown to be special cases of three-dimensional imaging (3D Imaging). Mathematically, the problem of atmospheric radar imaging is posed as an inverse problem. In this study, the Fourier, Capon, and maximum entropy (MaxEnt) methods are proposed to solve the inverse problem. After the introduction of the theory, numerical simulations are used to test, validate, and exercise these techniques. Statistical comparisons of the three methods of atmospheric radar imaging are presented for various signal-to-noise ratios (SNRs), receiver configurations, and frequency samplings. The MaxEnt method is shown to generally possess the best performance for low SNR. The performance of the Capon method approaches the performance of the MaxEnt method for high SNR. In limited cases, the Capon method actually outperforms the MaxEnt method. The Fourier method generally tends to distort the model structure due to its limited resolution. Experimental justification of CRI and RIM is accomplished using the Middle and Upper (MU) Atmosphere Radar in Japan and the SOUnding SYstem (SOUSY) in Germany, respectively. A special application of CRI to the observation of polar mesosphere summer echoes (PMSE) is used to show direct evidence of wave steepening and possibly explain gravity wave variations associated with PMSE.
Peng, Wei; Wang, Jianxin; Cheng, Yingjiao; Lu, Yu; Wu, Fangxiang; Pan, Yi
2015-01-01
Prediction of essential proteins, which are crucial to an organism's survival, is important for disease analysis and drug design, as well as for the understanding of cellular life. The majority of prediction methods infer the possibility of proteins being essential by using the network topology. However, these methods are limited by the completeness of available protein-protein interaction (PPI) data and depend on the accuracy of the network. To overcome these limitations, some computational methods have been proposed. However, few of them take protein domains into consideration. In this work, we first analyze the correlation between the essentiality of proteins and their domain features based on data from 13 species. We find that proteins containing more protein domain types that rarely occur in other proteins tend to be essential. Accordingly, we propose a new prediction method, named UDoNC, that combines the domain features of proteins with their topological properties in the PPI network. In UDoNC, the essentiality of a protein is determined by the number and the frequency of its protein domain types, as well as by the essentiality of its adjacent edges measured by the edge clustering coefficient. The experimental results on S. cerevisiae data show that UDoNC outperforms other existing methods in terms of area under the curve (AUC). Additionally, UDoNC also performs well in predicting essential proteins on E. coli data.
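The edge clustering coefficient used above to weight adjacent edges can be sketched in a few lines. The following is a minimal illustration on a toy graph; the graph, node names, and the normalisation by min(deg(u), deg(v)) - 1 are illustrative assumptions, not necessarily the paper's exact formulation:

```python
def edge_clustering_coefficient(adj, u, v):
    """Edge clustering coefficient of edge (u, v): triangles actually
    containing the edge, divided by the maximum possible number,
    min(deg(u), deg(v)) - 1 (an illustrative normalisation)."""
    common = adj[u] & adj[v]                      # neighbours forming triangles with (u, v)
    possible = min(len(adj[u]), len(adj[v])) - 1  # upper bound on such triangles
    if possible <= 0:
        return 0.0
    return len(common) / possible

# Toy PPI-like graph as an adjacency-set dict (hypothetical data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

print(edge_clustering_coefficient(adj, "b", "c"))  # edge (b, c) closes two triangles
```

High values indicate edges embedded in dense local neighbourhoods, which is the topological signal the method combines with domain features.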
NDRC: A Disease-Causing Genes Prioritized Method Based on Network Diffusion and Rank Concordance.
Fang, Minghong; Hu, Xiaohua; Wang, Yan; Zhao, Junmin; Shen, Xianjun; He, Tingting
2015-07-01
Disease-causing gene prioritization is very important for understanding disease mechanisms and for biomedical applications such as drug design. Previous studies have shown that promising candidate genes are mostly ranked according to their relatedness to known disease genes or closely related disease genes. Therefore, a dangling gene (isolated gene) with no edges in the network cannot be effectively prioritized. These approaches tend to prioritize genes that are highly connected in the PPI network but perform poorly when applied to loosely connected disease genes. To address these problems, we propose a new disease-causing gene prioritization method based on network diffusion and rank concordance (NDRC). The method is evaluated by leave-one-out cross validation on 1931 diseases in which at least one gene is known to be involved, and it is able to rank the true causal gene first in 849 of all 2542 cases. The experimental results suggest that NDRC significantly outperforms other existing methods such as RWR, VAVIEN, DADA and PRINCE on identifying loosely connected disease genes and successfully puts dangling genes forward as potential candidate disease genes. Furthermore, we apply the NDRC method to study three representative diseases: Meckel syndrome 1, Protein C deficiency and Peroxisome biogenesis disorder 1A (Zellweger). Our study has also found that certain complex disease-causing genes can be divided into several modules that are closely associated with different disease phenotypes.
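The network diffusion ingredient mentioned above can be illustrated with a plain random walk with restart. This is a generic sketch on a hypothetical toy graph, not the NDRC algorithm itself; the restart probability and iteration count are arbitrary choices:

```python
def random_walk_with_restart(adj, seeds, restart=0.3, iters=100):
    """Network diffusion by random walk with restart on an undirected graph.
    adj: dict node -> set of neighbours; seeds: known disease genes."""
    nodes = sorted(adj)
    p = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p0 = dict(p)  # restart distribution concentrated on the seeds
    for _ in range(iters):
        nxt = {}
        for n in nodes:
            # mass flowing into n: each neighbour spreads its mass uniformly
            spread = sum(p[m] / len(adj[m]) for m in adj[n])
            nxt[n] = (1 - restart) * spread + restart * p0[n]
        p = nxt
    return p

# Toy gene network (hypothetical names and edges).
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g2", "g4"), ("g4", "g5")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

scores = random_walk_with_restart(adj, seeds={"g2"})
print(max(scores, key=scores.get))  # the seed itself ranks first
```

Because the walk matrix is column-stochastic, total probability mass is conserved, and genes close to the seeds accumulate the highest scores; dangling genes receive no mass, which is exactly the limitation the abstract describes.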
Local Higher-Order Graph Clustering
Yin, Hao; Benson, Austin R.; Leskovec, Jure; Gleich, David F.
2018-01-01
Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal motif conductance, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology. PMID:29770258
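Motif conductance for the triangle motif can be computed directly on small graphs. The sketch below uses a simplified version of the motif-volume normalisation (cut triangles over the smaller side's triangle end-point volume); the toy graph is hypothetical and the exact definition in MAPPR may differ in detail:

```python
from itertools import combinations

def triangle_motif_conductance(adj, S):
    """Triangle-motif conductance of node set S: number of triangles cut by
    the partition (S, V \\ S), over the smaller of the two sides' triangle
    end-point volumes (a simplified illustrative normalisation)."""
    nodes = sorted(adj)
    triangles = [t for t in combinations(nodes, 3)
                 if t[1] in adj[t[0]] and t[2] in adj[t[0]] and t[2] in adj[t[1]]]
    cut = sum(1 for t in triangles if 0 < sum(n in S for n in t) < 3)
    vol_S = sum(sum(n in S for n in t) for t in triangles)
    vol_rest = sum(sum(n not in S for n in t) for t in triangles)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else 0.0

# Two triangles joined by a bridge edge (hypothetical graph).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

print(triangle_motif_conductance(adj, {0, 1, 2}))  # no triangle crosses this cut
```

Cutting between the two triangles severs no triangle (conductance 0), whereas pulling node 3 into the seed set cuts the second triangle and raises the score, which is the quantity MAPPR minimises.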
Rectification of curved document images based on single view three-dimensional reconstruction.
Kang, Lai; Wei, Yingmei; Jiang, Jie; Bai, Liang; Lao, Songyang
2016-10-01
Since distortions in camera-captured document images significantly affect the accuracy of optical character recognition (OCR), distortion removal plays a critical role for document digitalization systems using a camera for image capturing. This paper proposes a novel framework that performs three-dimensional (3D) reconstruction and rectification of camera-captured document images. While most existing methods rely on additional calibrated hardware or multiple images to recover the 3D shape of a document page, or make a simple but not always valid assumption on the corresponding 3D shape, our framework is more flexible and practical since it only requires a single input image and is able to handle a general locally smooth document surface. The main contributions of this paper include a new iterative refinement scheme for baseline fitting from connected components of text line, an efficient discrete vertical text direction estimation algorithm based on convex hull projection profile analysis, and a 2D distortion grid construction method based on text direction function estimation using 3D regularization. In order to examine the performance of our proposed method, both qualitative and quantitative evaluation and comparison with several recent methods are conducted in our experiments. The experimental results demonstrate that the proposed method outperforms relevant approaches for camera-captured document image rectification, in terms of improvements on both visual distortion removal and OCR accuracy.
Li, Xinpeng; Li, Hong; Liu, Yun; Xiong, Wei; Fang, Sheng
2018-03-05
The release rate of atmospheric radionuclide emissions is a critical factor in the emergency response to nuclear accidents. However, there are unavoidable biases in radionuclide transport models, leading to inaccurate estimates. In this study, a method that simultaneously corrects these biases and estimates the release rate is developed. Our approach provides a more complete measurement-by-measurement correction of the biases with a coefficient matrix that considers both deterministic and stochastic deviations. This matrix and the release rate are jointly solved by the alternating minimization algorithm. The proposed method is generic because it does not rely on specific features of transport models or scenarios. It is validated against wind tunnel experiments that simulate accidental releases in a heterogeneous and densely built nuclear power plant site. The sensitivities to the position, number, and quality of measurements and the extensibility of the method are also investigated. The results demonstrate that this method effectively corrects the model biases, and therefore outperforms Tikhonov's method in both release rate estimation and model prediction. The proposed approach is robust to uncertainties and extensible with various center estimators, thus providing a flexible framework for robust source inversion in real accidents, even if large uncertainties exist in multiple factors. Copyright © 2017 Elsevier B.V. All rights reserved.
A special purpose knowledge-based face localization method
NASA Astrophysics Data System (ADS)
Hassanat, Ahmad; Jassim, Sabah
2008-04-01
This paper is concerned with face localization for a visual speech recognition (VSR) system. Face detection and localization have received a great deal of attention in recent years, because they are an essential pre-processing step in many techniques that handle or deal with faces (e.g. age, face, gender, race and visual speech recognition). We present an efficient method for localizing human faces in video images captured on constrained mobile devices, under a wide variation in lighting conditions. We use a multiphase method that may include all or some of the following steps, starting with image pre-processing, followed by a special-purpose edge detection, then an image refinement step. The output image is passed through a discrete wavelet decomposition procedure, and the computed LL sub-band at a certain level is transformed into a binary image that is scanned using a special template to select a number of possible candidate locations. Finally, we fuse the scores from the wavelet step with scores determined by color information for the candidate locations and employ a form of fuzzy logic to distinguish face from non-face locations. We present results of a large number of experiments to demonstrate that the proposed face localization method is efficient and achieves a high level of accuracy, outperforming existing general-purpose face detection methods.
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu
2009-01-01
Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
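The co-membership idea behind amalgamating multiple k-means runs can be sketched with a tiny ensemble. This toy version (1-D data, a plain k-means, raw co-occurrence counts) only illustrates the principle of varying the number of clusters and counting robust co-memberships; it is not the published MULTI-K algorithm or its entropy-plot:

```python
import random

def kmeans_1d(xs, k, iters=50):
    """Plain k-means on 1-D data; returns one cluster label per point."""
    centroids = random.sample(xs, k)  # initialise from distinct data points
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - centroids[j])) for x in xs]
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels

def co_membership(xs, ks, runs_per_k=10):
    """Ensemble of k-means runs with varying k; counts how often each pair
    of points lands in the same cluster (a co-membership matrix)."""
    n = len(xs)
    co = [[0] * n for _ in range(n)]
    for k in ks:
        for _ in range(runs_per_k):
            labels = kmeans_1d(xs, k)
            for i in range(n):
                for j in range(n):
                    if labels[i] == labels[j]:
                        co[i][j] += 1
    return co

random.seed(0)
xs = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]  # two well-separated groups (toy data)
co = co_membership(xs, ks=[2, 3])
print(co[0][3])  # points from different groups never co-cluster here
```

Pairs with consistently high co-membership across runs form robust clusters, while varying k probes cluster structure at several scales.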
Finding Dantzig Selectors with a Proximity Operator based Fixed-point Algorithm
2014-11-01
…experiments showed that this method usually outperforms the method in [2] in terms of CPU time while producing solutions of comparable quality. … method proposed in [19]. To alleviate the difficulty caused by the subproblem without a closed-form solution, a linearized ADM was proposed for the … a closed-form solution, but the β-related subproblem does not and is solved approximately by using the nonmonotone gradient method in [18]. …
Positive-unlabeled learning for disease gene identification
Yang, Peng; Li, Xiao-Li; Mei, Jian-Ping; Kwoh, Chee-Keong; Ng, See-Kiong
2012-01-01
Background: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. Result: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. Weighted support vector machines are then used to build a multi-level classifier based on the four training sets and positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. Conclusion: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as an unlabeled set U instead of a negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification.
Availability and implementation: The executable program and data are available at http://www1.i2r.a-star.edu.sg/∼xlli/PUDI/PUDI.html. Contact: xlli@i2r.a-star.edu.sg or yang0293@e.ntu.edu.sg Supplementary information: Supplementary Data are available at Bioinformatics online. PMID:22923290
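The partitioning of U into reliable/likely/weak subsets can be sketched with a simple distance-to-positive-centroid heuristic. This is only an illustration of the idea: the actual PUDI partitioning criteria differ, and the quartile thresholds, toy vectors and set sizes here are assumptions:

```python
def partition_unlabeled(P, U):
    """Split the unlabeled set U into likely-positive (LP), weak-negative (WN),
    likely-negative (LN) and reliable-negative (RN) subsets by distance to the
    centroid of the positive set P (quartile thresholds are illustrative)."""
    dim = len(P[0])
    centroid = [sum(p[i] for p in P) / len(P) for i in range(dim)]
    dist = lambda x: sum((a - b) ** 2 for a, b in zip(x, centroid)) ** 0.5
    ranked = sorted(U, key=dist)           # nearest to the positives first
    q = len(ranked) // 4
    return {"LP": ranked[:q], "WN": ranked[q:2*q],
            "LN": ranked[2*q:3*q], "RN": ranked[3*q:]}

# Hypothetical 2-D feature vectors for known disease genes (P) and unknowns (U).
P = [(1.0, 1.0), (1.2, 0.9)]
U = [(1.1, 1.0), (2.0, 2.0), (5.0, 5.0), (9.0, 9.0)]
parts = partition_unlabeled(P, U)
print(parts["LP"], parts["RN"])
```

A multi-level classifier can then be trained with different misclassification weights per subset, so that reliable negatives constrain the decision boundary more strongly than weak negatives.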
Comparison promotes learning and transfer of relational categories.
Kurtz, Kenneth J; Boukrina, Olga; Gentner, Dedre
2013-07-01
We investigated the effect of co-presenting training items during supervised classification learning of novel relational categories. Strong evidence exists that comparison induces a structural alignment process that renders common relational structure more salient. We hypothesized that comparisons between exemplars would facilitate learning and transfer of categories that cohere around a common relational property. The effect of comparison was investigated using learning trials that elicited a separate classification response for each item in presentation pairs that could be drawn from the same or different categories. This methodology ensures consideration of both items and invites comparison through an implicit same-different judgment inherent in making the two responses. In a test phase measuring learning and transfer, the comparison group significantly outperformed a control group receiving an equivalent training session of single-item classification learning. Comparison-based learners also outperformed the control group on a test of far transfer, that is, the ability to accurately classify items from a novel domain that was relationally alike, but surface-dissimilar, to the training materials. Theoretical and applied implications of this comparison advantage are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.
An Identity-Based Anti-Quantum Privacy-Preserving Blind Authentication in Wireless Sensor Networks.
Zhu, Hongfei; Tan, Yu-An; Zhu, Liehuang; Wang, Xianmin; Zhang, Quanxin; Li, Yuanzhang
2018-05-22
With the development of wireless sensor networks, IoT devices are crucial for the Smart City; these devices change people's lives through systems such as e-payment and e-voting. However, in these two systems, the state-of-the-art authentication protocols based on traditional number theory cannot defeat a quantum computer attack. In order to protect user privacy and guarantee the trustworthiness of big data, we propose a new identity-based blind signature scheme based on a number theory research unit (NTRU) lattice; this scheme mainly uses a rejection sampling theorem instead of constructing a trapdoor. Meanwhile, this scheme does not depend on a complex public key infrastructure and can resist quantum computer attacks. We then design an e-payment protocol using the proposed scheme. Furthermore, we prove our scheme is secure in the random oracle model and satisfies confidentiality, integrity, and non-repudiation. Finally, we demonstrate that the proposed scheme outperforms other existing traditional identity-based blind signature schemes in signing speed and verification speed, and outperforms other lattice-based blind signatures in signing speed, verification speed, and signing secret key size.
Wang, Shunfang; Liu, Shuhui
2015-12-19
An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position-specific scoring matrix (PSSM), are insufficient to represent protein sequences due to their single perspectives. Thus, this paper proposes two fusion feature representations, DipPSSM and PseAAPSSM, to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce balance factors to weight the importance of its components. The optimal values of the balance factors are sought by a genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find their important low-dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with a KNN classifier and cross-validation tests showed that, in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusion representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one.
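Of the representations mentioned, dipeptide composition (DipC) is the simplest to reproduce: a 400-dimensional vector of adjacent amino-acid pair frequencies. A minimal sketch (the short toy sequence is illustrative, and unknown residues are simply skipped):

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def dipeptide_composition(seq):
    """400-dimensional dipeptide composition: frequency of each ordered
    amino-acid pair among the len(seq) - 1 adjacent pairs."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = {p: 0 for p in pairs}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:        # skip pairs with non-standard residues
            counts[pair] += 1
    total = max(len(seq) - 1, 1)  # guard against length-1 sequences
    return [counts[p] / total for p in pairs]

vec = dipeptide_composition("ACDAC")  # toy sequence: pairs AC, CD, DA, AC
print(len(vec))  # 400
```

A fusion representation along the lines the abstract describes would then concatenate this vector with PSSM-derived features, scaled by learned balance factors.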
The Effect of Prior Knowledge and Gender on Physics Achievement
NASA Astrophysics Data System (ADS)
Stewart, John; Henderson, Rachel
2017-01-01
Gender differences on the Conceptual Survey in Electricity and Magnetism (CSEM) have been extensively studied. Ten semesters (N=1621) of CSEM data are presented, showing that male students outperform female students on the CSEM posttest by 5% (p < .001). Male students also outperform female students on qualitative in-semester test questions by 3% (p = .004), but no significant difference between male and female students was found on quantitative test questions. Male students enter the class with superior prior preparation in the subject and score 4% higher on the CSEM pretest (p < .001). If the sample is restricted to students with little prior knowledge who answer no more than 8 of the 32 questions correctly (N=822), male and female differences on the CSEM and qualitative test questions cease to be significant. This suggests that no intrinsic gender bias exists in the CSEM itself and that gender differences are the result of prior preparation as measured by CSEM pretest score. These gender differences increase with pretest score. Regression analyses are presented to further explore interactions between preparation, gender, and achievement.