ERIC Educational Resources Information Center
Kember, David
2016-01-01
One of the major current issues in education is the question of why Chinese and East Asian students are outperforming those from Western countries. Research into the approaches to learning of Chinese students revealed the existence of intermediate approaches, combining memorising and understanding, which were distinct from rote learning. At the…
Closing the Education Gap: A Mayo Clinic Approach to Academic Achievement.
ERIC Educational Resources Information Center
Sang, Herb A.
Despite recent efforts to provide equal education, agreement exists that blacks, females, and disadvantaged students as a group are outperformed in mathematics and science by white middle-class students. To help disadvantaged students, the Duval County Public Schools (Jacksonville, Florida) have developed a "Mayo Clinic" approach to…
Regoui, Chaouki; Durand, Guillaume; Belliveau, Luc; Léger, Serge
2013-01-01
This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and the cyclic permutations applied on the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach. PMID:23984392
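To make the randomized codeword assignment concrete, here is a minimal Python sketch of a key-seeded mapping from extended-ASCII characters to unique quaternary (ACGT) codewords. All names are hypothetical, and HyDEn's error-correcting Hamming codewords and cyclic permutations of the encoded message are not reproduced; this only illustrates the keyed-assignment idea.

```python
import random

BASES = "ACGT"

def quaternary_words(length=6):
    # all quaternary words of the given length (4**6 = 4096 >= 256 codewords needed)
    words = [""]
    for _ in range(length):
        words = [w + b for w in words for b in BASES]
    return words

def build_codebook(private_key, length=6):
    # key-seeded shuffle gives a randomized one-to-one assignment of codewords
    # to the 256 extended-ASCII characters
    rng = random.Random(private_key)
    words = quaternary_words(length)
    rng.shuffle(words)
    return {chr(i): words[i] for i in range(256)}

def encode(message, private_key):
    codebook = build_codebook(private_key)
    return "".join(codebook[ch] for ch in message)
```

Decoding inverts the codebook; without the private key the randomized assignment, and hence the message, cannot be recovered.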
Towards Accurate Node-Based Detection of P2P Botnets
2014-01-01
Botnets are a serious security threat to the current Internet infrastructure. In this paper, we propose a novel direction for P2P botnet detection called node-based detection. This approach focuses on the network characteristics of individual nodes. Based on our model, we examine each node's flows and extract useful features over a given time period. We have tested our approach on real-life data sets and achieved detection rates of 99-100% and low false-positive rates of 0-2%. Comparison with other similar approaches on the same data sets shows that our approach outperforms the existing approaches. PMID:25089287
Rational-operator-based depth-from-defocus approach to scene reconstruction.
Li, Ang; Staunton, Richard; Tjahjadi, Tardi
2013-09-01
This paper presents a rational-operator-based approach to depth from defocus (DfD) for the reconstruction of three-dimensional scenes from two-dimensional images, which enables fast DfD computation that is independent of scene textures. Two variants of the approach, one using the Gaussian rational operators (ROs) that are based on the Gaussian point spread function (PSF) and the second based on the generalized Gaussian PSF, are considered. A novel DfD correction method is also presented to further improve the performance of the approach. Experimental results are considered for real scenes and show that both approaches outperform existing RO-based methods.
Location Estimation of Urban Images Based on Geographical Neighborhoods
NASA Astrophysics Data System (ADS)
Huang, Jie; Lo, Sio-Long
2018-04-01
Estimating the location of an image is a challenging computer vision problem, and the recent decade has witnessed increasing research efforts towards the solution of this problem. In this paper, we propose a new approach to the location estimation of images taken in urban environments. Experiments are conducted to quantitatively compare the estimation accuracy of our approach against three representative approaches in the existing literature, using a recently published dataset of over 150 thousand Google Street View images and 259 user-uploaded images as queries. According to the experimental results, our approach outperforms the three baseline approaches and shows its robustness across different distance thresholds.
Classification of hyperspectral imagery with neural networks: comparison to conventional tools
NASA Astrophysics Data System (ADS)
Merényi, Erzsébet; Farrand, William H.; Taranik, James V.; Minor, Timothy B.
2014-12-01
Efficient exploitation of hyperspectral imagery is of great importance in remote sensing. Artificial intelligence approaches have been receiving favorable reviews for classification of hyperspectral data because the complexity of such data challenges the limitations of many conventional methods. Artificial neural networks (ANNs) have been shown to outperform traditional classifiers in many situations. However, studies that use the full spectral dimensionality of hyperspectral images to classify a large number of surface covers are scarce, if not non-existent. We advocate the need for methods that can handle the full dimensionality and a large number of classes to retain the discovery potential and the ability to discriminate classes with subtle spectral differences. We demonstrate that such a method exists in the family of ANNs. We compare the maximum likelihood, Mahalanobis distance, minimum distance, spectral angle mapper, and a hybrid ANN classifier for real hyperspectral AVIRIS data, using the full spectral resolution to map 23 cover types and using a small training set. Rigorous evaluation of the classification accuracies shows that the ANN outperforms the other methods and achieves ≈90% accuracy on test data.
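Of the conventional classifiers compared above, the spectral angle mapper is the simplest to state: each pixel is assigned to the reference spectrum with the smallest angle between spectra. A minimal sketch (hypothetical helper names, not the authors' code):

```python
import numpy as np

def sam_classify(cube, references):
    # cube: (rows, cols, bands) hyperspectral image; references: (n_classes, bands)
    h, w, b = cube.shape
    flat = cube.reshape(-1, b)
    norms = np.linalg.norm(flat, axis=1, keepdims=True) * np.linalg.norm(references, axis=1)
    cos = flat @ references.T / norms
    angles = np.arccos(np.clip(cos, -1.0, 1.0))   # spectral angle per pixel/class pair
    return angles.argmin(axis=1).reshape(h, w)    # label map
```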
Clone tag detection in distributed RFID systems.
Kamaludin, Hazalila; Mahdin, Hairulnizam; Abawajy, Jemal H
2018-01-01
Although Radio Frequency Identification (RFID) is poised to displace barcodes, security vulnerabilities pose serious challenges for global adoption of the RFID technology. Specifically, RFID tags are prone to basic cloning and counterfeiting security attacks. A successful cloning of RFID tags in many commercial applications can lead to serious problems such as financial losses, brand damage, and threats to public safety and health. With many industries, such as pharmaceuticals, deploying RFID technology across a variety of products, it is important to tackle the RFID tag cloning problem and improve the resistance of RFID systems. To this end, we propose an approach for detecting cloned RFID tags in RFID systems with high detection accuracy and minimal overhead, thus overcoming practical challenges in existing approaches. The proposed approach is based on the consistency of dual hash collisions and a modified count-min sketch vector. We evaluated the proposed approach through extensive experiments and compared it with existing baseline approaches in terms of execution time and detection accuracy under varying RFID tag cloning ratios. The results of the experiments show that the proposed approach outperforms the baseline approaches in cloned RFID tag detection accuracy. PMID:29565982
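As a rough illustration of the sketching side of this approach, here is a basic count-min sketch in Python. The paper's modified sketch vector and dual-hash-collision consistency check are not reproduced, and the flagging rule in the final comment is a simplification.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, tag_id, row):
        digest = hashlib.sha256(f"{row}:{tag_id}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, tag_id):
        for row in range(self.depth):
            self.table[row][self._index(tag_id, row)] += 1

    def estimate(self, tag_id):
        # taking the minimum over rows bounds the over-count from hash collisions
        return min(self.table[row][self._index(tag_id, row)]
                   for row in range(self.depth))

# simplified idea: a tag ID observed far more often than a single physical tag
# plausibly could be is flagged as a cloning suspect
# cms = CountMinSketch(); cms.add("EPC-0001"); cms.estimate("EPC-0001")
```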
View-Invariant Gait Recognition Through Genetic Template Segmentation
NASA Astrophysics Data System (ADS)
Isaac, Ebenezer R. H. P.; Elias, Susan; Rajagopalan, Srinivasan; Easwarakumar, K. S.
2017-08-01
The template-based model-free approach provides by far the most successful solution to the gait recognition problem in the literature. Recent work discusses how isolating the head and leg portions of the template increases the performance of a gait recognition system, making it robust against covariates such as clothing and carrying conditions. However, most methods involve a manual definition of the boundaries. The method we propose, genetic template segmentation (GTS), employs the genetic algorithm to automate the boundary selection process. This method was tested on the GEI, GEnI and AEI templates. GEI exhibits the best result when segmented with our approach. Experimental results show that our approach significantly outperforms the existing implementations of view-invariant gait recognition.
Jeong, Jeong-Won; Shin, Dae C; Do, Synho; Marmarelis, Vasilis Z
2006-08-01
This paper presents a novel segmentation methodology for automated classification and differentiation of soft tissues using multiband data obtained with the newly developed system of high-resolution ultrasonic transmission tomography (HUTT) for imaging biological organs. This methodology extends and combines two existing approaches: the L-level set active contour (AC) segmentation approach and the agglomerative hierarchical kappa-means approach for unsupervised clustering (UC). To prevent the trapping of the current iterative minimization AC algorithm in a local minimum, we introduce a multiresolution approach that applies the level set functions at successively increasing resolutions of the image data. The resulting AC clusters are subsequently rearranged by the UC algorithm that seeks the optimal set of clusters yielding the minimum within-cluster distances in the feature space. The presented results from Monte Carlo simulations and experimental animal-tissue data demonstrate that the proposed methodology outperforms other existing methods without depending on heuristic parameters and provides a reliable means for soft tissue differentiation in HUTT images.
EMUDRA: Ensemble of Multiple Drug Repositioning Approaches to Improve Prediction Accuracy.
Zhou, Xianxiao; Wang, Minghui; Katsyv, Igor; Irie, Hanna; Zhang, Bin
2018-04-24
Availability of large-scale genomic, epigenetic and proteomic data in complex diseases makes it possible to objectively and comprehensively identify therapeutic targets that can lead to new therapies. The Connectivity Map has been widely used to explore novel indications of existing drugs. However, the prediction accuracy of existing methods, such as the Kolmogorov-Smirnov statistic, remains low. Here we present a novel high-performance drug repositioning approach that improves over the state-of-the-art methods. We first designed an expression-weighted cosine method (EWCos) to minimize the influence of uninformative expression changes and then developed an ensemble approach termed EMUDRA (Ensemble of Multiple Drug Repositioning Approaches) to integrate EWCos and three existing state-of-the-art methods. EMUDRA significantly outperformed the individual drug repositioning methods when applied to simulated and independent evaluation datasets. Using EMUDRA, we predicted and experimentally validated the antibiotic rifabutin as an inhibitor of cell growth in triple-negative breast cancer. EMUDRA can identify drugs that more effectively target disease gene signatures and will thus be a useful tool for identifying novel therapies for complex diseases and predicting new indications for existing drugs. The EMUDRA R package is available at doi:10.7303/syn11510888. bin.zhang@mssm.edu or zhangb@hotmail.com. Supplementary data are available at Bioinformatics online.
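The expression-weighted cosine idea can be sketched as a weighted cosine similarity between a disease signature and a drug-induced signature. The weighting scheme in the comment below (weights from the magnitude of expression change) is our assumption for illustration, not the published EWCos definition.

```python
import numpy as np

def weighted_cosine(disease_sig, drug_sig, weights):
    # disease_sig, drug_sig: per-gene expression-change vectors; weights: per-gene weights
    num = np.sum(weights * disease_sig * drug_sig)
    den = (np.sqrt(np.sum(weights * disease_sig ** 2))
           * np.sqrt(np.sum(weights * drug_sig ** 2)))
    return num / den

# assumed weighting: emphasize genes with large, informative expression changes
# weights = np.abs(disease_sig) / np.abs(disease_sig).sum()
```

A reversal-based repositioning scheme would then rank drugs by how strongly negative this score is.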
Tweaked residual convolutional network for face alignment
NASA Astrophysics Data System (ADS)
Du, Wenchao; Li, Ke; Zhao, Qijun; Zhang, Yi; Chen, Hu
2017-08-01
We propose a novel Tweaked Residual Convolutional Network approach for face alignment with a two-level convolutional network architecture. Specifically, the first-level Tweaked Convolutional Network (TCN) module predicts the landmarks quickly, and accurately enough to serve as a preliminary estimate, by taking a low-resolution version of the detected face holistically as the input. The following Residual Convolutional Networks (RCN) module progressively refines each landmark by taking as input the local patch extracted around the predicted landmark, which allows the Convolutional Neural Network (CNN) to extract local shape-indexed features to fine-tune the landmark position. Extensive evaluations show that the proposed Tweaked Residual Convolutional Network approach outperforms existing methods.
Pooling across cells to normalize single-cell RNA sequencing data with many zero counts.
Lun, Aaron T L; Bach, Karsten; Marioni, John C
2016-04-27
Normalization of single-cell RNA sequencing data is necessary to eliminate cell-specific biases prior to downstream analyses. However, this is not straightforward for noisy single-cell data where many counts are zero. We present a novel approach where expression values are summed across pools of cells, and the summed values are used for normalization. Pool-based size factors are then deconvolved to yield cell-based factors. Our deconvolution approach outperforms existing methods for accurate normalization of cell-specific biases in simulated data. Similar behavior is observed in real data, where deconvolution improves the relevance of results of downstream analyses.
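A stripped-down version of the pooling idea: estimate a size factor for each random pool of cells against an average reference profile, then deconvolve per-cell factors by least squares. This is a simplified sketch of the published deconvolution strategy, not the scran implementation.

```python
import numpy as np

def deconvolved_size_factors(counts, pool_size=5, n_pools=500, seed=0):
    # counts: genes x cells matrix of raw counts; n_pools should exceed the
    # number of cells for a well-determined system
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[1]
    ref = counts.mean(axis=1)                        # pseudo-cell reference profile
    A = np.zeros((n_pools, n_cells))
    b = np.zeros(n_pools)
    for k in range(n_pools):
        pool = rng.choice(n_cells, size=pool_size, replace=False)
        pooled = counts[:, pool].sum(axis=1)         # summing suppresses zero counts
        keep = ref > 0
        b[k] = np.median(pooled[keep] / ref[keep])   # pool-based size factor
        A[k, pool] = 1.0                             # sum of cell factors ~ pool factor
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s / s.mean()                              # cell-based factors, unit mean
```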
NASA Astrophysics Data System (ADS)
Mansourian, Leila; Taufik Abdullah, Muhamad; Nurliyana Abdullah, Lili; Azman, Azreen; Mustaffa, Mas Rina
2017-02-01
Pyramid Histogram of Words (PHOW) combines Bag of Visual Words (BoVW) with spatial pyramid matching (SPM) in order to add location information to the extracted features. However, existing PHOW variants are extracted from various color spaces without treating color information individually, which means they discard color information, an important characteristic of any image that is motivated by human vision. This article concatenates the PHOW Multi-Scale Dense Scale Invariant Feature Transform (MSDSIFT) histogram with a proposed color histogram to improve the performance of existing image classification algorithms. Performance evaluation on several datasets proves that the new approach outperforms other existing, state-of-the-art methods.
Gene regulatory network identification from the yeast cell cycle based on a neuro-fuzzy system.
Wang, B H; Lim, J W; Lim, J S
2016-08-30
Many studies exist for reconstructing gene regulatory networks (GRNs). In this paper, we propose a method based on an advanced neuro-fuzzy system for gene regulatory network reconstruction from microarray time-series data. This approach uses a neural network with a weighted fuzzy function to model the relationships between genes. Fuzzy rules, which determine the regulators of genes, are greatly simplified by this method. Additionally, a regulator selection procedure is proposed, which extracts the exact dynamic relationship between genes using the information obtained from the weighted fuzzy function. Time-series-related features are extracted from the original data to exploit the characteristics of temporal data that are useful for accurate GRN reconstruction. The microarray dataset of the yeast cell cycle was used for our study. We measured the mean squared prediction error for the efficiency of the proposed approach and evaluated the accuracy in terms of precision, sensitivity, and F-score. The proposed method outperformed the other existing approaches.
Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.
Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying
2018-01-01
Several factors contribute to individual variability in postoperative pain; therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only correlations and have not subjected the statistical model to further tests to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for analgesic consumption prediction. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions of anesthesiologists and medical specialists and those of the computational approach on an independent test data set of 60 patients further evidenced the superiority of the computational approach in predicting analgesic consumption, as it produced markedly lower root mean squared errors.
Walking on a user similarity network towards personalized recommendations.
Gan, Mingxin
2014-01-01
Personalized recommender systems have been receiving more and more attention in addressing the serious problem of information overload accompanying the rapid evolution of the world-wide-web. Although traditional collaborative filtering approaches based on similarities between users have achieved remarkable success, it has been shown that the existence of popular objects may adversely influence the correct scoring of candidate objects, which leads to unreasonable recommendation results. Meanwhile, recent advances have demonstrated that approaches based on diffusion and random walk processes exhibit superior performance over collaborative filtering methods in both recommendation accuracy and diversity. Building on these results, we adopt three strategies (power-law adjustment, nearest neighbor, and threshold filtration) to adjust a user similarity network derived from user similarity scores calculated on historical data, and then propose a random walk with restart model on the constructed network to achieve personalized recommendations. We perform cross-validation experiments on two real data sets (MovieLens and Netflix) and compare the performance of our method against the existing state-of-the-art methods. Results show that our method outperforms existing methods in not only recommendation accuracy and diversity, but also retrieval performance. PMID:25489942
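A minimal random-walk-with-restart routine on a user similarity network, sketched here with hypothetical names; the paper's three network-adjustment strategies are applied to W beforehand and are not shown.

```python
import numpy as np

def random_walk_with_restart(W, user, restart=0.15, tol=1e-8):
    # W: adjusted user similarity matrix; user: index of the target user
    P = W / W.sum(axis=0, keepdims=True)          # column-normalized transitions
    e = np.zeros(W.shape[0]); e[user] = 1.0
    p = e.copy()
    while True:
        p_next = (1 - restart) * P @ p + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next                          # proximity of every user to the target
        p = p_next

# items can then be scored for the target user by summing the proximities of the
# users who collected each candidate item
```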
Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study
Gascuel, Olivier
2017-01-01
Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
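In its simplest single-parameter form, the regression-ABC loop reduces to simulating from the prior, summarizing each simulated phylogeny, and regressing the parameter on the summary statistics with LASSO. The sketch below assumes user-supplied simulate, summarize, and prior-sampling functions (hypothetical names):

```python
import numpy as np
from sklearn.linear_model import LassoCV

def regression_abc(simulate, summarize, sample_prior, observed_tree, n_sims=10000):
    # simulate(theta) -> phylogeny; summarize(tree) -> summary-statistic vector
    thetas = np.array([sample_prior() for _ in range(n_sims)])
    stats = np.array([summarize(simulate(t)) for t in thetas])
    # LASSO performs variable selection over the many summary statistics
    model = LassoCV(cv=5).fit(stats, thetas)
    return model.predict(np.asarray(summarize(observed_tree)).reshape(1, -1))[0]
```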
Cross-View Action Recognition via Transferable Dictionary Learning.
Zheng, Jingjing; Jiang, Zhuolin; Chellappa, Rama
2016-05-01
Discriminative appearance features are effective for recognizing actions in a fixed view, but may not generalize well to a new view. In this paper, we present two effective approaches to learn dictionaries for robust action recognition across views. In the first approach, we learn a set of view-specific dictionaries where each dictionary corresponds to one camera view. These dictionaries are learned simultaneously from sets of correspondence videos taken at different views with the aim of encouraging each video in the set to have the same sparse representation. In the second approach, we additionally learn a common dictionary shared by different views to model view-shared features. This approach represents the videos in each view using a view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. The learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labeled videos exist in the target view. Extensive experiments using three public datasets demonstrate that the proposed approach outperforms recently developed approaches for cross-view action recognition.
A new distributed systems scheduling algorithm: a swarm intelligence approach
NASA Astrophysics Data System (ADS)
Haghi Kashani, Mostafa; Sarvizadeh, Raheleh; Jameii, Mahdi
2011-12-01
The scheduling problem in distributed systems is known to be NP-complete, and methods based on heuristic or metaheuristic search have been proposed to obtain optimal and suboptimal solutions. Task scheduling is a key factor for distributed systems to gain better performance. In this paper, an efficient method based on a memetic algorithm is developed to solve the problem of distributed systems scheduling. With regard to load balancing, the Artificial Bee Colony (ABC) algorithm has been applied as the local search in the proposed memetic algorithm. The proposed method has been compared to an existing memetic-based approach in which the Learning Automata method is used as the local search. The results demonstrate that the proposed method outperforms the aforementioned method in terms of communication cost.
Body-Earth Mover's Distance: A Matching-Based Approach for Sleep Posture Recognition.
Xu, Xiaowei; Lin, Feng; Wang, Aosen; Hu, Yu; Huang, Ming-Chun; Xu, Wenyao
2016-10-01
Sleep posture is a key component in sleep quality assessment and pressure ulcer prevention. Body pressure analysis is currently a popular method for sleep posture recognition. In this paper, a matching-based approach, Body-Earth Mover's Distance (BEMD), for sleep posture recognition is proposed. BEMD treats pressure images as weighted 2D shapes, and combines EMD and Euclidean distance for similarity measurement. In contrast with existing work, sleep posture recognition is achieved with posture similarity rather than multiple features for specific postures. A pilot study was performed with 14 persons and six different postures. The experimental results show that the proposed BEMD achieves 91.21% accuracy, which outperforms the previous method with an improvement of 8.01%.
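A simplified sketch of the distance: treat each pressure image as a weighted 2D shape, approximate the EMD term with 1D Wasserstein distances on the row and column marginals (an assumption made here to keep the sketch short; the paper computes EMD on the 2D shapes), and blend with a Euclidean term.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def bemd(img_a, img_b, alpha=0.5):
    # img_a, img_b: 2D body-pressure maps of equal shape
    a, b = img_a / img_a.sum(), img_b / img_b.sum()
    rows, cols = np.arange(a.shape[0]), np.arange(a.shape[1])
    emd = (wasserstein_distance(rows, rows, a.sum(axis=1), b.sum(axis=1))
           + wasserstein_distance(cols, cols, a.sum(axis=0), b.sum(axis=0)))
    euclid = np.linalg.norm(a - b)
    return alpha * emd + (1 - alpha) * euclid   # smaller = more similar postures

# recognition: assign a new pressure image to the posture template with minimal bemd()
```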
An artificial bioindicator system for network intrusion detection.
Blum, Christian; Lozano, José A; Davidson, Pedro Pinacho
An artificial bioindicator system is developed in order to solve a network intrusion detection problem. The system, inspired by an ecological approach to biological immune systems, evolves a population of agents that learn to survive in their environment. An adaptation process allows the transformation of the agent population into a bioindicator that is capable of reacting to system anomalies. Two characteristics stand out in our proposal. On the one hand, it is able to discover new, previously unseen attacks, and on the other hand, contrary to most of the existing systems for network intrusion detection, it does not need any previous training. We experimentally compare our proposal with three state-of-the-art algorithms and show that it outperforms the competing approaches on widely used benchmark data.
Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs
Gómez-Adorno, Helena; Sidorov, Grigori; Pinto, David; Vilariño, Darnes; Gelbukh, Alexander
2016-01-01
We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest-path walks over integrated syntactic graphs and apply them to determine the authors of documents. On average, our method outperforms the state-of-the-art approaches and gives consistently high results across different corpora, unlike existing methods. Our results show that our textual patterns are useful for the task of authorship attribution. PMID:27589740
A comparison of regional flood frequency analysis approaches in a simulation framework
NASA Astrophysics Data System (ADS)
Ganora, D.; Laio, F.
2016-07-01
Regional frequency analysis (RFA) is a well-established methodology to provide an estimate of the flood frequency curve at ungauged (or scarcely gauged) sites. Different RFA approaches exist, depending on the way the information is transferred to the site of interest, but it is not clear in the literature whether a specific method systematically outperforms the others. The aim of this study is to provide a framework for carrying out this intercomparison by building a virtual environment based on synthetically generated data. The considered regional approaches include: (i) a unique regional curve for the whole region; (ii) a multiple-region model where homogeneous subregions are determined through cluster analysis; (iii) a Region-of-Influence model which defines a homogeneous subregion for each site; (iv) a spatially smooth estimation procedure where the parameters of the regional model vary continuously in space. Virtual environments are generated considering different patterns of heterogeneity, including step changes and smooth variations. If the region is heterogeneous, with the parent distribution changing continuously within the region, the spatially smooth regional approach outperforms the others, with overall errors 10-50% lower than the other methods. In the case of a step change, the spatially smooth and clustering procedures perform similarly if the heterogeneity is moderate, while clustering procedures work better when the step change is severe. To extend our findings, an extensive sensitivity analysis has been performed to investigate the effects of sample length, number of virtual stations, return period of the predicted quantile, variability of the scale parameter of the parent distribution, number of predictor variables, and parent distribution. Overall, the spatially smooth approach appears to be the most robust, as its performance is more stable across different patterns of heterogeneity, especially when short records are considered.
Fast Object Motion Estimation Based on Dynamic Stixels.
Morales, Néstor; Morell, Antonio; Toledo, Jonay; Acosta, Leopoldo
2016-07-28
The stixel world is a simplification of the world in which obstacles are represented as vertical instances, called stixels, standing on a surface assumed to be planar. In this paper, previous approaches for stixel tracking are extended using a two-level scheme. In the first level, stixels are tracked by matching them between frames using a bipartite graph in which edges represent a matching cost function. Then, stixels are clustered into sets representing objects in the environment. These objects are matched based on the number of stixels paired inside them. Furthermore, a faster, but less accurate approach is proposed in which only the second level is used. Several configurations of our method are compared to an existing state-of-the-art approach to show how our methodology outperforms it in several areas, including an improvement in the quality of the depth reconstruction.
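The first-level matching step can be illustrated with a min-cost bipartite assignment between the stixels of consecutive frames; the cost function below (weighted column and depth differences) is a stand-in for the one used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_stixels(prev, curr, w_col=1.0, w_depth=0.5):
    # prev, curr: (n, 2) arrays of stixels as (image column, depth)
    cost = (w_col * np.abs(prev[:, 0, None] - curr[None, :, 0])
            + w_depth * np.abs(prev[:, 1, None] - curr[None, :, 1]))
    rows, cols = linear_sum_assignment(cost)    # optimal bipartite matching
    return list(zip(rows, cols))                # pairs (prev index, curr index)
```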
A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.
Hart, Emma; Sim, Kevin
2016-01-01
We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
RootGraph: a graphic optimization tool for automated image analysis of plant roots
Cai, Jinhai; Zeng, Zhanghui; Connor, Jason N.; Huang, Chun Yuan; Melino, Vanessa; Kumar, Pankaj; Miklavcic, Stanley J.
2015-01-01
This paper outlines a numerical scheme for accurate, detailed, and high-throughput image analysis of plant roots. In contrast to existing root image analysis tools that focus on root system-average traits, a novel, fully automated and robust approach for the detailed characterization of root traits, based on a graph optimization process is presented. The scheme, firstly, distinguishes primary roots from lateral roots and, secondly, quantifies a broad spectrum of root traits for each identified primary and lateral root. Thirdly, it associates lateral roots and their properties with the specific primary root from which the laterals emerge. The performance of this approach was evaluated through comparisons with other automated and semi-automated software solutions as well as against results based on manual measurements. The comparisons and subsequent application of the algorithm to an array of experimental data demonstrate that this method outperforms existing methods in terms of accuracy, robustness, and the ability to process root images under high-throughput conditions. PMID:26224880
Towards an Automated Acoustic Detection System for Free Ranging Elephants.
Zeppelzauer, Matthias; Hensman, Sean; Stoeger, Angela S
The human-elephant conflict is one of the most serious conservation problems in Asia and Africa today. The involuntary confrontation of humans and elephants claims the lives of many animals and humans every year. A promising approach to alleviate this conflict is the development of an acoustic early warning system. Such a system requires the robust automated detection of elephant vocalizations under unconstrained field conditions. Today, no system exists that fulfills these requirements. In this paper, we present a method for the automated detection of elephant vocalizations that is robust to the diverse noise sources present in the field. We evaluate the method on a dataset recorded under natural field conditions to simulate a real-world scenario. The proposed method outperformed existing approaches and robustly and accurately detected elephants. It thus can form the basis for a future automated early warning system for elephants. Furthermore, the method may be a useful tool for scientists in bioacoustics for the study of wildlife recordings.
A Graph-Embedding Approach to Hierarchical Visual Word Mergence.
Wang, Lei; Liu, Lingqiao; Zhou, Luping
2017-02-01
Appropriately merging visual words is an effective dimension-reduction method for the bag-of-visual-words model in image classification. The approach of hierarchically merging visual words has been extensively employed, because it gives a fully determined merging hierarchy. Existing supervised hierarchical merging methods take different approaches and realize the merging process with various formulations. In this paper, we propose a unified hierarchical merging approach built upon the graph-embedding framework. Our approach is able to merge visual words for any scenario where a preferred structure and an undesired structure are defined, and can therefore effectively attend to all kinds of requirements for the word-merging process. In terms of computational efficiency, we show that our algorithm can seamlessly integrate a fast search strategy developed in our previous work and thus maintain the state-of-the-art merging speed. To the best of our knowledge, the proposed approach is the first to address hierarchical visual word mergence in such a flexible and unified manner. As demonstrated, it can maintain excellent image classification performance even after a significant dimension reduction, and outperforms all the existing comparable visual word-merging methods. In a broad sense, our work provides an open platform for applying, evaluating, and developing new criteria for hierarchical word-merging tasks.
Link-Based Similarity Measures Using Reachability Vectors
Yoon, Seok-Ho; Kim, Ji-Soo; Ryu, Minsoo; Choi, Ho-Jin
2014-01-01
We present a novel approach for computing link-based similarities among objects accurately by utilizing the link information pertaining to the objects involved. We discuss the problems with previous link-based similarity measures and propose a novel approach for computing link-based similarities that does not suffer from these problems. In the proposed approach, each target object is represented by a vector. Each element of the vector corresponds to one of the objects in the given data, and the value of each element denotes the weight for the corresponding object. For this weight value, we propose to utilize the probability of reaching the specific object from the target object, computed using the “Random Walk with Restart” strategy. Then, we define the similarity between two objects as the cosine similarity of the two vectors. In this paper, we provide examples to show that our approach does not suffer from the aforementioned problems. We also evaluate the performance of the proposed methods in comparison with existing link-based measures, qualitatively and quantitatively, with respect to two kinds of data sets, scientific papers and Web documents. Our experimental results indicate that the proposed methods significantly outperform the existing measures. PMID:24701188
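In closed form, the reachability vector of object i is the stationary random-walk-with-restart distribution started from i, and the similarity is the cosine of two such vectors. A direct, dense sketch for small networks (hypothetical names):

```python
import numpy as np

def reachability_vectors(W, restart=0.15):
    # column i = probabilities of reaching each object from object i under RWR,
    # via the closed form p = c * (I - (1-c)P)^(-1) e_i
    n = W.shape[0]
    P = W / W.sum(axis=0, keepdims=True)
    return restart * np.linalg.inv(np.eye(n) - (1 - restart) * P)

def similarity(R, i, j):
    ri, rj = R[:, i], R[:, j]
    return ri @ rj / (np.linalg.norm(ri) * np.linalg.norm(rj))
```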
Drug-target interaction prediction: A Bayesian ranking approach.
Peska, Ladislav; Buza, Krisztian; Koller, Júlia
2017-12-01
In silico prediction of drug-target interactions (DTI) could provide valuable information and speed up the process of drug repositioning - finding novel usage for existing drugs. In our work, we focus on machine learning algorithms supporting the drug-centric repositioning approach, which aims to find novel usage for existing or abandoned drugs. We aim at proposing a per-drug ranking-based method, which reflects the needs of drug-centric repositioning research better than conventional drug-target prediction approaches. We propose Bayesian Ranking Prediction of Drug-Target Interactions (BRDTI). The method is based on Bayesian Personalized Ranking matrix factorization (BPR), which has been shown to be an excellent approach for various preference learning tasks; however, it has not previously been used for DTI prediction. In order to successfully deal with DTI challenges, we extended BPR by proposing: (i) the incorporation of target bias, (ii) a technique to handle new drugs and (iii) content alignment to take structural similarities of drugs and targets into account. Evaluation on five benchmark datasets shows that BRDTI outperforms several state-of-the-art approaches in terms of per-drug nDCG and AUC. BRDTI results w.r.t. nDCG are 0.929, 0.953, 0.948, 0.897 and 0.690 for G-Protein Coupled Receptors (GPCR), Ion Channels (IC), Nuclear Receptors (NR), Enzymes (E) and Kinase (K) datasets respectively. Additionally, BRDTI significantly outperformed other methods (BLM-NII, WNN-GIP, NetLapRLS and CMF) w.r.t. nDCG in 17 out of 20 cases. Furthermore, BRDTI was also shown to be able to predict novel drug-target interactions not contained in the original datasets. The average recall at top-10 predicted targets for each drug was 0.762, 0.560, 1.000 and 0.404 for the GPCR, IC, NR, and E datasets respectively. Based on the evaluation, we can conclude that BRDTI is an appropriate choice for researchers looking for an in silico DTI prediction technique to be used in drug-centric repositioning scenarios. BRDTI software and supplementary materials are available online at www.ksi.mff.cuni.cz/∼peska/BRDTI. Copyright © 2017 Elsevier B.V. All rights reserved.
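The BPR core that BRDTI builds on can be written as a single stochastic gradient step on a (drug, interacting target, non-interacting target) triple; BRDTI's target bias, new-drug handling, and content alignment are omitted from this sketch.

```python
import numpy as np

def bpr_step(D, T, d, t_pos, t_neg, lr=0.01, reg=0.01):
    # D: drug latent factors, T: target latent factors (rows = entities)
    du, tp, tn = D[d].copy(), T[t_pos].copy(), T[t_neg].copy()
    margin = du @ (tp - tn)                  # prefer interacting over non-interacting
    g = 1.0 / (1.0 + np.exp(margin))         # gradient of -log sigmoid(margin)
    D[d]     += lr * (g * (tp - tn) - reg * du)
    T[t_pos] += lr * (g * du - reg * tp)
    T[t_neg] += lr * (-g * du - reg * tn)

# after training, targets for drug d are ranked by the scores D[d] @ T.T
```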
Staley, Dennis M.; Negri, Jacquelyn; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2017-01-01
Early warning of post-fire debris-flow occurrence during intense rainfall has traditionally relied upon a library of regionally specific empirical rainfall intensity–duration thresholds. Development of this library and the calculation of rainfall intensity-duration thresholds often require several years of monitoring local rainfall and hydrologic response to rainstorms, a time-consuming approach where results are often only applicable to the specific region where data were collected. Here, we present a new, fully predictive approach that utilizes rainfall, hydrologic response, and readily available geospatial data to predict rainfall intensity–duration thresholds for debris-flow generation in recently burned locations in the western United States. Unlike the traditional approach to defining regional thresholds from historical data, the proposed methodology permits the direct calculation of rainfall intensity–duration thresholds for areas where no such data exist. The thresholds calculated by this method are demonstrated to provide predictions that are of similar accuracy, and in some cases outperform, previously published regional intensity–duration thresholds. The method also provides improved predictions of debris-flow likelihood, which can be incorporated into existing approaches for post-fire debris-flow hazard assessment. Our results also provide guidance for the operational expansion of post-fire debris-flow early warning systems in areas where empirically defined regional rainfall intensity–duration thresholds do not currently exist.
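Rainfall intensity-duration thresholds are commonly written as a power law I = alpha * D**beta, and a warning check compares measured intensity against the threshold for the storm's duration. In the approach above the threshold parameters are predicted from rainfall, hydrologic-response, and geospatial data; in this sketch they are simply inputs, and the example values are hypothetical.

```python
def exceeds_id_threshold(intensity_mm_h, duration_h, alpha, beta):
    # power-law intensity-duration threshold: I_threshold = alpha * D**beta
    return intensity_mm_h >= alpha * duration_h ** beta

# e.g., a 30-minute burst at 24 mm/h against hypothetical parameters:
# exceeds_id_threshold(24.0, 0.5, alpha=12.0, beta=-0.7)
```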
SpliceRover: Interpretable Convolutional Neural Networks for Improved Splice Site Prediction.
Zuallaert, Jasper; Godin, Fréderic; Kim, Mijung; Soete, Arne; Saeys, Yvan; De Neve, Wesley
2018-06-21
During the last decade, improvements in high-throughput sequencing have generated a wealth of genomic data. Functionally interpreting these sequences and finding the biological signals that are hallmarks of gene function and regulation is currently mostly done using automated genome annotation platforms, which mainly rely on integrated machine learning frameworks to identify different functional sites of interest, including splice sites. Splicing is an essential step in the gene regulation process, and the correct identification of splice sites is a major cornerstone of a genome annotation system. In this paper, we present SpliceRover, a predictive deep learning approach that outperforms the state of the art in splice site prediction. SpliceRover uses convolutional neural networks (CNNs), which have been shown to obtain cutting-edge performance on a wide variety of prediction tasks. We adapted this approach to deal with genomic sequence inputs, and show that it consistently outperforms existing approaches, with relative improvements in prediction effectiveness of up to 80.9% when measured in terms of false discovery rate. However, a major criticism of CNNs concerns their "black box" nature, as mechanisms to obtain insight into their reasoning processes are limited. To facilitate interpretability of the SpliceRover models, we introduce an approach to visualize the biologically relevant information learnt. We show that our visualization approach is able to recover features known to be important for splice site prediction (binding motifs around the splice site, presence of polypyrimidine tracts and branch points), as well as reveal new features (e.g., several types of exclusion patterns near splice sites). SpliceRover is available as a web service. The prediction tool and instructions can be found at http://bioit2.irc.ugent.be/splicerover/. Supplementary materials are available at Bioinformatics online.
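As an illustration of adapting a CNN to genomic sequence inputs, here is a minimal 1-D convolutional classifier over one-hot encoded DNA in PyTorch. This is not the published SpliceRover architecture; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class SpliceCNN(nn.Module):
    # one-hot DNA input: 4 channels (A, C, G, T) x sequence length
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(4, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):                    # x: (batch, 4, seq_len)
        return torch.sigmoid(self.net(x)).squeeze(-1)   # P(splice site)
```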
Prediction of protein-protein interaction network using a multi-objective optimization approach.
Chowdhury, Archana; Rakshit, Pratyusha; Konar, Amit
2016-06-01
Protein-Protein Interactions (PPIs) are very important as they coordinate almost all cellular processes. This paper attempts to formulate PPI prediction problem in a multi-objective optimization framework. The scoring functions for the trial solution deal with simultaneous maximization of functional similarity, strength of the domain interaction profiles, and the number of common neighbors of the proteins predicted to be interacting. The above optimization problem is solved using the proposed Firefly Algorithm with Nondominated Sorting. Experiments undertaken reveal that the proposed PPI prediction technique outperforms existing methods, including gene ontology-based Relative Specific Similarity, multi-domain-based Domain Cohesion Coupling method, domain-based Random Decision Forest method, Bagging with REP Tree, and evolutionary/swarm algorithm-based approaches, with respect to sensitivity, specificity, and F1 score.
Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, A R
2016-11-05
DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. To this end, an attempt has been made in this study to develop a computational approach for identifying a species by comparing its barcode with the barcode sequences of known species present in a reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies, and Random Forest methodology was then employed on the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based and diagnostic-based approaches, and was found comparable with existing supervised-learning-based approaches in terms of species identification success rate when compared using real and simulated datasets. Based on the proposed approach, an online web interface, SPIDBAR, has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists. Copyright © 2016 Elsevier B.V. All rights reserved.
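The mapping of a barcode to a k-mer frequency vector followed by Random Forest classification might look as follows; this is a sketch, not the SPIDBAR implementation, and the commented names are placeholders.

```python
from itertools import product
from sklearn.ensemble import RandomForestClassifier

def kmer_features(seq, k=4):
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in counts:            # skips k-mers with ambiguous bases
            counts[seq[i:i + k]] += 1
    total = max(1, len(seq) - k + 1)
    return [counts[km] / total for km in kmers]   # k-mer frequency vector

# reference library -> training data; labels are species names
# clf = RandomForestClassifier(n_estimators=500)
# clf.fit([kmer_features(s) for s in reference_barcodes], species_labels)
# clf.predict([kmer_features(query_barcode)])
```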
Influence Function Learning in Information Diffusion Networks.
Du, Nan; Liang, Yingyu; Balcan, Maria-Florina; Song, Le
2014-06-01
Can we learn the influence of a set of people in a social network from cascades of information diffusion? This question is often addressed by a two-stage approach: first learn a diffusion model, and then calculate the influence based on the learned model. Thus, the success of this approach relies heavily on the correctness of the diffusion model which is hard to verify for real world data. In this paper, we exploit the insight that the influence functions in many diffusion models are coverage functions, and propose a novel parameterization of such functions using a convex combination of random basis functions. Moreover, we propose an efficient maximum likelihood based algorithm to learn such functions directly from cascade data, and hence bypass the need to specify a particular diffusion model in advance. We provide both theoretical and empirical analysis for our approach, showing that the proposed approach can provably learn the influence function with low sample complexity, be robust to the unknown diffusion models, and significantly outperform existing approaches in both synthetic and real world data.
A new enhanced index tracking model in portfolio optimization with sum weighted approach
NASA Astrophysics Data System (ADS)
Siew, Lam Weng; Jaaman, Saiful Hafizah; Hoe, Lam Weng
2017-04-01
Index tracking is a portfolio management strategy which aims to construct an optimal portfolio that achieves a return similar to the benchmark index return at minimum tracking error, without purchasing all the stocks that make up the index. Enhanced index tracking is an improved portfolio management strategy which aims to generate higher portfolio return than the benchmark index return besides minimizing the tracking error. The objective of this paper is to propose a new enhanced index tracking model with a sum weighted approach to improve the existing index tracking model for tracking the benchmark Technology Index in Malaysia. The optimal portfolio composition and performance of both models are determined and compared in terms of portfolio mean return, tracking error and information ratio. The results of this study show that the optimal portfolio of the proposed model is able to generate higher mean return than the benchmark index at minimum tracking error. Besides that, the proposed model is able to outperform the existing model in tracking the benchmark index. The significance of this study is to propose a new enhanced index tracking model with a sum weighted approach which contributes a 67% improvement in portfolio mean return as compared to the existing model.
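The two evaluation quantities used above have standard definitions; a short sketch for computing tracking error and information ratio from return series:

```python
import numpy as np

def tracking_metrics(portfolio_returns, index_returns):
    # per-period returns of the tracking portfolio and the benchmark index
    excess = np.asarray(portfolio_returns) - np.asarray(index_returns)
    tracking_error = excess.std(ddof=1)           # dispersion of excess return
    information_ratio = excess.mean() / tracking_error
    return tracking_error, information_ratio
```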
Chen, Yang; Gao, Zhen; Wang, Bingcheng; Xu, Rong
2016-08-22
Glioblastoma (GBM) is the most common and aggressive brain tumor. It has poor prognosis even with optimal radio- and chemo-therapies. Since GBM is highly heterogeneous, drugs that target the specific molecular profiles of individual tumors may achieve maximized efficacy. Currently, the Cancer Genome Atlas (TCGA) projects have identified hundreds of GBM-associated genes. We develop a drug repositioning approach combining disease genomics and mouse phenotype data towards predicting targeted therapies for GBM. We first identified disease-specific mouse phenotypes using the most recently discovered GBM genes. Then we systematically searched all FDA-approved drugs for candidates that share similar mouse phenotype profiles with GBM. We evaluated the ranks for approved and novel GBM drugs, and compared with an existing approach, which also uses the mouse phenotype data but not the disease genomics data. We achieved significantly higher ranks for the approved and novel GBM drugs than the earlier approach. For all positive examples of GBM drugs, we achieved a median rank of 9.2%, and 45.6% of the top predictions have been demonstrated effective in inhibiting the growth of human GBM cells. We developed a computational drug repositioning approach based on both genomic and phenotypic data. Our approach prioritized existing GBM drugs and outperformed a recent approach. Overall, our approach shows potential in discovering new targeted therapies for GBM.
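The profile-comparison step can be illustrated with a simple set-overlap score between the GBM-derived mouse phenotypes and each drug's phenotype profile; the paper's actual similarity measure may differ, so treat this as a schematic.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def rank_drugs(disease_phenotypes, drug_phenotypes):
    # drug_phenotypes: {drug name: set of mouse phenotype terms}
    scores = {d: jaccard(disease_phenotypes, p) for d, p in drug_phenotypes.items()}
    return sorted(scores, key=scores.get, reverse=True)   # best candidates first
```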
Identification of hybrid node and link communities in complex networks
He, Dongxiao; Jin, Di; Chen, Zheng; Zhang, Weixiong
2015-01-01
Identifying communities in complex networks is an effective means for analyzing complex systems, with applications in diverse areas such as social science, engineering, biology and medicine. Finding communities of nodes and finding communities of links are two popular schemes for network analysis. These schemes, however, have inherent drawbacks and are inadequate to capture complex organizational structures in real networks. We introduce a new scheme and an effective approach for identifying complex mixture structures of node and link communities, called hybrid node-link communities. A central piece of our approach is a probabilistic model that accommodates node, link and hybrid node-link communities. Our extensive experiments on various real-world networks, including a large protein-protein interaction network and a large network of semantically associated words, illustrated that the scheme for hybrid communities is superior in revealing network characteristics. Moreover, the new approach outperformed the existing methods for finding node or link communities separately. PMID:25728010
Identification of hybrid node and link communities in complex networks.
He, Dongxiao; Jin, Di; Chen, Zheng; Zhang, Weixiong
2015-03-02
Identifying communities in complex networks is an effective means for analyzing complex systems, with applications in diverse areas such as social science, engineering, biology and medicine. Finding communities of nodes and finding communities of links are two popular schemes for network analysis. These schemes, however, have inherent drawbacks and are inadequate to capture complex organizational structures in real networks. We introduce a new scheme and an effective approach for identifying complex mixture structures of node and link communities, called hybrid node-link communities. A central piece of our approach is a probabilistic model that accommodates node, link and hybrid node-link communities. Our extensive experiments on various real-world networks, including a large protein-protein interaction network and a large network of semantically associated words, illustrated that the scheme for hybrid communities is superior in revealing network characteristics. Moreover, the new approach outperformed the existing methods for finding node or link communities separately.
Identification of hybrid node and link communities in complex networks
NASA Astrophysics Data System (ADS)
He, Dongxiao; Jin, Di; Chen, Zheng; Zhang, Weixiong
2015-03-01
Identifying communities in complex networks is an effective means for analyzing complex systems, with applications in diverse areas such as social science, engineering, biology and medicine. Finding communities of nodes and finding communities of links are two popular schemes for network analysis. These schemes, however, have inherent drawbacks and are inadequate to capture complex organizational structures in real networks. We introduce a new scheme and an effective approach for identifying complex mixture structures of node and link communities, called hybrid node-link communities. A central piece of our approach is a probabilistic model that accommodates node, link and hybrid node-link communities. Our extensive experiments on various real-world networks, including a large protein-protein interaction network and a large network of semantically associated words, illustrated that the scheme for hybrid communities is superior in revealing network characteristics. Moreover, the new approach outperformed the existing methods for finding node or link communities separately.
Löwe, Roland; Mikkelsen, Peter Steen; Rasmussen, Michael R; Madsen, Henrik
2013-01-01
Merging radar rainfall data with rain gauge measurements is a common approach to overcoming problems in deriving rain intensities from radar measurements. We extend an existing approach for the adjustment of C-band radar data using state-space models and use the resulting rainfall intensities as input for forecasting outflow from two catchments in the Copenhagen area. Stochastic grey-box models are applied to create the runoff forecasts, providing us with not only a point forecast but also a quantification of the forecast uncertainty. Evaluating the results, we can show that using the adjusted radar data improves runoff forecasts compared with using the original radar data, and that forecasts driven by rain gauge measurements are also outperformed. Combining the data merging approach with short-term rainfall forecasting algorithms may result in further improved runoff forecasts that can be used in real-time control.
Multiplicative noise removal via a learned dictionary.
Huang, Yu-Mei; Moisan, Lionel; Ng, Michael K; Zeng, Tieyong
2012-11-01
Multiplicative noise removal is a challenging image processing problem, and most existing methods are based on the maximum a posteriori formulation and the logarithmic transformation of multiplicative denoising problems into additive denoising problems. Sparse representations of images have been shown to be efficient approaches for image recovery. Following this idea, in this paper we propose to learn a dictionary from the logarithmically transformed image, and then to use it in a variational model built for noise removal. Extensive experimental results suggest that in terms of visual quality, peak signal-to-noise ratio, and mean absolute deviation error, the proposed algorithm outperforms state-of-the-art methods.
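A generic log-domain dictionary denoiser along these lines, using scikit-learn's dictionary learner on image patches; the paper couples the learned dictionary with a variational model, which this sketch omits.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

def denoise_multiplicative(noisy, patch=(8, 8), n_atoms=64):
    log_img = np.log(noisy + 1e-6)            # multiplicative noise becomes additive
    patches = extract_patches_2d(log_img, patch).reshape(-1, patch[0] * patch[1])
    mean = patches.mean(axis=1, keepdims=True)
    dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0)
    codes = dico.fit(patches - mean).transform(patches - mean)   # sparse codes
    recon = (codes @ dico.components_ + mean).reshape(-1, *patch)
    return np.exp(reconstruct_from_patches_2d(recon, log_img.shape))
```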
Robust Transceiver Design for Multiuser MIMO Downlink with Channel Uncertainties
NASA Astrophysics Data System (ADS)
Miao, Wei; Li, Yunzhou; Chen, Xiang; Zhou, Shidong; Wang, Jing
This letter addresses the problem of robust transceiver design for the multiuser multiple-input-multiple-output (MIMO) downlink where the channel state information at the base station (BS) is imperfect. A stochastic approach which minimizes the expectation of the total mean square error (MSE) of the downlink conditioned on the channel estimates under a total transmit power constraint is adopted. The iterative algorithm reported in [2] is improved to handle the proposed robust optimization problem. Simulation results show that our proposed robust scheme effectively reduces the performance loss due to channel uncertainties and outperforms existing methods, especially when the channel errors of the users are different.
Joint Facial Action Unit Detection and Feature Fusion: A Multi-conditional Learning Approach.
Eleftheriadis, Stefanos; Rudovic, Ognjen; Pantic, Maja
2016-10-05
Automated analysis of facial expressions can benefit many domains, from marketing to clinical diagnosis of neurodevelopmental disorders. Facial expressions are typically encoded as a combination of facial muscle activations, i.e., action units. Depending on context, these action units co-occur in specific patterns, and rarely in isolation. Yet, most existing methods for automatic action unit detection fail to exploit dependencies among them and the corresponding facial features. To address this, we propose a novel multi-conditional latent variable model for simultaneous fusion of facial features and joint action unit detection. Specifically, the proposed model performs feature fusion in a generative fashion via a low-dimensional shared subspace, while simultaneously performing action unit detection using a discriminative classification approach. We show that by combining the merits of both approaches, the proposed methodology outperforms existing purely discriminative/generative methods for the target task. To reduce the number of parameters and avoid overfitting, a novel Bayesian learning approach based on Monte Carlo sampling is proposed to integrate out the shared subspace. We validate the proposed method on posed and spontaneous data from three publicly available datasets (CK+, DISFA and Shoulder-pain), and show that both feature fusion and joint learning of action units lead to improved performance compared to the state-of-the-art methods for the task.
Multiple imputation of missing covariates for the Cox proportional hazards cure model
Beesley, Lauren J; Bartlett, Jonathan W; Wolf, Gregory T; Taylor, Jeremy M G
2016-01-01
We explore several approaches for imputing partially observed covariates when the outcome of interest is a censored event time and when there is an underlying subset of the population that will never experience the event of interest. We call these subjects “cured,” and we consider the case where the data are modeled using a Cox proportional hazards (CPH) mixture cure model. We study covariate imputation approaches using fully conditional specification (FCS). We derive the exact conditional distribution and suggest a sampling scheme for imputing partially observed covariates in the CPH cure model setting. We also propose several approximations to the exact distribution that are simpler and more convenient to use for imputation. A simulation study demonstrates that the proposed imputation approaches outperform existing imputation approaches for survival data without a cure fraction in terms of bias in estimating CPH cure model parameters. We apply our multiple imputation techniques to a study of patients with head and neck cancer. PMID:27439726
Image-guided filtering for improving photoacoustic tomographic image reconstruction.
Awasthi, Navchetan; Kalva, Sandeep Kumar; Pramanik, Manojit; Yalavarthy, Phaneendra K
2018-06-01
Several algorithms exist to solve the photoacoustic image reconstruction problem depending on the expected reconstructed image features. These reconstruction algorithms typically promote one feature, such as smoothness or sharpness, in the output image. Combining these features using a guided filtering approach, which requires an input image and a guiding image, was attempted in this work. This approach acts as a post-processing step to improve the results of the commonly used Tikhonov or total variation regularization methods. The result obtained from linear backprojection was used as the guiding image. Using both numerical and experimental phantom cases, it was shown that the proposed guided filtering approach was able to improve the signal-to-noise ratio of the reconstructed images by as much as 11.23 dB, with the added advantage of being computationally efficient. This approach was compared with state-of-the-art basis pursuit deconvolution as well as standard denoising methods and shown to outperform them. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
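Since the abstract hinges on guided filtering with an input and a guiding image, a minimal box-filter implementation of the classic guided filter is sketched below; `guide` would be, e.g., the backprojection image and `src` the regularized reconstruction. Parameter names are illustrative, not taken from the paper's code.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=4, eps=1e-4):
    """Filter `src` using `guide` to steer the smoothing (local linear model)."""
    size = 2 * radius + 1
    mean_I = uniform_filter(guide, size)
    mean_p = uniform_filter(src, size)
    corr_Ip = uniform_filter(guide * src, size)
    var_I = uniform_filter(guide * guide, size) - mean_I ** 2
    cov_Ip = corr_Ip - mean_I * mean_p
    a = cov_Ip / (var_I + eps)           # local linear coefficients q = a*I + b
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * guide + uniform_filter(b, size)
```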
Data-driven confounder selection via Markov and Bayesian networks.
Häggström, Jenny
2018-06-01
To unbiasedly estimate a causal effect on an outcome, unconfoundedness is often assumed. If there is sufficient knowledge on the underlying causal structure, then existing confounder selection criteria can be used to select subsets of the observed pretreatment covariates, X, sufficient for unconfoundedness, if such subsets exist. Here, estimation of these target subsets is considered when the underlying causal structure is unknown. The proposed method is to model the causal structure by a probabilistic graphical model, for example, a Markov or Bayesian network, estimate this graph from observed data and select the target subsets given the estimated graph. The approach is evaluated by simulation both in a high-dimensional setting where unconfoundedness holds given X and in a setting where unconfoundedness only holds given subsets of X. Several common target subsets are investigated and the selected subsets are compared with respect to accuracy in estimating the average causal effect. The proposed method is implemented with existing software that can easily handle high-dimensional data, in terms of large samples and large numbers of covariates. The results from the simulation study show that, if unconfoundedness holds given X, this approach is very successful in selecting the target subsets, outperforming alternative approaches based on random forests and LASSO, and that the subset estimating the target subset containing all causes of outcome yields the smallest MSE in the average causal effect estimation. © 2017, The International Biometric Society.
Heating and flooding: A unified approach for rapid generation of free energy surfaces
NASA Astrophysics Data System (ADS)
Chen, Ming; Cuendet, Michel A.; Tuckerman, Mark E.
2012-07-01
We propose a general framework for the efficient sampling of conformational equilibria in complex systems and the generation of associated free energy hypersurfaces in terms of a set of collective variables. The method is a strategic synthesis of the adiabatic free energy dynamics approach, previously introduced by us and others, and existing schemes using Gaussian-based adaptive bias potentials to disfavor previously visited regions. In addition, we suggest sampling the thermodynamic force instead of the probability density to reconstruct the free energy hypersurface. All these elements are combined into a robust extended phase-space formalism that can be easily incorporated into existing molecular dynamics packages. The unified scheme is shown to outperform both metadynamics and adiabatic free energy dynamics in generating two-dimensional free energy surfaces for several example cases including the alanine dipeptide in the gas and aqueous phases and the met-enkephalin oligopeptide. In addition, the method can efficiently generate higher dimensional free energy landscapes, which we demonstrate by calculating a four-dimensional surface in the Ramachandran angles of the gas-phase alanine tripeptide.
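To make the adaptive-bias ingredient concrete, here is a toy one-dimensional "flooding" run: Gaussian bias potentials are deposited along an overdamped Langevin trajectory on a double well, and the accumulated bias, negated, estimates the free energy profile. This is only a schematic of the Gaussian-deposition idea, not the paper's extended-phase-space scheme; all parameters are illustrative.

```python
import numpy as np

def flood(nsteps=20000, dt=1e-3, beta=4.0, h=0.05, w=0.15, stride=200, seed=0):
    """Overdamped Langevin on V(x) = (x^2 - 1)^2 with Gaussian flooding."""
    rng = np.random.default_rng(seed)
    dV = lambda x: 4.0 * x * (x**2 - 1.0)     # gradient of the double well
    centers, x = [], -1.0
    for step in range(nsteps):
        c = np.array(centers)
        bias_force = ((h * (x - c) / w**2
                       * np.exp(-(x - c)**2 / (2 * w**2))).sum()
                      if centers else 0.0)    # -d(bias)/dx, repulsive
        x += (-dV(x) + bias_force) * dt \
             + np.sqrt(2 * dt / beta) * rng.standard_normal()
        if step % stride == 0:
            centers.append(x)                 # deposit a repulsive Gaussian
    grid = np.linspace(-2.0, 2.0, 200)
    c = np.array(centers)
    bias = (h * np.exp(-(grid[:, None] - c)**2 / (2 * w**2))).sum(axis=1)
    return grid, bias.max() - bias            # free energy estimate, up to a constant
```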
COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator.
Rawi, Reda; Mall, Raghvendra; Kunji, Khalid; El Anbari, Mohammed; Aupetit, Michael; Ullah, Ehsan; Bensmail, Halima
2016-12-15
The post-genomic era with its wealth of sequences gave rise to a broad range of protein residue-residue contact detecting methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, they do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detecting method, COUSCOus, by combining the best shrinkage approach, the empirical Bayes covariance estimator, with GLasso. Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV with respect to the Matthews correlation coefficient criterion on the full list of residue contacts. Furthermore, COUSCOus achieves on average 10% more gain in prediction accuracy compared to PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that when using a simple random forest meta-classifier combining contact detecting techniques and sequence derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. We conclude that the consideration of superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, among them the presented residue-residue contact prediction as well as fields such as gene network reconstruction.
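A compact sketch of the shrink-then-GLasso recipe using scikit-learn: here the Ledoit-Wolf estimator stands in for the paper's empirical Bayes covariance estimator, and `X` would be a numerically encoded multiple sequence alignment. Large entries of the sparse precision matrix are read as candidate contacts.

```python
import numpy as np
from sklearn.covariance import ledoit_wolf, graphical_lasso

def contact_scores(X, alpha=0.01):
    """X: (n_sequences, n_features) numeric encoding of an alignment."""
    shrunk_cov, _ = ledoit_wolf(X)                 # shrinkage covariance estimate
    _, precision = graphical_lasso(shrunk_cov, alpha=alpha)
    return np.abs(precision)                       # large |entries| ~ candidate contacts
```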
Zhao, Huaqing; Rebbeck, Timothy R; Mitra, Nandita
2009-12-01
Confounding due to population stratification (PS) arises when differences in both allele and disease frequencies exist in a population of mixed racial/ethnic subpopulations. Genomic control, structured association, principal components analysis (PCA), and multidimensional scaling (MDS) approaches have been proposed to address this bias using genetic markers. However, confounding due to PS can also be due to non-genetic factors. Propensity scores are widely used to address confounding in observational studies but have not been adapted to deal with PS in genetic association studies. We propose a genomic propensity score (GPS) approach to correct for bias due to PS that considers both genetic and non-genetic factors. We compare the GPS method with PCA and MDS using simulation studies. Our results show that GPS can adequately adjust and consistently correct for bias due to PS. Under no/mild, moderate, and severe PS, GPS yielded estimates with bias close to 0 (mean = -0.0044, standard error = 0.0087). Under moderate or severe PS, the GPS method consistently outperforms the PCA method in terms of bias, coverage probability (CP), and type I error. Under moderate PS, the GPS method consistently outperforms the MDS method in terms of CP. PCA maintains relatively high power compared to both MDS and GPS methods under the simulated situations. GPS and MDS are comparable in terms of statistical properties such as bias, type I error, and power. The GPS method provides a novel and robust tool for obtaining less-biased estimates of genetic associations that can consider both genetic and non-genetic factors. 2009 Wiley-Liss, Inc.
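A minimal sketch of the propensity-score idea adapted to genetics, under the assumption that the score models the probability of carrying the risk genotype given ancestry markers and non-genetic covariates, and is then included in the disease model; variable names and the exact adjustment strategy are illustrative, not taken from the paper.

```python
import numpy as np
import statsmodels.api as sm

def gps_adjusted_effect(genotype, disease, markers, nongenetic):
    """genotype: 0/1 carrier status; markers: ancestry-informative SNPs;
    nongenetic: e.g. environmental covariates (all illustrative)."""
    Z = sm.add_constant(np.column_stack([markers, nongenetic]))
    gps = sm.Logit(genotype, Z).fit(disp=0).predict()   # P(genotype | Z)
    X = sm.add_constant(np.column_stack([genotype, gps]))
    fit = sm.Logit(disease, X).fit(disp=0)
    return fit.params[1], fit.pvalues[1]                # adjusted effect, p-value
```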
A dictionary learning approach for Poisson image deblurring.
Ma, Liyan; Moisan, Lionel; Yu, Jian; Zeng, Tieyong
2013-07-01
The restoration of images corrupted by blur and Poisson noise is a key issue in medical and biological image processing. While most existing methods are based on variational models, generally derived from a maximum a posteriori (MAP) formulation, sparse representations of images have recently been shown to be efficient approaches for image recovery. Following this idea, we propose in this paper a model containing three terms: a patch-based sparse representation prior over a learned dictionary, a pixel-based total variation regularization term and a data-fidelity term capturing the statistics of Poisson noise. The resulting optimization problem can be solved by an alternating minimization technique combined with variable splitting. Extensive experimental results suggest that in terms of visual quality, peak signal-to-noise ratio value and the method noise, the proposed algorithm outperforms state-of-the-art methods.
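A schematic form of the three-term objective described above, in illustrative notation (H the blur operator, f the observed counts, R_i the patch extractor, D the learned dictionary); the exact weights and norms are stand-ins:

```latex
\min_{u,\{\alpha_i\}} \;
\sum_i \Big( \tfrac{\mu}{2}\,\lVert R_i u - D\alpha_i \rVert_2^2
             + \lambda_i \lVert \alpha_i \rVert_0 \Big)
\;+\; \gamma\,\mathrm{TV}(u)
\;+\; \int_{\Omega} \big( Hu - f \log Hu \big)\,dx .
```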
Discriminative Bayesian Dictionary Learning for Classification.
Akhtar, Naveed; Shafait, Faisal; Mian, Ajmal
2016-12-01
We propose a Bayesian approach to learn discriminative dictionaries for sparse representation of data. The proposed approach infers probability distributions over the atoms of a discriminative dictionary using a finite approximation of the Beta process. It also computes sets of Bernoulli distributions that associate class labels to the learned dictionary atoms. This association signifies the selection probabilities of the dictionary atoms in the expansion of class-specific data. Furthermore, the non-parametric character of the proposed approach allows it to infer the correct size of the dictionary. We exploit the aforementioned Bernoulli distributions in separately learning a linear classifier. The classifier uses the same hierarchical Bayesian model as the dictionary, which we present along with the analytical inference solution for Gibbs sampling. For classification, a test instance is first sparsely encoded over the learned dictionary and the codes are fed to the classifier. We performed experiments for face and action recognition, and object and scene-category classification using five public datasets and compared the results with state-of-the-art discriminative sparse representation approaches. Experiments show that the proposed Bayesian approach consistently outperforms the existing approaches.
Joint histogram-based cost aggregation for stereo matching.
Min, Dongbo; Lu, Jiangbo; Do, Minh N
2013-10-01
This paper presents a novel method for performing efficient cost aggregation in stereo matching. The cost aggregation problem is reformulated from the perspective of a histogram, giving us the potential to reduce the complexity of the cost aggregation in stereo matching significantly. Unlike previous methods, which have tried to reduce the complexity in terms of the size of the image and the matching window, our approach focuses on reducing the computational redundancy that exists across the search range, caused by repeated filtering for all the hypotheses. Moreover, we also reduce the complexity of the window-based filtering through an efficient sampling scheme inside the matching window. The tradeoff between accuracy and complexity is extensively investigated by varying the parameters used in the proposed method. Experimental results show that the proposed method provides high-quality disparity maps with low complexity and outperforms existing local methods. This paper also provides new insights into complexity-constrained stereo-matching algorithm design.
Analysis of Genome-Wide Association Studies with Multiple Outcomes Using Penalization
Liu, Jin; Huang, Jian; Ma, Shuangge
2012-01-01
Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. Simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods. PMID:23272092
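A hedged stand-in for the group Lasso step using scikit-learn: MultiTaskLasso imposes the l2/l1 penalty across outcomes that share the same SNPs, so a SNP is selected for all phenotypes jointly. This mirrors the flavor of selection in the abstract, not the authors' exact algorithm; data shapes are toy values.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 500))                 # 200 samples x 500 SNPs (toy sizes)
B = np.zeros((500, 3)); B[:5] = rng.standard_normal((5, 3))
Y = X @ B + 0.5 * rng.standard_normal((200, 3))     # 3 correlated phenotypes

model = MultiTaskLasso(alpha=0.3).fit(X, Y)
selected = np.flatnonzero(np.linalg.norm(model.coef_.T, axis=1) > 0)
print("selected SNPs:", selected)                   # whole rows enter or leave together
```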
Revealing the hidden language of complex networks.
Yaveroğlu, Ömer Nebil; Malod-Dognin, Noël; Davis, Darren; Levnajic, Zoran; Janjic, Vuk; Karapandza, Rasa; Stojmirovic, Aleksandar; Pržulj, Nataša
2014-04-01
Sophisticated methods for analysing complex networks promise to be of great benefit to almost all scientific disciplines, yet they elude us. In this work, we make fundamental methodological advances to rectify this. We discover that the interaction between a small number of roles, played by nodes in a network, can characterize a network's structure and also provide a clear real-world interpretation. Given this insight, we develop a framework for analysing and comparing networks, which outperforms all existing ones. We demonstrate its strength by uncovering novel relationships between seemingly unrelated networks, such as Facebook, metabolic, and protein structure networks. We also use it to track the dynamics of the world trade network, showing that a country's role as a broker between non-trading countries indicates economic prosperity, whereas peripheral roles are associated with poverty. This result, though intuitive, has escaped all existing frameworks. Finally, our approach translates network topology into everyday language, bringing network analysis closer to domain scientists.
Extreme learning machine based optimal embedding location finder for image steganography
Aljeroudi, Yazan
2017-01-01
In image steganography, determining the optimum location for embedding the secret message precisely with minimum distortion of the host medium remains a challenging issue. Yet, an effective approach for the selection of the best embedding location with least deformation is far from being achieved. To attain this goal, we propose a novel high-performance approach for image steganography, in which the extreme learning machine (ELM) algorithm is modified to create a supervised mathematical model. This ELM is first trained on a part of an image or any host medium before being tested in the regression mode. This allows us to choose the optimal location for embedding the message with the best values of the predicted evaluation metrics. Contrast, homogeneity, and other texture features are used for training on a new metric. Furthermore, the developed ELM is exploited to counter over-fitting during training. The performance of the proposed steganography approach is evaluated by computing the correlation, structural similarity (SSIM) index, fusion matrices, and mean square error (MSE). The modified ELM is found to outperform the existing approaches in terms of imperceptibility. The experimental results demonstrate that the proposed steganographic approach is highly proficient at preserving the visual information of an image. An improvement in imperceptibility of as much as 28% is achieved compared to the existing state-of-the-art methods. PMID:28196080
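The learning component named above is an ELM; a bare-bones ELM regressor (random hidden layer, ridge-solved output weights) is sketched below for orientation. The steganographic embedding itself, and the paper's modifications to the ELM, are not shown.

```python
import numpy as np

class ELMRegressor:
    """Random hidden layer + ridge-solved output weights (basic ELM)."""
    def __init__(self, n_hidden=100, reg=1e-3, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)           # random nonlinear feature map
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)    # regularized least squares
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```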
Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets
Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge
2014-01-01
In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the "large d, small n" characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. In most existing integrative analyses, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111
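For reference, the MCP building block named above, written for a coefficient magnitude t >= 0 with tuning parameter \lambda and concavity parameter \gamma > 1 (standard form, notation illustrative):

```latex
\rho(t;\lambda,\gamma) =
\begin{cases}
\lambda t - \dfrac{t^{2}}{2\gamma}, & 0 \le t \le \gamma\lambda,\\[6pt]
\dfrac{\gamma\lambda^{2}}{2}, & t > \gamma\lambda .
\end{cases}
```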
NASA Astrophysics Data System (ADS)
Salcedo-Sanz, S.
2016-10-01
Meta-heuristic algorithms are problem-solving methods which try to find good-enough solutions to very hard optimization problems, at a reasonable computation time, where classical approaches fail or cannot even be applied. Many existing meta-heuristic approaches are nature-inspired techniques, which work by simulating or modeling different natural processes in a computer. Historically, many of the most successful meta-heuristic approaches have had a biological inspiration, such as evolutionary computation or swarm intelligence paradigms, but in the last few years new approaches based on modeling nonlinear physics processes have been proposed and applied with success. Nonlinear physics processes, modeled as optimization algorithms, are able to produce completely new search procedures, in many cases with extremely effective exploration capabilities, which are able to outperform existing optimization approaches. In this paper we review the most important optimization algorithms based on nonlinear physics, how they have been constructed from specific modeling of a real phenomenon, and also their novelty in terms of comparison with alternative existing algorithms for optimization. We first review important concepts on optimization problems, search spaces and problem difficulty. Then, the usefulness of heuristic and meta-heuristic approaches for facing hard optimization problems is introduced, and some of the main existing classical versions of these algorithms are reviewed. The mathematical framework of different nonlinear physics processes is then introduced as a preparatory step to reviewing in detail the most important meta-heuristics based on them. A discussion of the novelty of these approaches, their main computational implementation and design issues, and the evaluation of a novel meta-heuristic based on Strange Attractors mutation completes the review of these techniques. We also describe some of the most important application areas, in a broad sense, of meta-heuristics, and describe freely accessible software frameworks which can be used to ease the implementation of these algorithms.
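As an anchor for the review's terminology, a generic simulated-annealing skeleton (one of the classical meta-heuristics mentioned) follows; the objective, neighborhood move and cooling schedule are placeholders to be swapped for a real problem.

```python
import math, random

def simulated_annealing(x0, objective, neighbor,
                        T0=1.0, alpha=0.995, n_iter=10000):
    x, fx = x0, objective(x0)
    best, fbest, T = x, fx, T0
    for _ in range(n_iter):
        y = neighbor(x)
        fy = objective(y)
        # accept improvements always; worse moves with Boltzmann probability
        if fy < fx or random.random() < math.exp(-(fy - fx) / T):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        T *= alpha                      # geometric cooling
    return best, fbest

best, val = simulated_annealing(
    0.0, objective=lambda x: (x - 2) ** 2 + math.sin(5 * x),
    neighbor=lambda x: x + random.uniform(-0.5, 0.5))
```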
Machine Learning for Social Services: A Study of Prenatal Case Management in Illinois.
Pan, Ian; Nolan, Laura B; Brown, Rashida R; Khan, Romana; van der Boor, Paul; Harris, Daniel G; Ghani, Rayid
2017-06-01
To evaluate the positive predictive value of machine learning algorithms for early assessment of adverse birth risk among pregnant women as a means of improving the allocation of social services, we used administrative data for 6457 women collected by the Illinois Department of Human Services from July 2014 to May 2015 to develop a machine learning model for adverse birth prediction and improve upon the existing paper-based risk assessment. We compared different models and determined the strongest predictors of adverse birth outcomes using positive predictive value as the metric for selection. Machine learning algorithms performed similarly, outperforming the current paper-based risk assessment by up to 36%; a refined paper-based assessment outperformed the current assessment by up to 22%. We estimate that these improvements will allow 100 to 170 additional high-risk pregnant women screened for program eligibility each year to receive services that would otherwise have been unobtainable. Our analysis exhibits the potential for machine learning to move government agencies toward a more data-informed approach to evaluating risk and providing social services. Overall, such efforts will improve the efficiency of allocating resource-intensive interventions.
Automated Urban Travel Interpretation: A Bottom-up Approach for Trajectory Segmentation.
Das, Rahul Deb; Winter, Stephan
2016-11-23
Understanding travel behavior is critical for effective urban planning as well as for enabling various context-aware service provisions to support mobility as a service (MaaS). Both applications rely on the sensor traces generated by travellers' smartphones. These traces can be used to interpret travel modes, both for generating automated travel diaries and for real-time travel mode detection. Current approaches segment a trajectory by certain criteria, e.g., a drop in speed. However, these criteria are heuristic, and, thus, existing approaches are subjective and involve significant vagueness and uncertainty in activity transitions in space and time. Also, segmentation approaches are not suited to real-time interpretation of open-ended segments, and cannot cope with the frequent gaps in the location traces. In order to address all these challenges, a novel, state-based bottom-up approach is proposed. This approach assumes a fixed atomic segment of a homogeneous state, instead of an event-based segment, and iterates progressively until a new state is found. The research investigates how an atomic state-based approach can be developed in such a way that it can work in real-time, near-real-time and offline modes and under different environmental conditions with their varying quality of sensor traces. The results show the proposed bottom-up model outperforms the existing event-based segmentation models in terms of adaptivity, flexibility, accuracy and richness of information delivery pertinent to automated travel behavior interpretation.
Two hybrid compaction algorithms for the layout optimization problem.
Xiao, Ren-Bin; Xu, Yi-Chun; Amos, Martyn
2007-01-01
In this paper we present two new algorithms for the layout optimization problem: this concerns the placement of circular, weighted objects inside a circular container, the two objectives being to minimize imbalance of mass and to minimize the radius of the container. This problem carries real practical significance in industrial applications (such as the design of satellites), as well as being of significant theoretical interest. We present two nature-inspired algorithms for this problem, the first based on simulated annealing, and the second on particle swarm optimization. We compare our algorithms with the existing best-known algorithm, and show that our approaches outperform it in terms of both solution quality and execution time.
Influence Function Learning in Information Diffusion Networks
Du, Nan; Liang, Yingyu; Balcan, Maria-Florina; Song, Le
2015-01-01
Can we learn the influence of a set of people in a social network from cascades of information diffusion? This question is often addressed by a two-stage approach: first learn a diffusion model, and then calculate the influence based on the learned model. Thus, the success of this approach relies heavily on the correctness of the diffusion model, which is hard to verify for real world data. In this paper, we exploit the insight that the influence functions in many diffusion models are coverage functions, and propose a novel parameterization of such functions using a convex combination of random basis functions. Moreover, we propose an efficient maximum-likelihood-based algorithm to learn such functions directly from cascade data, and hence bypass the need to specify a particular diffusion model in advance. We provide both theoretical and empirical analysis of our approach, showing that the proposed approach can provably learn the influence function with low sample complexity, is robust to unknown diffusion models, and significantly outperforms existing approaches on both synthetic and real-world data. PMID:25973445
NASA Astrophysics Data System (ADS)
Ahmad, Kashif; Conci, Nicola; Boato, Giulia; De Natale, Francesco G. B.
2017-11-01
Over the last few years, rapid growth has been witnessed in the number of digital photos produced per year. This growth poses challenges in the organization and management of multimedia collections, and one viable solution consists of arranging the media on the basis of the underlying events. However, album-level annotation and the presence of irrelevant pictures in photo collections make event-based organization of personal photo albums a challenging task. To tackle these challenges, in contrast to conventional approaches relying on supervised learning, we propose a pipeline for event recognition in personal photo collections relying on a multiple instance learning (MIL) strategy. MIL is a modified form of supervised learning and fits well for such applications with weakly labeled data. The experimental evaluation of the proposed approach is carried out on two large-scale datasets, including a self-collected and a benchmark dataset. On both, our approach significantly outperforms the existing state of the art.
A Space Affine Matching Approach to fMRI Time Series Analysis.
Chen, Liang; Zhang, Weishi; Liu, Hongbo; Feng, Shigang; Chen, C L Philip; Wang, Huili
2016-07-01
For fMRI time series analysis, an important challenge is to overcome the potential delay between the hemodynamic response signal and the cognitive stimuli signal, namely the same frequency but different phase (SFDP) problem. In this paper, a novel space affine matching feature is presented by introducing time domain and frequency domain features. The time domain feature is used to discern different stimuli, while the frequency domain feature eliminates the delay. We then propose a space affine matching (SAM) algorithm to match fMRI time series by our affine feature, in which a normal vector is estimated using gradient descent to explore the time series matching optimally. The experimental results illustrate that the SAM algorithm is insensitive to the delay between the hemodynamic response signal and the cognitive stimuli signal. Our approach significantly outperforms the GLM method when such a delay exists. The approach can help solve the SFDP problem in fMRI time series matching and is thus of great promise for revealing brain dynamics.
NASA Astrophysics Data System (ADS)
Krestyannikov, E.; Tohka, J.; Ruotsalainen, U.
2008-06-01
This paper presents a novel statistical approach for joint estimation of regions-of-interest (ROIs) and the corresponding time-activity curves (TACs) from dynamic positron emission tomography (PET) brain projection data. It is based on optimizing the joint objective function that consists of a data log-likelihood term and two penalty terms reflecting the available a priori information about the human brain anatomy. The developed local optimization strategy iteratively updates both the ROI and TAC parameters and is guaranteed to monotonically increase the objective function. The quantitative evaluation of the algorithm is performed with numerically and Monte Carlo-simulated dynamic PET brain data of the 11C-Raclopride and 18F-FDG tracers. The results demonstrate that the method outperforms the existing sequential ROI quantification approaches in terms of accuracy, and can noticeably reduce the errors in TACs arising due to the finite spatial resolution and ROI delineation.
A Generalized Mixture Framework for Multi-label Classification
Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos
2015-01-01
We develop a novel probabilistic ensemble framework for multi-label classification that is based on the mixtures-of-experts architecture. In this framework, we combine multi-label classification models in the classifier chains family that decompose the class posterior distribution P(Y1, …, Yd|X) using a product of posterior distributions over components of the output space. Our approach captures different input–output and output–output relations that tend to change across data. As a result, we can recover a rich set of dependency relations among inputs and outputs that a single multi-label classification model cannot capture due to its modeling simplifications. We develop and present algorithms for learning the mixtures-of-experts models from data and for performing multi-label predictions on unseen data instances. Experiments on multiple benchmark datasets demonstrate that our approach achieves highly competitive results and outperforms the existing state-of-the-art multi-label classification methods. PMID:26613069
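The building block being mixed above is a classifier chain; a single chain can be sketched with scikit-learn, which factorizes P(Y1, ..., Yd | X) as a product of per-label conditionals exactly as described. The mixture over chains and the paper's learning algorithms are not reproduced here; data shapes are toy values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
Y = (X[:, :3] + 0.3 * rng.standard_normal((300, 3)) > 0).astype(int)  # 3 labels

# Each label's classifier sees X plus the previously predicted labels,
# realizing the product decomposition over the output space label by label.
chain = ClassifierChain(LogisticRegression(), order='random', random_state=0)
chain.fit(X, Y)
print(chain.predict(X[:5]))       # joint multi-label predictions
```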
A Recommendation System to Facilitate Business Process Modeling.
Deng, Shuiguang; Wang, Dongjing; Li, Ying; Cao, Bin; Yin, Jianwei; Wu, Zhaohui; Zhou, Mengchu
2017-06-01
This paper presents a system that utilizes process recommendation technology to help design new business processes from scratch in an efficient and accurate way. The proposed system consists of two phases: 1) offline mining and 2) online recommendation. In the first phase, it mines relations among activity nodes from existing processes in a repository, and then stores the extracted relations as patterns in a database. In the second phase, it compares the new process under construction with the pre-mined patterns, and recommends appropriate activity nodes from the best-matching patterns to help build the new process. Specifically, there are three different online recommendation strategies in this system. Experiments on both real and synthetic datasets are conducted to compare the proposed approaches with other state-of-the-art ones, and the results show that the proposed approaches outperform them in terms of accuracy and efficiency.
Acoustic window planning for ultrasound acquisition.
Göbl, Rüdiger; Virga, Salvatore; Rackerseder, Julia; Frisch, Benjamin; Navab, Nassir; Hennersperger, Christoph
2017-06-01
Autonomous robotic ultrasound has recently gained considerable interest, especially for collaborative applications. Existing methods for acquisition trajectory planning are based solely on geometrical considerations, such as the pose of the transducer with respect to the patient surface. This work aims at establishing acoustic window planning to enable autonomous ultrasound acquisitions of anatomies with restricted acoustic windows, such as the liver or the heart. We propose a fully automatic approach for the planning of acquisition trajectories, which only requires information about the target region as well as existing tomographic imaging data, such as X-ray computed tomography. The framework integrates both geometrical and physics-based constraints to estimate the best ultrasound acquisition trajectories with respect to the available acoustic windows. We evaluate the developed method using virtual planning scenarios based on real patient data as well as real robotic ultrasound acquisitions on a tissue-mimicking phantom. The proposed method yields superior image quality in comparison with a naive planning approach, while maintaining the necessary coverage of the target. We demonstrate that, by taking image formation properties into account, acquisition planning methods can outperform naive planning. Furthermore, we show the need for such planning techniques, since naive approaches are not sufficient, as they do not take the expected image quality into account.
Application of Wavelet Transform for PDZ Domain Classification
Daqrouq, Khaled; Alhmouz, Rami; Balamesh, Ahmed; Memic, Adnan
2015-01-01
PDZ domains have been identified as part of an array of signaling proteins that are often unrelated, except for the well-conserved structural PDZ domain they contain. These domains have been linked to many disease processes, including common avian influenza, as well as very rare conditions such as Fraser and Usher syndromes. Historically, based on the interactions and the nature of the bonds they form, PDZ domains have most often been classified into one of three classes (class I, class II and others - class III), a classification that is directly dependent on their binding partner. In this study, we report on three unique feature extraction approaches based on bigram and trigram occurrence and existence rearrangements within the domain's primary amino acid sequences for assisting PDZ domain classification. Feature extraction methods based on the wavelet packet transform (WPT) and Shannon entropy, denoted wavelet entropy (WE), were proposed. Using 115 unique human and mouse PDZ domains, the existence rearrangement approach yielded a high recognition rate (78.34%), outperforming our occurrence rearrangement based method. With the validation technique, the recognition rate was 81.41%. The method reported for PDZ domain classification from primary sequences proved to be an encouraging approach for obtaining consistent classification results. We anticipate that by increasing the database size, we can further improve feature extraction and correct classification. PMID:25860375
Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression
Yang, Aiyuan; Yan, Chunxia; Zhu, Feng; Zhao, Zhongmeng; Cao, Zhi
2013-01-01
Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents are conducted in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. A swarm algorithm improves the accuracy and the efficiency by speeding up the convergence and preventing it from dropping into local optimums. We apply our approach on a real screening dataset and a series of simulation scenarios. Compared to three existing LR-based approaches, our approach outperforms them by having lower type I and type II error rates, being able to identify more preset causal sites, and performing at faster speeds. PMID:23984382
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC.
Satija, Rahul; Novák, Adám; Miklós, István; Lyngsø, Rune; Hein, Jotun
2009-08-28
We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden Markov model algorithms to analyze up to four sequences. We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the alpha-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/
Sun, Jimeng; Hu, Jianying; Luo, Dijun; Markatou, Marianthi; Wang, Fei; Edabollahi, Shahram; Steinhubl, Steven E.; Daar, Zahra; Stewart, Walter F.
2012-01-01
Background: The ability to identify the risk factors related to an adverse condition, e.g., heart failure (HF) diagnosis, is very important for improving care quality and reducing cost. Existing approaches for risk factor identification are either knowledge-driven (from guidelines or literature) or data-driven (from observational data). No existing method provides a model to effectively combine expert knowledge with data-driven insight for risk factor identification. Methods: We present a systematic approach to enhance known knowledge-based risk factors with additional potential risk factors derived from data. The core of our approach is a sparse regression model with regularization terms that correspond to both knowledge- and data-driven risk factors. Results: The approach is validated using a large dataset containing 4,644 heart failure cases and 45,981 controls. The outpatient electronic health records (EHRs) for these patients include diagnoses, medications, and lab results from 2003-2010. We demonstrate that the proposed method can identify complementary risk factors that are not among the existing known factors and can better predict the onset of HF. We quantitatively compare different sets of risk factors in the context of predicting onset of HF using the Area Under the ROC Curve (AUC) as the performance metric. The combined risk factors from knowledge and data significantly outperform knowledge-based risk factors alone. Furthermore, those additional risk factors are confirmed to be clinically meaningful by a cardiologist. Conclusion: We present a systematic framework for combining knowledge- and data-driven insights for risk factor identification. We demonstrate the power of this framework in the context of predicting onset of HF, where our approach can successfully identify intuitive and predictive risk factors beyond a set of known HF risk factors. PMID:23304365
An Effective Palmprint Recognition Approach for Visible and Multispectral Sensor Images.
Gumaei, Abdu; Sammouda, Rachid; Al-Salman, Abdul Malik; Alsanad, Ahmed
2018-05-15
Among several palmprint feature extraction methods, the HOG-based method is attractive and performs well against changes in illumination and shadowing of palmprint images. However, it still lacks the robustness to extract palmprint features at different rotation angles. To solve this problem, this paper presents a hybrid feature extraction method, named HOG-SGF, that combines the histogram of oriented gradients (HOG) with a steerable Gaussian filter (SGF) to develop an effective palmprint recognition approach. The approach starts by processing all palmprint images by David Zhang's method to segment only the regions of interest. Next, we extract palmprint features based on the hybrid HOG-SGF feature extraction method. Then, an optimized auto-encoder (AE) is utilized to reduce the dimensionality of the extracted features. Finally, a fast and robust regularized extreme learning machine (RELM) is applied for the classification task. In the evaluation phase of the proposed approach, a number of experiments were conducted on three publicly available palmprint databases, namely MS-PolyU of multispectral palmprint images and CASIA and Tongji of contactless palmprint images. Experimentally, the results reveal that the proposed approach outperforms the existing state-of-the-art approaches even when a small number of training samples are used.
Deng, Bai-chuan; Yun, Yong-huan; Liang, Yi-zeng; Yi, Lun-zhao
2014-10-07
In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA) that is based on the idea of model population analysis (MPA) is proposed for variable selection. Unlike most of the existing optimization methods for variable selection, VISSA statistically evaluates the performance of variable space in each step of optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks in each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied in most of the existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
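A rough, heavily simplified sketch of weighted binary matrix sampling driving a shrinking variable space, in the spirit of VISSA: the paper calibrates PLS models on NIR spectra, whereas this sketch scores ridge sub-models by cross-validation, and all parameters and thresholds are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def vissa_like(X, y, n_models=200, top_frac=0.1, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    w = np.full(X.shape[1], 0.5)                     # inclusion probabilities
    for _ in range(n_rounds):
        masks = rng.random((n_models, X.shape[1])) < w
        masks[masks.sum(axis=1) == 0, 0] = True      # avoid empty sub-models
        scores = np.array([cross_val_score(Ridge(), X[:, m], y, cv=5).mean()
                           for m in masks])
        top = masks[np.argsort(scores)[-int(top_frac * n_models):]]
        w = top.mean(axis=0)                         # frequency among best models
    return np.flatnonzero(w > 0.5)                   # retained variable indices
```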
Active link selection for efficient semi-supervised community detection
NASA Astrophysics Data System (ADS)
Yang, Liang; Jin, Di; Wang, Xiao; Cao, Xiaochun
2015-03-01
Several semi-supervised community detection algorithms have been proposed recently to improve the performance of traditional topology-based methods. However, most of them focus on how to integrate supervised information with topology information; few of them pay attention to which information is critical for performance improvement. This leads to a large demand for supervised information, which is expensive or difficult to obtain in most fields. To address this problem we propose an active link selection framework; that is, we actively select the most uncertain and informative links for human labeling to make efficient use of the supervised information. We also disconnect the most likely inter-community edges to further improve the efficiency. Our main idea is that, by connecting uncertain nodes to their community hubs and disconnecting the inter-community edges, one can sharpen the block structure of the adjacency matrix more efficiently than by randomly labeling links, as the existing methods do. Experiments on both synthetic and real networks demonstrate that our new approach significantly outperforms the existing methods in terms of the efficiency of using supervised information. It needs ~13% of the supervised information to achieve a performance similar to that of the original semi-supervised approaches.
Hu, Jialu; Kehr, Birte; Reinert, Knut
2014-02-15
Owing to recent advancements in high-throughput technologies, protein-protein interaction networks of more and more species have become available in public databases. The question of how to identify functionally conserved proteins across species attracts a lot of attention in computational biology. Network alignments provide a systematic way to solve this problem. However, most existing alignment tools encounter limitations in tackling this problem. Therefore, the demand for faster and more efficient alignment tools is growing. We present a fast and accurate algorithm, NetCoffee, which allows finding a global alignment of multiple protein-protein interaction networks. NetCoffee searches for a global alignment by maximizing a target function using simulated annealing on a set of weighted bipartite graphs that are constructed using a triplet approach similar to T-Coffee. To assess its performance, NetCoffee was applied to four real datasets. Our results suggest that NetCoffee remedies several limitations of previous algorithms, outperforms all existing alignment tools in terms of speed and nevertheless identifies biologically meaningful alignments. The source code and data are freely available for download under the GNU GPL v3 license at https://code.google.com/p/netcoffee/.
NHPP-Based Software Reliability Models Using Equilibrium Distribution
NASA Astrophysics Data System (ADS)
Xiao, Xiao; Okamura, Hiroyuki; Dohi, Tadashi
Non-homogeneous Poisson processes (NHPPs) have gained much popularity in actual software testing phases to estimate the software reliability, the number of remaining faults in software and the software release timing. In this paper, we propose a new modeling approach for the NHPP-based software reliability models (SRMs) to describe the stochastic behavior of software fault-detection processes. The fundamental idea is to apply the equilibrium distribution to the fault-detection time distribution in NHPP-based modeling. We also develop efficient parameter estimation procedures for the proposed NHPP-based SRMs. Through numerical experiments, it can be concluded that the proposed NHPP-based SRMs outperform the existing ones in many data sets from the perspective of goodness-of-fit and prediction performance.
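For concreteness, the equilibrium-distribution construction referred to above can be written as follows, with F the fault-detection time c.d.f. with finite mean \mu, and \omega the expected total number of faults (notation illustrative):

```latex
F_{e}(t) = \frac{1}{\mu}\int_{0}^{t}\bigl(1 - F(s)\bigr)\,ds,
\qquad
\Lambda(t) = \omega\,F_{e}(t),
\qquad
\Pr\{N(t) = n\} = \frac{\Lambda(t)^{n}}{n!}\,e^{-\Lambda(t)} .
```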
Song, Dandan; Li, Ning; Liao, Lejian
2015-01-01
Due to the generation of enormous amounts of data at lower costs and in shorter times, whole-exome sequencing technologies provide dramatic opportunities for identifying disease genes implicated in Mendelian disorders. Since upwards of thousands of genomic variants can be sequenced in each exome, it is challenging to filter pathogenic variants in protein-coding regions and reduce the number of missed true variants. Therefore, an automatic and efficient pipeline for finding disease variants in Mendelian disorders is designed by exploiting a combination of variant filtering steps to analyze the family-based exome sequencing approach. Recent studies on Freeman-Sheldon disease are revisited, and the results show that the proposed method outperforms other existing candidate gene identification methods.
Pattern-set generation algorithm for the one-dimensional multiple stock sizes cutting stock problem
NASA Astrophysics Data System (ADS)
Cui, Yaodong; Cui, Yi-Ping; Zhao, Zhigang
2015-09-01
A pattern-set generation algorithm (PSG) for the one-dimensional multiple stock sizes cutting stock problem (1DMSSCSP) is presented. The solution process contains two stages. In the first stage, the PSG solves the residual problems repeatedly to generate the patterns in the pattern set, where each residual problem is solved by the column-generation approach, and each pattern is generated by solving a single large object placement problem. In the second stage, the integer linear programming model of the 1DMSSCSP is solved using a commercial solver, where only the patterns in the pattern set are considered. The computational results of benchmark instances indicate that the PSG outperforms existing heuristic algorithms and rivals the exact algorithm in solution quality.
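In the column-generation stage sketched above, each new cutting pattern prices out via a knapsack over the stock length, maximizing the dual values of the included items; a small unbounded-knapsack dynamic program of that kind is shown below (illustrative, not the authors' placement routine).

```python
def knapsack_pattern(stock_len, item_lens, duals):
    """Return (best total dual value, counts per item) for one stock size."""
    best = [0.0] * (stock_len + 1)
    choice = [None] * (stock_len + 1)
    for cap in range(1, stock_len + 1):
        best[cap] = best[cap - 1]                 # allow wasted length
        for i, (l, d) in enumerate(zip(item_lens, duals)):
            if l <= cap and best[cap - l] + d > best[cap]:
                best[cap] = best[cap - l] + d
                choice[cap] = i
    counts, cap = [0] * len(item_lens), stock_len
    while cap > 0:                                # backtrack the chosen pattern
        if choice[cap] is None:
            cap -= 1
        else:
            counts[choice[cap]] += 1
            cap -= item_lens[choice[cap]]
    return best[stock_len], counts

print(knapsack_pattern(10, [3, 4, 5], [1.0, 1.4, 1.9]))  # -> (3.8, [0, 0, 2])
```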
Sub-pattern based multi-manifold discriminant analysis for face recognition
NASA Astrophysics Data System (ADS)
Dai, Jiangyan; Guo, Changlu; Zhou, Wei; Shi, Yanjiao; Cong, Lin; Yi, Yugen
2018-04-01
In this paper, we present a Sub-pattern based Multi-manifold Discriminant Analysis (SpMMDA) algorithm for face recognition. Unlike the existing Multi-manifold Discriminant Analysis (MMDA) approach, which is based on holistic information of the face image for recognition, SpMMDA operates on sub-images partitioned from the original face image and then extracts the discriminative local features from the sub-images separately. Moreover, the structure information of different sub-images from the same face image is considered in the proposed method, with the aim of further improving the recognition performance. Extensive experiments on three standard face databases (Extended YaleB, CMU PIE and AR) demonstrate that the proposed method is effective and outperforms some other sub-pattern based face recognition methods.
Sengupta Chattopadhyay, Amrita; Hsiao, Ching-Lin; Chang, Chien Ching; Lian, Ie-Bin; Fann, Cathy S J
2014-01-01
Identifying susceptibility genes that influence complex diseases is extremely difficult because loci often influence the disease state through genetic interactions. Numerous approaches to detect disease-associated SNP-SNP interactions have been developed, but none consistently generates high-quality results under different disease scenarios. Using summarizing techniques to combine a number of existing methods may provide a solution to this problem. Here we used three popular non-parametric methods, Gini, absolute probability difference (APD), and entropy, to develop two novel summary scores, namely the principal component score (PCS) and the Z-sum score (ZSS), with which to predict disease-associated genetic interactions. We used a simulation study to compare the performance of the non-parametric scores, the summary scores, the scaled-sum score (SSS; used in polymorphism interaction analysis (PIA)), and multifactor dimensionality reduction (MDR). The non-parametric methods achieved high power, but no non-parametric method outperformed all others under a variety of epistatic scenarios. PCS and ZSS, however, outperformed MDR. PCS, ZSS and SSS displayed controlled type I errors (<0.05), whereas GS, APDS and ES did not (>0.05). A real data study using the Genetic Analysis Workshop 16 (GAW 16) rheumatoid arthritis dataset identified a number of interesting SNP-SNP interactions. © 2013 Elsevier B.V. All rights reserved.
Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels
2014-01-01
Background Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find protein complexes of size greater than three. However, it is known that smaller complexes account for a large fraction of all complexes in several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for the prediction of heterodimeric protein complexes, which outperforms existing methods. Results We propose methods for the prediction of heterotrimeric protein complexes by extending techniques in our previous work, based on the idea that most heterotrimeric protein complexes are not likely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for the prediction of heterotrimeric protein complexes. Conclusions We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for the prediction of heterotrimeric protein complexes. PMID:24564744
Hwang, I-Shyan
2017-01-01
The K-coverage configuration, which guarantees coverage of each location by at least K sensors, is highly popular and is extensively used to monitor diversified applications in wireless sensor networks. Long network lifetime and high detection quality are essential for such K-covered sleep-scheduling algorithms. However, the existing sleep-scheduling algorithms either incur high cost or fail to preserve detection quality effectively. In this paper, the Pre-Scheduling-based K-coverage Group Scheduling (PSKGS) and Self-Organized K-coverage Scheduling (SKS) algorithms are proposed to address these problems. Simulation results show that the pre-scheduling-based PSKGS approach enhances detection quality and network lifetime, whereas the self-organized SKS algorithm minimizes the computation and communication cost of the nodes and is thereby energy efficient. Moreover, SKS outperforms PSKGS in terms of network lifetime and detection quality, as it is self-organized. PMID:29257078
Schouten, Kim; van der Weijde, Onne; Frasincar, Flavius; Dekker, Rommert
2018-04-01
Using online consumer reviews as electronic word of mouth to assist purchase-decision making has become increasingly popular. The Web provides an extensive source of consumer reviews, but one can hardly read all reviews to obtain a fair evaluation of a product or service. A text processing framework that can summarize reviews would therefore be desirable. A subtask to be performed by such a framework would be to find the general aspect categories addressed in review sentences, for which this paper presents two methods. In contrast to most existing approaches, the first method presented is an unsupervised method that applies association rule mining on co-occurrence frequency data obtained from a corpus to find these aspect categories. While not on par with state-of-the-art supervised methods, the proposed unsupervised method performs better than several simple baselines, a similar but supervised method, and a supervised baseline, with an F1-score of 67%. The second method is a supervised variant that outperforms existing methods with an F1-score of 84%.
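A minimal sketch of the unsupervised idea, mining "lemma implies aspect category" rules from sentence-level co-occurrence counts; the sentences, seed category labels, and support/confidence thresholds are invented for illustration.

```python
from collections import Counter
from itertools import product

# Each sentence: (set of lemmas, set of seed category labels); toy data.
sentences = [
    ({"pizza", "cold", "service"}, {"food"}),
    ({"waiter", "rude"}, {"service"}),
    ({"pizza", "delicious"}, {"food"}),
]

MIN_SUPPORT, MIN_CONF = 2, 0.6
word_count, pair_count = Counter(), Counter()
for lemmas, cats in sentences:
    word_count.update(lemmas)
    pair_count.update(product(lemmas, cats))

# Keep rules word -> category with enough support and confidence.
rules = {(w, c): n / word_count[w]
         for (w, c), n in pair_count.items()
         if n >= MIN_SUPPORT and n / word_count[w] >= MIN_CONF}
print(rules)   # e.g. {('pizza', 'food'): 1.0}
```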
NASA Astrophysics Data System (ADS)
Liu, Miaofeng
2017-07-01
In recent years, deep convolutional neural networks have come into use for image inpainting and super-resolution in many fields. Unlike most earlier methods, which require knowing beforehand the locations of corrupted pixels, we propose a 20-layer fully convolutional network that learns an end-to-end mapping from a dataset of damaged/ground-truth subimage pairs, realizing non-local blind inpainting and super-resolution. As images with severe corruption, and inpainting on low-resolution images, often defeat existing approaches, we also share parameters within local groups of layers to achieve spatial recursion and enlarge the receptive field. To ease the training of this deep network, skip-connections between symmetric convolutional layers are designed. Experimental results show that the proposed method outperforms state-of-the-art methods under diverse corruption and low-resolution conditions, and it works excellently when realizing super-resolution and image inpainting simultaneously.
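The skip-connection idea can be illustrated with a toy PyTorch sketch; the two-level depth, channel counts, and residual output below are invented for brevity and do not reproduce the paper's 20-layer architecture or its parameter-sharing scheme.

```python
import torch
import torch.nn as nn

class SkipAutoencoder(nn.Module):
    """Toy encoder-decoder with skip-connections between
    symmetric convolutional layers (invented sizes)."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2) + e1   # skip-connection eases gradient flow
        return self.dec1(d2) + x  # residual output

y = SkipAutoencoder()(torch.randn(1, 3, 32, 32))
print(y.shape)   # torch.Size([1, 3, 32, 32])
```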
Revealing the Hidden Language of Complex Networks
Yaveroğlu, Ömer Nebil; Malod-Dognin, Noël; Davis, Darren; Levnajic, Zoran; Janjic, Vuk; Karapandza, Rasa; Stojmirovic, Aleksandar; Pržulj, Nataša
2014-01-01
Sophisticated methods for analysing complex networks promise to be of great benefit to almost all scientific disciplines, yet they elude us. In this work, we make fundamental methodological advances to rectify this. We discover that the interaction between a small number of roles, played by nodes in a network, can characterize a network's structure and also provide a clear real-world interpretation. Given this insight, we develop a framework for analysing and comparing networks, which outperforms all existing ones. We demonstrate its strength by uncovering novel relationships between seemingly unrelated networks, such as Facebook, metabolic, and protein structure networks. We also use it to track the dynamics of the world trade network, showing that a country's role as a broker between non-trading countries indicates economic prosperity, whereas peripheral roles are associated with poverty. This result, though intuitive, has escaped all existing frameworks. Finally, our approach translates network topology into everyday language, bringing network analysis closer to domain scientists. PMID:24686408
NASA Astrophysics Data System (ADS)
Marani, M.; Zorzetto, E.; Hosseini, S. R.; Miniussi, A.; Scaioni, M.
2017-12-01
The Generalized Extreme Value (GEV) distribution is widely adopted irrespective of the properties of the stochastic process generating the extreme events. However, GEV presents several limitations, both theoretical (asymptotic validity for a large number of events/year or hypothesis of Poisson occurrences of Generalized Pareto events), and practical (fitting uses just yearly maxima or a few values above a high threshold). Here we describe the Metastatistical Extreme Value Distribution (MEVD, Marani & Ignaccolo, 2015), which relaxes asymptotic or Poisson/GPD assumptions and makes use of all available observations. We then illustrate the flexibility of the MEVD by applying it to daily precipitation, hurricane intensity, and storm surge magnitude. Application to daily rainfall from a global raingauge network shows that MEVD estimates are 50% more accurate than those from GEV when the recurrence interval of interest is much greater than the observational period. This makes MEVD suited for application to satellite rainfall observations (~20 yrs length). Use of MEVD on TRMM data yields extreme event patterns that are in better agreement with surface observations than corresponding GEV estimates. Applied to the HURDAT2 Atlantic hurricane intensity dataset, MEVD significantly outperforms GEV estimates of extreme hurricanes. Interestingly, the Generalized Pareto distribution used for "ordinary" hurricane intensity points to the existence of a maximum limit wind speed that is significantly smaller than corresponding physically-based estimates. Finally, we applied the MEVD approach to water levels generated by tidal fluctuations and storm surges at a set of coastal sites spanning different storm-surge regimes. MEVD yields accurate estimates of large quantiles and inferences on tail thickness (fat vs. thin) of the underlying distribution of "ordinary" surges. In summary, the MEVD approach presents a number of theoretical and practical advantages, and outperforms traditional approaches in several applications. We conclude that the MEVD is a significant contribution to further generalize extreme value theory, with implications for a broad range of Earth Sciences.
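For reference, the MEVD construction given by Marani & Ignaccolo (2015) can be summarized as follows (notation adapted; consult the paper for the exact parameterization): with F(x; θ̂_j) the distribution of "ordinary" values fitted in year j and n_j the number of events in that year, the distribution of the yearly maximum over T observed years is

```latex
\zeta(x) \;=\; \frac{1}{T}\sum_{j=1}^{T}\bigl[F(x;\,\hat{\theta}_j)\bigr]^{\,n_j}.
```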
Huet, Michaël; Jacobs, David M; Camachon, Cyril; Goulon, Cedric; Montagne, Gilles
2009-12-01
This study (a) compares the effectiveness of different types of feedback for novices who learn to land a virtual aircraft in a fixed-base flight simulator and (b) analyzes the informational variables that learners come to use after practice. An extensive body of research exists concerning the informational variables that allow successful landing. In contrast, few studies have examined how the attention of pilots can be directed toward these sources of information. In this study, 15 participants were asked to land a virtual Cessna 172 on 245 trials while trying to follow the glide-slope area as accurately as possible. Three groups of participants practiced under different feedback conditions: with self-controlled concurrent feedback (the self-controlled group), with imposed concurrent feedback (the yoked group), or without concurrent feedback (the control group). The self-controlled group outperformed the yoked group, which in turn outperformed the control group. Removing or manipulating specific sources of information during transfer tests had different effects for different individuals. However, removing the cockpit from the visual scene had a detrimental effect on the performance of the majority of the participants. Self-controlled concurrent feedback helps learners to more quickly attune to the informational variables that allow them to control the aircraft during the approach phase. Knowledge concerning feedback schedules can be used for the design of optimal practice methods for student pilots, and knowledge about the informational variables used by expert performers has implications for the design of cockpits and runways that facilitate the detection of these variables.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zeng, Dong; Zhang, Xinyu; Bian, Zhaoying, E-mail: zybian@smu.edu.cn, E-mail: jhma@smu.edu.cn
Purpose: Cerebral perfusion computed tomography (PCT) imaging as an accurate and fast acute ischemic stroke examination has been widely used in clinic. Meanwhile, a major drawback of PCT imaging is the high radiation dose due to its dynamic scan protocol. The purpose of this work is to develop a robust perfusion deconvolution approach via structure tensor total variation (STV) regularization (PD-STV) for estimating an accurate residue function in PCT imaging with the low-milliampere-seconds (low-mAs) data acquisition. Methods: Besides modeling the spatio-temporal structure information of PCT data, the STV regularization of the present PD-STV approach can utilize the higher order derivatives of the residue function to enhance denoising performance. To minimize the objective function, the authors propose an effective iterative algorithm with a shrinkage/thresholding scheme. A simulation study on a digital brain perfusion phantom and a clinical study on an old infarction patient were conducted to validate and evaluate the performance of the present PD-STV approach. Results: In the digital phantom study, visual inspection and quantitative metrics (i.e., the normalized mean square error, the peak signal-to-noise ratio, and the universal quality index) assessments demonstrated that the PD-STV approach outperformed other existing approaches in terms of the performance of noise-induced artifacts reduction and accurate perfusion hemodynamic maps (PHM) estimation. In the patient data study, the present PD-STV approach could yield accurate PHM estimation with several noticeable gains over other existing approaches in terms of visual inspection and correlation analysis. Conclusions: This study demonstrated the feasibility and efficacy of the present PD-STV approach in utilizing STV regularization to improve the accuracy of residue function estimation of cerebral PCT imaging in the case of low-mAs.
Reactive navigation in extremely dense and highly intricate environments
2017-01-01
Reactive navigation is a well-known paradigm for controlling an autonomous mobile robot, which suggests making all control decisions through some light processing of the current/recent sensor data. Among the many advantages of this paradigm are: 1) the possibility to apply it to robots with limited and low-priced hardware resources, and 2) the fact of being able to safely navigate a robot in completely unknown environments containing unpredictable moving obstacles. As a major disadvantage, nevertheless, the reactive paradigm may occasionally cause robots to get trapped in certain areas of the environment—typically, these conflicting areas have a large concave shape and/or are full of closely-spaced obstacles. In this last respect, an enormous effort has been devoted to overcoming such a serious drawback during the last two decades. As a result of this effort, a substantial number of new approaches for reactive navigation have been put forward. Some of these approaches have clearly improved the way a reactively-controlled robot can move among densely cluttered obstacles; some other approaches have essentially focused on increasing the variety of obstacle shapes and sizes that could be successfully circumnavigated; etc. In this paper, as a starting point, we choose the best existing reactive approach to move in densely cluttered environments, and we also choose the existing reactive approach with the greatest ability to circumvent large intricate-shaped obstacles. Then, we combine these two approaches in a way that makes the most of them. From the experimental point of view, we use both simulated and real scenarios of challenging complexity for testing purposes. In such scenarios, we demonstrate that the combined approach herein proposed clearly outperforms the two individual approaches on which it is built. PMID:29287078
Gehrmann, Sebastian; Dernoncourt, Franck; Li, Yeran; Carlson, Eric T; Wu, Joy T; Welt, Jonathan; Foote, John; Moseley, Edward T; Grant, David W; Tyler, Patrick D; Celi, Leo A
2018-01-01
In secondary analysis of electronic health records, a crucial task is to correctly identify the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with improvements of up to 26 percentage points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
Kuang, Li; Yu, Long; Huang, Lan; Wang, Yin; Ma, Pengju; Li, Chuanbin; Zhu, Yujia
2018-05-14
With the rapid development of cyber-physical systems (CPS), building cyber-physical systems with high quality of service (QoS) has become an urgent requirement in both academia and industry. During the procedure of building cyber-physical systems, it has been found that a large number of functionally equivalent services exist, so it becomes an urgent task to recommend suitable services from the large number of services available in CPS. However, since it is time-consuming, and even impractical, for a single user to invoke all of the services in CPS to experience their QoS, a robust QoS prediction method is needed to predict unknown QoS values. A commonly used method in QoS prediction is collaborative filtering; however, it is hard to deal with the data sparsity and cold start problems, and meanwhile most of the existing methods ignore the data credibility issue. Hence, in order to solve both of these challenging problems, in this paper, we design a framework of QoS prediction for CPS services and propose a personalized QoS prediction approach based on reputation and location-aware collaborative filtering. Our approach first calculates the reputation of users by using the Dirichlet probability distribution, so as to identify untrusted users and process their unreliable data, and then it mines the geographic neighborhood at three levels to improve the similarity calculation for users and services. Finally, the data from the geographical neighbors of users and services are fused to predict the unknown QoS values. Experiments using real datasets show that our proposed approach outperforms other existing methods in terms of accuracy, efficiency, and robustness.
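The abstract does not give the reputation formula; a minimal sketch of a Dirichlet/Beta-style count model with a posterior-mean reputation score is shown below. The two-outcome reduction (consistent vs. inconsistent reports), prior weights, and the 0.5 trust threshold are all illustrative assumptions.

```python
def reputation(consistent, inconsistent, prior=(1.0, 1.0)):
    """Posterior-mean reputation from a Beta/Dirichlet count model:
    how often a user's QoS reports agree with the consensus."""
    a, b = prior
    return (consistent + a) / (consistent + inconsistent + a + b)

users = {"u1": (48, 2), "u2": (3, 17)}     # (consistent, inconsistent) counts
trusted = {u: r for u, (c, i) in users.items()
           if (r := reputation(c, i)) >= 0.5}
print(trusted)   # u1 kept; u2 flagged as untrusted
```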
Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination.
Zhao, Qibin; Zhang, Liqing; Cichocki, Andrzej
2015-09-01
CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. Existing CP algorithms require the tensor rank to be specified manually; however, determining the tensor rank remains a challenging problem, especially for the CP rank. In addition, existing approaches do not take into account uncertainty information of either the latent factors or the missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and the appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm, which scales linearly with data size. Our method is characterized as a tuning parameter-free approach, which can effectively infer underlying multilinear factors with a low-rank constraint, while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth CP rank and prevent overfitting, even when a large amount of entries are missing. Moreover, the results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance.
Distributed Power Allocation for Wireless Sensor Network Localization: A Potential Game Approach.
Ke, Mingxing; Li, Ding; Tian, Shiwei; Zhang, Yuli; Tong, Kaixiang; Xu, Yuhua
2018-05-08
The problem of distributed power allocation in wireless sensor network (WSN) localization systems is investigated in this paper, using the game theoretic approach. Existing research focuses on minimizing the localization errors of individual agent nodes over all anchor nodes subject to power budgets. When the service area and the distribution of target nodes are considered, finding the optimal trade-off between localization accuracy and power consumption is a new critical task. To cope with this issue, we propose a power allocation game in which each anchor node minimizes the squared position error bound (SPEB) of the service area penalized by its individual power. Meanwhile, it is proven that the power allocation game is an exact potential game which has at least one pure Nash equilibrium (NE). In addition, we also prove the existence of an ϵ-equilibrium point, a refinement of the NE that the better-response dynamic can reach. Analytical and simulation results demonstrate that: (i) when prior distribution information is available, the proposed strategies achieve better localization accuracy than the uniform strategies; (ii) when prior distribution information is unknown, the proposed strategies outperform power management strategies based on the second-order cone program (SOCP) for particular agent nodes after obtaining the estimated distribution of agent nodes. In addition, the proposed strategies provide an instructive trade-off between power consumption and localization accuracy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niu, S; Zhang, Y; Ma, J
Purpose: To investigate iterative reconstruction via prior image constrained total generalized variation (PICTGV) for spectral computed tomography (CT) using fewer projections while achieving greater image quality. Methods: The proposed PICTGV method is formulated as an optimization problem that balances the data fidelity and the prior image constrained total generalized variation of reconstructed images in one framework. The PICTGV method is based on structure correlations among images in the energy domain and uses high-quality images to guide the reconstruction of energy-specific images. In the PICTGV method, the high-quality image is reconstructed from all detector-collected X-ray signals and is referred to as the broad-spectrum image. Distinct from the existing reconstruction methods applied to images with the first order derivative, the higher order derivative of the images is incorporated into the PICTGV method. An alternating optimization algorithm is used to minimize the PICTGV objective function. We evaluate the performance of PICTGV on noise and artifact suppression using phantom studies and compare the method with the conventional filtered back-projection method as well as the TGV-based method without prior image. Results: On the digital phantom, the proposed method outperforms the existing TGV method in terms of noise reduction, artifact suppression, and edge detail preservation. Compared to that obtained by the TGV-based method without prior image, the relative root mean square error in the images reconstructed by the proposed method is reduced by over 20%. Conclusion: The authors propose an iterative reconstruction via prior image constrained total generalized variation for spectral CT. Also, we have developed an alternating optimization algorithm and numerically demonstrated the merits of our approach. Results show that the proposed PICTGV method outperforms the TGV method for spectral CT.
Piastra, Maria Carla; Nüßing, Andreas; Vorwerk, Johannes; Bornfleth, Harald; Oostenveld, Robert; Engwer, Christian; Wolters, Carsten H.
2018-01-01
In Electro- (EEG) and Magnetoencephalography (MEG), one important requirement of source reconstruction is the forward model. The continuous Galerkin finite element method (CG-FEM) has become one of the dominant approaches for solving the forward problem over the last decades. Recently, a discontinuous Galerkin FEM (DG-FEM) EEG forward approach has been proposed as an alternative to CG-FEM (Engwer et al., 2017). It was shown that DG-FEM preserves the property of conservation of charge and that it can, in certain situations such as the so-called skull leakages, be superior to the standard CG-FEM approach. In this paper, we developed, implemented, and evaluated two DG-FEM approaches for the MEG forward problem, namely a conservative and a non-conservative one. The subtraction approach was used as the source model. The validation and evaluation work was done in statistical investigations in multi-layer homogeneous sphere models, where an analytic solution exists, and in a six-compartment realistically shaped head volume conductor model. In agreement with the theory, the conservative DG-FEM approach was found to be superior to the non-conservative DG-FEM implementation. This approach also showed convergence with increasing resolution of the hexahedral meshes. While in the EEG case, in the presence of skull leakages, DG-FEM outperformed CG-FEM, in MEG, DG-FEM achieved similar numerical errors as the CG-FEM approach, i.e., skull leakages do not play a role for the MEG modality. In particular, for the finest mesh resolution of 1 mm and sources at a distance of 1.59 mm from the brain-CSF surface, DG-FEM yielded mean topographical errors (relative difference measure, RDM%) of 1.5% and mean magnitude errors (MAG%) of 0.1% for the magnetic field. However, if the goal is a combined source analysis of EEG and MEG data, then it is highly desirable to employ the same forward model for both EEG and MEG data. Based on these results, we conclude that the newly presented conservative DG-FEM can at least complement and in some scenarios even outperform the established CG-FEM approaches in EEG or combined MEG/EEG source analysis scenarios, which motivates a further evaluation of DG-FEM for applications in bioelectromagnetism. PMID:29456487
An Extraction Method of an Informative DOM Node from a Web Page by Using Layout Information
NASA Astrophysics Data System (ADS)
Tsuruta, Masanobu; Masuyama, Shigeru
We propose an informative DOM node extraction method from a Web page for preprocessing in Web content mining. Our proposed method, LM, uses layout data of DOM nodes generated by a generic Web browser, and its learning set consists of hundreds of Web pages together with annotations of the informative DOM nodes of those Web pages. Our method does not require large-scale crawling of the whole Web site to which the target Web page belongs. We design LM so that it uses the information in the learning set more efficiently than the existing method that uses the same learning set. In experiments, we evaluate methods obtained by combining an informative-DOM-node extraction method (the proposed method or an existing one) with the existing noise elimination methods: Heur, which removes advertisements and link-lists by some heuristics, and CE, which removes DOM nodes that also appear in other Web pages of the same Web site as the target Web page. Experimental results show that 1) LM outperforms other methods for extracting the informative DOM node, and 2) the combination method (LM, {CE(10), Heur}) based on LM (precision: 0.755, recall: 0.826, F-measure: 0.746) outperforms other combination methods.
Novel secret key generation techniques using memristor devices
NASA Astrophysics Data System (ADS)
Abunahla, Heba; Shehada, Dina; Yeun, Chan Yeob; Mohammad, Baker; Jaoude, Maguy Abi
2016-02-01
This paper proposes novel secret key generation techniques using memristor devices. The approach depends on using the initial profile of a memristor as a master key. In addition, session keys are generated using the master key and other specified parameters. In contrast to existing memristor-based security approaches, the proposed development is cost effective and power efficient since the operation can be achieved with a single device rather than a crossbar structure. An algorithm is suggested and demonstrated using a physics-based Matlab model. It is shown that the generated keys can have dynamic sizes, which provides perfect security. Moreover, the proposed encryption and decryption technique using the memristor-based generated keys outperforms Triple Data Encryption Standard (3DES) and Advanced Encryption Standard (AES) in terms of processing time. This paper is enriched by providing characterization results of a fabricated microscale Al/TiO2/Al memristor prototype in order to prove the concept of the proposed approach and study the impacts of process variations. The work proposed in this paper is a milestone towards System On Chip (SOC) memristor-based security.
Semi-supervised prediction of gene regulatory networks using machine learning algorithms.
Patel, Nihir; Wang, Jason T L
2015-10-01
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
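A minimal sketch of the iterative reliable-negative idea with scikit-learn, using an SVM as in the paper's SVM variant; the feature matrices, the initial random negative set, and the promotion rule are invented stand-ins for the paper's actual procedure.

```python
import numpy as np
from sklearn.svm import SVC

def iterative_negatives(X_pos, X_unl, rounds=3, frac=0.2):
    """Harvest reliable negatives from unlabelled gene pairs: train on
    positives vs. current negatives, then promote the unlabelled pairs
    scored most confidently negative (selection scheme invented)."""
    rng = np.random.default_rng(0)
    neg_idx = rng.choice(len(X_unl), size=max(2, int(frac * len(X_unl))),
                         replace=False)
    for _ in range(rounds):
        X = np.vstack([X_pos, X_unl[neg_idx]])
        y = np.r_[np.ones(len(X_pos)), np.zeros(len(neg_idx))]
        clf = SVC(probability=True).fit(X, y)
        scores = clf.predict_proba(X_unl)[:, 0]       # P(negative)
        neg_idx = np.argsort(scores)[-len(neg_idx):]  # most confident
    return clf, neg_idx

rng = np.random.default_rng(1)
X_pos = rng.normal(1.0, 1.0, size=(30, 5))   # toy regulatory pairs
X_unl = rng.normal(0.0, 1.0, size=(100, 5))  # toy unlabelled pairs
clf, negs = iterative_negatives(X_pos, X_unl)
```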
Jonnagaddala, Jitendra; Jue, Toni Rose; Chang, Nai-Wen; Dai, Hong-Jie
2016-01-01
The rapidly increasing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditional random fields (CRFs) and the dictionary lookup method are widely used for named entity recognition and normalization, respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and to compare them with other existing dictionary-lookup-based normalization methods. The best configuration achieved an F-measure of 0.77 for disease normalization, outperforming the best dictionary-lookup-based baseline method studied in this work by 0.13 in F-measure. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract PMID:27504009
Qu, Jianfeng; Ouyang, Dantong; Hua, Wen; Ye, Yuxin; Li, Ximing
2018-04-01
Distant supervision for neural relation extraction is an efficient approach to extracting large numbers of relations from plain text. However, existing neural methods fail to capture the critical words during sentence encoding and, meanwhile, lack useful sentence information for some positive training instances. To address these issues, we propose a novel neural relation extraction model. First, we develop a word-level attention mechanism to distinguish the importance of each individual word in a sentence, increasing the attention weights for critical words. Second, we investigate the semantic information from word embeddings of target entities, which can be developed as a supplementary feature for the extractor. Experimental results show that our model outperforms previous state-of-the-art baselines. Copyright © 2018 Elsevier Ltd. All rights reserved.
Sparse nonnegative matrix factorization with ℓ0-constraints
Peharz, Robert; Pernkopf, Franz
2012-01-01
Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, there is no guarantee for this behavior. Several authors proposed NMF methods which enforce sparseness by constraining or penalizing the ℓ1-norm of the factor matrices. On the other hand, little work has been done using a more natural sparseness measure, the ℓ0-pseudo-norm. In this paper, we propose a framework for approximate NMF which constrains the ℓ0-norm of the basis matrix, or the coefficient matrix, respectively. For this purpose, techniques for unconstrained NMF can be easily incorporated, such as multiplicative update rules, or the alternating nonnegative least-squares scheme. In experiments we demonstrate the benefits of our methods, which compare to, or outperform, existing approaches. PMID:22505792
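A minimal numpy sketch of the basis-matrix variant: standard multiplicative updates followed by a hard ℓ0 projection that keeps the k largest entries per column of W. This simple projection is a stand-in for the paper's scheme, and all sizes are toy values.

```python
import numpy as np

def l0_nmf(V, r, k, iters=200, eps=1e-9):
    """Approximate NMF V ~ W H with about k nonzeros per column of W:
    multiplicative updates plus a hard l0 projection (simplified)."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # l0 projection: zero all but the k largest entries per column
        thresh = np.sort(W, axis=0)[-k, :]
        W[W < thresh] = 0.0
    return W, H

V = np.abs(np.random.default_rng(1).random((20, 30)))
W, H = l0_nmf(V, r=5, k=4)
print((W > 0).sum(axis=0))   # ~4 nonzeros per basis column
```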
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores
NASA Astrophysics Data System (ADS)
Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei
We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
Deconvolution of mixing time series on a graph
Blocker, Alexander W.; Airoldi, Edoardo M.
2013-01-01
In many applications we are interested in making inference on latent time series from indirect measurements, which are often low-dimensional projections resulting from mixing or aggregation. Positron emission tomography, super-resolution, and network traffic monitoring are some examples. Inference in such settings requires solving a sequence of ill-posed inverse problems, y_t = A x_t, where the projection mechanism provides information on A. We consider problems in which A specifies mixing on a graph of time series that are bursty and sparse. We develop a multilevel state-space model for mixing time series and an efficient approach to inference. A simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. We apply this method to the problem of estimating point-to-point traffic flows on a network from aggregate measurements. Our solution outperforms existing methods for this problem, and our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series. PMID:25309135
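As a generic illustration of regularizing one ill-posed step y_t = A x_t (not the paper's multilevel state-space inference), a ridge-penalized least-squares inverse in numpy:

```python
import numpy as np

def ridge_inverse(A, y, lam=1e-2):
    """Solve argmin ||y - A x||^2 + lam ||x||^2 in closed form;
    the penalty lam stabilizes the under-determined system."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Toy network-tomography flavor: 3 aggregate link loads, 4 route flows.
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
x_true = np.array([2.0, 0.0, 3.0, 1.0])
print(np.round(ridge_inverse(A, A @ x_true), 2))
```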
Jia, Erik; Chen, Tianlu
2018-01-01
Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for missing values will cause adverse impacts on subsequent statistical analyses. However, few imputation methods have been developed and applied to the situation of MNAR in the field of metabolomics. Thus, a practical left-censored missing value imputation method is urgently needed. We developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with other three imputation methods on two real-world targeted metabolomics datasets and one simulation dataset using our imputation evaluation pipeline. The results show that GSimp outperforms other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. Additionally, a parallel version of GSimp was developed for dealing with large scale metabolomics datasets. The R code for GSimp, evaluation pipeline, tutorial, real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp. PMID:29385130
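The elementary building block of a Gibbs-style imputation for left-censored values is a draw from a normal distribution truncated above at the detection limit; the sketch below shows only this draw (means, scales, and the limit are invented), not the full GSimp iteration with its prediction model.

```python
import numpy as np
from scipy.stats import truncnorm

def draw_censored(mu, sigma, lod, size=1, rng=None):
    """Draw left-censored values from N(mu, sigma^2) truncated to
    (-inf, lod], where lod is the detection limit."""
    b = (lod - mu) / sigma                    # standardized upper bound
    return truncnorm.rvs(-np.inf, b, loc=mu, scale=sigma, size=size,
                         random_state=rng)

# Toy column: values below lod=1.0 were censored and are imputed here.
imputed = draw_censored(mu=1.3, sigma=0.4, lod=1.0, size=5,
                        rng=np.random.default_rng(0))
print(imputed)                                # all draws <= 1.0
```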
An Effective Palmprint Recognition Approach for Visible and Multispectral Sensor Images
Sammouda, Rachid; Al-Salman, Abdul Malik; Alsanad, Ahmed
2018-01-01
Among several palmprint feature extraction methods, the HOG-based method is attractive and performs well against changes in illumination and shadowing of palmprint images. However, it still lacks the robustness to extract palmprint features at different rotation angles. To solve this problem, this paper presents a hybrid feature extraction method, named HOG-SGF, that combines the histogram of oriented gradients (HOG) with a steerable Gaussian filter (SGF) to develop an effective palmprint recognition approach. The approach starts by processing all palmprint images by David Zhang’s method to segment only the regions of interest. Next, we extract palmprint features based on the hybrid HOG-SGF feature extraction method. Then, an optimized auto-encoder (AE) is utilized to reduce the dimensionality of the extracted features. Finally, a fast and robust regularized extreme learning machine (RELM) is applied for the classification task. In the evaluation phase of the proposed approach, a number of experiments were conducted on three publicly available palmprint databases, namely MS-PolyU of multispectral palmprint images and CASIA and Tongji of contactless palmprint images. Experimentally, the results reveal that the proposed approach outperforms the existing state-of-the-art approaches even when a small number of training samples are used. PMID:29762519
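The HOG half of the feature pipeline is standard enough to sketch with scikit-image; the grid size and HOG parameters below are invented, and the steerable-Gaussian-filter half of HOG-SGF is omitted.

```python
import numpy as np
from skimage.feature import hog

def subimage_hog(img, grid=(4, 4)):
    """Split a grayscale palm ROI into a grid of sub-images and
    concatenate one HOG descriptor per sub-image."""
    h, w = img.shape
    gh, gw = h // grid[0], w // grid[1]
    feats = [hog(img[r*gh:(r+1)*gh, c*gw:(c+1)*gw],
                 orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2))
             for r in range(grid[0]) for c in range(grid[1])]
    return np.concatenate(feats)

roi = np.random.default_rng(0).random((128, 128))  # stand-in palm ROI
print(subimage_hog(roi).shape)
```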
Weakly supervised visual dictionary learning by harnessing image attributes.
Gao, Yue; Ji, Rongrong; Liu, Wei; Dai, Qionghai; Hua, Gang
2014-12-01
Bag-of-features (BoFs) representation has been extensively applied to deal with various computer vision applications. To extract a discriminative and descriptive BoF, one important step is to learn a good dictionary that minimizes the quantization loss between local features and codewords. While most existing visual dictionary learning approaches rely on unsupervised feature quantization, the latest trend has turned to supervised learning by harnessing the semantic labels of images or regions. However, such labels are typically too expensive to acquire, which restricts the scalability of supervised dictionary learning approaches. In this paper, we propose to leverage image attributes to weakly supervise the dictionary learning procedure without requiring any actual labels. As a key contribution, our approach establishes a generative hidden Markov random field (HMRF), which models the quantized codewords as the observed states and the image attributes as the hidden states, respectively. Dictionary learning is then performed by supervised grouping of the observed states, where the supervised information stems from the hidden states of the HMRF. In such a way, the proposed dictionary learning approach incorporates the image attributes to learn a semantic-preserving BoF representation without any genuine supervision. Experiments in large-scale image retrieval and classification tasks corroborate that our approach significantly outperforms the state-of-the-art unsupervised dictionary learning approaches.
A study of active learning methods for named entity recognition in clinical text.
Chen, Yukun; Lasko, Thomas A; Mei, Qiaozhu; Denny, Joshua C; Xu, Hua
2015-12-01
Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build due to the requirement of domain experts in annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task to identify concepts of medical problems, treatments, and lab tests from clinical notes. Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge, which contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three categories: uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with passive learning, which uses random sampling. Learning curves that plot the performance of the NER model against the estimated annotation cost (based on the number of sentences or words in the training set) were generated to evaluate the different active learning methods and passive learning, and the area under the learning curve (ALC) score was computed. Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best method based on uncertainty sampling could save 66% of annotations in sentences, as compared to random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve 0.80 in F-measure, in comparison to random sampling, the best uncertainty-based method saved 42% of annotations in words, while the best diversity-based method reduced annotation effort by only 7%. In the simulated setting, AL methods, particularly uncertainty-sampling-based approaches, seemed to significantly save annotation cost for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting. Copyright © 2015 Elsevier Inc. All rights reserved.
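A generic least-confidence loop captures the uncertainty-sampling idea; the sketch below uses a plain logistic-regression classifier on toy vectors as a stand-in for the study's CRF-based NER models and sentence-level annotation costs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def least_confidence_al(X_pool, y_pool, seed_idx, budget, batch=5):
    """Least-confidence active learning: repeatedly train, then query
    the pool items whose top predicted probability is lowest
    (y_pool stands in for an annotator answering queries)."""
    labeled = list(seed_idx)
    while len(labeled) < budget:
        clf = LogisticRegression(max_iter=1000).fit(
            X_pool[labeled], y_pool[labeled])
        conf = clf.predict_proba(X_pool).max(axis=1)
        conf[labeled] = np.inf               # never re-query
        labeled.extend(np.argsort(conf)[:batch].tolist())
    return clf, labeled

rng = np.random.default_rng(0)
X = rng.random((200, 10)); y = (X[:, 0] > 0.5).astype(int)
seed = [int(np.argmax(y)), int(np.argmin(y))]   # one sample per class
clf, labeled = least_confidence_al(X, y, seed_idx=seed, budget=40)
```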
Generalized superradiant assembly for nanophotonic thermal emitters
NASA Astrophysics Data System (ADS)
Mallawaarachchi, Sudaraka; Gunapala, Sarath D.; Stockman, Mark I.; Premaratne, Malin
2018-03-01
Superradiance explains the collective enhancement of emission observed when nanophotonic emitters are arranged within subwavelength proximity and perfect symmetry. Thermal superradiant emitter assemblies with variable photon far-field coupling rates are known to be capable of outperforming their conventional, nonsuperradiant counterparts. However, due to the inability to account for assemblies comprising emitters with various materials and dimensional configurations, existing thermal superradiant models are inadequate and incongruent. In this paper, a generalized thermal superradiant assembly for nanophotonic emitters is developed from first principles. Spectral analysis shows that the proposed model not only outperforms existing models in power delivery but also displays unforeseen and startling characteristics during emission. These electromagnetically-induced-transparency-like (EIT-like) and superscattering-like characteristics are reported here for a superradiant assembly, and the effects escalate as the emitters become increasingly disparate. The fact that the EIT-like characteristics are in close agreement with a recent experimental observation involving the superradiant decay of qubits strongly bolsters the validity of the proposed model.
A hybrid approach to estimate the complex motions of clouds in sky images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peng, Zhenzhou; Yu, Dantong; Huang, Dong
Tracking the motion of clouds is essential to forecasting the weather and to predicting the short-term solar energy generation. Existing techniques mainly fall into two categories: variational optical flow, and block matching. In this article, we summarize recent advances in estimating cloud motion using ground-based sky imagers and quantitatively evaluate state-of-the-art approaches. Then we propose a hybrid tracking framework to incorporate the strength of both block matching and optical flow models. To validate the accuracy of the proposed approach, we introduce a series of synthetic images to simulate the cloud movement and deformation, and thereafter comprehensively compare our hybrid approach with several representative tracking algorithms over both simulated and real images collected from various sites/imagers. The results show that our hybrid approach outperforms state-of-the-art models by reducing at least 30% motion estimation errors compared with the ground-truth motions in most of simulated image sequences. Furthermore, our hybrid model demonstrates its superior efficiency in several real cloud image datasets by lowering at least 15% Mean Absolute Error (MAE) between predicted images and ground-truth images.
Improved Test Planning and Analysis Through the Use of Advanced Statistical Methods
NASA Technical Reports Server (NTRS)
Green, Lawrence L.; Maxwell, Katherine A.; Glass, David E.; Vaughn, Wallace L.; Barger, Weston; Cook, Mylan
2016-01-01
The goal of this work is, through computational simulations, to provide statistically-based evidence to convince the testing community that a distributed testing approach is superior to a clustered testing approach for most situations. For clustered testing, numerous, repeated test points are acquired at a limited number of test conditions. For distributed testing, only one or a few test points are requested at many different conditions. The statistical techniques of Analysis of Variance (ANOVA), Design of Experiments (DOE) and Response Surface Methods (RSM) are applied to enable distributed test planning, data analysis and test augmentation. The D-Optimal class of DOE is used to plan an optimally efficient single- and multi-factor test. The resulting simulated test data are analyzed via ANOVA and a parametric model is constructed using RSM. Finally, ANOVA can be used to plan a second round of testing to augment the existing data set with new data points. The use of these techniques is demonstrated through several illustrative examples. To date, many thousands of comparisons have been performed and the results strongly support the conclusion that the distributed testing approach outperforms the clustered testing approach.
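As a toy illustration of the ANOVA building block only (not NASA's simulation study), a one-way ANOVA comparing responses at three invented test conditions with scipy:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)                 # invented measurements
cond_a = rng.normal(10.0, 1.0, size=8)
cond_b = rng.normal(10.5, 1.0, size=8)
cond_c = rng.normal(12.0, 1.0, size=8)

f_stat, p_value = f_oneway(cond_a, cond_b, cond_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p: condition matters
```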
Body Fat Percentage Prediction Using Intelligent Hybrid Approaches
Shao, Yuehjen E.
2014-01-01
Excess of body fat often leads to obesity. Obesity is typically associated with serious medical diseases, such as cancer, heart disease, and diabetes. Accordingly, knowing the body fat is an extremely important issue since it affects everyone's health. Although there are several ways to measure the body fat percentage (BFP), the accurate methods are often associated with hassle and/or high costs. Traditional single-stage approaches may use certain body measurements or explanatory variables to predict the BFP. Diverging from existing approaches, this study proposes new intelligent hybrid approaches to obtain fewer explanatory variables, and the proposed forecasting models are able to effectively predict the BFP. The proposed hybrid models consist of multiple regression (MR), artificial neural network (ANN), multivariate adaptive regression splines (MARS), and support vector regression (SVR) techniques. The first stage of the modeling includes the use of MR and MARS to obtain fewer but more important sets of explanatory variables. In the second stage, the remaining important variables are served as inputs for the other forecasting methods. A real dataset was used to demonstrate the development of the proposed hybrid models. The prediction results revealed that the proposed hybrid schemes outperformed the typical, single-stage forecasting models. PMID:24723804
VanderKraats, Nathan D.; Hiken, Jeffrey F.; Decker, Keith F.; Edwards, John R.
2013-01-01
Methylation of the CpG-rich region (CpG island) overlapping a gene’s promoter is a generally accepted mechanism for silencing expression. While recent technological advances have enabled measurement of DNA methylation and expression changes genome-wide, only modest correlations between differential methylation at gene promoters and expression have been found. We hypothesize that stronger associations are not observed because existing analysis methods oversimplify their representation of the data and do not capture the diversity of existing methylation patterns. Recently, other patterns such as CpG island shore methylation and long partially hypomethylated domains have also been linked with gene silencing. Here, we detail a new approach for discovering differential methylation patterns associated with expression change using genome-wide high-resolution methylation data: we represent differential methylation as an interpolated curve, or signature, and then identify groups of genes with similarly shaped signatures and corresponding expression changes. Our technique uncovers a diverse set of patterns that are conserved across embryonic stem cell and cancer data sets. Overall, we find strong associations between these methylation patterns and expression. We further show that an extension of our method also outperforms other approaches by generating a longer list of genes with higher quality associations between differential methylation and expression. PMID:23748561
Distance measurement based on light field geometry and ray tracing.
Chen, Yanqin; Jin, Xin; Dai, Qionghai
2017-01-09
In this paper, we propose a geometric optical model to measure the distances of object planes in a light field image. The proposed geometric optical model is composed of two sub-models based on ray tracing: an object space model and an image space model. The two theoretic sub-models are derived for on-axis point light sources. In the object space model, light rays propagate into the main lens and refract inside it following the refraction theorem. In the image space model, light rays exit from emission positions on the main lens and subsequently impinge on the image sensor with different imaging diameters. The relationships between the imaging diameters of objects and their corresponding emission positions on the main lens are investigated by utilizing refocusing and the similar-triangle principle. By combining the two sub-models and tracing light rays back to the object space, the relationships between objects' imaging diameters and the corresponding distances of object planes are derived. The performance of the proposed geometric optical model is compared with that of existing approaches using different configurations of hand-held plenoptic 1.0 cameras, and real experiments are conducted using a preliminary imaging system. Results demonstrate that the proposed model outperforms existing approaches in terms of accuracy and exhibits good performance over a general imaging range.
Manifold Regularized Experimental Design for Active Learning.
Zhang, Lining; Shum, Hubert P H; Shao, Ling
2016-12-02
Various machine learning and data mining tasks in classification require abundant data samples to be labeled for training. Conventional active learning methods aim at labeling the most informative samples to alleviate the labor of the user. Many previous studies in active learning select one sample after another in a greedy manner. However, this is not very effective because the classification model has to be retrained for each newly labeled sample. Moreover, many popular active learning approaches utilize the most uncertain samples by leveraging the classification hyperplane of the classifier, which is not appropriate since the classification hyperplane is inaccurate when the training data are small-sized. The problem of insufficient training data in real-world systems limits the potential applications of these approaches. This paper presents a novel active learning method called manifold regularized experimental design (MRED), which can label multiple informative samples at one time for training. In addition, MRED gives an explicit geometric explanation of the selected samples to be labeled by the user. Different from existing active learning methods, our method avoids the intrinsic problems caused by insufficiently labeled samples in real-world applications. Various experiments on synthetic datasets, the Yale face database and the Corel image database have been carried out to show how MRED outperforms existing methods.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa
2018-07-01
Automatic text classification techniques are useful for classifying plain-text medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction. For the experiments, autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures, i.e. overall accuracy, macro-precision, macro-recall, and macro-F-measure. The experiments showed that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, among the feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency and normalized term frequency with inverse document frequency. The chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among the text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-the-art techniques against which to compare future proposals for automated text classification. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
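For illustration, the best-performing configuration reported above (unigram features, tf-idf weighting, chi-square reduction, and a linear SVM) corresponds to the following scikit-learn pipeline sketch in Python; the toy documents and labels are placeholders, since the autopsy reports themselves are not public.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

docs = ["blunt force head trauma with skull fracture",
        "acute myocardial infarction with coronary thrombosis",
        "asphyxia due to drowning with pulmonary edema",
        "penetrating gunshot wound of the chest with hemorrhage"] * 10
labels = ["trauma", "cardiac", "drowning", "firearm"] * 10

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),  # unigram features
    ("chi2", SelectKBest(chi2, k=20)),               # chi-square reduction
    ("svm", LinearSVC()),                            # linear SVM classifier
])
print(cross_val_score(clf, docs, labels, cv=5).mean())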
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases arise jointly from alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network, and are highly correlated. However, most existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to exploit this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects obtained by assuming a common distribution for the regression coefficients in the multivariate linear regression model, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that the type I error is protected under different choices of working covariance matrices and that power improves as the working covariance approaches the true covariance. The global test is a special case of TEGS in which the correlation among genes in a gene set is ignored. Using both simulated data and a published diabetes dataset, we show that our test outperforms two commonly used approaches, the global test and gene set enrichment analysis (GSEA). In summary, we develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance; TEGS outperforms GSEA and the global test in both simulations and a diabetes microarray dataset.
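The following Python sketch conveys the core idea in a simplified form; it is a permutation test on a quadratic-form set statistic, not the TEGS variance component test itself. Permuting the exposure labels preserves the correlation among genes in the set, which is exactly the feature the method exploits; the simulated data and effect size are assumptions.

import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 25                        # samples, genes in the set
x = rng.integers(0, 2, size=n)       # exposure status (yes/no)
cov = 0.5 + 0.5 * np.eye(p)          # strongly correlated genes
Y = rng.multivariate_normal(np.zeros(p), cov, size=n) + 0.4 * x[:, None]

def set_stat(Y, x):
    # aggregate per-gene group-mean differences over the whole set
    diff = Y[x == 1].mean(axis=0) - Y[x == 0].mean(axis=0)
    return float(diff @ diff)

obs = set_stat(Y, x)
# permuting exposure labels keeps the gene-gene correlation intact
perm = [set_stat(Y, rng.permutation(x)) for _ in range(2000)]
p_value = (1 + sum(s >= obs for s in perm)) / (1 + len(perm))
print(f"set statistic = {obs:.2f}, permutation p = {p_value:.4f}")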
Estimating Demand for Industrial and Commercial Land Use Given Economic Forecasts
Batista e Silva, Filipe; Koomen, Eric; Diogo, Vasco; Lavalle, Carlo
2014-01-01
Current developments in the field of land use modelling point towards a greater level of spatial and thematic resolution and the possibility to model large geographical extents. Improvements are taking place as computational capabilities increase and socioeconomic and environmental data are produced with sufficient detail. Integrated approaches to land use modelling rely on the development of interfaces with specialized models from fields like economics, hydrology, and agriculture. Impact assessment of scenarios/policies at various geographical scales can particularly benefit from these advances. A comprehensive land use modelling framework necessarily includes both the estimation of the quantity and the spatial allocation of land uses within a given timeframe. In this paper, we seek to establish straightforward methods to estimate demand for industrial and commercial land uses that can be used in the context of land use modelling, in particular for applications at continental scale, where the unavailability of data is often a major constraint. We propose a set of approaches based on ‘land use intensity’ measures indicating the amount of economic output per existing areal unit of land use. A base model was designed to estimate land demand based on region-specific land use intensities; in addition, variants accounting for sectoral differences in land use intensity were introduced. A validation was carried out for a set of European countries by estimating land use for 2006 and comparing it to observations. The models’ results were compared with estimations generated using the ‘null model’ (no land use change) and simple trend extrapolations. Results indicate that the proposed approaches clearly outperformed the ‘null model’, but did not consistently outperform the linear extrapolation. An uncertainty analysis further revealed that the models’ performances are particularly sensitive to the quality of the input land use data. In addition, unknown future trends of regional land use intensity considerably widen the uncertainty bands of the predictions. PMID:24647587
Chow, Sy-Miin; Bendezú, Jason J.; Cole, Pamela M.; Ram, Nilam
2016-01-01
Several approaches currently exist for estimating the derivatives of observed data for model exploration purposes, including functional data analysis (FDA), generalized local linear approximation (GLLA), and generalized orthogonal local derivative approximation (GOLD). These derivative estimation procedures can be used in a two-stage process to fit mixed effects ordinary differential equation (ODE) models. While the performance and utility of these routines for estimating linear ODEs have been established, they have not yet been evaluated in the context of nonlinear ODEs with mixed effects. We compared properties of the GLLA and GOLD to an FDA-based two-stage approach denoted herein as functional ordinary differential equation with mixed effects (FODEmixed) in a Monte Carlo study using a nonlinear coupled oscillators model with mixed effects. Simulation results showed that overall, the FODEmixed outperformed both the GLLA and GOLD across all the embedding dimensions considered, but a novel use of a fourth-order GLLA approach combined with very high embedding dimensions yielded estimation results that almost paralleled those from the FODEmixed. We discuss the strengths and limitations of each approach and demonstrate how output from each stage of FODEmixed may be used to inform empirical modeling of young children’s self-regulation. PMID:27391255
Retinal artery-vein classification via topology estimation
Estrada, Rolando; Allingham, Michael J.; Mettu, Priyatham S.; Cousins, Scott W.; Tomasi, Carlo; Farsiu, Sina
2015-01-01
We propose a novel, graph-theoretic framework for distinguishing arteries from veins in a fundus image. We make use of the underlying vessel topology to better classify small and midsized vessels. We extend our previously proposed tree topology estimation framework by incorporating expert, domain-specific features to construct a simple, yet powerful global likelihood model. We efficiently maximize this model by iteratively exploring the space of possible solutions consistent with the projected vessels. We tested our method on four retinal datasets and achieved classification accuracies of 91.0%, 93.5%, 91.7%, and 90.9%, outperforming existing methods. Our results show the effectiveness of our approach, which is capable of analyzing the entire vasculature, including peripheral vessels, in wide field-of-view fundus photographs. This topology-based method is a potentially important tool for diagnosing diseases with retinal vascular manifestation. PMID:26068204
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity among the predictors manifests in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach in handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
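A minimal sketch of the comparison in Python, assuming scikit-learn's L2-penalized LogisticRegression as a stand-in for the logistic ridge estimator (the parameter C is the library's inverse of the ridge parameter, and the collinear design and penalty strength are invented for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)       # nearly collinear with x1
X = np.column_stack([x1, x2])
y = (rng.random(n) < 1 / (1 + np.exp(-(x1 + x2)))).astype(int)

mle = LogisticRegression(penalty=None, max_iter=5000).fit(X, y)
ridge = LogisticRegression(penalty="l2", C=0.5, max_iter=5000).fit(X, y)
print("MLE coefficients:  ", mle.coef_.round(2))   # unstable, inflated
print("ridge coefficients:", ridge.coef_.round(2)) # shrunken, stabler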
Pose estimation for augmented reality applications using genetic algorithm.
Yu, Ying Kin; Wong, Kin Hong; Chang, Michael Ming Yuen
2005-12-01
This paper describes a genetic algorithm that tackles the pose-estimation problem in computer vision. Our genetic algorithm can find the rotation and translation of an object accurately when the three-dimensional structure of the object is given. In our implementation, each chromosome encodes both the pose and the indexes to the selected point features of the object. Instead of only searching for the pose as in existing work, our algorithm simultaneously searches for a set containing the most reliable feature points. This mismatch filtering strategy successfully makes the algorithm more robust in the presence of point mismatches and outliers in the images. Our algorithm has been tested with both synthetic and real data with good results. The accuracy of the recovered pose is compared to that of existing algorithms. Our approach outperformed Lowe's method and two other genetic algorithms in the presence of point mismatches and outliers. In addition, it has been used to estimate the pose of a real object. It is shown that the proposed method is applicable to augmented reality applications.
Respiratory Artefact Removal in Forced Oscillation Measurements: A Machine Learning Approach.
Pham, Thuy T; Thamrin, Cindy; Robinson, Paul D; McEwan, Alistair L; Leong, Philip H W
2017-08-01
Respiratory artefact removal for the forced oscillation technique can be treated as an anomaly detection problem. Manual removal is currently considered the gold standard, but this approach is laborious and subjective. Most existing automated techniques use simple statistics and/or reject anomalous data points. Unfortunately, simple statistics are insensitive to numerous artefacts, leading to low reproducibility of results. Furthermore, rejecting anomalous data points causes an imbalance between the inspiratory and expiratory contributions. From a machine learning perspective, such methods are unsupervised and can be considered simple feature extraction. We hypothesize that supervised techniques can be used to find improved features that are more discriminative and more highly correlated with the desired output. Features thus found are then used for anomaly detection by applying quartile thresholding, which rejects a complete breath if any of its features is out of range. The thresholds are determined by both saliency and performance metrics rather than qualitative assumptions as in previous works. Feature ranking indicates that our new landmark features are among the highest-scoring candidates regardless of age across saliency criteria. F1-scores, receiver operating characteristics, and variability of the mean resistance metrics show that the proposed scheme outperforms previous simple feature extraction approaches. Our subject-independent detector, 1IQR-SU, demonstrated approval rates of 80.6% for adults and 98% for children, higher than existing methods. Our new features are more relevant, and our removal procedure is objective and comparable to the manual method. This is a critical step towards automating forced oscillation technique quality control.
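The quartile-thresholding step can be sketched in a few lines of Python; the feature matrix, the injected artefact, and the choice k = 1 (echoing the 1IQR detector named above) are illustrative assumptions rather than the paper's tuned configuration:

import numpy as np

rng = np.random.default_rng(3)
# rows = breaths, columns = features (e.g., resistance mean, landmarks)
features = rng.normal(0, 1, size=(100, 3))
features[7] += 8.0                      # inject an artefactual breath

k = 1.0                                 # IQR multiplier, as in "1IQR"
q1, q3 = np.percentile(features, [25, 75], axis=0)
iqr = q3 - q1
in_range = (features >= q1 - k * iqr) & (features <= q3 + k * iqr)
accepted = in_range.all(axis=1)         # reject the complete breath
print(f"accepted {accepted.sum()} of {len(accepted)} breaths")
print("rejected breath indices:", np.where(~accepted)[0])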
Transductive multi-view zero-shot learning.
Fu, Yanwei; Hospedales, Timothy M; Xiang, Tao; Gong, Shaogang
2015-11-01
Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
Butch-Femme Identity and Visuospatial Performance Among Lesbian and Bisexual Women in China.
Zheng, Lijun; Wen, Guangju; Zheng, Yong
2018-05-01
Lesbian and bisexual women who self-identify as "butch" show a masculine profile with regard to gender roles, gender nonconformity, and systemizing cognitive style, whereas lesbian and bisexual women who self-identify as "femme" show a corresponding feminine profile and those who self-identify as "androgynes" show an intermediate profile. This study examined the association between butch or femme lesbian or bisexual identity and visuospatial ability among 323 lesbian and bisexual women, compared to heterosexual women (n = 207) and men (n = 125), from multiple cities in China. Visuospatial ability was assessed using a Shepard and Metzler-type mental rotation task and Judgment of Line Angle and Position (JLAP) test on the Internet. Heterosexual men outperformed heterosexual women on both mental rotation and JLAP tasks. Lesbian and bisexual women outperformed heterosexual women on mental rotation, but not on JLAP. There were significant differences in mental rotation performance among women, with butch- and androgyne-identified lesbian/bisexual women outperforming femme-identified and heterosexual women. There were also significant differences in JLAP performance among women, with butch- and androgyne-identified lesbian/bisexual women and heterosexual women outperforming femme-identified lesbian/bisexual women. The butch-femme differences in visuospatial ability indicated an association between cognitive ability and butch-femme identity and suggest that neurobiological underpinnings may contribute to butch-femme identity although alternative explanations exist.
Assessing Mediational Models: Testing and Interval Estimation for Indirect Effects.
Biesanz, Jeremy C; Falk, Carl F; Savalei, Victoria
2010-08-06
Theoretical models specifying indirect or mediated effects are common in the social sciences. An indirect effect exists when an independent variable's influence on the dependent variable is mediated through an intervening variable. Classic approaches to assessing such mediational hypotheses (Baron & Kenny, 1986; Sobel, 1982) have in recent years been supplemented by computationally intensive methods such as bootstrapping, the distribution-of-the-product method, and hierarchical Bayesian Markov chain Monte Carlo (MCMC) methods. These different approaches for assessing mediation are illustrated using data from Dunn, Biesanz, Human, and Finn (2007). However, little is known about how these methods perform relative to each other, particularly in more challenging situations, such as with data that are incomplete and/or nonnormal. This article presents an extensive Monte Carlo simulation evaluating a host of approaches for assessing mediation. We examine Type I error rates, power, and coverage. We study normal and nonnormal data as well as complete and incomplete data. In addition, we adapt a method, recently proposed in the statistical literature, that does not rely on confidence intervals (CIs) to test the null hypothesis of no indirect effect. The results suggest that the new inferential method (the partial posterior p value) slightly outperforms existing ones in terms of maintaining Type I error rates while maximizing power, especially with incomplete data. Among confidence interval approaches, the bias-corrected accelerated (BCa) bootstrapping approach often has inflated Type I error rates and inconsistent coverage and is not recommended; in contrast, the bootstrapped percentile confidence interval and the hierarchical Bayesian MCMC method perform best overall, maintaining Type I error rates, exhibiting reasonable power, and producing stable and accurate coverage rates.
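A short Python sketch of the bootstrapped percentile interval for an indirect effect a*b, one of the best-performing approaches in the simulation; the synthetic data, the OLS fits, and the 5,000 resamples are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.standard_normal(n)
m = 0.5 * x + rng.standard_normal(n)            # mediator model
y = 0.4 * m + 0.1 * x + rng.standard_normal(n)  # outcome model

def indirect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                   # slope of m on x
    X = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(X, y, rcond=None)[0][1]  # slope of y on m given x
    return a * b

boot = np.empty(5000)
for i in range(5000):
    idx = rng.integers(0, n, n)                  # resample cases
    boot[i] = indirect(x[idx], m[idx], y[idx])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect = {indirect(x, m, y):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")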
Affinity learning with diffusion on tensor product graph.
Yang, Xingwei; Prasad, Lakshman; Latecki, Longin Jan
2013-01-01
In many applications, we are given a finite set of data points sampled from a data manifold and represented as a graph with edge weights determined by pairwise similarities of the samples. Often the pairwise similarities (which are also called affinities) are unreliable due to noise or due to intrinsic difficulties in estimating similarity values of the samples. As observed in several recent approaches, more reliable similarities can be obtained if the original similarities are diffused in the context of other data points, where the context of each point is a set of points most similar to it. Compared to the existing methods, our approach differs in two main aspects. First, instead of diffusing the similarity information on the original graph, we propose to utilize the tensor product graph (TPG) obtained by the tensor product of the original graph with itself. Since TPG takes into account higher order information, it is not a surprise that we obtain more reliable similarities. However, it comes at the price of higher order computational complexity and storage requirement. The key contribution of the proposed approach is that the information propagation on TPG can be computed with the same computational complexity and the same amount of storage as the propagation on the original graph. We prove that a graph diffusion process on TPG is equivalent to a novel iterative algorithm on the original graph, which is guaranteed to converge. After its convergence we obtain new edge weights that can be interpreted as new, learned affinities. We stress that the affinities are learned in an unsupervised setting. We illustrate the benefits of the proposed approach for data manifolds composed of shapes, images, and image patches on two very different tasks of image retrieval and image segmentation. With learned affinities, we achieve a bull's eye retrieval score of 99.99 percent on the MPEG-7 shape dataset, which is much higher than the state-of-the-art algorithms. When the data points are image patches, the NCut with the learned affinities not only significantly outperforms the NCut with the original affinities, but it also outperforms state-of-the-art image segmentation methods.
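The equivalence result can be sketched directly in Python; in the sketch below, the Gaussian affinities and the scaling of the matrix so its spectral radius stays safely below one (to guarantee the iteration converges) are our assumptions, not the exact normalization used by the authors:

import numpy as np

rng = np.random.default_rng(5)
pts = np.vstack([rng.normal(0, 0.3, (10, 2)),   # two clusters of points
                 rng.normal(3, 0.3, (10, 2))])
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.5)                           # Gaussian affinities

# scale so the spectral radius is below one (our convergence assumption;
# the paper instead uses a normalized transition matrix)
S = W / (1.1 * np.abs(np.linalg.eigvalsh(W)).max())
Q = np.eye(len(W))
for _ in range(200):                            # Q(t+1) = S Q(t) S' + I
    Q = S @ Q @ S.T + np.eye(len(W))

# converged Q holds the learned affinities: higher within clusters
print("within-cluster mean affinity: ", Q[:10, :10].mean().round(3))
print("between-cluster mean affinity:", Q[:10, 10:].mean().round(3))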
Comparison between goal programming and cointegration approaches in enhanced index tracking
NASA Astrophysics Data System (ADS)
Lam, Weng Siew; Jamaan, Saiful Hafizah Hj.
2013-04-01
Index tracking is a popular form of passive fund management in stock market. Passive management is a buy-and-hold strategy that aims to achieve rate of return similar to the market return. Index tracking problem is a problem of reproducing the performance of a stock market index, without purchasing all of the stocks that make up the index. This can be done by establishing an optimal portfolio that minimizes risk or tracking error. An improved index tracking (enhanced index tracking) is a dual-objective optimization problem, a trade-off between maximizing the mean return and minimizing the tracking error. Enhanced index tracking aims to generate excess return over the return achieved by the index. The objective of this study is to compare the portfolio compositions and performances by using two different approaches in enhanced index tracking problem, which are goal programming and cointegration. The result of this study shows that the optimal portfolios for both approaches are able to outperform the Malaysia market index which is Kuala Lumpur Composite Index. Both approaches give different optimal portfolio compositions. Besides, the cointegration approach outperforms the goal programming approach because the cointegration approach gives higher mean return and lower risk or tracking error. Therefore, the cointegration approach is more appropriate for the investors in Malaysia.
NASA Astrophysics Data System (ADS)
Dubuque, Shaun; Coffman, Thayne; McCarley, Paul; Bovik, A. C.; Thomas, C. William
2009-05-01
Foveated imaging has been explored for compression and tele-presence, but gaps exist in the study of foveated imaging applied to acquisition and tracking systems. Results are presented from two sets of experiments comparing simple foveated and uniform resolution targeting (acquisition and tracking) algorithms. The first experiments measure acquisition performance when locating Gabor wavelet targets in noise, with fovea placement driven by a mutual information measure. The foveated approach is shown to have lower detection delay than a notional uniform resolution approach when using video that consumes equivalent bandwidth. The second experiments compare the accuracy of target position estimates from foveated and uniform resolution tracking algorithms. A technique is developed to select foveation parameters that minimize error in Kalman filter state estimates. Foveated tracking is shown to consistently outperform uniform resolution tracking on an abstract multiple target task when using video that consumes equivalent bandwidth. Performance is also compared to uniform resolution processing without bandwidth limitations. In both experiments, superior performance is achieved at a given bandwidth by foveated processing because limited resources are allocated intelligently to maximize operational performance. These findings indicate the potential for operational performance improvements over uniform resolution systems in both acquisition and tracking tasks.
Solving multi-objective optimization problems in conservation with the reference point method
Dujardin, Yann; Chadès, Iadine
2018-01-01
Managing the biodiversity extinction crisis requires wise decision-making processes able to account for the limited resources available. In most decision problems in conservation biology, several conflicting objectives have to be taken into account. Most methods used in conservation either provide suboptimal solutions or use strong assumptions about the decision-maker’s preferences. Our paper reviews some of the existing approaches to solve multi-objective decision problems and presents new multi-objective linear programming formulations of two multi-objective optimization problems in conservation, allowing the use of a reference point approach. Reference point approaches solve multi-objective optimization problems by interactively representing the preferences of the decision-maker with a point in the criteria (objectives) space, called the reference point. We modelled and solved the following two problems in conservation: a dynamic multi-species management problem under uncertainty and a spatial allocation resource management problem. Results show that the reference point method outperforms classic methods while illustrating the use of an interactive methodology for solving combinatorial problems with multiple objectives. The method is general and can be adapted to a wide range of ecological combinatorial problems. PMID:29293650
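A toy Python sketch of the reference point idea, assuming an augmented Chebyshev achievement function and a finite set of invented candidate actions (the paper instead embeds the reference point within multi-objective linear programs):

import numpy as np

rng = np.random.default_rng(6)
# candidate actions scored on two objectives to maximize, e.g.
# (expected species persistence, area protected); values are invented
candidates = rng.random((50, 2))

def achievement(f, ref, w, rho=1e-3):
    # augmented Chebyshev: worst weighted shortfall plus a small sum term
    short = w * (ref - f)
    return short.max() + rho * short.sum()

reference = np.array([0.9, 0.8])   # decision-maker's aspiration levels
weights = np.array([1.0, 1.0])
scores = [achievement(f, reference, weights) for f in candidates]
best = int(np.argmin(scores))
print("selected candidate:", best, "objectives:", candidates[best].round(3))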
Jonnagaddala, Jitendra; Jue, Toni Rose; Chang, Nai-Wen; Dai, Hong-Jie
2016-01-01
The rapidly increasing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditional random fields (CRFs) and dictionary lookup are widely used for named entity recognition and normalization, respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and compare them with other existing dictionary lookup based normalization methods. The best configuration achieved an F-measure of 0.77 for disease normalization, outperforming the best dictionary lookup based baseline method studied in this work by 0.13 in F-measure. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract. © The Author(s) 2016. Published by Oxford University Press.
The Achievement Crisis Is Real: A Review of "The Manufactured Crisis."
ERIC Educational Resources Information Center
Stedman, Lawrence C.
1996-01-01
In "The Manufactured Crisis," D. Berliner and B. Biddle argue that there has been no decline in achievement test scores, that today's students outperform their parents and do well in international examinations, and that the supposed crisis in American education does not exist. This review refutes all these claims. (SLD)
ERIC Educational Resources Information Center
Senko, Corwin; Dawson, Blair
2017-01-01
Achievement goal theory originally defined performance-approach goals as striving to demonstrate competence to outsiders by outperforming peers. The research, however, has operationalized the goals inconsistently, emphasizing the competence demonstration element in some cases and the peer comparison element in others. A meta-analysis by Hulleman…
ERIC Educational Resources Information Center
Jensen, Ben; Farmer, Joanna
2013-01-01
Public-school students in the world's largest city, Shanghai, China, are academically outperforming their counterparts across the globe and becoming the talk and envy of education experts worldwide. Using an innovative partnering approach that matches successful schools with low-performing schools, Shanghai has valuable lessons to teach on turning…
Chaos theory perspective for industry clusters development
NASA Astrophysics Data System (ADS)
Yu, Haiying; Jiang, Minghui; Li, Chengzhang
2016-03-01
Industry clusters have been outstanding performers in the economic development of most developing countries. The contributions of industrial clusters are recognized as promoting regional business and alleviating economic and social costs. There is no doubt that globalization is leading clusters to accelerate the competitiveness of economic activities. Accordingly, many ideas and concepts have been employed to illustrate evolutionary tendencies and to stimulate cluster development while avoiding industrial cluster recession. Chaos theory is introduced to explain the inherent relationships among features within industry clusters. A preferred life cycle approach is proposed for the analysis of industrial cluster recession. Lyapunov exponents and the Wolf model are presented for chaotic identification and examination. A case study of Tianjin, China verifies the model's effectiveness. The investigations indicate that the approach performs well in explaining chaotic properties in industrial clusters, demonstrating industrial cluster evolution, resolving empirical issues and generating corresponding strategies.
Bayesian source term estimation of atmospheric releases in urban areas using LES approach.
Xue, Fei; Kikumoto, Hideki; Li, Xiaofeng; Ooka, Ryozo
2018-05-05
The estimation of source information from limited measurements of a sensor network is a challenging inverse problem, which can be viewed as an assimilation process of the observed concentration data and the predicted concentration data. When dealing with releases in built-up areas, the predicted data are generally obtained by the Reynolds-averaged Navier-Stokes (RANS) equations, which yields building-resolving results; however, RANS-based models are outperformed by large-eddy simulation (LES) in the predictions of both airflow and dispersion. Therefore, it is important to explore the possibility of improving the estimation of the source parameters by using the LES approach. In this paper, a novel source term estimation method is proposed based on LES approach using Bayesian inference. The source-receptor relationship is obtained by solving the adjoint equations constructed using the time-averaged flow field simulated by the LES approach based on the gradient diffusion hypothesis. A wind tunnel experiment with a constant point source downwind of a single building model is used to evaluate the performance of the proposed method, which is compared with that of the existing method using a RANS model. The results show that the proposed method reduces the errors of source location and releasing strength by 77% and 28%, respectively. Copyright © 2018 Elsevier B.V. All rights reserved.
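The Bayesian step can be sketched independently of the flow model; in the Python sketch below, a made-up one-dimensional source-receptor kernel stands in for the adjoint/LES output, and the posterior over source location and strength is evaluated on a grid under a Gaussian error model and flat priors, all of which are our assumptions:

import numpy as np

rng = np.random.default_rng(7)
sensors = np.array([2.0, 4.0, 6.0, 8.0])        # receptor positions

def kernel(xs, sensor_x):
    # toy 1-D source-receptor factor: decay downstream of the source;
    # in the paper this comes from adjoint equations on the LES flow
    d = sensor_x - xs
    return np.where(d > 0, np.exp(-d / 3.0), 1e-6)

true_xs, true_q = 1.0, 5.0
obs = true_q * kernel(true_xs, sensors) + rng.normal(0, 0.05, sensors.size)

xs_grid = np.linspace(0.0, 5.0, 101)            # candidate locations
q_grid = np.linspace(0.1, 10.0, 100)            # candidate strengths
logpost = np.empty((xs_grid.size, q_grid.size))
for i, xs in enumerate(xs_grid):
    pred = q_grid[:, None] * kernel(xs, sensors)[None, :]
    logpost[i] = -0.5 * (((obs - pred) / 0.05) ** 2).sum(axis=1)

i, j = np.unravel_index(logpost.argmax(), logpost.shape)
print(f"MAP source: location = {xs_grid[i]:.2f}, strength = {q_grid[j]:.2f}")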
YamiPred: A Novel Evolutionary Method for Predicting Pre-miRNAs and Selecting Relevant Features.
Kleftogiannis, Dimitrios; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina
2015-01-01
MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem, and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameter optimization. YamiPred was tested on a new and realistic human dataset and compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and the geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset has achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs in other organisms, including viruses.
Mousavi Kahaki, Seyed Mostafa; Nordin, Md Jan; Ashtari, Amir H.; J. Zahra, Sophia
2016-01-01
An invariant feature matching method is proposed as a spatially invariant approach to feature matching. Deformation effects, such as affine and homography transformations, change the local information within the image and can result in ambiguous local information pertaining to image points. A new method based on dissimilarity values is proposed, which measures the dissimilarity of features along a path using eigenvector properties. Evidence shows that existing matching techniques using similarity metrics (such as normalized cross-correlation, the squared sum of intensity differences and the correlation coefficient) are insufficient for achieving adequate results under different image deformations. Thus, new descriptor similarity metrics based on normalized eigenvector correlation and signal directional differences, which are robust under local variation of the image information, are proposed to establish an efficient feature matching technique. The method proposed in this study measures the dissimilarity in signal frequency along the path between two features. Moreover, these dissimilarity values are accumulated in a 2D dissimilarity space, allowing accurate corresponding features to be extracted based on the cumulative space using a voting strategy. This method can be used in image registration applications, as it overcomes the limitations of the existing approaches. The output results demonstrate that the proposed technique outperforms the other methods when evaluated using a standard dataset, in terms of precision-recall and corner correspondence. PMID:26985996
Semi-nonparametric VaR forecasts for hedge funds during the recent crisis
NASA Astrophysics Data System (ADS)
Del Brio, Esther B.; Mora-Valencia, Andrés; Perote, Javier
2014-05-01
The need to provide accurate value-at-risk (VaR) forecasting measures has triggered an important literature in econophysics. Although such accurate VaR models and methodologies are particularly demanded by hedge fund managers, few articles have been specifically devoted to implementing new techniques for forecasting the VaR of hedge fund returns. This article advances these issues by comparing the performance of risk measures based on parametric distributions (the normal, Student’s t and skewed-t), semi-nonparametric (SNP) methodologies based on Gram-Charlier (GC) series, and the extreme value theory (EVT) approach. Our results show that the normal-, Student’s t- and skewed-t-based methodologies fail to forecast hedge fund VaR, whilst the SNP and EVT approaches succeed in doing so. We extend these results to the multivariate framework by providing an explicit formula for the GC copula and its density that encompasses the Gaussian copula and accounts for non-linear dependences. We show that the VaR obtained from the meta GC accurately captures portfolio risk and outperforms regulatory VaR estimates obtained through the meta Gaussian and Student’s t distributions.
Link Prediction in Evolving Networks Based on Popularity of Nodes.
Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian
2017-08-02
Link prediction aims to uncover the underlying relationships behind networks, which can be utilized to predict missing edges or identify spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure-based methods ignore the temporal aspects of networks; limited by time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose the hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes are much more likely to attract future links. A novel approach named the popularity based structural perturbation method (PBSPM) and its fast algorithm are then proposed to characterize the likelihood of an edge from both the existing connectivity structure and the current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Besides, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.
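As a hedged illustration of the underlying hypothesis (not the PBSPM algorithm itself), the Python sketch below boosts a common-neighbour link score by each endpoint's popularity, here proxied by normalized degree on a standard benchmark graph:

import itertools
import networkx as nx

G = nx.karate_club_graph()
max_deg = max(d for _, d in G.degree())
# stand-in for "current popularity": normalized degree; in an evolving
# network this would be each node's recent rate of newly gained edges
pop = {n: d / max_deg for n, d in G.degree()}

def score(u, v):
    cn = len(list(nx.common_neighbors(G, u, v)))
    return cn * (1 + pop[u]) * (1 + pop[v])   # popularity-boosted score

non_edges = [(u, v) for u, v in itertools.combinations(G, 2)
             if not G.has_edge(u, v)]
for u, v in sorted(non_edges, key=lambda e: score(*e), reverse=True)[:5]:
    print(f"predicted link {u}-{v}, score = {score(u, v):.2f}")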
Approach to design neural cryptography: a generalized architecture and a heuristic rule.
Mu, Nankun; Liao, Xiaofeng; Huang, Tingwen
2013-06-01
Neural cryptography, a type of public key exchange protocol, is widely considered an effective method for sharing a common secret key between two neural networks over public channels. How to design neural cryptography remains a great challenge. In this paper, in order to provide an approach to this challenge, a generalized network architecture and a significant heuristic rule are designed. The proposed generic framework, named the tree state classification machine (TSCM), extends and unifies the existing structures, i.e., the tree parity machine (TPM) and the tree committee machine (TCM). Furthermore, we carefully study and find that the heuristic rule can improve the security of TSCM-based neural cryptography. Therefore, TSCM and the heuristic rule can guide us in designing a great number of effective neural cryptography candidates, among which it is possible to achieve more secure instances. Significantly, in the light of TSCM and the heuristic rule, we further show that our designed neural cryptography outperforms TPM (the most secure model at present) in security. Finally, a series of numerical simulation experiments are provided to verify the validity and applicability of our results.
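For orientation, here is a compact Python sketch of the tree parity machine (TPM), the special case that TSCM generalizes; the sizes K, N, L and the Hebbian update are conventional choices in the neural cryptography literature, not parameters prescribed by this paper:

import numpy as np

K, N, L = 3, 10, 4                  # hidden units, inputs per unit, bound
rng = np.random.default_rng(8)

class TPM:
    def __init__(self):
        self.w = rng.integers(-L, L + 1, size=(K, N))
    def output(self, x):
        self.sigma = np.sign((self.w * x).sum(axis=1))
        self.sigma[self.sigma == 0] = -1
        return int(self.sigma.prod())
    def update(self, x, tau):       # Hebbian rule, weights clipped to [-L, L]
        for k in range(K):
            if self.sigma[k] == tau:
                self.w[k] = np.clip(self.w[k] + tau * x[k], -L, L)

A, B = TPM(), TPM()
for step in range(1, 200001):
    x = rng.choice([-1, 1], size=(K, N))    # public random input
    ta, tb = A.output(x), B.output(x)
    if ta == tb:                            # update only on agreement
        A.update(x, ta)
        B.update(x, tb)
    if np.array_equal(A.w, B.w):
        print(f"synchronized after {step} inputs; weights form the key")
        break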
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.
Adaptive hidden Markov model with anomaly States for price manipulation detection.
Cao, Yi; Li, Yuhua; Coleman, Sonya; Belatreche, Ammar; McGinnity, Thomas Martin
2015-02-01
Price manipulation refers to the activities of traders who use carefully designed trading behaviors to manually push up or down the underlying equity prices for making profits. With increasing volumes and frequency of trading, price manipulation can be extremely damaging to the proper functioning and integrity of capital markets. The existing literature focuses on either empirical studies of market abuse cases or analysis of particular manipulation types based on certain assumptions. Effective approaches for analyzing and detecting price manipulation in real time are yet to be developed. This paper proposes a novel approach, called the adaptive hidden Markov model with anomaly states (AHMMAS), for modeling and detecting price manipulation activities. Together with wavelet transformations and gradients as the feature extraction methods, the AHMMAS model caters to price manipulation detection and basic manipulation type recognition. Evaluation experiments conducted on tick data for seven stocks from NASDAQ and the London Stock Exchange and on ten stock price series simulated by stochastic differential equations show that the proposed AHMMAS model can effectively detect price manipulation patterns and outperforms the selected benchmark models.
Cohen, Michael X; Gulbinaite, Rasa
2017-02-15
Steady-state evoked potentials (SSEPs) are rhythmic brain responses to rhythmic sensory stimulation, and are often used to study perceptual and attentional processes. We present a data analysis method for maximizing the signal-to-noise ratio of the narrow-band steady-state response in the frequency and time-frequency domains. The method, termed rhythmic entrainment source separation (RESS), is based on denoising source separation approaches that take advantage of the simultaneous but differential projection of neural activity to multiple electrodes or sensors. Our approach is a combination and extension of existing multivariate source separation methods. We demonstrate that RESS performs well on both simulated and empirical data, and outperforms conventional SSEP analysis methods based on selecting electrodes with the strongest SSEP response, as well as several other linear spatial filters. We also discuss the potential confound of overfitting, whereby the filter captures noise in absence of a signal. Matlab scripts are available to replicate and extend our simulations and methods. We conclude with some practical advice for optimizing SSEP data analyses and interpreting the results. Copyright © 2016 Elsevier Inc. All rights reserved.
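The family of spatial filters RESS belongs to can be sketched with a generalized eigendecomposition; in the Python sketch below, the simulated 12 Hz steady-state signal and the use of a noise-only recording for the reference covariance are simplifying assumptions (RESS itself contrasts narrow-band-filtered data at the stimulus frequency against flanking frequencies):

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(9)
t = np.arange(0, 10, 1 / 250)                       # 10 s at 250 Hz
n_chan = 8
mix = rng.standard_normal(n_chan)                   # SSEP projection weights
ssep = np.outer(mix, np.sin(2 * np.pi * 12 * t))    # 12 Hz steady-state
eeg = ssep + rng.standard_normal((n_chan, t.size))  # plus broadband noise

# signal covariance vs. a noise reference covariance; RESS estimates
# these from filtered data at the SSEP frequency and neighbouring bands
S = np.cov(eeg)
R = np.cov(rng.standard_normal((n_chan, t.size)))

evals, evecs = eigh(S, R)       # generalized eigendecomposition
w = evecs[:, -1]                # spatial filter maximizing the S/R ratio
component = w @ eeg             # one time course with maximal SSEP SNR
print("filter weights:", np.round(w, 2))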
An image retrieval framework for real-time endoscopic image retargeting.
Ye, Menglong; Johns, Edward; Walter, Benjamin; Meining, Alexander; Yang, Guang-Zhong
2017-08-01
Serial endoscopic examinations of a patient are important for early diagnosis of malignancies in the gastrointestinal tract. However, retargeting for optical biopsy is challenging due to extensive tissue variations between examinations, requiring the method to be tolerant to these changes whilst enabling real-time retargeting. This work presents an image retrieval framework for inter-examination retargeting. We propose both a novel image descriptor tolerant of long-term tissue changes and a novel descriptor matching method in real time. The descriptor is based on histograms generated from regional intensity comparisons over multiple scales, offering stability over long-term appearance changes at the higher levels, whilst remaining discriminative at the lower levels. The matching method then learns a hashing function using random forests, to compress the string and allow for fast image comparison by a simple Hamming distance metric. A dataset that contains 13 in vivo gastrointestinal videos was collected from six patients, representing serial examinations of each patient, which includes videos captured with significant time intervals. Precision-recall for retargeting shows that our new descriptor outperforms a number of alternative descriptors, whilst our hashing method outperforms a number of alternative hashing approaches. We have proposed a novel framework for optical biopsy in serial endoscopic examinations. A new descriptor, combined with a novel hashing method, achieves state-of-the-art retargeting, with validation on in vivo videos from six patients. Real-time performance also allows for practical integration without disturbing the existing clinical workflow.
Bearing Fault Diagnosis Based on Statistical Locally Linear Embedding
Wang, Xiang; Zheng, Yuan; Zhao, Zhenzhou; Wang, Jinping
2015-01-01
Fault diagnosis is essentially a kind of pattern recognition. The measured signal samples usually lie on nonlinear low-dimensional manifolds embedded in the high-dimensional signal space, so how to implement feature extraction and dimensionality reduction and improve recognition performance is a crucial task. In this paper a novel machinery fault diagnosis approach is proposed based on a statistical locally linear embedding (S-LLE) algorithm, which is an extension of LLE that exploits the fault class label information. The fault diagnosis approach first extracts the intrinsic manifold features from high-dimensional feature vectors, which are obtained from vibration signals by feature extraction in the time domain and frequency domain and by empirical mode decomposition (EMD), and then translates the complex mode space into a salient low-dimensional feature space using the manifold learning algorithm S-LLE, which outperforms other feature reduction methods such as PCA, LDA and LLE. Finally, in the reduced feature space, pattern classification and fault diagnosis are carried out easily and rapidly by a classifier. Rolling bearing fault signals are used to validate the proposed fault diagnosis approach. The results indicate that the proposed approach obviously improves the classification performance of fault pattern recognition and outperforms the other traditional approaches. PMID:26153771
Dong, Yitong; Qiao, Tian; Kim, Doyun; Parobek, David; Rossi, Daniel; Son, Dong Hee
2018-05-09
Cesium lead halide (CsPbX 3 ) nanocrystals have emerged as a new family of materials that can outperform the existing semiconductor nanocrystals due to their superb optical and charge-transport properties. However, the lack of a robust method for producing quantum dots with controlled size and high ensemble uniformity has been one of the major obstacles in exploring the useful properties of excitons in zero-dimensional nanostructures of CsPbX 3 . Here, we report a new synthesis approach that enables the precise control of the size based on the equilibrium rather than kinetics, producing CsPbX 3 quantum dots nearly free of heterogeneous broadening in their exciton luminescence. The high level of size control and ensemble uniformity achieved here will open the door to harnessing the benefits of excitons in CsPbX 3 quantum dots for photonic and energy-harvesting applications.
Automatic inference of indexing rules for MEDLINE
Névéol, Aurélie; Shooshan, Sonya E; Claveau, Vincent
2008-01-01
Background: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE. Methods: In this paper, we describe the use and the customization of Inductive Logic Programming (ILP) to infer indexing rules that may be used to produce automatic indexing recommendations for MEDLINE indexers. Results: Our results show that this original ILP-based approach outperforms manual rules when they exist. In addition, the use of ILP rules also improves the overall performance of the Medical Text Indexer (MTI), a system producing automatic indexing recommendations for MEDLINE. Conclusion: We expect the sets of ILP rules obtained in this experiment to be integrated into MTI. PMID:19025687
Active impulsive noise control using maximum correntropy with adaptive kernel size
NASA Astrophysics Data System (ADS)
Lu, Lu; Zhao, Haiquan
2017-03-01
Active noise control (ANC) based on the principle of superposition is an attractive method to attenuate noise signals. However, impulsive noise in ANC systems degrades the performance of the controller. In this paper, a filtered-x recursive maximum correntropy (FxRMC) algorithm is proposed based on the maximum correntropy criterion (MCC) to reduce the effect of outliers. The proposed FxRMC algorithm does not require any prior information about the noise characteristics and outperforms the filtered-x least mean square (FxLMS) algorithm for impulsive noise. Meanwhile, in order to adjust the kernel size of the FxRMC algorithm online, a recursive approach is proposed that takes into account past estimates of the error signal over a sliding window. Simulation and experimental results in the context of active impulsive noise control demonstrate that the proposed algorithms achieve much better performance than existing algorithms in various noise environments.
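The correntropy-gated update at the heart of such algorithms can be sketched in Python in a plain system-identification setting (the full FxRMC is recursive and filters the reference through the secondary path, which is omitted here); the plant, step sizes, impulsive-noise model, and fixed kernel size sigma are all illustrative assumptions:

import numpy as np

rng = np.random.default_rng(10)
n, order = 5000, 4
h = np.array([0.8, -0.4, 0.2, 0.1])            # unknown plant (assumed)
x = rng.standard_normal(n)
mask = rng.random(n) < 0.01                    # occasional impulsive bursts
noise = 0.01 * rng.standard_normal(n)
noise[mask] += 10 * rng.standard_normal(mask.sum())
d = np.convolve(x, h)[:n] + noise              # noisy plant output

def adapt(mu, sigma=None):
    w = np.zeros(order)
    for i in range(order, n):
        xi = x[i - order + 1:i + 1][::-1]      # most recent inputs first
        e = d[i] - w @ xi
        g = np.exp(-e**2 / (2 * sigma**2)) if sigma else 1.0  # MCC gate
        w += mu * g * e * xi                   # gated LMS-style update
    return w

print("plain LMS estimate:", np.round(adapt(0.01), 2))
print("MCC estimate:      ", np.round(adapt(0.05, sigma=0.5), 2))
print("true plant:        ", h)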
Cheung, Y M; Leung, W M; Xu, L
1997-01-01
We propose a prediction model that combines Rival Penalized Competitive Learning (RPCL) with a Combined Linear Predictor (CLP) method, which involves a set of local linear predictors such that a prediction is made by combining the outputs of the activated predictors through a gating network (Xu et al., 1994). Furthermore, we present an improved variant named Adaptive RPCL-CLP that includes an adaptive learning mechanism as well as a data pre- and post-processing scheme. We compare them with existing models by demonstrating their performance on two real-world financial time series: a China stock price series and an exchange-rate series of the US Dollar (USD) versus the Deutschmark (DEM). Experiments have shown that Adaptive RPCL-CLP not only outperforms the other approaches with the smallest prediction error and training costs, but also brings in considerably high profits in the trading simulation of the foreign exchange market.
Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems.
Yu, Michael Ku; Kramer, Michael; Dutkowski, Janusz; Srivas, Rohith; Licon, Katherine; Kreisberg, Jason; Ng, Cherie T; Krogan, Nevan; Sharan, Roded; Ideker, Trey
2016-02-24
Accurately translating genotype to phenotype requires accounting for the functional impact of genetic variation at many biological scales. Here we present a strategy for genotype-phenotype reasoning based on existing knowledge of cellular subsystems. These subsystems and their hierarchical organization are defined by the Gene Ontology or a complementary ontology inferred directly from previously published datasets. Guided by the ontology's hierarchical structure, we organize genotype data into an "ontotype," that is, a hierarchy of perturbations representing the effects of genetic variation at multiple cellular scales. The ontotype is then interpreted using logical rules generated by machine learning to predict phenotype. This approach substantially outperforms previous, non-hierarchical methods for translating yeast genotype to cell growth phenotype, and it accurately predicts the growth outcomes of two new screens of 2,503 double gene knockouts impacting DNA repair or nuclear lumen. Ontotypes also generalize to larger knockout combinations, setting the stage for interpreting the complex genetics of disease.
Refining Automatically Extracted Knowledge Bases Using Crowdsourcing.
Li, Chunhua; Zhao, Pengpeng; Sheng, Victor S; Xian, Xuefeng; Wu, Jian; Cui, Zhiming
2017-01-01
Machine-constructed knowledge bases often contain noisy and inaccurate facts. There exists significant work in developing automated algorithms for knowledge base refinement. Automated approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. As human labelling is costly, an important research challenge is how we can use limited human resources to maximize the quality improvement for a knowledge base. To address this problem, we first introduce a concept of semantic constraints that can be used to detect potential errors and do inference among candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts to conduct crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods under a reasonable crowdsourcing cost.
A salient region detection model combining background distribution measure for indoor robots.
Li, Na; Xu, Hui; Wang, Zhenhua; Sun, Lining; Chen, Guodong
2017-01-01
The vision system plays an important role in the field of indoor robots. Saliency detection methods, which capture regions perceived as important, are used to improve the performance of the visual perception system. Most state-of-the-art methods for saliency detection perform outstandingly on natural images but cannot work in complicated indoor environments. Therefore, we propose a new method comprising graph-based RGB-D segmentation, a primary saliency measure, a background distribution measure, and their combination. In addition, region roundness is proposed to describe the compactness of a region and to measure the background distribution more robustly. To validate the proposed approach, eleven influential methods are compared on the DSD and ECSSD datasets. Moreover, we build a mobile robot platform for application in an actual environment, and design three different kinds of experimental constructions, namely different viewpoints, illumination variations and partial occlusions. Experimental results demonstrate that our model outperforms existing methods and is useful for indoor mobile robots.
Designing bioinspired composite reinforcement architectures via 3D magnetic printing
NASA Astrophysics Data System (ADS)
Martin, Joshua J.; Fiore, Brad E.; Erb, Randall M.
2015-10-01
Discontinuous fibre composites represent a class of materials that are strong, lightweight and have remarkable fracture toughness. These advantages partially explain the abundance and variety of discontinuous fibre composites that have evolved in the natural world. Many natural structures out-perform the conventional synthetic counterparts due, in part, to the more elaborate reinforcement architectures that occur in natural composites. Here we present an additive manufacturing approach that combines real-time colloidal assembly with existing additive manufacturing technologies to create highly programmable discontinuous fibre composites. This technology, termed as `3D magnetic printing', has enabled us to recreate complex bioinspired reinforcement architectures that deliver enhanced material performance compared with monolithic structures. Further, we demonstrate that we can now design and evolve elaborate reinforcement architectures that are not found in nature, demonstrating a high level of possible customization in discontinuous fibre composites with arbitrary geometries.
A Hybrid Genetic Programming Algorithm for Automated Design of Dispatching Rules.
Nguyen, Su; Mei, Yi; Xue, Bing; Zhang, Mengjie
2018-06-04
Designing effective dispatching rules for production systems is a difficult and time-consuming task if done manually. In the last decade, the growth of computing power, advanced machine learning, and optimisation techniques has made the automated design of dispatching rules possible, and automatically discovered rules are competitive with, or outperform, existing rules developed by researchers. Genetic programming is one of the most popular approaches to discovering dispatching rules in the literature, especially for complex production systems. However, the large heuristic search space may prevent genetic programming from finding near-optimal dispatching rules. This paper develops a new hybrid genetic programming algorithm for dynamic job shop scheduling based on a new representation, a new local search heuristic, and efficient fitness evaluators. Experiments show that the new method is effective regarding the quality of evolved rules. Moreover, evolved rules are also significantly smaller and contain more relevant attributes.
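For readers unfamiliar with dispatching rules: a rule is just a priority function evaluated over the jobs queued at an idle machine, and genetic programming evolves the expression inside it. A minimal sketch with illustrative attribute names (the blend of terms below is an example of what an evolved expression might look like, not a rule from the paper):

```python
def evolved_rule(job, now):
    # Example of the kind of expression genetic programming might
    # evolve: a weighted blend of processing time and positive slack
    # (lower value = higher priority).
    slack = job["due"] - now - job["work_remaining"]
    return job["proc_time"] + 0.5 * max(slack, 0.0)

def dispatch(queue, now, rule=evolved_rule):
    """Pick the next job from a machine's queue using a rule."""
    return min(queue, key=lambda job: rule(job, now))

queue = [
    {"id": 1, "proc_time": 4.0, "due": 20.0, "work_remaining": 9.0},
    {"id": 2, "proc_time": 6.0, "due": 12.0, "work_remaining": 6.0},
]
print(dispatch(queue, now=5.0)["id"])  # job 2: tighter slack wins
```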
Consistently Sampled Correlation Filters with Space Anisotropic Regularization for Visual Tracking
Shi, Guokai; Xu, Tingfa; Luo, Jiqiang; Li, Yuankun
2017-01-01
Most existing correlation filter-based tracking algorithms, which use fixed patches and cyclic shifts as training and detection measures, assume that the training samples are reliable and ignore the inconsistencies between training samples and detection samples. We propose to construct and study a consistently sampled correlation filter with space anisotropic regularization (CSSAR) to solve these two problems simultaneously. Our approach uses a spatiotemporally consistent sampling strategy to alleviate the redundancies in training samples caused by the cyclic shifts and eliminate the inconsistencies between training samples and detection samples, and introduces space anisotropic regularization to constrain the correlation filter and alleviate drift caused by occlusion. Moreover, an optimization strategy based on the Gauss-Seidel method is developed to obtain robust and efficient online learning. Both qualitative and quantitative evaluations demonstrate that our tracker outperforms state-of-the-art trackers on object tracking benchmarks (OTBs). PMID:29231876
A performance study of live VM migration technologies: VMotion vs XenMotion
NASA Astrophysics Data System (ADS)
Feng, Xiujie; Tang, Jianxiong; Luo, Xuan; Jin, Yaohui
2011-12-01
Due to the growing demand for flexible resource management in cloud computing services, research on live virtual machine migration has attracted more and more attention. Live migration of virtual machines across different hosts has become a powerful tool to facilitate system maintenance, load balancing, fault tolerance and so on. In this paper, we use a measurement-based approach to compare the performance of two major live migration technologies, VMotion and XenMotion, under various network conditions. The results show that VMotion transfers much less data than XenMotion when migrating identical VMs. However, in networks with moderate packet loss and delay, typical of the VPN (virtual private network) scenarios used to connect data centers, XenMotion outperforms VMotion in total migration time. We hope this study will help data center administrators choose suitable virtualization environments and optimize existing live migration mechanisms.
New insights from cluster analysis methods for RNA secondary structure prediction
Rogers, Emily; Heitsch, Christine
2016-01-01
A widening gap exists between the best practices for RNA secondary structure prediction developed by computational researchers and the methods used in practice by experimentalists. Minimum free energy (MFE) predictions, although broadly used, are outperformed by methods which sample from the Boltzmann distribution and data mine the results. In particular, moving beyond the single structure prediction paradigm yields substantial gains in accuracy. Furthermore, the largest improvements in accuracy and precision come from viewing secondary structures not at the base pair level but at lower granularity/higher abstraction. This suggests that random errors affecting precision and systematic ones affecting accuracy are both reduced by this “fuzzier” view of secondary structures. Thus experimentalists who are willing to adopt a more rigorous, multilayered approach to secondary structure prediction by iterating through these levels of granularity will be much better able to capture fundamental aspects of RNA base pairing. PMID:26971529
Ensemble-based prediction of RNA secondary structures.
Aghaeepour, Nima; Hoos, Holger H
2013-04-24
Accurate structure prediction methods play an important role in the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
NASA Astrophysics Data System (ADS)
Clunie, David A.
2000-05-01
Proprietary compression schemes have a cost and risk associated with their support, end of life and interoperability. Standards reduce this cost and risk. The new JPEG-LS process (ISO/IEC 14495-1), and the lossless mode of the proposed JPEG 2000 scheme (ISO/IEC CD15444-1), new standard schemes that may be incorporated into DICOM, are evaluated here. Three thousand, six hundred and seventy-nine (3,679) single frame grayscale images from multiple anatomical regions, modalities and vendors, were tested. For all images combined, JPEG-LS and JPEG 2000 performed equally well (3.81), almost as well as CALIC (3.91), a complex predictive scheme used only as a benchmark. Both out-performed existing JPEG (3.04 with optimum predictor choice per image, 2.79 for previous pixel prediction as most commonly used in DICOM). Text dictionary schemes performed poorly (gzip 2.38), as did image dictionary schemes without statistical modeling (PNG 2.76). Proprietary transform based schemes did not perform as well as JPEG-LS or JPEG 2000 (S+P Arithmetic 3.4, CREW 3.56). Stratified by modality, JPEG-LS compressed CT images (4.00), MR (3.59), NM (5.98), US (3.4), IO (2.66), CR (3.64), DX (2.43), and MG (2.62). CALIC always achieved the highest compression except for one modality for which JPEG-LS did better (MG digital vendor A JPEG-LS 4.02, CALIC 4.01). JPEG-LS outperformed existing JPEG for all modalities. The use of standard schemes can achieve state-of-the-art performance, regardless of modality. JPEG-LS is simple, easy to implement, consumes less memory, and is faster than JPEG 2000, though JPEG 2000 will offer lossy and progressive transmission. It is recommended that DICOM add transfer syntaxes for both JPEG-LS and JPEG 2000.
NASA Astrophysics Data System (ADS)
Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.
2017-08-01
Detecting regular and efficient cyclic models is a demanding task for data analysts due to the unstructured, dynamic and enormous raw information produced from the web. Many existing approaches generate large numbers of candidate patterns on huge and complex databases. In this work, two novel algorithms are proposed and a comparative examination is performed considering scalability and performance parameters. The first algorithm, EFPMA (Extended Regular Model Detection Algorithm), is used to find frequent sequential patterns from spatiotemporal datasets, and the second, ETMA (Enhanced Tree-based Mining Algorithm), detects effective cyclic models with a symbolic database representation. EFPMA grows patterns from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because fewer levels of database projection are required compared to existing approaches such as PrefixSpan and SPADE. ETMA uses distinct notions to store and manage transaction data horizontally, such as segments, sequences and individual symbols. ETMA exploits a partition-and-conquer method to find maximal patterns by using symbolic notations. Using this algorithm, we can mine cyclic models in full-series sequential patterns, including subsection series. ETMA reduces memory consumption and makes use of efficient symbolic operations. Furthermore, ETMA records time-series instances dynamically, in terms of character, series and section approaches, respectively. Evaluating pattern length and proving the efficiency of the pruning and retrieval techniques on synthetic and real datasets remain open and challenging mining problems. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence mining, and earthquake prediction applications. Extensive experimental results illustrate that the algorithms outperform the ECLAT, STNR and MAFIA approaches in efficiency and scalability.
Gender Gaps and Gendered Action in a First-Year Physics Laboratory
ERIC Educational Resources Information Center
Day, James; Stang, Jared B.; Holmes, N. G.; Kumar, Dhaneesh; Bonn, D. A.
2016-01-01
It is established that male students outperform female students on almost all commonly used physics concept inventories. However, there is significant variation in the factors that contribute to the gap, as well as the direction in which they influence it. It is presently unknown if such a gender gap exists on the relatively new Concise Data…
Searching Information Sources in Networks
2017-06-14
During the course of this project, we made significant progress in multiple directions of the information detection...result on information source detection on non-tree networks; (2) The development of information source localization algorithms to detect multiple... information sources. The algorithms have provable performance guarantees and outperform existing algorithms in
Towards a SIM-Less Existence: The Evolution of Smart Learning Networks
ERIC Educational Resources Information Center
Al-Khouri, Ali M.
2015-01-01
This article proposes that the widespread availability of wireless networks creates a case in which there is no real need for SIM cards. Recent technological developments offer the capability to outperform SIM cards and provide more innovative dimensions to current systems of mobility. In this context of changing realities in the domain of…
On prediction of genetic values in marker-assisted selection.
Lange, C; Whittaker, J C
2001-01-01
We suggest a new approximation for the prediction of genetic values in marker-assisted selection. The new approximation is compared to the standard approach. It is shown that the new approach will often provide substantially better prediction of genetic values; furthermore, the new approximation avoids some of the known statistical problems of the standard approach. The advantages of the new approach are illustrated by a simulation study in which the new approximation outperforms both the standard approach and phenotypic selection. PMID:11729177
ERIC Educational Resources Information Center
Kliegel, Matthias; Altgassen, Mareike
2006-01-01
The present study investigated fluid and crystallized intelligence as well as strategic task approaches as potential sources of age-related differences in adult learning performance. Therefore, 45 young and 45 old adults were asked to learn pictured objects. Overall, young participants outperformed old participants in this learning test. However,…
Sudha, M
2017-09-27
As a recent trend, various computational intelligence and machine learning approaches have been used to mine inferences hidden in large clinical databases and assist the clinician in strategic decision making. In any target data set, irrelevant information may be detrimental, confusing the mining algorithm and degrading the prediction outcome. To address this issue, this study develops an intelligent approach that assists the disease diagnostic procedure using an optimal set of attributes instead of all attributes present in the clinical data set. In the proposed Application Specific Intelligent Computing (ASIC) decision support system, a rough set based genetic algorithm is employed in the pre-processing phase and a back propagation neural network is applied in the training and testing phase. ASIC has two phases: the first handles outliers, noisy data, and missing values to obtain qualitative target data, and then generates appropriate attribute reduct sets from the input data using a rough-computing-based genetic algorithm centred on a relative fitness function measure. The succeeding phase involves both training and testing of a back propagation neural network classifier on the selected reducts. The model performance is evaluated against widely adopted existing classifiers. The proposed ASIC system for clinical decision support has been tested with breast cancer, fertility diagnosis and heart disease data sets from the University of California at Irvine (UCI) machine learning repository, outperforming existing approaches with accuracy rates of 95.33%, 97.61%, and 93.04% for breast cancer, fertility and heart disease diagnosis, respectively.
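A common shape for the relative fitness function in rough-set-based genetic attribute reduction (our assumption about the flavor of the measure, not the paper's exact formula) rewards the dependency degree of a candidate reduct while penalizing its size:

```python
def dependency_degree(data, labels, attrs):
    """Rough-set dependency: fraction of records whose attribute
    signature determines the class label unambiguously."""
    blocks = {}
    for row, y in zip(data, labels):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(y)
    consistent = sum(1 for row, y in zip(data, labels)
                     if len(blocks[tuple(row[a] for a in attrs)]) == 1)
    return consistent / len(data)

def fitness(data, labels, attrs, n_total, alpha=0.8):
    """Relative fitness: reward predictive power, penalize reduct size."""
    if not attrs:
        return 0.0
    return (alpha * dependency_degree(data, labels, attrs)
            + (1 - alpha) * (1 - len(attrs) / n_total))

data = [{"a": 0, "b": 1}, {"a": 0, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 1, 1]
print(fitness(data, labels, ["a", "b"], n_total=2))  # full attribute set
print(fitness(data, labels, ["a"], n_total=2))       # smaller but weaker
```

A genetic algorithm would evolve bit strings selecting `attrs`, keeping the reducts with the highest fitness for the downstream neural network.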
On accuracy, privacy, and complexity in the identification problem
NASA Astrophysics Data System (ADS)
Beekhof, F.; Voloshynovskiy, S.; Koval, O.; Holotyak, T.
2010-02-01
This paper presents recent advances in the identification problem taking into account the accuracy, complexity and privacy leak of different decoding algorithms. Using a model of different actors from literature, we show that it is possible to use more accurate decoding algorithms using reliability information without increasing the privacy leak relative to algorithms that only use binary information. Existing algorithms from literature have been modified to take advantage of reliability information, and we show that a proposed branch-and-bound algorithm can outperform existing work, including the enhanced variants.
Forecasting Tehran stock exchange volatility; Markov switching GARCH approach
NASA Astrophysics Data System (ADS)
Abounoori, Esmaiel; Elmi, Zahra (Mila); Nademi, Younes
2016-03-01
This paper evaluates several GARCH models regarding their ability to forecast volatility on the Tehran Stock Exchange (TSE). These include GARCH models with both Gaussian and fat-tailed residual conditional distributions, assessed on their ability to describe and forecast volatility over horizons from 1 to 22 days. Results indicate that the AR(2)-MRSGARCH-GED model outperforms the other models at the 1-day horizon. Both the AR(2)-MRSGARCH-GED and AR(2)-MRSGARCH-t models outperform the others at the 5-day horizon, and three AR(2)-MRSGARCH models outperform the rest at the 10-day horizon. At the 22-day forecast horizon, results indicate no difference between MRSGARCH models and standard GARCH models. Regarding risk-management out-of-sample evaluation (95% VaR), a few models provide reasonable and accurate VaR estimates at the 1-day horizon, with a coverage rate close to the nominal level. According to the risk-management loss functions, there is not a uniformly most accurate model.
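For context, the multi-step volatility forecasts being compared follow, in the plain single-regime GARCH(1,1) case, a simple recursion; the Markov-regime-switching variants average such recursions over regime probabilities. A minimal sketch with toy parameters (the parameter values are illustrative, not estimates from the paper):

```python
def garch11_forecast(omega, alpha, beta, sigma2_t, eps2_t, horizon):
    """Multi-step variance forecasts for GARCH(1,1):
    E[s2_{t+1}] = omega + alpha * eps_t^2 + beta * s2_t, and for h > 1
    E[s2_{t+h}] = omega + (alpha + beta) * E[s2_{t+h-1}]."""
    forecasts = [omega + alpha * eps2_t + beta * sigma2_t]
    for _ in range(horizon - 1):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts

# toy parameters; a 22-day horizon as in the paper's longest test
f = garch11_forecast(omega=0.05, alpha=0.08, beta=0.9,
                     sigma2_t=1.2, eps2_t=2.0, horizon=22)
print(round(f[0], 4), round(f[-1], 4))  # 1-day vs 22-day forecast
```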
Connected Component Model for Multi-Object Tracking.
He, Zhenyu; Li, Xin; You, Xinge; Tao, Dacheng; Tang, Yuan Yan
2016-08-01
In multi-object tracking, it is critical to explore the data associations by exploiting the temporal information from a sequence of frames rather than the information from the adjacent two frames. Since straightforwardly obtaining data associations from multi-frames is an NP-hard multi-dimensional assignment (MDA) problem, most existing methods solve this MDA problem by either developing complicated approximate algorithms, or simplifying MDA as a 2D assignment problem based upon the information extracted only from adjacent frames. In this paper, we show that the relation between associations of two observations is the equivalence relation in the data association problem, based on the spatial-temporal constraint that the trajectories of different objects must be disjoint. Therefore, the MDA problem can be equivalently divided into independent subproblems by equivalence partitioning. In contrast to existing works for solving the MDA problem, we develop a connected component model (CCM) by exploiting the constraints of the data association and the equivalence relation on the constraints. Based upon CCM, we can efficiently obtain the global solution of the MDA problem for multi-object tracking by optimizing a sequence of independent data association subproblems. Experiments on challenging public data sets demonstrate that our algorithm outperforms the state-of-the-art approaches.
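The equivalence-partitioning idea above can be illustrated with a union-find structure: linking observations that could share a trajectory and taking connected components yields the independent association subproblems. A sketch of the principle (not the paper's CCM implementation):

```python
class DSU:
    """Disjoint-set union (union-find) with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def split_subproblems(n_obs, compatible_pairs):
    """Observations that may belong to the same trajectory are linked;
    each connected component is an independent data association
    subproblem that can be solved on its own."""
    dsu = DSU(n_obs)
    for a, b in compatible_pairs:
        dsu.union(a, b)
    comps = {}
    for i in range(n_obs):
        comps.setdefault(dsu.find(i), []).append(i)
    return list(comps.values())

# obs 0-1-2 can chain into one trajectory; 3-4 into another
print(split_subproblems(5, [(0, 1), (1, 2), (3, 4)]))
# [[0, 1, 2], [3, 4]]
```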
A novel swarm intelligence algorithm for finding DNA motifs.
Lei, Chengwei; Ruan, Jianhua
2009-01-01
Discovering DNA motifs from co-expressed or co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in computational molecular biology. In this work, we propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimisation technique called Particle Swarm Optimisation (PSO), which has been shown to be effective in optimising difficult multidimensional problems in continuous domains. We propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs, and propose a modification of the naive PSO algorithm to accommodate discrete variables. In order to improve efficiency, we also propose several strategies for escaping from local optima and for automatically determining the termination criteria. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most popular existing algorithms.
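The objective such a swarm optimizes is typically the total distance between a candidate consensus and its best-matching window in each sequence; a minimal scoring sketch (the word dissimilarity graph and the discrete PSO update itself are omitted, and the sequences are toy data):

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def consensus_score(consensus, sequences):
    """Sum over sequences of the minimum Hamming distance between the
    consensus and any window of the same width; this is the objective a
    swarm of candidate motifs would minimize."""
    w = len(consensus)
    return sum(min(hamming(consensus, s[i:i + w])
                   for i in range(len(s) - w + 1))
               for s in sequences)

seqs = ["ACGTACGTTT", "TTACGTACGA", "GGGACGTACG"]
print(consensus_score("ACGTACG", seqs))  # 0: exact match in every sequence
print(consensus_score("ACGTTCG", seqs))  # 3: one mismatch per sequence
```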
Towards Open-World Person Re-Identification by One-Shot Group-Based Verification.
Zheng, Wei-Shi; Gong, Shaogang; Xiang, Tao
2016-03-01
Solving the problem of matching people across non-overlapping multi-camera views, known as person re-identification (re-id), has received increasing interests in computer vision. In a real-world application scenario, a watch-list (gallery set) of a handful of known target people are provided with very few (in many cases only a single) image(s) (shots) per target. Existing re-id methods are largely unsuitable to address this open-world re-id challenge because they are designed for (1) a closed-world scenario where the gallery and probe sets are assumed to contain exactly the same people, (2) person-wise identification whereby the model attempts to verify exhaustively against each individual in the gallery set, and (3) learning a matching model using multi-shots. In this paper, a novel transfer local relative distance comparison (t-LRDC) model is formulated to address the open-world person re-identification problem by one-shot group-based verification. The model is designed to mine and transfer useful information from a labelled open-world non-target dataset. Extensive experiments demonstrate that the proposed approach outperforms both non-transfer learning and existing transfer learning based re-id methods.
Dong, Yadong; Sun, Yongqi; Qin, Chao
2018-01-01
The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
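A hedged sketch of the flavor of such a combined score (illustrative weighting, not the paper's formula): blend the trained classifier's probability with an edge-density term, so that a structurally dense candidate can survive the search even when the imprecise classifier underrates it.

```python
def density(subgraph_nodes, edges):
    """Edge density of an induced subgraph: |E| / (n * (n - 1) / 2)."""
    n = len(subgraph_nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in edges
                 if u in subgraph_nodes and v in subgraph_nodes)
    return 2.0 * inside / (n * (n - 1))

def robust_score(clf_prob, subgraph_nodes, edges, lam=0.5):
    """Combine the supervised model's confidence that the node set is a
    complex with unsupervised local structural evidence."""
    return lam * clf_prob + (1 - lam) * density(subgraph_nodes, edges)

edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p3", "p4")]
cand = {"p1", "p2", "p3"}
print(robust_score(clf_prob=0.4, subgraph_nodes=cand, edges=edges))
# 0.7: a weak classifier score rescued by a fully connected triangle
```

A forward/backward search would then grow or shrink `cand` whenever the move increases this score.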
SCOUT: simultaneous time segmentation and community detection in dynamic networks
Hulovatyy, Yuriy; Milenković, Tijana
2016-01-01
Many evolving complex real-world systems can be modeled via dynamic networks. An important problem in dynamic network research is community detection, which finds groups of topologically related nodes. Typically, this problem is approached by assuming either that each time point has a distinct community organization or that all time points share a single community organization. The reality likely lies between these two extremes. To find the compromise, we consider community detection in the context of the problem of segment detection, which identifies contiguous time periods with consistent network structure. Consequently, we formulate a combined problem of segment community detection (SCD), which simultaneously partitions the network into contiguous time segments with consistent community organization and finds this community organization for each segment. To solve SCD, we introduce SCOUT, an optimization framework that explicitly considers both segmentation quality and partition quality. SCOUT addresses limitations of existing methods that can be adapted to solve SCD, which consider only one of segmentation quality or partition quality. In a thorough evaluation, SCOUT outperforms the existing methods in terms of both accuracy and computational complexity. We apply SCOUT to biological network data to study human aging. PMID:27881879
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-06-19
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Unlike existing multiple kernel extreme learning machine (MK-ELM) algorithms, the proposed method regards the combination coefficients of base kernels as external parameters of the single-hidden-layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radial basis function neural network (RBFNN), and probabilistic neural network (PNN). The results demonstrate that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.
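To illustrate the core computation, here is a minimal two-kernel sketch under our own assumptions: the composite kernel is a weighted sum of base kernel matrices, and the KELM output weights then have the usual closed-form solution. The QPSO loop that would tune the weights, kernel parameters and C is elided; the data and parameter values are illustrative.

```python
import numpy as np

def gaussian(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def polynomial(X, Y, degree=2):
    return (X @ Y.T + 1.0) ** degree

def composite(X, Y, weights):
    # weighted combination of base kernels = the composite kernel
    return weights[0] * gaussian(X, Y) + weights[1] * polynomial(X, Y)

def kelm_fit(X, T, weights, C=100.0):
    """Kernel ELM: solve (K + I/C) beta = T in closed form."""
    K = composite(X, X, weights)
    return np.linalg.solve(K + np.eye(len(X)) / C, T)

def kelm_predict(X_train, beta, X_new, weights):
    return composite(X_new, X_train, weights) @ beta

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # one-hot class targets
beta = kelm_fit(X, T, weights=[0.6, 0.4])
print(kelm_predict(X, beta, X, weights=[0.6, 0.4]).argmax(1))
# expected: [0 0 1 1] (training points classified correctly)
```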
First in the Class? Age and the Education Production Function. NBER Working Paper No. 13663
ERIC Educational Resources Information Center
Cascio, Elizabeth; Schanzenbach, Diane Whitmore
2007-01-01
Older children outperform younger children in a school-entry cohort well into their school careers. The existing literature has provided little insight into the causes of this phenomenon, leaving open the possibility that school-entry age is a zero-sum game, where relatively young students lose what relatively old students gain. In this paper, we…
The Impact of Seating Location and Seating Type on Student Performance
ERIC Educational Resources Information Center
Meeks, Michael D.; Knotts, Tami L.; James, Karen D.; Williams, Felice; Vassar, John A.; Wren, Amy Oakes
2013-01-01
While an extensive body of research exists regarding the delivery of course knowledge and material, much less attention has been paid to the performance effect of seating position within a classroom. Research findings are mixed as to whether students in the front row of a classroom outperform students in the back row. Another issue that has not…
Identity Recognition Algorithm Using Improved Gabor Feature Selection of Gait Energy Image
NASA Astrophysics Data System (ADS)
Chao, LIANG; Ling-yao, JIA; Dong-cheng, SHI
2017-01-01
This paper describes an effective gait recognition approach based on Gabor features of the gait energy image. Kernel Fisher analysis combined with a kernel matrix is proposed to select dominant features. A nearest neighbor classifier based on whitened cosine distance is used to discriminate different gait patterns. The proposed approach is tested on the CASIA and USF gait databases. The results show that our approach outperforms other state-of-the-art gait recognition approaches in terms of recognition accuracy and robustness.
A Comparison of Heuristic Methods Used in Hierarchical Production Planning.
1979-03-01
literature: Equalization-of-Run-Out-Times (EROT) [12], Winters [18], Hax and Meal [11], and Knapsack [1]. Figure 2 displays a flow chart...forecast error must be greater than 24% for this difference to be noticed. Hax-Meal outperformed the EROT with forecast error between 18 and 23% by more...planning framework: the Equalization-of-Run-Out-Time approach [12], the Winters approach [18], the Hax-Meal approach [11], and the Knapsack approach [1]
Luo, Wei; Tran, Truyen; Berk, Michael; Venkatesh, Svetha
2016-01-01
Background: Although physical illnesses, routinely documented in electronic medical records (EMR), have been found to be a contributing factor to suicides, no automated systems use this information to predict suicide risk. Objective: The aim of this study is to quantify the impact of physical illnesses on suicide risk, and develop a predictive model that captures this relationship using EMR data. Methods: We used history of physical illnesses (except chapter V: Mental and behavioral disorders) from EMR data over different time-periods to build a lookup table that contains the probability of suicide risk for each chapter of the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) codes. The lookup table was then used to predict the probability of suicide risk for any new assessment. Based on the different lengths of history of physical illnesses, we developed six different models to predict suicide risk. We tested the performance of the developed models to predict 90-day risk using historical data over differing time-periods ranging from 3 to 48 months. A total of 16,858 assessments from 7399 mental health patients with at least one risk assessment were used for the validation of the developed model. The performance was measured using area under the receiver operating characteristic curve (AUC). Results: The best predictive results were derived (AUC=0.71) using combined data across all time-periods, which significantly outperformed the clinical baseline derived from routine risk assessment (AUC=0.56). The proposed approach thus shows potential to be incorporated in the broader risk assessment processes used by clinicians. Conclusions: This study provides a novel approach to exploit the history of physical illnesses extracted from EMR (ICD-10 codes without chapter V, mental and behavioral disorders) to predict suicide risk, and this model outperforms existing clinical assessments of suicide risk. PMID:27400764
Vision-Aided RAIM: A New Method for GPS Integrity Monitoring in Approach and Landing Phase
Fu, Li; Zhang, Jun; Li, Rui; Cao, Xianbin; Wang, Jinling
2015-01-01
In the 1980s, Global Positioning System (GPS) receiver autonomous integrity monitoring (RAIM) was proposed to provide the integrity of a navigation system by checking the consistency of GPS measurements. However, during the approach and landing phase of a flight path, where there is often low GPS visibility conditions, the performance of the existing RAIM method may not meet the stringent aviation requirements for availability and integrity due to insufficient observations. To solve this problem, a new RAIM method, named vision-aided RAIM (VA-RAIM), is proposed for GPS integrity monitoring in the approach and landing phase. By introducing landmarks as pseudo-satellites, the VA-RAIM enriches the navigation observations to improve the performance of RAIM. In the method, a computer vision system photographs and matches these landmarks to obtain additional measurements for navigation. Nevertheless, the challenging issue is that such additional measurements may suffer from vision errors. To ensure the reliability of the vision measurements, a GPS-based calibration algorithm is presented to reduce the time-invariant part of the vision errors. Then, the calibrated vision measurements are integrated with the GPS observations for integrity monitoring. Simulation results show that the VA-RAIM outperforms the conventional RAIM with a higher level of availability and fault detection rate. PMID:26378533
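The consistency check at the heart of RAIM is a chi-square test on least-squares pseudorange residuals, and vision landmarks acting as pseudo-satellites simply contribute extra rows to the geometry matrix. A generic sketch of that test (our illustration of standard RAIM, not the paper's VA-RAIM algorithm; the data below are synthetic):

```python
import numpy as np

def raim_test(G, y, sigma=1.0):
    """Residual-based RAIM consistency check.

    G: (m x 4) linearized geometry matrix (unit vectors + clock column)
    y: (m,) pseudorange residuals about the linearization point
    Returns the statistic ||y - G x_hat||^2 / sigma^2, chi-square with
    m - 4 degrees of freedom if all measurements (satellites or vision
    landmarks) are fault-free; compare it against a threshold chosen
    for the required false-alarm rate.
    """
    x_hat, *_ = np.linalg.lstsq(G, y, rcond=None)
    r = y - G @ x_hat
    return float(r @ r) / sigma ** 2

m = 6  # e.g. 5 satellites + 1 vision landmark pseudo-satellite
rng = np.random.default_rng(0)
G = np.hstack([rng.normal(size=(m, 3)), np.ones((m, 1))])
y = G @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=m)
print(raim_test(G, y, sigma=0.1))  # small value: consistent measurements
```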
Efficient inference for genetic association studies with multiple outcomes.
Ruffieux, Helene; Davison, Anthony C; Hager, Jorg; Irincheeva, Irina
2017-10-01
Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modeling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson and others (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.
Retrieval of Sentence Sequences for an Image Stream via Coherence Recurrent Convolutional Networks.
Park, Cesc Chunseong; Kim, Youngjin; Kim, Gunhee
2018-04-01
We propose an approach for retrieving a sequence of natural sentences for an image stream. Since general users often take a series of pictures of their experiences, much online visual information exists in the form of image streams, for which it is better to consider the whole image stream when producing natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. For retrieving a coherent flow of multiple sentences for a photo stream, we propose a multimodal neural architecture called coherence recurrent convolutional network (CRCN), which consists of convolutional neural networks, bidirectional long short-term memory (LSTM) networks, and an entity-based local coherence model. Our approach directly learns from the vast user-generated resource of blog posts as text-image parallel training data. We collect more than 22 K unique blog posts with 170 K associated images for the travel topics of NYC, Disneyland, Australia, and Hawaii. We demonstrate that our approach outperforms other state-of-the-art image captioning methods for text sequence generation, using both quantitative measures and user studies via Amazon Mechanical Turk.
A computational intelligent approach to multi-factor analysis of violent crime information system
NASA Astrophysics Data System (ADS)
Liu, Hongbo; Yang, Chao; Zhang, Meng; McLoone, Seán; Sun, Yeqing
2017-02-01
Various scientific studies have explored the causes of violent behaviour from different perspectives, with psychological tests, in particular, applied to the analysis of crime factors. The relationship between bi-factors has also been extensively studied including the link between age and crime. In reality, many factors interact to contribute to criminal behaviour and as such there is a need to have a greater level of insight into its complex nature. In this article we analyse violent crime information systems containing data on psychological, environmental and genetic factors. Our approach combines elements of rough set theory with fuzzy logic and particle swarm optimisation to yield an algorithm and methodology that can effectively extract multi-knowledge from information systems. The experimental results show that our approach outperforms alternative genetic algorithm and dynamic reduct-based techniques for reduct identification and has the added advantage of identifying multiple reducts and hence multi-knowledge (rules). Identified rules are consistent with classical statistical analysis of violent crime data and also reveal new insights into the interaction between several factors. As such, the results are helpful in improving our understanding of the factors contributing to violent crime and in highlighting the existence of hidden and intangible relationships between crime factors.
A multi-objective optimization approach accurately resolves protein domain architectures
Bernardes, J.S.; Vieira, F.R.J.; Zaverucha, G.; Carbone, A.
2016-01-01
Motivation: Given a protein sequence and a number of potential domains matching it, what are the domain content and the most likely domain architecture for the sequence? This problem is of fundamental importance in protein annotation, constituting one of the main steps of all predictive annotation strategies. On the other hand, when potential domains are several and in conflict because of overlapping domain boundaries, finding a solution for the problem might become difficult. An accurate prediction of the domain architecture of a multi-domain protein provides important information for function prediction, comparative genomics and molecular evolution. Results: We developed DAMA (Domain Annotation by a Multi-objective Approach), a novel approach that identifies architectures through a multi-objective optimization algorithm combining scores of domain matches, previously observed multi-domain co-occurrence and domain overlapping. DAMA has been validated on a known benchmark dataset based on CATH structural domain assignments and on the set of Plasmodium falciparum proteins. When compared with existing tools on both datasets, it outperforms all of them. Availability and implementation: DAMA software is implemented in C++ and the source code can be found at http://www.lcqb.upmc.fr/DAMA. Contact: juliana.silva_bernardes@upmc.fr or alessandra.carbone@lip6.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26458889
Segmentation of malignant lesions in 3D breast ultrasound using a depth-dependent model.
Tan, Tao; Gubern-Mérida, Albert; Borelli, Cristina; Manniesing, Rashindra; van Zelst, Jan; Wang, Lei; Zhang, Wei; Platel, Bram; Mann, Ritse M; Karssemeijer, Nico
2016-07-01
Automated 3D breast ultrasound (ABUS) has been proposed as a complementary screening modality to mammography for early detection of breast cancers. To facilitate the interpretation of ABUS images, automated diagnosis and detection techniques are being developed, in which malignant lesion segmentation plays an important role. However, automated segmentation of cancer in ABUS is challenging since lesion edges might not be well defined. In this study, the authors aim at developing an automated segmentation method for malignant lesions in ABUS that is robust to ill-defined cancer edges and posterior shadowing. A segmentation method using depth-guided dynamic programming based on spiral scanning is proposed. The method automatically adjusts aggressiveness of the segmentation according to the position of the voxels relative to the lesion center. Segmentation is more aggressive in the upper part of the lesion (close to the transducer) than at the bottom (far away from the transducer), where posterior shadowing is usually visible. The authors used Dice similarity coefficient (Dice) for evaluation. The proposed method is compared to existing state of the art approaches such as graph cut, level set, and smart opening and an existing dynamic programming method without depth dependence. In a dataset of 78 cancers, our proposed segmentation method achieved a mean Dice of 0.73 ± 0.14. The method outperforms an existing dynamic programming method (0.70 ± 0.16) on this task (p = 0.03) and it is also significantly (p < 0.001) better than graph cut (0.66 ± 0.18), level set based approach (0.63 ± 0.20) and smart opening (0.65 ± 0.12). The proposed depth-guided dynamic programming method achieves accurate breast malignant lesion segmentation results in automated breast ultrasound.
Zhou, Jiyun; Xu, Ruifeng; He, Yulan; Lu, Qin; Wang, Hongpeng; Kong, Bing
2016-01-01
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining them gives a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web-server of our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is made freely accessible to the biological research community. PMID:27282833
RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination
Mirzaei, Sajad; Wu, Yufeng
2017-01-01
Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting information contained in the haplotype data about the underlying genealogy. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, which outperforms several existing methods. Availability and Implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contacts: sajad@engr.uconn.edu or ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28065901
Investigations of image fusion
NASA Astrophysics Data System (ADS)
Zhang, Zhong
1999-12-01
The objective of image fusion is to combine information from multiple images of the same scene. The result of image fusion is a single image which is more suitable for the purpose of human visual perception or further image processing tasks. In this thesis, a region-based fusion algorithm using the wavelet transform is proposed. The identification of important features in each image, such as edges and regions of interest, is used to guide the fusion process. The idea of multiscale grouping is also introduced, and a generic image fusion framework based on multiscale decomposition is studied. The framework includes all of the existing multiscale-decomposition-based fusion approaches we found in the literature which did not assume a statistical model for the source images. Comparisons indicate that our framework includes some new approaches which outperform the existing approaches for the cases we consider. Registration must precede our fusion algorithms, so we propose a hybrid scheme which uses both feature-based and intensity-based methods. The idea of robust estimation of optical flow from time-varying images is employed with a coarse-to-fine multi-resolution approach and feature-based registration to overcome some of the limitations of the intensity-based schemes. Experiments show that this approach is robust and efficient. Assessing image fusion performance in a real application is a complicated issue. In this dissertation, a mixture probability density function model is used in conjunction with the Expectation-Maximization algorithm to model histograms of edge intensity. Some new techniques are proposed for estimating the quality of a noisy image of a natural scene. Such quality measures can be used to guide the fusion. Finally, we study fusion of images obtained from several copies of a new type of camera developed for video surveillance. Our techniques increase the capability and reliability of the surveillance system and provide an easy way to obtain 3-D information on objects in the space monitored by the system.
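As a concrete instance of the multiscale-decomposition fusion framework described above, here is a minimal "choose-max" wavelet fusion sketch. It assumes the external PyWavelets package and uses a crude max-absolute-coefficient rule for every band (averaging is also common for the approximation band); it illustrates the framework, not the thesis's region-based algorithm.

```python
import numpy as np
import pywt  # PyWavelets, an external dependency

def fuse(img_a, img_b, wavelet="db2", level=3):
    """Multiscale-decomposition fusion: decompose both images, keep the
    larger-magnitude coefficient at every location (a simple 'choose
    max' salience rule), then reconstruct the fused image."""
    ca = pywt.wavedec2(img_a, wavelet, level=level)
    cb = pywt.wavedec2(img_b, wavelet, level=level)
    # approximation band
    fused = [np.where(np.abs(ca[0]) >= np.abs(cb[0]), ca[0], cb[0])]
    # detail bands (horizontal, vertical, diagonal) at each scale
    for da, db in zip(ca[1:], cb[1:]):
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(da, db)))
    return pywt.waverec2(fused, wavelet)

a = np.zeros((64, 64)); a[:, :32] = 1.0   # toy 'left half informative'
b = np.zeros((64, 64)); b[:, 32:] = 1.0   # toy 'right half informative'
f = fuse(a, b)
print(f.shape, round(float(f.max()), 2))
```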
ERIC Educational Resources Information Center
Brandstatter, Eduard; Gigerenzer, Gerd; Hertwig, Ralph
2008-01-01
E. Brandstatter, G. Gigerenzer, and R. Hertwig (2006) showed that the priority heuristic matches or outperforms modifications of expected utility theory in predicting choice in 4 diverse problem sets. M. H. Birnbaum (2008) argued that sets exist in which the opposite is true. The authors agree--but stress that all choice strategies have regions of…
Multivariate decoding of brain images using ordinal regression.
Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F
2013-11-01
Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection.
Female Chess Players Outperform Expectations When Playing Men.
Stafford, Tom
2018-03-01
Stereotype threat has been offered as a potential explanation of differential performance between men and women in some cognitive domains. Questions remain about the reliability and generality of the phenomenon. Previous studies have found that stereotype threat is activated in female chess players when they are matched against male players. I used data from over 5.5 million games of international tournament chess and found no evidence of a stereotype-threat effect. In fact, female players outperform expectations when playing men. Further analysis showed no influence of degree of challenge, player age, nor prevalence of female role models in national chess leagues on differences in performance when women play men versus when they play women. Though this analysis contradicts one specific mechanism of influence of gender stereotypes, the persistent differences between male and female players suggest that systematic factors do exist and remain to be uncovered.
Valeja, Santosh G; Xiu, Lichen; Gregorich, Zachery R; Guner, Huseyin; Jin, Song; Ge, Ying
2015-01-01
To address the complexity of the proteome in mass spectrometry (MS)-based top-down proteomics, multidimensional liquid chromatography (MDLC) strategies that can effectively separate proteins with high resolution and automation are highly desirable. Although various MDLC methods that can effectively separate peptides from protein digests exist, very few MDLC strategies, primarily consisting of 2DLC, are available for intact protein separation, which is insufficient to address the complexity of the proteome. We recently demonstrated that hydrophobic interaction chromatography (HIC) utilizing a MS-compatible salt can provide high resolution separation of intact proteins for top-down proteomics. Herein, we have developed a novel 3DLC strategy by coupling HIC with ion exchange chromatography (IEC) and reverse phase chromatography (RPC) for intact protein separation. We demonstrated that a 3D (IEC-HIC-RPC) approach greatly outperformed the conventional 2D IEC-RPC approach. For the same IEC fraction (out of 35 fractions) from a crude HEK 293 cell lysate, a total of 640 proteins were identified in the 3D approach (corresponding to 201 nonredundant proteins) as compared to 47 in the 2D approach, whereas simply prolonging the gradients in RPC in the 2D approach only led to minimal improvement in protein separation and identifications. Therefore, this novel 3DLC method has great potential for effective separation of intact proteins to achieve deep proteome coverage in top-down proteomics.
González, Juan R; Carrasco, Josep L; Armengol, Lluís; Villatoro, Sergi; Jover, Lluís; Yasui, Yutaka; Estivill, Xavier
2008-01-01
Background: MLPA is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed model, as well as a new approach for determining the statistical significance of altered probes based on a linear mixed model. This method establishes a threshold by using different tolerance intervals that accommodate the specific random error variability observed in each test sample. Results: Through simulation studies we have shown that our proposed method outperforms two existing methods that are based on simple threshold rules or iterative regression. We have illustrated the method using a controlled MLPA assay in which targeted regions are variable in copy number in individuals suffering from different disorders such as Prader-Willi, DiGeorge or autism, where it showed the best performance. Conclusion: Using the proposed mixed model, we are able to determine thresholds to decide whether a region is altered. These thresholds are specific for each individual, incorporating experimental variability, resulting in improved sensitivity and specificity, as the examples with real data have revealed. PMID:18522760
Ng, Clara; Hauptman, Ruth; Zhang, Yinliang; Bourne, Philip E; Xie, Lei
2014-01-01
The emergence of multi-drug and extensive drug resistance of microbes to antibiotics poses a great threat to human health. Although drug repurposing is a promising solution for accelerating the drug development process, its application to anti-infectious drug discovery is limited by the scope of existing phenotype-, ligand-, or target-based methods. In this paper we introduce a new computational strategy to determine the genome-wide molecular targets of bioactive compounds in both human and bacterial genomes. Our method is based on the use of a novel algorithm, ligand Enrichment of Network Topological Similarity (ligENTS), to map the chemical universe to its global pharmacological space. ligENTS outperforms the state-of-the-art algorithms in identifying novel drug-target relationships. Furthermore, we integrate ligENTS with our structural systems biology platform to identify drug repurposing opportunities via target similarity profiling. Using this integrated strategy, we have identified novel P. falciparum targets of drug-like active compounds from the Malaria Box, and suggest that a number of approved drugs may be active against malaria. This study demonstrates the potential of an integrative chemical genomics and structural systems biology approach to drug repurposing.
Still-to-video face recognition in unconstrained environments
NASA Astrophysics Data System (ADS)
Wang, Haoyu; Liu, Changsong; Ding, Xiaoqing
2015-02-01
Face images from video sequences captured in unconstrained environments usually contain several kinds of variations, e.g. pose, facial expression, illumination, image resolution and occlusion. Motion blur and compression artifacts also deteriorate recognition performance. Besides, in various practical systems such as law enforcement, video surveillance and e-passport identification, only a single still image per person is enrolled as the gallery set. Many existing methods may fail to work due to variations in face appearance and the limited number of available gallery samples. In this paper, we propose a novel approach for still-to-video face recognition in unconstrained environments. By assuming that faces from still images and video frames share the same identity space, a regularized least squares regression method is utilized to tackle the multi-modality problem. Regularization terms based on heuristic assumptions are introduced to avoid overfitting. In order to deal with the single-image-per-person problem, we exploit face variations learned from training sets to synthesize virtual samples for the gallery. We adopt a learning algorithm combining both affine/convex hull-based approaches and regularization to match image sets. Experimental results on a real-world dataset consisting of unconstrained video sequences demonstrate that our method substantially outperforms state-of-the-art methods.
NASA Astrophysics Data System (ADS)
Mercier, Sylvain; Gratton, Serge; Tardieu, Nicolas; Vasseur, Xavier
2017-12-01
Many applications in structural mechanics require the numerical solution of sequences of linear systems typically arising from a finite element discretization of the governing equations on fine meshes. The method of Lagrange multipliers is often used to take into account mechanical constraints. The resulting matrices then exhibit a saddle point structure, and the iterative solution of such preconditioned linear systems is considered challenging. A popular strategy is to combine preconditioning and deflation to yield an efficient method. We propose an alternative that is applicable to the general case and not only to matrices with a saddle point structure. In this approach, we update an existing algebraic or application-based preconditioner using specific available information, exploiting knowledge of an approximate invariant subspace or of matrix-vector products. The resulting preconditioner has the form of a limited-memory quasi-Newton matrix and requires a small number of linearly independent vectors. Numerical experiments performed on three large-scale applications in elasticity highlight the relevance of the new approach. We show that the proposed method outperforms the deflation method when considering sequences of linear systems with varying matrices.
ZK DrugResist 2.0: A TextMiner to extract semantic relations of drug resistance from PubMed.
Khalid, Zoya; Sezerman, Osman Ugur
2017-05-01
Extracting useful knowledge from unstructured textual data is a challenging task for biologists, since the biomedical literature is growing exponentially on a daily basis. Building automated methods for such tasks is gaining much attention from researchers. ZK DrugResist is an online tool that automatically extracts mutations and expression changes associated with drug resistance from PubMed. In this study we have extended our tool to include semantic relations extracted from biomedical text covering drug resistance and established a server including both of these features. Our system was tested for three relations, Resistance (R), Intermediate (I) and Susceptible (S), by applying a hybrid feature set. Over the last few decades the focus has shifted to hybrid approaches, as they provide better results; in our case this approach combines rule-based methods with machine learning techniques. The results showed 97.67% accuracy with 96% precision, recall and F-measure. These results outperform previously existing relation extraction systems and can thus facilitate computational analysis of drug resistance against complex diseases; the approach can further be applied to other areas of biomedicine. Copyright © 2017 Elsevier Inc. All rights reserved.
Using GO-WAR for mining cross-ontology weighted association rules.
Agapito, Giuseppe; Cannataro, Mario; Guzzi, Pietro Hiram; Milano, Marianna
2015-07-01
The Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated with one or more gene products. The process of association is referred to as annotation. The relevance and the specificity of both GO terms and annotations are evaluated by a measure defined as information content (IC). The analysis of annotated data is thus an important challenge for bioinformatics. Different approaches to analysis exist; among them, association rules (AR) may provide useful knowledge, and they have been used in some applications, e.g. improving the quality of annotations. Nevertheless, classical association rule algorithms take into account neither the source of an annotation nor its importance, yielding candidate rules with low IC. This paper presents GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules. GO-WAR can extract association rules with a high level of IC, without loss of support and confidence, from a dataset of annotated data. A case study applying GO-WAR to publicly available GO annotation datasets demonstrates that our method outperforms current state-of-the-art approaches. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Development of Single-Channel Hybrid BCI System Using Motor Imagery and SSVEP.
Ko, Li-Wei; Ranga, S S K; Komarov, Oleksii; Chen, Chung-Chiang
2017-01-01
Numerous EEG-based brain-computer interface (BCI) systems under development focus on novel feature extraction algorithms, classification methods, and combining existing approaches to create hybrid BCIs. Several recent studies demonstrated various advantages of hybrid BCI systems in terms of improved accuracy or the number of commands available to the user. Still, BCI systems are far from ready for daily use. Achieving high performance with fewer channels is one of the challenges that persists, especially with hybrid BCI systems, where multiple channels are necessary to record information from two or more EEG signal components. Therefore, this work proposes a single-channel (C3 or C4) hybrid BCI system that combines motor imagery (MI) and steady-state visually evoked potential (SSVEP) approaches. This study demonstrates that besides MI features, SSVEP features can also be captured from the C3 or C4 channel. The results show that, owing to the rich feature information (MI and SSVEP) at these channels, the proposed hybrid BCI system outperforms both MI- and SSVEP-based systems, with an average classification accuracy of 85.6 ± 7.7% in a two-class task.
Sparse kernel methods for high-dimensional survival data.
Evers, Ludger; Messow, Claudia-Martina
2008-07-15
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques however are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model however yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where, akin to support vector classification, the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
Hasan, Md Mehedi; Khatun, Mst Shamima; Mollah, Md Nurul Haque; Yong, Cao; Guo, Dianjing
2017-01-01
Lysine succinylation, an important type of protein posttranslational modification, plays significant roles in many cellular processes. Accurate identification of succinylation sites can facilitate our understanding of the molecular mechanisms and potential roles of lysine succinylation. However, even in well-studied systems, a majority of succinylation sites remain undetected because traditional experimental approaches to succinylation site identification are often costly, time-consuming, and laborious. In silico approaches, on the other hand, are potentially an alternative strategy to predict succinylation substrates. In this paper, a novel computational predictor, SuccinSite2.0, was developed for predicting generic and species-specific protein succinylation sites. This predictor uses profile-based amino acid composition and orthogonal binary features to train a random forest classifier. We demonstrated that the proposed SuccinSite2.0 predictor outperformed other currently existing implementations on an independent test dataset. Furthermore, the important features that make visible contributions to species-specific and cross-species prediction of protein succinylation sites were analyzed. The proposed predictor is anticipated to be a useful computational resource for lysine succinylation site prediction. The integrated species-specific online tool of SuccinSite2.0 is publicly accessible.
Zhong, Sheng; McPeek, Mary Sara
2016-01-01
We consider the problem of genetic association testing of a binary trait in a sample that contains related individuals, where we adjust for relevant covariates and allow for missing data. We propose CERAMIC, an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model (LMM) approaches. CERAMIC extends the recently proposed CARAT method to allow samples with related individuals and to incorporate partially missing data. In simulations, we show that CERAMIC outperforms existing LMM and generalized LMM approaches, maintaining high power and correct type 1 error across a wider range of scenarios. CERAMIC results in a particularly large power increase over existing methods when the sample includes related individuals with some missing data (e.g., when some individuals with phenotype and covariate information have missing genotype), because CERAMIC is able to make use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Because CERAMIC is based on a retrospective analysis, it is robust to misspecification of the phenotype model, resulting in better control of type 1 error and higher power than that of prospective methods, such as GMMAT, when the phenotype model is misspecified. CERAMIC is computationally efficient for genomewide analysis in samples of related individuals of almost any configuration, including small families, unrelated individuals and even large, complex pedigrees. We apply CERAMIC to data on type 2 diabetes (T2D) from the Framingham Heart Study. In a genome scan, 9 of the 10 smallest CERAMIC p-values occur in or near either known T2D susceptibility loci or plausible candidates, verifying that CERAMIC is able to home in on the important loci in a genome scan. PMID:27695091
Comparing three pedagogical approaches to psychomotor skills acquisition.
Willis, Ross E; Richa, Jacqueline; Oppeltz, Richard; Nguyen, Patrick; Wagner, Kelly; Van Sickle, Kent R; Dent, Daniel L
2012-01-01
We compared traditional pedagogical approaches such as time- and repetition-based methods with proficiency-based training. Laparoscopic novices were assigned randomly to 1 of 3 training conditions. In experiment 1, participants in the time condition practiced for 60 minutes, participants in the repetition condition performed 5 practice trials, and participants in the proficiency condition trained until reaching a predetermined proficiency goal. In experiment 2, practice time and number of trials were equated across conditions. In experiment 1, participants in the proficiency-based training conditions outperformed participants in the other 2 conditions (P < .014); however, these participants trained longer (P < .001) and performed more repetitions (P < .001). In experiment 2, despite training for similar amounts of time and number of repetitions, participants in the proficiency condition outperformed their counterparts (P < .038). In both experiments, the standard deviations for the proficiency condition were smaller than the other conditions. Proficiency-based training results in trainees who perform uniformly and at a higher level than traditional training methodologies. Copyright © 2012 Elsevier Inc. All rights reserved.
Boosting medical diagnostics by pooling independent judgments
Kurvers, Ralf H. J. M.; Herzog, Stefan M.; Hertwig, Ralph; Krause, Jens; Carney, Patricia A.; Bogart, Andy; Argenziano, Giuseppe; Zalaudek, Iris; Wolf, Max
2016-01-01
Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors’ diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches. PMID:27432950
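The similarity condition reported here is easy to reproduce in a toy simulation. Below is a minimal numpy sketch (not the authors' code; the accuracy values and group size are invented) of majority voting over independent binary diagnoses versus the best individual:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(accuracies, n_cases=20_000):
    """Compare majority voting over independent judgments with the best doctor.

    accuracies: per-doctor probability of a correct (binary) diagnosis.
    Judgments are simulated independently for each case.
    """
    acc = np.asarray(accuracies)
    # 1 = correct judgment, 0 = incorrect, drawn independently per case.
    votes = rng.random((n_cases, acc.size)) < acc
    majority_correct = votes.mean(axis=1) > 0.5
    best_doctor_correct = votes[:, np.argmax(acc)]
    return majority_correct.mean(), best_doctor_correct.mean()

# Similar accuracies: the collective tends to beat the best individual ...
print(simulate([0.72, 0.70, 0.68]))
# ... but with very dissimilar accuracies it tends not to.
print(simulate([0.90, 0.55, 0.52]))
```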
Binary Interval Search: a scalable algorithm for counting interval intersections.
Layer, Ryan M; Skadron, Kevin; Robins, Gabriel; Hall, Ira M; Quinlan, Aaron R
2013-01-01
The comparison of diverse genomic datasets is fundamental to understand genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. https://github.com/arq5x/bits.
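The counting idea behind this approach can be sketched compactly. The following Python illustration (not the authors' implementation, which also parallelizes on GPUs) counts intersections with two binary searches per query interval: an interval [s, e] misses a set B only when a B interval ends before s or starts after e.

```python
from bisect import bisect_left, bisect_right

def count_intersections(a_ivals, b_ivals):
    """Count pairs (a, b), a in A and b in B, whose closed intervals overlap."""
    starts = sorted(s for s, _ in b_ivals)
    ends = sorted(e for _, e in b_ivals)
    n, total = len(b_ivals), 0
    for s, e in a_ivals:
        ended_before = bisect_left(ends, s)        # B intervals with end < s
        start_after = n - bisect_right(starts, e)  # B intervals with start > e
        total += n - ended_before - start_after
    return total

print(count_intersections([(1, 5), (10, 20)], [(4, 6), (7, 9), (15, 30)]))  # 2
```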
Fragmenting networks by targeting collective influencers at a mesoscopic level.
Kobayashi, Teruyoshi; Masuda, Naoki
2016-11-25
A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. However, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network, in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure.
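A rough sketch of the coarse-graining intuition using networkx and off-the-shelf community detection; the bridging score below is a simplified stand-in for the paper's collective-influence computation, not the proposed algorithm itself:

```python
import networkx as nx
from networkx.algorithms import community

def bridging_nodes(G, k):
    """Rank nodes by how many distinct communities their neighbours span.

    Detect communities, treat each as a supernode, and prefer nodes whose
    edges connect many different supernodes.
    """
    comms = community.greedy_modularity_communities(G)
    label = {v: i for i, c in enumerate(comms) for v in c}
    def span(v):
        return len({label[u] for u in G[v]} - {label[v]})
    return sorted(G.nodes, key=span, reverse=True)[:k]

G = nx.connected_caveman_graph(5, 6)   # toy network with clear communities
print(bridging_nodes(G, 3))            # candidates for removal/immunization
```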
Boyen, Peter; Van Dyck, Dries; Neven, Frank; van Ham, Roeland C H J; van Dijk, Aalt D J
2011-01-01
Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we show that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic SLIDER, which uses steepest ascent with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks. The SLIDER implementation and the data used in the experiments are available on http://bioinformatics.uhasselt.be.
El Hanandeh, Ali; El-Zein, Abbas
2010-01-01
A modified version of the multi-criteria decision aid ELECTRE III has been developed to account for uncertainty in criteria weightings and threshold values. The new procedure, called ELECTRE-SS, modifies the exploitation phase of ELECTRE III through a new definition of the pre-order and the introduction of a ranking index (RI). The new approach accommodates cases where incomplete or uncertain preference data are present. The method is applied to the selection of a management strategy for the biodegradable fraction of the municipal solid waste of Sydney. Ten alternatives are compared against 11 criteria. The results show that anaerobic digestion (AD) and composting of paper are less environmentally sound options than recycling. AD is likely to outperform incineration where a market for heating does not exist. Moreover, landfilling can be a sound alternative when considering overall performance and conditions of uncertainty.
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.
Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen
2011-04-01
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.
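Stated from memory of the adaptive-penalty construction (readers should consult the paper for the exact form), the regularization problem has the flavor of a smoothing spline ANOVA fit with adaptively weighted component norms:

```latex
\min_{f}\;\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2
\;+\;\lambda\sum_{j=1}^{p} w_j\,\lVert P^{j} f\rVert,
\qquad w_j = \lVert P^{j}\tilde{f}\rVert^{-\gamma},
```

where P^j f denotes the projection of f onto the j-th functional component and f̃ is an initial consistent estimate; components that are large in the initial fit are penalized less, which drives the simultaneous selection and estimation behavior described above.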
Islam, Jyoti; Zhang, Yanqing
2018-05-31
Alzheimer's disease is an incurable, progressive neurological brain disorder. Early detection of Alzheimer's disease can help with proper treatment and prevent brain tissue damage. Several statistical and machine learning models have been exploited by researchers for Alzheimer's disease diagnosis. Analyzing magnetic resonance imaging (MRI) is a common practice for Alzheimer's disease diagnosis in clinical research. Detection of Alzheimer's disease is challenging due to the similarity between Alzheimer's disease MRI data and the MRI data of healthy older people. Recently, advanced deep learning techniques have successfully demonstrated human-level performance in numerous fields including medical image analysis. We propose a deep convolutional neural network for Alzheimer's disease diagnosis using brain MRI data analysis. While most of the existing approaches perform binary classification, our model can identify different stages of Alzheimer's disease and obtains superior performance for early-stage diagnosis. We conducted extensive experiments to demonstrate that our proposed model outperformed comparative baselines on the Open Access Series of Imaging Studies dataset.
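A compact PyTorch sketch of the general idea (multi-stage classification of MRI slices with a small CNN); the layer sizes, four output stages, and 96x96 inputs are illustrative assumptions, not the architecture from the paper:

```python
import torch
import torch.nn as nn

class StageNet(nn.Module):
    """Toy 2D CNN mapping a grayscale MRI slice to one of n_stages classes."""
    def __init__(self, n_stages=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_stages)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = StageNet()
logits = model(torch.randn(8, 1, 96, 96))   # batch of 8 grayscale slices
print(logits.shape)                          # torch.Size([8, 4])
```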
Serang, Oliver
2012-01-01
Linear programming (LP) problems are commonly used in analysis and resource allocation, frequently surfacing as approximations to more difficult problems. Existing approaches to LP have been dominated by a small group of methods, and randomized algorithms have not enjoyed popularity in practice. This paper introduces a novel randomized method of solving LP problems by moving along the facets and within the interior of the polytope along rays randomly sampled from the polyhedral cones defined by the bounding constraints. This conic sampling method is then applied to randomly sampled LPs, and its runtime performance is shown to compare favorably to the simplex and primal affine-scaling algorithms, especially on polytopes with certain characteristics. The conic sampling method is then adapted and applied to solve a certain quadratic program, which computes a projection onto a polytope; the proposed method is shown to outperform the proprietary software Mathematica on large, sparse QP problems constructed from mass spectrometry-based proteomics. PMID:22952741
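To make the geometric idea concrete, here is a toy numpy sketch of randomized ray descent inside a polytope. It is a simplification, not the paper's conic sampling: directions are sampled uniformly rather than from the polyhedral cones of the bounding constraints.

```python
import numpy as np

rng = np.random.default_rng(1)

def ray_descent_lp(c, A, b, x, iters=500):
    """Toy randomized descent for min c@x s.t. A@x <= b, from interior x."""
    for _ in range(iters):
        d = rng.standard_normal(x.size)
        if c @ d > 0:
            d = -d                      # orient the ray downhill in c@x
        Ad = A @ d
        slack = b - A @ x
        pos = Ad > 1e-12
        if not pos.any():
            continue                    # unbounded ray; resample
        t = (slack[pos] / Ad[pos]).min()
        x = x + 0.99 * t * d            # step (almost) to the boundary
    return x

# min x+y on the unit box [0,1]^2, starting from the centre; tends to (0, 0).
A = np.array([[1., 0], [-1, 0], [0, 1], [0, -1]])
b = np.array([1., 0, 1, 0])
print(ray_descent_lp(np.array([1., 1]), A, b, np.array([0.5, 0.5])))
```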
Crawford, Forrest W.; Suchard, Marc A.
2011-01-01
A birth-death process is a continuous-time Markov chain that counts the number of particles in a system over time. In the general process with n current particles, a new particle is born with instantaneous rate λ_n and a particle dies with instantaneous rate μ_n. Currently no robust and efficient method exists to evaluate the finite-time transition probabilities in a general birth-death process with arbitrary birth and death rates. In this paper, we first revisit the theory of continued fractions to obtain expressions for the Laplace transforms of these transition probabilities and make explicit an important derivation connecting transition probabilities and continued fractions. We then develop an efficient algorithm for computing these probabilities that analyzes the error associated with approximations in the method. We demonstrate that this error-controlled method agrees with known solutions and outperforms previous approaches to computing these probabilities. Finally, we apply our novel method to several important problems in ecology, evolution, and genetics. PMID:21984359
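For reference, the Laplace transform of the transition probability P_00(t) admits the classical continued-fraction form this literature builds on (a standard result; the notation here may differ from the paper's):

```latex
\hat{f}_{00}(s) \;=\; \int_0^\infty e^{-st} P_{00}(t)\,dt
\;=\; \cfrac{1}{s+\lambda_0 - \cfrac{\lambda_0\mu_1}{s+\lambda_1+\mu_1 - \cfrac{\lambda_1\mu_2}{s+\lambda_2+\mu_2 - \cdots}}}
```

Truncating the fraction at a depth chosen from an error bound and inverting the transform numerically yields the error-controlled evaluation described above.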
A Hybrid alldifferent-Tabu Search Algorithm for Solving Sudoku Puzzles
Soto, Ricardo; Crawford, Broderick; Galleguillos, Cristian; Paredes, Fernando; Norero, Enrique
2015-01-01
The Sudoku problem is a well-known logic-based puzzle of combinatorial number-placement. It consists in filling an n² × n² grid, composed of n columns, n rows, and n subgrids, each one containing distinct integers from 1 to n². Such a puzzle belongs to the NP-complete collection of problems, to which there exist diverse exact and approximate methods able to solve it. In this paper, we propose a new hybrid algorithm that smartly combines a classic tabu search procedure with the alldifferent global constraint from the constraint programming world. The alldifferent constraint is known to be efficient for domain filtering in the presence of constraints that must be pairwise different, which are exactly the kind of constraints that Sudokus own. This ability clearly alleviates the work of the tabu search, resulting in a faster and more robust approach for solving Sudokus. We illustrate interesting experimental results where our proposed algorithm outperforms the best results previously reported by hybrids and approximate methods. PMID:26078751
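A toy value-elimination filter in the spirit of alldifferent propagation, far weaker than the matching-based filtering used in constraint programming, but enough to show how propagation shrinks the space a tabu search must explore:

```python
def propagate_alldifferent(domains, units):
    """Repeatedly remove decided values from the domains of their peers.

    domains: dict mapping cell -> set of candidate values.
    units: groups of cells whose values must be pairwise different.
    """
    changed = True
    while changed:
        changed = False
        for unit in units:
            fixed = {next(iter(domains[c])) for c in unit if len(domains[c]) == 1}
            for c in unit:
                if len(domains[c]) > 1 and domains[c] & fixed:
                    domains[c] -= fixed
                    changed = True
    return domains

# Three mutually different cells with values in {1, 2, 3}.
doms = {0: {1}, 1: {1, 2}, 2: {1, 2, 3}}
print(propagate_alldifferent(doms, [(0, 1, 2)]))  # {0:{1}, 1:{2}, 2:{3}}
```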
Aligning for Innovation - Alignment Strategy to Drive Innovation
NASA Technical Reports Server (NTRS)
Johnson, Hurel; Teltschik, David; Bussey, Horace, Jr.; Moy, James
2010-01-01
With the sudden need for innovation that will help the country achieve its long-term space exploration objectives, the question of whether NASA is aligned effectively to drive the innovation that it so desperately needs to take space exploration to the next level should be entertained. Authors such as Robert Kaplan and David Norton have noted that companies that use a formal system for implementing strategy consistently outperform their peers. They have outlined a six-stage management systems model for implementing strategy, which includes the aligning of the organization towards its objectives. This involves the alignment of the organization from the top down. This presentation will explore the impacts of existing U.S. industrial policy on technological innovation; assess the current NASA organizational alignment and its impacts on driving technological innovation; and finally suggest an alternative approach that may drive the innovation needed to take the world to the next level of space exploration, with NASA truly leading the way.
Refining Automatically Extracted Knowledge Bases Using Crowdsourcing
Xian, Xuefeng; Cui, Zhiming
2017-01-01
Machine-constructed knowledge bases often contain noisy and inaccurate facts. There exists significant work in developing automated algorithms for knowledge base refinement. Automated approaches improve the quality of knowledge bases but are far from perfect. In this paper, we leverage crowdsourcing to improve the quality of automatically extracted knowledge bases. As human labelling is costly, an important research challenge is how we can use limited human resources to maximize the quality improvement for a knowledge base. To address this problem, we first introduce a concept of semantic constraints that can be used to detect potential errors and perform inference among candidate facts. Then, based on semantic constraints, we propose rank-based and graph-based algorithms for crowdsourced knowledge refining, which judiciously select the most beneficial candidate facts for crowdsourcing and prune unnecessary questions. Our experiments show that our method improves the quality of knowledge bases significantly and outperforms state-of-the-art automatic methods at a reasonable crowdsourcing cost. PMID:28588611
NASA Astrophysics Data System (ADS)
Biazzo, Indaco; Braunstein, Alfredo; Zecchina, Riccardo
2012-08-01
We study the behavior of an algorithm derived from the cavity method for the prize-collecting Steiner tree (PCST) problem on graphs. The algorithm is based on the zero-temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmarks, networks, and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the DHEA solver, a branch-and-cut integer linear programming approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances, both in running time and in quality of the solution. Finally, we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two postprocessing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.
Fuzzy logic-based approach to detecting a passive RFID tag in an outpatient clinic.
Min, Daiki; Yih, Yuehwern
2011-06-01
This study is motivated by observations of the data collected by radio frequency identification (RFID) readers in a pilot study, which was used to investigate the feasibility of implementing an RFID-based monitoring system in an outpatient eye clinic. The raw RFID data collected from RFID readers contain noise and missing reads, which prevent us from determining the tag location. In this paper, fuzzy logic-based algorithms are proposed to interpret the raw RFID data to extract accurate information. The proposed algorithms determine the location of an RFID tag by evaluating its possibility of presence and absence. To evaluate the performance of the proposed algorithms, numerical experiments are conducted using the data observed in the outpatient eye clinic. Experimental results showed that the proposed algorithms outperform the existing static smoothing method in terms of minimizing both false positives and false negatives. Furthermore, the proposed algorithms are applied to a set of simulated data to show their robustness at various levels of RFID reader reliability.
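A minimal sketch of the possibility-evaluation idea: a fuzzy membership maps a windowed read rate to possibilities of presence and absence, and the larger one decides the tag state. The window length and membership break-points below are illustrative choices, not the parameters from the paper.

```python
import numpy as np

def presence_possibility(read_counts, full_read=10):
    """Fuzzy presence/absence evaluation over a sliding window of RFID reads.

    read_counts: number of successful reads of a tag per time slot.
    """
    rate = np.convolve(read_counts, np.ones(5) / 5, mode="same") / full_read
    present = np.clip((rate - 0.1) / (0.4 - 0.1), 0.0, 1.0)   # trapezoid ramp up
    absent = np.clip((0.4 - rate) / (0.4 - 0.1), 0.0, 1.0)    # complementary ramp
    return np.where(present >= absent, "present", "absent")

reads = [0, 1, 0, 0, 6, 8, 9, 7, 8, 0, 1, 0]
print(presence_possibility(reads))
```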
Kohonen, Pekka; Parkkinen, Juuso A.; Willighagen, Egon L.; Ceder, Rebecca; Wennerberg, Krister; Kaski, Samuel; Grafström, Roland C.
2017-01-01
Predicting unanticipated harmful effects of chemicals and drug molecules is a difficult and costly task. Here we utilize a ‘big data compacting and data fusion’ concept to capture diverse adverse outcomes on cellular and organismal levels. The approach generates from a transcriptomics data set a ‘predictive toxicogenomics space’ (PTGS) tool composed of 1,331 genes distributed over 14 overlapping cytotoxicity-related gene space components. Involving ∼2.5 × 10⁸ data points and 1,300 compounds to construct and validate the PTGS, the tool serves to: explain dose-dependent cytotoxicity effects, provide a virtual cytotoxicity probability estimate intrinsic to omics data, predict chemically-induced pathological states in liver resulting from repeated dosing of rats, and furthermore, predict human drug-induced liver injury (DILI) from hepatocyte experiments. Analysing 68 DILI-annotated drugs, the PTGS tool outperforms and complements existing tests, leading to a hitherto unseen level of DILI prediction accuracy. PMID:28671182
Shah, Sohil Atul
2017-01-01
Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
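As published in the robust continuous clustering line of work (reproduced here from memory; consult the paper for exact notation), the objective couples each datum x_i to a representative u_i and shrinks representatives of related points together through a robust penalty ρ (e.g., Geman-McClure) over a mutual k-nearest-neighbour graph E:

```latex
\mathcal{C}(\mathbf{U}) \;=\; \frac{1}{2}\sum_{i=1}^{n} \lVert \mathbf{x}_i - \mathbf{u}_i \rVert_2^2
\;+\; \frac{\lambda}{2} \sum_{(p,q)\in \mathcal{E}} w_{p,q}\, \rho\!\bigl(\lVert \mathbf{u}_p - \mathbf{u}_q \rVert_2\bigr)
```

Clusters are then read off as groups of representatives that have collapsed onto each other, which is what allows heavily mixed clusters to be untangled without tuning per-dataset parameters.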
Wavelet transform approach for fitting financial time series data
NASA Astrophysics Data System (ADS)
Ahmed, Amel Abdoullah; Ismail, Mohd Tahir
2015-10-01
This study investigates a newly developed technique, a combined wavelet filtering and VEC model, to study the dynamic relationships among financial time series. A wavelet filter is used to remove noise from daily data of the US NASDAQ stock market and three stock markets of the Middle East and North Africa (MENA) region, namely Egypt, Jordan, and Istanbul. The data cover the period from 6/29/2001 to 5/5/2009. The returns of the wavelet-filtered series and the original series are then analyzed by a cointegration test and a VEC model. The results show that the cointegration test affirms the existence of cointegration between the studied series, and that there is a long-term relationship between the US stock market and the MENA stock markets. A comparison between the proposed and traditional models demonstrates that the proposed model (DWT with VEC) outperforms the traditional VEC model in fitting the financial stock market series, revealing real information about the relationships among the stock markets.
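A hypothetical sketch of the two-stage idea: wavelet-denoise each series, then fit a VEC model to the filtered data. The wavelet choice (db4), soft universal threshold, lag order, and cointegration rank are illustrative assumptions, and synthetic series stand in for the market data.

```python
import numpy as np
import pandas as pd
import pywt
from statsmodels.tsa.vector_ar.vecm import VECM

def wavelet_denoise(x, wavelet="db4", level=3):
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise scale estimate
    thr = sigma * np.sqrt(2 * np.log(len(x)))             # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]

rng = np.random.default_rng(0)
common = np.cumsum(rng.normal(size=600))                  # shared stochastic trend
prices = pd.DataFrame({
    "nasdaq": common + rng.normal(scale=0.5, size=600),
    "egypt": 0.8 * common + rng.normal(scale=0.5, size=600),
})
filtered = prices.apply(lambda col: wavelet_denoise(col.to_numpy()))
res = VECM(filtered, k_ar_diff=2, coint_rank=1).fit()
print(res.alpha)                                          # adjustment coefficients
```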
A Study on the Secure User Profiling Structure and Procedure for Home Healthcare Systems.
Ko, Hoon; Song, MoonBae
2016-01-01
Despite various benefits such as convenience and efficiency, home healthcare systems have some inherent security risks that may cause a serious leak of personal health information. This work presents a Secure User Profiling Structure, which holds patient information, including health information. A patient and a hospital keep it at the same time and share the updated data. While they share the data and communicate, the data can be leaked. To solve the security problems, a secure communication channel with a hash function and a One-Time Password (OTP) between a client and a hospital should be established, and to generate an input value to the OTP, a dual hash function is used. This work presents a dual hash function-based approach to generate the One-Time Password, ensuring a secure communication channel with the secured key. As a result, attackers are unable to decrypt the leaked information because of the secured key; in addition, the proposed method outperforms the existing methods in terms of computation cost.
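A minimal sketch of a dual-hash OTP: two chained hash functions derive the one-time value from a shared secret and a time counter. The choice of SHA-1/SHA-256, the 30-second step, and six digits are assumptions for illustration, not the scheme specified in the paper.

```python
import hashlib
import hmac
import time

def dual_hash_otp(shared_key: bytes, step: int = 30, digits: int = 6) -> str:
    counter = str(int(time.time()) // step).encode()
    inner = hashlib.sha1(shared_key + counter).digest()           # first hash
    outer = hmac.new(shared_key, inner, hashlib.sha256).digest()  # second hash
    code = int.from_bytes(outer[:4], "big") % 10 ** digits
    return f"{code:0{digits}d}"

print(dual_hash_otp(b"patient-hospital-shared-secret"))
```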
Thermal transport in binary colloidal glasses: Composition dependence and percolation assessment
NASA Astrophysics Data System (ADS)
Ruckdeschel, Pia; Philipp, Alexandra; Kopera, Bernd A. F.; Bitterlich, Flora; Dulle, Martin; Pech-May, Nelson W.; Retsch, Markus
2018-02-01
The combination of various types of materials is often used to create superior composites that outperform the pure-phase components. For any rational design, the thermal conductivity of the composite as a function of the volume fraction of the filler component needs to be known. When approaching the nanoscale, the homogeneous mixture of various components poses an additional challenge. Here, we investigate binary nanocomposite materials based on polymer latex beads and hollow silica nanoparticles. These form randomly mixed colloidal glasses on a sub-μm scale. We focus on the heat transport properties through such binary assembly structures. The thermal conductivity can be well described by effective medium theory. However, film formation of the soft polymer component leads to phase segregation and a mismatch with existing mixing models. We confirm our experimental data by finite element modeling. This additionally allowed us to assess the onset of thermal transport percolation in such random particulate structures. Our study contributes to a better understanding of thermal transport through heterostructured particulate assemblies.
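For context, a standard effective-medium relation for a binary mixture with spherical grains is the symmetric Bruggeman equation (quoted here as background; it is not necessarily the specific mixing model evaluated in the paper):

```latex
\phi\,\frac{k_1 - k_{\mathrm{eff}}}{k_1 + 2k_{\mathrm{eff}}}
\;+\;(1-\phi)\,\frac{k_2 - k_{\mathrm{eff}}}{k_2 + 2k_{\mathrm{eff}}} \;=\; 0
```

where φ is the volume fraction of component 1 and k_1, k_2, k_eff are the component and effective thermal conductivities; solving for k_eff as a function of φ gives the composition-dependence curve against which measured data can be compared.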
An auto-adaptive optimization approach for targeting nonpoint source pollution control practices.
Chen, Lei; Wei, Guoyuan; Shen, Zhenyao
2015-10-21
To address the computationally intensive and technically complex problem of nonpoint source pollution control, the traditional genetic algorithm was modified into an auto-adaptive pattern, and a new framework was proposed by integrating this new algorithm with a watershed model and an economic module. Although conceptually simple and comprehensive, the proposed algorithm searches automatically for Pareto-optimal solutions without a complex calibration of optimization parameters. The model was applied in a case study in a typical watershed of the Three Gorges Reservoir area, China. The results indicated that the evolutionary process of optimization was improved by the incorporation of auto-adaptive parameters. In addition, the proposed algorithm outperformed state-of-the-art existing algorithms in terms of convergence ability and computational efficiency. At the same cost level, solutions with greater pollutant reductions could be identified. From a scientific viewpoint, the proposed algorithm could be extended to other watersheds to provide cost-effective configurations of BMPs.
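A toy auto-adaptive GA sketch in which the mutation rate adjusts itself from population diversity, so that parameter does not need manual calibration. The diversity rule and rate bounds are illustrative assumptions, not the operators from the paper, whose fitness function couples a watershed model and an economic module.

```python
import numpy as np

rng = np.random.default_rng(2)

def adaptive_ga(fitness, dim, pop_size=40, gens=200, bounds=(0.0, 1.0)):
    """Minimize fitness over [lo, hi]^dim with a self-adapting mutation rate."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (pop_size, dim))
    for _ in range(gens):
        div = pop.std(axis=0).mean() / (hi - lo)        # normalized diversity
        p_mut = np.clip(0.3 * (1.0 - div), 0.01, 0.3)   # low diversity -> mutate more
        f = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(f)[: pop_size // 2]]   # keep the best half
        kids = parents[rng.permutation(len(parents))].copy()
        alpha = rng.random((len(kids), dim))
        kids = alpha * kids + (1 - alpha) * parents     # blend crossover
        mask = rng.random(kids.shape) < p_mut
        kids[mask] += rng.normal(0, 0.1 * (hi - lo), mask.sum())
        pop = np.clip(np.vstack([parents, kids]), lo, hi)
    return pop[np.argmin([fitness(ind) for ind in pop])]

print(adaptive_ga(lambda x: ((x - 0.7) ** 2).sum(), dim=3))  # ~[0.7 0.7 0.7]
```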
A Hybrid alldifferent-Tabu Search Algorithm for Solving Sudoku Puzzles.
Soto, Ricardo; Crawford, Broderick; Galleguillos, Cristian; Paredes, Fernando; Norero, Enrique
2015-01-01
The Sudoku problem is a well-known logic-based puzzle of combinatorial number-placement. It consists in filling a n(2) × n(2) grid, composed of n columns, n rows, and n subgrids, each one containing distinct integers from 1 to n(2). Such a puzzle belongs to the NP-complete collection of problems, to which there exist diverse exact and approximate methods able to solve it. In this paper, we propose a new hybrid algorithm that smartly combines a classic tabu search procedure with the alldifferent global constraint from the constraint programming world. The alldifferent constraint is known to be efficient for domain filtering in the presence of constraints that must be pairwise different, which are exactly the kind of constraints that Sudokus own. This ability clearly alleviates the work of the tabu search, resulting in a faster and more robust approach for solving Sudokus. We illustrate interesting experimental results where our proposed algorithm outperforms the best results previously reported by hybrids and approximate methods.
A Robust Statistics Approach to Minimum Variance Portfolio Optimization
NASA Astrophysics Data System (ADS)
Yang, Liusha; Couillet, Romain; McKay, Matthew R.
2015-12-01
We study the design of portfolios under a minimum risk criterion. The performance of the optimized portfolio relies on the accuracy of the estimated covariance matrix of the portfolio asset returns. For large portfolios, the number of available market returns is often of similar order to the number of assets, so that the sample covariance matrix performs poorly as a covariance estimator. Additionally, financial market data often contain outliers which, if not correctly handled, may further corrupt the covariance estimation. We address these shortcomings by studying the performance of a hybrid covariance matrix estimator based on Tyler's robust M-estimator and on Ledoit-Wolf's shrinkage estimator while assuming samples with heavy-tailed distribution. Employing recent results from random matrix theory, we develop a consistent estimator of (a scaled version of) the realized portfolio risk, which is minimized by optimizing online the shrinkage intensity. Our portfolio optimization method is shown via simulations to outperform existing methods both for synthetic and real market data.
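A numpy sketch of the hybrid estimator and the resulting minimum-variance weights. The fixed-point form and a fixed shrinkage ρ are standard ingredients stated from memory; the paper instead tunes the shrinkage intensity online using random matrix theory.

```python
import numpy as np

def regularized_tyler(X, rho=0.3, iters=50):
    """Shrinkage-regularized Tyler M-estimator of scatter for returns X (n x p)."""
    n, p = X.shape
    C = np.eye(p)
    for _ in range(iters):
        q = np.einsum("ij,jk,ik->i", X, np.linalg.inv(C), X)   # x_i' C^-1 x_i
        C = (1 - rho) * (p / n) * (X.T / q) @ X + rho * np.eye(p)
        C *= p / np.trace(C)                                   # fix the scale
    return C

def min_variance_weights(C):
    w = np.linalg.solve(C, np.ones(C.shape[0]))
    return w / w.sum()

rng = np.random.default_rng(3)
returns = rng.standard_t(df=3, size=(120, 10))   # heavy-tailed synthetic returns
w = min_variance_weights(regularized_tyler(returns))
print(w.round(3), w.sum())
```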
Static and Dynamic Frequency Scaling on Multicore CPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bao, Wenlei; Hong, Changwan; Chunduri, Sudheer
2016-12-28
Dynamic voltage and frequency scaling (DVFS) adapts CPU power consumption by modifying a processor’s operating frequency (and the associated voltage). Typical approaches employing DVFS involve default strategies such as running at the lowest or the highest frequency, or observing the CPU’s runtime behavior and dynamically adapting the voltage/frequency configuration based on CPU usage. In this paper, we argue that many previous approaches suffer from inherent limitations, such as not accounting for the processor-specific impact of frequency changes on energy for different workload types. We first propose a lightweight runtime-based approach to automatically adapt the frequency based on the CPU workload that is agnostic of the processor characteristics. We then show that further improvements can be achieved for affine kernels in the application, using a compile-time characterization instead of run-time monitoring to select the frequency and number of CPU cores to use. Our framework relies on a one-time energy characterization of CPU-specific DVFS profiles followed by a compile-time categorization of loop-based code segments in the application. These are combined to determine a priori the frequency and the number of cores to use to execute the application so as to optimize energy or energy-delay product, outperforming the runtime approach. Extensive evaluation on 60 benchmarks and five multi-core CPUs shows that our approach systematically outperforms the powersave Linux governor, while improving overall performance.
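The a priori selection step reduces to a table lookup once the one-time characterization exists. A toy sketch with invented profile numbers (not measurements from the paper):

```python
# (frequency_GHz, cores) -> (seconds, joules) from a one-time characterization.
profile = {
    (1.2, 4): (8.0, 96.0),
    (2.0, 4): (5.0, 110.0),
    (2.0, 8): (3.0, 126.0),
    (2.8, 8): (2.4, 160.0),
}

def pick_config(table, metric="edp"):
    """Pick the (frequency, cores) pair minimizing energy or energy-delay product."""
    def cost(item):
        t, e = item[1]
        return e * t if metric == "edp" else e
    return min(table.items(), key=cost)[0]

print(pick_config(profile))            # minimize energy-delay product
print(pick_config(profile, "energy"))  # minimize energy alone
```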
Storyline Visualization: A Compelling Way to Understand Patterns over Time and Space
DOE Office of Scientific and Technical Information (OSTI.GOV)
2017-10-16
Storyline visualization is a compelling way to understand patterns over time and space. Much effort has been spent developing efficient and aesthetically pleasing layout optimization algorithms. But what if those algorithms are optimizing the wrong things? To answer this question, we conducted a design study with different storyline layout algorithms. We found that layouts following our new design principles for storyline visualization outperform existing methods.
Algorithmic Management for Improving Collective Productivity in Crowdsourcing.
Yu, Han; Miao, Chunyan; Chen, Yiqiang; Fauvel, Simon; Li, Xiaoming; Lesser, Victor R
2017-10-02
Crowdsourcing systems are complex not only because of the huge number of potential strategies for assigning workers to tasks, but also due to the dynamic characteristics associated with workers. Maximizing social welfare in such situations is known to be NP-hard. To address these fundamental challenges, we propose the surprise-minimization-value-maximization (SMVM) approach. By analysing typical crowdsourcing system dynamics, we established a simple and novel worker desirability index (WDI) that jointly considers the effect of each worker's reputation, workload and motivation to work on collective productivity. Through evaluating workers' WDI values, SMVM guides individual workers in real time toward courses of action that can benefit the workers and lead to high collective productivity. Solutions can be produced in polynomial time and are proven to be asymptotically bounded by a theoretical optimal solution. High-resolution simulations based on a real-world dataset demonstrate that SMVM significantly outperforms state-of-the-art approaches. A large-scale 3-year empirical study involving 1,144 participants in over 9,000 sessions shows that SMVM outperforms human task delegation decisions over 80% of the time under common workload conditions. The approach and results can help engineer highly scalable data-driven algorithmic management decision support systems for crowdsourcing.
Crichton, Gamal; Guo, Yufan; Pyysalo, Sampo; Korhonen, Anna
2018-05-21
Link prediction in biomedical graphs has several important applications, including Drug-Target Interaction (DTI) prediction, Protein-Protein Interaction (PPI) prediction and Literature-Based Discovery (LBD). It can be done using a classifier to output the probability of link formation between nodes. Recently, several works have used neural networks to create node representations, which allow rich inputs to neural classifiers. Preliminary work on this reported promising results; however, it did not use realistic settings such as time-slicing, evaluate performance with comprehensive metrics, or explain when or why neural network methods outperform. We investigated how inputs from four node representation algorithms affect the performance of a neural link predictor on random- and time-sliced biomedical graphs of real-world sizes (∼6 million edges) containing information relevant to DTI, PPI and LBD. We compared the performance of the neural link predictor to that of established baselines and report performance across five metrics. In random- and time-sliced experiments, when the neural network methods were able to learn good node representations and there was a negligible number of disconnected nodes, those approaches outperformed the baselines. In the smallest graph (∼15,000 edges) and in larger graphs with approximately 14% disconnected nodes, baselines such as Common Neighbours proved a justifiable choice for link prediction. At low recall levels (∼0.3) the approaches were mostly equal, but at higher recall levels, both across all nodes and in average performance at individual nodes, neural network approaches were superior. Analysis showed that neural network methods performed well on links between nodes with no previous common neighbours, potentially the most interesting links. Additionally, while neural network methods benefit from large amounts of data, they require considerable computational resources to utilise them. Our results indicate that when there is enough data for the neural network methods to use and there is a negligible number of disconnected nodes, those approaches outperform the baselines. At low recall levels the approaches are mostly equal, but at higher recall levels and in average performance at individual nodes, neural network approaches are superior. Performance at nodes without common neighbours, which indicate more unexpected and perhaps more useful links, accounts for this.
A threshold-based fixed predictor for JPEG-LS image compression
NASA Astrophysics Data System (ADS)
Deng, Lihua; Huang, Zhenghua; Yao, Shoukui
2018-03-01
In JPEG-LS, the fixed predictor based on the median edge detector (MED) detects only horizontal and vertical edges, and thus produces large prediction errors in the vicinity of diagonal edges. In this paper, we propose a threshold-based edge detection scheme for the fixed predictor. The proposed scheme can detect not only horizontal and vertical edges, but also diagonal edges. For certain thresholds, the proposed scheme reduces to other existing schemes, so it can also be regarded as an integration of these schemes. For a suitable threshold, the accuracy of horizontal and vertical edge detection is higher than that of the existing median edge detector in JPEG-LS. Thus, the proposed fixed predictor outperforms the existing JPEG-LS predictors for all images tested, while the complexity of the overall algorithm is maintained at a similar level.
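The baseline MED predictor is standard and easily stated in code; the thresholded variant below is a hypothetical illustration in the spirit of the paper, not the authors' scheme (the rule and t = 8 are invented):

```python
def med_predict(a, b, c):
    """Standard JPEG-LS (LOCO-I) MED fixed predictor.

    a = left neighbour, b = above neighbour, c = upper-left neighbour.
    """
    if c >= max(a, b):
        return min(a, b)   # vertical/horizontal edge detected
    if c <= min(a, b):
        return max(a, b)
    return a + b - c       # smooth region: planar prediction

def thresholded_predict(a, b, c, t=8):
    """Hypothetical threshold-based variant: treat small |a - b| gradients in
    non-edge MED cases as (possibly diagonal) edges and average instead."""
    if abs(a - b) <= t and not (c >= max(a, b) or c <= min(a, b)):
        return (a + b) // 2
    return med_predict(a, b, c)

print(med_predict(100, 120, 90), thresholded_predict(100, 120, 90))
```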
Schall, Marina; Martiny, Sarah E; Goetz, Thomas; Hall, Nathan C
2016-05-01
Although expressing positive emotions is typically socially rewarded, in the present work, we predicted that people suppress positive emotions and thereby experience social benefits when outperformed others are present. We tested our predictions in three experimental studies with high school students. In Studies 1 and 2, we manipulated the type of social situation (outperformance vs. non-outperformance) and assessed suppression of positive emotions. In both studies, individuals reported suppressing positive emotions more in outperformance situations than in non-outperformance situations. In Study 3, we manipulated the social situation (outperformance vs. non-outperformance) as well as the videotaped person's expression of positive emotions (suppression vs. expression). The findings showed that when outperforming others, individuals were indeed evaluated more positively when they suppressed rather than expressed their positive emotions, demonstrating the importance of the specific social situation with respect to the effects of suppression. © 2016 by the Society for Personality and Social Psychology, Inc.
Lu, Chao; Chelikani, Sudhakar; Jaffray, David A.; Milosevic, Michael F.; Staib, Lawrence H.; Duncan, James S.
2013-01-01
External beam radiation therapy (EBRT) for the treatment of cancer enables accurate placement of radiation dose on the cancerous region. However, the deformation of soft tissue during the course of treatment, such as in cervical cancer, presents significant challenges for the delineation of the target volume and other structures of interest. Furthermore, the presence and regression of pathologies such as tumors may violate registration constraints and cause registration errors. In this paper, automatic segmentation, nonrigid registration and tumor detection in cervical magnetic resonance (MR) data are addressed simultaneously using a unified Bayesian framework. The proposed novel method can generate a tumor probability map while progressively identifying the boundary of an organ of interest based on the achieved nonrigid transformation. The method is able to handle the challenges of significant tumor regression and its effect on surrounding tissues. The new method was compared to various currently existing algorithms on a set of 36 MR images from six patients, each patient having six T2-weighted cervical MR images. The results show that the proposed approach achieves an accuracy comparable to manual segmentation and significantly outperforms the existing registration algorithms. In addition, the tumor detection result generated by the proposed method is in high agreement with manual delineation by a qualified clinician. PMID:22328178
Yousefi, Siavash; Qin, Jia; Zhi, Zhongwei; Wang, Ruikang K
2013-02-01
Optical microangiography is an imaging technology that is capable of providing detailed functional blood flow maps within microcirculatory tissue beds in vivo. Some practical issues however exist when displaying and quantifying the microcirculation that perfuses the scanned tissue volume. These issues include: (I) Probing light is subject to specular reflection when it shines onto the sample. The unevenness of the tissue surface makes the light energy entering the tissue non-uniform over the entire scanned tissue volume. (II) The biological tissue is heterogeneous in nature, meaning that the scattering and absorption properties of the tissue attenuate the probe beam. These physical limitations can result in local contrast degradation and non-uniform micro-angiogram images. In this paper, we propose a post-processing method that uses Rayleigh contrast-limited adaptive histogram equalization to increase the contrast and improve the overall appearance and uniformity of optical micro-angiograms without saturating the vessel intensity or changing the physical meaning of the micro-angiograms. The qualitative and quantitative performance of the proposed method is compared with that of common histogram equalization and contrast enhancement methods. We demonstrate that the proposed method outperforms other existing approaches. The proposed method is not limited to optical microangiography and can be used in other imaging modalities such as photoacoustic tomography and scanning laser confocal microscopy.
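A rough sketch of the enhancement step. skimage's CLAHE targets a uniform histogram, so a Rayleigh-shaped output (as named in the paper) is approximated here by mapping the equalized intensities through the Rayleigh inverse CDF; sigma, clip_limit, and the stand-in test image are illustrative choices, not the paper's parameters.

```python
import numpy as np
from skimage import data, exposure

img = data.camera() / 255.0                      # stand-in for an angiogram
eq = exposure.equalize_adapthist(img, clip_limit=0.02)   # CLAHE, uniform target

u = exposure.rescale_intensity(eq, out_range=(0.0, 0.999))
sigma = 0.35
rayleigh = sigma * np.sqrt(-2.0 * np.log(1.0 - u))       # Rayleigh inverse CDF
rayleigh = exposure.rescale_intensity(rayleigh, out_range=(0.0, 1.0))
print(rayleigh.min(), rayleigh.max())
```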
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-01-01
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Unlike existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of the base kernels are regarded as external parameters of the single-hidden-layer feedforward neural networks (SLFNs). The combination coefficients of the base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radial basis function neural network (RBFNN), and probabilistic neural network (PNN). The results demonstrate that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification. PMID:28629202
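As an illustration of the composite-kernel KELM core (without the QPSO search, which the paper uses to tune the kernel weights and parameters), here is a minimal sketch with fixed illustrative weights and synthetic data standing in for e-nose features.

```python
# KELM with a weighted composite kernel: the closed-form readout solves
# (I/C + K) beta = T, where K is the weighted sum of base kernel matrices.
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def polynomial_kernel(X, Y, degree=2, c=1.0):
    return (X @ Y.T + c) ** degree

def composite_kernel(X, Y, weights=(0.6, 0.4)):
    # In QWMK-ELM these weights are external parameters optimized by QPSO;
    # here they are fixed for illustration.
    return weights[0] * gaussian_kernel(X, Y) + weights[1] * polynomial_kernel(X, Y)

def kelm_train(X, T, C=100.0):
    K = composite_kernel(X, X)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)  # output weights beta

def kelm_predict(X_train, beta, X_test):
    return composite_kernel(X_test, X_train) @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))            # stand-in e-nose feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[y]                        # one-hot class targets
beta = kelm_train(X, T)
pred = kelm_predict(X, beta, X).argmax(1)
print("train accuracy:", (pred == y).mean())
```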
Novel approach for dam break flow modeling using computational intelligence
NASA Astrophysics Data System (ADS)
Seyedashraf, Omid; Mehrabi, Mohammad; Akhtari, Ali Akbar
2018-04-01
A new methodology based on computational intelligence (CI) is proposed and tested for modeling the classic 1D dam-break flow problem. The motivation for seeking a new solution lies in the shortcomings of the existing analytical and numerical models, including the difficulty of using the exact solutions and the unwanted fluctuations that arise in the numerical results. In this research, the application of radial-basis-function (RBF) and multi-layer-perceptron (MLP) systems is detailed for the solution of twenty-nine dam-break scenarios. The models are developed using seven variables, i.e. the length of the channel, the depths of the up- and downstream sections, time, and distance, as the inputs. Moreover, the depths and velocities of each computational node in the flow domain are considered as the model outputs. The models are validated against the analytical solution and the Lax-Wendroff and MacCormack FDM schemes. The findings indicate that the employed CI models are able to replicate the overall shape of the shock and rarefaction waves. Furthermore, the MLP system outperforms RBF and the tested numerical schemes. A new monolithic equation is proposed based on the best-fitting model, which can be used as an efficient alternative to the existing piecewise analytic equations.
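A minimal sketch of the surrogate-modeling idea with scikit-learn's MLPRegressor follows. The abstract names five of the seven inputs (channel length, up- and downstream depths, time, distance); the data, the two remaining placeholder inputs, and the depth relation below are synthetic stand-ins, not the paper's scenarios.

```python
# Train an MLP to map dam-break scenario descriptors to flow depth at a node.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 2000
# Seven inputs: length, upstream depth, downstream depth, time, distance,
# plus two placeholder descriptors to reach the seven named in the abstract.
X = rng.uniform(size=(n, 7))
# Synthetic depth response with a decaying-wave flavor, plus noise.
depth = X[:, 1] * np.exp(-X[:, 3]) + 0.1 * X[:, 2] + 0.05 * rng.normal(size=n)

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(X, depth)
print("R^2 on training data:", model.score(X, depth))
```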
Growing a Waldorf-Inspired Approach in a Public School District
ERIC Educational Resources Information Center
Friedlaender, Diane; Beckham, Kyle; Zheng, Xinhua; Darling-Hammond, Linda
2015-01-01
This report documents the practices and outcomes of Alice Birney, a public K-8 Waldorf-Inspired School in Sacramento City Unified School District (SCUSD). This study highlights how such a school addresses students' academic, social, emotional, physical, and creative development. Birney students outperform similar students in SCUSD on several…
Combining Pear Ester with Codlemone Improves Management of Codling Moth
USDA-ARS?s Scientific Manuscript database
Several management approaches utilizing pear ester combined with codlemone have been developed in the first 10 years after the discovery of this ripe pear fruit volatile’s kairomonal activity for larvae and both sexes of codling moth. These include a lure that consistently outperforms other high loa...
Li, Jian-Long; Wang, Peng; Fung, Wing Kam; Zhou, Ji-Yuan
2017-10-16
For dichotomous traits, the generalized disequilibrium test with the moment estimate of the variance (GDT-ME) is a powerful family-based association method. Genomic imprinting is an important epigenetic phenomenon, and there has been increasing interest in incorporating imprinting effects to improve the power of association analysis. However, GDT-ME does not take imprinting effects into account, and it has not been investigated whether it can be used for association analysis when such effects indeed exist. In this article, based on a novel decomposition of the genotype score according to the paternal or maternal source of the allele, we propose the generalized disequilibrium test with imprinting (GDTI) for complete pedigrees without any missing genotypes. We then extend GDTI and GDT-ME to accommodate incomplete pedigrees in which some pedigrees have missing genotypes, using a Monte Carlo (MC) sampling and estimation scheme to infer missing genotypes from the available genotypes in each pedigree; the extensions are denoted MCGDTI and MCGDT-ME, respectively. The proposed GDTI and MCGDTI methods evaluate the differences of the paternal as well as maternal allele scores for all discordant relative pairs in a pedigree, including beyond first-degree relative pairs. Advantages of the proposed GDTI and MCGDTI test statistics over existing methods are demonstrated by simulation studies under various settings and by application to a rheumatoid arthritis dataset. Simulation results show that the proposed tests control the size well under the null hypothesis of no association and outperform the existing methods under various imprinting effect models. The existing GDT-ME and the proposed MCGDT-ME can be used to test for association even when imprinting effects exist. In the application to the rheumatoid arthritis data, MCGDTI identifies more loci statistically significantly associated with the disease than the existing methods. Under complete and incomplete imprinting effect models, our proposed GDTI and MCGDTI methods, by exploiting information on imprinting effects and all discordant relative pairs within each pedigree, outperform all the existing test statistics, and MCGDTI can recapture much of the missing information. MCGDTI is therefore recommended in practice.
Saludes-Rodil, Sergio; Baeyens, Enrique; Rodríguez-Juan, Carlos P
2015-04-29
An unsupervised approach to classify surface defects in wire rod manufacturing is developed in this paper. The defects are extracted from an eddy current signal and classified using a clustering technique that uses the dynamic time warping distance as the dissimilarity measure. The new approach has been successfully tested using industrial data. It is shown that it outperforms other classification alternatives, such as the modified Fourier descriptors.
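A minimal sketch of the dynamic time warping distance used as the dissimilarity measure, implemented as the standard O(nm) dynamic program (the clustering step itself and any windowing constraints are omitted):

```python
# DTW distance between two 1-D defect signatures of possibly different length.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

s1 = np.sin(np.linspace(0, 3, 50))
s2 = np.sin(np.linspace(0, 3, 65))   # same shape, different length
s3 = np.cos(np.linspace(0, 3, 50))   # different shape
print(dtw_distance(s1, s2), dtw_distance(s1, s3))  # s1 is far closer to s2
```

A pairwise DTW matrix computed this way can be fed to any distance-based clusterer, for example agglomerative hierarchical clustering.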
Overlapping communities from dense disjoint and high total degree clusters
NASA Astrophysics Data System (ADS)
Zhang, Hongli; Gao, Yang; Zhang, Yue
2018-04-01
Communities play an important role in sociology, biology and especially computer science, where systems are often represented as networks, and community detection is therefore of great importance in these domains. A community is a dense subgraph of the whole graph with more links between its members than from its members to outside nodes, and nodes in the same community are likely to share common properties or play similar roles in the graph. Communities overlap when nodes in a graph belong to multiple communities. A wide variety of overlapping community detection methods have been proposed in the literature, and local expansion is one of the most successful techniques for dealing with large networks. This paper presents a density-based seeding method in which dense disjoint local clusters are searched and selected as seeds; a seed is selected by the total degree and density of local clusters, using only local structures of the network. Furthermore, the paper proposes a novel community refining phase that minimizes the conductance of each community, through which the quality of the identified communities is largely improved in linear time. Experimental results on synthetic networks show that the proposed seeding method outperforms other state-of-the-art seeding methods and that the proposed refining method largely enhances the quality of the identified communities. Experimental results on real graphs with ground-truth communities show that the proposed approach outperforms other state-of-the-art overlapping community detection algorithms; in particular, it is more than two orders of magnitude faster than existing global algorithms while achieving higher quality, and it obtains much more accurate community structure than current local algorithms without any a priori information.
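For concreteness, here is a minimal sketch of the conductance measure that the refining phase minimizes: the ratio of boundary edges to the smaller of the two cut volumes. This is the generic definition evaluated on a NetworkX graph, not the authors' full refining procedure.

```python
# Conductance of a node set S: cut(S, rest) / min(vol(S), vol(rest)).
import networkx as nx

def conductance(G, community):
    S = set(community)
    cut = sum(1 for u, v in G.edges() if (u in S) != (v in S))  # boundary edges
    vol_S = sum(d for _, d in G.degree(S))                      # sum of degrees in S
    vol_rest = 2 * G.number_of_edges() - vol_S
    return cut / max(min(vol_S, vol_rest), 1)

G = nx.karate_club_graph()
club = [n for n, d in G.nodes(data=True) if d["club"] == "Mr. Hi"]
print("conductance of Mr. Hi's faction:", round(conductance(G, club), 3))
```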
Automatic identification of high impact articles in PubMed to support clinical decision making.
Bian, Jiantao; Morid, Mohammad Amin; Jonnalagadda, Siddhartha; Luo, Gang; Del Fiol, Guilherme
2017-09-01
The practice of evidence-based medicine involves integrating the latest best available evidence into patient care decisions. Yet, critical barriers exist for clinicians' retrieval of evidence that is relevant for a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that find clinical studies with high clinical impact from PubMed®. Our machine learning algorithms use a variety of features including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 clinical evidence-based guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high impact classifier outperforms a state-of-the-art classifier based on citation metadata and citation terms, and PubMed's® relevance sort algorithm; and (2) the performance of our high impact classifier does not decrease significantly after removing proprietary features such as citation count. The mean top 20 precision of our high impact classifier was 34% versus 11% for the state-of-the-art classifier and 4% for PubMed's® relevance sort (p=0.009); and the performance of our high impact classifier did not decrease significantly after removing proprietary features (mean top 20 precision=34% vs. 36%; p=0.085). The high impact classifier, using features such as bibliometrics, social media attention and MEDLINE® metadata, outperformed previous approaches and is a promising alternative to identifying high impact studies for clinical decision support. Copyright © 2017 Elsevier Inc. All rights reserved.
Moradi, Milad; Ghadiri, Nasser
2018-01-01
Automatic text summarization tools help users in the biomedical domain acquire their intended information from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection on the frequency of concepts extracted from the input text. However, exploring measures other than raw frequency for identifying valuable content within an input document, or considering the correlations existing between concepts, may be more useful for this type of summarization. In this paper, we describe a Bayesian summarization method for biomedical text documents. The Bayesian summarizer initially maps the input text to Unified Medical Language System (UMLS) concepts; it then selects the important ones to be used as classification features. We introduce six different feature selection approaches to identify the most important concepts of the text and select the most informative contents according to the distribution of these concepts. We show that with an appropriate feature selection approach, the Bayesian summarizer can improve the performance of biomedical summarization. Using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit, we perform extensive evaluations on a corpus of scientific papers in the biomedical domain. The results show that when the Bayesian summarizer utilizes feature selection methods that do not use raw frequency, it can outperform biomedical summarizers that rely on the frequency of concepts, as well as domain-independent and baseline methods. Copyright © 2017 Elsevier B.V. All rights reserved.
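A minimal sketch of the frequency-independent weighting idea follows: tokens stand in for UMLS concepts (a real system would map text through a tool such as MetaMap), and concepts are weighted by an inverse-sentence-frequency measure instead of raw counts. This illustrates the general principle only, not the paper's six feature selection approaches.

```python
# Score sentences by the weight of the "concepts" they contain, where the
# weight rewards concepts shared across a few sentences over ubiquitous ones.
import math
from collections import Counter

sentences = [
    "insulin regulates glucose uptake in muscle",
    "glucose levels rise after meals",
    "insulin resistance impairs glucose uptake",
    "patients were recruited from two clinics",
]
tokens = [s.split() for s in sentences]
df = Counter(w for toks in tokens for w in set(toks))  # sentence frequency

def concept_weight(w):
    # Inverse-sentence-frequency style weight, one alternative to raw frequency.
    return math.log(len(sentences) / df[w]) if df[w] > 1 else 0.0

scores = [sum(concept_weight(w) for w in set(t)) for t in tokens]
best = max(range(len(sentences)), key=scores.__getitem__)
print("top sentence:", sentences[best])
```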
Paper-Based Quantification of Male Fertility Potential.
Nosrati, Reza; Gong, Max M; San Gabriel, Maria C; Pedraza, Claudio E; Zini, Armand; Sinton, David
2016-03-01
More than 70 million couples worldwide are affected by infertility, with male-factor infertility accounting for about half of the cases. Semen analysis is critical for determining male fertility potential, but conventional testing is costly and complex. Here, we demonstrate a paper-based microfluidic approach to quantify male fertility potential, simultaneously measuring 3 critical semen parameters in 10 min: live and motile sperm concentrations and sperm motility. The device measures the colorimetric change of yellow tetrazolium dye to purple formazan by the diaphorase flavoprotein enzyme present in metabolically active human sperm to quantify live and motile sperm concentration. Sperm motility was determined as the ratio of motile to live sperm. We assessed the performance of the device by use of clinical semen samples, in parallel with standard clinical approaches. Detection limits of 8.46 and 15.18 million/mL were achieved for live and motile sperm concentrations, respectively. The live and motile sperm concentrations and motility values from our device correlated with those of the standard clinical approaches (R(2) ≥ 0.84). In all cases, our device provided 100% agreement in terms of clinical outcome. The device was also robust and could tolerate conditions of high absolute humidity (22.8 g/m(3)) up to 16 weeks when packaged with desiccant. Our device outperforms existing commercial paper-based assays by quantitatively measuring live and motile sperm concentrations and motility, in only 10 min. This approach is applicable to current clinical practices as well as self-diagnostic applications. © 2015 American Association for Clinical Chemistry.
Meta-Heuristics in Short Scale Construction: Ant Colony Optimization and Genetic Algorithm.
Schroeders, Ulrich; Wilhelm, Oliver; Olaru, Gabriel
2016-01-01
The advent of large-scale assessment, but also the more frequent use of longitudinal and multivariate approaches to measurement in psychological, educational, and sociological research, caused an increased demand for psychometrically sound short scales. Shortening scales economizes on valuable administration time, but might result in inadequate measures because reducing an item set could: a) change the internal structure of the measure, b) result in poorer reliability and measurement precision, c) deliver measures that cannot effectively discriminate between persons on the intended ability spectrum, and d) reduce test-criterion relations. Different approaches to abbreviate measures fare differently with respect to the above-mentioned problems. Therefore, we compare the quality and efficiency of three item selection strategies to derive short scales from an existing long version: a Stepwise COnfirmatory Factor Analytical approach (SCOFA) that maximizes factor loadings and two metaheuristics, specifically an Ant Colony Optimization (ACO) with a tailored user-defined optimization function and a Genetic Algorithm (GA) with an unspecific cost-reduction function. SCOFA compiled short versions were highly reliable, but had poor validity. In contrast, both metaheuristics outperformed SCOFA and produced efficient and psychometrically sound short versions (unidimensional, reliable, sensitive, and valid). We discuss under which circumstances ACO and GA produce equivalent results and provide recommendations for conditions in which it is advisable to use a metaheuristic with an unspecific out-of-the-box optimization function.
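To make the metaheuristic item-selection idea concrete, here is a minimal sketch: a mutation-only evolutionary loop (a simplified stand-in for a full GA, which would also use crossover) evolves k-item subsets toward high Cronbach's alpha on synthetic data. The paper's optimization functions additionally cover structure and validity; alpha alone is used here for brevity.

```python
# Evolve 8-item short forms of a 30-item toy scale toward high reliability.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items, k = 500, 30, 8
latent = rng.normal(size=(n_persons, 1))
items = latent + rng.normal(scale=1.5, size=(n_persons, n_items))  # toy item pool

def cronbach_alpha(data):
    k_ = data.shape[1]
    return k_ / (k_ - 1) * (1 - data.var(0, ddof=1).sum() / data.sum(1).var(ddof=1))

def fitness(subset):
    return cronbach_alpha(items[:, sorted(subset)])

def mutate(subset):
    # Variation operator: swap one selected item for an unselected one.
    chosen = list(subset)
    out = [i for i in range(n_items) if i not in subset]
    chosen[rng.integers(len(chosen))] = out[rng.integers(len(out))]
    return frozenset(chosen)

pop = [frozenset(rng.choice(n_items, size=k, replace=False)) for _ in range(40)]
for _ in range(50):
    pop.sort(key=fitness, reverse=True)             # rank subsets by reliability
    pop = pop[:20] + [mutate(p) for p in pop[:20]]  # keep elites, add mutants
best = max(pop, key=fitness)
print("best 8-item alpha:", round(fitness(best), 3))
```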
Retrospective Binary-Trait Association Test Elucidates Genetic Architecture of Crohn Disease
Jiang, Duo; Zhong, Sheng; McPeek, Mary Sara
2016-01-01
In genetic association testing, failure to properly control for population structure can lead to severely inflated type 1 error and power loss. Meanwhile, adjustment for relevant covariates is often desirable and sometimes necessary to protect against spurious association and to improve power. Many recent methods to account for population structure and covariates are based on linear mixed models (LMMs), which are primarily designed for quantitative traits. For binary traits, however, LMM is a misspecified model and can lead to deteriorated performance. We propose CARAT, a binary-trait association testing approach based on a mixed-effects quasi-likelihood framework, which exploits the dichotomous nature of the trait and achieves computational efficiency through estimating equations. We show in simulation studies that CARAT consistently outperforms existing methods and maintains high power in a wide range of population structure settings and trait models. Furthermore, CARAT is based on a retrospective approach, which is robust to misspecification of the phenotype model. We apply our approach to a genome-wide analysis of Crohn disease, in which we replicate association with 17 previously identified regions. Moreover, our analysis on 5p13.1, an extensively reported region of association, shows evidence for the presence of multiple independent association signals in the region. This example shows how CARAT can leverage known disease risk factors to shed light on the genetic architecture of complex traits. PMID:26833331
NASA Astrophysics Data System (ADS)
Oustimov, Andrew; Gastounioti, Aimilia; Hsieh, Meng-Kang; Pantalone, Lauren; Conant, Emily F.; Kontos, Despina
2017-03-01
We assess the feasibility of a parenchymal texture feature fusion approach, utilizing a convolutional neural network (ConvNet) architecture, to benefit breast cancer risk assessment. We hypothesize that by capturing sparse, subtle interactions between localized motifs present in two-dimensional texture feature maps derived from mammographic images, a multitude of texture feature descriptors can be optimally reduced to five meta-features capable of serving as a basis on which a linear classifier, such as logistic regression, can efficiently assess breast cancer risk. We combine this methodology with our previously validated lattice-based strategy for parenchymal texture analysis and we evaluate the feasibility of this approach in a case-control study with 424 digital mammograms. In a randomized split-sample setting, we optimize our framework in training/validation sets (N=300) and evaluate its discriminatory performance in an independent test set (N=124). The discriminatory capacity is assessed in terms of the area under the receiver operating characteristic (ROC) curve (AUC). The resulting meta-features exhibited strong classification capability in the test dataset (AUC=0.90), outperforming conventional, non-fused texture analysis, which previously resulted in an AUC of 0.85 on the same case-control dataset. Our results suggest that informative interactions between localized motifs exist and can be extracted and summarized via a fairly simple ConvNet architecture.
RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination.
Mirzaei, Sajad; Wu, Yufeng
2017-04-01
Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting the information contained in the haplotype data about the underlying genealogy. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, where it outperforms several existing methods. Availability and implementation: RENT+ is implemented in Java, and is freely available for download from https://github.com/SajadMirzaei/RentPlus. Contact: sajad@engr.uconn.edu or ywu@engr.uconn.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
R, GeethaRamani; Balasubramanian, Lakshmi
2018-07-01
Macula segmentation and fovea localization are among the primary tasks in retinal analysis, as these structures are responsible for detailed vision. Existing approaches required segmentation of other retinal structures, viz. the optic disc and blood vessels, for this purpose. This work avoids reliance on knowledge of other retinal structures and applies data mining techniques to segment the macula. An unsupervised clustering algorithm is exploited for this purpose. The selection of initial cluster centres has a great impact on the performance of clustering algorithms, so a heuristic-based clustering scheme, in which initial centres are selected from measures describing the statistical distribution of the data, is incorporated in the proposed methodology. The initial phase of the proposed framework includes image cropping, green channel extraction, contrast enhancement and application of mathematical closing. The pre-processed image is then subjected to heuristic-based clustering, yielding a binary map, which is post-processed to eliminate unwanted components. Finally, the component with the minimum intensity is taken as the macula, and its centre constitutes the fovea. The proposed approach outperforms existing works, with 100% of HRF, 100% of DRIVE, 96.92% of DIARETDB0, 97.75% of DIARETDB1, 98.81% of HEI-MED, 90% of STARE and 99.33% of MESSIDOR images satisfying the 1R criterion, a standard adopted for evaluating the performance of macula and fovea identification. The proposed system thus helps ophthalmologists identify the macula and thereby determine whether any abnormality is present within the macular region. Copyright © 2018 Elsevier B.V. All rights reserved.
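A minimal sketch of the heuristic seeding idea: initial cluster centres are placed at quantiles of the intensity distribution rather than at random before k-means produces a binary map. The data and the quantile choices are illustrative; the full pipeline (cropping, green-channel extraction, closing, post-processing) is omitted.

```python
# Heuristic seeding for two-class pixel clustering: centres from the data's
# statistical distribution instead of random initialization.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(0.2, 0.05, 5000),    # dark macula-like pixels
                         rng.normal(0.7, 0.10, 45000)])  # brighter background
X = pixels.reshape(-1, 1)

init = np.quantile(pixels, [0.05, 0.6]).reshape(-1, 1)   # heuristic centres
km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)
binary_map = km.labels_                                  # 0/1 assignment per pixel
print("cluster centres:", km.cluster_centers_.ravel())
```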
Machine learning with naturally labeled data for identifying abbreviation definitions.
Yeganova, Lana; Comeau, Donald C; Wilbur, W John
2011-06-09
The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most existing approaches for the abbreviation definition identification task employ rule-based methods. While achieving high precision, rule-based methods are limited to the rules defined and fail to capture many uncommon definition patterns. Supervised learning techniques, which offer more flexibility in detecting abbreviation definitions, have also been applied to the problem. However, they require manually labeled training data. In this work, we develop a machine learning algorithm for abbreviation definition identification in text which makes use of what we term naturally labeled data. Positive training examples are naturally occurring potential abbreviation-definition pairs in text. Negative training examples are generated by randomly mixing potential abbreviations with unrelated potential definitions. The machine learner is trained to distinguish between these two sets of examples. Then, the learned feature weights are used to identify the abbreviation full form. This approach does not require manually labeled training data. We evaluate the performance of our algorithm on the Ab3P, BIOADI and Medstract corpora. Our system demonstrated results that compare favourably to the existing Ab3P and BIOADI systems. We achieve an F-measure of 91.36% on Ab3P corpus, and an F-measure of 87.13% on BIOADI corpus which are superior to the results reported by Ab3P and BIOADI systems. Moreover, we outperform these systems in terms of recall, which is one of our goals.
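A minimal sketch of the naturally-labeled-training idea: positive examples are co-occurring (abbreviation, definition) candidates, negatives are random mismatches drawn from the same pools, and a logistic regression with two toy surface features stands in for the paper's learner.

```python
# Train a pair classifier from naturally labeled data: no manual annotation.
import random
import numpy as np
from sklearn.linear_model import LogisticRegression

pairs = [("DNA", "deoxyribonucleic acid"), ("ECG", "electrocardiogram"),
         ("MRI", "magnetic resonance imaging"), ("CNS", "central nervous system"),
         ("CSF", "cerebrospinal fluid"), ("TB", "tuberculosis")]

def features(abbr, definition):
    words = definition.split()
    initials = "".join(w[0] for w in words).upper()
    match = float(initials == abbr.upper())   # letters match word initials in order
    ratio = len(abbr) / len(words)            # abbreviation length vs. word count
    return [match, ratio]

random.seed(0)
defs = [d for _, d in pairs]
pos = [features(a, d) for a, d in pairs]
neg = [features(a, random.choice([x for x in defs if x != d])) for a, d in pairs]

X = np.array(pos + neg)
y = np.array([1] * len(pos) + [0] * len(neg))
clf = LogisticRegression().fit(X, y)
print("P(valid pair):", clf.predict_proba([features("CT", "computed tomography")])[0, 1])
```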
NASA Astrophysics Data System (ADS)
Tsai, Jinn-Tsong; Chou, Ping-Yi; Chou, Jyh-Horng
2015-11-01
The aim of this study is to generate vector quantisation (VQ) codebooks by integrating the principal component analysis (PCA) algorithm, the Linde-Buzo-Gray (LBG) algorithm, and evolutionary algorithms (EAs). The EAs include the genetic algorithm (GA), particle swarm optimisation (PSO), honey bee mating optimisation (HBMO), and the firefly algorithm (FF). The study provides performance comparisons between PCA-EA-LBG and PCA-LBG-EA approaches. The PCA-EA-LBG approaches comprise PCA-GA-LBG, PCA-PSO-LBG, PCA-HBMO-LBG, and PCA-FF-LBG, while the PCA-LBG-EA approaches comprise PCA-LBG, PCA-LBG-GA, PCA-LBG-PSO, PCA-LBG-HBMO, and PCA-LBG-FF. All training vectors of the test images are grouped according to PCA. The PCA-EA-LBG approaches use the vectors grouped by PCA as initial individuals, and the best solution found by the EAs is given to LBG to discover a codebook. The PCA-LBG approach uses PCA to select vectors as initial individuals for LBG to find a codebook, and PCA-LBG-EA uses the final result of PCA-LBG as an initial individual for the EAs. The search schemes in PCA-EA-LBG first apply global search and then local search, while those in PCA-LBG-EA first apply local search and then global search. The results verify that PCA-EA-LBG indeed gains superior results compared with PCA-LBG-EA, because PCA-EA-LBG explores a global area to find a solution and then exploits a better one in the local area of that solution. Furthermore, the proposed PCA-EA-LBG approaches to designing VQ codebooks outperform existing approaches in the literature.
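For reference, here is a minimal sketch of the LBG step that all of the compared pipelines end with: alternating nearest-codeword assignment and centroid updates. The PCA grouping and evolutionary initialization are omitted; the initial codebook is simply a random subset of the training vectors.

```python
# Linde-Buzo-Gray (generalized Lloyd) codebook training.
import numpy as np

def lbg(vectors, codebook_size, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), codebook_size, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each training vector to its nearest codeword.
        d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Update each codeword to the centroid of its cell (skip empty cells).
        for j in range(codebook_size):
            if np.any(labels == j):
                codebook[j] = vectors[labels == j].mean(0)
    return codebook

blocks = np.random.default_rng(1).normal(size=(1000, 16))  # stand-in 4x4 image blocks
cb = lbg(blocks, codebook_size=32)
print("codebook shape:", cb.shape)
```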
A Mixtures-of-Trees Framework for Multi-Label Classification
Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos
2015-01-01
We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods. PMID:25927011
Korczowski, L; Congedo, M; Jutten, C
2015-08-01
The classification of electroencephalographic (EEG) data recorded from multiple users simultaneously is an important challenge in the field of Brain-Computer Interface (BCI). In this paper we compare different approaches for the classification of single-trial event-related potentials (ERPs) from two subjects playing a collaborative BCI game. The minimum distance to mean (MDM) classifier in a Riemannian framework is extended to use the diversity of the inter-subject spatio-temporal statistics (MDM-hyper) or to merge multiple classifiers (MDM-multi). We show that both of these classifiers significantly outperform the mean performance of the two users and analogous classifiers based on step-wise linear discriminant analysis. More importantly, the MDM-multi outperforms the performance of the best player within the pair.
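A minimal sketch of the minimum-distance-to-mean idea on covariance matrices follows. The log-Euclidean metric is used here as a simple stand-in for the affine-invariant Riemannian metric typically used in this framework, and the data are synthetic SPD matrices rather than EEG trial covariances.

```python
# MDM: compute a per-class mean covariance, classify by nearest mean.
import numpy as np
from scipy.linalg import logm

def log_map(C):
    return logm(C).real                      # matrix logarithm of an SPD matrix

def mdm_fit(covs, labels):
    # Class means computed in the log (tangent) domain.
    return {c: np.mean([log_map(C) for C, l in zip(covs, labels) if l == c], 0)
            for c in set(labels)}

def mdm_predict(means, C):
    L = log_map(C)
    return min(means, key=lambda c: np.linalg.norm(L - means[c], "fro"))

rng = np.random.default_rng(0)
def random_spd(scale):
    A = rng.normal(size=(4, 4))
    return A @ A.T + scale * np.eye(4)

covs = [random_spd(0.5) for _ in range(20)] + [random_spd(5.0) for _ in range(20)]
labels = [0] * 20 + [1] * 20
means = mdm_fit(covs, labels)
print(mdm_predict(means, random_spd(5.0)))   # expected: class 1
```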
Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.
Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo
Unsupervised object discovery and localization is to discover some dominant object classes and localize all of object instances from a given image collection without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, in those methods no prior knowledge for the given image collection is exploited to facilitate object discovery. On the other hand, the topic models used in those methods suffer from the topic coherence issue: some inferred topics do not have clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in terms of the so-called must-links are exploited from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better deal with the polysemy phenomenon of visual words, the must-link is re-defined as that one must-link only constrains one or some topic(s) instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes, thus the must-links in our approach are semantic-specific, which allows to more efficiently exploit discriminative prior knowledge from Web images. Extensive experiments validated the efficiency of our proposed approach on several data sets. It is shown that our method significantly improves topic coherence and outperforms the unsupervised methods for object discovery and localization. In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.
A novel approach to identifying regulatory motifs in distantly related genomes
Van Hellemont, Ruth; Monsieurs, Pieter; Thijs, Gert; De Moor, Bart; Van de Peer, Yves; Marchal, Kathleen
2005-01-01
Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To assess these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size. PMID:16420672
Echo state networks with filter neurons and a delay&sum readout.
Holzmann, Georg; Hauser, Helmut
2010-03-01
Echo state networks (ESNs) are a novel approach to recurrent neural network training with the advantage of a very simple and linear learning algorithm. It has been demonstrated that ESNs outperform other methods on a number of benchmark tasks. Although the approach is appealing, there are still some inherent limitations in the original formulation. Here we suggest two enhancements of this network model. First, the previously proposed idea of filters in neurons is extended to arbitrary infinite impulse response (IIR) filter neurons. This enables such networks to learn multiple attractors and signals at different timescales, which is especially important for modeling real-world time series. Second, a delay&sum readout is introduced, which adds trainable delays in the synaptic connections of output neurons and therefore vastly improves the memory capacity of echo state networks. It is shown on commonly used benchmark tasks and real-world examples that this new structure is able to significantly outperform standard ESNs and other state-of-the-art models for nonlinear dynamical system modeling. Copyright 2009 Elsevier Ltd. All rights reserved.
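A minimal sketch of a standard ESN (without the IIR filter neurons or the delay&sum readout proposed above): a fixed random reservoir plus a ridge-regression readout, shown on a one-step-ahead prediction task.

```python
# Standard echo state network: only the linear readout is trained.
import numpy as np

rng = np.random.default_rng(0)
n_res, washout = 200, 100
W_in = rng.uniform(-0.5, 0.5, size=(n_res, 1))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius to 0.9

u = np.sin(np.linspace(0, 40, 1200))[:, None]
inputs, targets = u[:-1], u[1:]                   # one-step-ahead prediction task

x = np.zeros(n_res)
states = []
for ut in inputs:
    x = np.tanh(W @ x + W_in @ ut)                # fixed, untrained reservoir update
    states.append(x.copy())
S = np.asarray(states)[washout:]                  # discard transient states
Y = targets[washout:]

# Ridge-regression readout, the only trained part of an ESN.
W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n_res), S.T @ Y)
mse = float(((S @ W_out - Y) ** 2).mean())
print(f"one-step prediction MSE: {mse:.2e}")
```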
Collectives for Multiple Resource Job Scheduling Across Heterogeneous Servers
NASA Technical Reports Server (NTRS)
Tumer, K.; Lawson, J.
2003-01-01
Efficient management of large-scale, distributed data storage and processing systems is a major challenge for many computational applications. Many of these systems are characterized by multi-resource tasks processed across a heterogeneous network. Conventional approaches, such as load balancing, work well for centralized, single-resource problems, but break down in the more general case. In addition, most approaches are often based on heuristics which do not directly attempt to optimize the world utility. In this paper, we propose an agent-based control system using the theory of collectives. We configure the servers of our network with agents who make local job scheduling decisions. These decisions are based on local goals which are constructed to be aligned with the objective of optimizing the overall efficiency of the system. We demonstrate that multi-agent systems in which all the agents attempt to optimize the same global utility function (team game) only marginally outperform conventional load balancing. On the other hand, agents configured using collectives outperform both team games and load balancing (by up to four times for the latter), despite their distributed nature and their limited access to information.
Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen
2016-04-07
Being one type of post-translational modification (PTM), protein lysine succinylation is important in regulating a variety of biological processes; however, it is also involved in some diseases. Consequently, from the angles of both basic research and drug development, we face a challenging problem: for an uncharacterized protein sequence containing many Lys residues, which ones can be succinylated, and which ones cannot? To address this problem, we have developed a predictor called pSuc-Lys by (1) incorporating the sequence-coupled information into the general pseudo amino acid composition, (2) balancing out the skewed training dataset by random sampling, and (3) constructing an ensemble predictor by fusing a series of individual random forest classifiers. Rigorous cross-validations indicated that it remarkably outperformed the existing methods. A user-friendly web-server for pSuc-Lys has been established at http://www.jci-bioinfo.cn/pSuc-Lys, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics. Copyright © 2016 Elsevier Ltd. All rights reserved.
Holder and Topic Based Analysis of Emotions on Blog Texts: A Case Study for Bengali
NASA Astrophysics Data System (ADS)
Das, Dipankar; Bandyopadhyay, Sivaji
The paper presents an extended approach to analyzing the emotions of blog users on different topics. Rule-based techniques for identifying emotion holders and topics with respect to their corresponding emotional expressions are used to develop the baseline system, while a Support Vector Machine (SVM) based supervised framework identifies the holders, topics and emotional expressions from blog sentences, outperforming the baseline system. The existence of many-to-many relations between holders and topics with respect to Ekman's six emotion classes has been examined using a two-way evaluation technique, one pass with respect to the holder and the other from the perspective of the topic. The results of the system were found satisfactory in comparison with the agreement of the subjective annotation. The error analysis shows that the topic of a blog at document level is not always conveyed at the sentence level. Moreover, the difficulty in identifying the topic of a blog document arises from the problem of identifying features such as bigrams, Named Entities and sentiment. Thus, we employed a semantic clustering approach along with these features to identify the similarity between document-level and sentential topics, as well as to improve the results of identifying the document-level topic.
Unbiased feature selection in learning random forests for high-dimensional data.
Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi
2015-01-01
Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
Saliency Detection of Stereoscopic 3D Images with Application to Visual Discomfort Prediction
NASA Astrophysics Data System (ADS)
Li, Hong; Luo, Ting; Xu, Haiyong
2017-06-01
Visual saliency detection is potentially useful for a wide range of applications in image processing and computer vision fields. This paper proposes a novel bottom-up saliency detection approach for stereoscopic 3D (S3D) images based on regional covariance matrix. As for S3D saliency detection, besides the traditional 2D low-level visual features, additional 3D depth features should also be considered. However, only limited efforts have been made to investigate how different features (e.g. 2D and 3D features) contribute to the overall saliency of S3D images. The main contribution of this paper is that we introduce a nonlinear feature integration descriptor, i.e., regional covariance matrix, to fuse both 2D and 3D features for S3D saliency detection. The regional covariance matrix is shown to be effective for nonlinear feature integration by modelling the inter-correlation of different feature dimensions. Experimental results demonstrate that the proposed approach outperforms several existing relevant models including 2D extended and pure 3D saliency models. In addition, we also experimentally verified that the proposed S3D saliency map can significantly improve the prediction accuracy of experienced visual discomfort when viewing S3D images.
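A minimal sketch of a regional covariance descriptor follows: per-pixel feature maps (intensity, gradients, and a stand-in depth map here) are stacked, and a region is summarized by the covariance matrix of those features, capturing their inter-correlation. The feature set and the region are illustrative, not the paper's exact 2D/3D feature pool.

```python
# Region covariance: each pixel becomes a feature vector; the region descriptor
# is the covariance matrix of those vectors.
import numpy as np

rng = np.random.default_rng(0)
h, w = 64, 64
intensity = rng.random((h, w))
depth = rng.random((h, w))              # stand-in 3D (disparity/depth) feature
gy, gx = np.gradient(intensity)         # simple 2D gradient features

def region_covariance(maps, r0, r1, c0, c1):
    F = np.stack([m[r0:r1, c0:c1].ravel() for m in maps])
    return np.cov(F)

C = region_covariance([intensity, gx, gy, depth], 10, 40, 10, 40)
print("descriptor shape:", C.shape)     # (4, 4) symmetric PSD matrix
```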
Liu, Zhenqiu; Sun, Fengzhu; McGovern, Dermot P
2017-01-01
Feature selection and prediction are the most important tasks in big data mining. The common strategies for feature selection in big data mining are L1, SCAD and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly. In this paper, we develop a novel sparse generalized linear model (GLM) with L0 approximation for feature selection and prediction with big omics data. The proposed approach approximates the L0 optimization directly. Even though the original L0 problem is non-convex, it is approximated by sequential convex optimizations with the proposed algorithm. The proposed method is easy to implement with only several lines of code. Novel adaptive ridge algorithms (L0ADRIDGE) for L0-penalized GLM with ultra-high dimensional big data are developed. The proposed approach outperforms other cutting-edge regularization methods, including SCAD and MC+, in simulations. When applied to the integrated analysis of mRNA, microRNA, and methylation data from TCGA ovarian cancer, multilevel gene signatures associated with suboptimal debulking are identified simultaneously. The biological significance and potential clinical importance of those genes are further explored. The developed software L0ADRIDGE in MATLAB is available at https://github.com/liuzqx/L0adridge.
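A minimal sketch of the adaptive-ridge approximation to L0-penalized regression: iteratively reweighted ridge in which each coefficient's penalty weight is 1/(beta_j^2 + eps^2), so the effective penalty approaches a count of nonzero coefficients. Shown for linear regression on synthetic data; the paper works with GLMs, and the lambda and eps values below are illustrative.

```python
# Adaptive ridge: a sequence of convex ridge problems approximating L0.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]                 # sparse ground truth
y = X @ beta_true + rng.normal(scale=0.5, size=n)

lam, eps = 1.0, 1e-3
beta = np.linalg.solve(X.T @ X + 1e-6 * np.eye(p), X.T @ y)  # ridge-OLS start
for _ in range(50):
    w = 1.0 / (beta ** 2 + eps ** 2)             # per-coefficient adaptive weights
    beta = np.linalg.solve(X.T @ X + lam * np.diag(w), X.T @ y)

print("selected features:", np.flatnonzero(np.abs(beta) > 1e-2))  # expect 0, 1, 2
```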
Smulders, Tom V; Gould, Kristy L; Leaver, Lisa A
2010-03-27
Understanding the survival value of behaviour does not tell us how the mechanisms that control this behaviour work. Nevertheless, understanding survival value can guide the study of these mechanisms. In this paper, we apply this principle to understanding the cognitive mechanisms that support cache retrieval in scatter-hoarding animals. We believe it is too simplistic to predict that all scatter-hoarding animals will outperform non-hoarding animals on all tests of spatial memory. Instead, we argue that we should look at the detailed ecology and natural history of each species. This understanding of natural history then allows us to make predictions about which aspects of spatial memory should be better in which species. We use the natural hoarding behaviour of the three best-studied groups of scatter-hoarding animals to make predictions about three aspects of their spatial memory: duration, capacity and spatial resolution, and we test these predictions against the existing literature. Having laid out how ecology and natural history can be used to predict detailed cognitive abilities, we then suggest using this approach to guide the study of the neural basis of these abilities. We believe that this complementary approach will reveal aspects of memory processing that would otherwise be difficult to discover.
Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection
NASA Astrophysics Data System (ADS)
Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd
2015-02-01
Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread adoption of microarray technology is largely due to its ability to analyse thousands of genes simultaneously in a single experiment, providing valuable knowledge on gene interaction and function. A microarray data set typically consists of tens of thousands of genes (variables) measured on just dozens of samples, owing to various constraints. As a result, the sample covariance matrix in Hotelling's T2 statistic is not positive definite and becomes singular, so it cannot be inverted. In this research, the Hotelling's T2 statistic is combined with a shrinkage approach as an alternative way to estimate the covariance matrix for detecting significant gene sets. The use of a shrinkage covariance matrix overcomes the singularity problem by converting an unbiased estimator into an improved biased estimator of the covariance matrix. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
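A minimal sketch of the shrinkage Hotelling's T2 idea: the singular sample covariance (p much larger than n) is replaced by a shrinkage estimate, and a trimmed mean damps outliers. The shrinkage target and the fixed intensity below are simple illustrative choices, not the authors' exact estimator.

```python
# Shrinkage covariance makes Hotelling's T2 computable when p > n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 12, 100                                   # few samples, many genes
X = rng.normal(size=(n, p))
X[:, :5] += 1.0                                  # a shifted (significant) gene set

mu0 = np.zeros(p)
xbar = stats.trim_mean(X, proportiontocut=0.1, axis=0)  # robust location estimate

S = np.cov(X, rowvar=False)                      # rank-deficient when p > n
lam = 0.5                                        # fixed shrinkage intensity (illustrative)
target = np.diag(np.diag(S))                     # shrink toward the diagonal
S_shrunk = (1 - lam) * S + lam * target          # positive definite for lam > 0

diff = xbar - mu0
T2 = n * diff @ np.linalg.solve(S_shrunk, diff)  # Hotelling's T2 with shrinkage
print("T2 statistic:", round(float(T2), 2))
```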
Adaptive correction of ensemble forecasts
NASA Astrophysics Data System (ADS)
Pelosi, Anna; Battista Chirico, Giovanni; Van den Bergh, Joris; Vannitsem, Stephane
2017-04-01
Forecasts from numerical weather prediction (NWP) models often suffer from both systematic and non-systematic errors. These are present in both deterministic and ensemble forecasts, and originate from various sources such as model error and subgrid variability. Statistical post-processing techniques can partly remove such errors, which is particularly important when NWP outputs concerning surface weather variables are employed for site-specific applications. Many different post-processing techniques have been developed. For deterministic forecasts, adaptive methods such as the Kalman filter are often used, which sequentially post-process the forecasts by continuously updating the correction parameters as new ground observations become available. These methods are especially valuable when long training data sets do not exist. For ensemble forecasts, well-known techniques are ensemble model output statistics (EMOS) and so-called "member-by-member" (MBM) approaches. Here, we introduce a new adaptive post-processing technique for ensemble predictions. The proposed method is a sequential Kalman filtering technique that fully exploits the information content of the ensemble. One correction equation is retrieved and applied to all members; however, the parameters of the regression equation are estimated by exploiting the second-order statistics of the forecast ensemble. We compare our new method with two other techniques: a simple method that makes use of a running bias correction of the ensemble mean, and an MBM post-processing approach that rescales the ensemble mean and spread, based on minimization of the Continuous Ranked Probability Score (CRPS). We perform a verification study for the region of Campania in southern Italy. We use two years (2014-2015) of daily meteorological observations of 2-meter temperature and 10-meter wind speed from 18 ground-based automatic weather stations distributed across the region, comparing them with the corresponding COSMO-LEPS ensemble forecasts. Deterministic verification scores (e.g., mean absolute error, bias) and probabilistic scores (e.g., CRPS) are used to evaluate the post-processing techniques. We conclude that the new adaptive method outperforms the simpler running bias correction. The proposed adaptive method often outperforms the MBM method in removing bias. The MBM method has the advantage of correcting the ensemble spread, although it needs more training data.
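A minimal sketch of the sequential bias-correction building block: a scalar Kalman filter treats the forecast bias as a slowly varying hidden state, updates it as observations arrive, and subtracts it from each new forecast. The noise variances and the synthetic data are illustrative; the proposed method additionally exploits the ensemble's second-order statistics.

```python
# Adaptive (Kalman-filter) bias correction of a temperature forecast stream.
import numpy as np

rng = np.random.default_rng(0)
days = 100
true_bias = 1.5 + 0.01 * np.arange(days)          # slowly drifting model bias
obs_temp = 20 + 5 * np.sin(np.arange(days) / 10)
fcst_temp = obs_temp + true_bias + rng.normal(scale=0.8, size=days)

b, P = 0.0, 1.0                                   # bias estimate and its variance
q, r = 0.01, 0.64                                 # process and observation noise (tuning)
corrected = np.empty(days)
for t in range(days):
    P += q                                        # predict: bias persists, variance grows
    corrected[t] = fcst_temp[t] - b               # apply the current bias estimate
    innov = fcst_temp[t] - obs_temp[t] - b        # observed error minus estimated bias
    K = P / (P + r)                               # Kalman gain
    b += K * innov                                # update the bias estimate
    P *= 1 - K

print(f"MAE raw: {np.abs(fcst_temp - obs_temp).mean():.2f}  "
      f"corrected: {np.abs(corrected - obs_temp).mean():.2f}")
```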
ECG-derived respiration based on iterated Hilbert transform and Hilbert vibration decomposition.
Sharma, Hemant; Sharma, K K
2018-06-01
Monitoring of respiration using the electrocardiogram (ECG) is desirable for the simultaneous study of cardiac activity and respiration, offering benefits in comfort, mobility, and healthcare cost. This paper proposes a new approach for deriving respiration from single-lead ECG based on the iterated Hilbert transform (IHT) and Hilbert vibration decomposition (HVD). The ECG signal is first decomposed into multicomponent sinusoidal signals using the IHT technique. Afterward, the lower-order amplitude components obtained from the IHT are filtered using the HVD to extract the respiration information. Experiments are performed on the Fantasia and Apnea-ECG datasets. The performance of the proposed ECG-derived respiration (EDR) approach is compared with existing techniques, including principal component analysis (PCA), R-peak amplitudes (RPA), respiratory sinus arrhythmia (RSA), slopes of the QRS complex, and the R-wave angle. The proposed technique showed higher median (first, third quartile) correlation values for both the Fantasia and Apnea-ECG datasets, at 0.699 (0.55, 0.82) and 0.57 (0.40, 0.73), respectively. The proposed algorithm also yielded the lowest mean absolute error and average percentage error between the EDR and reference (recorded) respiration signals for both the Fantasia and Apnea-ECG datasets: 1.27 and 9.3%, and 1.35 and 10.2%, respectively. In experiments over different age groups in the Fantasia dataset, the proposed algorithm provided effective results in the younger population and outperformed the existing techniques for elderly subjects. The proposed EDR technique agrees better with reference respiratory rates than existing techniques and, in particular, removes the extra step of detecting fiducial points in the ECG, which makes the process effective and less complex. These performance results, obtained from two different datasets, validate that the proposed approach can be used for monitoring respiration using single-lead ECG.
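A minimal sketch of the amplitude-demodulation building block behind such EDR methods: the Hilbert transform recovers a slow amplitude modulation (a respiration stand-in) riding on a faster carrier. The full method iterates the decomposition (IHT) and filters components with HVD; the signal here is synthetic.

```python
# Recover a respiration-like modulation from a modulated carrier via the
# analytic-signal envelope.
import numpy as np
from scipy.signal import hilbert

fs = 250                                              # sampling rate, Hz
t = np.arange(0, 60, 1 / fs)
resp = 0.3 * np.sin(2 * np.pi * 0.25 * t)             # 15 breaths/min modulation
ecg_like = (1 + resp) * np.sin(2 * np.pi * 1.2 * t)   # modulated "cardiac" carrier

envelope = np.abs(hilbert(ecg_like))                  # analytic-signal amplitude
edr = envelope - envelope.mean()                      # crude ECG-derived respiration

print(f"correlation with true respiration: {np.corrcoef(edr, resp)[0, 1]:.3f}")
```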
A new family of Polak-Ribiere-Polyak conjugate gradient method with the strong-Wolfe line search
NASA Astrophysics Data System (ADS)
Ghani, Nur Hamizah Abdul; Mamat, Mustafa; Rivaie, Mohd
2017-08-01
The conjugate gradient (CG) method is an important technique in unconstrained optimization, owing to its effectiveness and low memory requirements. The focus of this paper is to introduce a new CG method for solving large-scale unconstrained optimization problems. Theoretical proofs show that the new method fulfills the sufficient descent condition if the strong Wolfe-Powell inexact line search is used. Moreover, computational results show that our proposed method outperforms other existing CG methods.
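For reference, here are the standard PRP ingredients the abstract refers to, written in LaTeX; the new family's specific coefficient is not given in the abstract, so only the classical forms are shown.

```latex
% Search direction and the classical PRP coefficient:
\begin{align}
  d_k &= -g_k + \beta_k d_{k-1}, \qquad d_0 = -g_0,\\
  \beta_k^{\mathrm{PRP}} &= \frac{g_k^{\top}(g_k - g_{k-1})}{\|g_{k-1}\|^2}.
\end{align}
% Strong Wolfe-Powell conditions on the step size \alpha_k:
\begin{align}
  f(x_k + \alpha_k d_k) &\le f(x_k) + c_1 \alpha_k g_k^{\top} d_k,\\
  |g(x_k + \alpha_k d_k)^{\top} d_k| &\le c_2 |g_k^{\top} d_k|,
  \qquad 0 < c_1 < c_2 < 1.
\end{align}
% Sufficient descent condition of the kind the paper proves:
\begin{equation}
  g_k^{\top} d_k \le -c \, \|g_k\|^2 \quad \text{for some } c > 0.
\end{equation}
```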
Assessment of Chlorophyll-a Algorithms Considering Different Trophic Statuses and Optimal Bands.
Salem, Salem Ibrahim; Higa, Hiroto; Kim, Hyungjun; Kobayashi, Hiroshi; Oki, Kazuo; Oki, Taikan
2017-07-31
Numerous algorithms have been proposed to retrieve chlorophyll-a concentrations in Case 2 waters; however, the retrieval accuracy is far from satisfactory. In this research, seven algorithms are assessed with different band combinations of multispectral and hyperspectral bands using linear (LN), quadratic polynomial (QP) and power (PW) regression approaches, resulting in altogether 43 algorithmic combinations. These algorithms are evaluated using simulated and measured datasets to understand their strengths and limitations. Two simulated datasets, each comprising 500,000 reflectance spectra and both based on wide ranges of inherent optical properties (IOPs), are generated for the calibration and validation stages. Results reveal that the regression approach (i.e., LN, QP, or PW) has more influence on the simulated dataset than on the measured one. The algorithms that incorporate linear regression provide the highest retrieval accuracy for the simulated dataset. Results from the simulated datasets reveal that the 3-band (3b) algorithms incorporating the 665-nm band, the 680-nm band, and the band-tuning selection approach outperformed other algorithms, with root mean square errors (RMSE) of 15.87 mg·m−3, 16.25 mg·m−3, and 19.05 mg·m−3, respectively. The spatial distribution of the best-performing algorithms, for various combinations of chlorophyll-a (Chla) and non-algal particle (NAP) concentrations, shows that 3b_tuning_QP and 3b_680_QP outperform other algorithms in terms of minimum RMSE frequency, at 33.19% and 60.52%, respectively. However, the two algorithms failed to accurately retrieve Chla for many combinations of Chla and NAP, particularly at low Chla and NAP concentrations. In addition, the spatial distribution emphasizes that no single algorithm can provide outstanding accuracy for Chla retrieval and that multiple algorithms should be combined to reduce the error. Comparing the results of the measured and simulated datasets reveals that the algorithms incorporating the 665-nm band outperform other algorithms for the measured dataset (RMSE = 36.84 mg·m−3), while algorithms incorporating the band-tuning approach provide the highest retrieval accuracy for the simulated dataset (RMSE = 25.05 mg·m−3).
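A minimal sketch of one assessed combination, the three-band index with quadratic polynomial (QP) regression, follows. The band choices mirror the common 3-band Chla model (665 nm, 680 nm, and a NIR band near 708 nm), and the reflectances and fitted coefficients are synthetic, not the paper's data.

```python
# Fit a QP regression of Chla on the three-band index (1/R665 - 1/R680) * R_NIR.
import numpy as np

rng = np.random.default_rng(0)
n = 300
chla = rng.uniform(1, 100, n)                       # mg m^-3, synthetic truth

# Synthetic reflectances loosely shaped so the 3-band index tracks Chla.
R665 = 1.0 / (0.02 * chla + 1.0) + rng.normal(0, 0.005, n)
R680 = 1.0 / (0.01 * chla + 1.0) + rng.normal(0, 0.005, n)
R708 = 0.5 + rng.normal(0, 0.005, n)

index = (1.0 / R665 - 1.0 / R680) * R708            # 3-band index

a, b, c = np.polyfit(index, chla, deg=2)            # Chla = a*x^2 + b*x + c
pred = np.polyval([a, b, c], index)
rmse = np.sqrt(((pred - chla) ** 2).mean())
print(f"RMSE on synthetic data: {rmse:.2f} mg m^-3")
```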
Critchfield, Thomas S
2010-01-01
A popular-press self-help manual is reviewed with an eye toward two issues. First, the popularity of such books documents the existence of considerable demand for technologies that address the everyday problems (in the present case, troublesome conversations) of nondisordered individuals. Second, many ideas invoked in popular-press books may be interpretable within an analysis of verbal behavior, although much more than casual translation is required to develop technologies that outperform self-help manuals. I discuss several challenges relevant to research, theory refinement, technology development, and dissemination, and conclude that behavioral alternatives to existing popular-press resources may not emerge anytime soon. PMID:22477467
A Geometric Analysis of when Fixed Weighting Schemes Will Outperform Ordinary Least Squares
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.
2011-01-01
Many researchers have demonstrated that fixed, exogenously chosen weights can be useful alternatives to Ordinary Least Squares (OLS) estimation within the linear model (e.g., Dawes, Am. Psychol. 34:571-582, 1979; Einhorn & Hogarth, Org. Behav. Human Perform. 13:171-192, 1975; Wainer, Psychol. Bull. 83:213-217, 1976). Generalizing the approach of…
Bandyopadhyay, Sanghamitra; Mitra, Ramkrishna
2009-10-15
Prediction of microRNA (miRNA) target mRNAs using machine learning approaches is an important area of research. However, most methods suffer from either high false positive or high false negative rates. One reason for this is the marked deficiency of negative examples, i.e., miRNA non-target pairs. Systematic identification of non-target mRNAs has not been addressed properly, and therefore current machine learning approaches are compelled to rely on artificially generated negative examples for training. In this article, we identify approximately 300 tissue-specific negative examples using a novel approach that involves expression profiling of both miRNAs and mRNAs, miRNA-mRNA structural interactions and seed-site conservation. The newly generated negative examples are validated with the pSILAC dataset, confirming that the identified non-targets are indeed non-targets. These high-throughput tissue-specific negative examples and a set of experimentally verified positive examples are then used to build TargetMiner, a support vector machine (SVM)-based classifier. In addition to assessing the prediction accuracy in cross-validation experiments, TargetMiner has been validated on a completely independent experimental test dataset. Our method outperforms 10 existing target prediction algorithms and provides a good balance between sensitivity and specificity that is not reflected in the existing methods. We achieve a significantly higher sensitivity and specificity of 69% and 67.8% based on a pool of 90 features, and 76.5% and 66.1% using a set of 30 selected features, on the completely independent test dataset. To establish the effectiveness of the systematically generated negative examples, the SVM was trained using a different set of negative data generated using the method in Yousef et al.; a significantly higher false positive rate (70.6%) is observed when tested on the independent set, with all other factors kept the same. Again, when an existing method (NBmiRTar) is executed with our proposed negative data, we observe an improvement in its performance. These results clearly establish the effectiveness of the proposed approach of selecting negative examples systematically. TargetMiner is now available as an online tool at www.isical.ac.in/~bioinfo_miu
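A minimal sketch of the classification stage, assuming scikit-learn: an RBF-kernel SVM trained on verified positive pairs and systematically derived negative pairs. The 90-dimensional features and the synthetic data are placeholders for the paper's feature pool.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_pos = rng.normal(0.5, 1.0, size=(300, 90))    # verified target pairs
    X_neg = rng.normal(-0.5, 1.0, size=(300, 90))   # systematic non-target pairs
    X = np.vstack([X_pos, X_neg])
    y = np.r_[np.ones(300), np.zeros(300)]

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # TargetMiner-style classifier
    print(cross_val_score(clf, X, y, cv=5).mean())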
mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud.
Weissensteiner, Hansi; Forer, Lukas; Fuchsberger, Christian; Schöpf, Bernd; Kloss-Brandstätter, Anita; Specht, Günther; Kronenberg, Florian; Schönherr, Sebastian
2016-07-08
Next generation sequencing (NGS) allows investigating mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) to a higher level of detail. While several pipelines for analyzing heteroplasmies exist, issues in usability, accuracy of results and interpreting final data limit their usage. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size with a special focus on usability as well as reliable identification and quantification of heteroplasmic variants. The mtDNA-Server workflow includes parallel read alignment, heteroplasmy detection, artefact or contamination identification, variant annotation as well as several quality control metrics, often neglected in current mtDNA NGS studies. All computational steps are parallelized with Hadoop MapReduce and executed graphically with Cloudgene. We validated the underlying heteroplasmy and contamination detection model by generating four artificial sample mix-ups on two different NGS devices. Our evaluation data shows that mtDNA-Server detects heteroplasmies and artificial recombinations down to the 1% level with perfect specificity and outperforms existing approaches regarding sensitivity. mtDNA-Server is currently able to analyze the 1000G Phase 3 data (n = 2,504) in less than 5 h and is freely accessible at https://mtdna-server.uibk.ac.at. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
An Improved Forwarding of Diverse Events with Mobile Sinks in Underwater Wireless Sensor Networks.
Raza, Waseem; Arshad, Farzana; Ahmed, Imran; Abdul, Wadood; Ghouzali, Sanaa; Niaz, Iftikhar Azim; Javaid, Nadeem
2016-11-04
In this paper, a novel routing strategy to address energy consumption and delay sensitivity issues in deep underwater wireless sensor networks is proposed, named ESDR: Event Segregation based Delay sensitive Routing. In this strategy, sensed events are segregated on the basis of their criticality and are forwarded to their respective destinations based on forwarding functions. These functions depend on routing metrics such as the Signal Quality Index, localization-free Signal-to-Noise Ratio, Energy Cost Function and Depth Dependent Function. Incomparable values of the previously defined forwarding functions cause uneven delays in the forwarding process; hence, the forwarding functions are redefined to ensure comparable values across different depth regions. The packet forwarding strategy is based on the event segregation approach, which forwards one third of the generated events (delay-sensitive) to surface sinks, while the remaining two thirds (normal events) are forwarded to mobile sinks. The motion of the mobile sinks is influenced by the relative distribution of normal nodes. We also incorporate two different mobility patterns for the mobile sinks, adaptive mobility and uniform mobility; the latter is used for collecting the packets generated by the normal nodes. These improvements ensure optimum holding time, uniform delay and in-time reporting of delay-sensitive events. The scheme is compared with existing ones and outperforms them in terms of network lifetime, delay and throughput.
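The forwarding step can be sketched as a scoring function over pre-normalized metrics. The weights and metric names below are illustrative assumptions, since the abstract does not give the exact functional forms.

    def forwarding_value(m, w=(0.25, 0.25, 0.25, 0.25)):
        # m holds metrics pre-normalized to [0, 1] so that scores computed in
        # different depth regions stay comparable (the redefinition step)
        return (w[0] * m["sqi"] + w[1] * m["snr"]
                - w[2] * m["energy_cost"] + w[3] * m["depth_fn"])

    neighbors = [
        {"id": 1, "sqi": 0.8, "snr": 0.7, "energy_cost": 0.3, "depth_fn": 0.6},
        {"id": 2, "sqi": 0.6, "snr": 0.9, "energy_cost": 0.5, "depth_fn": 0.4},
    ]
    print(max(neighbors, key=forwarding_value)["id"])   # chosen next hop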
Natural image sequences constrain dynamic receptive fields and imply a sparse code.
Häusler, Chris; Susemihl, Alex; Nawrot, Martin P
2013-11-06
In their natural environment, animals experience a complex and dynamic visual scenery. Under such natural stimulus conditions, neurons in the visual cortex employ a spatially and temporally sparse code. For the input scenario of natural still images, previous work demonstrated that unsupervised feature learning combined with the constraint of sparse coding can predict physiologically measured receptive fields of simple cells in the primary visual cortex. This convincingly indicated that the mammalian visual system is adapted to the natural spatial input statistics. Here, we extend this approach to the time domain in order to predict dynamic receptive fields that can account for both spatial and temporal sparse activation in biological neurons. We rely on temporal restricted Boltzmann machines and suggest a novel temporal autoencoding training procedure. When tested on a dynamic multi-variate benchmark dataset this method outperformed existing models of this class. Learning features on a large dataset of natural movies allowed us to model spatio-temporal receptive fields for single neurons. They resemble temporally smooth transformations of previously obtained static receptive fields and are thus consistent with existing theories. A neuronal spike response model demonstrates how the dynamic receptive field facilitates temporal and population sparseness. We discuss the potential mechanisms and benefits of a spatially and temporally sparse representation of natural visual input. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
Yousefi, Siavash; Qin, Jia; Zhi, Zhongwei
2013-01-01
Optical microangiography is an imaging technology capable of providing detailed functional blood flow maps within microcirculatory tissue beds in vivo. Some practical issues, however, exist when displaying and quantifying the microcirculation that perfuses the scanned tissue volume: (I) probing light is subject to specular reflection when it shines onto the sample, and the unevenness of the tissue surface makes the light energy entering the tissue non-uniform over the scanned volume; (II) biological tissue is heterogeneous in nature, meaning that its scattering and absorption properties attenuate the probe beam. These physical limitations can result in local contrast degradation and non-uniform micro-angiogram images. In this paper, we propose a post-processing method that uses Rayleigh contrast-limited adaptive histogram equalization to increase the contrast and improve the overall appearance and uniformity of optical micro-angiograms without saturating the vessel intensity or changing the physical meaning of the micro-angiograms. The qualitative and quantitative performance of the proposed method is compared with that of common histogram equalization and contrast enhancement methods. We demonstrate that the proposed method outperforms other existing approaches. The proposed method is not limited to optical microangiography and can be used in other imaging modalities such as photoacoustic tomography and scanning laser confocal microscopy. PMID:23482880
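The core idea, remapping intensities so the output histogram follows a Rayleigh distribution, can be sketched as a global histogram specification through the inverse Rayleigh CDF. The published method additionally applies this per tile with clipping (the contrast-limited adaptive part), which is omitted in this simplified sketch.

    import numpy as np

    def rayleigh_equalize(img, sigma=0.35):
        # map the empirical CDF F(x) through the inverse Rayleigh CDF:
        # y = sigma * sqrt(-2 * ln(1 - F(x)))
        flat = img.ravel().astype(np.float64)
        ranks = flat.argsort().argsort()
        cdf = (ranks + 1) / (flat.size + 1)          # empirical CDF in (0, 1)
        out = sigma * np.sqrt(-2.0 * np.log1p(-cdf))
        out = (out - out.min()) / (out.max() - out.min())
        return out.reshape(img.shape)

    img = np.random.rand(64, 64) ** 3                # synthetic low-contrast image
    print(rayleigh_equalize(img).mean())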
PCA-based spatially adaptive denoising of CFA images for single-sensor digital cameras.
Zheng, Lei; Lukac, Rastislav; Wu, Xiaolin; Zhang, David
2009-04-01
Single-sensor digital color cameras use a process called color demosaicking to produce full color images from the data captured by a color filter array (CFA). The quality of demosaicked images is degraded by the sensor noise introduced during image acquisition. The conventional solution to combating CFA sensor noise is demosaicking first, followed by a separate denoising step. This strategy generates many noise-caused color artifacts in the demosaicking process, which are hard to remove in the denoising step. Few denoising schemes that work directly on CFA images have been presented because of the difficulties arising from the interlaced red, green and blue mosaic pattern, yet a well-designed "denoising first and demosaicking later" scheme can offer advantages such as fewer noise-caused color artifacts and cost-effective implementation. This paper presents a principal component analysis (PCA)-based spatially adaptive denoising algorithm, which works directly on the CFA data using a supporting window to analyze the local image statistics. By exploiting the spatial and spectral correlations existing in the CFA image, the proposed method can effectively suppress noise while preserving color edges and details. Experiments using both simulated and real CFA images indicate that the proposed scheme outperforms many existing approaches, including sophisticated demosaicking and denoising schemes, in terms of both objective measurement and visual evaluation.
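A minimal sketch of PCA-based patch denoising with NumPy: vectorized local windows are projected onto a truncated principal basis and reconstructed. The spatially adaptive window selection and the CFA-specific handling of the published method are omitted.

    import numpy as np

    def pca_denoise_patches(patches, keep=8):
        # patches: (n, d) matrix of vectorized local windows; truncating the
        # PCA basis suppresses noise while keeping the dominant structure
        mean = patches.mean(axis=0)
        centered = patches - mean
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        basis = vt[:keep]                            # principal components kept
        return centered @ basis.T @ basis + mean

    rng = np.random.default_rng(1)
    clean = np.outer(rng.normal(size=500), rng.normal(size=25))  # rank-1 structure
    noisy = clean + 0.5 * rng.normal(size=clean.shape)
    den = pca_denoise_patches(noisy, keep=1)
    print(np.mean((den - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True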
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.
Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi
2015-04-22
Genetic studies of complex traits have uncovered only a small number of risk markers, explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single-marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set-specific risk models that relate each gene set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene-set selection are derived, and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.
Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels
Maulik, Ujjwal; Sarkar, Anasua
2013-01-01
Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles, and the Markov clustering algorithms, are currently the most popular methods for protein family recognition. The deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, in those methods calls for an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology, independent of "recent" paralogs. The spectral clustering approach with the new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that deemphasizes outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, even in multi-domain and promiscuous-domain proteins from Genolevures database families, with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439
Cang, Zixuan; Wei, Guo-Wei
2018-02-01
Protein-ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, and gene expression. Accurate prediction of protein-ligand binding affinities is vital to rational drug design and to the understanding of protein-ligand binding and binding-induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data. Topology provides the ultimate level of abstraction and thus incurs too much reduction in geometric information. Persistent homology embeds geometric information into topological invariants and bridges the gap between complex geometry and abstract topology. However, it oversimplifies biological information. This work introduces element-specific persistent homology (ESPH), or multicomponent persistent homology, to retain crucial biological information during topological simplification. The combination of ESPH and machine learning gives rise to a powerful paradigm for macromolecular analysis. Tests on 2 large data sets indicate that the proposed topology-based machine-learning paradigm outperforms other existing methods in protein-ligand binding affinity prediction. ESPH reveals protein-ligand binding mechanisms that cannot be attained from other conventional techniques. The present approach reveals that protein-ligand hydrophobic interactions extend to 40 Å away from the binding site, which has significant ramifications for drug and protein design. Copyright © 2017 John Wiley & Sons, Ltd.
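A toy sketch of the ESPH idea, assuming the third-party ripser package for persistence computation: persistence diagrams are computed on element-selected atom subsets and summarized by total persistence. The element pairs and the featurization are simplified assumptions, not the paper's descriptors.

    import numpy as np
    from ripser import ripser   # third-party package, assumed installed

    def esph_features(coords, elements, pairs=(("C", "C"), ("C", "N"), ("C", "O"))):
        # persistence diagrams on element-selected atom subsets, summarized
        # by total persistence (a deliberately simple featurization)
        feats = []
        for e1, e2 in pairs:
            sub = coords[np.isin(elements, [e1, e2])]
            for dgm in ripser(sub, maxdim=1)["dgms"]:      # H0 and H1 diagrams
                finite = dgm[np.isfinite(dgm[:, 1])]
                feats.append(float((finite[:, 1] - finite[:, 0]).sum()))
        return np.array(feats)

    rng = np.random.default_rng(2)
    coords = rng.normal(size=(60, 3))                      # toy "atom" coordinates
    elements = rng.choice(["C", "N", "O"], size=60)
    print(esph_features(coords, elements))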
Enriching consumer health vocabulary through mining a social Q&A site: A similarity-based approach.
He, Zhe; Chen, Zhiwei; Oh, Sanghee; Hou, Jinghui; Bian, Jiang
2017-05-01
The widely known vocabulary gap between health consumers and healthcare professionals hinders consumers' information seeking and health dialogue on end-user health applications. The Open Access and Collaborative Consumer Health Vocabulary (OAC CHV), which contains health-related terms used by lay consumers, was created to bridge this gap. Specifically, the OAC CHV facilitates consumers' health information retrieval by enabling consumer-facing health applications to translate between professional language and consumer-friendly language. To keep up with constantly evolving medical knowledge and language use, new terms need to be identified and added to the OAC CHV. User-generated content on social media, including social question and answer (social Q&A) sites, affords us an enormous opportunity for mining consumer health terms. Existing methods of identifying new consumer terms from text typically use ad hoc lexical-syntactic patterns and human review. Our study extends an existing method by extracting n-grams from a social Q&A textual corpus and representing them with a rich set of contextual and syntactic features. Using K-means clustering, our method, simiTerm, was able to identify terms that are both contextually and syntactically similar to existing OAC CHV terms. We tested our method on social Q&A corpora in two disease domains: diabetes and cancer. Our method outperformed three baseline ranking methods. A post hoc qualitative evaluation by human experts further validated that our method can effectively identify meaningful new consumer terms on social Q&A. Copyright © 2017 Elsevier Inc. All rights reserved.
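A toy sketch of the pipeline, assuming scikit-learn: candidate word bigrams are extracted from a small corpus, embedded with character n-gram TF-IDF (a crude stand-in for the paper's contextual and syntactic features), and clustered with K-means; candidates sharing a cluster with seed CHV terms are flagged as potential new terms.

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    posts = ["my blood sugar spiked after dinner",
             "metformin keeps my blood sugar stable",
             "chemo brain makes it hard to focus",
             "the oncologist explained chemo brain to me"]
    known_chv = ["blood sugar", "chemo"]          # seed consumer-health terms

    # candidate terms: word bigrams observed in the social Q&A corpus
    cand = list(CountVectorizer(ngram_range=(2, 2)).fit(posts)
                .get_feature_names_out())

    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    X = vec.fit_transform(known_chv + cand)

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    seed_clusters = set(labels[: len(known_chv)])
    new_terms = [t for t, l in zip(cand, labels[len(known_chv):])
                 if l in seed_clusters]
    print(sorted(set(new_terms)))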
Li, Ben; Sun, Zhaonan; He, Qing; Zhu, Yu; Qin, Zhaohui S.
2016-01-01
Motivation: Modern high-throughput biotechnologies such as microarray are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment, only a limited number of samples is assayed, hence the classical 'large p, small n' problem. On the other hand, rapid propagation of these high-throughput technologies has resulted in a substantial collection of data, often carried out on the same platform and using the same protocol. It is highly desirable to utilize this existing data when performing analysis and inference on a new dataset. Results: Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework, in which the repository of historical data can be exploited to build informative priors used in new data analysis. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them in the problem of detecting differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods, including the popular and state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference and presents a promising practical strategy for dealing with the 'large p, small n' problem. Availability and implementation: Our method is implemented in the R package IPBT, which is freely available from https://github.com/benliemory/IPBT. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26519502
BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation
2011-01-01
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be. PMID:21696594
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuo, Rui; Wu, C. F. Jeff
Many computer models contain unknown parameters which need to be estimated using physical observations. The calibration method based on Gaussian process models, however, may lead to unreasonable estimates for imperfect computer models. In this work, we extend that line of study to calibration problems with stochastic physical data. We propose a novel method, called L2 calibration, and show its semiparametric efficiency. The conventional ordinary least squares method is also studied; theoretical analysis shows that it is consistent but not efficient. Numerical examples show that the proposed method outperforms the existing ones.
A reconsideration of negative ratings for network-based recommendation
NASA Astrophysics Data System (ADS)
Hu, Liang; Ren, Liang; Lin, Wenbin
2018-01-01
Recommendation algorithms based on bipartite networks have become increasingly popular, thanks to their accuracy and flexibility. Currently, many of these methods ignore users' negative ratings. In this work, we propose a method to exploit negative ratings for the network-based inference algorithm. We find that negative ratings play a positive role regardless of sparsity of data sets. Furthermore, we improve the efficiency of our method and compare it with the state-of-the-art algorithms. Experimental results show that the present method outperforms the existing algorithms.
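A compact sketch of ProbS-style network-based inference extended with signed ratings, which is one plausible reading of "exploiting negative ratings"; the exact weighting used in the paper is not specified in the abstract.

    import numpy as np

    def nbi_scores(A):
        # A: items x users matrix with +1 (liked), -1 (disliked), 0 (unrated);
        # degrees are computed on |A| so negative links still carry resource
        k_item = np.abs(A).sum(axis=1)
        k_user = np.abs(A).sum(axis=0)
        # two-step mass diffusion: W[i, j] = sum_l A[i,l] A[j,l] / (k_j k_l)
        W = (A / np.maximum(k_user, 1)) @ (A / np.maximum(k_item, 1)[:, None]).T
        return W @ A                                 # scores, items x users

    rng = np.random.default_rng(3)
    A = rng.choice([0, 0, 1, -1], size=(20, 8)).astype(float)
    scores = nbi_scores(A)
    user = 0
    unseen = np.flatnonzero(A[:, user] == 0)
    print(unseen[np.argsort(-scores[unseen, user])][:3])   # top-3 recommendations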
Graham, Jim; Young, Nick; Jarnevich, Catherine S.; Newman, Greg; Evangelista, Paul; Stohlgren, Thomas J.
2013-01-01
Habitat suitability maps are commonly created by modeling a species' environmental niche from occurrences and environmental characteristics. Here, we introduce the hyper-envelope modeling interface (HEMI), providing a new method for creating habitat suitability models using Bezier surfaces to model a species' niche in environmental space. HEMI allows modeled surfaces to be visualized and edited in environmental space based on expert knowledge and does not require absence points for model development. The modeled surfaces require relatively few parameters compared to similar modeling approaches and may produce models that better match ecological niche theory. As a case study, we modeled the invasive species tamarisk (Tamarix spp.) in the western USA. We compare results from HEMI with those from similar existing modeling approaches (including BioClim, BioMapper, and Maxent). We used synthetic surfaces to create visualizations of the various models in environmental space, and used the modified area under the curve (AUC) statistic and the Akaike information criterion (AIC) as measures of model performance. We show that HEMI produced slightly better AUC values than all approaches except Maxent, and better AIC values overall. HEMI created a model with only ten parameters, while Maxent produced a model with over 100 and BioClim used only eight. Additionally, HEMI allowed visualization and editing of the model in environmental space to develop alternative potential habitat scenarios. The use of Bezier surfaces can provide simple models that match our expectations of biological niche models and, at least in some cases, outperform more complex approaches.
Benchmarking Deep Learning Models on Large Healthcare Datasets.
Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan
2018-06-04
Deep learning models (aka deep neural networks) have revolutionized many fields, including computer vision, natural language processing and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist that have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks, such as mortality prediction, length-of-stay prediction and ICD-9 code group prediction, using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data are used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.
Binary Interval Search: a scalable algorithm for counting interval intersections
Layer, Ryan M.; Skadron, Kevin; Robins, Gabriel; Hall, Ira M.; Quinlan, Aaron R.
2013-01-01
Motivation: The comparison of diverse genomic datasets is fundamental to understand genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, they overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery. Results: We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals. Availability: https://github.com/arq5x/bits. Contact: arq5x@virginia.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23129298
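The core counting trick in BITS can be sketched in a few lines: sort the interval starts and ends independently once, then answer each query with two binary searches.

    import bisect

    db = [(1, 5), (3, 8), (6, 7), (10, 12)]
    starts = sorted(s for s, _ in db)   # the two endpoint lists are sorted
    ends = sorted(e for _, e in db)     # independently, once, up front

    def count_overlaps(s, e):
        # overlaps with [s, e] = |db| - (# ending before s) - (# starting after e)
        #                      = bisect_right(starts, e) - bisect_left(ends, s)
        return bisect.bisect_right(starts, e) - bisect.bisect_left(ends, s)

    print(count_overlaps(4, 6))   # 3: (1,5), (3,8) and (6,7) intersect [4,6]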
S4: A spatial-spectral model for speckle suppression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fergus, Rob; Hogg, David W.; Oppenheimer, Rebecca
2014-10-20
High dynamic range imagers aim to block or eliminate light from a very bright primary star in order to make it possible to detect and measure far fainter companions; in real systems, a small fraction of the primary light is scattered, diffracted, and unocculted. We introduce S4, a flexible data-driven model for the unocculted (and highly speckled) light in the P1640 spectroscopic coronagraph. The model uses principal components analysis (PCA) to capture the spatial structure and wavelength dependence of the speckles, but not the signal produced by any companion. Consequently, the residual typically includes the companion signal. The companion can thus be found by filtering this error signal with a fixed companion model. The approach is sensitive to companions that are of the order of a percent of the brightness of the speckles, or up to 10−7 times the brightness of the primary star. This outperforms existing methods by a factor of two to three and is close to the shot-noise physical limit.
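A schematic of the PCA-plus-matched-filter pipeline on synthetic data: a low-rank speckle model is fitted and subtracted, and the residual is correlated with a fixed companion template. Dimensions, amplitudes and the injected signal below are illustrative, not P1640 specifics.

    import numpy as np

    def speckle_residual(frames, n_components=5):
        # fit a low-rank PCA speckle model and return what it fails to explain
        mean = frames.mean(axis=0)
        X = frames - mean
        _, _, vt = np.linalg.svd(X, full_matrices=False)
        return X - X @ vt[:n_components].T @ vt[:n_components]

    rng = np.random.default_rng(4)
    modes = rng.normal(size=(5, 400))                  # 5 speckle modes
    frames = rng.normal(size=(40, 5)) @ modes          # speckle-dominated frames
    template = np.zeros(400)
    template[123] = 1.0                                # fixed companion model
    frames[::2] += 0.05 * template                     # inject a faint companion

    resid = speckle_residual(frames)
    t = template - template.mean()
    score = resid @ t / np.linalg.norm(t)              # matched-filter score
    print(score[::2].mean() > score[1::2].mean())      # companion frames stand out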
Unsupervised Anomaly Detection Based on Clustering and Multiple One-Class SVM
NASA Astrophysics Data System (ADS)
Song, Jungsuk; Takakura, Hiroki; Okabe, Yasuo; Kwon, Yongjin
Intrusion detection systems (IDS) have played an important role as devices to defend our networks from cyber attacks. However, since they are unable to detect unknown attacks, i.e., 0-day attacks, the ultimate challenge in the intrusion detection field is how to exactly identify such attacks in an automated manner. Over the past few years, several studies on these problems have addressed anomaly detection using unsupervised learning techniques such as clustering and one-class support vector machines (SVM). Although these techniques enable one to construct intrusion detection models at low cost and effort, and have the capability to detect unforeseen attacks, they still suffer from two main problems in intrusion detection: a low detection rate and a high false positive rate. In this paper, we propose a new anomaly detection method based on clustering and multiple one-class SVMs in order to improve the detection rate while maintaining a low false positive rate. We evaluated our method using the KDD Cup 1999 data set. Evaluation results show that our approach outperforms the existing algorithms reported in the literature, especially in the detection of unknown attacks.
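A minimal sketch of the clustering-plus-multiple-one-class-SVM idea, assuming scikit-learn. The KDD-style features are replaced by synthetic data, and the decision rule (flag only points rejected by every per-cluster model) is one simple way to combine the models.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(5)
    normal = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(6, 1, (200, 4))])
    attacks = rng.normal(3, 0.5, (20, 4))            # unseen traffic

    # cluster the unlabelled training traffic, then fit one one-class SVM per
    # cluster so each model captures a tighter region of normal behaviour
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(normal)
    models = [OneClassSVM(nu=0.05, gamma="scale").fit(normal[km.labels_ == c])
              for c in range(km.n_clusters)]

    def is_anomaly(x):
        # flag as an attack only if every per-cluster model rejects the point
        return all(m.predict(x.reshape(1, -1))[0] == -1 for m in models)

    print(sum(is_anomaly(x) for x in attacks), "of", len(attacks), "flagged")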
Compressive sensing of high betweenness centrality nodes in networks
NASA Astrophysics Data System (ADS)
Mahyar, Hamidreza; Hasheminezhad, Rouzbeh; Ghalebi K., Elahe; Nazemian, Ali; Grosu, Radu; Movaghar, Ali; Rabiee, Hamid R.
2018-05-01
Betweenness centrality is a prominent centrality measure expressing the importance of a node within a network, in terms of the fraction of shortest paths passing through that node. Nodes with high betweenness centrality have significant impacts on the spread of influence and ideas in social networks, the user activity in mobile phone networks, the contagion process in biological networks, and the bottlenecks in communication networks. Thus, identifying the k highest betweenness centrality nodes in networks is of great interest in many applications. In this paper, we introduce CS-HiBet, a new method to efficiently detect the top-k betweenness centrality nodes in networks using compressive sensing. CS-HiBet can perform as a distributed algorithm by using only the local information at each node. Hence, it is applicable to large real-world and unknown networks in which global approaches are usually unrealizable. The performance of the proposed method is evaluated by extensive simulations on several synthetic and real-world networks. The experimental results demonstrate that CS-HiBet outperforms the best existing methods with notable improvements.
Chen Peng; Ao Li
2017-01-01
The emergence of multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of human diseases and, therefore, for improving diagnosis, treatment, and prevention. In this study, we propose a heterogeneous network based method integrating multi-dimensional data (HNMD) to identify GBM-related genes. The novelty of the method lies in that the multi-dimensional data of GBM from the TCGA dataset, which provide comprehensive information about genes, are combined with protein-protein interactions to construct a weighted heterogeneous network that reflects both the general and the disease-specific relationships between genes. In addition, a propagation algorithm with resistance is introduced to precisely score and rank GBM-related genes. The results of a comprehensive performance evaluation show that the proposed method significantly outperforms network based methods with single-dimensional data and other existing approaches. Subsequent analysis of the top ranked genes suggests that they may be functionally implicated in GBM, which further corroborates the superiority of the proposed method. The source code and the results of HNMD can be downloaded from the following URL: http://bioinformatics.ustc.edu.cn/hnmd/ .
HomoTarget: a new algorithm for prediction of microRNA targets in Homo sapiens.
Ahmadi, Hamed; Ahmadi, Ali; Azimzadeh-Jamalkandi, Sadegh; Shoorehdeli, Mahdi Aliyari; Salehzadeh-Yazdi, Ali; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-02-01
MiRNAs play an essential role in the networks of gene regulation by inhibiting the translation of target mRNAs. Several computational approaches have been proposed for the prediction of miRNA target genes. Reports reveal a large fraction of under-predicted or falsely predicted target genes. Thus, there is an imperative need to develop a computational method by which the target mRNAs of existing miRNAs can be correctly identified. In this study, a combined pattern recognition neural network (PRNN) and principal component analysis (PCA) architecture is proposed in order to model the complicated relationship between miRNAs and their target mRNAs in humans. The results of several types of intelligent classifiers and our proposed model were compared, showing that our algorithm outperformed them with higher sensitivity and specificity. Using the recent release of the miRBase database to find potential targets of miRNAs, this model incorporated twelve structural, thermodynamic and positional features of miRNA:mRNA binding sites to select target candidates. Copyright © 2012 Elsevier Inc. All rights reserved.
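A sketch of the PCA-plus-neural-classifier architecture with scikit-learn stand-ins (PCA followed by a small multilayer perceptron); the twelve binding-site features are simulated here, not taken from miRBase.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(6)
    # twelve structural/thermodynamic/positional features per candidate site
    X = rng.normal(size=(400, 12))
    y = (X[:, :3].sum(axis=1) + 0.5 * rng.normal(size=400) > 0).astype(int)

    # PCA decorrelates the hand-crafted features before the neural classifier,
    # mirroring the PCA + pattern-recognition-network architecture
    model = make_pipeline(PCA(n_components=6),
                          MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                        random_state=0))
    print(cross_val_score(model, X, y, cv=5).mean())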
Accurately estimating PSF with straight lines detected by Hough transform
NASA Astrophysics Data System (ADS)
Wang, Ruichen; Xu, Liangpeng; Fan, Chunxiao; Li, Yong
2018-04-01
This paper presents an approach to estimating the point spread function (PSF) from low resolution (LR) images. Existing techniques usually rely on accurate detection of the ending points of the profile normal to edges. In practice, however, it is often a great challenge to accurately localize edge profiles in an LR image, which leads to poor estimation of the PSF of the lens that took the image. For precise PSF estimation, this paper proposes first estimating a 1-D PSF kernel with straight lines, and then robustly obtaining the 2-D PSF from the 1-D kernel by least squares techniques and random sample consensus. The Canny operator is applied to the LR image to obtain edges, and the Hough transform is then utilized to extract straight lines of all orientations. Estimating the 1-D PSF kernel with straight lines effectively alleviates the influence of inaccurate edge detection on PSF estimation. The proposed method is investigated on both natural and synthetic images for estimating the PSF. Experimental results show that the proposed method outperforms the state-of-the-art and does not rely on accurate edge detection.
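The detection half of the pipeline can be sketched with OpenCV: Canny edges feed a Hough line detector, and a 1-D PSF slice is recovered by differentiating the edge-spread function across a detected straight edge. The synthetic image and thresholds are assumptions for demonstration.

    import cv2
    import numpy as np

    # synthetic LR image: a vertical step edge blurred by the unknown PSF
    img = np.zeros((64, 64))
    img[:, 32:] = 1.0
    img = cv2.GaussianBlur(img, (9, 9), 1.5)

    edges = cv2.Canny((img * 255).astype(np.uint8), 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 40)      # (rho, theta) pairs
    print("lines detected:", 0 if lines is None else len(lines))

    # edge-spread function across the (vertical) detected line, averaged over
    # rows to suppress noise; its derivative is a 1-D slice of the PSF
    esf = img.mean(axis=0)
    psf_1d = np.diff(esf)
    psf_1d /= psf_1d.sum()
    print(np.round(psf_1d[28:36], 3))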
Chen, Yun; Yang, Hui
2016-01-01
In the era of big data, there are increasing interests on clustering variables for the minimization of data redundancy and the maximization of variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges on the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require the assumption of data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering. PMID:27966581
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kou, Qiang; Wu, Si; Tolić, Nikola
Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a "bird's eye view" of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry data sets showed that TopMG outperformed existing methods in identifying complex proteoforms.
Xiao, Zhu; Havyarimana, Vincent; Li, Tong; Wang, Dong
2016-05-13
In this paper, a novel nonlinear smoothing framework, the non-Gaussian delayed particle smoother (nGDPS), is proposed, which enables vehicle state estimation (VSE) with high accuracy, taking into account the non-Gaussianity of the measurement and process noises. Within the proposed method, the multivariate Student's t-distribution is adopted to compute the probability density function (PDF) of the process and measurement noises, which are assumed to be non-Gaussian distributed. A computation approach based on the Ensemble Kalman Filter (EnKF) is designed to cope with the mean and the covariance matrix of the proposal non-Gaussian distribution. A delayed Gibbs sampling algorithm, which incorporates smoothing of the sampled trajectories over a fixed delay, is proposed to deal with the sample degeneracy of particles. The performance is investigated on real-world data collected by low-cost on-board vehicle sensors. The comparison study based on real-world experiments and statistical analysis demonstrates that the proposed nGDPS significantly improves the vehicle state accuracy and outperforms existing filtering and smoothing methods.
An Improved BLE Indoor Localization with Kalman-Based Fusion: An Experimental Study
Röbesaat, Jenny; Zhang, Peilin; Abdelaal, Mohamed; Theel, Oliver
2017-01-01
Indoor positioning has attracted great attention in recent years, and a number of efforts have been exerted to achieve high positioning accuracy. However, no existing technology proves its efficacy in all situations. In this paper, we propose a novel positioning method based on fusing trilateration and dead reckoning. We employ Kalman filtering as the position fusion algorithm. Moreover, we adopt an Android device with Bluetooth Low Energy modules as the communication platform to avoid excessive energy consumption and to improve the stability of the received signal strength. To further improve the positioning accuracy, we take environmental context information into account when generating the position fixes. Extensive experiments in a testbed are conducted to examine the performance of three approaches: trilateration, dead reckoning and the fusion method. Additionally, the influence of knowledge of the environmental context is also examined. Finally, our proposed fusion method outperforms both trilateration and dead reckoning in terms of accuracy: experimental results show that the Kalman-based fusion, for our settings, achieves a positioning accuracy of less than one meter. PMID:28445421
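A minimal 1-D constant-velocity Kalman filter shows the fusion pattern: dead reckoning drives the predict step, and each trilateration fix drives the update step. All noise parameters and measurements below are illustrative.

    import numpy as np

    dt, q, r = 1.0, 0.05, 1.5
    F = np.array([[1.0, dt], [0.0, 1.0]])       # constant-velocity transition
    H = np.array([[1.0, 0.0]])                  # we observe position only
    Q = q * np.eye(2)
    R = np.array([[r]])

    x = np.array([0.0, 0.3])                    # state: [position (m), speed (m/s)]
    P = np.eye(2)
    for z in [0.4, 0.7, 0.9, 1.3]:              # trilateration fixes (m)
        x = F @ x                               # predict (dead reckoning)
        P = F @ P @ F.T + Q
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([z]) - H @ x)     # update with the fix
        P = (np.eye(2) - K @ H) @ P
    print(np.round(x, 3))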
Nonrigid Image Registration in Digital Subtraction Angiography Using Multilevel B-Spline
2013-01-01
We address the problem of motion artifact reduction in digital subtraction angiography (DSA) using image registration techniques. Most registration algorithms proposed for DSA have been designed for peripheral and cerebral angiography images, in which we mainly deal with global rigid motions. These algorithms do not yield good results when applied to coronary angiography images because of the complex nonrigid motions that exist in this type of angiography image. Multiresolution and iterative algorithms have been proposed to cope with this problem, but they come with a high computational cost that makes them unacceptable for real-time clinical applications. In this paper, we propose a nonrigid image registration algorithm for coronary angiography images that is significantly faster than multiresolution and iterative blocking methods and outperforms competing algorithms evaluated on the same data sets. The algorithm is based on a sparse set of matched feature point pairs, and the elastic registration is performed by means of multilevel B-spline image warping. Experimental results with several clinical data sets demonstrate the effectiveness of our approach. PMID:23971026
Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences
NASA Technical Reports Server (NTRS)
Budalakoti, Suratna; Srivastava, Ashok N.; Akella, Ram; Turkov, Eugene
2006-01-01
This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
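The similarity measure at the heart of the approach is easy to state in code; one common normalization, used here, divides the LCS length by the geometric mean of the two sequence lengths.

    def lcs_length(a, b):
        # classic O(len(a) * len(b)) dynamic program over two sequences
        prev = [0] * (len(b) + 1)
        for x in a:
            cur = [0]
            for j, y in enumerate(b, 1):
                cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[-1]))
            prev = cur
        return prev[-1]

    def normalized_lcs(a, b):
        # similarity in [0, 1], normalized by the geometric mean of the lengths
        return lcs_length(a, b) / (len(a) * len(b)) ** 0.5

    s1 = ["open", "climb", "cruise", "descend", "land"]
    s2 = ["open", "climb", "descend", "land", "taxi"]
    print(round(normalized_lcs(s1, s2), 3))   # 0.8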
Bayesian network prior: network analysis of biological data using external knowledge
Isci, Senol; Dogan, Haluk; Ozturk, Cengizhan; Otu, Hasan H.
2014-01-01
Motivation: Reverse engineering GI networks from experimental data is a challenging task due to the complex nature of the networks and the noise inherent in the data. One way to overcome these hurdles would be incorporating the vast amounts of external biological knowledge when building interaction networks. We propose a framework where GI networks are learned from experimental data using Bayesian networks (BNs) and the incorporation of external knowledge is also done via a BN that we call Bayesian Network Prior (BNP). BNP depicts the relation between various evidence types that contribute to the event ‘gene interaction’ and is used to calculate the probability of a candidate graph (G) in the structure learning process. Results: Our simulation results on synthetic, simulated and real biological data show that the proposed approach can identify the underlying interaction network with high accuracy even when the prior information is distorted and outperforms existing methods. Availability: Accompanying BNP software package is freely available for academic use at http://bioe.bilgi.edu.tr/BNP. Contact: hasan.otu@bilgi.edu.tr Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:24215027
A gossip based information fusion protocol for distributed frequent itemset mining
NASA Astrophysics Data System (ADS)
Sohrabi, Mohammad Karim
2018-07-01
The computational complexity, huge memory space requirement, and time-consuming nature of the frequent pattern mining process are the most important motivations for distributing and parallelizing this mining process. On the other hand, the emergence of distributed computational and operational environments, in which data are produced and maintained on different distributed data sources, makes the parallelization and distribution of the knowledge discovery process inevitable. In this paper, a gossip based distributed itemset mining (GDIM) algorithm is proposed to extract frequent itemsets, which are special types of frequent patterns, in a wireless sensor network environment. In this algorithm, the local frequent itemsets of each sensor are extracted using a bit-wise horizontal approach (LHPM) from nodes clustered using a LEACH-based protocol. Cluster heads employ a gossip based protocol to communicate with each other and find the patterns whose global support is equal to or greater than the specified support threshold. Experimental results show that the proposed algorithm outperforms the best existing gossip based algorithm in terms of execution time.
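The local, bit-wise support-counting step can be sketched as follows: each item is a bitmap over transactions, and the support of an itemset is the popcount of the AND of its bitmaps. The layout is a generic assumption; LHPM's exact encoding is not given in the abstract.

    from itertools import combinations

    transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
    items = sorted(set().union(*transactions))
    # one bitmap per item: bit t is set if transaction t contains the item
    bitmap = {i: sum(1 << t for t, tx in enumerate(transactions) if i in tx)
              for i in items}

    min_sup = 2
    frequent = {}
    for r in (1, 2):
        for combo in combinations(items, r):
            bits = bitmap[combo[0]]
            for i in combo[1:]:
                bits &= bitmap[i]              # intersect transaction sets
            sup = bin(bits).count("1")         # popcount = support
            if sup >= min_sup:
                frequent[combo] = sup
    print(frequent)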
Lin, Hongli; Yang, Xuedong; Wang, Weisheng
2014-08-01
Devising a method that can select cases based on the performance levels of trainees and the characteristics of cases is essential for developing a personalized training program in radiology education. In this paper, we propose a novel hybrid prediction algorithm called content-boosted collaborative filtering (CBCF) to predict the difficulty level of each case for each trainee. The CBCF utilizes a content-based filtering (CBF) method to enhance the existing trainee-case ratings data and then provides final predictions through a collaborative filtering (CF) algorithm. The CBCF algorithm incorporates the advantages of both CBF and CF, while not inheriting the disadvantages of either. The CBCF method is compared with the pure CBF and pure CF approaches using three datasets, and the predictions are evaluated in terms of the mean absolute error (MAE) metric. Our experimental results show that the CBCF outperforms the pure CBF and CF methods by 13.33% and 12.17%, respectively, in terms of prediction precision. This also suggests that the CBCF can be used in the development of personalized training systems in radiology education.
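A compact sketch of the two-stage CBCF idea: a content-based pass densifies the trainee-case rating matrix using case-feature similarity, and a neighborhood collaborative-filtering pass then predicts from similar trainees. The similarity choices and the toy data are assumptions, not the paper's exact formulation.

    import numpy as np

    def cbcf_predict(R, F, k=2):
        # R: trainee x case ratings with np.nan where unrated;
        # F: case x feature matrix for the content-based stage
        norms = np.linalg.norm(F, axis=1)
        case_sim = (F @ F.T) / np.outer(norms, norms)    # cosine between cases
        dense = R.copy()
        for u in range(R.shape[0]):                      # stage 1: CBF fill-in
            rated = ~np.isnan(R[u])
            for c in np.flatnonzero(~rated):
                w = case_sim[c, rated]
                dense[u, c] = (w @ R[u, rated]) / max(w.sum(), 1e-9)
        usim = np.corrcoef(dense)                        # stage 2: CF over trainees
        preds = np.empty_like(dense)
        for u in range(dense.shape[0]):
            nbrs = np.argsort(-usim[u])[1:k + 1]         # k most similar trainees
            preds[u] = usim[u, nbrs] @ dense[nbrs] / max(usim[u, nbrs].sum(), 1e-9)
        return preds

    R = np.array([[3.0, np.nan, 4.0], [2.0, 5.0, np.nan], [np.nan, 4.0, 2.0]])
    F = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # toy case features
    print(np.round(cbcf_predict(R), 2))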
BFDCA: A Comprehensive Tool of Using Bayes Factor for Differential Co-Expression Analysis.
Wang, Duolin; Wang, Juexin; Jiang, Yuexu; Liang, Yanchun; Xu, Dong
2017-02-03
Comparing the gene-expression profiles between biological conditions is useful for understanding gene regulation underlying complex phenotypes. Along this line, analysis of differential co-expression (DC) has gained attention in the recent years, where genes under one condition have different co-expression patterns compared with another. We developed an R package Bayes Factor approach for Differential Co-expression Analysis (BFDCA) for DC analysis. BFDCA is unique in integrating various aspects of DC patterns (including Shift, Cross, and Re-wiring) into one uniform Bayes factor. We tested BFDCA using simulation data and experimental data. Simulation results indicate that BFDCA outperforms existing methods in accuracy and robustness of detecting DC pairs and DC modules. Results of using experimental data suggest that BFDCA can cluster disease-related genes into functional DC subunits and estimate the regulatory impact of disease-related genes well. BFDCA also achieves high accuracy in predicting case-control phenotypes by using significant DC gene pairs as markers. BFDCA is publicly available at http://dx.doi.org/10.17632/jdz4vtvnm3.1. Copyright © 2016 Elsevier Ltd. All rights reserved.
Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia
2013-01-01
Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The success of the query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
Reflectance Prediction Modelling for Residual-Based Hyperspectral Image Coding
Xiao, Rui; Gao, Junbin; Bossomaier, Terry
2016-01-01
A hyperspectral (HS) image provides observational powers beyond human vision capability but represents more than 100 times the data of a traditional image. To transmit and store the huge volume of an HS image, we argue that a fundamental shift is required from the existing "original pixel intensity"-based coding approaches using traditional image coders (e.g., JPEG2000) to "residual"-based approaches using a video coder, for better compression performance. A modified video coder is required to exploit spatial-spectral redundancy using pixel-level reflectance modelling, owing to the characteristics of HS images, which differ from traditional videos in their spectral domain and in the shape domain of panchromatic imagery. In this paper, a novel coding framework using Reflectance Prediction Modelling (RPM) within the latest video coding standard, High Efficiency Video Coding (HEVC), is proposed for HS images. An HS image presents a wealth of data, where every pixel is considered a vector across the spectral bands. By quantitative comparison and analysis of the pixel vector distribution along spectral bands, we conclude that modelling can predict the distribution and correlation of the pixel vectors across bands. To exploit the distribution of the known pixel vectors, we estimate a predicted current spectral band from the previous bands using Gaussian mixture-based modelling. The predicted band is used as an additional reference band, together with the immediately previous band, when applying HEVC. Every spectral band of an HS image is treated as if it were an individual frame of a video. In this paper, we compare the proposed method with mainstream encoders. The experimental results are fully justified on three types of HS dataset with different wavelength ranges. The proposed method outperforms existing mainstream HS encoders in terms of the rate-distortion performance of HS image compression. PMID:27695102
Rainfall disaggregation for urban hydrology: Effects of spatial consistence
NASA Astrophysics Data System (ADS)
Müller, Hannes; Haberlandt, Uwe
2015-04-01
For urban hydrology, rainfall time series with a high temporal resolution are crucial. Observed time series of this kind are in most cases too short to be used, whereas time series with a lower temporal resolution (daily measurements) exist for much longer periods. The objective is to derive long, high-resolution time series by disaggregating the time series of non-recording stations with information from the time series of recording stations. The multiplicative random cascade model is a well-known disaggregation model for daily time series. For urban hydrology it is often assumed that a day consists of only 1280 minutes in total, as the starting point of the disaggregation process. We introduce a new variant of the cascade model that works without this assumption and also outperforms the existing approach regarding time series characteristics such as wet and dry spell duration, average intensity, fraction of dry intervals and extreme value representation. However, in both approaches the rainfall time series of different stations are disaggregated without consideration of the surrounding stations, which results in unrealistic spatial patterns of rainfall. We apply a simulated annealing algorithm that has been used successfully for hourly values before: relative diurnal cycles of the disaggregated time series are resampled to reproduce the spatial dependence of rainfall. To describe spatial dependence we use bivariate characteristics such as probability of occurrence, continuity ratio and coefficient of correlation. The investigation area is a sewage system in northern Germany. We show that the algorithm has the capability to improve spatial dependence. The influence of the chosen disaggregation routine and of spatial dependence on overflow occurrences and volumes of the sewage system will be analyzed.
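One branch of the multiplicative random cascade can be sketched directly: starting from a 1280-minute day, each level halves the interval and randomly splits the volume, so eight levels yield 5-minute intervals. The splitting probabilities below are illustrative, not fitted parameters.

    import numpy as np

    def cascade_disaggregate(daily_mm, levels=8, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        series = np.array([daily_mm], dtype=float)
        for _ in range(levels):              # 1280 min -> 2**8 intervals of 5 min
            out = []
            for v in series:
                u = rng.random()
                if v == 0.0 or u < 0.2:
                    out += [v, 0.0]          # all volume to the left half
                elif u < 0.4:
                    out += [0.0, v]          # all volume to the right half
                else:
                    w = rng.beta(2.0, 2.0)   # x / (1 - x) split of the volume
                    out += [v * w, v * (1.0 - w)]
            series = np.asarray(out)
        return series                        # mass-conserving: sums to daily_mm

    s = cascade_disaggregate(12.0, rng=np.random.default_rng(7))
    print(len(s), round(float(s.sum()), 6))  # 256 intervals, total 12.0 mm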
Li, Xingyu; Plataniotis, Konstantinos N
2015-07-01
In digital histopathology, tasks of segmentation and disease diagnosis are achieved by quantitative analysis of image content. However, color variation in image samples makes it challenging to produce reliable results. This paper introduces a complete normalization scheme to address the problem of color variation in histopathology images jointly caused by inconsistent biopsy staining and nonstandard imaging conditions. Different from existing normalization methods that either address one partial cause of color variation or lump the causes together, our method identifies the causes of color variation based on a microscopic imaging model and addresses inconsistency in biopsy imaging and staining by an illuminant normalization module and a spectral normalization module, respectively. In evaluation, we use two public datasets that are representative of histopathology images commonly received in clinics to examine the proposed method from the aspects of robustness to system settings, performance consistency against achromatic pixels, and normalization effectiveness in terms of histological information preservation. As the saturation-weighted statistics proposed in this study generate stable and reliable color cues for stain normalization, our scheme is robust to system parameters and insensitive to image content and achromatic colors. Extensive experimentation suggests that our approach outperforms state-of-the-art normalization methods, as the proposed method is the only approach that succeeds in preserving histological information after normalization. The proposed color normalization solution would be useful to mitigate the effects of color variation in pathology images on subsequent quantitative analysis.
NASA Astrophysics Data System (ADS)
Effati, Meysam; Thill, Jean-Claude; Shabani, Shahin
2015-04-01
The contention of this paper is that many social science research problems are too "wicked" to be suitably studied using conventional statistical and regression-based methods of data analysis. This paper argues that an integrated geospatial approach based on methods of machine learning is well suited to this purpose. Recognizing the intrinsic wickedness of traffic safety issues, such an approach is used to unravel the complexity of traffic crash severity on highway corridors as an example of such problems. The support vector machine (SVM) and coactive neuro-fuzzy inference system (CANFIS) algorithms are tested as inferential engines to predict crash severity and uncover spatial and non-spatial factors that systematically relate to crash severity, while a sensitivity analysis is conducted to determine the relative influence of crash severity factors. Different specifications of the two methods are implemented, trained, and evaluated against crash events recorded over a 4-year period on a regional highway corridor in Northern Iran. Overall, the SVM model outperforms CANFIS by a notable margin. The combined use of spatial analysis and artificial intelligence is effective at identifying leading factors of crash severity, while explicitly accounting for spatial dependence and spatial heterogeneity effects. Combined with the demonstrated effectiveness of the sensitivity analysis, this approach produces comprehensive results that are consistent with existing traffic safety theories and supports the prioritization of effective safety measures that are geographically targeted and behaviorally sound on regional highway corridors.
NASA Astrophysics Data System (ADS)
Xu, Z.; Guan, K.; Peng, B.; Casler, N. P.; Wang, S. W.
2017-12-01
Landscapes have complex three-dimensional features that are difficult to extract using conventional methods. Small-footprint LiDAR provides an ideal way of capturing these features. Existing approaches, however, have been relegated to raster or metric-based (two-dimensional) feature extraction from the upper or bottom layer, and thus are not suitable for resolving morphological and intensity features that could be important to fine-scale land cover mapping. Therefore, this research combines airborne LiDAR and multi-temporal Landsat imagery to classify land cover types in Williamson County, Illinois, an area with diverse and mixed landscape features. Specifically, we applied a 3D convolutional neural network (CNN) method to extract features from LiDAR point clouds by (1) creating an occupancy grid and an intensity grid at 1-meter resolution, and then (2) normalizing and feeding the data into a 3D CNN feature extractor for many epochs of learning. The learned features (e.g., morphological and intensity features) were combined with multi-temporal spectral data to enhance the performance of land cover classification based on a Support Vector Machine classifier. We used photo interpretation for training and testing data generation. The classification results show that our approach outperforms traditional methods using LiDAR-derived feature maps, and promises to serve as an effective methodology for creating high-quality land cover maps through fusion of complementary types of remote sensing data.
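The two steps described, voxelizing a point cloud into occupancy and intensity grids and feeding them to a 3D CNN feature extractor, can be sketched in PyTorch as below. The grid size, network depth, and random data are illustrative assumptions, and the per-grid normalization mentioned in the abstract is omitted.

```python
import numpy as np
import torch
import torch.nn as nn

def voxelize(points, intensity, grid=(32, 32, 32), res=1.0):
    """Rasterize a LiDAR point cloud into occupancy and mean-intensity grids."""
    idx = np.floor((points - points.min(axis=0)) / res).astype(int)
    idx = np.clip(idx, 0, np.array(grid) - 1)
    occ = np.zeros(grid, np.float32)
    inten = np.zeros(grid, np.float32)
    cnt = np.zeros(grid, np.float32)
    for (i, j, k), v in zip(idx, intensity):
        occ[i, j, k] = 1.0
        inten[i, j, k] += v
        cnt[i, j, k] += 1.0
    inten = np.divide(inten, cnt, out=inten, where=cnt > 0)  # mean per voxel
    return np.stack([occ, inten])          # (2, X, Y, Z): two input channels

extractor = nn.Sequential(                 # toy 3D CNN feature extractor
    nn.Conv3d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool3d(2),
    nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), # -> 32-dim feature vector
)

pts = np.random.rand(500, 3) * 32          # fake point cloud and intensities
x = torch.from_numpy(voxelize(pts, np.random.rand(500))).unsqueeze(0)
features = extractor(x)                    # features to fuse with Landsat data
```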
Nguyen, Nam-Ninh; Srihari, Sriganesh; Leong, Hon Wai; Chong, Ket-Fah
2015-10-01
Determining the entire complement of enzymes and their enzymatic functions is a fundamental step for reconstructing the metabolic network of cells. High quality enzyme annotation helps in enhancing metabolic networks reconstructed from the genome, especially by reducing gaps and increasing the enzyme coverage. Currently, structure-based and network-based approaches can only cover a limited number of enzyme families, and the accuracy of homology-based approaches can be further improved. The bottom-up homology-based approach improves coverage by rebuilding Hidden Markov Model (HMM) profiles for all known enzymes. However, its clustering procedure relies heavily on the BLAST similarity score, ignoring protein domains/patterns, and is sensitive to changes in cut-off thresholds. Here, we use functional domain architecture to score the association between domain families and enzyme families (Domain-Enzyme Association Scoring, DEAS). The DEAS score is used to calculate the similarity between proteins, which is then used in the clustering procedure instead of the sequence similarity score. We improve the enzyme annotation protocol using a stringent classification procedure, and by choosing optimal threshold settings and checking for active sites. Our analysis shows that our stringent protocol EnzDP can cover up to 90% of enzyme families available in Swiss-Prot. It achieves a high accuracy of 94.5% based on five-fold cross-validation. EnzDP outperforms existing methods across several testing scenarios. Thus, EnzDP serves as a reliable automated tool for enzyme annotation and metabolic network reconstruction. Available at: www.comp.nus.edu.sg/~nguyennn/EnzDP .
Robust stochastic optimization for reservoir operation
NASA Astrophysics Data System (ADS)
Pan, Limeng; Housh, Mashor; Liu, Pan; Cai, Ximing; Chen, Xin
2015-01-01
Optimal reservoir operation under uncertainty is a challenging engineering problem. Application of classic stochastic optimization methods to large-scale problems is limited due to computational difficulty. Moreover, classic stochastic methods assume that the estimated distribution function or the sample inflow data accurately represents the true probability distribution, which may be invalid, in which case the performance of the algorithms may be undermined. In this study, we introduce a robust optimization (RO) approach, the Iterative Linear Decision Rule (ILDR), to provide a tractable approximation for a multiperiod hydropower generation problem. The proposed approach extends the existing LDR method by accommodating nonlinear objective functions. It also provides users with the flexibility of choosing the accuracy of ILDR approximations by assigning a desired number of piecewise linear segments to each uncertainty. The performance of the ILDR is compared with benchmark policies including the sampling stochastic dynamic programming (SSDP) policy derived from historical data. The ILDR solves both single and multireservoir systems efficiently. The single reservoir case study results show that the RO method is as good as SSDP when implemented on the original historical inflows, and that it outperforms the SSDP policy when tested on generated inflows with the same mean and covariance matrix as those in history. For the multireservoir case study, which considers water supply in addition to power generation, numerical results show that the proposed approach performs as well as in the single reservoir case study in terms of optimal value and distributional robustness.
Fusion of multichannel local and global structural cues for photo aesthetics evaluation.
Luming Zhang; Yue Gao; Zimmermann, Roger; Qi Tian; Xuelong Li
2014-03-01
Photo aesthetic quality evaluation is a fundamental yet under-addressed task in the computer vision and image processing fields. Conventional approaches are frustrated by the following two drawbacks. First, both the local and global spatial arrangements of image regions play an important role in photo aesthetics, yet existing rules, e.g., visual balance, only heuristically define which spatial distribution among the salient regions of a photo is aesthetically pleasing. Second, it is difficult to adjust visual cues from multiple channels automatically in photo aesthetics assessment. To solve these problems, we propose a new photo aesthetics evaluation framework, focusing on learning image descriptors that characterize local and global structural aesthetics from multiple visual channels. In particular, to describe the spatial structure of the image local regions, we construct graphlets (small connected graphs) by connecting spatially adjacent atomic regions. Since spatially adjacent graphlets distribute closely in their feature space, we project them onto a manifold and subsequently propose an embedding algorithm. The embedding algorithm encodes the photo's global spatial layout into the graphlets. Simultaneously, the importance of graphlets from multiple visual channels is dynamically adjusted. Finally, these post-embedding graphlets are integrated for photo aesthetics evaluation using a probabilistic model. Experimental results show that: 1) the visualized graphlets explicitly capture the aesthetically arranged atomic regions; 2) the proposed approach generalizes and improves four prominent aesthetic rules; and 3) our approach significantly outperforms state-of-the-art algorithms in photo aesthetics prediction.
NASA Astrophysics Data System (ADS)
Natraj, V.; Thompson, D. R.; Mathur, A. K.; Babu, K. N.; Kindel, B. C.; Massie, S. T.; Green, R. O.; Bhattacharya, B. K.
2017-12-01
Remote Visible / ShortWave InfraRed (VSWIR) spectroscopy, typified by the Next-Generation Airborne Visible/Infrared Imaging Spectrometer (AVIRIS-NG), is a powerful tool to map the composition, health, and biodiversity of Earth's terrestrial and aquatic ecosystems. These studies must first estimate surface reflectance, removing the atmospheric effects of absorption and scattering by water vapor and aerosols. Since the atmospheric state varies spatiotemporally, and is insufficiently constrained by climatological models, it is important to estimate it directly from the VSWIR data. However, water vapor and aerosol estimation is a significant ongoing challenge for existing atmospheric correction models. Conventional VSWIR atmospheric correction methods evolved from multi-band approaches and do not fully utilize the rich spectroscopic data available. We use spectrally resolved (line-by-line) radiative transfer calculations, coupled with optimal estimation theory, to demonstrate improved accuracy of surface retrievals. These spectroscopic techniques are already pervasive in atmospheric remote sounding disciplines but have not yet been applied to imaging spectroscopy. Our analysis employs a variety of scenes from the recent AVIRIS-NG India campaign, which spans various climes, elevation changes, a wide range of biomes and diverse aerosol scenarios. A key aspect of our approach is the joint estimation of surface and aerosol parameters, which allows assessment of aerosol distortion effects using spectral shapes across the entire measured interval from 380-2500 nm. We expect that this method will outperform band ratio approaches, and enable the evaluation of subtle aerosol parameters where in situ reference data are not available, or for extreme aerosol loadings, as observed in the India scenarios. The results are validated using existing in situ reference spectra, reflectance measurements from assigned partners in India, and objective spectral quality metrics for scenes without any ground reference data. We also quantify the true information content of VSWIR spectroscopy for improving retrieval efficiency. We anticipate that our work will significantly improve the state of the art in VSWIR atmospheric correction, reducing regional biases in global ecosystem studies. © 2017. All rights reserved.
Slater, Graham J; Pennell, Matthew W
2014-05-01
A central prediction of much theory on adaptive radiations is that traits should evolve rapidly during the early stages of a clade's history and subsequently slow down in rate as niches become saturated--a so-called "Early Burst." Although a common pattern in the fossil record, evidence for early bursts of trait evolution in phylogenetic comparative data has been equivocal at best. We show here that this may not necessarily be due to the absence of this pattern in nature. Rather, commonly used methods to infer its presence perform poorly when the strength of the burst--the rate at which phenotypic evolution declines--is small, and when some morphological convergence is present within the clade. We present two modifications to existing comparative methods that allow greater power to detect early bursts in simulated datasets. First, we develop posterior predictive simulation approaches and show that they outperform maximum likelihood approaches at identifying early bursts of moderate strength. Second, we use a robust regression procedure that allows for the identification and down-weighting of convergent taxa, leading to moderate increases in method performance. We demonstrate the utility and power of these approaches by investigating the evolution of body size in cetaceans. Model fitting using maximum likelihood is equivocal with regard to the mode of cetacean body size evolution. However, posterior predictive simulation combined with a robust node height test returns low support for Brownian motion and rate-shift models, but not for the early burst model. While the jury is still out on whether early bursts are actually common in nature, our approach will hopefully facilitate more robust testing of this hypothesis. We advocate the adoption of similar posterior predictive approaches to improve the fit and to assess the adequacy of macroevolutionary models in general.
Position-aware deep multi-task learning for drug-drug interaction extraction.
Zhou, Deyu; Miao, Lei; He, Yulan
2018-05-01
A drug-drug interaction (DDI) is a situation in which a drug affects the activity of another drug synergistically or antagonistically when the two are administered together. The information on DDIs is crucial for healthcare professionals to prevent adverse drug events. Although some known DDIs can be found in purposely-built databases such as DrugBank, most information is still buried in scientific publications. Therefore, automatically extracting DDIs from biomedical texts is sorely needed. In this paper, we propose a novel position-aware deep multi-task learning approach for extracting DDIs from biomedical texts. In particular, sentences are represented as sequences of word embeddings and position embeddings. An attention-based bidirectional long short-term memory (BiLSTM) network is used to encode each sentence. The relative position information of words with respect to the target drugs is combined with the hidden states of the BiLSTM to generate position-aware attention weights. Moreover, the tasks of predicting whether or not two drugs interact with each other and further distinguishing the types of interactions are learned jointly in a multi-task learning framework. The proposed approach has been evaluated on the DDIExtraction challenge 2013 corpus, and the results show that with the position-aware attention only, our proposed approach outperforms the state-of-the-art method by 0.99% for binary DDI classification, and with both position-aware attention and multi-task learning, our approach achieves a micro F-score of 72.99% on interaction type identification, outperforming the state-of-the-art approach by 1.51%, which demonstrates the effectiveness of the proposed approach. Copyright © 2018 Elsevier B.V. All rights reserved.
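The sentence-encoding pipeline (word plus relative-position embeddings, a BiLSTM, and position-aware attention pooling) can be sketched in PyTorch as below. This is a toy single-task version with invented dimensions and random inputs; the paper's multi-task setup would add a second, binary-classification head next to the type classifier.

```python
import torch
import torch.nn as nn

class PositionAwareAttnBiLSTM(nn.Module):
    """Toy sketch: word + relative-position embeddings feed a BiLSTM whose
    hidden states yield attention weights for sentence pooling."""
    def __init__(self, vocab=1000, max_dist=100, dim=64, classes=5):
        super().__init__()
        self.word = nn.Embedding(vocab, dim)
        # One embedding table per target drug for relative distances.
        self.pos1 = nn.Embedding(2 * max_dist + 1, 16)
        self.pos2 = nn.Embedding(2 * max_dist + 1, 16)
        self.lstm = nn.LSTM(dim + 32, dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * dim, 1)
        self.out = nn.Linear(2 * dim, classes)    # interaction-type head

    def forward(self, tokens, d1, d2):
        x = torch.cat([self.word(tokens), self.pos1(d1), self.pos2(d2)], -1)
        h, _ = self.lstm(x)                       # (B, T, 2*dim)
        a = torch.softmax(self.attn(h).squeeze(-1), dim=1)
        sent = (a.unsqueeze(-1) * h).sum(dim=1)   # attention-pooled sentence
        return self.out(sent)

model = PositionAwareAttnBiLSTM()
logits = model(torch.randint(0, 1000, (2, 30)),   # token ids
               torch.randint(0, 201, (2, 30)),    # distance to drug 1
               torch.randint(0, 201, (2, 30)))    # distance to drug 2
```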
Effective STEM Programs for Adolescent Girls: Three Approaches and Many Lessons Learned
ERIC Educational Resources Information Center
Mosatche, Harriet S.; Matloff-Nieves, Susan; Kekelis, Linda; Lawner, Elizabeth K.
2013-01-01
While women's participation in math and physical science continues to lag to some degree behind that of men, the disparity is much greater in engineering and computer science. Though boys may outperform girls at the highest levels on math and science standardized tests, girls tend to get better course grades in math and science than boys do.…
Invariant 2D object recognition using the wavelet transform and structured neural networks
NASA Astrophysics Data System (ADS)
Khalil, Mahmoud I.; Bayoumi, Mohamed M.
1999-03-01
This paper applies the dyadic wavelet transform and the structured neural networks approach to recognize 2D objects under translation, rotation, and scale transformations. Experimental results are presented and compared with those of traditional methods; they show that this refined technique successfully classified the objects and outperformed some traditional methods, especially in the presence of noise.
An Exact Integration Scheme for Radiative Cooling in Hydrodynamical Simulations
NASA Astrophysics Data System (ADS)
Townsend, R. H. D.
2009-04-01
A new scheme for incorporating radiative cooling in hydrodynamical codes is presented, centered around exact integration of the governing semidiscrete cooling equation. Using benchmark calculations based on the cooling downstream of a radiative shock, I demonstrate that the new scheme outperforms traditional explicit and implicit approaches in terms of accuracy, while remaining competitive in terms of execution speed.
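For a single power-law cooling function, the exact-integration idea reduces to a closed-form update, sketched below against explicit Euler. Townsend's scheme generalizes this to piecewise power-law cooling curves via a tabulated temporal evolution function, which this toy version does not implement; constants and step sizes are arbitrary.

```python
import numpy as np

def cool_exact(T, dt, C=1.0, alpha=0.5, T_floor=1e-4):
    """Exact update for dT/dt = -C * T**alpha over a step dt (alpha != 1):
    integrating analytically, T**(1-alpha) decreases linearly in time."""
    arg = T**(1 - alpha) - (1 - alpha) * C * dt
    return max(arg, T_floor**(1 - alpha)) ** (1 / (1 - alpha))

def cool_euler(T, dt, C=1.0, alpha=0.5):
    """Explicit Euler, for comparison; inaccurate for large dt."""
    return T - dt * C * T**alpha

T_exact = T_euler = 1.0
for _ in range(10):                   # a deliberately coarse step
    T_exact = cool_exact(T_exact, 0.15)
    T_euler = cool_euler(T_euler, 0.15)
print(T_exact, T_euler)               # Euler over-cools vs. the exact solution
```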
Fringe pattern demodulation with a two-dimensional digital phase-locked loop algorithm.
Gdeisat, Munther A; Burton, David R; Lalor, Michael J
2002-09-10
A novel technique called a two-dimensional digital phase-locked loop (DPLL) for fringe pattern demodulation is presented. This algorithm is more suitable for demodulation of fringe patterns with varying phase in two directions than the existing DPLL techniques that assume that the phase of the fringe patterns varies only in one direction. The two-dimensional DPLL technique assumes that the phase of a fringe pattern is continuous in both directions and takes advantage of the phase continuity; consequently, the algorithm has better noise performance than the existing DPLL schemes. The two-dimensional DPLL algorithm is also suitable for demodulation of fringe patterns with low sampling rates, and it outperforms the Fourier fringe analysis technique in this aspect.
FIND: difFerential chromatin INteractions Detection using a spatial Poisson process
Chen, Yang; Zhang, Michael Q.
2018-01-01
Polymer-based simulations and experimental studies indicate the existence of a spatial dependency between the adjacent DNA fibers involved in the formation of chromatin loops. However, the existing strategies for detecting differential chromatin interactions assume that the interacting segments are spatially independent from the other segments nearby. To resolve this issue, we developed a new computational method, FIND, which considers the local spatial dependency between interacting loci. FIND uses a spatial Poisson process to detect differential chromatin interactions that show a significant difference in their interaction frequency and the interaction frequency of their neighbors. Simulation and biological data analysis show that FIND outperforms the widely used count-based methods and has a better signal-to-noise ratio. PMID:29440282
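A toy illustration of the neighbor-aware testing idea (not FIND's actual spatial Poisson process model) is sketched below: the expected interaction count at a locus pair is scaled by the ratio of its neighborhood averages before a two-sided Poisson test. The counts and the scaling rule are invented for the example.

```python
import numpy as np
from scipy.stats import poisson

def differential_pvalue(count_a, count_b, neighbors_a, neighbors_b):
    """Toy neighbor-aware test: the expected count in condition B is the
    count in A scaled by the local neighborhood ratio, then Poisson-tested."""
    ratio = (1 + np.mean(neighbors_b)) / (1 + np.mean(neighbors_a))
    expected = count_a * ratio        # expectation in B under "no change"
    # Two-sided Poisson tail probability for the observed count in B.
    p = 2 * min(poisson.cdf(count_b, expected),
                poisson.sf(count_b - 1, expected))
    return min(p, 1.0)

# Interaction counts at one locus pair and its neighborhood averages.
print(differential_pvalue(count_a=12, count_b=35,
                          neighbors_a=[10, 11, 13, 9],
                          neighbors_b=[12, 10, 14, 11]))
```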
A hybrid frame concealment algorithm for H.264/AVC.
Yan, Bo; Gharavi, Hamid
2010-01-01
In packet-based video transmissions, packet loss due to channel errors may result in the loss of a whole video frame. Recently, many error concealment algorithms have been proposed in order to combat channel errors; however, most of the existing algorithms can only deal with the loss of macroblocks and are not able to conceal a whole missing frame. In order to resolve this problem, in this paper we propose a new hybrid motion vector extrapolation (HMVE) algorithm to recover the whole missing frame, which is able to provide more accurate estimation of the motion vectors of the missing frame than other conventional methods. Simulation results show that it is highly effective and significantly outperforms other existing frame recovery methods.
A Review of Depth and Normal Fusion Algorithms
Štolc, Svorad; Pock, Thomas
2018-01-01
Geometric surface information such as depth maps and surface normals can be acquired by various methods such as stereo light fields, shape from shading and photometric stereo techniques. We compare several algorithms which deal with the combination of depth with surface normal information in order to reconstruct a refined depth map. The reasons for performance differences are examined from the perspective of alternative formulations of surface normals for depth reconstruction. We review and analyze methods in a systematic way. Based on our findings, we introduce a new generalized fusion method, which is formulated as a least squares problem and outperforms previous methods in the depth error domain by introducing a novel normal weighting that performs closer to the geodesic distance measure. Furthermore, a novel method is introduced based on Total Generalized Variation (TGV) which further outperforms previous approaches in terms of the geodesic normal distance error and maintains comparable quality in the depth error domain. PMID:29389903
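The least-squares formulation can be illustrated on a 1-D depth profile: minimize a data term on noisy depths plus a gradient term on slopes derived from the normals. The sketch below, with synthetic data and a hand-picked weight, shows the structure of the problem only; the paper's novel normal weighting and the TGV variant are not reproduced.

```python
import numpy as np

# Toy 1-D profile: noisy depth measurements d and slopes g derived from
# surface normals (for a 2-D normal (nx, nz), slope = -nx / nz).
n = 100
x = np.linspace(0, 1, n)
true_z = np.sin(2 * np.pi * x)
d = true_z + np.random.normal(0, 0.05, n)          # noisy depth
g = 2 * np.pi * np.cos(2 * np.pi * x)              # slopes from normals
h = x[1] - x[0]

# Least squares: minimize ||z - d||^2 + lam * ||D z - g||^2,
# where D is the forward-difference operator.
lam = 5.0
D = (np.eye(n, k=1) - np.eye(n))[:-1] / h          # (n-1, n) difference matrix
A = np.vstack([np.eye(n), np.sqrt(lam) * D])
b = np.concatenate([d, np.sqrt(lam) * g[:-1]])
z, *_ = np.linalg.lstsq(A, b, rcond=None)          # fused, refined depth
```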
Döring, Matthias; Borrego, Pedro; Büch, Joachim; Martins, Andreia; Friedrich, Georg; Camacho, Ricardo Jorge; Eberle, Josef; Kaiser, Rolf; Lengauer, Thomas; Taveira, Nuno; Pfeifer, Nico
2016-12-20
CCR5-coreceptor antagonists can be used for treating HIV-2 infected individuals. Before initiating treatment with coreceptor antagonists, viral coreceptor usage should be determined to ensure that the virus can use only the CCR5 coreceptor (R5) and cannot evade the drug by using the CXCR4 coreceptor (X4-capable). However, until now, no online tool for the genotypic identification of HIV-2 coreceptor usage had been available. Furthermore, there is a lack of knowledge on the determinants of HIV-2 coreceptor usage. Therefore, we developed a data-driven web service for the prediction of HIV-2 coreceptor usage from the V3 loop of the HIV-2 glycoprotein and used the tool to identify novel discriminatory features of X4-capable variants. Using 10 runs of tenfold cross validation, we selected a linear support vector machine (SVM) as the model for geno2pheno[coreceptor-hiv2], because it outperformed the other SVMs with an area under the ROC curve (AUC) of 0.95. We found that SVMs were highly accurate in identifying HIV-2 coreceptor usage, attaining sensitivities of 73.5% and specificities of 96% during tenfold nested cross validation. The predictive performance of SVMs was not significantly different (p value 0.37) from an existing rules-based approach. Moreover, geno2pheno[coreceptor-hiv2] achieved a predictive accuracy of 100% and outperformed the existing approach on an independent data set containing nine new isolates with corresponding phenotypic measurements of coreceptor usage. geno2pheno[coreceptor-hiv2] could not only reproduce the established markers of CXCR4-usage, but also revealed novel markers: the substitutions 27K, 15G, and 8S were significantly predictive of CXCR4 usage. Furthermore, SVMs trained on the amino-acid sequences of the V1 and V2 loops were also quite accurate in predicting coreceptor usage (AUCs of 0.84 and 0.65, respectively). In this study, we developed geno2pheno[coreceptor-hiv2], the first online tool for the prediction of HIV-2 coreceptor usage from the V3 loop. Using our method, we identified novel amino-acid markers of X4-capable variants in the V3 loop and found that HIV-2 coreceptor usage is also influenced by the V1/V2 region. The tool can aid clinicians in deciding whether coreceptor antagonists such as maraviroc are a treatment option and enables epidemiological studies investigating HIV-2 coreceptor usage. geno2pheno[coreceptor-hiv2] is freely available at http://coreceptor-hiv2.geno2pheno.org .
Automatic bladder segmentation from CT images using deep CNN and 3D fully connected CRF-RNN.
Xu, Xuanang; Zhou, Fugen; Liu, Bo
2018-03-19
Automatic approach for bladder segmentation from computed tomography (CT) images is highly desirable in clinical practice. It is a challenging task since the bladder usually suffers large variations of appearance and low soft-tissue contrast in CT images. In this study, we present a deep learning-based approach which involves a convolutional neural network (CNN) and a 3D fully connected conditional random fields recurrent neural network (CRF-RNN) to perform accurate bladder segmentation. We also propose a novel preprocessing method, called dual-channel preprocessing, to further advance the segmentation performance of our approach. The presented approach works as following: first, we apply our proposed preprocessing method on the input CT image and obtain a dual-channel image which consists of the CT image and an enhanced bladder density map. Second, we exploit a CNN to predict a coarse voxel-wise bladder score map on this dual-channel image. Finally, a 3D fully connected CRF-RNN refines the coarse bladder score map and produce final fine-localized segmentation result. We compare our approach to the state-of-the-art V-net on a clinical dataset. Results show that our approach achieves superior segmentation accuracy, outperforming the V-net by a significant margin. The Dice Similarity Coefficient of our approach (92.24%) is 8.12% higher than that of the V-net. Moreover, the bladder probability maps performed by our approach present sharper boundaries and more accurate localizations compared with that of the V-net. Our approach achieves higher segmentation accuracy than the state-of-the-art method on clinical data. Both the dual-channel processing and the 3D fully connected CRF-RNN contribute to this improvement. The united deep network composed of the CNN and 3D CRF-RNN also outperforms a system where the CRF model acts as a post-processing method disconnected from the CNN.
Rotation-invariant image and video description with local binary pattern features.
Zhao, Guoying; Ahonen, Timo; Matas, Jiří; Pietikäinen, Matti
2012-04-01
In this paper, we propose a novel approach to compute rotation-invariant features from histograms of local noninvariant patterns. We apply this approach to both static and dynamic local binary pattern (LBP) descriptors. For static-texture description, we present LBP histogram Fourier (LBP-HF) features, and for dynamic-texture recognition, we present two rotation-invariant descriptors computed from the LBPs from three orthogonal planes (LBP-TOP) features in the spatiotemporal domain. LBP-HF is a novel rotation-invariant image descriptor computed from discrete Fourier transforms of LBP histograms. The approach can also be generalized to embed any uniform features into this framework, and combining supplementary information, e.g., the sign and magnitude components of the LBP, can improve the descriptive ability. Moreover, two rotation-invariant variants are proposed for the LBP-TOP, which is an effective descriptor for dynamic-texture recognition, as shown by its recent success in different application problems, but is not itself rotation invariant. In the experiments, it is shown that the LBP-HF and its extensions outperform noninvariant and earlier versions of the rotation-invariant LBP in rotation-invariant texture classification. In experiments on two dynamic-texture databases with rotations or view variations, the proposed video features can effectively deal with rotation variations of dynamic textures (DTs). They are also robust with respect to changes in viewpoint, outperforming recent methods proposed for view-invariant recognition of DTs.
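The core LBP-HF idea, that an image rotation cyclically shifts LBP codes along bit-rotation orbits, so DFT magnitudes over each orbit are rotation invariant, can be sketched as below. For simplicity this version builds features from all bit-rotation orbits of plain LBP codes, whereas the published LBP-HF restricts attention to uniform patterns.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hf_like(image, P=8, R=1):
    """Rotation-invariant features: |DFT| of the LBP histogram restricted
    to each bit-rotation orbit (a simplified take on LBP-HF)."""
    codes = local_binary_pattern(image, P, R, method="default").astype(int)
    hist = np.bincount(codes.ravel(), minlength=2 ** P).astype(float)
    hist /= hist.sum()
    rot = lambda c, k: ((c << k) | (c >> (P - k))) & (2 ** P - 1)  # bit-rotate
    feats, seen = [], set()
    for c in range(2 ** P):
        orbit = [rot(c, k) for k in range(P)]
        key = min(orbit)               # canonical representative of the orbit
        if key in seen:
            continue
        seen.add(key)
        # An image rotation cyclically permutes histogram mass along each
        # orbit, so DFT magnitudes over the orbit are rotation invariant.
        feats.extend(np.abs(np.fft.fft(hist[orbit])))
    return np.asarray(feats)

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(lbp_hf_like(img)[:4])
```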
Tkachenko, Pavlo; Kriukova, Galyna; Aleksandrova, Marharyta; Chertov, Oleg; Renard, Eric; Pereverzyev, Sergei V
2016-10-01
Nocturnal hypoglycemia (NH) is common in patients with insulin-treated diabetes. Despite the risk associated with NH, there are only a few methods aiming at the prediction of such events based on intermittent blood glucose monitoring data, and none has been validated for clinical use. Here we propose a method of combining several predictors into a new one that will perform at the level of the best involved predictor, or even outperform all individual candidates. The idea of the method is to use a recently developed strategy for aggregating ranking algorithms. The method has been calibrated and tested on data extracted from clinical trials performed in the European FP7-funded project DIAdvisor. We then tested the proposed approach on other datasets to show the portability of the method. This feature of the method allows its simple implementation in the form of a diabetic smartphone app. On the considered datasets the proposed approach exhibits good performance in terms of sensitivity, specificity and predictive values. Moreover, the resulting predictor automatically performs at the level of the best involved method or even outperforms it. We propose a strategy for the combination of NH predictors that leads to a method exhibiting a reliable performance and the potential for everyday use by any patient who performs self-monitoring of blood glucose. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Joint Denoising/Compression of Image Contours via Shape Prior and Context Tree
NASA Astrophysics Data System (ADS)
Zheng, Amin; Cheung, Gene; Florencio, Dinei
2018-07-01
With the advent of depth sensing technologies, the extraction of object contours in images---a common and important pre-processing step for later higher-level computer vision tasks like object detection and human action recognition---has become easier. However, acquisition noise in captured depth images means that detected contours suffer from unavoidable errors. In this paper, we propose to jointly denoise and compress detected contours in an image for bandwidth-constrained transmission to a client, who can then carry out the aforementioned application-specific tasks using the decoded contours as input. We first prove theoretically that in general a joint denoising/compression approach can outperform a separate two-stage approach that first denoises then encodes contours lossily. Adopting a joint approach, we first propose a burst error model that models typical errors encountered in an observed string y of directional edges. We then formulate a rate-constrained maximum a posteriori (MAP) problem that trades off the posterior probability p(x'|y) of an estimated string x' given y with its code rate R(x'). We design a dynamic programming (DP) algorithm that solves the posed problem optimally, and propose a compact context representation called the total suffix tree (TST) that can reduce the complexity of the algorithm dramatically. Experimental results show that our joint denoising/compression scheme noticeably outperformed a competing separate scheme in rate-distortion performance.
Predicting missing links via correlation between nodes
NASA Astrophysics Data System (ADS)
Liao, Hao; Zeng, An; Zhang, Yi-Cheng
2015-10-01
As a fundamental problem in many different fields, link prediction aims to estimate the likelihood of an existing link between two nodes based on the observed information. Since this problem is related to many applications ranging from uncovering missing data to predicting the evolution of networks, link prediction has been intensively investigated recently and many methods have been proposed so far. The essential challenge of link prediction is to estimate the similarity between nodes. Most of the existing methods are based on the common neighbor index and its variants. In this paper, we propose to calculate the similarity between nodes by the Pearson correlation coefficient. This method is found to be very effective when applied to calculate similarity based on high order paths. We finally fuse the correlation-based method with the resource allocation method, and find that the combined method can substantially outperform the existing methods, especially in sparse networks.
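A minimal version of the proposed similarity, the Pearson correlation between nodes' adjacency rows, is sketched below on a toy graph; the paper's stronger variant applies the same correlation to high-order path counts and fuses it with resource allocation, which this sketch omits.

```python
import numpy as np

def pearson_scores(A):
    """Score node pairs by the Pearson correlation of their adjacency rows."""
    S = np.corrcoef(A)            # correlation between all pairs of rows
    np.fill_diagonal(S, -np.inf)  # ignore self-pairs
    S[A > 0] = -np.inf            # ignore already-existing links
    return S

# Toy undirected graph; predict the most likely missing link.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], float)
S = pearson_scores(A)
i, j = np.unravel_index(np.argmax(S), S.shape)
print(f"top predicted link: ({i}, {j})")
```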
Comparison promotes learning and transfer of relational categories.
Kurtz, Kenneth J; Boukrina, Olga; Gentner, Dedre
2013-07-01
We investigated the effect of co-presenting training items during supervised classification learning of novel relational categories. Strong evidence exists that comparison induces a structural alignment process that renders common relational structure more salient. We hypothesized that comparisons between exemplars would facilitate learning and transfer of categories that cohere around a common relational property. The effect of comparison was investigated using learning trials that elicited a separate classification response for each item in presentation pairs that could be drawn from the same or different categories. This methodology ensures consideration of both items and invites comparison through an implicit same-different judgment inherent in making the two responses. In a test phase measuring learning and transfer, the comparison group significantly outperformed a control group receiving an equivalent training session of single-item classification learning. Comparison-based learners also outperformed the control group on a test of far transfer, that is, the ability to accurately classify items from a novel domain that was relationally alike, but surface-dissimilar, to the training materials. Theoretical and applied implications of this comparison advantage are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.
An Identity-Based Anti-Quantum Privacy-Preserving Blind Authentication in Wireless Sensor Networks.
Zhu, Hongfei; Tan, Yu-An; Zhu, Liehuang; Wang, Xianmin; Zhang, Quanxin; Li, Yuanzhang
2018-05-22
With the development of wireless sensor networks, IoT devices are crucial for the Smart City; these devices change people's lives, for example through e-payment and e-voting systems. However, in these two systems, the state-of-the-art authentication protocols based on traditional number theory cannot defeat a quantum computer attack. In order to protect user privacy and guarantee the trustworthiness of big data, we propose a new identity-based blind signature scheme based on the number theory research unit (NTRU) lattice; this scheme mainly uses a rejection sampling theorem instead of constructing a trapdoor. Meanwhile, this scheme does not depend on complex public key infrastructure and can resist quantum computer attack. Then we design an e-payment protocol using the proposed scheme. Furthermore, we prove our scheme is secure in the random oracle model, and satisfies confidentiality, integrity, and non-repudiation. Finally, we demonstrate that the proposed scheme outperforms the other traditional existing identity-based blind signature schemes in signing speed and verification speed, and outperforms the other lattice-based blind signatures in signing speed, verification speed, and signing secret key size.
Wang, Shunfang; Liu, Shuhui
2015-12-19
An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent a protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations, DipPSSM and PseAAPSSM, to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce balance factors to weigh the importance of its components. The optimal values of the balance factors are sought by a genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find their important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with a KNN classifier and cross-validation tests showed that, in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusion representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one.
The Effect of Prior Knowledge and Gender on Physics Achievement
NASA Astrophysics Data System (ADS)
Stewart, John; Henderson, Rachel
2017-01-01
Gender differences on the Conceptual Survey in Electricity and Magnetism (CSEM) have been extensively studied. Ten semesters (N=1621) of CSEM data are presented showing that male students outperform female students on the CSEM posttest by 5% (p < .001). Male students also outperform female students on qualitative in-semester test questions by 3% (p = .004), but no significant difference between male and female students was found on quantitative test questions. Male students enter the class with superior prior preparation in the subject and score 4% higher on the CSEM pretest (p < .001). If the sample is restricted to students with little prior knowledge who answer no more than 8 of the 32 questions correctly (N=822), male-female differences on the CSEM and qualitative test questions cease to be significant. This suggests that no intrinsic gender bias exists in the CSEM itself and that gender differences are the result of prior preparation as measured by CSEM pretest score. Gender differences increase with pretest score. Regression analyses are presented to further explore interactions between preparation, gender, and achievement.
A Pressure Control Method for Emulsion Pump Station Based on Elman Neural Network
Tan, Chao; Qi, Nan; Yao, Xingang; Wang, Zhongbin; Si, Lei
2015-01-01
In order to realize pressure control of emulsion pump station which is key equipment of coal mine in the safety production, the control requirements were analyzed and a pressure control method based on Elman neural network was proposed. The key techniques such as system framework, pressure prediction model, pressure control model, and the flowchart of proposed approach were presented. Finally, a simulation example was carried out and comparison results indicated that the proposed approach was feasible and efficient and outperformed others. PMID:25861253
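The defining feature of an Elman network, context units that feed the previous hidden state back into the hidden layer, can be shown in a few lines of NumPy. The sketch below is an untrained forward pass with invented layer sizes and inputs; training (e.g., backpropagation through time) and the pressure-control loop around it are omitted.

```python
import numpy as np

class ElmanNet:
    """Minimal Elman recurrent network: the hidden layer feeds back into
    itself through context units, giving the pressure predictor memory."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(0, 0.3, (n_hidden, n_in))
        self.Wc = rng.normal(0, 0.3, (n_hidden, n_hidden))  # context weights
        self.Wo = rng.normal(0, 0.3, (n_out, n_hidden))
        self.context = np.zeros(n_hidden)

    def step(self, x):
        h = np.tanh(self.Wx @ x + self.Wc @ self.context)
        self.context = h              # copy hidden state into context units
        return self.Wo @ h

net = ElmanNet(n_in=3, n_hidden=8, n_out=1)
# Feed a short sequence of hypothetical (pressure, flow, setpoint) readings.
for x in np.random.rand(10, 3):
    pred = net.step(x)
print("next-step pressure prediction:", pred)
```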
NASA Astrophysics Data System (ADS)
Macian-Sorribes, Hector; Pulido-Velazquez, Manuel; Tilmant, Amaury
2015-04-01
Stochastic programming methods are better suited to deal with the inherent uncertainty of inflow time series in water resource management. However, one of the most important hurdles to their use in practical implementations is the lack of generalized Decision Support System (DSS) shells, which are usually based on a deterministic approach. The purpose of this contribution is to present a general-purpose DSS shell, named Explicit Stochastic Programming Advanced Tool (ESPAT), able to build and solve stochastic programming problems for most water resource systems. It implements a hydro-economic approach, optimizing the total system benefits as the sum of the benefits obtained by each user. It has been coded using GAMS, and implements a Microsoft Excel interface with a GAMS-Excel link that allows the user to introduce the required data and recover the results; therefore, no GAMS skills are required to run the program. The tool is divided into four modules according to its capabilities: 1) the ESPATR module, which performs stochastic optimization in surface water systems using a Stochastic Dual Dynamic Programming (SDDP) approach; 2) the ESPAT_RA module, which optimizes coupled surface-groundwater systems using a modified SDDP approach; 3) the ESPAT_SDP module, capable of performing stochastic optimization in small-size surface systems using a standard SDP approach; and 4) the ESPAT_DET module, which implements a deterministic programming procedure using non-linear programming, able to solve deterministic optimization problems in complex surface-groundwater river basins. The case study of the Mijares river basin (Spain) is used to illustrate the method. It consists of two reservoirs in series, one aquifer and four agricultural demand sites currently managed using historical (XIV century) rights, which give priority to the most traditional irrigation district over the XX century agricultural developments. Its size makes it possible to use either the SDP or the SDDP method. The independent use of surface and groundwater can be examined with and without the aquifer. The ESPAT_DET, ESPATR and ESPAT_SDP modules were executed for the surface system, while the ESPAT_RA and ESPAT_DET modules were run for the surface-groundwater system. The surface system's results show similar performance for the ESPAT_SDP and ESPATR modules, which outperform the current policies while being outperformed by the ESPAT_DET results, which have the advantage of perfect foresight. The surface-groundwater system's results show a robust situation in which the differences between the modules' results and the current policies are smaller, due to the use of pumped groundwater for the XX century crops when surface water is scarce. The results are realistic, with the deterministic optimization outperforming the stochastic one, which in turn outperforms the current policies, showing that the tool is able to stochastically optimize river-aquifer water resource systems. We are currently working on the application of these tools to the analysis of changes in system operation under global change conditions. ACKNOWLEDGEMENT: This study has been partially supported by the IMPADAPT project (CGL2013-48424-C2-1-R) with Spanish MINECO (Ministerio de Economía y Competitividad) funds.
Identifying relevant data for a biological database: handcrafted rules versus machine learning.
Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles
2011-01-01
With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.
Learning polynomial feedforward neural networks by genetic programming and backpropagation.
Nikolaev, N Y; Iba, H
2003-01-01
This paper presents an approach to learning polynomial feedforward neural networks (PFNNs). The approach suggests, first, finding the polynomial network structure by means of a population-based search technique relying on the genetic programming paradigm, and second, further adjusting the weights of the best discovered network by an especially derived backpropagation algorithm for higher order networks with polynomial activation functions. These two stages of the PFNN learning process enable us to identify networks with good training as well as generalization performance. Empirical results show that this approach finds PFNNs which considerably outperform some previous constructive polynomial network algorithms on processing benchmark time series.
A time series modeling approach in risk appraisal of violent and sexual recidivism.
Bani-Yaghoub, Majid; Fedoroff, J Paul; Curry, Susan; Amundsen, David E
2010-10-01
For over half a century, various clinical and actuarial methods have been employed to assess the likelihood of violent recidivism. Yet there is a need for new methods that can improve the accuracy of recidivism predictions. This study proposes a new time series modeling approach that generates high levels of predictive accuracy over short and long periods of time. The proposed approach outperformed two widely used actuarial instruments (i.e., the Violence Risk Appraisal Guide and the Sex Offender Risk Appraisal Guide). Furthermore, analysis of temporal risk variations based on specific time series models can add valuable information into risk assessment and management of violent offenders.
Skinnider, Michael A; Dejong, Chris A; Franczak, Brian C; McNicholas, Paul D; Magarvey, Nathan A
2017-08-16
Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics.
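The baseline these comparisons rest on, two-dimensional fingerprint similarity, is typically scored with the Tanimoto coefficient. A minimal sketch with hypothetical bit-set fingerprints follows; it is not LEMONS itself, which enumerates hypothetical modular structures rather than comparing fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of
    'on' bit indices: |A & B| / |A | B|."""
    inter = len(fp_a & fp_b)
    union = len(fp_a | fp_b)
    return inter / union if union else 0.0

# Hypothetical substructure-bit fingerprints of two modular natural products.
fp1 = {3, 17, 42, 88, 130, 255}
fp2 = {3, 17, 42, 90, 130, 412}
print(tanimoto(fp1, fp2))  # 4 shared bits / 8 total bits = 0.5
```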
Detecting independent and recurrent copy number aberrations using interval graphs.
Wu, Hsin-Ta; Hajirasouliha, Iman; Raphael, Benjamin J
2014-06-15
Somatic copy number aberrations (SCNAs) are frequent in cancer genomes, but many of these are random, passenger events. A common strategy to distinguish functional aberrations from passengers is to identify those aberrations that are recurrent across multiple samples. However, the extensive variability in the length and position of SCNAs makes the problem of identifying recurrent aberrations notoriously difficult. We introduce a combinatorial approach to the problem of identifying independent and recurrent SCNAs, focusing on the key challenge of separating the overlaps in aberrations across individuals into independent events. We derive independent and recurrent SCNAs as maximal cliques in an interval graph constructed from overlaps between aberrations. We efficiently enumerate all such cliques, and derive a dynamic programming algorithm to find an optimal selection of non-overlapping cliques, resulting in a very fast algorithm, which we call RAIG (Recurrent Aberrations from Interval Graphs). We show that RAIG outperforms other methods on simulated data and also performs well on data from three cancer types from The Cancer Genome Atlas (TCGA). In contrast to existing approaches that employ various heuristics to select independent aberrations, RAIG optimizes a well-defined objective function. We show that this allows RAIG to identify rare aberrations that are likely functional, but are obscured by overlaps with larger passenger aberrations. http://compbio.cs.brown.edu/software. © The Author 2014. Published by Oxford University Press.
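The key structural fact RAIG exploits, that maximal cliques of an interval graph can be enumerated by a single endpoint sweep, is sketched below on toy aberration intervals; RAIG's dynamic program for selecting non-overlapping cliques is not included, and the interval data are invented.

```python
def maximal_cliques(intervals):
    """Maximal cliques of an interval graph via an endpoint sweep: record
    the active set whenever an interval ends right after one has started."""
    events = []
    for name, (lo, hi) in intervals.items():
        events.append((lo, 0, name))   # 0 = start (sorts before an end tie,
        events.append((hi, 1, name))   # 1 = end    so touching = overlapping)
    events.sort()
    active, cliques, last_was_start = set(), [], False
    for _, kind, name in events:
        if kind == 0:
            active.add(name)
            last_was_start = True
        else:
            if last_was_start:         # active set is maximal at this point
                cliques.append(frozenset(active))
            active.discard(name)
            last_was_start = False
    return cliques

# Aberrations as genomic intervals (start, end); overlaps form the graph.
scnas = {"A": (0, 10), "B": (5, 14), "C": (8, 20), "D": (16, 25)}
print(maximal_cliques(scnas))  # [{A, B, C}, {C, D}]
```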
Mobility based multicast routing in wireless mesh networks
NASA Astrophysics Data System (ADS)
Jain, Sanjeev; Tripathi, Vijay S.; Tiwari, Sudarshan
2013-01-01
There exist two fundamental approaches to multicast routing, namely minimum cost trees (MCTs) and shortest path trees (SPTs). The MCT connects receivers and sources with a minimum number of transmissions (MNTs); the MNT approach is generally used for energy-constrained sensor and mobile ad hoc networks. In this paper we consider node mobility and present a simulation-based comparison of SPTs, minimum Steiner trees (MSTs) and minimum-number-of-transmissions trees in wireless mesh networks, using performance metrics such as end-to-end delay, average jitter, throughput, packet delivery ratio, and average unicast packet delivery ratio. We also evaluate multicast performance in small and large wireless mesh networks. For small networks, we find that when the traffic load is moderate or high, the SPTs outperform the MSTs and MNTs in all cases; the SPTs have the lowest end-to-end delay and average jitter in almost all cases. For large networks, we find that the MSTs provide the minimum total edge cost and the minimum number of transmissions. We also find one drawback of SPTs: when the group size is large and the multicast sending rate is high, SPTs cause more packet losses to other flows than MCTs.
Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin
2013-03-01
Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context, amongst which support vector machines (SVMs) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied such an SVM-based approach, namely SVM-based recursive feature elimination (SVM-RFE), to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in an overall accuracy of ∼80% with 10-fold cross validation, using the 40 top-ranked molecular descriptors selected out of 314 descriptors in total. Moreover, the SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptor space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding overfitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
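SVM-RFE as described (recursively eliminating descriptors by linear-SVM weight until 40 remain, evaluated with 10-fold cross validation) maps directly onto scikit-learn, sketched below on synthetic stand-in data; the elimination step size and the dataset are assumptions, not values from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 314 molecular descriptors, binary NDD/non-NDD labels.
X, y = make_classification(n_samples=200, n_features=314, n_informative=40,
                           random_state=0)

# SVM-RFE: recursively drop the descriptors with the smallest linear-SVM
# weights until the 40 top-ranked descriptors remain, then classify.
model = make_pipeline(
    StandardScaler(),
    RFE(estimator=SVC(kernel="linear"), n_features_to_select=40, step=10),
    SVC(kernel="linear"),
)
scores = cross_val_score(model, X, y, cv=10)  # 10-fold CV, as in the paper
print(f"accuracy: {scores.mean():.2f}")
```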
Li, Xinpeng; Li, Hong; Liu, Yun; Xiong, Wei; Fang, Sheng
2018-03-05
The release rate of atmospheric radionuclide emissions is a critical factor in the emergency response to nuclear accidents. However, there are unavoidable biases in radionuclide transport models, leading to inaccurate estimates. In this study, a method that simultaneously corrects these biases and estimates the release rate is developed. Our approach provides a more complete measurement-by-measurement correction of the biases with a coefficient matrix that considers both deterministic and stochastic deviations. This matrix and the release rate are jointly solved by the alternating minimization algorithm. The proposed method is generic because it does not rely on specific features of transport models or scenarios. It is validated against wind tunnel experiments that simulate accidental releases on a heterogeneous and densely built nuclear power plant site. The sensitivities to the position, number, and quality of measurements and the extendibility of the method are also investigated. The results demonstrate that this method effectively corrects the model biases, and therefore outperforms Tikhonov's method in both release rate estimation and model prediction. The proposed approach is robust to uncertainties and extendible with various center estimators, thus providing a flexible framework for robust source inversion in real accidents, even if large uncertainties exist in multiple factors. Copyright © 2017 Elsevier B.V. All rights reserved.
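A heavily simplified scalar analogue can convey the alternating-minimization structure: with one correction coefficient per sensor and a single release rate, each subproblem has a closed-form update. The sketch below uses synthetic data and an invented regularization weight; the paper's full coefficient-matrix formulation with deterministic and stochastic deviation terms is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
M = rng.uniform(0.5, 2.0, n)       # model-predicted dilution factor per sensor
q_true = 3.0                       # true release rate
c_true = 1 + 0.3 * rng.standard_normal(n)       # unknown per-sensor model bias
y = c_true * M * q_true + 0.05 * rng.standard_normal(n)   # measurements

# Alternate minimization of ||diag(c) M q - y||^2 + lam * ||c - 1||^2:
# each subproblem is a closed-form least squares in q or in c.
lam, q, c = 0.5, 1.0, np.ones(n)
for _ in range(50):
    a = c * M
    q = (a @ y) / (a @ a)                           # update the release rate
    c = (y * M * q + lam) / ((M * q) ** 2 + lam)    # update per-sensor biases
print(f"estimated release rate: {q:.2f} (true {q_true})")
```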
Haldar, Justin P.; Leahy, Richard M.
2013-01-01
This paper presents a novel family of linear transforms that can be applied to data collected from the surface of a 2-sphere in three-dimensional Fourier space. This family of transforms generalizes the previously-proposed Funk-Radon Transform (FRT), which was originally developed for estimating the orientations of white matter fibers in the central nervous system from diffusion magnetic resonance imaging data. The new family of transforms is characterized theoretically, and efficient numerical implementations of the transforms are presented for the case when the measured data is represented in a basis of spherical harmonics. After these general discussions, attention is focused on a particular new transform from this family that we name the Funk-Radon and Cosine Transform (FRACT). Based on theoretical arguments, it is expected that FRACT-based analysis should yield significantly better orientation information (e.g., improved accuracy and higher angular resolution) than FRT-based analysis, while maintaining the strong characterizability and computational efficiency of the FRT. Simulations are used to confirm these theoretical characteristics, and the practical significance of the proposed approach is illustrated with real diffusion weighted MRI brain data. These experiments demonstrate that, in addition to having strong theoretical characteristics, the proposed approach can outperform existing state-of-the-art orientation estimation methods with respect to measures such as angular resolution and robustness to noise and modeling errors. PMID:23353603
Integrating linear optimization with structural modeling to increase HIV neutralization breadth.
Sevy, Alexander M; Panda, Swetasudha; Crowe, James E; Meiler, Jens; Vorobeychik, Yevgeniy
2018-02-01
Computational protein design has been successful in modeling fixed backbone proteins in a single conformation. However, when modeling large ensembles of flexible proteins, current methods in protein design have been insufficient. Large barriers in the energy landscape are difficult to traverse while redesigning a protein sequence, and as a result current design methods sample only a fraction of the available sequence space. We propose a new computational approach that combines traditional structure-based modeling using the Rosetta software suite with machine learning and integer linear programming to overcome limitations in the Rosetta sampling methods. We demonstrate the effectiveness of this method, which we call BROAD, by benchmarking its performance on increasing the predicted breadth of anti-HIV antibodies. We use this novel method to increase the predicted breadth of the naturally occurring antibody VRC23 against a panel of 180 divergent HIV viral strains and achieve 100% predicted binding against the panel. In addition, we compare the performance of this method to state-of-the-art multistate design in Rosetta and show that it significantly outperforms the existing method. We further demonstrate that sequences recovered by this method recover known binding motifs of broadly neutralizing anti-HIV antibodies. Finally, our approach is general and can be extended easily to other protein systems. Although our modeled antibodies were not tested in vitro, we predict that these variants would have greatly increased breadth compared to the wild-type antibody.
Li, Yehai; Wang, Kai
2018-01-01
Self-sensing capability of composite materials has been the focus of intensive research over the years, and has been particularly boosted by the recent quantum leap in nanotechnology. The capacity of most existing self-sensing approaches is restricted to static strains or low-frequency structural vibration. In this study, a new breed of functionalized epoxy-based composites is developed and fabricated, with a graphene nanoparticle-enriched, dispersed sensing network that enables the material to self-perceive broadband elastic disturbance, from static strains through low-frequency vibration to guided waves in an ultrasonic regime. Owing to the dispersed and networked sensing capability, signals can be captured at any desired part of the composites. Experimental validation has demonstrated that the functionalized composites can self-sense strains, outperforming conventional metal foil strain sensors with a significantly enhanced gauge factor and a much broader response bandwidth. Precise and fast self-response of the composites to broadband ultrasonic signals (up to 440 kHz) has revealed that the composite structure itself can serve as an ultrasound sensor, comparable to piezoceramic sensors in performance, while avoiding the bulky cables and wires used in a piezoceramic sensor network. This study has spotlighted the promising potential of the developed approach to functionalize conventional composites with a high-sensitivity self-sensing capability at minimal intrusion to the original structure. PMID:29724032
Multispectra CWT-based algorithm (MCWT) in mass spectra for peak extraction.
Hsueh, Huey-Miin; Kuo, Hsun-Chih; Tsai, Chen-An
2008-01-01
An important objective in mass spectrometry (MS) is to identify a set of biomarkers that can be used to potentially distinguish patients between distinct treatments (or conditions) from tens or hundreds of spectra. A common two-step approach involving peak extraction and quantification is employed to identify the features of scientific interest. The selected features are then used for further investigation to understand the underlying biological mechanism of individual proteins, or for the development of genomic biomarkers for early diagnosis. However, the use of inadequate or ineffective peak detection and peak alignment algorithms in the peak extraction step may lead to a high rate of false positives. It is also crucial to reduce the false positive rate in detecting biomarkers from tens or hundreds of spectra. Here a new procedure is introduced for feature extraction in mass spectrometry data that extends the continuous wavelet transform-based (CWT-based) algorithm to multiple spectra. The proposed multispectra CWT-based algorithm (MCWT) not only performs peak detection for multiple spectra but also carries out peak alignment at the same time. The authors' MCWT algorithm constructs a reference, which integrates information from multiple raw spectra, for feature extraction. The algorithm is applied to a SELDI-TOF mass spectra data set provided by CAMDA 2006 with known polypeptide m/z positions. This new approach is easy to implement and outperforms the existing peak extraction method from the Bioconductor PROcess package.
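A loose sketch of the reference-plus-CWT idea, using scipy's CWT peak finder; pooling the spectra by a simple mean is a simplifying assumption here, not the paper's exact integration scheme:

    # Sketch: build a reference from multiple aligned spectra, then run
    # CWT-based peak detection on it. The mean-spectrum reference is an
    # assumption standing in for the MCWT integration step.
    import numpy as np
    from scipy.signal import find_peaks_cwt

    rng = np.random.default_rng(1)
    spectra = rng.random((20, 5000))          # placeholder: 20 raw spectra
    reference = spectra.mean(axis=0)          # pool information across spectra
    peak_idx = find_peaks_cwt(reference, widths=np.arange(1, 30))
    print(len(peak_idx), "candidate peaks")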
The Development of Methodology to Support Comprehensive Approach: TMC
2014-05-02
complex situations (Lizotte, Bernier, Mokhtari, & Boivin, 2012). This project was part of the DRDC forward-looking Technology Investment Fund (TIF...outperformed participants in the control group (Lizotte, Bernier, Mokhtari, & Boivin, 2012). The current follow-on effort originated from needs...Canadian Forces operations. Canadian Military Journal, 9, 11-20. [7] Lizotte M., Bernier F., Mokhtari M., Boivin É. (2012). IMAGE Final Report: An
ERIC Educational Resources Information Center
Shadiev, Rustam; Wu, Ting-Ting; Huang, Yueh-Min
2017-01-01
In this study, we provide STR-texts to non-native English speaking students during English lectures to facilitate learning, attention, and meditation. We carry out an experiment to test the feasibility of our approach. Our results show that the participants in the experimental group both outperform those in the control group on the post-tests and…
Dimitriadis, Stavros I; Marimpis, Avraam D
2018-01-01
A brain-computer interface (BCI) is a channel of communication that transforms brain activity into specific commands for manipulating a personal computer or other home or electrical devices. In other words, a BCI is an alternative way of interacting with the environment by using brain activity instead of muscles and nerves. For that reason, BCI systems are of high clinical value for targeted populations suffering from neurological disorders. In this paper, we present a new processing approach in three publicly available BCI data sets: (a) a well-known multi-class (N = 6) code-modulated visual evoked potential (c-VEP)-based BCI system for able-bodied and disabled subjects; (b) a multi-class (N = 32) c-VEP with slow and fast stimulus presentation; and (c) a steady-state visual evoked potential (SSVEP) multi-class (N = 5) flickering BCI system. Estimating cross-frequency coupling (CFC), namely δ-θ [δ: (0.5-4 Hz), θ: (4-8 Hz)] phase-to-amplitude coupling (PAC), within sensors and across experimental time, we succeeded in achieving high classification accuracy and information transfer rates (ITR) in the three data sets. Our approach outperformed the originally presented ITR on the three data sets. The bit rates obtained for both the disabled and able-bodied subjects reached the fastest reported level of 324 bits/min with the PAC estimator. Additionally, our approach outperformed alternative signal features such as the relative power (29.73 bits/min) and raw time series analysis (24.93 bits/min), as well as the originally reported bit rates of 10-25 bits/min. In the second data set, we achieved an average ITR of 124.40 ± 11.68 for the slow 60 Hz condition and an average ITR of 233.99 ± 15.75 for the fast 120 Hz condition. In the third data set, we achieved an average ITR of 106.44 ± 8.94. The current methodology outperforms previous methodologies applied to each of the three freely available BCI data sets.
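One common way to estimate δ-θ phase-to-amplitude coupling of the kind used above is the mean-vector-length approach via the Hilbert transform; the following is a generic PAC sketch, not the authors' exact estimator, with the sampling rate and signal length assumed:

    # Generic phase-amplitude coupling (PAC) estimate for one EEG sensor:
    # delta phase modulating theta amplitude, via band-pass + Hilbert.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def bandpass(x, lo, hi, fs, order=4):
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x)

    fs = 256.0                                  # assumed sampling rate
    x = np.random.randn(int(30 * fs))           # placeholder 30 s EEG trace
    phase = np.angle(hilbert(bandpass(x, 0.5, 4.0, fs)))   # delta phase
    amp = np.abs(hilbert(bandpass(x, 4.0, 8.0, fs)))       # theta amplitude

    # Mean vector length (Canolty-style PAC): |<A(t) exp(i*phi(t))>|
    pac = np.abs(np.mean(amp * np.exp(1j * phase)))
    print("delta-theta PAC:", pac)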
Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong
2012-01-01
Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95% CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95% CI: 96.60-99.37%) for 1,094 brown algae queries, both using ITS barcodes.
The cost-constrained traveling salesman problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sokkappa, P.R.
1990-10-01
The Cost-Constrained Traveling Salesman Problem (CCTSP) is a variant of the well-known Traveling Salesman Problem (TSP). In the TSP, the goal is to find a tour of a given set of cities such that the total cost of the tour is minimized. In the CCTSP, each city is given a value, and a fixed cost-constraint is specified. The objective is to find a subtour of the cities that achieves maximum value without exceeding the cost-constraint. Thus, unlike the TSP, the CCTSP requires both selection and sequencing. As a consequence, most results for the TSP cannot be extended to the CCTSP. We show that the CCTSP is NP-hard and that no K-approximation algorithm or fully polynomial approximation scheme exists, unless P = NP. We also show that several special cases are polynomially solvable. Algorithms for the CCTSP, which outperform previous methods, are developed in three areas: upper bounding methods, exact algorithms, and heuristics. We found that a bounding strategy based on the knapsack problem performs better, both in speed and in the quality of the bounds, than methods based on the assignment problem. Likewise, we found that a branch-and-bound approach using the knapsack bound was superior to a method based on a common branch-and-bound method for the TSP. In our study of heuristic algorithms, we found that, when selecting nodes for inclusion in the subtour, it is important to consider the "neighborhood" of the nodes. A node with low value that brings the subtour near many other nodes may be more desirable than an isolated node of high value. We found two types of repetition to be desirable: repetitions based on randomization in the subtour building process, and repetitions encouraging the inclusion of different subsets of the nodes. By varying the number and type of repetitions, we can adjust the computation time required by our method to obtain algorithms that outperform previous methods.
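To make the selection-plus-sequencing flavor concrete, here is a toy greedy heuristic in the spirit described, picking the node with the best value per added cost until the budget binds; it is purely illustrative and omits the paper's knapsack bounds and randomized repetitions:

    # Toy CCTSP heuristic: greedily append the node with the best
    # value-per-added-cost while the cost constraint still holds.
    def greedy_cctsp(values, dist, budget, depot=0):
        tour, cost = [depot], 0.0
        remaining = set(range(len(values))) - {depot}
        while remaining:
            last = tour[-1]
            # incremental cycle cost of visiting v next, then returning home
            added = lambda v: dist[last][v] + dist[v][depot] - dist[last][depot]
            feasible = [v for v in remaining if cost + added(v) <= budget]
            if not feasible:
                break
            v = max(feasible, key=lambda v: values[v] / max(added(v), 1e-9))
            cost += added(v)
            tour.append(v)
            remaining.remove(v)
        return tour, cost

    # Example: 4 cities, symmetric distances, budget 10.
    values = [0, 5, 3, 8]
    dist = [[0, 4, 2, 5], [4, 0, 3, 6], [2, 3, 0, 4], [5, 6, 4, 0]]
    print(greedy_cctsp(values, dist, budget=10))   # -> ([0, 3], 10.0)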
ANN modeling of DNA sequences: new strategies using DNA shape code.
Parbhane, R V; Tambe, S S; Kulkarni, B D
2000-09-01
Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.
Classifying medical relations in clinical text via convolutional neural networks.
He, Bin; Guan, Yi; Dai, Rui
2018-05-16
Deep learning research on relation classification has achieved solid performance in the general domain. This study proposes a convolutional neural network (CNN) architecture with a multi-pooling operation for medical relation classification on clinical records, and explores a loss function with a category-level constraint matrix. Experiments using the 2010 i2b2/VA relation corpus demonstrate that these models, which do not depend on any external features, outperform previous single-model methods, and that our best model is competitive with the existing ensemble-based method. Copyright © 2018. Published by Elsevier B.V.
Efficient calibration for imperfect computer models
Tuo, Rui; Wu, C. F. Jeff
2015-12-01
Many computer models contain unknown parameters which need to be estimated using physical observations. Furthermore, calibration methods based on Gaussian process models may lead to unreasonable estimates for imperfect computer models. In this work, we extend this line of study to calibration problems with stochastic physical data. We propose a novel method, called the L2 calibration, and show its semiparametric efficiency. The conventional method of ordinary least squares is also studied. Theoretical analysis shows that it is consistent but not efficient. Numerical examples show that the proposed method outperforms the existing ones.
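In outline, the L2 calibration chooses the parameter that brings the computer model closest to the true process in L2 distance (schematic form; the precise construction is in the paper):

    \hat{\theta} = \arg\min_{\theta} \big\| \hat{\zeta}(\cdot) - f(\cdot, \theta) \big\|_{L_2(\Omega)},

where \hat{\zeta} is a nonparametric estimate of the true physical process fitted to the observations, and f(x, \theta) is the computer model output at input x and parameter \theta.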
Can Diastat Grafts Meet the Challenges of Daily Punctures?
Chandran, Prem K G; Messer, Diane; Sidwell, Richard A; Stubbs, David H; Nish, Andrew D
1997-01-01
To determine whether Diastat grafts can meet the challenges of daily needle punctures required for home hemodialysis (HD), a retrospective analysis was performed on the experience with 47 grafts placed in 44 patients receiving HD three times a week. The control group consisted of 17 patients who received 17 stretch polytetrafluoroethylene (s-PTFE) grafts. Apart from their ability to better contain bleeding after needle withdrawal, in all measures of longevity the Diastat grafts were outperformed by the s-PTFE grafts. No more direct data exist to address the original challenge.
Optimized atom position and coefficient coding for matching pursuit-based image compression.
Shoa, Alireza; Shirani, Shahram
2009-12-01
In this paper, we propose a new encoding algorithm for matching pursuit image coding. We show that coding performance is improved when correlations between atom positions and atom coefficients are both used in encoding. We find the optimum tradeoff between efficient atom position coding and efficient atom coefficient coding and optimize the encoder parameters. Our proposed algorithm outperforms the existing coding algorithms designed for matching pursuit image coding. Additionally, we show that our algorithm results in better rate distortion performance than JPEG 2000 at low bit rates.
Li, Ben; Sun, Zhaonan; He, Qing; Zhu, Yu; Qin, Zhaohui S
2016-03-01
Modern high-throughput biotechnologies such as microarrays are capable of producing a massive amount of information for each sample. However, in a typical high-throughput experiment, only a limited number of samples are assayed, giving rise to the classical 'large p, small n' problem. On the other hand, rapid propagation of these high-throughput technologies has resulted in a substantial collection of data, often carried out on the same platform and using the same protocol. It is highly desirable to utilize the existing data when performing analysis and inference on a new dataset. Utilizing existing data can be carried out in a straightforward fashion under the Bayesian framework, in which the repository of historical data can be exploited to build informative priors and used in new data analysis. In this work, using microarray data, we investigate the feasibility and effectiveness of deriving informative priors from historical data and using them in the problem of detecting differentially expressed genes. Through simulation and real data analysis, we show that the proposed strategy significantly outperforms existing methods, including the popular and state-of-the-art Bayesian hierarchical model-based approaches. Our work illustrates the feasibility and benefits of exploiting the increasingly available genomics big data in statistical inference, and presents a promising practical strategy for dealing with the 'large p, small n' problem. Our method is implemented in the R package IPBT, which is freely available from https://github.com/benliemory/IPBT. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering
Sun, Peng; Speicher, Nora K.; Röttger, Richard; Guo, Jiong; Baumbach, Jan
2014-01-01
The explosion of biological data has dramatically transformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic: 'Bi-Force'. It is based on the weighted bicluster editing model and performs biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed the biclustering evaluation protocol of a recent review paper by Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279–292) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force, at least when following the evaluation protocols of Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de. PMID:24682815
Optimal design of stimulus experiments for robust discrimination of biochemical reaction networks.
Flassig, R J; Sundmacher, K
2012-12-01
Biochemical reaction networks in the form of coupled ordinary differential equations (ODEs) provide a powerful modeling tool for understanding the dynamics of biochemical processes. During the early phase of modeling, scientists have to deal with a large pool of competing nonlinear models. At this point, discrimination experiments can be designed and conducted to obtain optimal data for selecting the most plausible model. Since biological ODE models have widely distributed parameters due to, e.g., biological variability or experimental variations, model responses become distributed. Therefore, a robust optimal experimental design (OED) for model discrimination can be used to discriminate models based on their response probability distribution functions (PDFs). In this work, we present an optimal control-based methodology for designing optimal stimulus experiments aimed at robust model discrimination. For estimating the time-varying model response PDF, which results from the nonlinear propagation of the parameter PDF under the ODE dynamics, we suggest using the sigma-point approach. Using the model overlap (expected likelihood) as a robust discrimination criterion to measure dissimilarities between expected model response PDFs, we benchmark the proposed nonlinear design approach against linearization with respect to prediction accuracy and design quality for two nonlinear biological reaction networks. As shown, the sigma-point approach outperforms the linearization approach in the case of widely distributed parameter sets and/or existing multiple steady states. Since the sigma-point approach scales linearly with the number of model parameters, it can be applied to large systems for robust experimental planning. An implementation of the method in MATLAB/AMPL is available at http://www.uni-magdeburg.de/ivt/svt/person/rf/roed.html. Contact: flassig@mpi-magdeburg.mpg.de. Supplementary data are available at Bioinformatics online.
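The sigma-point idea replaces linearization with a small deterministic sample of parameter points propagated through the full nonlinear model; a minimal generic sketch of the unscented transform follows (standard weighting with a spread parameter kappa, not the authors' exact scheme):

    # Generic sigma-point (unscented) propagation of a parameter PDF
    # through a nonlinear model response g(theta).
    import numpy as np

    def sigma_point_propagate(mean, cov, g, kappa=1.0):
        n = len(mean)
        L = np.linalg.cholesky((n + kappa) * cov)
        pts = [mean] + [mean + L[:, i] for i in range(n)] \
                     + [mean - L[:, i] for i in range(n)]
        w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
        w[0] = kappa / (n + kappa)
        ys = np.array([g(p) for p in pts])
        y_mean = w @ ys
        y_cov = sum(wi * np.outer(yi - y_mean, yi - y_mean)
                    for wi, yi in zip(w, ys))
        return y_mean, y_cov

    # Example: propagate a 2-parameter PDF through a toy nonlinear response.
    m, C = np.array([1.0, 0.5]), np.diag([0.04, 0.01])
    ym, yC = sigma_point_propagate(m, C, lambda p: np.array([p[0] * np.exp(-p[1])]))
    print(ym, yC)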
Surgical gesture classification from video and kinematic data.
Zappella, Luca; Béjar, Benjamín; Hager, Gregory; Vidal, René
2013-10-01
Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on dynamic cues (e.g., time to completion, speed, forces, torque) or kinematic data (e.g., robot trajectories and velocities). While videos could be equally or more discriminative (e.g., videos contain semantic information not present in kinematic data), they are typically not used because of the difficulties associated with automatic video interpretation. In this paper, we propose several methods for automatic surgical gesture classification from video data. We assume that the video of a surgical task (e.g., suturing) has been segmented into video clips corresponding to a single gesture (e.g., grabbing the needle, passing the needle) and propose three methods to classify the gesture of each video clip. In the first one, we model each video clip as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second one, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words, and use a bag-of-features (BoF) approach to classify new video clips. In the third one, we use multiple kernel learning (MKL) to combine the LDS and BoF approaches. Since the LDS approach is also applicable to kinematic data, we also use MKL to combine both types of data in order to exploit their complementarity. Our experiments on a typical surgical training setup show that methods based on video data perform equally well, if not better, than state-of-the-art approaches based on kinematic data. In turn, the combination of both kinematic and video data outperforms any other algorithm based on one type of data alone. Copyright © 2013 Elsevier B.V. All rights reserved.
Using the Logarithm of Odds to Define a Vector Space on Probabilistic Atlases
Pohl, Kilian M.; Fisher, John; Bouix, Sylvain; Shenton, Martha; McCarley, Robert W.; Grimson, W. Eric L.; Kikinis, Ron; Wells, William M.
2007-01-01
The Logarithm of the Odds ratio (LogOdds) is frequently used in areas such as artificial neural networks, economics, and biology, as an alternative representation of probabilities. Here, we use LogOdds to place probabilistic atlases in a linear vector space. This representation has several useful properties for medical imaging. For example, it not only encodes the shape of multiple anatomical structures but also captures some information concerning uncertainty. We demonstrate that the resulting vector space operations of addition and scalar multiplication have natural probabilistic interpretations. We discuss several examples for placing label maps into the space of LogOdds. First, we relate signed distance maps, a widely used implicit shape representation, to LogOdds and compare it to an alternative that is based on smoothing by spatial Gaussians. We find that the LogOdds approach better preserves shapes in a complex multiple object setting. In the second example, we capture the uncertainty of boundary locations by mapping multiple label maps of the same object into the LogOdds space. Third, we define a framework for non-convex interpolations among atlases that capture different time points in the aging process of a population. We evaluate the accuracy of our representation by generating a deformable shape atlas that captures the variations of anatomical shapes across a population. The deformable atlas is the result of a principal component analysis within the LogOdds space. This atlas is integrated into an existing segmentation approach for MR images. We compare the performance of the resulting implementation in segmenting 20 test cases to a similar approach that uses a more standard shape model that is based on signed distance maps. On this data set, the Bayesian classification model with our new representation outperformed the other approaches in segmenting subcortical structures. PMID:17698403
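In its simplest binary form, the LogOdds map and its inverse are the standard logit and logistic functions:

    \operatorname{logit}(p) = \log\frac{p}{1-p}, \qquad \operatorname{logit}^{-1}(t) = \frac{1}{1+e^{-t}},

so a probabilistic atlas is carried voxel-wise into the vector space by the logit, addition and scalar multiplication are performed there, and results are mapped back through the logistic function; for multiple labels the generalization uses log(p_k/p_K) against a reference label, with the softmax as inverse.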
NASA Astrophysics Data System (ADS)
McInerney, David; Thyer, Mark; Kavetski, Dmitri; Kuczera, George
2017-04-01
This study provides guidance to hydrological researchers that enables them to provide probabilistic predictions of daily streamflow with the best reliability and precision for different catchment types (e.g. high/low degree of ephemerality). Reliable and precise probabilistic prediction of daily catchment-scale streamflow requires statistical characterization of the residual errors of hydrological models. It is commonly known that hydrological model residual errors are heteroscedastic, i.e. there is a pattern of larger errors in higher streamflow predictions. Although multiple approaches exist for representing this heteroscedasticity, few studies have undertaken a comprehensive evaluation and comparison of these approaches. This study fills this research gap by evaluating 8 common residual error schemes, including standard and weighted least squares, the Box-Cox transformation (with fixed and calibrated power parameter, lambda) and the log-sinh transformation. Case studies include 17 perennial and 6 ephemeral catchments in Australia and the USA, and two lumped hydrological models. We find the choice of heteroscedastic error modelling approach significantly impacts predictive performance, though no single scheme simultaneously optimizes all performance metrics. The set of Pareto optimal schemes, reflecting performance trade-offs, comprises Box-Cox schemes with lambda of 0.2 and 0.5, and the log scheme (lambda=0, perennial catchments only). These schemes significantly outperform even the average-performing remaining schemes (e.g., across ephemeral catchments, median precision tightens from 105% to 40% of observed streamflow, and median biases decrease from 25% to 4%). Theoretical interpretations of the empirical results highlight the importance of capturing the skew/kurtosis of the raw residuals and reproducing zero flows. Recommendations for researchers and practitioners seeking robust residual error schemes for practical work are provided.
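The Box-Cox schemes referred to above transform flows before assuming homoscedastic Gaussian residuals; in the standard form (with a small offset c sometimes added to accommodate zero flows):

    z = \frac{(q + c)^{\lambda} - 1}{\lambda} \quad (\lambda \neq 0), \qquad z = \log(q + c) \quad (\lambda = 0),

so the Pareto optimal schemes above correspond to lambda = 0.2 and 0.5, and the log scheme to the lambda = 0 limit.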
On the enhanced detectability of GPS anomalous behavior with relative entropy
NASA Astrophysics Data System (ADS)
Cho, Jeongho
2016-10-01
A standard receiver autonomous integrity monitoring (RAIM) technique for the global positioning system (GPS) provides an integrity monitoring capability for safety-critical GPS applications, such as civil aviation from the en-route (ER) phase through non-precision approach (NPA) or lateral navigation (LNAV). The performance of the existing RAIM method, however, may not meet the more stringent aviation requirements for availability and integrity during the precision approach and landing phases of flight, due to insufficient observables and/or untimely warning to the user beyond a specified time-to-alert in the event of a significant GPS failure. This has motivated an enhanced RAIM architecture that meets stricter integrity requirements by greatly decreasing the detection time when a satellite failure or a measurement error has occurred. We thus devised a user integrity monitor capable of identifying a GPS failure more rapidly than a standard RAIM scheme by combining the RAIM with the relative entropy, a likelihood-ratio approach to assessing the inconsistency between two data streams that is quite different from a Euclidean distance. In addition, delay-coordinate embedding is applied as a preprocessing step to associate the discriminant measure obtained from the RAIM with the relative entropy in the new RAIM design. Simulation results demonstrate that the proposed user integrity monitor outperforms the standard RAIM, with a higher detection rate for anomalies that could be hazardous to users in the approach or landing phase, and is a very promising alternative for the detection of deviations in the GPS signal. The comparison also shows that it can catch even small anomalous gradients more rapidly than a typical user integrity monitor.
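For reference, the relative entropy (Kullback-Leibler divergence) used as the discriminant measure has the standard discrete form:

    D(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)},

which is nonnegative, asymmetric, and zero only when the two distributions coincide, making it a sensitive measure of inconsistency between a nominal and an observed residual stream.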
SPAR: a security- and power-aware routing protocol for wireless ad hoc and sensor networks
NASA Astrophysics Data System (ADS)
Oberoi, Vikram; Chigan, Chunxiao
2005-05-01
Wireless Ad Hoc and Sensor Networks (WAHSNs) are vulnerable to extensive attacks as well as severe resource constraints. To fulfill the security needs, many security enhancements have been proposed. Likewise, from the resource-constraint perspective, many power-aware schemes have been proposed to save battery power. However, we observe that for the severely resource-limited and extremely vulnerable WAHSNs, taking security or power (or any other resource) alone into consideration for protocol design is inadequate for truly "secure-and-useful" WAHSNs. For example, from the resource-constraint perspective, we identify one of the potential problems, the Security-Capable-Congestion (SCC) behavior, for WAHSNs routing protocols in which only security is considered. On the other hand, the design approach in which only the scarce resources are considered, as in many power-aware WAHSNs protocols, leaves security unaddressed and is undesirable for many WAHSNs application scenarios. Motivated by these observations, we propose a co-design approach in which both high security and effective resource consumption are targeted in WAHSNs protocol design. Specifically, we propose a novel routing protocol, the Security- and Power-Aware Routing (SPAR) protocol, based on this co-design approach. In SPAR, routing decisions are made based on both security and power as routing criteria. The SPAR mechanism is routing protocol independent and can therefore be integrated into any of the existing WAHSNs routing protocols. The simulation results show that SPAR significantly outperforms WAHSNs routing protocols in which security or power alone is considered. This finding demonstrates that the proposed security- and resource-aware co-design approach is promising for truly "secure-and-useful" WAHSNs.
Web Image Search Re-ranking with Click-based Similarity and Typicality.
Yang, Xiaopeng; Mei, Tao; Zhang, Yong Dong; Liu, Jie; Satoh, Shin'ichi
2016-07-20
In image search re-ranking, besides the well-known semantic gap, the intent gap, which is the gap between the representation of users' query/demand and the real intent of the users, is becoming a major problem restricting the development of image retrieval. To reduce human effects, in this paper, we use image click-through data, which can be viewed as "implicit feedback" from users, to help overcome the intent gap and further improve image search performance. Generally, the hypothesis that visually similar images should be close in a ranking list and the strategy that images with higher relevance should be ranked higher than others are widely accepted. Image similarity and relevance typicality are thus the determining factors in obtaining satisfying search results. However, when measuring image similarity and typicality, conventional re-ranking approaches only consider visual information and the initial ranks of images, while overlooking the influence of click-through data. This paper presents a novel re-ranking approach, named spectral clustering re-ranking with click-based similarity and typicality (SCCST). First, to learn an appropriate similarity measurement, we propose a click-based multi-feature similarity learning algorithm (CMSL), which conducts metric learning based on click-based triplet selection, and integrates multiple features into a unified similarity space via multiple kernel learning. Then, based on the learnt click-based image similarity measure, we conduct spectral clustering to group visually and semantically similar images into the same clusters, and obtain the final re-ranked list by calculating click-based cluster typicality and within-cluster click-based image typicality in descending order. Our experiments conducted on two real-world query-image datasets with diverse representative queries show that our proposed re-ranking approach can significantly improve initial search results, and outperforms several existing re-ranking approaches.
Choice-Based Conjoint Analysis: Classification vs. Discrete Choice Models
NASA Astrophysics Data System (ADS)
Giesen, Joachim; Mueller, Klaus; Taneva, Bilyana; Zolliker, Peter
Conjoint analysis is a family of techniques that originated in psychology and later became popular in market research. The main objective of conjoint analysis is to measure an individual's or a population's preferences on a class of options that can be described by parameters and their levels. We consider preference data obtained in choice-based conjoint analysis studies, where one observes test persons' choices on small subsets of the options. There are many ways to analyze choice-based conjoint analysis data. Here we discuss the intuition behind a classification-based approach, and compare this approach to one based on statistical assumptions (discrete choice models) and to a regression approach. Our comparison on real and synthetic data indicates that the classification approach outperforms the discrete choice models.
Improving Hospital-wide Patient Scheduling Decisions by Clinical Pathway Mining.
Gartner, Daniel; Arnolds, Ines V; Nickel, Stefan
2015-01-01
Recent research has highlighted the need to solve hospital-wide patient scheduling problems. In patient scheduling, patient activities have to be scheduled on scarce hospital resources such that temporal relations between activities (e.g. recovery times) are respected. A common objective is, among others, the minimization of the length of stay (LOS). In this paper, we consider a hospital-wide patient scheduling problem with LOS minimization based on uncertain clinical pathways. We approach the problem in three stages: first, we learn the most likely clinical pathways using a sequential pattern mining approach; second, we provide a mathematical model for patient scheduling; and finally, we combine the two approaches. In an experimental study carried out using real-world data, we show that our approach outperforms baseline approaches on two metrics.
Mixed Criticality Scheduling for Industrial Wireless Sensor Networks
Jin, Xi; Xia, Changqing; Xu, Huiting; Wang, Jintao; Zeng, Peng
2016-01-01
Wireless sensor networks (WSNs) have been widely used in industrial systems. Their real-time performance and reliability are fundamental to industrial production. Many works have studied the two aspects, but only focus on single criticality WSNs. Mixed criticality requirements exist in many advanced applications in which different data flows have different levels of importance (or criticality). In this paper, first, we propose a scheduling algorithm, which guarantees the real-time performance and reliability requirements of data flows with different levels of criticality. The algorithm supports centralized optimization and adaptive adjustment. It is able to improve both the scheduling performance and flexibility. Then, we provide the schedulability test through rigorous theoretical analysis. We conduct extensive simulations, and the results demonstrate that the proposed scheduling algorithm and analysis significantly outperform existing ones. PMID:27589741
Modeling Aromatic Liquids: Toluene, Phenol, and Pyridine.
Baker, Christopher M; Grant, Guy H
2007-03-01
Aromatic groups are now acknowledged to play an important role in many systems of interest. However, existing molecular mechanics methods provide a poor representation of these groups. In a previous paper, we showed that the molecular mechanics treatment of benzene can be improved by the incorporation of an explicit representation of the aromatic π electrons. Here, we take this concept further, developing charge-separation models for toluene, phenol, and pyridine. Monte Carlo simulations are used to parametrize the models via the reproduction of experimental thermodynamic data, and our models are shown to outperform an existing atom-centered model. The models are then used to make predictions about the structures of the liquids at the molecular level, and are tested further through their application to the modeling of gas-phase dimers and cation-π interactions.
FIND: difFerential chromatin INteractions Detection using a spatial Poisson process.
Djekidel, Mohamed Nadhir; Chen, Yang; Zhang, Michael Q
2018-02-12
Polymer-based simulations and experimental studies indicate the existence of a spatial dependency between the adjacent DNA fibers involved in the formation of chromatin loops. However, the existing strategies for detecting differential chromatin interactions assume that the interacting segments are spatially independent from the other segments nearby. To resolve this issue, we developed a new computational method, FIND, which considers the local spatial dependency between interacting loci. FIND uses a spatial Poisson process to detect differential chromatin interactions that show a significant difference in their interaction frequency and the interaction frequency of their neighbors. Simulation and biological data analysis show that FIND outperforms the widely used count-based methods and has a better signal-to-noise ratio. © 2018 Djekidel et al.; Published by Cold Spring Harbor Laboratory Press.
Cooperative Spatial Retreat for Resilient Drone Networks.
Kang, Jin-Hyeok; Kwon, Young-Min; Park, Kyung-Joon
2017-05-03
Drones are broadening their scope to various applications such as networking, package delivery, agriculture, rescue, and many more. For proper operation of drones, reliable communication should be guaranteed because drones are remotely controlled. When drones experience communication failure due to bad channel condition, interference, or jamming in a certain area, one existing solution is to exploit mobility or so-called spatial retreat to evacuate them from the communication failure area. However, the conventional spatial retreat scheme moves drones in random directions, which results in inefficient movement with significant evacuation time and waste of battery lifetime. In this paper, we propose a novel spatial retreat technique that takes advantage of cooperation between drones for resilient networking, which is called cooperative spatial retreat (CSR). Our performance evaluation shows that the proposed CSR significantly outperforms existing schemes.
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
NASA Astrophysics Data System (ADS)
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large, well-balanced datasets, maximum likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely, or at least produce poor results, in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate the parameters of multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the maximum likelihood (ML) approach. Results show that the newly proposed model outperforms ML for small datasets.
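For reference, the multinomial logistic model being estimated assigns category probabilities via the standard logit form (with category K as the baseline):

    P(y = k \mid x) = \frac{\exp(\beta_k^{\top} x)}{1 + \sum_{j=1}^{K-1} \exp(\beta_j^{\top} x)}, \quad k = 1, \dots, K-1,

and the proposed approach treats the coefficient vectors \beta_k as fuzzy numbers whose spreads are determined by the multi-objective program rather than by maximizing the likelihood.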
A Novel Local Learning based Approach With Application to Breast Cancer Diagnosis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Songhua; Tourassi, Georgia
2012-01-01
The purpose of this study is to develop and evaluate a novel local learning-based approach for computer-assisted diagnosis of breast cancer. Our new local learning-based algorithm, which uses the linear logistic regression method as its base learner, is described. The algorithm performs a stochastic search, via a random walk process, until the total allowed computing time is used up, identifying the most suitable population subdivision scheme and the corresponding individual base learners. The proposed local learning-based approach was applied to the prediction of breast cancer given 11 mammographic and clinical findings reported by physicians using the BI-RADS lexicon. Our database consisted of 850 patients with biopsy-confirmed diagnoses (290 malignant and 560 benign). We also compared the performance of our method with a collection of publicly available state-of-the-art machine learning methods. Predictive performance for all classifiers was evaluated using 10-fold cross validation and Receiver Operating Characteristic (ROC) analysis. Figure 1 of the report compares the performance of 54 machine learning methods implemented in the machine learning toolkit Weka (version 3.0). We introduced a novel local learning-based classifier and compared it with an extensive list of other classifiers for the problem of breast cancer diagnosis. Our experiments show that the algorithm achieves superior prediction performance, outperforming a wide range of other well-established machine learning techniques. Our conclusion complements the existing understanding in the machine learning field that local learning may capture complicated, non-linear relationships exhibited by real-world datasets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grambow, Colin A.; Jamal, Adeel; Li, Yi-Pei
2017-12-22
Ketohydroperoxides are important in liquid-phase autoxidation and in gas-phase partial oxidation and pre-ignition chemistry, but because of their low concentration, instability, and various analytical chemistry limitations, it has been challenging to experimentally determine their reactivity, and only a few pathways are known. In the present work, 75 elementary-step unimolecular reactions of the simplest γ-ketohydroperoxide, 3-hydroperoxypropanal, were discovered by a combination of density functional theory with several automated transition-state search algorithms: the Berny algorithm coupled with the freezing string method, single- and double-ended growing string methods, the heuristic KinBot algorithm, and the single-component artificial force induced reaction method (SC-AFIR). The present joint approach significantly outperforms previous manual and automated transition-state searches – 68 of the reactions of γ-ketohydroperoxide discovered here were previously unknown and completely unexpected. All of the methods found the lowest-energy transition state, which corresponds to the first step of the Korcek mechanism, but each algorithm except for SC-AFIR detected several reactions not found by any of the other methods. We show that the low-barrier chemical reactions involve promising new chemistry that may be relevant in atmospheric and combustion systems. Our study highlights the complexity of chemical space exploration and the advantage of combined application of several approaches. Altogether, the present work demonstrates both the power and the weaknesses of existing fully automated approaches for reaction discovery, which suggest possible directions for further method development and assessment in order to enable reliable discovery of all important reactions of any specified reactant(s).
An accelerated image matching technique for UAV orthoimage registration
NASA Astrophysics Data System (ADS)
Tsai, Chung-Hsien; Lin, Yu-Ching
2017-06-01
Using an Unmanned Aerial Vehicle (UAV) drone with an attached non-metric camera has become a popular low-cost approach for collecting geospatial data. A well-georeferenced orthoimage is a fundamental product for geomatics professionals. To achieve high positioning accuracy of orthoimages, precise sensor position and orientation data, or a number of ground control points (GCPs), are often required. Alternatively, image registration is a solution for improving the accuracy of a UAV orthoimage, as long as a historical reference image is available. This study proposes a registration scheme, including an Accelerated Binary Robust Invariant Scalable Keypoints (ABRISK) algorithm and spatial analysis of corresponding control points for image registration. To determine a match between two input images, feature descriptors from one image are compared with those from another image. A "Sorting Ring" is used to filter out incorrect feature pairs as early as possible in the feature matching stage, to speed up the matching process. The results demonstrate that the proposed ABRISK approach outperforms the vector-based Scale Invariant Feature Transform (SIFT) approach where radiometric variations exist. ABRISK is 19.2 times and 312 times faster than SIFT for image sizes of 1000 × 1000 pixels and 4000 × 4000 pixels, respectively, and 4.7 times faster than Binary Robust Invariant Scalable Keypoints (BRISK). Furthermore, the positional accuracy of the UAV orthoimage after applying the proposed image registration scheme is improved by an average root mean square error (RMSE) of 2.58 m for six test orthoimages whose spatial resolutions vary from 6.7 cm to 10.7 cm.
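For orientation, standard BRISK matching with OpenCV looks as follows; this is a baseline sketch with hypothetical file paths, and the ABRISK acceleration and "Sorting Ring" pre-filter are the paper's additions, not implemented here:

    # Baseline BRISK feature matching with OpenCV. ABRISK's Sorting Ring
    # would discard bad pairs before this brute-force matching stage.
    import cv2

    img1 = cv2.imread("uav_orthoimage.tif", cv2.IMREAD_GRAYSCALE)   # hypothetical path
    img2 = cv2.imread("reference_image.tif", cv2.IMREAD_GRAYSCALE)  # hypothetical path

    brisk = cv2.BRISK_create()
    kp1, des1 = brisk.detectAndCompute(img1, None)
    kp2, des2 = brisk.detectAndCompute(img2, None)

    # Binary descriptors -> Hamming distance; cross-check enforces symmetry.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    print(len(matches), "tentative control-point pairs")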
A Particle Batch Smoother Approach to Snow Water Equivalent Estimation
NASA Technical Reports Server (NTRS)
Margulis, Steven A.; Girotto, Manuela; Cortes, Gonzalo; Durand, Michael
2015-01-01
This paper presents a newly proposed data assimilation method for historical snow water equivalent (SWE) estimation using remotely sensed fractional snow-covered area (fSCA). The newly proposed approach consists of a particle batch smoother (PBS), which is compared to a previously applied Kalman-based ensemble batch smoother (EnBS) approach. The methods were applied over the 27-yr Landsat 5 record at snow pillow and snow course in situ verification sites in the American River basin in the Sierra Nevada (United States). This basin is more densely vegetated, and thus more challenging for SWE estimation, than the previous applications of the EnBS. Both data assimilation methods provided significant improvement over the prior (modeling only) estimates, with both able to significantly reduce prior SWE biases. The prior RMSE values at the snow pillow and snow course sites were reduced by 68%-82% and 60%-68%, respectively, when applying the data assimilation methods. This result is encouraging for a basin like the American, where the moderate to high forest cover will necessarily obscure more of the snow-covered ground surface than in previously examined, less-vegetated basins. The PBS generally outperformed the EnBS: for snow pillows the PBS RMSE was approximately 54% of that seen in the EnBS, while for snow courses the PBS RMSE was approximately 79% of the EnBS value. Sensitivity tests show relative insensitivity of both the PBS and EnBS results to ensemble size and fSCA measurement error, but a higher sensitivity of the EnBS to the mean prior precipitation input, especially in the case where significant prior biases exist.
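At its core, a particle batch smoother reweights an ensemble of model trajectories by the likelihood of the whole batch of fSCA observations; the following schematic assumes Gaussian observation errors and placeholder data, and is not the authors' full implementation:

    # Schematic PBS update: each particle carries a full trajectory; weights
    # come from the batch likelihood of all fSCA observations at once.
    import numpy as np

    def pbs_weights(fsca_pred, fsca_obs, obs_std):
        # fsca_pred: (n_particles, n_obs); fsca_obs: (n_obs,)
        resid = fsca_pred - fsca_obs
        loglik = -0.5 * np.sum((resid / obs_std) ** 2, axis=1)
        w = np.exp(loglik - loglik.max())       # stabilize before normalizing
        return w / w.sum()

    n_particles, n_obs = 100, 27
    fsca_pred = np.random.rand(n_particles, n_obs)   # placeholder ensemble
    fsca_obs = np.random.rand(n_obs)                 # placeholder Landsat fSCA
    w = pbs_weights(fsca_pred, fsca_obs, obs_std=0.1)

    # Posterior SWE estimate = weighted average of the particles' prior SWE.
    swe_particles = np.random.rand(n_particles)      # placeholder prior SWE
    print("posterior SWE:", float(w @ swe_particles))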
Convolutional Dictionary Learning: Acceleration and Convergence
NASA Astrophysics Data System (ADS)
Chun, Il Yong; Fessler, Jeffrey A.
2018-04-01
Convolutional dictionary learning (CDL or sparsifying CDL) has many applications in image processing and computer vision. There has been growing interest in developing efficient algorithms for CDL, mostly relying on the augmented Lagrangian (AL) method or the variant alternating direction method of multipliers (ADMM). When their parameters are properly tuned, AL methods have shown fast convergence in CDL. However, the parameter tuning process is not trivial due to its data dependence and, in practice, the convergence of AL methods depends on the AL parameters for nonconvex CDL problems. To moderate these problems, this paper proposes a new practically feasible and convergent Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The BPG-M-based CDL is investigated with different block updating schemes and majorization matrix designs, and further accelerated by incorporating some momentum coefficient formulas and restarting techniques. All of the methods investigated incorporate a boundary artifacts removal (or, more generally, sampling) operator in the learning model. Numerical experiments show that, without needing any parameter tuning process, the proposed BPG-M approach converges more stably to desirable solutions of lower objective values than the existing state-of-the-art ADMM algorithm and its memory-efficient variant do. Compared to the ADMM approaches, the BPG-M method using a multi-block updating scheme is particularly useful in single-threaded CDL algorithms handling large datasets, due to its lower memory requirement and no polynomial computational complexity. Image denoising experiments show that, for relatively strong additive white Gaussian noise, the filters learned by BPG-M-based CDL outperform those trained by the ADMM approach.
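For context, the sparsifying CDL training problem has the standard form (boundary handling via the sampling operator is suppressed in this schematic):

    \min_{\{d_k\},\{z_{l,k}\}} \sum_{l} \frac{1}{2}\Big\| y_l - \sum_{k} d_k * z_{l,k} \Big\|_2^2 + \alpha \sum_{l,k} \|z_{l,k}\|_1 \quad \text{s.t.}\ \|d_k\|_2 \le 1,

where * denotes convolution, the d_k are the learned filters, and the z_{l,k} are sparse codes; BPG-M alternates proximal gradient steps over blocks of filters and codes, with a majorization matrix taking the role of the AL penalty parameter.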
Volkova, Svitlana; Ayton, Ellyn; Porterfield, Katherine; ...
2017-12-15
This work is the first to take advantage of recurrent neural networks to predict influenza-like-illness (ILI) dynamics from various linguistic signals extracted from social media data. Unlike other approaches that rely on time-series analysis of historical ILI data [1, 2] and the state-of-the-art machine learning models [3, 4], we build and evaluate the predictive power of Long Short Term Memory (LSTM) architectures capable of nowcasting (predicting in "real-time") and forecasting (predicting the future) ILI dynamics in the 2011-2014 influenza seasons. To build our models we integrate information people post in social media, e.g., topics, stylistic and syntactic patterns, emotions and opinions, and communication behavior. We then quantitatively evaluate the predictive power of different social media signals and contrast the performance of state-of-the-art regression models with neural networks. Finally, we combine ILI and social media signals to build joint neural network models for ILI dynamics prediction. Unlike the majority of the existing work, we specifically focus on developing models for local rather than national ILI surveillance [1], specifically for military rather than general populations [3], in 26 U.S. and six international locations. Our approach demonstrates several advantages: (a) Neural network models learned from social media data yield the best performance compared to previously used regression models. (b) Previously under-explored language and communication behavior features are more predictive of ILI dynamics than syntactic and stylistic signals expressed in social media. (c) Neural network models learned exclusively from social media signals yield comparable or better performance than the models learned from ILI historical data; thus, signals from social media can potentially be used to accurately forecast ILI dynamics for regions where ILI historical data is not available. (d) Neural network models learned from combined ILI and social media signals significantly outperform models that rely solely on ILI historical data, which adds to the great potential of alternative public sources for ILI dynamics prediction. (e) Location-specific models outperform previously used location-independent models, e.g., U.S. only. (f) Prediction results vary significantly across geolocations depending on the amount of social media data available and ILI activity patterns.
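A compact sketch of the kind of LSTM nowcaster described follows (Keras-style; the window length, layer size, and feature makeup are placeholders, not the paper's configuration):

    # Sketch: LSTM nowcasting of weekly ILI rates from windows of social
    # media features (topics, emotions, communication signals); all shapes
    # and data here are placeholders.
    import numpy as np
    from tensorflow import keras

    window, n_features = 4, 16                 # assumed: 4-week input windows
    X = np.random.rand(500, window, n_features)
    y = np.random.rand(500)                    # ILI rate for the target week

    model = keras.Sequential([
        keras.layers.LSTM(32, input_shape=(window, n_features)),
        keras.layers.Dense(1),                 # nowcast of the ILI rate
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=10, batch_size=32, verbose=0)
    print(model.predict(X[:1], verbose=0))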
Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Xuanhua; Luo, Xuan; Liang, Junling
GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate iterative convergence. Unfortunately, consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, a coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because a large number of colors is generally required for processing a large-scale graph with billions of vertices. We propose a light-weight asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on the Pareto principle (or 80-20 rule) applied to coloring, as we observed across many real-world graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (CuSha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC). On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive: Frog can outperform Gunrock by more than 1.04X when running PageRank and SSSP, while the advantage of Frog is not obvious when running BFS and CC on some datasets, especially RoadNet-CA.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Volkova, Svitlana; Ayton, Ellyn; Porterfield, Katherine
This work is the first to take advantage of recurrent neural networks to predict influenza-like-illness (ILI) dynamics from various linguistic signals extracted from social media data. Unlike other approaches that rely on time-series analysis of historical ILI data [1, 2] and state-of-the-art machine learning models [3, 4], we build and evaluate the predictive power of Long Short-Term Memory (LSTM) architectures capable of nowcasting (predicting in "real time") and forecasting (predicting the future) ILI dynamics in the 2011–2014 influenza seasons. To build our models we integrate information people post in social media, e.g., topics, stylistic and syntactic patterns, emotions and opinions, and communication behavior. We then quantitatively evaluate the predictive power of different social media signals and contrast the performance of state-of-the-art regression models with neural networks. Finally, we combine ILI and social media signals to build joint neural network models for ILI dynamics prediction. Unlike the majority of the existing work, we specifically focus on developing models for local rather than national ILI surveillance [1], specifically for military rather than general populations [3], in 26 U.S. and six international locations. Our approach demonstrates several advantages: (a) Neural network models learned from social media data yield the best performance compared to previously used regression models. (b) Previously under-explored language and communication behavior features are more predictive of ILI dynamics than syntactic and stylistic signals expressed in social media. (c) Neural network models learned exclusively from social media signals yield comparable or better performance than models learned from historical ILI data; thus, signals from social media can potentially be used to accurately forecast ILI dynamics for regions where historical ILI data are not available. (d) Neural network models learned from combined ILI and social media signals significantly outperform models that rely solely on historical ILI data, which speaks to the great potential of alternative public sources for ILI dynamics prediction. (e) Location-specific models outperform previously used location-independent (e.g., U.S.-wide) models. (f) Prediction results vary significantly across geolocations depending on the amount of social media data available and ILI activity patterns.
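As a concrete anchor for this modeling setup, here is a minimal LSTM regressor sketch in Keras on synthetic stand-in data; the window length, feature count and layer sizes are illustrative assumptions and do not reproduce the paper's architecture or features.

```python
import numpy as np
import tensorflow as tf

# Stand-in data: 500 windows of 4 weekly steps, each step carrying 16
# social-media features (topics, emotions, communication behavior, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4, 16)).astype("float32")
y = rng.normal(size=(500, 1)).astype("float32")  # ILI activity to predict

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4, 16)),
    tf.keras.layers.LSTM(32),   # summarizes the window of signals
    tf.keras.layers.Dense(1),   # nowcast of ILI activity
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```

Shifting the targets y by one or more weeks relative to the input windows turns the same architecture from a nowcasting model into a forecasting one.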
NASA Astrophysics Data System (ADS)
Ryzhikov, I. S.; Semenkin, E. S.; Akhmedova, Sh A.
2017-02-01
A novel order reduction method for linear time-invariant systems is described. The method is based on reducing the initial problem to an optimization one, using the proposed model representation, and solving that problem with an efficient optimization algorithm. The proposed way of determining the model allows all the parameters of the lower-order model to be identified and, by definition, provides the model with the required steady state. As a powerful optimization tool, the meta-heuristic Co-Operation of Biology-Related Algorithms was used. Experimental results show that the proposed approach outperforms other approaches and that the reduced-order model achieves a high level of accuracy.
An improved advertising CTR prediction approach based on the fuzzy deep neural network.
Jiang, Zilong; Gao, Shu; Li, Mingjiang
2018-01-01
Combining a deep neural network with fuzzy theory, this paper proposes an advertising click-through rate (CTR) prediction approach based on a fuzzy deep neural network (FDNN). In this approach, fuzzy Gaussian-Bernoulli restricted Boltzmann machine (FGBRBM) is first applied to input raw data from advertising datasets. Next, fuzzy restricted Boltzmann machine (FRBM) is used to construct the fuzzy deep belief network (FDBN) with the unsupervised method layer by layer. Finally, fuzzy logistic regression (FLR) is utilized for modeling the CTR. The experimental results show that the proposed FDNN model outperforms several baseline models in terms of both data representation capability and robustness in advertising click log datasets with noise. PMID:29727443
A Background Noise Reduction Technique Using Adaptive Noise Cancellation for Microphone Arrays
NASA Technical Reports Server (NTRS)
Spalt, Taylor B.; Fuller, Christopher R.; Brooks, Thomas F.; Humphreys, William M., Jr.
2011-01-01
Background noise in wind tunnel environments poses a challenge to acoustic measurements due to possible low or negative Signal to Noise Ratios (SNRs) present in the testing environment. This paper overviews the application of time domain Adaptive Noise Cancellation (ANC) to microphone array signals with an intended application of background noise reduction in wind tunnels. An experiment was conducted to simulate background noise from a wind tunnel circuit measured by an out-of-flow microphone array in the tunnel test section. A reference microphone was used to acquire a background noise signal which interfered with the desired primary noise source signal at the array. The technique's efficacy was investigated using frequency spectra from the array microphones, array beamforming of the point source region, and subsequent deconvolution using the Deconvolution Approach for the Mapping of Acoustic Sources (DAMAS) algorithm. Comparisons were made with the conventional techniques for improving SNR of spectral and Cross-Spectral Matrix subtraction. The method was seen to recover the primary signal level in SNRs as low as -29 dB and outperform the conventional methods. A second processing approach using the center array microphone as the noise reference was investigated for more general applicability of the ANC technique. It outperformed the conventional methods at the -29 dB SNR but yielded less accurate results when coherence over the array dropped. This approach could possibly improve conventional testing methodology but must be investigated further under more realistic testing conditions.
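Time-domain ANC of this kind is classically realized with a least-mean-squares (LMS) adaptive filter; the following is a minimal numpy sketch under that assumption (the filter length, step size and toy signals are illustrative, not the experiment's configuration):

```python
import numpy as np

def lms_anc(primary, reference, taps=32, mu=0.005):
    """The filter learns to map the noise-only reference channel onto the noise
    component of the primary channel; the residual e is the cleaned signal."""
    w = np.zeros(taps)
    e = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]  # most recent reference samples first
        y = w @ x                        # current noise estimate
        e[n] = primary[n] - y            # error = desired-signal estimate
        w += 2 * mu * e[n] * x           # LMS weight update
    return e

# toy usage: a tone buried in noise that reaches the "array" through a short filter
rng = np.random.default_rng(0)
noise = rng.normal(size=8000)
t = np.arange(8000) / 8000.0
primary = np.sin(2 * np.pi * 440 * t) + np.convolve(noise, [0.5, 0.3, 0.2], "same")
cleaned = lms_anc(primary, noise)
```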
Rich Representations with Exposed Semantics for Deep Visual Reasoning
2016-06-01
Gupta, Abhinav; Hebert, Martial; Aminoff, Elissa; Park, HyunSoo; Forsyth, David; Shi, Jianbo; Tarr, Michael
... approach performs equally well or better when compared to the state-of-the-art. We also expanded our research on perceiving the interactions between ... cross-instance correspondences. We demonstrated that our end-to-end trained ConvNet supervised by cycle-consistency outperforms the state-of-the-art
ERIC Educational Resources Information Center
Price-Mohr, Ruth; Price, Colin
2017-01-01
A survey of primary schools in England found that girls outperform boys in English across all phases (Ofsted in Moving English forward. Ofsted, Manchester, 2012). The gender gap remains an on-going issue in England, especially for reading attainment. This paper presents evidence of gender differences in learning to read that emerged during the…
Video Denoising via Dynamic Video Layering
NASA Astrophysics Data System (ADS)
Guo, Han; Vaswani, Namrata
2018-07-01
Video denoising refers to the problem of removing "noise" from a video sequence. Here the term "noise" is used in a broad sense to refer to any corruption, outlier, or interference that is not the quantity of interest. In this work, we develop a novel approach to video denoising that is based on the idea that many noisy or corrupted videos can be split into three parts - the "low-rank layer", the "sparse layer", and a residual layer (which is small and bounded). We show, using extensive experiments, that our denoising approach outperforms the state-of-the-art denoising algorithms.
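The low-rank + sparse split described here is commonly computed with robust-PCA-style alternating shrinkage; the sketch below is a minimal heuristic of that flavor (the thresholds, iteration count and the stacking of frames as matrix columns are illustrative assumptions, not the authors' algorithm):

```python
import numpy as np

def soft(x, tau):
    """Entrywise soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def low_rank_sparse_split(M, lam=None, sv_tau=1.0, iters=50):
    """Alternately shrink singular values (low-rank layer L) and entries
    (sparse layer S) so that M is approximately L + S + small residual."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U * soft(s, sv_tau)) @ Vt   # singular-value thresholding
        S = soft(M - L, lam)             # entrywise shrinkage
    return L, S, M - L - S

# frames stacked as columns: a (pixels x frames) matrix
M = np.random.default_rng(0).normal(size=(256, 40))
L, S, residual = low_rank_sparse_split(M)
```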
The Betting Odds Rating System: Using soccer forecasts to forecast soccer.
Wunderlich, Fabian; Memmert, Daniel
2018-01-01
Betting odds are frequently found to outperform mathematical models in sports-related forecasting tasks; however, the factors contributing to betting odds are not fully traceable and, in contrast to rating-based forecasts, no straightforward measure of team-specific quality is deducible from the betting odds. The present study investigates the approach of combining the methods of mathematical models and the information included in betting odds. A soccer forecasting model based on the well-known ELO rating system and taking advantage of betting odds as a source of information is presented. Data from almost 15,000 soccer matches (seasons 2007/2008 until 2016/2017) are used, including both domestic matches (English Premier League, German Bundesliga, Spanish Primera Division and Italian Serie A) and international matches (UEFA Champions League, UEFA Europa League). The novel betting-odds-based ELO model is shown to outperform classic ELO models, thus demonstrating that betting odds prior to a match contain more relevant information than the result of the match itself. It is shown how the novel model can help to gain valuable insights into the quality of soccer teams and its development over time, thus having a practical benefit in performance analysis. Moreover, it is argued that network-based approaches might help in further improving rating and forecasting methods. PMID:29870554
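One way to read the combination is that the rating update is driven by the odds-implied win probability instead of the binary match result; the sketch below illustrates that reading (the margin removal, K-factor and home advantage are standard-but-illustrative choices, not the paper's exact formulation):

```python
def implied_probs(odds_home, odds_draw, odds_away):
    """Convert decimal betting odds to probabilities, stripping the bookmaker margin."""
    raw = [1.0 / odds_home, 1.0 / odds_draw, 1.0 / odds_away]
    total = sum(raw)
    return [p / total for p in raw]

def elo_expected(r_home, r_away, home_adv=80.0):
    """Classic ELO expected score for the home side."""
    return 1.0 / (1.0 + 10 ** (-(r_home + home_adv - r_away) / 400.0))

def odds_based_update(r_home, r_away, odds, k=20.0):
    """Move ratings toward the odds-implied outcome rather than the actual result."""
    p_home, p_draw, p_away = implied_probs(*odds)
    target = p_home + 0.5 * p_draw          # odds-implied "score" for the home side
    delta = k * (target - elo_expected(r_home, r_away))
    return r_home + delta, r_away - delta

# usage: pre-match decimal odds 1.80 / 3.60 / 4.50 (home / draw / away)
r_home, r_away = odds_based_update(1500.0, 1520.0, (1.80, 3.60, 4.50))
```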
Gender differences in mathematics achievement in Beijing: A meta-analysis.
Li, Meijuan; Zhang, Yongmei; Liu, Hongyun; Hao, Yi
2017-12-19
The topic of gender differences in mathematical performance has received considerable attention in the fields of education, sociology, economics and psychology. We analysed gender differences based on data from the Beijing Assessment of Educational Quality in China. A large data set of Grade 5 and Grade 8 students who took the mathematical test from 2008 to 2013 (n = 73,318) was analysed. Meta-analysis was used in this research. The findings were as follows. (1) No gender differences in mathematical achievement exist among students in Grade 5; relatively small gender differences exist in Grade 8, with females scoring higher than males; and the variance of male students is larger than that of females in both Grade 5 and Grade 8. (2) Except for statistics and probability, gender differences in the other domains in Grade 8 are significantly higher than those in Grade 5, with female students outperforming males. (3) The ratio of students of both genders in Grade 5 and Grade 8 at the 95-100% percentile level shows no significant differences; however, the ratio of male students is significantly higher than that of females at the 0-5% percentile level. (4) In Grade 5, the extent to which females outperformed males in the low SES group is larger than that in higher SES groups, and in Grade 8, the magnitude of gender differences in urban schools is smaller than that in rural schools. Overall, there is a small gender difference among the 8th graders, with the male disadvantage at the bottom of the distribution, and gender differences also vary across school locations. © 2017 The British Psychological Society.
Treatment planning for spinal radiosurgery: A competitive multiplatform benchmark challenge.
Moustakis, Christos; Chan, Mark K H; Kim, Jinkoo; Nilsson, Joakim; Bergman, Alanah; Bichay, Tewfik J; Palazon Cano, Isabel; Cilla, Savino; Deodato, Francesco; Doro, Raffaela; Dunst, Jürgen; Eich, Hans Theodor; Fau, Pierre; Fong, Ming; Haverkamp, Uwe; Heinze, Simon; Hildebrandt, Guido; Imhoff, Detlef; de Klerck, Erik; Köhn, Janett; Lambrecht, Ulrike; Loutfi-Krauss, Britta; Ebrahimi, Fatemeh; Masi, Laura; Mayville, Alan H; Mestrovic, Ante; Milder, Maaike; Morganti, Alessio G; Rades, Dirk; Ramm, Ulla; Rödel, Claus; Siebert, Frank-Andre; den Toom, Wilhelm; Wang, Lei; Wurster, Stefan; Schweikard, Achim; Soltys, Scott G; Ryu, Samuel; Blanck, Oliver
2018-05-25
To investigate the quality of treatment plans for spinal radiosurgery derived from different planning and delivery systems. The comparisons include robotic delivery and intensity-modulated arc therapy (IMAT) approaches. Multiple centers with identical systems were used to reduce bias arising from individual planning abilities. The study used a series of three complex spine lesions to maximize the difference in plan quality among the various approaches. Internationally recognized experts in the field of treatment planning and spinal radiosurgery from 12 centers with various treatment planning systems participated. For a complex spinal lesion, the results were compared against a previously published benchmark plan derived for CyberKnife radiosurgery (CKRS) using circular cones only. For two additional cases, one with multiple small lesions infiltrating three vertebrae and one single-vertebra lesion treated with an integrated boost, the results were compared against a benchmark plan generated using a best-practice guideline for CKRS. All plans were rated based on a previously established ranking system. All 12 centers could reach equality (n = 4) or outperform (n = 8) the benchmark plan. For the multiple-lesion and single-vertebra lesion plans, only 5 and 3 of the 12 centers, respectively, reached equality or outperformed the best-practice benchmark plan. However, the absolute differences in target and critical-structure dosimetry were small and strongly planner-dependent rather than system-dependent. Overall, gantry-based IMAT with simple planning techniques (two coplanar arcs) produced faster treatments and significantly outperformed static-gantry intensity-modulated radiation therapy (IMRT) and multileaf collimator (MLC) or non-MLC CKRS treatment plan quality regardless of the system (mean rank out of 4 was 1.2 vs. 3.1, p = 0.002). High plan quality for complex spinal radiosurgery was achieved among all systems and all participating centers in this planning challenge. This study concludes that simple IMAT techniques can generate significantly better plan quality compared to previously established CKRS benchmarks.
Frappier, Vincent; Najmanovich, Rafael J.
2014-01-01
Normal mode analysis (NMA) methods are widely used to study dynamic aspects of protein structures. Two critical components of NMA methods are the level of coarse-graining used to represent protein structures and the choice of potential energy functional form. There is a trade-off between speed and accuracy in the different choices. At one extreme one finds accurate but slow molecular-dynamics-based methods with all-atom representations and detailed atomic potentials; at the other, fast elastic network model (ENM) methods with Cα-only representations and simplified potentials based on geometry alone, and thus oblivious to protein sequence. Here we present ENCoM, an Elastic Network Contact Model that employs a potential energy function including a pairwise atom-type non-bonded interaction term, and thus makes it possible to consider the effect of the specific nature of amino acids on dynamics within the context of NMA. ENCoM is as fast as existing ENM methods and outperforms such methods in the generation of conformational ensembles. Here we introduce a new application for NMA methods with the use of ENCoM in the prediction of the effect of mutations on protein stability. While existing methods are based on machine learning or enthalpic considerations, the use of ENCoM, based on vibrational normal modes, rests on entropic considerations. This represents a novel area of application for NMA methods and a novel approach for the prediction of the effect of mutations. We compare ENCoM to a large number of methods in terms of accuracy and self-consistency. We show that the accuracy of ENCoM is comparable to that of the best existing methods. We show that existing methods are biased towards the prediction of destabilizing mutations and that ENCoM is less biased at predicting stabilizing mutations. PMID:24762569
A Hybrid Ant Colony Optimization Algorithm for the Extended Capacitated Arc Routing Problem.
Li-Ning Xing; Rohlfshagen, P; Ying-Wu Chen; Xin Yao
2011-08-01
The capacitated arc routing problem (CARP) is representative of numerous practical applications, and in order to widen its scope, we consider an extended version of this problem that entails both total service time and fixed investment costs. We subsequently propose a hybrid ant colony optimization (ACO) algorithm (HACOA) to solve instances of the extended CARP. This approach is characterized by the exploitation of heuristic information, adaptive parameters, and local optimization techniques: Two kinds of heuristic information, arc cluster information and arc priority information, are obtained continuously from the solutions sampled to guide the subsequent optimization process. The adaptive parameters ease the burden of choosing initial values and facilitate improved and more robust results. Finally, local optimization, based on the two-opt heuristic, is employed to improve the overall performance of the proposed algorithm. The resulting HACOA is tested on four sets of benchmark problems containing a total of 87 instances with up to 140 nodes and 380 arcs. In order to evaluate the effectiveness of the proposed method, some existing capacitated arc routing heuristics are extended to cope with the extended version of this problem; the experimental results indicate that the proposed ACO method outperforms these heuristics.
Surrogate marker analysis in cancer clinical trials through time-to-event mediation techniques.
Vandenberghe, Sjouke; Duchateau, Luc; Slaets, Leen; Bogaerts, Jan; Vansteelandt, Stijn
2017-01-01
The meta-analytic approach is the gold standard for validation of surrogate markers, but has the drawback of requiring data from several trials. We refine modern mediation analysis techniques for time-to-event endpoints and apply them to investigate whether pathological complete response can be used as a surrogate marker for disease-free survival in the EORTC 10994/BIG 1-00 randomised phase 3 trial, in which locally advanced breast cancer patients were randomised to either taxane- or anthracycline-based neoadjuvant chemotherapy. In the mediation analysis, the treatment effect is decomposed into an indirect effect via pathological complete response and the remaining direct effect. It shows that only 4.2% of the treatment effect on disease-free survival after five years is mediated by the treatment effect on pathological complete response. There is thus no evidence from our analysis that pathological complete response is a valuable surrogate marker to evaluate the effect of taxane- versus anthracycline-based chemotherapies on progression-free survival of locally advanced breast cancer patients. The proposed analysis strategy is broadly applicable to mediation analyses of time-to-event endpoints, is easy to apply and outperforms existing strategies in terms of precision as well as robustness against model misspecification.
Collaborative Filtering Recommendation on Users' Interest Sequences.
Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong
2016-01-01
As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS) that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS) and the count of users' total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.
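The LCSIS ingredient reduces to a classic longest-common-subsequence computation over two users' time-ordered item sequences; a minimal sketch follows (the normalization by the longer sequence is an illustrative choice, and ACSIS and the blending with conventional similarities are omitted):

```python
def lcs_length(a, b):
    """Standard O(len(a) * len(b)) longest-common-subsequence dynamic program."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def is_similarity(seq_u, seq_v):
    """Similarity of two users' item sequences, ordered by rating timestamp."""
    if not seq_u or not seq_v:
        return 0.0
    return lcs_length(seq_u, seq_v) / max(len(seq_u), len(seq_v))

# items each user rated, ordered by timestamp
print(is_similarity(["m1", "m3", "m7", "m9"], ["m3", "m7", "m2", "m9"]))  # 0.75
```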
Classification of neocortical interneurons using affinity propagation.
Santana, Roberto; McGarry, Laura M; Bielza, Concha; Larrañaga, Pedro; Yuste, Rafael
2013-01-01
In spite of over a century of research on cortical circuits, it is still unknown how many classes of cortical neurons exist. In fact, neuronal classification is a difficult problem because it is unclear how to designate a neuronal cell class and what are the best characteristics to define them. Recently, unsupervised classifications using cluster analysis based on morphological, physiological, or molecular characteristics, have provided quantitative and unbiased identification of distinct neuronal subtypes, when applied to selected datasets. However, better and more robust classification methods are needed for increasingly complex and larger datasets. Here, we explored the use of affinity propagation, a recently developed unsupervised classification algorithm imported from machine learning, which gives a representative example or exemplar for each cluster. As a case study, we applied affinity propagation to a test dataset of 337 interneurons belonging to four subtypes, previously identified based on morphological and physiological characteristics. We found that affinity propagation correctly classified most of the neurons in a blind, non-supervised manner. Affinity propagation outperformed Ward's method, a current standard clustering approach, in classifying the neurons into 4 subtypes. Affinity propagation could therefore be used in future studies to validly classify neurons, as a first step to help reverse engineer neural circuits.
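Affinity propagation is available off the shelf; the sketch below shows the basic usage with scikit-learn on synthetic stand-in features (the feature vectors here are random placeholders, not the morphological and physiological measurements used in the study):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# stand-in: ~337 cells described by 10 features, drawn from 4 loose groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(84, 10)) for c in (0.0, 2.0, 4.0, 6.0)])

ap = AffinityPropagation(random_state=0).fit(X)
print("clusters found:", len(ap.cluster_centers_indices_))
print("exemplar rows:", ap.cluster_centers_indices_)  # one representative cell per cluster
print("labels of first five cells:", ap.labels_[:5])
```

Unlike k-means, the number of clusters is not fixed in advance; it emerges from the preference parameter and the pairwise similarities, and each cluster is summarized by an exemplar data point.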
Shi, Xingjie; Zhao, Qing; Huang, Jian; Xie, Yang; Ma, Shuangge
2015-01-01
Motivation: Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The regulation analysis is challenging with one gene expression possibly regulated by multiple CNAs and one CNA potentially regulating the expressions of multiple genes. The correlations among GEs and among CNAs make the analysis even more complicated. The existing methods have limitations and cannot comprehensively describe the regulation. Results: A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures. Simulation shows that the proposed method outperforms the competing alternatives with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate advantages of the proposed method. Availability and implementation: R code is available at http://works.bepress.com/shuangge/49/ Contact: shuangge.ma@yale.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342102
Probabilistic multi-person localisation and tracking in image sequences
NASA Astrophysics Data System (ADS)
Klinger, T.; Rottensteiner, F.; Heipke, C.
2017-05-01
The localisation and tracking of persons in image sequences is commonly guided by recursive filters. Especially in a multi-object tracking environment, where mutual occlusions are inherent, the predictive model is prone to drift away from the actual target position when not taking context into account. Further, if the image-based observations are imprecise, the trajectory is prone to be updated towards a wrong position. In this work we address both of these problems by using a new predictive model on the basis of Gaussian Process Regression, and by using generic object detection, as well as instance-specific classification, for refined localisation. The predictive model takes into account the motion of every tracked pedestrian in the scene, and the prediction is executed with respect to the velocities of neighbouring persons. In contrast to existing methods, our approach uses a Dynamic Bayesian Network in which the state vector of a recursive Bayes filter, as well as the location of the tracked object in the image, are modelled as unknowns. This allows the detection to be corrected before it is incorporated into the recursive filter. Our method is evaluated on a publicly available benchmark dataset and outperforms related methods in terms of geometric precision and tracking accuracy.
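For the Gaussian Process Regression ingredient, a minimal scikit-learn sketch of predicting a track's next position from its recent history is shown below; the kernel, window and synthetic trajectory are illustrative, and the paper's coupling to neighbouring persons' velocities is not reproduced:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# stand-in trajectory: frame index -> noisy 2-D image position
t = np.arange(20, dtype=float).reshape(-1, 1)
rng = np.random.default_rng(0)
track = np.column_stack((2.0 * t.ravel() + 1.0, 0.5 * t.ravel())) \
        + 0.1 * rng.normal(size=(20, 2))

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(0.01),
                               normalize_y=True).fit(t, track)
mean, std = gpr.predict(np.array([[20.0]]), return_std=True)  # next-frame prediction
```

The predictive standard deviation is what makes GPR attractive here: it quantifies how much the filter should trust the motion model versus the image-based detection.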
Xiao, Zhu; Havyarimana, Vincent; Li, Tong; Wang, Dong
2016-01-01
In this paper, a novel nonlinear smoothing framework, the non-Gaussian delayed particle smoother (nGDPS), is proposed, which enables vehicle state estimation (VSE) with high accuracy while taking into account the non-Gaussianity of the measurement and process noises. Within the proposed method, the multivariate Student's t-distribution is adopted in order to compute the probability density function (PDF) related to the process and measurement noises, which are assumed to be non-Gaussian distributed. A computation approach based on the Ensemble Kalman Filter (EnKF) is designed to cope with the mean and the covariance matrix of the proposal non-Gaussian distribution. A delayed Gibbs sampling algorithm, which incorporates smoothing of the sampled trajectories over a fixed delay, is proposed to deal with the sample degeneracy of particles. The performance is investigated on real-world data collected by low-cost on-board vehicle sensors. The comparison study based on the real-world experiments and the statistical analysis demonstrates that the proposed nGDPS significantly improves the vehicle state accuracy and outperforms existing filtering and smoothing methods. PMID:27187405
Wang, Yubo; Veluvolu, Kalyana C
2017-06-14
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the use of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomenon. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to earlier designs, we first identify the sparsity imposed on the signal model in order to reformulate the model as a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance, and the results show that the proposed Sparse-BMFLC method has a high mean energy ratio (0.9976) and outperforms existing methods such as the short-time Fourier transform (STFT), the continuous wavelet transform (CWT) and the BMFLC Kalman smoother. Furthermore, the proposed method achieves an overall reconstruction error of 6.22%.
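Read as a sparse linear regression, the band-limited Fourier model can be fitted with any sparsity-inducing solver; below is a minimal lasso-based sketch of that reformulation (the band edges, frequency grid and regularization weight are illustrative assumptions, and the paper's specific convex program is not reproduced):

```python
import numpy as np
from sklearn.linear_model import Lasso

fs = 250.0                              # sampling rate in Hz (illustrative)
t = np.arange(0, 2.0, 1.0 / fs)
signal = 1.2 * np.sin(2 * np.pi * 9.0 * t) \
         + 0.1 * np.random.default_rng(0).normal(size=t.size)

# Band-limited Fourier dictionary: a sin/cos pair per gridded frequency.
freqs = np.arange(2.0, 14.0, 0.5)
G = np.hstack([np.column_stack((np.sin(2 * np.pi * f * t),
                                np.cos(2 * np.pi * f * t))) for f in freqs])

# Sparse regression keeps only the few active frequency components.
coefs = Lasso(alpha=0.01, max_iter=10000).fit(G, signal).coef_
active = freqs[np.unique(np.nonzero(coefs)[0] // 2)]
print("active frequencies (Hz):", active)
```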
Unsupervised active learning based on hierarchical graph-theoretic clustering.
Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve
2009-10-01
Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.
NASA Astrophysics Data System (ADS)
Lorenzi, Juan M.; Stecher, Thomas; Reuter, Karsten; Matera, Sebastian
2017-10-01
Many problems in computational materials science and chemistry require the evaluation of expensive functions with locally rapid changes, such as the turn-over frequency of first principles kinetic Monte Carlo models for heterogeneous catalysis. Because of the high computational cost, it is often desirable to replace the original with a surrogate model, e.g., for use in coupled multiscale simulations. The construction of surrogates becomes particularly challenging in high-dimensions. Here, we present a novel version of the modified Shepard interpolation method which can overcome the curse of dimensionality for such functions to give faithful reconstructions even from very modest numbers of function evaluations. The introduction of local metrics allows us to take advantage of the fact that, on a local scale, rapid variation often occurs only across a small number of directions. Furthermore, we use local error estimates to weigh different local approximations, which helps avoid artificial oscillations. Finally, we test our approach on a number of challenging analytic functions as well as a realistic kinetic Monte Carlo model. Our method not only outperforms existing isotropic metric Shepard methods but also state-of-the-art Gaussian process regression.
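As a baseline for what the paper improves upon, here is a minimal sketch of plain (isotropic) Shepard interpolation with inverse-distance weights; the local metrics and error-based weighting that constitute the paper's contribution are deliberately not reproduced:

```python
import numpy as np

def shepard_interpolate(x_query, X, y, power=2.0, eps=1e-12):
    """Inverse-distance-weighted average of the sampled values (X, y)."""
    d = np.linalg.norm(X - x_query, axis=1)
    if np.any(d < eps):              # query coincides with a sample point
        return float(y[np.argmin(d)])
    w = 1.0 / d**power
    return float(w @ y / w.sum())

# stand-in: a locally rapidly varying scalar sampled at random 4-D points
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
y = np.exp(-8.0 * (X**2).sum(axis=1))   # sharp peak near the origin
print(shepard_interpolate(np.zeros(4), X, y))
```

Roughly speaking, the paper's local metrics replace the isotropic distance in `d` with a direction-dependent one around each sample, exploiting the observation that rapid variation occurs along only a few directions.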
Planning nonlinear access paths for temporal bone surgery.
Fauser, Johannes; Sakas, Georgios; Mukhopadhyay, Anirban
2018-05-01
Interventions at the otobasis operate in the narrow region of the temporal bone where several highly sensitive organs define obstacles with minimal clearance for surgical instruments. Nonlinear trajectories for potential minimally invasive interventions can provide larger distances to risk structures and optimized orientations of surgical instruments, thus improving clinical outcomes when compared to existing linear approaches. In this paper, we present fast and accurate planning methods for such nonlinear access paths. We define a specific motion planning problem in [Formula: see text] with notable constraints in computation time and goal pose that reflect the requirements of temporal bone surgery. We then present [Formula: see text]-RRT-Connect: two suitable motion planners based on bidirectional Rapidly exploring Random Tree (RRT) to solve this problem efficiently. The benefits of [Formula: see text]-RRT-Connect are demonstrated on real CT data of patients. Their general performance is shown on a large set of realistic synthetic anatomies. We also show that these new algorithms outperform state-of-the-art methods based on circular arcs or Bézier-Splines when applied to this specific problem. With this work, we demonstrate that preoperative and intra-operative planning of nonlinear access paths is possible for minimally invasive surgeries at the otobasis.
Galaviz-Mosqueda, Alejandro; Villarreal-Reyes, Salvador; Galeana-Zapién, Hiram; Rubio-Loyola, Javier; Covarrubias-Rosales, David H.
2014-01-01
Vehicular ad hoc networks (VANETs) have been identified as a key technology to enable intelligent transport systems (ITS), which are aimed at radically improving the safety, comfort, and greenness of vehicles on the road. However, in order to fully exploit the potential of VANETs, several issues must be addressed. Because of the highly dynamic nature of VANETs and the impairments of the wireless channel, one key issue arising when working with VANETs is the multihop dissemination of broadcast packets for safety and infotainment applications. In this paper a reliable low-overhead multihop broadcast (RLMB) protocol is proposed to address the well-known broadcast storm problem. The proposed RLMB takes advantage of the hello messages exchanged between vehicles and processes this information to intelligently select a relay set and reduce redundant broadcasts. Additionally, to reduce the dependency on the hello message rate, RLMB uses a point-to-zone link evaluation approach. RLMB's performance is compared with one of the leading multihop broadcast protocols existing to date. Performance metrics show that our RLMB solution outperforms the leading protocol in terms of important metrics such as packet dissemination ratio, overhead, and delay. PMID:25133224
Su, Jin-He; Piao, Ying-Chao; Luo, Ze; Yan, Bao-Ping
2018-04-26
With the application of various data acquisition devices, large amounts of animal movement data can be used to label presence data in remote sensing images and predict species distribution. In this paper, a two-stage classification approach combining movement data and moderate-resolution remote sensing images is proposed. First, we introduce a new density-based clustering method to identify stopovers from migratory birds' movement data and generate classification samples based on the clustering result. We split the remote sensing images into 16 × 16 patches and label them as positive samples if they overlap with stopovers. Second, a multi-convolution neural network model is proposed for extracting features from temperature data and remote sensing images, respectively. A Support Vector Machine (SVM) model is then used to combine the features and predict the final classification results. The experimental analysis was carried out on public Landsat 5 TM images and a GPS dataset collected from 29 birds over three years. The results indicate that our proposed method outperforms the existing baseline methods and achieves good performance in habitat suitability prediction.
Vehicle Surveillance with a Generic, Adaptive, 3D Vehicle Model.
Leotta, Matthew J; Mundy, Joseph L
2011-07-01
In automated surveillance, one is often interested in tracking road vehicles, measuring their shape in 3D world space, and determining vehicle classification. To address these tasks simultaneously, an effective approach is the constrained alignment of a prior model of 3D vehicle shape to images. Previous 3D vehicle models are either generic but overly simple or rigid and overly complex. Rigid models represent exactly one vehicle design, so a large collection is needed. A single generic model can deform to a wide variety of shapes, but those shapes have been far too primitive. This paper uses a generic 3D vehicle model that deforms to match a wide variety of passenger vehicles. It is adjustable in complexity between the two extremes. The model is aligned to images by predicting and matching image intensity edges. Novel algorithms are presented for fitting models to multiple still images and simultaneous tracking while estimating shape in video. Experiments compare the proposed model to simple generic models in accuracy and reliability of 3D shape recovery from images and tracking in video. Standard techniques for classification are also used to compare the models. The proposed model outperforms the existing simple models at each task.
Metal artifact reduction for CT-based luggage screening.
Karimi, Seemeen; Martz, Harry; Cosman, Pamela
2015-01-01
In aviation security, checked luggage is screened by computed tomography scanning. Metal objects in the bags create artifacts that degrade image quality. Though metal artifact reduction (MAR) methods exist, mainly in the medical imaging literature, they require knowledge of the materials in the scan or are outlier rejection methods. Our purpose is to improve and evaluate a MAR method we previously introduced that does not require knowledge of the materials in the scan and gives good results on data with large quantities and different kinds of metal. We describe in detail an optimization which de-emphasizes metal projections and has a constraint for beam hardening and scatter. This method isolates and reduces artifacts in an intermediate image, which is then fed to a previously published sinogram replacement method. We evaluate the algorithm on luggage data containing multiple and large metal objects. We define measures of artifact reduction and compare this method against others in the MAR literature. Metal artifacts were reduced in our test images, even for multiple and large metal objects, without much loss of structure or resolution. Our MAR method outperforms the methods with which we compared it. Our approach does not make assumptions about image content, nor does it discard metal projections.
GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS.
Ravasio, Viola; Ritelli, Marco; Legati, Andrea; Giacopuzzi, Edoardo
2018-04-14
The exome sequencing approach is extensively used in research and diagnostic laboratories to discover pathological variants and study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this poses serious challenges for variant interpretation. Here, we propose a new tool named GARFIELD-NGS (Genomic vARiants FIltering by dEep Learning moDels in NGS), which relies on deep learning models to dissect false and true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performance for both SNP and INDEL variants (AUC 0.71–0.98) and outperformed established hard filters. The method is robust even at low coverage, down to 30X, and can be applied to data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes a standard VCF file and produces a regular VCF output. Thus, it can be easily integrated in existing analysis pipelines, allowing the application of different thresholds based on the desired level of sensitivity and specificity. GARFIELD-NGS is available at https://github.com/gedoardo83/GARFIELD-NGS. edoardo.giacopuzzi@unibs.it. Supplementary data are available at Bioinformatics online.
Learning to rank using user clicks and visual features for image retrieval.
Yu, Jun; Tao, Dacheng; Wang, Meng; Rui, Yong
2015-04-01
The inconsistency between textual features and visual contents can cause poor image search results. To solve this problem, click features, which are more reliable than textual information in justifying the relevance between a query and clicked images, are adopted in image ranking models. However, the existing ranking model cannot integrate visual features, which are efficient in refining click-based search results. In this paper, we propose a novel ranking model based on the learning-to-rank framework. Visual features and click features are simultaneously utilized to obtain the ranking model. Specifically, the proposed approach is based on large-margin structured output learning, and the visual consistency is integrated with the click features through a hypergraph regularizer term. In accordance with the fast alternating linearization method, we design a novel algorithm to optimize the objective function. This algorithm alternately minimizes two different approximations of the original objective function by keeping one function unchanged and linearizing the other. We conduct experiments on a large-scale dataset collected from the Microsoft Bing image search engine, and the results demonstrate that the proposed learning-to-rank models based on visual features and user clicks outperform state-of-the-art algorithms.
SPONGY (SPam ONtoloGY): email classification using two-level dynamic ontology.
Youn, Seongwook
2014-01-01
Email is one of the common communication methods between people on the Internet. However, the increase of email misuse/abuse has resulted in an increasing volume of spam emails over recent years. An experimental system has been designed and implemented with the hypothesis that this method would outperform existing techniques, and the experimental results showed that indeed the proposed ontology-based approach improves spam filtering accuracy significantly. In this paper, two levels of ontology spam filters were implemented: a first-level global ontology filter and a second-level user-customized ontology filter. The use of the global ontology filter showed about 91% of spam filtered, which is comparable with other methods. The user-customized ontology filter was created based on the specific user's background as well as the filtering mechanism used in the global ontology filter creation. The main contributions of the paper are (1) to introduce an ontology-based multilevel filtering technique that uses both a global ontology and an individual filter for each user to increase spam filtering accuracy and (2) to create a spam filter in the form of an ontology, which is user-customized, scalable, and modularized, so that it can be embedded in many other systems for better performance. PMID:25254240
Social Milieu Oriented Routing: A New Dimension to Enhance Network Security in WSNs.
Liu, Lianggui; Chen, Li; Jia, Huiling
2016-02-19
In large-scale wireless sensor networks (WSNs), in order to enhance network security, it is crucial for a trustor node to perform social milieu oriented routing to a target trustee node in order to carry out trust evaluation. This challenging social milieu oriented routing with more than one end-to-end Quality of Trust (QoT) constraint has been proved to be NP-complete. Heuristic algorithms with polynomial and pseudo-polynomial-time complexities are often used to deal with this challenging problem. However, existing solutions cannot guarantee the efficiency of searching; that is, they can hardly avoid settling on locally optimal solutions during the search process. Quantum annealing (QA) uses delocalization and tunneling to avoid falling into local minima without sacrificing execution time. This has proven a promising way to tackle many optimization problems in the recent literature. In this paper, for the first time, with the help of a novel approach, that is, configuration path-integral Monte Carlo (CPIMC) simulations, a QA-based optimal social trust path (QA_OSTP) selection algorithm is applied to the extraction of the optimal social trust path in large-scale WSNs. Extensive experiments have been conducted, and the experiment results demonstrate that QA_OSTP outperforms its heuristic opponents.
Deviation-based spam-filtering method via stochastic approach
NASA Astrophysics Data System (ADS)
Lee, Daekyung; Lee, Mi Jin; Kim, Beom Jun
2018-03-01
In the presence of a huge number of possible purchase choices, ranks or ratings of items by others often play very important roles for a buyer making a final purchase decision. Perfectly objective rating is an impossible task to achieve, and we often use an average rating built on how previous buyers estimated the quality of the product. The problem with using a simple average rating is that it can easily be polluted by careless users whose evaluation of products cannot be trusted, and by malicious spammers who try to bias the rating result on purpose. In this letter we suggest how the trustworthiness of individual users can be systematically and quantitatively reflected to build a more reliable rating system. We compute a suitably defined reliability of each user based on the user's rating pattern across all products she evaluated. We call our proposed method the deviation-based ranking, since the statistical significance of each user's rating pattern with respect to the average rating pattern is the key ingredient. We find that our deviation-based ranking method outperforms existing methods in filtering out careless random evaluators as well as malicious spammers.
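A minimal sketch of the deviation idea: score each user by how far their ratings sit from the per-item consensus, turn that deviation into a reliability weight, and rebuild item scores as reliability-weighted averages. The particular weighting function below is an illustrative choice, not the letter's statistical-significance construction:

```python
from collections import defaultdict

ratings = {  # user -> {item: rating}
    "alice": {"a": 4, "b": 5, "c": 4},
    "bob":   {"a": 5, "b": 4, "c": 5},
    "spam":  {"a": 1, "b": 1, "c": 1},   # deliberately drags every item down
}

# plain per-item averages
sums, counts = defaultdict(float), defaultdict(int)
for user_ratings in ratings.values():
    for item, r in user_ratings.items():
        sums[item] += r
        counts[item] += 1
mean = {item: sums[item] / counts[item] for item in sums}

# reliability: large average deviation from consensus -> small weight
weight = {}
for user, user_ratings in ratings.items():
    dev = sum(abs(r - mean[i]) for i, r in user_ratings.items()) / len(user_ratings)
    weight[user] = 1.0 / (1.0 + dev**2)

# reliability-weighted item scores
score = {item: sum(weight[u] * ratings[u][item] for u in ratings if item in ratings[u])
               / sum(weight[u] for u in ratings if item in ratings[u])
         for item in mean}
print(score)  # the spammer's influence is sharply down-weighted
```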
Atmospheric Pressure Plasma Jet-Assisted Synthesis of Zeolite-Based Low-k Thin Films.
Huang, Kai-Yu; Chi, Heng-Yu; Kao, Peng-Kai; Huang, Fei-Hung; Jian, Qi-Ming; Cheng, I-Chun; Lee, Wen-Ya; Hsu, Cheng-Che; Kang, Dun-Yen
2018-01-10
Zeolites are ideal low-dielectric-constant (low-k) materials. This paper reports on a novel plasma-assisted approach to the synthesis of low-k thin films comprising pure-silica zeolite MFI. The proposed method involves treating the aged solution using an atmospheric pressure plasma jet (APPJ). The high reactivity of the resulting nitrogen plasma helps to produce zeolite crystals with high crystallinity and uniform crystal size distribution. The APPJ treatment also remarkably reduces the time needed for the hydrothermal reaction. The zeolite MFI suspensions synthesized with the APPJ treatment are used for wet deposition to form thin films. The deposited zeolite thin films possess dense morphology and high crystallinity, overcoming the trade-off between crystallinity and film quality. Zeolite thin films synthesized using the proposed APPJ treatment achieve low leakage current (on the order of 10⁻⁸ A/cm²) and high Young's modulus (12 GPa), outperforming the control sample synthesized without plasma treatment. The dielectric constant of our zeolite thin films was as low as 1.41. The overall performance of the low-k thin films synthesized with the APPJ treatment far exceeds that of existing low-k films comprising pure-silica MFI.
Brandstätter, Eduard; Gigerenzer, Gerd; Hertwig, Ralph
2008-01-01
E. Brandstätter, G. Gigerenzer, and R. Hertwig (2006) showed that the priority heuristic matches or outperforms modifications of expected utility theory in predicting choice in 4 diverse problem sets. M. H. Birnbaum (2008) argued that sets exist in which the opposite is true. The authors agree--but stress that all choice strategies have regions of good and bad performance. The accuracy of various strategies systematically depends on choice difficulty, which the authors consider a triggering variable underlying strategy selection. Agreeing with E. J. Johnson, M. Schulte-Mecklenbeck, and M. C. Willemsen (2008) that process (not "as-if") models need to be formulated, the authors show how quantitative predictions can be derived and test them. Finally, they demonstrate that many of Birnbaum's and M. O. Rieger and M. Wang's (2008) case studies championing their preferred models involved biased tests in which the priority heuristic predicted data, whereas the parameterized models were fitted to the same data. The authors propose an adaptive toolbox approach of risky choice, according to which people first seek a no-conflict solution before resorting to conflict-resolving strategies such as the priority heuristic. (c) 2008 APA, all rights reserved
De Luca, Andrea; Flandre, Philippe; Dunn, David; Zazzi, Maurizio; Wensing, Annemarie; Santoro, Maria Mercedes; Günthard, Huldrych F; Wittkop, Linda; Kordossis, Theodoros; Garcia, Federico; Castagna, Antonella; Cozzi-Lepri, Alessandro; Churchill, Duncan; De Wit, Stéphane; Brockmeyer, Norbert H; Imaz, Arkaitz; Mussini, Cristina; Obel, Niels; Perno, Carlo Federico; Roca, Bernardino; Reiss, Peter; Schülter, Eugen; Torti, Carlo; van Sighem, Ard; Zangerle, Robert; Descamps, Diane
2016-05-01
The objective of this study was to improve the prediction of the impact of HIV-1 protease mutations in different viral subtypes on virological response to darunavir. Darunavir-containing treatment change episodes (TCEs) in patients previously failing PIs were selected from large European databases. HIV-1 subtype B-infected patients were used as the derivation dataset and HIV-1 non-B-infected patients were used as the validation dataset. The adjusted association of each mutation with week 8 HIV RNA change from baseline was analysed by linear regression. A prediction model was derived based on best subset least squares estimation with mutational weights corresponding to regression coefficients. Virological outcome prediction accuracy was compared with that from existing genotypic resistance interpretation systems (GISs) (ANRS 2013, Rega 9.1.0 and HIVdb 7.0). TCEs were selected from 681 subtype B-infected and 199 non-B-infected adults. Accompanying drugs were NRTIs in 87%, NNRTIs in 27% and raltegravir or maraviroc or enfuvirtide in 53%. The prediction model included weighted protease mutations, HIV RNA, CD4 and activity of accompanying drugs. The model's association with week 8 HIV RNA change in the subtype B (derivation) set was R(2) = 0.47 [average squared error (ASE) = 0.67, P < 10(-6)]; in the non-B (validation) set, ASE was 0.91. Accuracy investigated by means of area under the receiver operating characteristic curves with a binary response (above the threshold value of HIV RNA reduction) showed that our final model outperformed models with existing interpretation systems in both training and validation sets. A model with a new darunavir-weighted mutation score outperformed existing GISs in both B and non-B subtypes in predicting virological response to darunavir. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes.
Shao, Mingfu; Lin, Yu; Moret, Bernard M E
2015-05-01
Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this article, we propose an integer linear programming (ILP) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse, and rat genomes, where once again our method outperforms MSOAR.
Cross-cultural differences in memory: the role of culture-based stereotypes about aging.
Yoon, C; Hasher, L; Feinberg, F; Rahhal, T A; Winocur, G
2000-12-01
The extent to which cultural stereotypes about aging contribute to age differences in memory performance is investigated by comparing younger and older Anglophone Canadians to demographically matched Chinese Canadians, who tend to hold more positive views of aging. Four memory tests were administered. In contrast to B. Levy and E. Langer's (1994) findings, younger adults in both cultural groups outperformed their older comparison group on all memory tests. For two tests, which made use of visual stimuli resembling ideographic characters in written Chinese, the older Chinese Canadians approached, but did not reach, the performance achieved by their younger counterparts, and also outperformed the older Anglophone Canadians. However, on the other two tests, which assess memory for complex figures and abstract designs, no differences were observed between the older Chinese and Anglophone Canadians. Path analysis results suggest that this pattern of findings is not easily attributed to a wholly culturally based account of age differences in memory performance.
Nirschl, Jeffrey J; Janowczyk, Andrew; Peyster, Eliot G; Frank, Renee; Margulies, Kenneth B; Feldman, Michael D; Madabhushi, Anant
2018-01-01
Over 26 million people worldwide suffer from heart failure annually. When the cause of heart failure cannot be identified, endomyocardial biopsy (EMB) represents the gold standard for the evaluation of disease. However, manual EMB interpretation has high inter-rater variability. Deep convolutional neural networks (CNNs) have been successfully applied to detect cancer, diabetic retinopathy, and dermatologic lesions from images. In this study, we develop a CNN classifier to detect clinical heart failure from H&E stained whole-slide images from a total of 209 patients; 104 patients were used for training and the remaining 105 patients for independent testing. The CNN was able to identify patients with heart failure or severe pathology with a 99% sensitivity and 94% specificity on the test set, outperforming conventional feature-engineering approaches. Importantly, the CNN outperformed two expert pathologists by nearly 20%. Our results suggest that deep learning analytics of EMB can be used to predict cardiac outcome. PMID:29614076
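As a rough illustration of the tile-level classification such a pipeline rests on, here is a hedged PyTorch sketch; the architecture, tile size and slide-level averaging are illustrative stand-ins, not the network described in the paper.

    # Hedged sketch of a small CNN patch classifier for whole-slide image
    # tiles (binary: failing vs. non-failing myocardium). Sizes and layers
    # are illustrative only.
    import torch
    import torch.nn as nn

    class PatchCNN(nn.Module):
        def __init__(self, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(128, n_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    # Patient-level score: average tile probabilities per slide, threshold.
    model = PatchCNN()
    tiles = torch.randn(16, 3, 64, 64)          # 16 tiles from one biopsy
    probs = torch.softmax(model(tiles), dim=1)  # per-tile class probabilities
    print(probs.mean(dim=0))                    # slide-level score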
Waytowich, Nicholas R.; Lawhern, Vernon J.; Bohannon, Addison W.; Ball, Kenneth R.; Lance, Brent J.
2016-01-01
Recent advances in signal processing and machine learning techniques have enabled the application of Brain-Computer Interface (BCI) technologies to fields such as medicine, industry, and recreation; however, BCIs still suffer from the requirement of frequent calibration sessions due to the intra- and inter-individual variability of brain-signals, which makes calibration suppression through transfer learning an area of increasing interest for the development of practical BCI systems. In this paper, we present an unsupervised transfer method (spectral transfer using information geometry, STIG), which ranks and combines unlabeled predictions from an ensemble of information geometry classifiers built on data from individual training subjects. The STIG method is validated in both off-line and real-time feedback analysis during a rapid serial visual presentation task (RSVP). For detection of single-trial, event-related potentials (ERPs), the proposed method can significantly outperform existing calibration-free techniques as well as outperform traditional within-subject calibration techniques when limited data is available. This method demonstrates that unsupervised transfer learning for single-trial detection in ERP-based BCIs can be achieved without the requirement of costly training data, representing a step-forward in the overall goal of achieving a practical user-independent BCI system. PMID:27713685
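STIG builds on a spectral meta-learning idea: given binary predictions from several subject-specific classifiers on the same unlabeled trials, the leading eigenvector of the off-diagonal prediction covariance ranks the classifiers and supplies their voting weights. A simplified numpy illustration of that core step follows; it is a sketch under that assumption, not the authors' exact algorithm.

    import numpy as np

    def spectral_weights(P):
        """P: (n_classifiers, n_trials) matrix of +/-1 predictions."""
        Q = np.cov(P)
        np.fill_diagonal(Q, 0.0)          # the diagonal is uninformative here
        vals, vecs = np.linalg.eigh(Q)
        v = vecs[:, np.argmax(vals)]
        return v if v.sum() >= 0 else -v  # sign: most classifiers beat chance

    def ensemble_predict(P):
        return np.sign(spectral_weights(P) @ P)  # weighted vote per trial

    rng = np.random.default_rng(0)
    truth = rng.choice([-1, 1], size=200)
    accs = [0.9, 0.8, 0.7, 0.55]              # heterogeneous source subjects
    P = np.array([np.where(rng.random(200) < a, truth, -truth) for a in accs])
    print("ensemble accuracy:", (ensemble_predict(P) == truth).mean())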
NASA Astrophysics Data System (ADS)
Miller, S. M.; Andrews, A. E.; Benmergui, J. S.; Commane, R.; Dlugokencky, E. J.; Janssens-Maenhout, G.; Melton, J. R.; Michalak, A. M.; Sweeney, C.; Worthy, D. E. J.
2015-12-01
Existing estimates of methane fluxes from wetlands differ in both magnitude and distribution across North America. We discuss seven different bottom-up methane estimates in the context of atmospheric methane data collected across the US and Canada. In the first component of this study, we explore whether the observation network can even detect a methane pattern from wetlands. We find that the observation network can identify a methane pattern from Canadian wetlands but not reliably from US wetlands. Over Canada, the network can even identify spatial patterns at multi-province scales. Over the US, by contrast, anthropogenic emissions and modeling errors obscure atmospheric patterns from wetland fluxes. In the second component of the study, we then use these observations to reconcile disagreements in the magnitude, seasonal cycle, and spatial distribution of existing estimates. Most existing estimates predict fluxes that are too large with a seasonal cycle that is too narrow. A model known as LPJ-Bern has a spatial distribution most consistent with atmospheric observations. By contrast, a spatially-constant model outperforms the distribution of most existing flux estimates across Canada. The results presented here provide several pathways to reduce disagreements among existing wetland flux estimates across North America.
Supervised Learning Based Hypothesis Generation from Biomedical Literature.
Sang, Shengtian; Yang, Zhihao; Li, Zongyao; Lin, Hongfei
2015-01-01
Nowadays, the amount of biomedical literature is growing at an explosive speed, and much useful knowledge in it remains undiscovered. Researchers can form biomedical hypotheses by mining this literature. In this paper, we propose a supervised learning based approach to generate hypotheses from biomedical literature. This approach splits the traditional processing of hypothesis generation with the classic ABC model into an AB model and a BC model, which are constructed with supervised learning methods. Compared with concept co-occurrence and grammar engineering-based approaches like SemRep, machine learning based models usually achieve better performance in information extraction (IE) from texts. Then, by combining the two models, the approach reconstructs the ABC model and generates biomedical hypotheses from the literature. The experimental results on the three classic Swanson hypotheses show that our approach outperforms the SemRep system.
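The classic co-occurrence ABC model that the supervised AB/BC models replace can be sketched in a few lines: a hypothesis A→C is suggested by intermediate terms B that co-occur with A in one part of the literature and with C in another, while A and C never co-occur directly. The toy term sets below are illustrative, echoing Swanson's fish oil and Raynaud's disease example.

    abstracts = [
        {"fish oil", "blood viscosity"},
        {"blood viscosity", "raynaud disease"},
        {"fish oil", "platelet aggregation"},
        {"platelet aggregation", "raynaud disease"},
    ]

    def linking_terms(a, c, docs):
        # B terms: co-occur with A somewhere and with C somewhere else,
        # provided A and C themselves never co-occur in any document
        a_partners = set().union(*(d for d in docs if a in d)) - {a}
        c_partners = set().union(*(d for d in docs if c in d)) - {c}
        direct = any(a in d and c in d for d in docs)
        return set() if direct else a_partners & c_partners

    print(linking_terms("fish oil", "raynaud disease", abstracts))
    # -> {'blood viscosity', 'platelet aggregation'}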
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jordan, Dirk C.; Deceglie, Michael G.; Kurtz, Sarah R.
What is the best method to determine long-term PV system performance and degradation rates? Ideally, one universally applicable methodology would be desirable so that a single number could be derived. However, data sets vary in their attributes, and evidence is presented that defining two methodologies may be preferable. Monte Carlo simulations of artificial performance data allowed investigation of different methodologies and their respective confidence intervals. Tradeoffs between different approaches were delineated, elucidating why two separate approaches may need to be included in a standard. Regression approaches tend to be preferable when data sets are less contaminated by seasonality, noise and outliers, although robust regression can significantly improve accuracy when outliers are present. In the presence of outliers, marked seasonality, or strong soiling events, year-on-year approaches tend to outperform regression approaches.
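A minimal sketch of the year-on-year idea favored above for noisy, outlier-prone data: pair each performance point with the point one year later, convert each pair to an annual rate, and take the median of all rates. The synthetic data and matching tolerance are illustrative.

    import numpy as np

    def yoy_rate(t_years, perf):
        """Median year-on-year degradation rate in %/year."""
        t_years, perf = np.asarray(t_years), np.asarray(perf)
        rates = []
        for i, t0 in enumerate(t_years):
            j = np.argmin(np.abs(t_years - (t0 + 1.0)))  # point 1 year later
            if abs(t_years[j] - (t0 + 1.0)) < 7 / 365:   # within one week
                rates.append((perf[j] - perf[i]) / perf[i] * 100)
        return np.median(rates)

    t = np.arange(0, 5, 1 / 52)                         # weekly data, 5 years
    p = 1.0 - 0.005 * t + 0.02 * np.sin(2 * np.pi * t)  # -0.5%/yr + seasonality
    p[50] = 0.5                                         # one gross outlier
    print(round(yoy_rate(t, p), 2), "%/year")           # close to -0.5

The seasonal term cancels because each pair is exactly one year apart, and the median discards the outlier, which is precisely why the approach is robust where plain regression is not.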
NASA Astrophysics Data System (ADS)
Krasichkov, Alexander S.; Grigoriev, Eugene B.; Bogachev, Mikhail I.; Nifontov, Eugene M.
2015-10-01
We suggest an analytical approach to the adaptive thresholding in a shape anomaly detection problem. We find an analytical expression for the distribution of the cosine similarity score between a reference shape and an observational shape hindered by strong measurement noise that depends solely on the noise level and is independent of the particular shape analyzed. The analytical treatment is also confirmed by computer simulations and shows nearly perfect agreement. Using this analytical solution, we suggest an improved shape anomaly detection approach based on adaptive thresholding. We validate the noise robustness of our approach using typical shapes of normal and pathological electrocardiogram cycles hindered by additive white noise. We show explicitly that under high noise levels our approach considerably outperforms the conventional tactic that does not take into account variations in the noise level.
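The paper's contribution is a closed-form expression for the cosine-score distribution as a function of the noise level; that expression is not reproduced here, so the sketch below obtains the noise-dependent threshold by Monte Carlo instead, which conveys the adaptive-thresholding idea without claiming the authors' formula.

    import numpy as np

    rng = np.random.default_rng(1)

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    def adaptive_threshold(reference, noise_sigma, alpha=0.01, n_sim=5000):
        # simulate the similarity score of noisy copies of the reference
        sims = [cosine(reference,
                       reference + rng.normal(0, noise_sigma, reference.size))
                for _ in range(n_sim)]
        return np.quantile(sims, alpha)  # scores below this flag an anomaly

    t = np.linspace(0, 1, 200)
    cycle = np.exp(-((t - 0.5) ** 2) / 0.002)  # stand-in for one ECG cycle
    for sigma in (0.1, 0.5):
        print(sigma, round(float(adaptive_threshold(cycle, sigma)), 3))
    # the threshold drops as the noise level grows, as the analysis predicts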
Yamagata, Koichi; Yamanishi, Ayako; Kokubu, Chikara; Takeda, Junji; Sese, Jun
2016-01-01
An important challenge in cancer genomics is precise detection of structural variations (SVs) by high-throughput short-read sequencing, which is hampered by the high false discovery rates of existing analysis tools. Here, we propose an accurate SV detection method named COSMOS, which compares the statistics of the mapped read pairs in tumor samples with isogenic normal control samples in a distinct asymmetric manner. COSMOS also prioritizes the candidate SVs using strand-specific read-depth information. Performance tests on modeled tumor genomes revealed that COSMOS outperformed existing methods in terms of F-measure. We also applied COSMOS to an experimental mouse cell-based model, in which SVs were induced by genome engineering and gamma-ray irradiation, followed by polymerase chain reaction-based confirmation. The precision of COSMOS was 84.5%, while the next best existing method was 70.4%. Moreover, the sensitivity of COSMOS was the highest, indicating that COSMOS has great potential for cancer genome analysis. PMID:26833260
Coping efficiently with now-relative medical data.
Stantic, Bela; Terenziani, Paolo; Sattar, Abdul
2008-11-06
In Medical Informatics, there is an increasing awareness that temporal information plays a crucial role, so suitable database approaches are needed to store and support it. Specifically, most clinical data are intrinsically temporal, and a relevant part of them are now-relative (i.e., they are valid at the current time). Even if previous studies indicate that the treatment of now-relative data has a crucial impact on efficiency, current approaches have several limitations. In this paper we propose a novel approach, which is based on a new representation of now and on query transformations. We also experimentally demonstrate that our approach outperforms its best competitors in the literature by a factor of more than ten, in both the number of disk accesses and CPU usage.
Global Search Capabilities of Indirect Methods for Impulsive Transfers
NASA Astrophysics Data System (ADS)
Shen, Hong-Xin; Casalino, Lorenzo; Luo, Ya-Zhong
2015-09-01
An optimization method which combines an indirect method with a homotopic approach is proposed and applied to impulsive trajectories. Minimum-fuel, multiple-impulse solutions, with either fixed or open time, are obtained. The homotopic approach at hand is relatively straightforward to implement and does not require an initial guess of the adjoints, unlike previous adjoint-estimation methods. A multiple-revolution Lambert solver is used to find multiple starting solutions for the homotopic procedure; this approach can guarantee obtaining multiple local solutions without relying on the user's intuition, thus efficiently exploring the solution space to find the global optimum. The indirect/homotopic approach proves to be quite effective and efficient in finding optimal solutions, and outperforms the joint use of evolutionary algorithms and deterministic methods in the test cases.
Multi-Object Tracking with Correlation Filter for Autonomous Vehicle.
Zhao, Dawei; Fu, Hao; Xiao, Liang; Wu, Tao; Dai, Bin
2018-06-22
Multi-object tracking is a crucial problem for autonomous vehicles. Most state-of-the-art approaches adopt the tracking-by-detection strategy, a two-step procedure consisting of a detection module and a tracking module. In this paper, we improve both steps. We improve the detection module by incorporating temporal information, which is beneficial for detecting small objects. For the tracking module, we propose a novel Correlation Filter tracker based on compressed deep Convolutional Neural Network (CNN) features. By carefully integrating these two modules, the proposed multi-object tracking approach is capable of re-identification (ReID) once a tracked object gets lost. Extensive experiments were performed on the KITTI and MOT2015 tracking benchmarks. Results indicate that our approach outperforms most state-of-the-art tracking approaches.
Automatic Identification of Character Types from Film Dialogs
Skowron, Marcin; Trapp, Martin; Payr, Sabine; Trappl, Robert
2016-01-01
We study the detection of character types from fictional dialog texts such as screenplays. As approaches based on the analysis of utterances’ linguistic properties are not sufficient to identify all fictional character types, we develop an integrative approach that complements linguistic analysis with interactive and communication characteristics, and show that it can improve the identification performance. The interactive characteristics of fictional characters are captured by the descriptive analysis of semantic graphs weighted by linguistic markers of expressivity and social role. For this approach, we introduce a new data set of action movie character types with their corresponding sequences of dialogs. The evaluation results demonstrate that the integrated approach outperforms baseline approaches on the presented data set. Comparative in-depth analysis of a single screenplay leads on to the discussion of possible limitations of this approach and to directions for future research. PMID:29118463
Cooperative Spatial Retreat for Resilient Drone Networks
Kang, Jin-Hyeok; Kwon, Young-Min; Park, Kyung-Joon
2017-01-01
Drones are broadening their scope to various applications such as networking, package delivery, agriculture, rescue, and many more. For proper operation of drones, reliable communication should be guaranteed because drones are remotely controlled. When drones experience communication failure due to bad channel condition, interference, or jamming in a certain area, one existing solution is to exploit mobility or so-called spatial retreat to evacuate them from the communication failure area. However, the conventional spatial retreat scheme moves drones in random directions, which results in inefficient movement with significant evacuation time and waste of battery lifetime. In this paper, we propose a novel spatial retreat technique that takes advantage of cooperation between drones for resilient networking, which is called cooperative spatial retreat (CSR). Our performance evaluation shows that the proposed CSR significantly outperforms existing schemes. PMID:28467390
Micromechanical thermogravimetry
NASA Astrophysics Data System (ADS)
Berger, R.; Lang, H. P.; Gerber, Ch.; Gimzewski, J. K.; Fabian, J. H.; Scandella, L.; Meyer, E.; Güntherodt, H.-J.
1998-09-01
We demonstrate a new method for thermal analysis of nanogram quantities of material using a micromechanical thermogravimetric technique. The cantilever-type device uses an integrated piezoresistor to sense bending and simultaneously to ramp the temperature and control temperature cycles. It has a mass resolution in the picogram range. A quantitative analysis of the dehydration of copper sulfate pentahydrate (CuSO₄·5H₂O) is presented. The technique outperforms current thermogravimetric approaches by five orders of magnitude.
Innovative Approach for High Strength, High Thermal Conductive Composite Materials: Data Base
2013-11-01
pitch fiber types, from which we were able to down-select K6356U pitch fiber with balanced TC and strength properties. A prepreg processing line was... Creating a robust prepreg processing line to infuse unidirectional pitch fiber tape that can be used with other fibers... PAN-based carbon or glass... pitch fiber composites. • Compression molding process outperforms autoclaving in mechanical and thermal properties using the same prepreg material and
Evaluation and integration of existing methods for computational prediction of allergens.
Wang, Jing; Yu, Yabin; Zhao, Yunan; Zhang, Dabing; Li, Jing
2013-01-01
Allergy involves a series of complex reactions and factors that contribute to the development of the disease and the triggering of its symptoms, including rhinitis, asthma, atopic eczema, skin sensitivity, and even acute and fatal anaphylactic shock. Prediction and evaluation of potential allergenicity is important for the safety evaluation of foods and other environmental factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically. To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and massive putative non-allergens. The three most widely used allergen computational prediction approaches, including sequence-, motif- and SVM-based (Support Vector Machine) methods, were systematically compared using the defined parameters, and we found that the SVM-based method outperformed the other two methods with higher accuracy and specificity. The sequence-based method with the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has a higher sensitivity of over 98%, but has a low specificity. The advantage of the motif-based method is the ability to visualize the key motif within the allergen. Notably, the performance of the sequence-based method defined by FAO/WHO and the motif eliciting strategy could be improved by optimization of parameters. To facilitate allergen prediction, we integrated these three methods in a web-based application, proAP, which provides global search of known allergens and a powerful tool for allergen prediction. Flexible parameter setting and batch prediction were also implemented. The proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html. This study comprehensively evaluated sequence-, motif- and SVM-based computational prediction approaches for allergens and optimized their parameters to obtain better performance. These findings may provide helpful guidance for researchers in allergen prediction. Furthermore, we integrated these methods into a web application, proAP, greatly facilitating customizable allergen search and prediction. PMID:23514097
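The FAO/WHO sequence-based criterion evaluated above is concrete enough to sketch: flag a query protein if it shares an identical 6-mer with a known allergen or reaches at least 35% identity over an 80-residue window. The guideline's window comparison uses local alignment; the ungapped sliding comparison below is a deliberate simplification, and the sequences are toy examples.

    def six_mer_hit(query, allergen):
        kmers = {allergen[i:i + 6] for i in range(len(allergen) - 5)}
        return any(query[i:i + 6] in kmers for i in range(len(query) - 5))

    def window_identity_hit(query, allergen, win=80, cutoff=0.35):
        # ungapped stand-in for the guideline's 80-residue alignment window
        for i in range(len(query) - win + 1):
            for j in range(len(allergen) - win + 1):
                pairs = zip(query[i:i + win], allergen[j:j + win])
                if sum(x == y for x, y in pairs) / win >= cutoff:
                    return True
        return False

    def fao_who_flag(query, known_allergens):
        return any(six_mer_hit(query, a) or window_identity_hit(query, a)
                   for a in known_allergens)

    known = ["MKVLAAGICLLSAADA" * 6]                     # toy allergen sequence
    print(fao_who_flag("MKVLAAGICL" + "W" * 90, known))  # True via 6-mer hit

The 6-mer rule is what drives the high sensitivity and low specificity the abstract reports: short exact matches occur often by chance.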
Prediction of drug synergy in cancer using ensemble-based machine learning techniques
NASA Astrophysics Data System (ADS)
Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder
2018-04-01
Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Different drug-drug interactions can be examined via drug synergy scores. Efficient regression-based machine learning approaches are needed to minimize prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to meet this requirement. However, these techniques individually do not provide significant accuracy in drug synergy scores. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques were implemented on the drug synergy data. Based on the accuracy of each model, the four techniques with the highest accuracy were selected to develop an ensemble-based machine learning model. These models are Random Forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System (DENFIS). Ensembling is achieved by evaluating a biased weighted aggregation (i.e. adding more weight to the models with higher prediction scores) of the data predicted by the selected models. The proposed and existing machine learning techniques were evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms the others in terms of accuracy, root mean square error and coefficient of correlation.
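The biased weighted aggregation described above reduces to a convex combination whose weights grow with each base model's validation score. A minimal numpy sketch follows; the four rows stand in for Random Forest, GFS.GCCL, ANFIS and DENFIS outputs, and all numbers are illustrative.

    import numpy as np

    def ensemble_predict(preds, scores):
        """preds: (n_models, n_samples) regression outputs;
        scores: per-model validation accuracies (higher = better)."""
        w = np.asarray(scores, dtype=float)
        w /= w.sum()                     # normalize to a convex combination
        return w @ np.asarray(preds)     # stronger models dominate the blend

    preds = np.array([[1.0, 2.0, 3.0],   # stand-in for Random Forest
                      [1.2, 1.8, 3.3],   # stand-in for GFS.GCCL
                      [0.9, 2.2, 2.8],   # stand-in for ANFIS
                      [1.1, 2.1, 3.1]])  # stand-in for DENFIS
    print(ensemble_predict(preds, [0.92, 0.88, 0.80, 0.90]))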
NASA Astrophysics Data System (ADS)
Guan, Weipeng; Wu, Yuxiang; Xie, Canyu; Chen, Hao; Cai, Ye; Chen, Yingcong
2017-10-01
An indoor positioning algorithm based on visible light communication (VLC) is presented. This algorithm is used to calculate a three-dimensional (3-D) coordinate in an indoor optical wireless environment, which includes sufficient orders of multipath reflections from the reflecting surfaces of the room. Leveraging the global optimization ability of the genetic algorithm (GA), an innovative framework for 3-D position estimation based on a modified genetic algorithm is proposed. Unlike other techniques using VLC for positioning, the proposed system can achieve indoor 3-D localization without making assumptions about the height or acquiring the orientation angle of the mobile terminal. Simulation results show that an average localization error of less than 1.02 cm can be achieved. In addition, in most VLC-positioning systems the effect of reflection is neglected and performance is limited by it, which makes the results less accurate in a real scenario, with positioning errors at the corners relatively larger than elsewhere. We therefore take the first-order reflection into consideration and use an artificial neural network to match the model of the nonlinear channel. The studies show that under nonlinear matching of the direct and reflected channels, the average positioning errors at the four corners decrease from 11.94 to 0.95 cm. The employed algorithm emerges as an effective and practical method for indoor localization and outperforms other existing indoor wireless localization approaches.
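To make the GA-based estimation concrete, here is a hedged sketch that models only the line-of-sight Lambertian channel and searches for the receiver position with a bare-bones genetic loop; the reflection modeling and neural-network channel matching of the paper are omitted, and every parameter value is illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    LEDS = np.array([[1.0, 1.0, 3.0], [1.0, 4.0, 3.0],
                     [4.0, 1.0, 3.0], [4.0, 4.0, 3.0]])  # ceiling LEDs (m)
    m, A = 1.0, 1e-4                      # Lambertian order, photodiode area

    def gain(pos):
        # line-of-sight Lambertian channel gain from each LED to pos
        d = np.linalg.norm(LEDS - pos, axis=1)
        cos = (LEDS[:, 2] - pos[2]) / d   # LEDs face down, receiver faces up
        return (m + 1) * A / (2 * np.pi * d ** 2) * cos ** m * cos

    true_pos = np.array([2.5, 3.0, 1.0])
    measured = gain(true_pos)             # noiseless received powers

    def fitness(p):
        return -np.sum((gain(p) - measured) ** 2)

    pop = rng.uniform([0, 0, 0], [5, 5, 2.5], size=(60, 3))
    for _ in range(100):
        elite = pop[np.argsort([fitness(p) for p in pop])[::-1][:20]]
        parents = elite[rng.integers(0, 20, (40, 2))]
        children = parents.mean(axis=1) + rng.normal(0, 0.05, (40, 3))
        pop = np.vstack([elite, children])  # keep elites, add mutated children

    best = max(pop, key=fitness)
    print("error:", round(np.linalg.norm(best - true_pos) * 100, 2), "cm")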
A new pre-classification method based on associative matching method
NASA Astrophysics Data System (ADS)
Katsuyama, Yutaka; Minagawa, Akihiro; Hotta, Yoshinobu; Omachi, Shinichiro; Kato, Nei
2010-01-01
Reducing the time complexity of character matching is critical to the development of efficient Japanese Optical Character Recognition (OCR) systems. To shorten processing time, recognition is usually split into separate pre-classification and recognition stages. For high overall recognition performance, the pre-classification stage must both have very high classification accuracy and return only a small number of putative character categories for further processing. Furthermore, for any practical system, the speed of the pre-classification stage is also critical. The associative matching (AM) method has often been used for fast pre-classification, because its use of a hash table and reliance solely on logical bit operations to select categories makes it highly efficient. However, a certain level of redundancy exists in the hash table because it is constructed using only the minimum and maximum values of the data on each axis and therefore does not take account of the distribution of the data. We propose a modified associative matching method that satisfies the performance criteria described above but in a fraction of the time, by modifying the hash table to reflect the underlying distribution of training characters. Furthermore, we show that our approach outperforms pre-classification by clustering, ANN and conventional AM in terms of classification accuracy, discriminative power and speed. Compared to conventional associative matching, the proposed approach results in a 47% reduction in total processing time across an evaluation test set comprising 116,528 Japanese character images.
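A hedged sketch of hash-based pre-classification in the spirit of the modified AM method: quantize each feature axis into buckets, record which categories fall into each (axis, bucket) cell, and keep only categories present in all of a query's cells. Plain quantile buckets below stand in for the paper's distribution-aware table construction; class names and sizes are synthetic.

    import numpy as np

    class HashPreclassifier:
        def __init__(self, X, y, n_buckets=8):
            # quantile-based bucket edges reflect the data distribution
            qs = np.linspace(0, 1, n_buckets + 1)[1:-1]
            self.edges = np.quantile(X, qs, axis=0)
            self.table = {}
            for x, label in zip(X, y):
                for axis, b in enumerate(self._buckets(x)):
                    self.table.setdefault((axis, b), set()).add(label)

        def _buckets(self, x):
            return [int(np.searchsorted(self.edges[:, a], x[a]))
                    for a in range(len(x))]

        def candidates(self, x):
            # only categories present in every cell of the query survive
            cells = [self.table.get(ab, set())
                     for ab in enumerate(self._buckets(x))]
            return set.intersection(*cells)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = rng.integers(0, 50, size=500)     # 50 character categories
    pre = HashPreclassifier(X, y)
    print(len(pre.candidates(X[0])))      # small candidate set, contains y[0]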
Optimizing the Energy and Throughput of a Water-Quality Monitoring System.
Olatinwo, Segun O; Joubert, Trudi-H
2018-04-13
This work presents a new approach to the maximization of energy and throughput in a wireless sensor network (WSN), with the intention of applying the approach to water-quality monitoring. Water-quality monitoring using WSN technology has become an interesting research area. Energy scarcity is a critical issue that plagues the widespread deployment of WSN systems. Different power supplies, harvesting energy from sustainable sources, have been explored. However, when energy-efficient models are not put in place, energy harvesting based WSN systems may experience an unstable energy supply, resulting in an interruption in communication, and low system throughput. To alleviate these problems, this paper presents the joint maximization of the energy harvested by sensor nodes and their information-transmission rate using a sum-throughput technique. A wireless information and power transfer (WIPT) method is considered by harvesting energy from dedicated radio frequency sources. Due to the doubly near-far condition that confronts WIPT systems, a new WIPT system is proposed to improve the fairness of resource utilization in the network. Numerical simulation results are presented to validate the mathematical formulations for the optimization problem, which maximize the energy harvested and the overall throughput rate. Defining the performance metrics of achievable throughput and fairness in resource sharing, the proposed WIPT system outperforms an existing state-of-the-art WIPT system, with the comparison based on numerical simulations of both systems. The improved energy efficiency of the proposed WIPT system contributes to addressing the problem of energy scarcity. PMID:29652866
Adaptive and iterative methods for simulations of nanopores with the PNP-Stokes equations
NASA Astrophysics Data System (ADS)
Mitscha-Baude, Gregor; Buttinger-Kreuzhuber, Andreas; Tulzer, Gerhard; Heitzinger, Clemens
2017-06-01
We present a 3D finite element solver for the nonlinear Poisson-Nernst-Planck (PNP) equations for electrodiffusion, coupled to the Stokes system of fluid dynamics. The model serves as a building block for the simulation of macromolecule dynamics inside nanopore sensors. The source code is released online at http://github.com/mitschabaude/nanopores. We add to existing numerical approaches by deploying goal-oriented adaptive mesh refinement. To reduce the computation overhead of mesh adaptivity, our error estimator uses the much cheaper Poisson-Boltzmann equation as a simplified model, which is justified on heuristic grounds but shown to work well in practice. To address the nonlinearity in the full PNP-Stokes system, three different linearization schemes are proposed and investigated, with two segregated iterative approaches both outperforming a naive application of Newton's method. Numerical experiments are reported on a real-world nanopore sensor geometry. We also investigate two different models for the interaction of target molecules with the nanopore sensor through the PNP-Stokes equations. In one model, the molecule is of finite size and is explicitly built into the geometry; while in the other, the molecule is located at a single point and only modeled implicitly - after solution of the system - which is computationally favorable. We compare the resulting force profiles of the electric and velocity fields acting on the molecule, and conclude that the point-size model fails to capture important physical effects such as the dependence of charge selectivity of the sensor on the molecule radius.
iRSpot-EL: identify recombination spots with an ensemble learning approach.
Liu, Bin; Wang, Shanyi; Long, Ren; Chou, Kuo-Chen
2017-01-01
Coexisting in a DNA system, meiosis and recombination are two indispensable aspects of cell reproduction and growth. With the avalanche of genome sequences emerging in the post-genomic age, it is an urgent challenge to acquire information on DNA recombination spots, because it can provide timely and very useful insights into the mechanism of meiotic recombination and the process of genome evolution. To address this challenge, we have developed a predictor, called iRSpot-EL, by fusing different modes of pseudo K-tuple nucleotide composition and the mode of dinucleotide-based auto-cross covariance into an ensemble classifier built by a clustering approach. Five-fold cross tests on a widely used benchmark dataset have indicated that the new predictor remarkably outperforms its existing counterparts. Particularly, far beyond their reach, the new predictor can easily be used to conduct genome-wide analysis, and the results obtained are quite consistent with the experimental map. For the convenience of most experimental scientists, a user-friendly web server for iRSpot-EL has been established at http://bioinformatics.hitsz.edu.cn/iRSpot-EL/, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. Contact: bliu@gordonlifescience.org or bliu@insun.hit.edu.cn. Supplementary data are available at Bioinformatics online.
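Of the feature modes named above, the plain k-tuple nucleotide composition is simple enough to sketch: the normalized frequency of every k-mer in a DNA segment. The pseudo-component correlation terms that give the "pseudo" variants their name are omitted here for brevity.

    from itertools import product

    def ktuple_composition(seq, k=2):
        """Normalized k-mer frequencies of a DNA string (4**k features)."""
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = {kmer: 0 for kmer in kmers}
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] in counts:
                counts[seq[i:i + k]] += 1
        total = max(1, len(seq) - k + 1)
        return [counts[kmer] / total for kmer in kmers]

    features = ktuple_composition("ACGTACGTGGCC", k=2)
    print(len(features), round(sum(features), 2))  # 16 features, sum 1.0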
NASA Astrophysics Data System (ADS)
Gloger, Oliver; Tönnies, Klaus; Mensel, Birger; Völzke, Henry
2015-11-01
In epidemiological studies as well as in clinical practice, the amount of medical image data produced has strongly increased in the last decade. In this context, organ segmentation in MR volume data has gained increasing attention for medical applications. Especially in large-scale population-based studies, organ volumetry is highly relevant, requiring exact organ segmentation. Since manual segmentation is time-consuming and prone to reader variability, large-scale studies need automated methods to perform organ segmentation. Fully automatic organ segmentation in native MR image data has proven to be a very challenging task. Imaging artifacts as well as inter- and intrasubject MR-intensity differences complicate the application of supervised learning strategies. Thus, we propose a modularized framework of a two-stepped probabilistic approach that generates subject-specific probability maps for renal parenchyma tissue, which are subsequently refined by several extended segmentation strategies. We present a three-class support vector machine recognition system that incorporates Fourier descriptors as shape features to recognize and segment characteristic parenchyma parts. Probabilistic methods use the segmented characteristic parenchyma parts to generate high-quality subject-specific parenchyma probability maps. Several refinement strategies, including a final shape-based 3D level set segmentation technique, are used in subsequent processing modules to segment renal parenchyma. Furthermore, our framework recognizes and excludes renal cysts from the parenchymal volume, which is important for analyzing renal function. Volume errors and Dice coefficients show that our presented framework outperforms existing approaches.
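The Fourier-descriptor shape features mentioned above can be sketched directly: treat the closed 2-D contour as a complex signal, take its FFT, and normalize the low-order coefficient magnitudes so the descriptor is invariant to translation and scale. The normalization choices below are one common convention, not necessarily the paper's.

    import numpy as np

    def fourier_descriptors(contour, n_coeffs=4):
        """contour: (N, 2) boundary points ordered along the shape."""
        z = contour[:, 0] + 1j * contour[:, 1]
        mags = np.abs(np.fft.fft(z))
        mags = mags / mags[1]           # scale invariance via first harmonic
        # low-order positive and negative harmonics; index 0 carries only
        # translation and is skipped entirely
        return np.concatenate([mags[1:n_coeffs + 1], mags[-n_coeffs:]])

    theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])
    ellipse = np.column_stack([2 * np.cos(theta), np.sin(theta)])
    print(np.round(fourier_descriptors(circle), 3))
    print(np.round(fourier_descriptors(ellipse), 3))  # differs from circle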
Haldar, Justin P; Leahy, Richard M
2013-05-01
This paper presents a novel family of linear transforms that can be applied to data collected from the surface of a 2-sphere in three-dimensional Fourier space. This family of transforms generalizes the previously-proposed Funk-Radon Transform (FRT), which was originally developed for estimating the orientations of white matter fibers in the central nervous system from diffusion magnetic resonance imaging data. The new family of transforms is characterized theoretically, and efficient numerical implementations of the transforms are presented for the case when the measured data is represented in a basis of spherical harmonics. After these general discussions, attention is focused on a particular new transform from this family that we name the Funk-Radon and Cosine Transform (FRACT). Based on theoretical arguments, it is expected that FRACT-based analysis should yield significantly better orientation information (e.g., improved accuracy and higher angular resolution) than FRT-based analysis, while maintaining the strong characterizability and computational efficiency of the FRT. Simulations are used to confirm these theoretical characteristics, and the practical significance of the proposed approach is illustrated with real diffusion weighted MRI brain data. These experiments demonstrate that, in addition to having strong theoretical characteristics, the proposed approach can outperform existing state-of-the-art orientation estimation methods with respect to measures such as angular resolution and robustness to noise and modeling errors.
Comparison of tablet-based strategies for incision planning in laser microsurgery
NASA Astrophysics Data System (ADS)
Schoob, Andreas; Lekon, Stefan; Kundrat, Dennis; Kahrs, Lüder A.; Mattos, Leonardo S.; Ortmaier, Tobias
2015-03-01
Recent research has revealed that incision planning in laser surgery deploying stylus and tablet outperforms state-of-the-art micro-manipulator-based laser control. Providing more detailed quantitation regarding that approach, a comparative study of six tablet-based strategies for laser path planning is presented. Reference strategy is defined by monoscopic visualization and continuous path drawing on a graphics tablet. Further concepts deploying stereoscopic or a synthesized laser view, point-based path definition, real-time teleoperation or a pen display are compared with the reference scenario. Volunteers were asked to redraw and ablate stamped lines on a sample. Performance is assessed by measuring planning accuracy, completion time and ease of use. Results demonstrate that significant differences exist between proposed concepts. The reference strategy provides more accurate incision planning than the stereo or laser view scenario. Real-time teleoperation performs best with respect to completion time without indicating any significant deviation in accuracy and usability. Point-based planning as well as the pen display provide most accurate planning and increased ease of use compared to the reference strategy. As a result, combining the pen display approach with point-based planning has potential to become a powerful strategy because of benefiting from improved hand-eye-coordination on the one hand and from a simple but accurate technique for path definition on the other hand. These findings as well as the overall usability scale indicating high acceptance and consistence of proposed strategies motivate further advanced tablet-based planning in laser microsurgery.
Content-Adaptive Sketch Portrait Generation by Decompositional Representation Learning.
Zhang, Dongyu; Lin, Liang; Chen, Tianshui; Wu, Xian; Tan, Wenwei; Izquierdo, Ebroul
2017-01-01
Sketch portrait generation benefits a wide range of applications such as digital entertainment and law enforcement. Although plenty of efforts have been dedicated to this task, several issues still remain unsolved for generating vivid and detail-preserving personal sketch portraits. For example, quite a few artifacts may exist in synthesizing hairpins and glasses, and textural details may be lost in the regions of hair or mustache. Moreover, the generalization ability of current systems is somewhat limited since they usually require elaborately collecting a dictionary of examples or carefully tuning features/components. In this paper, we present a novel representation learning framework that generates an end-to-end photo-sketch mapping through structure and texture decomposition. In the training stage, we first decompose the input face photo into different components according to their representational contents (i.e., structural and textural parts) by using a pre-trained convolutional neural network (CNN). Then, we utilize a branched fully convolutional neural network for learning structural and textural representations, respectively. In addition, we design a sorted matching mean square error metric to measure texture patterns in the loss function. In the stage of sketch rendering, our approach automatically generates structural and textural representations for the input photo and produces the final result via a probabilistic fusion scheme. Extensive experiments on several challenging benchmarks suggest that our approach outperforms example-based synthesis algorithms in terms of both perceptual and objective metrics. In addition, the proposed method also has better generalization ability across data sets without additional training.
Scuba: scalable kernel-based gene prioritization.
Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio
2018-01-25
The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large scale predictions are required. Importantly, it is able to efficiently deal both with a large amount of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when the input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.
Sun, Zhichao; Mukherjee, Bhramar; Estes, Jason P; Vokonas, Pantel S; Park, Sung Kyun
2017-08-15
Joint effects of genetic and environmental factors have been increasingly recognized in the development of many complex human diseases. Despite the popularity of case-control and case-only designs, longitudinal cohort studies that can capture time-varying outcome and exposure information have long been recommended for gene-environment (G × E) interactions. To date, literature on sampling designs for longitudinal studies of G × E interaction is quite limited. We therefore consider designs that can prioritize a subsample of the existing cohort for retrospective genotyping on the basis of currently available outcome, exposure, and covariate data. In this work, we propose stratified sampling based on summaries of individual exposures and outcome trajectories and develop a full conditional likelihood approach for estimation that adjusts for the biased sample. We compare the performance of our proposed design and analysis with combinations of different sampling designs and estimation approaches via simulation. We observe that the full conditional likelihood provides improved estimates for the G × E interaction and joint exposure effects over uncorrected complete-case analysis, and the exposure enriched outcome trajectory dependent design outperforms other designs in terms of estimation efficiency and power for detection of the G × E interaction. We also illustrate our design and analysis using data from the Normative Aging Study, an ongoing longitudinal cohort study initiated by the Veterans Administration in 1963.
Su, Jin-He; Piao, Ying-Chao; Luo, Ze; Yan, Bao-Ping
2018-01-01
Simple Summary: Understanding the spatio-temporal distribution of species habitats would facilitate wildlife resource management and conservation efforts. Existing methods perform poorly due to the limited availability of training samples. More recently, location-aware sensors have been widely used to track animal movements. The aim of the study was to generate suitability maps for bar-headed geese using movement data coupled with environmental parameters, such as remote sensing images and temperature data. We therefore modified a deep convolutional neural network for multi-scale inputs. The results indicate that the proposed method can identify areas around Qinghai Lake that are densely used by the goose species. In addition, this approach might also be of interest for implementation with other species with different niche factors or in areas where biological survey data are scarce.

Abstract: With the application of various data acquisition devices, a large amount of animal movement data can be used to label presence data in remote sensing images and predict species distribution. In this paper, a two-stage classification approach combining movement data and moderate-resolution remote sensing images is proposed. First, we introduce a new density-based clustering method to identify stopovers from migratory birds' movement data and generate classification samples based on the clustering result. We split the remote sensing images into 16 × 16 patches and labeled them as positive samples if they overlap with stopovers. Second, a multi-convolution neural network model is proposed for extracting features from temperature data and remote sensing images, respectively. Then a Support Vector Machine (SVM) model is used to combine the features and predict the final classification results. The experimental analysis was carried out on public Landsat 5 TM images and a GPS dataset collected from 29 birds over three years. The results indicate that our proposed method outperforms existing baseline methods and achieves good performance in habitat suitability prediction. PMID:29701686
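The stopover-identification step lends itself to a short sketch: density-based clustering of GPS fixes, with dense clusters treated as stopovers. Standard DBSCAN from scikit-learn stands in below for the paper's custom clustering method, and the coordinates are synthetic.

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    stopover = rng.normal([36.9, 100.1], 0.01, size=(80, 2))  # near Qinghai Lake
    transit = np.column_stack([np.linspace(37.5, 40.0, 30),
                               np.linspace(100.5, 103.0, 30)])
    fixes = np.vstack([stopover, transit])                    # (lat, lon)

    labels = DBSCAN(eps=0.05, min_samples=10).fit_predict(fixes)
    for lab in set(labels) - {-1}:                            # -1 = noise
        pts = fixes[labels == lab]
        print(f"stopover {lab}: {len(pts)} fixes, "
              f"centroid {np.round(pts.mean(axis=0), 3)}")

Fixes along the migration corridor are sparse and fall out as noise, while the tight cluster of loitering fixes is returned as a stopover whose footprint can then label image patches.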
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
Evaluating performance of stormwater sampling approaches using a dynamic watershed model.
Ackerman, Drew; Stein, Eric D; Ritter, Kerry J
2011-09-01
Accurate quantification of stormwater pollutant levels is essential for estimating overall contaminant discharge to receiving waters. Numerous sampling approaches exist that attempt to balance accuracy against the costs associated with the sampling method. This study employs a novel and practical approach of evaluating the accuracy of different stormwater monitoring methodologies using stormflows and constituent concentrations produced by a fully validated continuous simulation watershed model. A major advantage of using a watershed model to simulate pollutant concentrations is that a large number of storms representing a broad range of conditions can be applied in testing the various sampling approaches. Seventy-eight distinct methodologies were evaluated by "virtual samplings" of 166 simulated storms of varying size, intensity and duration, representing 14 years of storms in Ballona Creek near Los Angeles, California. The 78 methods can be grouped into four general strategies: volume-paced compositing, time-paced compositing, pollutograph sampling, and microsampling. The performances of each sampling strategy was evaluated by comparing the (1) median relative error between the virtually sampled and the true modeled event mean concentration (EMC) of each storm (accuracy), (2) median absolute deviation about the median or "MAD" of the relative error or (precision), and (3) the percentage of storms where sampling methods were within 10% of the true EMC (combined measures of accuracy and precision). Finally, costs associated with site setup, sampling, and laboratory analysis were estimated for each method. Pollutograph sampling consistently outperformed the other three methods both in terms of accuracy and precision, but was the most costly method evaluated. Time-paced sampling consistently underestimated while volume-paced sampling over estimated the storm EMCs. Microsampling performance approached that of pollutograph sampling at a substantial cost savings. The most efficient method for routine stormwater monitoring in terms of a balance between performance and cost was volume-paced microsampling, with variable sample pacing to ensure that the entirety of the storm was captured. Pollutograph sampling is recommended if the data are to be used for detailed analysis of runoff dynamics.
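The quantities being compared are easy to make concrete: the true event mean concentration (EMC) is the flow-weighted average of concentration over the storm, and a volume-paced composite approximates it with an aliquot grabbed each time a fixed runoff volume passes. A synthetic first-flush storm illustrates this below; the hydrograph and pollutograph shapes are invented for the example.

    import numpy as np

    t = np.linspace(0, 6, 361)                      # hours, 1-min steps
    q = 10 * np.exp(-((t - 2) ** 2) / 0.8)          # hydrograph (m3/s)
    c = 50 * np.exp(-t / 1.5) + 5                   # first-flush pollutograph

    dt = t[1] - t[0]
    true_emc = np.sum(c * q * dt) / np.sum(q * dt)  # flow-weighted average

    cum_vol = np.cumsum(q * dt)
    pacing = cum_vol[-1] / 20                       # 20 equal-volume aliquots
    idx = np.clip(np.searchsorted(cum_vol, pacing * np.arange(1, 21)),
                  0, len(t) - 1)
    composite_emc = c[idx].mean()                   # volume-paced composite

    print(round(true_emc, 2), round(composite_emc, 2))
    # equal-volume aliquots track the flow-weighted EMC closely

Because every aliquot represents the same runoff volume, their plain average is already flow-weighted, which is why volume pacing avoids the systematic under- and over-estimation the study reports for time pacing.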
Predicting drug-target interactions using restricted Boltzmann machines.
Wang, Yuhao; Zeng, Jianyang
2013-07-01
In silico prediction of drug-target interactions plays an important role toward identifying and developing new uses of existing or abandoned drugs. Network-based approaches have recently become a popular tool for discovering new drug-target interactions (DTIs). Unfortunately, most of these network-based approaches can only predict binary interactions between drugs and targets, and information about different types of interactions has not been well exploited for DTI prediction in previous studies. On the other hand, incorporating additional information about drug-target relationships or drug modes of action can improve prediction of DTIs. Furthermore, the predicted types of DTIs can broaden our understanding about the molecular basis of drug action. We propose a first machine learning approach to integrate multiple types of DTIs and predict unknown drug-target relationships or drug modes of action. We cast the new DTI prediction problem into a two-layer graphical model, called restricted Boltzmann machine, and apply a practical learning algorithm to train our model and make predictions. Tests on two public databases show that our restricted Boltzmann machine model can effectively capture the latent features of a DTI network and achieve excellent performance on predicting different types of DTIs, with the area under precision-recall curve up to 89.6. In addition, we demonstrate that integrating multiple types of DTIs can significantly outperform other predictions either by simply mixing multiple types of interactions without distinction or using only a single interaction type. Further tests show that our approach can infer a high fraction of novel DTIs that has been validated by known experiments in the literature or other databases. These results indicate that our approach can have highly practical relevance to DTI prediction and drug repositioning, and hence advance the drug discovery process. Software and datasets are available on request. Supplementary data are available at Bioinformatics online.
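The generic machinery behind the model, a restricted Boltzmann machine trained with one-step contrastive divergence, can be sketched in plain numpy; the paper's multi-type interaction encoding and training details are not reproduced, and the toy profiles are invented.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    n_visible, n_hidden, lr = 6, 3, 0.1
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

    # toy "interaction profiles": one binary target vector per drug
    V = rng.integers(0, 2, (20, n_visible)).astype(float)

    for _ in range(200):                             # CD-1 updates
        h_prob = sigmoid(V @ W + b_h)
        h = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h @ W.T + b_v)             # one Gibbs step back
        h_recon = sigmoid(v_recon @ W + b_h)
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
        b_v += lr * (V - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)

    print(np.round(sigmoid(V[:1] @ W + b_h), 2))     # latent features, drug 0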
Measuring the mass distribution in stellar systems
NASA Astrophysics Data System (ADS)
Tremaine, Scott
2018-06-01
One of the fundamental tasks of dynamical astronomy is to infer the distribution of mass in a stellar system from a snapshot of the positions and velocities of its stars. The usual approach to this task (e.g. Schwarzschild's method) involves fitting parametrized forms of the gravitational potential and the phase-space distribution to the data. We review the practical and conceptual difficulties in this approach and describe a novel statistical method for determining the mass distribution that does not require determining the phase-space distribution of the stars. We show that this new estimator outperforms other distribution-free estimators for the harmonic and Kepler potentials.
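The new estimator itself is not spelled out in the abstract; as a hedged illustration of what a distribution-free mass estimator looks like, here is the classic projected mass estimator of Bahcall & Tremaine (1981) for a Kepler potential, with the constant commonly quoted as C = 16/pi for isotropic orbits (C depends on the assumed anisotropy). The tracer sample is synthetic.

```python
import numpy as np

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun

def projected_mass(vz, R, C=16 / np.pi):
    """Bahcall-Tremaine projected mass estimator for a Kepler potential:
    M = C / (G N) * sum(vz_i^2 * R_i); C = 16/pi assumes isotropic orbits."""
    return C / (G * len(vz)) * np.sum(vz**2 * R)

# Synthetic tracer sample (values are illustrative only).
rng = np.random.default_rng(1)
R = rng.uniform(1.0, 50.0, 500)       # projected radii, kpc
vz = rng.normal(0.0, 120.0, 500)      # line-of-sight velocities, km/s
print(f"M ~ {projected_mass(vz, R):.3e} Msun")
```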
Quantifying edge significance on maintaining global connectivity
Qian, Yuhua; Li, Yebin; Zhang, Min; Ma, Guoshuai; Lu, Furong
2017-01-01
Global connectivity is an important issue for networks: the failure of a few key edges may lead to a breakdown of the whole system, so finding them provides a better understanding of system robustness. Based on topological information, we propose an approach named LE (link entropy) to quantify edge significance for maintaining global connectivity. We then compare LE with six acknowledged indices of edge significance: edge betweenness centrality, degree product, bridgeness, diffusion importance, topological overlap and k-path edge centrality. Experimental results show that the LE approach outperforms the others in quantifying edge significance for maintaining global connectivity. PMID:28349923
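LE itself is not defined in the abstract; as a stand-in, the sketch below ranks edges with one of the six reference indices (edge betweenness centrality, via networkx) and then checks the effect of deleting the top-ranked edge on the largest connected component, which is the evaluation logic described above.

```python
import networkx as nx

G = nx.barbell_graph(10, 2)                    # two dense clusters joined by a path
eb = nx.edge_betweenness_centrality(G)         # one of the six reference indices
ranked = sorted(eb, key=eb.get, reverse=True)

def giant_fraction(graph):
    """Fraction of nodes in the largest connected component."""
    return max(len(c) for c in nx.connected_components(graph)) / graph.number_of_nodes()

H = G.copy()
H.remove_edge(*ranked[0])                      # drop the single most "significant" edge
print(f"giant component: {giant_fraction(G):.2f} -> {giant_fraction(H):.2f}")
```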
Fuzzy-Rough Nearest Neighbour Classification
NASA Astrophysics Data System (ADS)
Jensen, Richard; Cornelis, Chris
A new fuzzy-rough nearest neighbour (FRNN) classification algorithm is presented in this paper, as an alternative to Sarkar's fuzzy-rough ownership function (FRNN-O) approach. In contrast to the latter, our method uses the nearest neighbours to construct lower and upper approximations of decision classes, and classifies test instances based on their membership in these approximations. In the experimental analysis, we evaluate our approach both with classical fuzzy-rough approximations (based on an implicator and a t-norm) and with the recently introduced vaguely quantified rough sets. Preliminary results are very good; in general, FRNN outperforms FRNN-O as well as the traditional fuzzy nearest neighbour (FNN) algorithm.
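A minimal sketch of the FRNN idea, assuming the Lukasiewicz implicator I(a,b) = min(1, 1-a+b), the t-norm T(a,b) = max(0, a+b-1), and a similarity of one minus normalized distance; the data and the choice of similarity are invented, and a real implementation would tune the similarity relation.

```python
import numpy as np

def frnn_predict(X_train, y_train, x, k=5):
    """Classify x by the mean of its lower- and upper-approximation
    memberships, built from the k nearest neighbours."""
    d = np.linalg.norm(X_train - x, axis=1)
    d = d / (d.max() + 1e-12)
    nn = np.argsort(d)[:k]
    best_class, best_score = None, -1.0
    for c in np.unique(y_train):
        member = (y_train[nn] == c).astype(float)           # crisp class membership
        s = 1.0 - d[nn]                                     # fuzzy similarity to x
        lower = np.min(np.minimum(1.0, 1.0 - s + member))   # inf I(s, member)
        upper = np.max(np.maximum(0.0, s + member - 1.0))   # sup T(s, member)
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [6, 5], [5, 6]], float)
y = np.array([0, 0, 0, 1, 1, 1])
print(frnn_predict(X, y, np.array([4.5, 4.5])))             # -> 1
```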
Classification of clinically useful sentences in clinical evidence resources.
Morid, Mohammad Amin; Fiszman, Marcelo; Raja, Kalpana; Jonnalagadda, Siddhartha R; Del Fiol, Guilherme
2016-04-01
Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care. Our objective was to design and assess a method for extracting clinically useful sentences from synthesized online clinical resources that represent the most clinically useful information for directly answering clinicians' information needs. We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient populations identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure. The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p<0.001). Excluding predication, population, semantic concept or text-based features reduced the F-measure to 62%, 66%, 58% and 69%, respectively (p<0.01). The classifier applied to Medline sentences reached an F-measure of 73%, equivalent to its performance on UpToDate sentences (p=0.62). The feature-rich approach significantly outperformed general baseline methods as well as classifiers based on any single type of feature, and different types of semantic features each made a unique contribution to overall classification performance. The classifier's model and the features used for UpToDate generalized well to Medline abstracts. Copyright © 2016 Elsevier Inc. All rights reserved.
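SemRep, UMLS lookups and a kernel-based Bayesian network are not off-the-shelf components; the sketch below only mimics the feature-rich idea with scikit-learn, combining n-gram features with a hypothetical cue-word indicator and an RBF-kernel SVM as a stand-in classifier. The sentences, labels and cue list are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC

sentences = [
    "We recommend aspirin for secondary prevention in these patients.",
    "The study enrolled 400 adults between 2001 and 2005.",
    "Beta blockers should be avoided in acute decompensated heart failure.",
    "Funding was provided by the national research council.",
]
labels = [1, 0, 1, 0]        # 1 = clinically useful sentence (invented labels)

CUES = ("recommend", "should", "avoid", "first-line")  # hypothetical cue words

def cue_features(docs):
    # One binary feature: does the sentence contain any cue word?
    return np.array([[any(c in d.lower() for c in CUES)] for d in docs], float)

model = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("cues", FunctionTransformer(cue_features)),
    ])),
    ("clf", SVC(kernel="rbf")),
])
model.fit(sentences, labels)
print(model.predict(["Clinicians should consider statin therapy."]))
```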
Dellicour, Simon; Flot, Jean-François
2015-11-01
Most single-locus molecular approaches to species delimitation available to date have been designed and tested on data sets comprising at least tens of species, whereas the opposite case (species-poor data sets for which the hypothesis that all individuals are conspecific cannot be rejected beforehand) has rarely been the focus of such attempts. Here we compare the performance of barcode gap detection, haplowebs and generalized mixed Yule-coalescent (GMYC) models in delineating chimpanzees and bonobos using nuclear sequence markers, then apply these single-locus species delimitation methods to data sets of one, three, or six species simulated under a wide range of population sizes, speciation rates, mutation rates and sampling efforts. Our results show that barcode gap detection and GMYC models are unable to delineate species properly in data sets composed of one or two species, two situations in which haplowebs outperform them. For data sets composed of three or six species, bGMYC and haplowebs outperform the single-threshold and multiple-threshold versions of GMYC, whereas a clear barcode gap is only observed when population sizes and speciation rates are both small. The latter conditions represent a "sweet spot" for molecular taxonomy where all the single-locus approaches tested work well; however, the performance of these methods decreases strongly when population sizes and speciation rates are high, suggesting that multilocus approaches may be necessary to tackle such cases. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Bayerstadler, Andreas; Benstetter, Franz; Heumann, Christian; Winter, Fabian
2014-09-01
Predictive Modeling (PM) techniques are gaining importance in the worldwide health insurance business. Modern PM methods are used for customer relationship management, risk evaluation and medical management. This article illustrates a PM approach that enables the economic potential of (cost-)effective disease management programs (DMPs) to be fully exploited through optimized candidate selection, as an example of successful data-driven business management. The approach is based on a Generalized Linear Model (GLM) that is easy for health insurance companies to apply. Using a small portfolio from an emerging country, we show that our GLM approach is stable compared with more sophisticated regression techniques in spite of the difficult data environment. Additionally, we demonstrate in this setting that our model can compete with the expensive solutions offered by professional PM vendors and outperforms non-predictive standard approaches for DMP selection commonly used in the market.
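As a sketch of the kind of GLM the article describes, the snippet below fits a Gamma GLM with a log link (a common choice for right-skewed cost data) using statsmodels and selects the predicted-cost top decile as DMP candidates. All features, the synthetic outcome and the selection rule are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
# Hypothetical insured-member features (synthetic data).
age = rng.uniform(20, 80, n)
chronic = rng.integers(0, 2, n).astype(float)
prior_cost = rng.gamma(2.0, 500.0, n)
X = sm.add_constant(np.column_stack([age, chronic, prior_cost]))
future_cost = rng.gamma(2.0, 400.0 + 5 * age + 800 * chronic)  # synthetic outcome

# GLM with a Gamma family and log link, fitted by maximum likelihood.
glm = sm.GLM(future_cost, X, family=sm.families.Gamma(link=sm.families.links.Log()))
fit = glm.fit()

# Rank members by predicted cost and enrol the top decile into the DMP.
pred = fit.predict(X)
candidates = np.argsort(pred)[::-1][: n // 10]
print("members selected for the DMP:", len(candidates))
```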
Modeling and predicting historical volatility in exchange rate markets
NASA Astrophysics Data System (ADS)
Lahmiri, Salim
2017-04-01
Modeling and forecasting the volatility of currency exchange rates is important in several business risk management tasks, including treasury risk management, derivatives pricing, and portfolio risk evaluation. The purpose of this study is to present a simple and effective approach for predicting the historical volatility of currency exchange rates. The approach is based on a limited set of technical indicators used as inputs to artificial neural networks (ANN). To show its effectiveness, the proposed approach was applied to forecast the volatilities of the US/Canada and US/Euro exchange rates. The forecasting results show that our simple approach outperformed the conventional GARCH and EGARCH with different distribution assumptions, as well as hybrid GARCH and EGARCH with ANN, in terms of mean absolute error, mean squared error, and Theil's inequality coefficient. Because of its simplicity and effectiveness, the approach is promising for US currency volatility prediction tasks.
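A minimal sketch of the approach's shape: lagged realized volatilities stand in for the paper's technical indicators, and a small MLP is trained to predict the next-period volatility. The return series, lags and network settings are synthetic and illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
r = rng.normal(0, 0.005, 2000)                                       # synthetic daily FX log-returns
vol = np.array([r[i - 21:i].std() for i in range(21, len(r) + 1)])   # 21-day realized volatility

# Lagged volatilities (t-1, t-2, t-6) as stand-ins for technical indicators.
L = len(vol)
X = np.column_stack([vol[5:L - 1], vol[4:L - 2], vol[0:L - 6]])
y = vol[6:]                                                          # target: vol at t

split = int(0.8 * len(y))
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
ann.fit(X[:split], y[:split])
print("test MAE:", mean_absolute_error(y[split:], ann.predict(X[split:])))
```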
Rodríguez-Entrena, Macario; Salazar-Ordóñez, Melania; Becerra-Alonso, David
2016-03-30
This paper studies which attitudinal, cognitive and socio-economic factors determine the willingness to purchase genetically modified (GM) food, enabling the forecasting of consumers' behaviour in Andalusia, southern Spain. The classification is performed by a standard multilayer perceptron neural network trained with the extreme learning machine algorithm. An ordered logistic regression was then applied to determine whether the neural network can outperform this traditional econometric approach. The results show that the highest relative contributions lie in the variables related to the perceived risks of GM food, while the perceived benefits have a lower influence. In addition, an innovative attitude towards food presents a strong link, as does the perception of food safety. The variables with the least relative contribution are subjective knowledge about GM food and the consumers' age. The neural network approach outperforms the correct classification percentage of the ordered logistic regression. The perceived risks must be considered a critical factor. A strategy to improve GM food acceptance is to develop a transparent and balanced information framework that makes the potential risk understandable to society and makes consumers aware of the risk assessments for GM food in the EU. For its success, it is essential to improve trust in EU institutions and scientific regulatory authorities. © 2015 Society of Chemical Industry.
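An extreme learning machine is simple enough to sketch directly: the hidden layer is random and fixed, and only the linear readout is fitted by least squares. The survey-style features and labels below are invented stand-ins.

```python
import numpy as np

def elm_train(X, y_onehot, n_hidden=64, seed=0):
    """Extreme learning machine: random hidden layer, least-squares readout."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                    # fixed random features
    beta = np.linalg.pinv(H) @ y_onehot       # only the readout is "trained"
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Synthetic stand-ins for attitudinal/cognitive survey responses.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)     # "willingness to buy" (invented)
Y = np.eye(2)[y]
W, b, beta = elm_train(X[:250], Y[:250])
acc = (elm_predict(X[250:], W, b, beta) == y[250:]).mean()
print(f"holdout accuracy: {acc:.2f}")
```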
NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.
Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan
2014-01-01
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered putative evidence of a regulatory link between the two genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that it outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
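A compact sketch of the two ideas on synthetic expression data: a GENIE3-style decomposition into one regression per target gene, and a NIMEFI-style combination that averages the ranks of several importance matrices (here plain random forests and the elastic net, without the paper's subsampling ensembles).

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 10))             # samples x genes (synthetic)
n_genes = expr.shape[1]

def importance_matrix(scorer):
    """GENIE3-style decomposition: one regression per target gene.
    M[i, j] is the importance of gene i for predicting gene j."""
    M = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        X = np.delete(expr, j, axis=1)
        idx = [i for i in range(n_genes) if i != j]
        M[idx, j] = scorer(X, expr[:, j])
    return M

rf = lambda X, y: RandomForestRegressor(200, random_state=0).fit(X, y).feature_importances_
enet = lambda X, y: np.abs(ElasticNet(alpha=0.01).fit(X, y).coef_)

# NIMEFI-style combination: average the *ranks* of the importance matrices.
mats = [importance_matrix(rf), importance_matrix(enet)]
avg_rank = np.mean([rankdata(M.ravel()).reshape(M.shape) for M in mats], axis=0)
i, j = np.unravel_index(np.argmax(avg_rank), avg_rank.shape)
print(f"top predicted edge: gene {i} -> gene {j}")
```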
A prior feature SVM – MRF based method for mouse brain segmentation
Wu, Teresa; Bae, Min Hyeok; Zhang, Min; Pan, Rong; Badea, Alexandra
2012-01-01
We introduce an automated method, called prior feature Support Vector Machine-Markov Random Field (pSVMRF), to segment three-dimensional mouse brain Magnetic Resonance Microscopy (MRM) images. Our earlier work, extended MRF (eMRF), integrated Support Vector Machine (SVM) and Markov Random Field (MRF) approaches, leading to improved segmentation accuracy; however, the computation of eMRF is very expensive, which may limit its performance on segmentation and robustness. In this study pSVMRF reduces training and testing time for SVM, while boosting segmentation performance. Unlike the eMRF approach, where MR intensity information and location priors are linearly combined, pSVMRF combines this information in a nonlinear fashion, which enhances the discriminative ability of the algorithm. We validate the proposed method using MR imaging of unstained and actively stained mouse brain specimens, and compare segmentation accuracy with two existing methods: eMRF and MRF. C57BL/6 mice are used for training and testing, using cross validation. For formalin-fixed C57BL/6 specimens, pSVMRF outperforms both eMRF and MRF. The segmentation accuracy for C57BL/6 brains, stained or not, was similar for larger structures like the hippocampus and caudate putamen (~87%), but increased substantially for smaller regions like the substantia nigra (from 78.36% to 91.55%) and the anterior commissure (from ~50% to ~80%). To test segmentation robustness against increased anatomical variability we add two strains, BXD29 and a transgenic mouse model of Alzheimer's disease. Segmentation accuracy for the new strains is 80% for the hippocampus and caudate putamen, indicating that pSVMRF is a promising approach for phenotyping mouse models of human brain disorders. PMID:21988893
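The full pipeline (SVM unary terms refined by an MRF) is beyond a snippet; the sketch below illustrates only the key contrast drawn above, combining intensity with location priors nonlinearly by feeding both into an RBF-kernel SVM rather than mixing them linearly. The voxel data are synthetic, and a real system would add an MRF smoothing step over the SVM outputs.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 2000
coords = rng.uniform(0, 1, size=(n, 3))                   # normalized voxel positions
labels = (coords[:, 0] + 0.1 * rng.normal(size=n) > 0.5).astype(int)
intensity = labels + 0.5 * rng.normal(size=n)             # noisy "MR intensity"

# Nonlinear combination of intensity and location prior via an RBF kernel,
# in contrast to a linear mixing of the two information sources.
X = np.column_stack([intensity, coords])
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[:1500], labels[:1500])
print("voxel accuracy:", (svm.predict(X[1500:]) == labels[1500:]).mean())
```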
Computer vision-based automated peak picking applied to protein NMR spectra.
Klukowski, Piotr; Walczak, Michal J; Gonczarek, Adam; Boudet, Julien; Wider, Gerhard
2015-09-15
A detailed analysis of multidimensional NMR spectra of macromolecules requires the identification of individual resonances (peaks). This task can be tedious and time-consuming and often requires support by experienced users. Automated peak picking algorithms were introduced more than 25 years ago, but major deficiencies remain that often prevent complete and error-free peak picking of biological macromolecule spectra. The major challenges for automated peak picking algorithms are distinguishing artifacts from real peaks, particularly those with irregular shapes, and picking peaks in spectral regions with overlapping resonances, which are very hard to resolve with existing computer algorithms. In both of these cases a visual inspection approach could be more effective than a 'blind' algorithm. We present a novel approach using computer vision (CV) methodology, which could be better adapted to the problem of peak recognition. After suitable 'training' we successfully applied the CV algorithm to spectra of medium-sized soluble proteins up to molecular weights of 26 kDa and to a 130 kDa complex of a tetrameric membrane protein in detergent micelles. Our CV approach outperforms commonly used programs. With suitable training datasets the application of the presented method can be extended to automated peak picking in multidimensional spectra of nucleic acids or carbohydrates, and adapted to solid-state NMR spectra. CV-Peak Picker is available upon request from the authors. gsw@mol.biol.ethz.ch; michal.walczak@mol.biol.ethz.ch; adam.gonczarek@pwr.edu.pl Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
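A trained CV classifier cannot be reproduced here; the snippet below shows only the naive baseline such a system improves upon, thresholded local-maximum detection on a synthetic 2D spectrum, using scikit-image.

```python
import numpy as np
from skimage.feature import peak_local_max

# Synthetic 2D "spectrum": two Gaussian peaks plus noise (illustrative only).
y, x = np.mgrid[0:128, 0:128]
spec = (np.exp(-((x - 40) ** 2 + (y - 40) ** 2) / 20.0)
        + 0.8 * np.exp(-((x - 90) ** 2 + (y - 70) ** 2) / 20.0)
        + 0.05 * np.random.default_rng(0).normal(size=(128, 128)))

# Local-maximum detection with an intensity threshold: a vastly simplified
# stand-in for the trained computer-vision classifier described above.
peaks = peak_local_max(spec, min_distance=5, threshold_abs=0.3)
print("picked peaks at:", peaks)
```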
V S, Unni; Mishra, Deepak; Subrahmanyam, G R K S
2016-12-01
The need for image fusion in current image processing systems is increasing, mainly due to the growing number and variety of image acquisition techniques. Image fusion is the process of combining substantial information from several sensors using mathematical techniques in order to create a single composite image that is more comprehensive and thus more useful for a human operator or other computer vision tasks. This paper presents a new approach to multifocus image fusion based on sparse signal representation. Block-based compressive sensing, integrated with projection-driven compressive sensing (CS) recovery that encourages sparsity in the wavelet domain, is used to obtain the focused image from a set of out-of-focus images. Compression is achieved during the image acquisition process using a block compressive sensing method. An adaptive thresholding technique within the smoothed projected Landweber recovery process reconstructs high-resolution focused images from low-dimensional CS measurements of out-of-focus images. The discrete wavelet transform and the dual-tree complex wavelet transform are used as the sparsifying bases for the proposed fusion. The main finding is that sparsification enables a better selection of the fusion coefficients and hence better fusion. A Laplacian mixture model is fitted in the wavelet domain, and estimation of the probability density function (pdf) parameters by expectation maximization leads to the proper selection of the coefficients of the fused image. Compared with a fusion scheme that does not employ the projected Landweber (PL) recovery and with other existing CS-based fusion approaches, the proposed method outperforms them even with fewer samples.
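The block-CS acquisition and projected Landweber recovery are too involved for a snippet; the sketch below shows only the wavelet-domain fusion step the main finding points to, selecting the larger-magnitude coefficient at each location (a common max-abs rule standing in for the paper's Laplacian-mixture selection), using PyWavelets on synthetic images.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
sharp = rng.normal(size=(128, 128))
# Two complementary out-of-focus versions (synthetic stand-ins).
img_a, img_b = sharp.copy(), sharp.copy()
img_a[:, 64:] *= 0.2
img_b[:, :64] *= 0.2

# Max-absolute-coefficient fusion: the sparsifying transform concentrates
# in-focus detail into large coefficients, which are kept.
wa = pywt.wavedec2(img_a, "db4", level=3)
wb = pywt.wavedec2(img_b, "db4", level=3)
fused = [np.where(np.abs(wa[0]) > np.abs(wb[0]), wa[0], wb[0])]
for ca, cb in zip(wa[1:], wb[1:]):
    fused.append(tuple(np.where(np.abs(a) > np.abs(b), a, b) for a, b in zip(ca, cb)))
print(pywt.waverec2(fused, "db4").shape)
```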
Le Pogam, Adrien; Hatt, Mathieu; Descourt, Patrice; Boussion, Nicolas; Tsoumpas, Charalampos; Turkheimer, Federico E; Prunier-Aesch, Caroline; Baulieu, Jean-Louis; Guilloteau, Denis; Visvikis, Dimitris
2011-09-01
Partial volume effects (PVEs) are consequences of the limited spatial resolution in emission tomography leading to underestimation of uptake in tissues of size similar to the point spread function (PSF) of the scanner as well as activity spillover between adjacent structures. Among PVE correction methodologies, a voxel-wise mutual multiresolution analysis (MMA) was recently introduced. MMA is based on the extraction and transformation of high resolution details from an anatomical image (MR/CT) and their subsequent incorporation into a low-resolution PET image using wavelet decompositions. Although this method allows creating PVE corrected images, it is based on a 2D global correlation model, which may introduce artifacts in regions where no significant correlation exists between anatomical and functional details. A new model was designed to overcome these two issues (2D only and global correlation) using a 3D wavelet decomposition process combined with a local analysis. The algorithm was evaluated on synthetic, simulated and patient images, and its performance was compared to the original approach as well as the geometric transfer matrix (GTM) method. Quantitative performance was similar to the 2D global model and GTM in correlated cases. In cases where mismatches between anatomical and functional information were present, the new model outperformed the 2D global approach, avoiding artifacts and significantly improving quality of the corrected images and their quantitative accuracy. A new 3D local model was proposed for a voxel-wise PVE correction based on the original mutual multiresolution analysis approach. Its evaluation demonstrated an improved and more robust qualitative and quantitative accuracy compared to the original MMA methodology, particularly in the absence of full correlation between anatomical and functional information.
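A greatly simplified sketch of the MMA idea, assuming PyWavelets: both volumes are decomposed with a 3D wavelet transform, and scaled anatomical detail is added to the PET detail coefficients before reconstruction. The single global factor alpha stands in for the locally estimated weights of the paper's 3D local model, and the volumes are synthetic.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
anatomical = rng.normal(size=(64, 64, 64))    # high-resolution MR/CT stand-in
pet = rng.normal(size=(64, 64, 64))           # low-resolution PET stand-in

# Core MMA idea (greatly simplified): transplant high-frequency detail from
# the anatomical image into the PET decomposition, then reconstruct.
wa = pywt.wavedecn(anatomical, "db2", level=2)
wp = pywt.wavedecn(pet, "db2", level=2)
alpha = 0.5                                   # hypothetical global scaling factor
fused = [wp[0]] + [{k: wp[i][k] + alpha * wa[i][k] for k in wp[i]}
                   for i in range(1, len(wp))]
print(pywt.waverecn(fused, "db2").shape)
```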
A hybrid algorithm for speckle noise reduction of ultrasound images.
Singh, Karamjeet; Ranade, Sukhjeet Kaur; Singh, Chandan
2017-09-01
Medical images are contaminated by multiplicative speckle noise, which significantly reduces the contrast of ultrasound images and has a negative effect on various image interpretation tasks. In this paper, we propose a hybrid denoising approach that combines local and nonlocal information in an efficient manner. The proposed hybrid algorithm consists of three stages. In the first stage, local statistics in the form of a guided filter are used to reduce the effect of speckle noise initially. Then, an improved speckle reducing bilateral filter (SRBF) is developed to further reduce the speckle noise in the medical images. Finally, to reconstruct the diffused edges we use an efficient post-processing technique that jointly considers the advantages of both the bilateral and the nonlocal means (NLM) filter for efficient attenuation of speckle noise. The performance of the proposed hybrid algorithm is evaluated on synthetic, simulated and real ultrasound images. The experiments conducted on various test images demonstrate that our hybrid approach outperforms various traditional speckle reduction approaches, including the recently proposed NLM and optimized Bayesian-based NLM. Quantitative and qualitative measures, as well as visual inspection of the denoised synthetic and real ultrasound images, show that the proposed hybrid algorithm has strong denoising capability and preserves fine image details, such as the edge of a lesion, better than previously developed methods for speckle noise reduction. Its denoising and edge-preserving capability is far better than that of existing traditional and recently proposed speckle reduction (SR) filters. The success of the proposed algorithm would help lay the foundation for developing hybrid algorithms for denoising of ultrasound images. Copyright © 2017 Elsevier B.V. All rights reserved.
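A rough three-stage analogue with scikit-image, which has no guided filter, so a bilateral pass stands in for stage one; the paper's improved SRBF and joint bilateral/NLM post-processing are likewise approximated by stock filters, and all parameters are illustrative.

```python
import numpy as np
from skimage.restoration import denoise_bilateral, denoise_nl_means, estimate_sigma

rng = np.random.default_rng(0)
clean = np.linspace(0, 1, 128)[None, :] * np.ones((128, 128))
speckled = np.clip(clean * (1 + 0.3 * rng.normal(size=clean.shape)), 0, 1)

# Stage 1 stand-in: edge-preserving local smoothing (guided filter in the paper).
stage1 = denoise_bilateral(speckled, sigma_color=0.1, sigma_spatial=3)
# Stage 2 stand-in: a second, gentler bilateral pass (the paper's improved SRBF).
stage2 = denoise_bilateral(stage1, sigma_color=0.05, sigma_spatial=2)
# Stage 3 stand-in: nonlocal means to reconstruct diffused edges.
sigma = float(np.mean(estimate_sigma(stage2)))
result = denoise_nl_means(stage2, h=1.15 * sigma, fast_mode=True)
print("residual std:", float(np.std(result - clean)))
```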
NASA Astrophysics Data System (ADS)
Almosallam, Ibrahim A.; Jarvis, Matt J.; Roberts, Stephen J.
2016-10-01
The next generation of cosmology experiments will be required to use photometric redshifts rather than spectroscopic redshifts. Obtaining accurate and well-characterized photometric redshift distributions is therefore critical for Euclid, the Large Synoptic Survey Telescope and the Square Kilometre Array. However, determining accurate variance predictions alongside single point estimates is crucial, as they can be used to optimize the sample of galaxies for the specific experiment (e.g. weak lensing, baryon acoustic oscillations, supernovae), trading off between completeness and reliability in the galaxy sample. The various sources of uncertainty in measurements of the photometry and redshifts put a lower bound on the accuracy that any model can hope to achieve. The intrinsic uncertainty associated with estimates is often non-uniform and input-dependent, commonly known in statistics as heteroscedastic noise. However, existing approaches are susceptible to outliers, do not take into account variance induced by non-uniform data density, and in most cases require manual tuning of many parameters. In this paper, we present a Bayesian machine learning approach that jointly optimizes the model with respect to both the predictive mean and variance, which we refer to as Gaussian processes for photometric redshifts (GPz). The predictive variance of the model takes into account both the variance due to data density and photometric noise. Using the Sloan Digital Sky Survey (SDSS) DR12 data, we show that our approach substantially outperforms other machine learning methods for photo-z estimation and their associated variance, such as TPZ and ANNZ2. We provide MATLAB and Python implementations that are available to download at https://github.com/OxfordML/GPz.
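GPz itself is available at the URL above; for a flavour of joint mean-plus-variance prediction, the sketch below uses scikit-learn's Gaussian process regressor, whose predictive variance grows in regions of low training-data density but which, unlike GPz, models only homoscedastic (input-independent) noise. The photometry and redshifts are synthetic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
colors = rng.normal(size=(300, 4))                            # synthetic photometric colours
z = 0.5 + 0.2 * colors[:, 0] + 0.05 * rng.normal(size=300)    # synthetic redshifts

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(colors, z)

# Predictive mean *and* standard deviation; the std grows away from the
# training data, one of the two variance sources discussed above.
z_hat, z_std = gp.predict(rng.normal(size=(5, 4)), return_std=True)
print(np.c_[z_hat, z_std])
```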
Simon, Scott; Grey, Casey Paul; Massenzo, Trisha; Simpson, David G; Longest, P Worth
2014-11-01
Current technology for endovascular thrombectomy in ischemic stroke utilizes static loading and is successful in approximately 85% of cases. Existing technology uses either static suction (applied via a continuous pump or syringe) or flow arrest with a proximal balloon. In this paper we evaluate the potential of cyclic loading in aspiration thrombectomy. In order to evaluate the efficacy of cyclic aspiration, a model was created using a Penumbra aspiration system, three-way valve and Penumbra 5Max catheter. Synthetic clots were aspirated at different frequencies and using different aspiration mediums. Success or failure of clot removal and time were recorded. All statistical analyses were based on either a one-way or two-way analysis of variance, Holm-Sidak pairwise multiple comparison procedure (α=0.05). Cyclic aspiration outperformed static aspiration in overall clot removal and removal speed (p<0.001). Within cyclic aspiration, Max Hz frequencies (∼6.3 Hz) cleared clots faster than 1 Hz (p<0.001) and 2 Hz (p=0.024). Loading cycle dynamics (specific pressure waveforms) affected speed and overall clearance (p<0.001). Water as the aspiration medium was more effective at clearing clots than air (p=0.019). Cyclic aspiration significantly outperformed static aspiration in speed and overall clearance of synthetic clots in our experimental model. Within cyclic aspiration, efficacy is improved by increasing cycle frequency, utilizing specific pressure cycle waveforms and using water rather than air as the aspiration medium. These findings provide a starting point for altering existing thrombectomy technology or perhaps the development of new technologies with higher recanalization rates. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Cierniak, Robert; Lorent, Anna
2016-09-01
The main aim of this paper is to investigate the conditioning-related properties of our originally formulated statistical model-based iterative approach to the problem of image reconstruction from projections, and in this way to demonstrate the superiority of this approach over those recently used by other authors. The reconstruction algorithm based on this concept uses a maximum likelihood estimation with an objective adjusted to the probability distribution of the measured signals obtained from an X-ray computed tomography system with parallel beam geometry. The analysis and experimental results presented here show that our analytical approach outperforms the reference algebraic methodology, which is explored widely in the literature and exploited in various commercial implementations. Copyright © 2016 Elsevier Ltd. All rights reserved.
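The authors' algorithm is not reproducible from the abstract; as a generic example of statistical model-based iterative reconstruction, here is the standard ML-EM update, x <- x * A^T(y / Ax) / A^T 1, on a toy system matrix (illustrative only, and not the paper's estimator).

```python
import numpy as np

def mlem(A, y, n_iter=50):
    """Maximum-likelihood EM iteration for tomographic reconstruction:
    x <- x / (A^T 1) * A^T (y / (A x)); A is the system (projection) matrix."""
    x = np.ones(A.shape[1])
    norm = A.T @ np.ones(A.shape[0])
    for _ in range(n_iter):
        x *= (A.T @ (y / np.maximum(A @ x, 1e-12))) / np.maximum(norm, 1e-12)
    return x

# Toy problem: random nonnegative system matrix and Poisson-noisy projections.
rng = np.random.default_rng(0)
A = rng.random((200, 64))
x_true = rng.random(64)
y = rng.poisson(A @ x_true * 50) / 50.0
err = np.linalg.norm(mlem(A, y) - x_true) / np.linalg.norm(x_true)
print("relative error:", err)
```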
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fitting, assessing and adjusting a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set, and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9% to 94.9%, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
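A minimal sketch of an a priori strategy comparison in the wrapper spirit described above: candidate logistic strategies (no shrinkage, ridge, lasso) are scored by cross-validated held-out log-likelihood on the development data before a final model is chosen. The data set is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=15, random_state=0)

# Compare strategies by out-of-sample log-likelihood (neg_log_loss) *before*
# committing to a final model; penalty=None requires scikit-learn >= 1.2.
strategies = {
    "no shrinkage": LogisticRegression(penalty=None, max_iter=5000),
    "ridge (L2)": LogisticRegression(C=1.0, max_iter=5000),
    "lasso (L1)": LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
}
for name, model in strategies.items():
    ll = cross_val_score(model, X, y, cv=10, scoring="neg_log_loss").mean()
    print(f"{name:>14}: mean held-out log-likelihood {ll:.4f}")
```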
Liu, Zhenqiu; Hsiao, William; Cantarel, Brandi L; Drábek, Elliott Franco; Fraser-Liggett, Claire
2011-12-01
Direct sequencing of microbes in human ecosystems (the human microbiome) has complemented single genome cultivation and sequencing to understand and explore the impact of commensal microbes on human health. As sequencing technologies improve and costs decline, the sophistication of the data has outgrown available computational methods. While several existing machine learning methods have recently been adapted for analyzing microbiome data, there is not yet an efficient and dedicated algorithm available for multiclass classification of human microbiota. By combining instance-based and model-based learning, we propose a novel sparse distance-based learning method for simultaneous class prediction and feature (used interchangeably with variable or taxon) selection from multiple treatment populations on the basis of 16S rRNA sequence count data. Our proposed method simultaneously minimizes the intraclass distance and maximizes the interclass distance with many fewer estimated parameters than other methods. It is very efficient for problems with small sample sizes and unbalanced classes, which are common in metagenomic studies. We implemented this method in a MATLAB toolbox called MetaDistance. We also propose several approaches for data normalization and variance stabilization transformation in MetaDistance. We validate this method on several real and simulated 16S rRNA datasets to show that it outperforms existing methods for classifying metagenomic data. This article is the first to address simultaneous multifeature selection and class prediction with metagenomic count data. The MATLAB toolbox is freely available online at http://metadistance.igs.umaryland.edu/. zliu@umm.edu Supplementary data are available at Bioinformatics online.
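MetaDistance itself is a MATLAB toolbox; a rough Python analogue of simultaneous classification and feature selection on count data is a shrunken nearest-centroid classifier applied to square-root-transformed proportions (a common variance-stabilizing choice), where the shrinkage zeroes out uninformative taxa. The count table below is synthetic.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
# Synthetic 16S-style count table: samples x taxa, two treatment groups.
counts = rng.poisson(5, size=(40, 100)).astype(float)
counts[:20, :5] += rng.poisson(20, size=(20, 5))          # group-0 enriched taxa
y = np.array([0] * 20 + [1] * 20)

# Normalize to proportions, then apply a square-root variance-stabilizing transform.
X = np.sqrt(counts / counts.sum(axis=1, keepdims=True))

# Shrunken nearest centroid: shrinkage collapses the class centroids of
# uninformative taxa, a rough analogue of sparse feature selection.
clf = NearestCentroid(shrink_threshold=0.02).fit(X, y)
kept = np.abs(clf.centroids_[0] - clf.centroids_[1]) > 1e-9
print("taxa retained after shrinkage:", int(kept.sum()))
```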